
  • Install Ruby and Set Up RVM Programming Environment on Ubuntu 22.04

    Install Ruby and Set Up RVM Programming Environment on Ubuntu 22.04

    Introduction

    Installing Ruby and setting up the RVM programming environment on Ubuntu is a key step in building a solid development foundation. Ruby, a dynamic programming language, works seamlessly on Ubuntu when paired with RVM (Ruby Version Manager), which simplifies managing multiple Ruby versions. Whether you’re a beginner or familiar with Ubuntu, this guide will walk you through the entire process, from installation to running your first “Hello, World” Ruby program. With this setup, you’ll have a powerful environment for exploring Ruby development and creating robust applications. In this article, we’ll show you how to get Ruby and RVM running smoothly on your Ubuntu 22.04 system.

    What is Ruby Version Manager (RVM)?

    Ruby Version Manager (RVM) is a tool that helps you easily install and manage different versions of Ruby on your system. It simplifies the process of setting up a Ruby environment, ensuring all necessary libraries are installed, and makes it easy to switch between versions of Ruby if needed.

    Step 1 — Using the Terminal

To install Ruby, you’ll be using the command-line interface (CLI). The command line is a text-only way to interact with your computer: instead of clicking icons and menus as you would in a graphical user interface (GUI), you type commands and get feedback as text. This non-graphical interface, also known as a shell, is powerful because it lets you control your computer directly and automate tasks you do all the time, like moving files or running programs. It’s an essential tool for developers, and once you get the hang of it, you’ll find it far more efficient for managing your development environment. With the shell, you can run commands to do everything from simple file operations to configuring your system.

    If you’ve never used the command line before, don’t worry—it might seem a little intimidating at first, but you’ll get comfortable with it pretty quickly. If you need help getting started, you can look up some basic terminal commands or even check out a beginner guide like “An Introduction to the Linux Terminal.” In this tutorial, we’ll walk you through setting up Ruby on your system using the command line on Ubuntu, specifically through installing RVM (Ruby Version Manager). RVM is a great tool that makes managing Ruby versions and dependencies super simple.

    Now that your Ubuntu server is up and running, you’ll need to dive into the terminal to install RVM. This tool will take care of installing Ruby, along with all the libraries and dependencies Ruby needs to run. It’ll make getting Ruby up and running on your system way easier.

Make sure to follow each command carefully, as the setup process can be a bit tricky for beginners. Start by updating your package index:

    $ sudo apt-get update

    Once your system is updated, you can proceed with installing RVM.

    $ sudo apt-get install curl gpg

    $ curl -sSL https://get.rvm.io | bash -s stable

    # Output after successful installation of RVM
    % rvm is a function
    % rvm 1.29.12 (latest stable) installed

    Once RVM is installed, you can verify it with:

$ rvm --version

    This will display the version of RVM installed on your system.

    For more insights on working with the command line, check out this detailed guide on command-line tools for beginners.

    Step 2 — Installing RVM and Ruby

    RVM (Ruby Version Manager) automates the process of setting up a Ruby environment on your Ubuntu system. This tool makes it easier to install Ruby and manage different versions of Ruby on your system. To get started, we need to install RVM, which will, in turn, install Ruby and its prerequisites.

    The most efficient way to install Ruby with RVM is to run the installation script available on the official RVM website. First, you’ll need to use the gpg command to contact a public key server and request the RVM project’s key. This key is used to sign each RVM release, allowing you to verify the legitimacy of the release before downloading it. From your home directory, execute the following command:

$ gpg --keyserver hkp://pool.sks-keyservers.net --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB

    If the command above fails, you can try the following alternative commands:

$ curl -sSL https://rvm.io/mpapis.asc | gpg2 --import

$ curl -sSL https://rvm.io/pkuczynski.asc | gpg2 --import

    Next, we’ll use curl to download the RVM installation script. If curl is not installed on your system, you can install it by running the following command:

    $ sudo apt-get install curl

This will prompt you for your password. When you type your password, it will not be displayed on the screen as a security measure, but your keystrokes are still being registered. After entering your password, press ENTER to proceed with the installation of curl. Once installed, use the following command to download the RVM installation script:

    $ curl -sSL https://get.rvm.io -o rvm.sh

    Now, let’s review the flags used with curl. The -s or --silent flag suppresses the progress meter, so it doesn’t display unnecessary output. The -S or --show-error flag ensures that any errors encountered during the download process are displayed. The -L or --location flag tells curl to follow any redirects automatically, in case the server redirects the request to a different location.

    Once the rvm.sh script is downloaded, you can review its contents before executing it by running:

    $ less rvm.sh

    Use the arrow keys to scroll through the file and press q to exit once you are done. Once you’re satisfied with the script’s contents, you can execute it by running:

    $ cat rvm.sh | bash -s stable

    This script will create a new directory called .rvm in your home directory. This directory will contain Ruby and all of its related components. Additionally, the script modifies your .bashrc file to add the .rvm/bin directory to your system’s PATH environment variable, making the rvm command available in the terminal. However, this modification won’t take effect in your current terminal session, so you’ll need to reload the terminal configuration by running:

    $ source ~/.rvm/scripts/rvm

    Now, you can use RVM to install the latest stable version of Ruby by running:

$ rvm install ruby --default

    This command will download and install Ruby along with its necessary components. It will also set the installed version of Ruby as the default for your system, ensuring that any existing Ruby installations do not conflict with the new version. Keep in mind that the installation process might take some time, depending on your system’s specifications and internet speed.

If you encounter missing-dependency errors during the installation, you can ask RVM to install the system packages Ruby needs (on Ubuntu it uses apt under the hood) by running:

    $ rvm requirements

    During the installation, RVM will automatically fetch any missing prerequisites and install them as needed. It may prompt you for your password during this process.

    Once the prerequisites are satisfied, RVM will begin downloading and installing Ruby. You will see output similar to the following:

ruby-2.4.0 - #configure
    ruby-2.4.0 - #download
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 16.4M  100 16.4M    0     0  4828k      0  0:00:03  0:00:03 --:--:-- 4829k

    After Ruby has been successfully installed, you can verify the installation by checking the Ruby version with:

    $ ruby -v

    This command will output the specific version of Ruby you installed. For example:

    ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]

    Lastly, to ensure that RVM automatically uses the newly installed version of Ruby each time you open a terminal window, you’ll need to make a slight adjustment to your system. RVM modifies the .bash_profile file, but this file is only invoked on login shells. So, you must ensure that your terminal opens as a login shell to apply the changes.
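    How you do this depends on your terminal emulator; GNOME Terminal, for example, has a profile option to run the command as a login shell. Alternatively, a common workaround (a sketch, assuming bash and the default RVM install path) is to source RVM from your ~/.bashrc so non-login shells pick it up too:

    $ echo 'source ~/.rvm/scripts/rvm' >> ~/.bashrc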

    Once these steps are completed, Ruby will be ready to use on your system.
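    Since managing multiple Rubies side by side is RVM’s main job, it’s worth a quick test drive. A short sketch, assuming you want a second interpreter (the version number is just an example; rvm list known shows everything available):

    $ rvm install 3.1
    $ rvm list
    $ rvm use 3.1 --default

    Here, rvm list shows every interpreter RVM manages, and rvm use switches the active one, with --default making the choice stick for new sessions.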

    For additional details on managing Ruby environments and dependencies, explore this comprehensive guide on Installing Ruby and Managing Versions with RVM.

    Step 3 — Creating a Simple Program

    Now, let’s create a basic “Hello, World” program in Ruby. This simple exercise serves as a way to test that your Ruby environment is set up correctly and functioning as expected. It also gives you the chance to get comfortable with writing and running Ruby programs.

    To begin, you’ll need to create a new file, which we’ll call hello.rb, using the text editor nano. You can do this by typing the following command in your terminal:

    $ nano hello.rb

    This will open the nano editor. In the editor, type the following Ruby code:

puts "Hello, World!"

    This line of code tells Ruby to output the string “Hello, World!” to the screen when the program is executed. After typing the code, you can exit the editor by pressing CTRL+X. When prompted to save the changes, press Y to confirm, and then press ENTER to save the file with the name hello.rb.

    Next, you can run the program by typing the following command in your terminal:

    $ ruby hello.rb

    Once executed, the program will run and display the following output on the screen:

    Hello, World!

    This output indicates that your Ruby development environment is functioning correctly. The program successfully executed, and you are now ready to explore more advanced Ruby programming techniques. This simple exercise lays the foundation for working with Ruby, and you can now begin creating larger and more complex projects using this environment.

    For more information on starting with Ruby and basic program creation, check out this guide on learning Ruby with practical examples.

    Conclusion

    In conclusion, setting up Ruby and RVM on Ubuntu 22.04 is an essential step for any developer looking to dive into Ruby programming. By following this step-by-step guide, you’ll have Ruby installed and ready to go in no time, along with the powerful RVM tool to manage multiple Ruby versions. Whether you’re a beginner or experienced with Ubuntu, this guide ensures a smooth installation process and a solid foundation for future Ruby development projects. With Ruby and RVM set up on your system, you can now explore a wide range of programming possibilities and start building powerful applications. Stay tuned for updates on the latest Ruby versions and tools that make programming even more efficient!


  • Set Up Multi-Node Kafka Cluster with KRaft Consensus Protocol

    Set Up Multi-Node Kafka Cluster with KRaft Consensus Protocol

    Introduction

    Setting up a multi-node Kafka cluster with the KRaft consensus protocol is an essential step for building scalable, fault-tolerant data streams. Kafka, known for its high-throughput messaging system, benefits greatly from the KRaft protocol, which eliminates the need for Apache ZooKeeper. This tutorial will walk you through configuring Kafka nodes, connecting them to the cluster, and managing topics and partitions to ensure data availability and resiliency. Whether you’re producing or consuming messages, understanding how to simulate node failures and migrate data is key to maintaining a robust Kafka architecture. In this guide, we’ll show you how to implement these strategies effectively using Kafka and KRaft.

    What is Kafka Cluster?

    A Kafka cluster is a system of interconnected servers designed to process real-time data streams. It allows the creation of topics where data is stored in partitions, ensuring high availability, scalability, and fault tolerance. The system can manage and organize data flow between producers and consumers, making it suitable for handling large volumes of data with minimal downtime.

    Step 1 – Configuring Kafka Nodes

    In this step, you’ll set up the three Kafka servers you created earlier to be part of the same KRaft cluster. With KRaft, the nodes handle their own organization and perform admin tasks without needing Apache ZooKeeper. This setup makes everything much more efficient and scalable.

    Configuring the First Node

    Let’s start by setting up the first node. First, stop the Kafka service on the first cloud server by running the following command:

    $ sudo systemctl stop kafka

    Next, log in as the kafka user and head to the directory where Kafka is installed. To start editing the Kafka configuration file, run:

$ vi config/kraft/server.properties

    Once you’re in the file, look for these lines:

# The role of this server. Setting this puts us in KRaft mode
    process.roles=broker,controller
    # The node id associated with this instance's roles
    node.id=1
    # The connect string for the controller quorum
    controller.quorum.voters=1@localhost:9093

These three parameters configure your Kafka node to act as both a broker (which receives, stores, and serves data) and a controller (which handles administrative tasks). In larger Kafka deployments, it is common to run controllers on separate nodes for efficiency and redundancy.

    The node.id defines the unique ID for this node within the cluster. Since this is the first node, it’s set to 1. It’s important that each node has a unique ID, so the second and third nodes will have the IDs 2 and 3.

    The controller.quorum.voters line ties each node ID to the corresponding address and port for communication. Update this line so that all three nodes are aware of each other. Your updated line should look like this:

controller.quorum.voters=1@kafka1.your_domain:9093,2@kafka2.your_domain:9093,3@kafka3.your_domain:9093

    Don’t forget to replace your_domain with your actual domain address from the earlier setup steps.

    Next, find and update the following lines to specify the listeners and the addresses that Kafka will use to communicate with clients:

listeners=PLAINTEXT://:9092,CONTROLLER://:9093
    # Name of listener used for communication between brokers.
    inter.broker.listener.name=PLAINTEXT
    # Listener name, hostname, and port the broker will advertise to clients. If not set, it uses the value for "listeners".
    advertised.listeners=PLAINTEXT://localhost:9092

    Here’s the deal:

    • listeners defines where the Kafka node listens for incoming connections.
    • advertised.listeners specifies the addresses clients should use to connect.

    This setup lets you control which address clients actually use to connect, even though the server may listen on different addresses.

    Update these lines to look like this (again, replace your_domain with your actual domain):

    listeners=PLAINTEXT://kafka1.your_domain:9092,CONTROLLER://kafka1.your_domain:9093
    inter.broker.listener.name=PLAINTEXT
    advertised.listeners=PLAINTEXT://kafka1.your_domain:9092

    Since this node will be part of a cluster, you’ll explicitly set the addresses to point to the current server (cloud instance).

Now, find the num.partitions setting, which sets the default number of log partitions per topic. More partitions allow for better parallelism in consumption but also increase the number of files across brokers. By default, this is set to 1, but since you have three nodes, set it to a multiple of three:

    num.partitions=6

    With a value of 6, each of the three nodes leads two partitions of a topic by default.

    Next, configure the replication factor for internal topics like consumer offsets and transaction states. Look for these lines:

    offsets.topic.replication.factor=1
    transaction.state.log.replication.factor=1

    Set them to:

    offsets.topic.replication.factor=2
    transaction.state.log.replication.factor=2

    This ensures that at least two nodes will be in sync for managing internal metadata.

    Once you’ve updated these lines, save and close the configuration file.

    Reinitializing Log Storage

    After setting the default partition number and replication factor, you need to reinitialize the log storage. First, remove the existing log files by running:

    $ rm -rf /home/kafka/kafka-logs/*

    Next, generate a new cluster ID and store it in an environment variable:

$ KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"

    You can display the cluster ID by running:

    $ echo $KAFKA_CLUSTER_ID

    The output should look like this:

Mjj4bch9Q3-B0TEXv8_zPg

    Note down this ID because you’ll need it to configure the second and third nodes.

    Finally, format the log storage with the new cluster ID by running:

    $ ./bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server.properties

    The output will be something like:

... Formatting /home/kafka/kafka-logs with metadata.version 3.7-IV4

    Configuring the Second and Third Nodes

    Setting up the second and third nodes is pretty much the same as setting up the first one. Just remember to use a unique node.id for each node. For the second node, set node.id=2, and for the third node, set node.id=3.

    Don’t forget to update the listeners and advertised.listeners settings for each node to point to the correct server.

    When regenerating the log storage, reuse the cluster ID from the first node:

$ KAFKA_CLUSTER_ID="your_cluster_id"

    Once you’ve made all the changes, start the Kafka service on all three nodes by running:

    $ sudo systemctl start kafka

    And that’s it! You’ve successfully configured the three Kafka nodes to be part of the same KRaft cluster. Now you can create topics and start producing and consuming messages across your cluster.
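    Before creating topics, it’s worth confirming that all three brokers actually joined. One quick sanity check (kafka-broker-api-versions.sh ships with Kafka and lists every reachable broker with its id):

    $ sudo systemctl status kafka
    $ ./bin/kafka-broker-api-versions.sh --bootstrap-server kafka1.your_domain:9092

    If all is well, the second command prints an entry for brokers 1, 2, and 3.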

Read more about Kafka node configuration and setup in this guide: How to Set Up Apache Kafka on Ubuntu 20.04.

    Step 2 – Connecting to the Cluster

    In this step, you’ll connect to the Kafka cluster using the shell scripts that come with Kafka. You’ll also create a topic, send some messages, and consume data from the cluster. Plus, there’s a cool part where we simulate a node failure to see how Kafka handles this and keeps your data available.

    Kafka has this handy script called kafka-metadata-quorum.sh, which gives you a detailed snapshot of your cluster and its members. To run it, just use the following command:

$ ./bin/kafka-metadata-quorum.sh --bootstrap-controller kafka1.your_domain:9093 describe --status

    Here’s the thing: you’re connecting to one of the Kafka nodes using port 9093. This port is for the controller (not the broker, by the way). Make sure you replace kafka1.your_domain with the actual domain that points to one of your Kafka nodes. After running the command, you should see something like this:

    ClusterId: G3TeIZoeTSCvG2YOWvPE2w
    LeaderId: 3
    LeaderEpoch: 2
    HighWatermark: 383
    MaxFollowerLag: 0
    MaxFollowerLagTimeMs: 55
    CurrentVoters: [1,2,3]
    CurrentObservers: []

    This output gives you a quick look at the cluster’s state. For example, node 3 is elected as the leader, and all three nodes—1, 2, and 3—are in the voting pool. They’re all in agreement about who the leader is.
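    The same script can also break the quorum down per node. The describe --replication view lists each voter’s log end offset and lag, which is a quick way to confirm the controllers are keeping up with the leader:

    $ ./bin/kafka-metadata-quorum.sh --bootstrap-controller kafka1.your_domain:9093 describe --replication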

    Now that we’ve got the basics out of the way, let’s create a topic called first-topic. You can do this by running:

$ ./bin/kafka-topics.sh --create --topic first-topic --bootstrap-server kafka1.your_domain:9092 --replication-factor 2

    Once it’s created, you’ll see this output:

    Created topic first-topic.

    To check how the partitions are distributed across the nodes, run this command:

$ ./bin/kafka-topics.sh --describe --bootstrap-server kafka1.your_domain:9092 --topic first-topic

    Setting the replication-factor to 2 means the topic will be replicated on at least two nodes. This ensures redundancy and fault tolerance. The output will look something like this:

    Topic: first-topic
    TopicId: 4kVImoFNTQeyk3r2zQbdvw
    PartitionCount: 6
    ReplicationFactor: 2
    Configs: segment.bytes=1073741824
    Topic: first-topic Partition: 0 Leader: 3 Replicas: 3,1 Isr: 3,1
    Topic: first-topic Partition: 1 Leader: 1 Replicas: 1,2 Isr: 1,2
    Topic: first-topic Partition: 2 Leader: 2 Replicas: 2,3 Isr: 2,3
    Topic: first-topic Partition: 3 Leader: 1 Replicas: 1,3 Isr: 1,3
    Topic: first-topic Partition: 4 Leader: 3 Replicas: 3,2 Isr: 3,2
    Topic: first-topic Partition: 5 Leader: 2 Replicas: 2,1 Isr: 2,1

Here, each partition has a leader and replicas. The Isr (in-sync replicas) set shows which replicas are in sync with the leader. By default, Kafka considers a replica in sync if it has caught up with the leader within the last 10 seconds; this window is controlled by the replica.lag.time.max.ms broker setting.

    Okay, now we’re ready to produce some messages. Use the kafka-console-producer.sh script to start the producer:

$ ./bin/kafka-console-producer.sh --topic first-topic --bootstrap-server kafka1.your_domain:9092

    Once you run this, you’ll see a blank prompt, meaning the producer is waiting for you to enter something. Type Hello World! and hit ENTER:

    Hello World!
    >

    Now you’ve successfully sent a message to Kafka! You can keep typing messages to test it out. When you’re done, press CTRL+C to exit the producer.

    Next, you’ll consume those messages using the kafka-console-consumer.sh script:

$ ./bin/kafka-console-consumer.sh --topic first-topic --from-beginning --bootstrap-server kafka1.your_domain:9092

    You should see the message you just produced:

    Hello World!

    Simulating Node Failure

    Now, here’s where the fun starts: let’s simulate a failure on one of the Kafka nodes. To do this, stop the Kafka service on the third node by running:

$ sudo systemctl stop kafka

    Next, let’s check the status of the first-topic again by running:

$ ./bin/kafka-topics.sh --describe --bootstrap-server kafka1.your_domain:9092 --topic first-topic

    The output will look like this:

    Topic: first-topic
    TopicId: 4kVImoFNTQeyk3r2zQbdvw
    PartitionCount: 6
    ReplicationFactor: 2
    Configs: segment.bytes=1073741824
    Topic: first-topic Partition: 0 Leader: 1 Replicas: 3,1 Isr: 1
    Topic: first-topic Partition: 1 Leader: 1 Replicas: 1,2 Isr: 1,2
    Topic: first-topic Partition: 2 Leader: 2 Replicas: 2,3 Isr: 2
    Topic: first-topic Partition: 3 Leader: 1 Replicas: 1,3 Isr: 1
    Topic: first-topic Partition: 4 Leader: 2 Replicas: 3,2 Isr: 2
    Topic: first-topic Partition: 5 Leader: 2 Replicas: 2,1 Isr: 2,1

    Notice that node 3 is still listed as a replica for some partitions, but it’s missing from the ISR (in-sync replicas) because it’s down. But once it comes back up, it will sync with the other nodes and get back in sync with the partition replicas.

    To see if the messages are still available, run the consumer again:

$ ./bin/kafka-console-consumer.sh --topic first-topic --from-beginning --bootstrap-server kafka1.your_domain:9092

    You’ll find the messages are still there:

    Hello World!

    Thanks to the replicas, the first two nodes have taken over and are continuing to serve up the messages to the consumer.

    Finally, to complete the simulation, restart Kafka on the third node:

$ sudo systemctl start kafka

    And just like that, you’ve seen how Kafka gracefully handles a node failure and keeps your data available. Pretty neat, right? Now you’re ready for the next step, where we’ll cover how to exclude a node from the cluster in a controlled way.

Read more about connecting to Kafka clusters and managing topic creation in this guide: Apache Kafka Quickstart Guide.

    Step 3 – Migrating Data Between Nodes

In this step, you will learn how to move topics between nodes in a Kafka cluster. This is helpful when you add new nodes to an existing cluster, because Kafka doesn’t automatically move partitions to them. It’s equally useful when you need to remove nodes, since Kafka doesn’t move partitions off to the remaining nodes automatically either. By migrating data manually, you can make sure that all the partitions are balanced and that your data gets redistributed where it’s needed.

    Kafka provides this neat script called kafka-reassign-partitions.sh. This script lets you create, run, and verify plans for reassigning partitions. You’ll use it to create a plan to move the partitions of the first-topic to the first two nodes in your cluster.

    Defining Topics to Migrate

    The script needs a JSON file to define which topics you want to migrate. So, you’ll need to create and edit a file called topics-to-move.json with this content:

{
      "topics": [
        { "topic": "first-topic" }
      ],
      "version": 1
    }

    This file tells Kafka which topic to migrate (in this case, first-topic) and also specifies the version of the reassign plan. After adding this, save and close the file.

    Generating the Migration Plan

    Now that you’ve defined the topic, you can generate the migration plan. Run this command, but remember to replace kafka1.your_domain with the actual domain pointing to one of your Kafka nodes:

$ ./bin/kafka-reassign-partitions.sh --bootstrap-server kafka1.your_domain:9092 --topics-to-move-json-file topics-to-move.json --broker-list "1,2" --generate

    In this command, the --broker-list "1,2" part specifies that the partitions should be reassigned to brokers 1 and 2. The output should look something like this:

Current partition replica assignment
    {
      "version": 1,
      "partitions": [
        {"topic": "first-topic", "partition": 0, "replicas": [3, 1], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 1, "replicas": [1, 2], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 2, "replicas": [2, 3], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 3, "replicas": [1, 3], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 4, "replicas": [3, 2], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 5, "replicas": [2, 1], "log_dirs": ["any", "any"]}
      ]
    }

    This output shows the current assignment of replica partitions across brokers. Each partition has multiple replicas, and each replica is stored on different brokers.

    Defining the Proposed Reassignment Plan

    Now that you’ve got the current partition assignments, you can define how you want the partitions to be reassigned. You can update the partition replica assignments like this:

{
      "version": 1,
      "partitions": [
        {"topic": "first-topic", "partition": 0, "replicas": [2, 1], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 1, "replicas": [1, 2], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 2, "replicas": [2, 1], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 3, "replicas": [1, 2], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 4, "replicas": [2, 1], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 5, "replicas": [1, 2], "log_dirs": ["any", "any"]}
      ]
    }

    This updated configuration tells Kafka where each partition’s replicas should go. For example, partition 0 will now have replicas on brokers 2 and 1, and so on.

    Saving and Executing the Plan

    Now that you’ve got the reassignment plan, save it to a new file called migration-plan.json and open it to edit:

    $ vi migration-plan.json

    Add the second configuration reflecting the reassignment you’ve defined above. After saving and closing the file, run the following command to execute the migration plan:

$ ./bin/kafka-reassign-partitions.sh --bootstrap-server kafka1.your_domain:9092 --reassignment-json-file migration-plan.json --execute

The output will echo the current partition replica assignment (worth saving in case you need to roll back) and confirm that the reassignment has started:

Current partition replica assignment
    {
      "version": 1,
      "partitions": [
        {"topic": "first-topic", "partition": 0, "replicas": [3, 1], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 1, "replicas": [1, 2], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 2, "replicas": [2, 3], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 3, "replicas": [1, 3], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 4, "replicas": [3, 2], "log_dirs": ["any", "any"]},
        {"topic": "first-topic", "partition": 5, "replicas": [2, 1], "log_dirs": ["any", "any"]}
      ]
    }

This confirms that the migration plan has been submitted; Kafka will now reassign the partitions in the background.

    Verifying the Migration

    To check the status of the partition migration, run the following command:

$ ./bin/kafka-reassign-partitions.sh --bootstrap-server kafka1.your_domain:9092 --reassignment-json-file migration-plan.json --verify

    After a little while, the output will show that the reassignment of all partitions is complete:

    Status of partition reassignment:
    Reassignment of partition first-topic-0 is completed.
    Reassignment of partition first-topic-1 is completed.
    Reassignment of partition first-topic-2 is completed.
    Reassignment of partition first-topic-3 is completed.
    Reassignment of partition first-topic-4 is completed.
    Reassignment of partition first-topic-5 is completed.

    This means all partitions have been successfully reassigned to the new brokers.
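    On a busy cluster, reassignment traffic can compete with regular producers and consumers while replicas copy data between brokers. If that’s a concern, kafka-reassign-partitions.sh accepts a --throttle flag (in bytes per second) when executing a plan; a sketch with a 50 MB/s cap:

    $ ./bin/kafka-reassign-partitions.sh --bootstrap-server kafka1.your_domain:9092 --reassignment-json-file migration-plan.json --execute --throttle 50000000

    Running the --verify command afterwards, as shown above, also removes the throttle once the reassignment completes.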

    Final Verification

    Finally, to make sure everything is in place, you can describe the first-topic again to ensure that no partitions are still on the old broker (broker 3, in this case). Run this command:

$ ./bin/kafka-topics.sh --describe --bootstrap-server kafka1.your_domain:9092 --topic first-topic

    The output will now show that only brokers 1 and 2 are present as replicas and ISR (In-Sync Replicas), confirming that the migration was successful:

    Topic: first-topic TopicId: 4kVImoFNTQeyk3r2zQbdvw
    PartitionCount: 6 ReplicationFactor: 2 Configs: segment.bytes=1073741824
    Topic: first-topic Partition: 0 Leader: 2 Replicas: 2,1 Isr: 1,2
    Topic: first-topic Partition: 1 Leader: 1 Replicas: 1,2 Isr: 1,2
    Topic: first-topic Partition: 2 Leader: 2 Replicas: 2,1 Isr: 2,1
    Topic: first-topic Partition: 3 Leader: 1 Replicas: 1,2 Isr: 1,2
    Topic: first-topic Partition: 4 Leader: 2 Replicas: 2,1 Isr: 2,1
    Topic: first-topic Partition: 5 Leader: 1 Replicas: 1,2 Isr: 2,1

    And just like that, you’ve successfully moved the partitions and verified that they’re now properly distributed across the brokers in your Kafka cluster.

    Learn more about Kafka partition management and data migration strategies in this Apache Kafka Data Migration Documentation.

    Conclusion

    In conclusion, setting up a multi-node Kafka cluster with the KRaft consensus protocol is a powerful way to ensure fault tolerance and scalability in your data streaming architecture. By configuring Kafka nodes and efficiently managing topics, partitions, and data migration, you can build a robust, highly available Kafka cluster. The KRaft protocol eliminates the need for Apache ZooKeeper, streamlining the process while providing high performance. As you continue working with Kafka, it’s essential to understand how to handle node failures and maintain data integrity, ensuring your cluster remains resilient. Looking ahead, with the evolving capabilities of Kafka and KRaft, further improvements in distributed systems and data processing are expected, making it a crucial tool for developers and data engineers.


  • Master Kafka Management: Use KafkaAdminClient, kcat, Cruise Control

    Master Kafka Management: Use KafkaAdminClient, kcat, Cruise Control

    Introduction

    Managing an Apache Kafka cluster efficiently requires a deep understanding of tools like KafkaAdminClient, kcat, and Cruise Control. These tools allow you to programmatically manage resources, automate task handling, and optimize cluster performance. With KafkaAdminClient, you can manage topics, partitions, and other crucial resources, while kcat offers a lightweight, Java-free way to access the cluster. Meanwhile, Cruise Control ensures workload balance and reliability by constantly monitoring and adjusting the cluster. In this tutorial, we’ll guide you through using these powerful tools to enhance the performance and scalability of your Kafka infrastructure.

    What is Kafka AdminClient API?

    The Kafka AdminClient API allows you to manage and interact with a Kafka cluster programmatically. It helps in performing administrative tasks such as creating, listing, and deleting topics, as well as retrieving information about the cluster. This tool is useful for handling Kafka resources more efficiently without relying on command-line scripts.

    Step 1 – Utilizing Kafka AdminClient

    So, you’ve already set up a Java project with all the dependencies needed to work with Kafka, right? Well, now it’s time to create a class that uses Kafka’s AdminClient class to manage your cluster.

    First, you’ll navigate to the folder where the dokafka project is sitting. If you check the project structure, you’ll see that the source code is under src/main/java/com/dokafka. That’s where you’ll save the new class—name it AdminClientDemo.java.

    Go ahead and open it up for editing by running:

    $ nano src/main/java/com/dokafka/AdminClientDemo.java

    Now, add these lines of code:

package com.dokafka;

    import org.apache.kafka.clients.admin.*;
    import org.apache.kafka.common.Node;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import java.util.*;

    public class AdminClientDemo {
      private static final Logger log = LoggerFactory.getLogger(AdminClientDemo.class);

      public static void main(String[] args) {
        String bootstrapServers = "kafka1.your_domain:9092";

        Properties properties = new Properties();
        properties.put("bootstrap.servers", bootstrapServers);
        final AdminClient client = AdminClient.create(properties);

        try {
          Collection<Node> nodes = client.describeCluster().nodes().get();
          if (nodes == null)
            log.info("There seem to be no nodes in the cluster!");
          else
            log.info(String.format("Count of nodes: %s%n", nodes.size()));
        } catch (Exception e) {
          log.error("An error occurred", e);
        }
      }
    }

    Here’s the deal—first, you define the AdminClientDemo class and import all the classes that you’ll be using. You’ll also create a Logger to help you log events as your code runs. In the main() method, you start by setting the Kafka cluster address. Just make sure to replace kafka1.your_domain with your actual cluster address or IP.

Next, you’ll create a Properties object to hold the configuration settings for the admin client. This is where you tell Kafka where your cluster is located by setting bootstrap.servers.

    Once that’s done, you create an AdminClient by calling AdminClient.create(properties). This is the magic that allows you to perform admin tasks within Kafka—like listing, creating, and deleting topics, partitions, and offsets.

    Inside the try block, you use describeCluster() to get information about your cluster. Specifically, you call nodes() to grab all the nodes in the cluster. If there are no nodes, you’ll log a message that says there are none. Otherwise, you’ll log the number of nodes.

    Once you’re done, save and close the file. Next, you’ll create a script that compiles and runs your AdminClientDemo.java. Save it as run-adminclient.sh.

    To edit the file, run:

    $ nano run-adminclient.sh

    Add the following to the file:

#!/bin/bash
    mvn clean
    mvn package
    java -cp "target/dokafka-1.0-SNAPSHOT.jar:target/lib/*" com.dokafka.AdminClientDemo

    Save and close the file, then make it executable by running:

    $ chmod +x run-adminclient.sh

    Finally, give it a try by running:

$ ./run-adminclient.sh

    The output will be long, but at the end, you should see something like this:

[main] INFO com.dokafka.AdminClientDemo - Count of nodes: 3

    This tells you that AdminClientDemo has successfully connected to Kafka and pulled the node count.

    Creating and Listing Topics

    Next up, let’s create a topic and list all the topics in the cluster. KafkaAdminClient has the createTopics() and listTopics() methods, and you’ll be using them for this task.

    Open AdminClientDemo.java again and update the code as follows:

try {
      NewTopic newTopic = new NewTopic("newTopic", 1, (short) 1);
      CreateTopicsResult result = client.createTopics(
        Collections.singleton(newTopic)
      );
      result.all().get();

      ListTopicsOptions options = new ListTopicsOptions();
      options.listInternal(true);
      Collection<TopicListing> topics = client.listTopics(options).listings().get();
      for (TopicListing topic : topics) {
        log.info("Topic: " + topic.name());
      }
    } catch (Exception e) {
      log.error("An error occurred", e);
    }

    In this part of the code, you first create a NewTopic instance. This represents the new topic that you want to add to your Kafka cluster. You give it the name “newTopic,” and you specify that it will have 1 partition and 1 replica. The replica count needs to be cast as a short, by the way.

    You use createTopics() to send this topic to the Kafka cluster. Since the operation is asynchronous, you call result.all().get() to make sure the operation completes before moving on.

    Then, you set up a ListTopicsOptions instance, which controls how topics will be retrieved from the cluster. You enable listInternal(true) so you can include internal topics Kafka uses behind the scenes. After that, you fetch and loop through the list of topics, logging each topic’s name.

    Once you’ve done that, save and close the file. Then, run the script again:

$ ./run-adminclient.sh

    At the end of the output, you should see the list of topics:

[main] INFO com.dokafka.AdminClientDemo - Topic: newTopic
    [main] INFO com.dokafka.AdminClientDemo - Topic: java_demo

    Deleting Topics

    Let’s say you want to delete a topic—no problem! KafkaAdminClient has the deleteTopics() method to do just that. In this example, you’ll delete the topic “newTopic.”

    Open AdminClientDemo.java once more and replace the code for topic creation with this:

DeleteTopicsResult deleted = client.deleteTopics(Collections.singleton("newTopic"));
    deleted.all().get();
    log.info("Topic newTopic deleted!");

    ListTopicsOptions options = new ListTopicsOptions();
    options.listInternal(true);
    Collection<TopicListing> topics = client.listTopics(options).listings().get();
    for (TopicListing topic : topics) {
      log.info("Topic: " + topic.name());
    }

    Here, you pass in a collection containing the name of the topic you want to delete—in this case, “newTopic.” Just like when creating topics, this operation is asynchronous, so you wait for it to finish by calling deleted.all().get(). After that, you log that the topic was deleted. Finally, the script fetches and lists the remaining topics, ensuring that “newTopic” is no longer on the list.

    Once you’ve updated the file, save and close it, and run the script again:

$ ./run-adminclient.sh

    The output will confirm that “newTopic” was deleted and will list the remaining topics:

[main] INFO com.dokafka.AdminClientDemo - Topic newTopic deleted!
    [main] INFO com.dokafka.AdminClientDemo - Topic: java_demo

    Now you’ve successfully used the KafkaAdminClient to manage topics in your Kafka cluster. You retrieved cluster information, listed topics, created new ones, and even deleted them programmatically. You’re officially a Kafka admin now!

    Read more about managing Kafka clusters with the KafkaAdminClient in this detailed guide: Kafka AdminClient Tutorial for Managing Kafka Clusters

    Step 2 – Using kcat to Manage the Cluster

    In this step, you’ll learn how to download and install kcat, which is a command-line tool for accessing and managing Kafka clusters without the need for Java. This tool is super lightweight and comes in handy for carrying out essential Kafka tasks without having to write any code. Let’s dive in, shall we?

    First, you need to install the right package for your operating system. If you’re on macOS, kcat is pretty easy to install via Homebrew, which is a package manager for macOS. Just run this simple command:

    $ brew install kcat

    On Debian and Ubuntu systems, you can grab kcat via the trusty apt package manager. Here’s the command for that:

    $ sudo apt install kafkacat

    Now, just a little side note: kafkacat is the old name for kcat, but it’s still available in the package manager for compatibility purposes. If you’re on a different Linux distribution or OS, check the official documentation for installation details.
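    Either way, you can confirm the installation and see which version you got by asking the binary for its version string:

    $ kcat -V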

    Producing and Consuming Messages

    One of the things Kafka is awesome at is streaming data from a topic, and the best part? kcat makes it super easy. You can even stream from one or more topics at once, which is pretty useful when you’re keeping an eye on Kafka topics in real-time.

Let’s say you want to stream messages from several topics at once. kcat’s balanced consumer mode handles that: give it a consumer group name with the -G flag, followed by the topic names (your_group below is just a placeholder):

    $ kcat -b your_broker_address:9092 -G your_group first_topic second_topic …

This will stream messages from the listed topics and display them in the console. For a single topic, pass its name with the -t flag instead. If you’ve used the kafka-console-consumer.sh script before, this does the same thing; in consumer mode, kcat reads from the beginning of the topic by default (the -o flag changes the starting offset, for example -o end to see only new messages). To stream messages from the java_demo topic, run:

    $ kcat -b kafka1.your_domain:9092 -t java_demo

    When you run this, the output might look like this, showing you the messages you’ve produced earlier:

    % Auto-selecting Consumer mode (use -P or -C to override)
    % Reached end of topic java_demo [1] at offset 0
    % Reached end of topic java_demo [2] at offset 0
    Hello World!
    % Reached end of topic java_demo [0] at offset 0
    % Reached end of topic java_demo [3] at offset 0
    % Reached end of topic java_demo [4] at offset 1
    % Reached end of topic java_demo [5] at offset 0

    If you want the consumed messages in JSON format, simply add the -J flag:

    $ kcat -b kafka1.your_domain:9092 -t java_demo -J

    This will give you the output in JSON, showing more details like the timestamp, broker ID, partition number, and more:

    % Auto-selecting Consumer mode (use -P or -C to override)
    % Reached end of topic java_demo [2] at offset 0
    % Reached end of topic java_demo [0] at offset 0
    % Reached end of topic java_demo [1] at offset 0
{"topic":"java_demo","partition":4,"offset":0,"tstype":"create","ts":1714922509999,"broker":1,"key":null,"payload":"Hello World!"}
    % Reached end of topic java_demo [3] at offset 0
    % Reached end of topic java_demo [5] at offset 0
    % Reached end of topic java_demo [4] at offset 1

    Producing Messages

    When you’re ready to send some messages to your topic, just switch kcat into producer mode with the -P flag. This will allow you to send messages straight to the topic from the command line, just like with the kafka-console-producer.sh script.

    To send messages to the java_demo topic, you’d run this:

    $ kcat -b kafka1.your_domain:9092 -t java_demo -P

    Now, once you’re in producer mode, you can type your messages one by one, hitting ENTER after each. When you’re done, hit CTRL + C followed by ENTER to stop the producer and return to your command prompt.
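    If your topic uses message keys (for example, to route related messages to the same partition), producer mode can read those too: the -K flag sets a key delimiter, so each input line is split into a key and a value. A quick sketch using : as the delimiter (the key user1 is just an example):

    $ kcat -b kafka1.your_domain:9092 -t java_demo -P -K :

    Then type lines such as user1:Hello from user1! and each message will be keyed with the part before the colon.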

    The cool part? kcat lets you tweak how the output looks with templates. This means you can display more info about each message, like the topic name, partition number, offset, key, and the message payload. To set this up, use the -f flag, like this:

$ kcat -b kafka1.your_domain:9092 -t java_demo -f 'Topic %t[%p], offset: %o, key: %k, payload: %S bytes: %s\n'

    This command will show you all the messages from the start of the java_demo topic, but with extra details included:

    % Auto-selecting Consumer mode (use -P or -C to override)
    % Reached end of topic java_demo [2] at offset 0
    % Reached end of topic java_demo [1] at offset 0
    Topic java_demo[4], offset: 0, key: , payload: 12 bytes: Hello World!
    % Reached end of topic java_demo [0] at offset 0
    % Reached end of topic java_demo [3] at offset 0
    % Reached end of topic java_demo [4] at offset 1
    % Reached end of topic java_demo [5] at offset 0

    This is really handy for tracking messages by topic, partition, offset, and message content.

    Listing Cluster Metadata

    Need a quick look at the metadata of your Kafka cluster? No worries, kcat has you covered. You can list everything in your cluster, from brokers to topics and partitions, with the -L flag. Here’s the command for that:

    $ kcat -b kafka1.your_domain:9092 -L

    The output will look something like this, showing you all the important info about brokers, topics, and partitions:

3 brokers:
      broker 1 at kafka1.your_domain:9092
      broker 2 at kafka2.your_domain:9092 (controller)
      broker 3 at kafka3.your_domain:9092
    1 topic:
      topic "java_demo" with 6 partitions:
        partition 0, leader 3, replicas: 3,1, isrs: 3,1
        partition 1, leader 1, replicas: 1,2, isrs: 1,2
        partition 2, leader 2, replicas: 2,3, isrs: 2,3
        partition 3, leader 2, replicas: 2,1, isrs: 2,1
        partition 4, leader 1, replicas: 1,3, isrs: 1,3
        partition 5, leader 3, replicas: 3,2, isrs: 3,2

    Here, you’ll see the broker IDs, the leader of each partition, the replicas, and the in-sync replica set (ISR). This metadata is key for understanding how your cluster is structured and how replication is working.

    If you prefer a more structured output, just add the -J flag, and kcat will give you the metadata in JSON format:

    $ kcat -b kafka1.your_domain:9092 -L -J

    The JSON format is much easier to parse programmatically:

{
      "originating_broker": {
        "id": 2,
        "name": "kafka2.your_domain:9092/2"
      },
      "query": {
        "topic": "*"
      },
      "controllerid": 3,
      "brokers": [
        {"id": 1, "name": "kafka1.your_domain:9092"},
        {"id": 2, "name": "kafka2.your_domain:9092"},
        {"id": 3, "name": "kafka3.your_domain:9092"}
      ],
      "topics": [
        {
          "topic": "java_demo",
          "partitions": [
            {
              "partition": 0,
              "leader": 3,
              "replicas": [{"id": 3}],
              "isrs": [{"id": 3}]
            }
          ]
        }
      ]
    }

    So there you go! In this step, you installed kcat, a powerful tool for accessing and managing Kafka clusters without needing Java. You learned how to retrieve cluster metadata, produce and consume messages, and even use advanced features like custom templates and JSON output. Now, you’re all set to dive deeper into managing your Kafka cluster with more tools like Cruise Control!

Read more about efficiently managing Kafka clusters with tools like kcat in this comprehensive guide.

    Step 3 – Automating Rebalances with Kafka Cruise Control

    Cruise Control is an open-source project developed by LinkedIn that keeps a close eye on your Kafka brokers within a cluster, rebalancing the workloads to make sure resources are used efficiently and throughput is optimized. It’s like having a personal manager for your Kafka cluster that does all the heavy lifting to keep things running smoothly. Here’s the thing: Cruise Control automatically balances partition loads and optimizes system resources. Pretty cool, right?

    By default, Cruise Control comes with pre-configured targets, or “goals,” that guide how it optimizes things. These goals are like guidelines that help ensure your cluster operates efficiently by balancing key resources—think CPU, disk, and network usage—while also ensuring there are enough replicas for each topic and partition. Some of the big goals it focuses on include:

    • Ensuring each topic has the correct number of replicas.
    • Keeping CPU, network, and disk usage balanced across brokers.
    • Making sure that each partition is properly distributed across brokers to make the most of available capacity.

    And hey, if you have special requirements for your Kafka cluster, don’t worry. Cruise Control lets you create custom goals that fit your unique needs. So, if your setup has specific performance metrics or requirements, you can tweak Cruise Control to match them.
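    For reference, the goals live in config/cruisecontrol.properties as an ordered, comma-separated list of goal class names. A trimmed sketch showing just a few of the stock goals (the shipped default list is considerably longer):

    default.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal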

    Compiling and Installing Cruise Control

    To get Cruise Control up and running, you first need to compile it from the source. Here’s how you can do that:

    Start by cloning the official Git repository for Cruise Control:

    $ git clone https://github.com/linkedin/cruise-control.git

    Once it’s cloned, go ahead and navigate into the Cruise Control directory:

    $ cd cruise-control

    Now, to compile the project, use Gradle:

    $ ./gradlew jar

    The build process will take a few minutes, depending on your system’s speed. Once it’s done, you’ll see an output like this, indicating that everything went smoothly:

    BUILD SUCCESSFUL in 2m 41s
    17 actionable tasks: 17 executed

    At this point, Cruise Control is compiled and ready to go, along with its metrics reporter. You’ll find the reporter located in the cruise-control-metrics-reporter/build/libs/ directory as a JAR file. This reporter plays a key role in sending metrics about your brokers into a topic on the Kafka cluster that Cruise Control can monitor.

    Next up, you need to copy all dependencies to the target directory by running:

    $ ./gradlew jar copyDependantLibs

    The output will look something like this:

    BUILD SUCCESSFUL in 15s
    17 actionable tasks: 1 executed, 16 up-to-date

    Configuring the Kafka Brokers

    With the Cruise Control components compiled, it’s time to configure your Kafka brokers to use the metrics reporter. This is what allows Cruise Control to monitor broker performance and make decisions about rebalancing based on the data it gathers.

    To get started, copy the metrics reporter JAR file into the libs/ directory where Kafka is installed:

    $ cp cruise-control-metrics-reporter/build/libs/* /home/kafka/kafka/libs/

    Now, you need to modify the Kafka broker configuration file to use the Cruise Control metrics reporter. To do this, open the server.properties file for editing:

    $ nano /home/kafka/kafka/config/kraft/server.properties

    At the end of the file, add this line:

    metric.reporters=com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter

    Save the file, then restart the broker to apply the changes:

    $ sudo systemctl restart kafka

    Don’t forget to repeat this process for each broker in your cluster, so that the Cruise Control metrics reporter is active on all of them.
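    If the brokers are reachable over SSH, a small loop can take care of the copy-and-restart cycle. This is only a sketch: it assumes the hostnames below, SSH access as the kafka user, and passwordless sudo for the restart (the metric.reporters line still needs to be added to each broker’s server.properties first):

    $ for host in kafka1.your_domain kafka2.your_domain kafka3.your_domain; do
        scp cruise-control-metrics-reporter/build/libs/*.jar kafka@"$host":/home/kafka/kafka/libs/
        ssh kafka@"$host" 'sudo systemctl restart kafka'
      done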

    Once a few minutes have passed, you can verify everything is set up properly by listing the topics in the cluster using kcat. You should see a topic called __CruiseControlMetrics that stores the metrics data collected by the reporter. Run this command to list the topics:

    $ kcat -b kafka1.your_domain:9092 -L

    The output should look something like this:

    topic “__CruiseControlMetrics” with 6 partitions:
    partition 0, leader 3, replicas: 3,2 isrs: 3,2
    partition 1, leader 2, replicas: 2,3 isrs: 2,3
    partition 2, leader 3, replicas: 3,2 isrs: 3,2
    partition 3, leader 2, replicas: 2,3 isrs: 2,3
    partition 4, leader 2, replicas: 2,3 isrs: 2,3
    partition 5, leader 3, replicas: 3,2 isrs: 3,2

    Configuring Broker Capacity

    For Cruise Control to effectively optimize your brokers, it needs to know the hardware specs of each broker in the cluster. This information is stored in a file called capacity.json, which is located under the config/ directory.

    To modify this file, open it for editing:

    $ nano config/capacity.json

    You’ll see something like this in the default configuration:

{
      "brokerCapacities": [
        {
          "brokerId": "-1",
          "capacity": {
            "DISK": "100000",
            "CPU": "100",
            "NW_IN": "10000",
            "NW_OUT": "10000"
          },
          "doc": "This is the default capacity. Capacity unit used for disk is in MB, cpu is in percentage, network throughput is in KB."
        },
        {
          "brokerId": "0",
          "capacity": {
            "DISK": "500000",
            "CPU": "100",
            "NW_IN": "50000",
            "NW_OUT": "50000"
          },
          "doc": "This overrides the capacity for broker 0."
        }
      ]
    }

    Now, update the file for each broker in your cluster to reflect the correct specifications. For example, if you have three Kafka brokers, the file might look like this:

{
      "brokerCapacities": [
        {
          "brokerId": "-1",
          "capacity": {
            "DISK": "100000",
            "CPU": "100",
            "NW_IN": "10000",
            "NW_OUT": "10000"
          },
          "doc": "This is the default capacity. Capacity unit used for disk is in MB, cpu is in percentage, network throughput is in KB."
        },
        {
          "brokerId": "1",
          "capacity": {"DISK": "100000", "CPU": "100", "NW_IN": "10000", "NW_OUT": "10000"},
          "doc": ""
        },
        {
          "brokerId": "2",
          "capacity": {"DISK": "100000", "CPU": "100", "NW_IN": "10000", "NW_OUT": "10000"},
          "doc": ""
        },
        {
          "brokerId": "3",
          "capacity": {"DISK": "100000", "CPU": "100", "NW_IN": "10000", "NW_OUT": "10000"},
          "doc": ""
        }
      ]
    }

    Make sure the disk capacities match your cluster’s server specs, then save and close the file.

    Configuring Cruise Control for KRaft Mode

    Now that the broker capacities are set up, you need to configure Cruise Control to connect to your Kafka cluster in KRaft mode (without ZooKeeper). To do that, you’ll edit the cruisecontrol.properties file under the config/ directory:

    Open the cruisecontrol.properties file:

    $ nano config/cruisecontrol.properties

    Find the line with the bootstrap.servers property, which specifies which broker to connect to. Replace it with the address of your Kafka broker:

    bootstrap.servers=kafka1.your_domain:9092

    Next, find the kafka.broker.failure.detection.enable parameter and enable KRaft mode:

    # Switch to KRaft mode
    kafka.broker.failure.detection.enable=true

    Finally, find the capacity.config.file parameter, which specifies the path to the broker capacity configuration file. Uncomment the capacity.config.file=config/capacity.json line and comment out the capacity.config.file=config/capacityJBOD.json line:

    # The configuration for the BrokerCapacityConfigFileResolver (supports JBOD, non-JBOD, and heterogeneous CPU core capacities)
    capacity.config.file=config/capacity.json
    #capacity.config.file=config/capacityJBOD.json

    Save and close the file when you’re done.

    Starting Cruise Control

    With everything set up, you can finally start Cruise Control by running this command in a separate terminal:

    $ ./kafka-cruise-control-start.sh config/cruisecontrol.properties

    Now Cruise Control will start monitoring and optimizing your Kafka cluster in real-time. You’ll see continuous output in your terminal as it balances the cluster’s workloads.
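    To confirm it’s up, you can query its REST API (which listens on port 9090 by default, as covered below) from another terminal. The state endpoint reports what the load monitor, executor, and anomaly detector are currently doing:

    $ curl 'http://localhost:9090/kafkacruisecontrol/state'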

    Using the Cruise Control CLI

    Cruise Control has a REST API running on port 9090 that you can use for configuration and administrative tasks. But here’s the kicker: the project also provides cccli, a Python tool that wraps this API and makes it much easier to use.

    First, navigate to your Python virtual environment (which you set up earlier):

    $ cd ~/venv

    Activate it by running:

$ source bin/activate

    Then, install the cruise-control-client package using pip:

    $ pip install cruise-control-client

    After the installation, the cccli command is ready to use. You can now interact with the Cruise Control REST API using this command. For example, to fetch stats about your cluster’s current load, run:

    $ cccli -a localhost:9090 load

    This will give you detailed metrics on the cluster’s performance, like disk usage, CPU load, and network throughput.
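Since cccli is just a wrapper around the REST API, you can also issue the equivalent request yourself with curl. The path below follows Cruise Control’s documented REST layout, so double-check it against the version you’re running:

$ curl "http://localhost:9090/kafkacruisecontrol/load"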

    Enabling Auto-Healing for Broker Failures

    Cruise Control can be configured to automatically heal the cluster in case of broker failure, goal violations, or metric anomalies. To enable self-healing for broker failures, run this command:

$ cccli -a localhost:9090 admin --enable-self-healing-for broker_failure

    The command will show the old and new states of the setting, letting you know that self-healing is now enabled:

    {
    selfHealingEnabledBefore: {BROKER_FAILURE=false},
    selfHealingEnabledAfter: {BROKER_FAILURE=true}
    }
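To double-check that the anomaly detector picked up the change, you can also query the state endpoint. The substates parameter shown here comes from Cruise Control’s REST API documentation, so treat this as a sketch and verify it against your version:

$ curl "http://localhost:9090/kafkacruisecontrol/state?substates=anomaly_detector"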

    With Cruise Control up and running, your Kafka cluster will be monitored and optimized for efficiency and reliability. Plus, you’ve learned how to install and use the cccli tool to manage Cruise Control through the command line.

    Read more about optimizing Kafka cluster performance with Cruise Control and its automation features in this detailed guide: Automating Kafka Rebalances with Cruise Control

    Conclusion

    In conclusion, mastering Kafka management with tools like KafkaAdminClient, kcat, and Cruise Control can significantly improve your Apache Kafka cluster’s efficiency and performance. By using KafkaAdminClient, you can easily manage resources such as topics and partitions, while kcat offers a Java-free way to interact with your cluster. Cruise Control takes it a step further by optimizing workloads and ensuring balanced performance across your Kafka brokers. With the step-by-step instructions provided in this tutorial, you now have the knowledge to enhance your Kafka infrastructure, ensuring better scalability and reliability. As Kafka continues to evolve, integrating these tools will be crucial in maintaining efficient, high-performing clusters in the future.

    Optimize RAG Applications with Large Language Models and GPU (2025)

  • Retrieve DNS Information with DIG on Windows, macOS, Linux

    Retrieve DNS Information with DIG on Windows, macOS, Linux

    Introduction

    If you’re looking to retrieve DNS information from hostnames or IP addresses, the “dig” tool from BIND is an essential command-line utility. Whether you’re using Windows, macOS, or Linux, mastering dig can simplify DNS queries and provide valuable insights into domain configurations. This article will guide you through installing dig on these operating systems and show you how to use it for common DNS record queries like A, NS, MX, and SOA. With clear, step-by-step instructions, you’ll be ready to leverage dig for all your DNS troubleshooting needs.

    What is dig?

    Dig is a tool used to look up DNS information about websites or IP addresses. It helps users find details like the IP address of a website, the names of the servers that manage its domain, and other related information. This tool works through a command-line interface, and users can install it on different operating systems, including Windows, macOS, and Linux. Once installed, it can be used to run various queries to check DNS records.

    Installing dig

    You can install dig on most operating systems by downloading the latest version of BIND 9 from BIND’s website, or by using a package manager from the command line. This makes it really easy to get dig set up no matter what operating system you’re on, allowing you to pull DNS info from different systems. Here’s how you can install dig on Windows, macOS, and Linux.

    Windows Installation

    To install dig on a Windows machine, start by heading over to BIND’s website and downloading the latest version of BIND 9. Once the installation file is downloaded, extract it to a folder on your system. Then, double-click the BINDinstall icon in the folder to kick off the installation process.

When the BIND 9 Installer screen pops up, check that the target directory is set to C:\Program Files\ISC BIND 9 (or C:\Program Files (x86)\ISC BIND 9 if you’re running an x86 version). Don’t forget to tick the “Tools Only” box before moving on and hitting Install.

    Once BIND 9 is installed, the next step is to make sure you can use dig from the command line. You’ll need to add BIND to your system’s PATH. To do this, go into the Windows Control Panel, click on System Properties, and then head over to the Advanced tab. From there, click on Environment Variables.

In the System Variables section, find the Path variable, select it, and click Edit. In the Edit environment variable screen, click New and enter the path to the BIND 9 bin folder: C:\Program Files\ISC BIND 9\bin (or C:\Program Files (x86)\ISC BIND 9\bin for x86 systems). After you add the path, click OK to close all the windows and confirm the change.

    Now that you’ve set the path, open up a new Command Prompt window and check if dig is installed by running the command $ dig -v. This should show the version info for dig. If it doesn’t, it probably means there’s an issue with your PATH configuration, so you might want to go back and double-check that the path is set up correctly.

    macOS Installation

    On macOS, dig is usually already installed, so you can just jump into the Terminal and use it right away. To check if dig’s already set up, run dig -v in the Terminal. If it returns the version info, you’re good to go!

    If you don’t see the version info and get an error message instead, it means you’ll need to install BIND manually. Don’t worry, it’s easy. First, you need to make sure Homebrew is installed on your system. You can check by running the command brew -v in the Terminal. If Homebrew is missing, you can install it with this command:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

    Once Homebrew is installed, you can install BIND by running:

    brew install bind

    After BIND is installed, run dig -v again in the Terminal. This time, you should see the version number appear, confirming that dig is installed and ready to go on your macOS system.

    Linux Installation

    On most Linux distributions, dig is already installed by default, so you can usually just open a terminal and get started. To check if dig is installed, open your terminal and run dig -v. If you see the version info, then dig is good to go. If not, you’ll need to install it manually.

    To install dig on Linux, you’ll need to get the dnsutils package, which includes dig along with other DNS tools. If you’re using a system with the apt-get package manager (like Ubuntu or Debian), run the following commands:

    $ sudo apt-get update
    $ sudo apt-get install dnsutils

    These commands update your package list and install the required tools. Once that’s done, run dig -v in the terminal again, and you should see the version info appear, confirming that dig is installed successfully.
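If you’re on an RPM-based distribution such as Fedora, CentOS, or Rocky Linux instead, the same tools ship in the bind-utils package:

$ sudo dnf install bind-utils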

    With these installation steps, you’ll be able to use dig on Windows, macOS, or Linux to grab DNS information and troubleshoot domain issues like a pro.

Read more about setting up and configuring BIND and dig for DNS queries in this comprehensive guide on how to install and use the dig tool for troubleshooting DNS records: DNS and BIND (4th Edition).

    Common dig Commands

    Here are some common dig commands that you can use to grab DNS info about a hostname. These commands let you do different kinds of DNS queries, depending on the specific records you want to get. Just run any of these examples in your terminal to see the output and get DNS details about a hostname.

    dig <hostname>

    Example: $ dig example.com

    Description: This one gets the A records (Address records) for the hostname you specified. Essentially, it gives you the IP addresses linked to the domain name you’re asking about.

    dig <hostname> any

    Example: $ dig example.com any

    Description: This command pulls up all the available DNS records for the hostname, including A, NS (Name Server), SOA (Start of Authority), and more. It’s like a full snapshot of all the DNS data for the domain.

    dig @<name server address> <hostname> <record type>

    Example: $ dig @ns1.caasify.com example.com MX

    Description: With this command, you’re asking a specific name server for DNS records directly, instead of going through your default ISP’s resolver. You can even specify a record type (like MX for mail exchange) if you just want to get that info for the hostname.

    dig <hostname> <record type>

    Example: $ dig example.com NS

    Description: This command asks for DNS records of a specific type. In this case, it grabs the NS (Name Server) records, which tell you which servers are in charge of managing the domain.

    dig <hostname> +short

    Example: $ dig example.com +short

    Description: This one gives you a simplified output, just showing the IP addresses for all A records associated with the hostname. It leaves out the extra stuff like TTL (Time to Live) or additional sections, just giving you the essentials.
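For instance, a +short query for example.com prints little more than the bare address (the exact IP depends on the zone’s records at the time you run it):

$ dig example.com +short
93.184.216.34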

    dig <hostname> +trace

    Example: $ dig example.com +trace

    Description: Adding the +trace option tells dig to follow the query all the way from the root name servers, showing you how each server in the chain is involved in resolving the query. It’s a useful way to see the whole DNS resolution process.

    Understanding dig Command Output

    The dig command is pretty handy because it gives you a detailed breakdown of a hostname’s DNS records. Depending on what you ask it to do, you’ll get different sections of information. Each part of the output reveals something useful, helping you understand how DNS queries work and what records belong to a domain. Here’s an example of the results you might see when you run dig example.com:

; <<>> DiG 9.10.6 <<>> example.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50169
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 5
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;example.com. IN A
    ;; ANSWER SECTION:
    example.com. 6108 IN A 93.184.216.34
    ;; AUTHORITY SECTION:
    example.com. 52437 IN NS b.iana-servers.net.
    example.com. 52437 IN NS a.iana-servers.net.
    ;; ADDITIONAL SECTION:
    a.iana-servers.net. 195 IN A 199.43.135.53
    a.iana-servers.net. 195 IN AAAA 2001:500:8f::53
    b.iana-servers.net. 195 IN A 199.43.133.53
    b.iana-servers.net. 195 IN AAAA 2001:500:8d::53

    So, what does all this mean? Let’s break down the most important sections of the dig output:

    • Question Section: This just confirms what you asked for. In this example, you’re asking for the A record (which is the IP address) for example.com. It’s basically a confirmation of the query you sent out to the DNS.
    • Answer Section: Here’s where the magic happens—the answer you were looking for. In this case, dig gave you the A record for example.com, which is the IP address 93.184.216.34. This is the key info you’re usually after when you’re checking DNS records for a domain.
    • Authority Section: This part shows you which name servers are the “authoritative” ones for the domain’s DNS records. Basically, these servers have the official answers. So, in this case, a.iana-servers.net and b.iana-servers.net are in charge of the example.com DNS records.
    • Additional Section: This section is like the bonus round of DNS info. It gives you extra details that might be helpful, like IP addresses for the authoritative name servers. In this case, you get both IPv4 and IPv6 addresses for a.iana-servers.net and b.iana-servers.net.

    By understanding these sections, you can get a full picture of what’s going on with the DNS resolution process. Whether you’re troubleshooting DNS issues or just checking on a domain’s records, this breakdown will help you make sense of what’s being returned.

    For a deeper understanding of DNS queries and how to interpret results, you can explore this guide on interpreting DNS query outputs Understanding DNS Query Types and Outputs.

    Conclusion

    In conclusion, the “dig” tool from BIND is a powerful command-line utility that allows you to easily retrieve DNS information on Windows, macOS, and Linux systems. Whether you need to check A, NS, MX, or SOA records, mastering dig is essential for any network administrator or developer working with DNS. By following the step-by-step instructions in this guide, you can quickly set up dig and begin troubleshooting domain-related issues with ease. As DNS queries continue to evolve, tools like dig will remain invaluable for gaining deeper insights into domain configurations and ensuring network reliability.

    Master Linux SED Command: Text Manipulation and Automation Guide

  • Set Up Stable Diffusion on GPU Droplet with WebUI by AUTOMATIC1111

    Set Up Stable Diffusion on GPU Droplet with WebUI by AUTOMATIC1111

    Introduction

    Setting up Stable Diffusion on a GPU Droplet with the WebUI by AUTOMATIC1111 can significantly enhance your AI image generation workflow. With the power of GPU resources from DigitalOcean’s Droplet, users can easily harness Stable Diffusion’s potential to generate high-quality, detailed images. This guide will walk you through all the necessary steps, from creating a GPU Droplet to configuring Stable Diffusion with the WebUI. Whether you’re working with positive and negative prompts or optimizing GPU utilization, you’ll find everything you need to get started in this step-by-step tutorial.

    What is Stable Diffusion?

    Stable Diffusion is an AI tool that helps generate images based on text descriptions. It allows users to create detailed images by writing prompts that specify what they want to see. The tool uses both positive prompts (for what to include) and negative prompts (to exclude unwanted elements). This makes it easy for anyone to create custom images, such as those depicting marine life, by simply typing what they want in plain language.

Step 1 — Set Up the GPU Droplet

    Alright, let’s get things rolling! First off, you’ll need to create a Cloud Server that has GPU capabilities. The process is pretty simple. Log into your Caasify account and head over to the Cloud section. From there, you can start creating a new Cloud Server. When choosing the server plan, be sure to pick one that includes GPU resources. Don’t worry about going overboard with specs; a basic GPU plan should be more than enough for running Stable Diffusion and generating images. It gives you the power needed for tasks that require a bit more oomph, like image processing.

    Now that your Cloud Server is up and running, let’s talk security for a minute. Using the root user for everything isn’t the best move, you know? It’s much safer to create a new user with limited privileges. This keeps your server more secure, especially as you start setting things up. To do this, just run these commands:

    $ adduser do-shark

    Then, give this new user sudo privileges, so they can perform admin tasks when needed:

    $ usermod -aG sudo do-shark

    Next, switch over to the new user by executing:

    $ su do-shark

    And finally, head to the home directory of the new user:

    $ cd ~/

    By doing this, you’re making sure you’re following security best practices right from the start. It’s a small but crucial step in making sure your GPU Cloud Server setup is secure and easy to manage.

    For detailed guidance on setting up GPU Droplets, check out this comprehensive resource on how to configure and optimize your cloud server for demanding tasks like Stable Diffusion: GPU Droplet Setup Guide.

Step 2 — Install Dependencies

Alright, now that you’re logged into your Cloud Server, it’s time to get everything updated and ready for the next step. First things first—let’s make sure your server’s package list is up to date. This is crucial to ensure you’ve got access to the latest software versions and security fixes. So, run this command to refresh everything:

    $ sudo apt update

    This command updates the list of available packages from the software repositories, so your Cloud Server knows about the latest versions and any important patches.

    Now, we’re getting to the fun part. You need to install a few key tools and libraries that’ll help get Stable Diffusion up and running smoothly. These include wget (which you’ll use to download files), git (for version control), python3 (to run Python apps), and python3-venv (for managing Python virtual environments). These are all super important for ensuring everything runs smoothly and your image generation process is stable.

    To install everything, run this command:

    $ sudo apt install -y wget git python3 python3-venv

    This will install the packages you need and their necessary components. The -y flag makes sure the installation process happens without you having to manually approve each step—so you can sit back and relax while it gets done.
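If you want to double-check that everything landed, each tool can report its version (the exact numbers will vary with your Ubuntu release):

$ git --version
$ python3 --version
$ wget --version | head -n 1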

    Once this step is finished, your Cloud Server will be all set up and ready for the next phase in configuring Stable Diffusion. Time to move on!

Step 3 — Clone the Stable Diffusion Repository

    Alright, now that we’re moving forward with setting up Stable Diffusion, the next step is to grab the official repository from GitHub. This repository has all the code and resources you’ll need to run Stable Diffusion using the WebUI by AUTOMATIC1111. By cloning it, you’re essentially downloading all the necessary files and configurations to your Cloud Server.

    To do this, you’ll want to run this command in your terminal:

    $ git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

    What this does is create a local copy of the repository right in your current directory. Once that’s done, you’ll need to jump into that directory to continue with the setup. Just run:

    $ cd stable-diffusion-webui

    Now, you’re inside the stable-diffusion-webui folder, where all the files for the Stable Diffusion WebUI are stored. From here, you can keep going with the setup, which includes configuring your environment and getting Stable Diffusion running.

    Cloning the repository like this makes sure you’ve got the most recent version of the WebUI, plus any updates or bug fixes that come along. So, you’re all set up with the latest version, and ready to proceed!
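And if you come back to the project later, pulling in the latest upstream fixes is a single command from inside the directory:

$ git pull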

    For a complete guide on cloning repositories and getting started with Stable Diffusion, refer to this comprehensive tutorial: Cloning Repositories and Setting Up Stable Diffusion (2025).

Step 4 — Configure and Run Stable Diffusion

Alright, now that you’ve cloned the Stable Diffusion repository, let’s dive into configuring the environment and getting that Stable Diffusion WebUI up and running. Here’s the thing: this involves setting up a Python virtual environment, installing some dependencies, and making sure your system is fully optimized for GPU acceleration to really boost performance.

    Set Up a Python Virtual Environment

    You’ll want to isolate the Python packages needed for Stable Diffusion to avoid any clashes with other projects or system-wide packages. So, let’s set up that virtual environment. It’s like creating a little “sandbox” for all the tools Stable Diffusion needs.

    To do this, start by creating the virtual environment with:

    python3 -m venv venv

    Next, you’ll activate it:

    source venv/bin/activate

    Now you’re in the virtual environment, and any Python packages you install will stay within this little world, safe from your other projects.

    Once that’s done, you can install the necessary dependencies by running this command:

    pip install -r requirements.txt

    This will grab all the packages you need to get Stable Diffusion up and running smoothly.

    Rebuild xFormers with CUDA Support

    Now, here’s where the GPU magic happens. To really take advantage of your Cloud Server’s GPU, we need to rebuild xFormers with CUDA support. CUDA makes everything run faster, especially on NVIDIA GPUs. So, let’s make sure xFormers is all set for GPU acceleration.

    First, uninstall the current version of xFormers:

    pip uninstall xformers

    Then, install the version that’s optimized for CUDA support by running:

pip install xformers --extra-index-url https://download.pytorch.org/whl/nightly/cu118

    This will get your system ready to take full advantage of that shiny GPU you’ve got, giving you faster performance with Stable Diffusion.
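After the reinstall, a quick way to confirm that PyTorch can actually see the GPU is a one-liner like the following, run inside the activated virtual environment; if it prints False, the CUDA build didn’t take:

$ python3 -c "import torch; print(torch.cuda.is_available())"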

    And with that, you’re all set up to run Stable Diffusion on your Cloud Server with full GPU support!

    For further details on configuring and running Stable Diffusion with ease, check out this detailed guide: Complete Guide to Configuring and Running Stable Diffusion (2025).

    Monitor GPU Utilization

    When you’re running resource-heavy applications like Stable Diffusion, monitoring your GPU usage is super important. It helps ensure that your Cloud Server is working at full capacity and that things are running smoothly. One of the easiest and most effective tools for keeping an eye on your GPU is gpustat. It’s a simple Python-based command-line tool that gives you real-time updates about your GPU performance.

    How to Install and Use gpustat

    Let’s walk through getting gpustat set up so you can start tracking your GPU.

    First things first, you need to install gpustat on your system. Open up your terminal and run the following command:

    $ pip install gpustat

    This will install the latest version of gpustat and all its necessary dependencies, so you’re all set to go.

    Monitor GPU Utilization

    Once gpustat is installed, you can start tracking your GPU’s performance in real-time. Just open a separate terminal window and run this command:

$ gpustat --color -i 1

    The --color option will give you a colorized output, making everything much easier to read and understand. The -i 1 flag sets the update interval to 1 second, meaning you’ll get a fresh readout every second so you can closely monitor any changes in your GPU’s performance.

    What You Can Monitor with gpustat

    Now that you’ve got gpustat up and running, here’s what you can track:

    • Memory Usage: This shows how much GPU memory is being used and how much is still available. It’s super important to check this, especially when you’re running something like Stable Diffusion, which can eat up a lot of memory during image generation.
    • GPU Temperature: This tells you the current temperature of your GPU. It’s crucial to keep an eye on this to avoid overheating. If the GPU gets too hot, it might throttle performance or even get damaged, so monitoring the temp helps you avoid that.
    • Current Load: This gives you a snapshot of how much work the GPU is doing. It shows how heavily the GPU is being used, which can help you tell whether it’s fully utilized or just hanging out, not doing much.
    • Processes Using the GPU: You’ll also see a list of the processes using your GPU. This way, you can figure out which apps or tasks are demanding the most GPU resources.

    By regularly checking your GPU with gpustat, you can make sure Stable Diffusion is running at its full potential, which means faster, more efficient image generation. So, keeping tabs on your GPU’s performance is a great way to make the most of your Cloud Server’s capabilities and speed things up!
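If you’d rather not install anything extra, NVIDIA’s own nvidia-smi utility ships with the GPU driver and reports the same numbers; the -l 1 flag refreshes the readout every second:

$ nvidia-smi -l 1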

    To learn more about optimizing GPU usage for Stable Diffusion, check out this in-depth resource: Optimizing GPU Utilization for Better Image Generation (2025).

Installing a Model Using a Direct Download Link

If you happen to have a direct download link for a model, installing it is super easy using the wget command. This method comes in handy, especially if you already have the URL for a specific model file, like the SDXL model, which is often used in Stable Diffusion to create high-quality images.

    Steps to Download and Install the SDXL Model

    Download the Model: First things first, fire up your terminal and run this command to grab the SDXL model from the direct link:

$ wget -O models/Stable-diffusion/stable-diffusion-xl.safetensors "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"

    Here’s what’s happening in the command:

    • The -O flag specifies the output file name and where to save the model.
    • The URL in the quotes is the direct link to the SDXL model hosted on the official Hugging Face repository.

    Save the Model in the Right Directory: Once the download starts, it will save the model in the models/Stable-diffusion/ directory of your current working folder. The model will be named stable-diffusion-xl.safetensors (just as we told it to in the command).

    This is important because Stable Diffusion needs to know exactly where to look for the model, so this structure will make sure everything stays in place.

    Use the Model in Your Setup: When the download finishes, your SDXL model will be ready to roll! Now you can continue with your Stable Diffusion setup and start generating images using this model. Just double-check that your environment is all set up to support it.
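As a quick sanity check that the download completed, list the file and its size; the SDXL base model weighs in at several gigabytes, so a tiny file usually means the transfer was interrupted:

$ ls -lh models/Stable-diffusion/stable-diffusion-xl.safetensors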

    By following these simple steps, you can easily download and install any compatible models directly from a URL. It’s a real time-saver and makes it simple to integrate fresh models into your Stable Diffusion workflow whenever you want.

    Run the WebUI

    Now that you’ve gone through all the setup steps, it’s time to get the Stable Diffusion WebUI up and running. This is the interface that lets you interact with the Stable Diffusion model and start generating some cool images!

    To launch the WebUI, just type this command into your terminal:

./webui.sh --share --xformers --api --enable-insecure-extension-access

    Here’s a breakdown of what these options do:

• --share: This option lets you share your WebUI interface over the internet using Gradio. It’s pretty handy if you want to access it from any device, or if you want to share it with friends or collaborators.
• --xformers: This activates xFormers, a library that helps with efficient GPU acceleration. It ensures your GPU is fully utilized for faster image generation, which is especially useful when you’re working with something complex like Stable Diffusion.
• --api: Enabling the API allows external apps to communicate with your WebUI. This is useful if you want to automate some tasks or connect it with other tools.
• --enable-insecure-extension-access: This flag lets you use extensions that might not be secure but are necessary for certain features. Just make sure you trust the extensions you’re enabling before using this one!

    Once the WebUI starts, your terminal will print out a URL that looks something like this:

    https://[HASHING].gradio.live

    Go ahead and open your browser, and just pop that URL in to access your interface. Keep in mind, though, this link will only work for 72 hours, so if you need long-term access, you might want to set up a custom domain or find another way to keep it around longer.
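If the 72-hour limit is a problem, one common alternative is to skip the Gradio tunnel and bind the WebUI directly to the server’s own address, then browse to it via the server’s IP. This assumes port 7860 is reachable through your firewall, so adjust to taste:

$ ./webui.sh --listen --port 7860 --xformers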

    For a detailed guide on running and optimizing the WebUI interface, check out this informative resource: Stable Diffusion WebUI Setup and Optimization (2025).

    Installing a Model Using CivitAI Browser Extension

Once you’ve got the webui.sh script up and running, installing models becomes a piece of cake with the CivitAI Browser extension. Here’s how you can easily integrate the extension and start installing models straight from CivitAI:

    Navigate to the “Extensions” Tab in the WebUI

    Now that your WebUI is good to go, find and click on the “Extensions” tab in the interface to manage the extensions.

    Go to the “Available” Sub-tab

    Inside the Extensions tab, switch to the “Available” sub-tab. This is where you’ll find a list of all the extensions you can install.

    Load Available Extensions

    Hit the orange Load from button to grab and display the available extensions from the repository. This ensures everything is up to date and ready for installation.

    Search for the CivitAI Browser+ Extension

    In the search bar, type CivitAI Browser+ and press enter. Once it pops up in the list, click on the Install button to begin the installation.

    Activate the Extension

    After installation, go to the “Installed” sub-tab within the Extensions section. Here, click the Apply button and restart the WebUI to get the extension working. This step activates the new functionality, allowing you to use CivitAI Browser+.

    Restart the WebUI

    When you click the restart button, you might see the message “Reloading” on your console for a moment. Don’t worry, that’s just the WebUI doing its thing. Patience is key here!

    Access the New CivitAI Browser+ Tab

    After the restart, you’ll see a shiny new tab called “CivitAI Browser+” in the interface. This tab is dedicated to helping you search for and install models directly from CivitAI, making it super easy to expand your Stable Diffusion setup.

    Install a Model from CivitAI

    For this demo, search for “Western Animation” within the CivitAI Browser+ tab, and pick a model that suits your project. In our case, select the one with the Superman thumbnail. Then, just click to install the model.

    By following these steps, you’ll be able to quickly integrate new models into your setup and boost your capabilities with Stable Diffusion. The CivitAI Browser+ extension really makes it a breeze to find, search for, and install models directly from the WebUI.

    For a detailed guide on installing models using the CivitAI Browser Extension, check out this helpful resource: CivitAI Browser Extension Setup for Easy Model Installation (2025).

    To dive deeper into effective AI art generation and prompt writing techniques, check out this guide on Your First Gen-AI Art: Stable Diffusion Prompt Writing Tutorial (2025).

    How to Write Prompts

    Prompts play a crucial role in the image generation process. They guide the AI by specifying the desired outcome. Positive prompts provide instructions on what to include in the image, while negative prompts help eliminate undesired elements. Both types of prompts are essential for refining the output and achieving high-quality results.

    Writing Positive Prompts

    Positive prompts are key in guiding the AI to generate the exact image you envision. These prompts use descriptive language, where you can either provide simple sentences or comma-separated keywords to convey the features you want the AI to focus on. The more specific and clear your prompt, the more likely you are to get accurate results.

    For example, if you want the AI to generate an image of a sea turtle swimming over a coral reef, you could write the following prompt:

    Full prompt: “a sea turtle swimming over a coral reef”

    Or, you can simplify it into keywords that describe the main features of the image:

    Keywords: “sea turtle, swimming, coral reef, ocean”

    Similarly, if you want an image of a school of colorful fish swimming in the ocean, you can provide a prompt like:

    Full prompt: “a school of colorful fish swimming in the ocean”

    Keywords: “colorful fish, swimming in the ocean, school of fish, tropical fish”

    These prompts help the AI understand the key elements of your image, such as the subject (sea turtle, fish), the environment (coral reef, ocean), and the specific details (colorful, swimming).

    Using Negative Prompts

    Negative prompts are just as important as positive ones because they help to filter out unwanted elements from the generated image. By specifying what you do not want to see, negative prompts allow you to avoid issues such as low-quality images, incorrect anatomy, or irrelevant elements. Negative prompts are particularly useful when generating multiple images or when you want to exclude specific objects or attributes.

    Common negative prompts to help refine your image output include terms that avoid poor-quality results, such as:

    • Low quality: “lowres, blurry, bad anatomy, text, error, cropped, worst quality, jpeg artifacts, watermark, signature”

    For instance, if you want to generate marine life images without any artifacts or blurriness, you could add the following negative prompts:

    Negative prompts: “lowres, blurry, bad anatomy, text, error”

    You can also exclude specific objects or people that might be irrelevant to your marine life scene. For example, you might not want human figures or buildings appearing in the image:

    Excluding elements: “nsfw, weapon, blood, human, car, city, building”

    By carefully selecting both positive and negative prompts, you can significantly improve the quality of your generated images, ensuring that they align with your vision while filtering out unnecessary distractions.

    How to Use txt2image in Stable Diffusion

    Stable Diffusion WebUI’s txt2image feature is a powerful tool that lets you generate images just by describing what you want to see. It’s like having a supercharged art assistant! By using both positive and negative prompts, you can guide the AI to create high-quality, detailed images exactly how you envision them. Here’s how you can make the most of this feature:

    Enter Positive and Negative Prompts

    First things first: let’s get those prompts in. In the left text box of the WebUI, you’ll enter positive prompts to describe the image you want the AI to create. For example, if you want to generate an image of marine life, you could use a prompt like:

    Positive prompt example: "colorful fish, coral reef, underwater, ocean, vibrant colors"

    This tells the AI exactly what to include in the image. On the flip side, negative prompts are super important, too. They help you exclude things you don’t want to see in your image. For instance, if you don’t want the image to have any blurry details or strange anatomy, you can add these as negative prompts:

    Negative prompt example: "lowres, bad anatomy, text, blurry, weapon, human"

    By using both positive and negative prompts, you can guide the AI to generate the best possible image, free of unwanted distractions.

    Select Sampling Method

    Next, we need to select a sampling method. Think of sampling methods like different styles of art—some give a more detailed or clearer image than others. For the best results, try using methods like:

    Sampling method examples: "DPM++ 2M SDE Heun" or "Euler a"

    These methods work really well for creating sharp, rich images. You can always experiment with others, too, to see which one gives you the best results for your needs!

    Set Image Dimensions and Steps

    Once you’ve chosen the sampling method, it’s time to set your image dimensions and the number of sampling steps. The dimensions determine the resolution (or size) of the image, and the sampling steps control how detailed the image will be. For example, setting the width and height to 1024×512 is a good starting point. It gives you a resolution of 1024 pixels wide by 512 pixels tall, which works for most image generation tasks. Recommended settings:

    • Width and height: 1024x512
    • Sampling steps: 30

    You can also check the “Hires. fix” option to make the details pop even more, especially when you’re generating things like marine life.

    Generate the Image

    After you’ve got all your settings dialed in, hit the “Generate” button at the top right of the WebUI. The AI will start working its magic based on your prompts. When it’s done, you can save your image or tweak it a bit if needed.

    Common Syntax and Extensions

    Stable Diffusion WebUI supports different syntaxes and extensions that can really fine-tune how the AI generates images. Here are some useful ways to get even more precise:

    Attention/Emphasis

    Want to emphasize something specific in your prompt? You can do that by using parentheses. For example, if you want the AI to focus on the color of a dolphin, you could write:

    Example: "dolphin, ((blue)), ocean, swimming"

    By putting “blue” in double parentheses, you’re telling the AI to pay extra attention to that detail.

    Prompt Switching

This is super handy if you want the prompt to change partway through generation. For example, the following starts rendering a shark and then switches to a whale after 10 sampling steps:

    Example: "[shark : whale : 10] swimming in the ocean"

    This syntax lets you play around and adjust your image dynamically.

    Example Prompts

    Now, let’s see how all this works with some example prompts related to marine life:

    • Generate an octopus underwater:
      • Positive prompt: "octopus, underwater, ocean, coral reef, vibrant colors"
      • Negative prompt: "lowres, blurry, bad anatomy, text, human"
    • Generate a dolphin jumping out of the water:
      • Positive prompt: "dolphin, jumping out of the water, ocean, sunset, splash, realistic"
      • Negative prompt: "lowres, bad anatomy, blurry, text, car, building"
    • Generate a shark swimming in deep water:
      • Positive prompt: "shark, swimming, deep ocean, dark blue water, scary, realistic"
      • Negative prompt: "lowres, bad anatomy, blurry, text, human, building"

    By carefully combining positive and negative prompts, you can make the AI create exactly what you’re looking for, down to the smallest details.

    In the end, by playing around with different prompts, sampling methods, and settings, you can unlock the full potential of Stable Diffusion and create some truly amazing images. Happy generating!

    For a more in-depth guide on using text-based image generation techniques, check out this comprehensive article on How to Use txt2image in Stable Diffusion (2025).

    Conclusion

    In conclusion, setting up Stable Diffusion on a GPU Droplet with the WebUI by AUTOMATIC1111 allows you to harness powerful AI image generation capabilities with ease. By following the steps outlined, you can efficiently manage GPU resources, install necessary dependencies, and start generating high-quality images using positive and negative prompts. Whether you’re creating detailed images or experimenting with different models, this setup will ensure optimal performance. As AI and image generation tools continue to evolve, leveraging platforms like DigitalOcean’s GPU Droplet for Stable Diffusion will become increasingly valuable for developers and creators looking to enhance their workflows.

    Optimize PyTorch GPU Performance with CUDA and cuDNN

  • Train LoRA Model with Stable Diffusion XL: Fast Setup & Guide

    Train LoRA Model with Stable Diffusion XL: Fast Setup & Guide

    Introduction

    Training a LoRA model with Stable Diffusion XL (SDXL) has become a popular approach for text-to-image synthesis. Whether you’re looking to create highly detailed images or generate unique styles, using LoRA with SDXL can dramatically enhance your results. This guide walks you through the setup and necessary steps to train your own LoRA model, using tools like the Fast Stable Diffusion project, AUTOMATIC1111 Web UI, and ComfyUI. We’ll dive into everything from selecting the right hardware to fine-tuning models with images and captions, ensuring you’re ready to harness the power of LoRA and SDXL for your creative or technical needs.

    What is LoRA model training with Stable Diffusion XL?

    This solution helps users train customized models using images and captions to improve text-to-image generation. By using a method called LoRA (Low-Rank Adaptation), users can fine-tune a large pre-trained model for specific subjects or styles. The trained models can then be used for generating new images based on given prompts, making the process more versatile and cost-effective. It simplifies the task of creating specialized image generation models without needing extensive computational resources.

    Prerequisites

    Hardware Requirements: To train the model properly, you’ll need a compatible GPU with enough Video RAM (VRAM). I’d recommend at least 16 GB of VRAM to make sure everything runs smoothly, especially when you’re working with big datasets or complex models. You’ll also need at least 32 GB of system RAM, or more if possible. That’ll help with the heavy lifting during training and prevent your system from crashing or slowing down. This combination of GPU and RAM will keep your training process flowing without any memory hiccups.

    Software Requirements: You’re going to need Python, and specifically version 3.7 or higher, to get things going. This version of Python works with the deep learning tools and libraries you’ll use during training. You’ll also need some essential libraries like PyTorch and Transformers. These help run the neural networks and utilize pre-trained models for fine-tuning. On top of that, the LoRA (Low-Rank Adaptation) library, like PEFT, is key for implementing low-rank adaptation in Stable Diffusion models. It helps make the training process more efficient and adaptable to different models and tasks.
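As a rough starting point, the libraries above can be pulled into a fresh virtual environment like this; exact versions depend on your GPU and CUDA setup, so treat it as a sketch rather than a pinned recipe:

$ python3 -m venv lora-env
$ source lora-env/bin/activate
$ pip install torch transformers peft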

    Data Preparation: Before you dive into training, you’ll need a well-prepared dataset that fits your specific diffusion task. The dataset should be formatted properly to ensure everything runs smoothly and without errors. You’ll also need some data preprocessing tools to clean and organize everything. This involves removing irrelevant or messy data and making sure your images and captions are lined up properly. Data preparation is super important for getting high-quality results, so don’t skip this step!
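    As a concrete example of "images and captions lined up properly," here's a small sketch that assumes one .txt caption file per image with a matching base name (a common convention, not the only one) in a hypothetical my_dataset folder:

    # Flag images missing a caption file, and stray captions with no image
    from pathlib import Path

    dataset = Path("my_dataset")  # hypothetical folder of training data
    images = {p.stem for p in dataset.glob("*.jpg")}   # assumes .jpg inputs
    captions = {p.stem for p in dataset.glob("*.txt")}

    print("Images without captions:", sorted(images - captions))
    print("Captions without images:", sorted(captions - images))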

    Familiarity: You don’t need to be a deep learning expert, but a basic understanding of how deep learning and model training works will definitely help you out. It’ll help you get the hang of the model, understand the optimization process, and troubleshoot any issues that come up during training. Also, you’ll need some experience with Python and command-line interfaces since you’ll be running commands and managing libraries through the command line. If you’re familiar with handling Python environments, libraries, and scripts, you’ll be able to breeze through the setup and training steps.

    Read more about system requirements for deep learning models in the TensorFlow installation guide.

    Low-Rank Adaptation (LoRA) Models

    So, here’s the deal with LoRA—it stands for Low-Rank Adaptation. It’s a neat little trick that lets you make big pre-trained models, like Stable Diffusion, work even better without needing to completely retrain them. Think of it like adding a turbo boost to your car—you’re not replacing the whole engine, just tweaking some parts to make it faster and more efficient. With LoRA, you’re able to append smaller models to the main one, so it gets the job done without the hefty computational cost. That’s a win for everyone, right?

    In the world of Stable Diffusion, LoRA helps the model become a pro at new tasks, like learning how to generate a specific character or nailing a unique artistic style. You get all the benefits of the main model while making it better at producing more specific results. And the best part? LoRA only changes a small portion of the model’s parameters, which makes it way more cost-effective than traditional fine-tuning methods that require heavy lifting.
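    To make the "small portion of the parameters" point concrete, here's a minimal PEFT sketch on a toy network. The layer names and hyperparameters are illustrative, not what Fast Stable Diffusion uses internally; in real SDXL training the adapters attach to layers inside the UNet:

    # Attach LoRA adapters to a toy model and count trainable parameters
    from torch import nn
    from peft import LoraConfig, get_peft_model

    base_model = nn.Sequential(          # stand-in for a large pre-trained network
        nn.Linear(1024, 1024),
        nn.ReLU(),
        nn.Linear(1024, 1024),
    )

    config = LoraConfig(
        r=8,                             # rank of the low-rank update matrices
        lora_alpha=16,                   # scaling factor for the LoRA update
        target_modules=["0", "2"],       # which submodules get adapters
        lora_dropout=0.05,
    )
    model = get_peft_model(base_model, config)

    # Only the small adapter matrices train; the base weights stay frozen
    model.print_trainable_parameters()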

    Once you’ve trained a LoRA model with your preferred subject or style, you can easily share it with others. It’s pretty cool because it means you can integrate these fine-tuned models into your own projects without having to start from scratch. This opens up a whole world of possibilities for making your models do more interesting and specific things, all while using less computing power.

    With Stable Diffusion, using LoRA models is a game-changer. They allow you to create affordable models that capture exact subjects or even unique styles. After fine-tuning with LoRA, you can combine them with the full Stable Diffusion model, which enhances its ability to generate spot-on, context-aware images. So, in short, combining LoRA with Stable Diffusion means you get to push the limits of generative workflows and create some seriously detailed images.

    Read more about LoRA models and their applications in machine learning on the Towards Data Science blog.

    Fast Stable Diffusion

    So here’s the scoop on the Fast Stable Diffusion project. Created and led by a GitHub user called TheLastBen, this project is one of the quickest and most efficient ways to access and use Stable Diffusion models. It’s built to make the whole process easier and faster, whether you’re a newbie or a seasoned pro. Basically, it simplifies working with complex AI models, so you don’t have to be a technical genius to make the most of Stable Diffusion.

    One of the coolest things about Fast Stable Diffusion is how it maximizes your hardware. It optimizes the user interface and makes the image generation process smoother and quicker, meaning you get results faster without losing any quality. This is super helpful if you need to crank out a ton of images in a short amount of time, or if your computer’s not exactly a powerhouse.

    Now, Fast Stable Diffusion works with two really popular user interfaces: the AUTOMATIC1111 Web UI and ComfyUI. Both are designed to be user-friendly, but they also pack a punch when it comes to more advanced features like fine-tuning models or generating images. Whether you prefer the simplicity of the AUTOMATIC1111 Web UI or the customization options in ComfyUI, Fast Stable Diffusion makes sure both are optimized for the best performance.

    All in all, Fast Stable Diffusion is a great way to dive into AI-generated images without the headaches. It’s an efficient, user-friendly solution that lets you explore, optimize, and get the most out of your hardware, no matter which interface you choose.

    Learn more about optimizing Stable Diffusion models and workflows in this comprehensive guide on Analytics Vidhya.

    Demo

    So, in the earlier stages of this process, we had to build a custom Gradio interface just to interact with the model. But now, thanks to the awesome contributions from the development community, things have gotten a whole lot easier with some really great tools and interfaces for Stable Diffusion. Now, it’s much simpler to work with Stable Diffusion XL.

    In this demo, I’ll guide you through setting up Stable Diffusion using a Jupyter Notebook. If you haven’t used Jupyter before, it’s basically a super handy way to run Python code interactively—kind of like working on a project in a notebook, but way cooler. The setup has been automated in an IPython notebook created by TheLastBen, which makes everything a breeze. The model itself will be downloaded straight to the cache during setup, and here’s the thing: this cache won’t count toward your storage limit, so you don’t have to stress about running out of space when downloading the model.

    Once everything is set up, we’ll jump into some best practices for selecting and preparing images for your specific subject or style. Picking the right images is super important because it impacts the diversity and quality of the results you’ll get. I’ll walk you through how to choose images that vary in settings, angles, and lighting—basically, making sure the training data is well-rounded to give you the best results.

    Next, we’ll go over how to add captions for the training data. Captions are key to helping the model understand and generate images based on certain characteristics. I’ll show you how to label each image properly, which will help the model understand what it’s looking at, leading to more accurate outputs.

    Finally, we’ll wrap up this demo by showing off some sample images generated with a LoRA (Low-Rank Adaptation) model that I trained using my own face. This will give you a firsthand look at how LoRA models can capture specific subjects and styles, and how customizable and tailored the results can be. You’ll see how powerful and flexible these models are!

    To explore more about working with AI models and their setup, check out this detailed guide on TensorFlow’s tutorial on generative models.

    Setup

    Once your Notebook is up and running, the first thing you’ll need to do is run the first two code cells. These cells are important because they’ll install the necessary package dependencies and download the SD XL Base model. This model is crucial for everything to work smoothly in the project.

    Install the dependencies

    force_reinstall = False  # Set to True only if you want to reinstall the dependencies

    # --------------------
    import requests, os, importlib

    # Fetch the helper script from TheLastBen's PPS repository
    open('/notebooks/sdxllorapps.py', 'wb').write(
        requests.get('https://huggingface.co/datasets/TheLastBen/PPS/raw/main/Scripts/sdxllorapps.py').content)
    os.chdir('/notebooks')
    import sdxllorapps
    importlib.reload(sdxllorapps)
    from sdxllorapps import *  # exposes Deps(), dls_xl(), test(), and other helpers
    Deps(force_reinstall)

    This first cell takes care of installing all the dependencies needed for the project to run. You’ll also notice that a folder called “Latest_Notebooks” is created. That folder is actually pretty important because it gives you access to the most current versions of the notebooks from the PPS repository. So, you’ll always be working with the freshest tools and scripts.

    After the dependencies are all set up, the next cell will download the model checkpoints from HuggingFace. These checkpoints are essential for the upcoming model training part.

    Run the cell to download the model

    # -------------
    MODEL_NAMExl = dls_xl("", "", "")

    Once this cell is finished and the model has been downloaded, you’ll be all set to dive into the next steps. That’s when you’ll start preparing your images, captions, and eventually jump into training the model. This is where things get interesting, as it sets the stage for efficiently training the SD XL model with your own data.

    For a comprehensive guide on setting up and configuring machine learning models, check out the TensorFlow setup and tutorial page.

    Image Selection and Captioning

    Selecting the images for training a LoRA (Low-Rank Adaptation) model, or even for Textual Inversion embedding, is a crucial step in the entire process. The quality and variety of images selected will have a profound impact on the final outputs that the model generates. Specifically, the images chosen will determine the model’s ability to learn and adapt to the desired subject or style, and this must be done with great care. To put it simply, the images you use for training will directly affect how well the model performs in generating realistic, accurate images.

    When training a working LoRA model, it is essential to select images that clearly contain the subject or style you want to train the model on. These images should showcase the subject from different angles, in varying settings, and under diverse lighting conditions. This diversity in images will help introduce the flexibility required for the model, enabling it to produce results with a wide range of versatility. In short, the more varied and dynamic your dataset is, the better the model will perform.

    In this tutorial, we are going to demonstrate how to train a Stable Diffusion XL (SD XL) LoRA using images of the author’s own face. The same principles we apply to facial images can easily be transferred to other types of subjects or styles, so don’t be concerned if your goal is to train the model for a specific artistic style instead of a face.

    To make sure you choose the right images, here is a quick checklist of characteristics we look for when preparing a dataset for a Stable Diffusion LoRA model:

    • Single subject or style: For optimal results, it’s best to focus on a single subject or style in your training images. If you use images with multiple entities in them, the model may become confused, which can complicate the learning process. Aim for consistency by focusing on one subject at a time, but featuring it in various poses, clothing, and settings.
    • Different angles: A crucial aspect of the training dataset is ensuring the subject appears in different angles. This diversity prevents the model from overtraining on a single perspective, which can negatively impact the model’s flexibility. The goal is to ensure that the model learns to understand the subject in multiple orientations, enhancing its overall performance.
    • Settings: The background and environment of your images matter too. If all the images are taken in the same setting, such as a consistent background or similar clothing, the model might overfit to those details, affecting its generalization abilities. If possible, use images taken in different environments, but make sure that the core subject is clearly visible and identifiable. If you prefer, using a neutral, blank background can also work well for training purposes.
    • Lighting: While lighting is slightly less important compared to angles and settings, it can still influence the model’s output. Using a range of lighting conditions will allow the model to generate better images that are adaptable to various lighting environments. Be sure to capture the subject in different lighting situations, whether it’s natural light, artificial light, or dramatic shadows.

    For this tutorial, we’ll start by taking a set of simple selfies against a blank wall. Let’s use five images for the sake of example. These images should showcase the subject’s face at varying angles to ensure the model gets a comprehensive understanding of the subject’s features. In this case, the goal is to have the subject face the camera from slightly different positions, capturing different sides and perspectives. A smaller dataset like this will provide enough variation without overwhelming the model during training.

    Note: The images selected for training must be clear and well-lit to ensure accurate results.

    Remove_existing_instance_images = True  # Set to False to keep the existing instance images, if any
    IMAGES_FOLDER_OPTIONAL = ""  # Specify a folder of pictures instead of uploading; leave empty to upload
    Smart_crop_images = True  # Automatically crop your input images
    Crop_size = 1024  # 1024 is the native resolution

    Check out this example for naming: https://i.imgur.com/d2lD3rz.jpeg

    The snippet above configures the settings for uploading the images. The Remove_existing_instance_images variable controls whether any previously uploaded images are replaced or kept. Smart_crop_images enables automatic cropping of the images to the correct aspect ratio, and Crop_size sets the resolution of the cropped images, 1024 here to maintain high-quality input. This prepares the images for the next steps in the process.
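    If you'd like to see what cropping to the native resolution amounts to, here's a rough Pillow sketch of a plain center crop (the notebook's smart-crop logic may differ, for instance by detecting the subject first):

    # Center-crop an image to a square and resize to 1024x1024
    from PIL import Image

    def center_crop_resize(path, size=1024):
        img = Image.open(path).convert("RGB")
        w, h = img.size
        side = min(w, h)
        left, top = (w - side) // 2, (h - side) // 2
        img = img.crop((left, top, left + side, top + side))
        return img.resize((size, size), Image.LANCZOS)

    center_crop_resize("selfie_01.jpg").save("selfie_01_1024.jpg")  # hypothetical filenames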

    Once the images are ready, we need to label them with descriptive captions that will aid the training process. This captioning step is essential as it provides context for the model, telling it exactly what it’s seeing in each image. The more descriptive and specific the captions are, the better the model will perform during training.

    The next cell lets us manually add a caption to each image. Be as descriptive as possible: rich, specific captions give the model more context about what it's looking at, which leads to more accurate outputs. If you have a large dataset and manual captioning is too time-consuming, there are alternatives: the Stable Diffusion Web UI's Training tab can automatically generate captions for each image based on its content, and you can then load those captions from a text file to simplify the process.
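    For the text-file route, one widely used convention (assumed here; the notebook may expect a different layout) is a .txt file per image sharing the image's base name:

    # Write one caption file per image; filenames and captions are illustrative
    import os

    captions = {
        "selfie_01.jpg": "photo of a red-haired man with freckles facing the camera, blank wall",
        "selfie_02.jpg": "photo of a red-haired man with freckles, head turned left, blank wall",
    }

    for image_name, caption in captions.items():
        with open(os.path.splitext(image_name)[0] + ".txt", "w") as f:
            f.write(caption)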

    Once all the images are uploaded and correctly captioned, you can proceed to the next steps of training the model, using the prepared dataset.

    For additional insights on selecting and preparing images for model training, check out this detailed guide on image preparation for AI model training.

    Training the LoRA Model

    In the process of training a LoRA (Low-Rank Adaptation) model, we are able to fine-tune the model using a variety of settings and configurations. The following configuration script allows us to modify key parameters that control how the model trains, making it adaptable to different needs or hardware capabilities.

    Here is an example of the code used to configure and run the LoRA training process:

    Resume_Training = False  # If you're not satisfied with the result, set to True and run again; training resumes where it left off.
    Training_Epochs = 50  # An epoch is one full pass over the training images.
    Learning_Rate = "3e-6"  # Keep the learning rate between 1e-6 and 6e-6 for stable results.
    External_Captions = False  # If True, load captions from a text file for each image instance.
    LoRA_Dim = 128  # The rank (dimension) of the LoRA matrices; 64-128 is a balanced range.
    Resolution = 1024  # Use 1024, SDXL's native resolution, for best image quality.
    Save_VRAM = False  # Set to True to reduce VRAM use at the cost of slower training.

    This code snippet represents the configuration used for initiating and training the LoRA model. The key parameters defined here include:

    • Resume_Training: This variable determines whether to continue training from where the last session ended. If you are not satisfied with the model’s results, you can set this to True and re-run the training process. This is especially useful when refining a model.
    • Training_Epochs: One epoch is a single full pass over the training images. Setting it to 50 means the model will go through the dataset 50 times; a quick step-count sketch follows this list.
    • Learning_Rate: This value controls how fast the model learns. A learning rate of 3e-6 is optimal for most cases, but this can be adjusted. Too high a learning rate can lead to unstable training, while too low can make the process too slow or hinder the model from learning effectively.
    • External_Captions: When set to True, this option allows you to load captions from an external text file. If you have a large dataset and don’t want to manually label each image, this can save a lot of time.
    • LoRA_Dim: The dimension of the LoRA model itself. A higher value means the model has more capacity to learn complex patterns but may require more resources. Typically, values between 64 and 128 are recommended, with 128 being a good balance for most cases.
    • Resolution: The resolution at which the images will be processed. Higher resolutions, like 1024, result in more detailed images but require more computational power.
    • Save_VRAM: This is a resource-saving option, set to False for the standard setup. If you set it to True, the model will try to use less VRAM, which may make the training process slower but helpful for machines with limited GPU memory.
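
    As a quick back-of-the-envelope check on the training length (promised in the Training_Epochs item above), the number of optimizer steps scales with epochs times images. The numbers here are illustrative and assume a batch size of 1:

    # Rough training-length arithmetic
    num_images = 5            # the five selfies from earlier
    epochs = 50               # Training_Epochs above
    total_steps = epochs * num_images
    print(f"{total_steps} optimizer steps in total")  # 250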

    Once you configure these parameters to suit your needs, you can run the training by executing the final command in the code. This will initiate the training process, where the model will start learning from the provided images and captions.

    The training progress will be automatically saved, and the model’s state will be stored in the appropriate directories. After the training is completed, the model checkpoint will be saved and can be used with either the ComfyUI or the Stable Diffusion Web UI. These user interfaces allow for easy testing and refinement of the model, enabling you to fine-tune your results further and test different prompts and settings.

    By following this process, you’ll have a trained LoRA model ready for generating images based on the style or subject you’ve trained it on. Whether you’re working with a specific subject like a face or a stylistic model, this setup provides flexibility and efficiency in model development.

    For a comprehensive understanding of training techniques and model optimization, refer to this detailed resource on optimizing deep learning models.

    Running the LoRA Model with Stable Diffusion XL

    Once the training process is complete, you can start testing and running your LoRA model using either the ComfyUI or the Stable Diffusion Web UI. Both interfaces make it super easy to test your newly trained model and make any adjustments you might need to improve its performance.

    The first thing you’ll need to do is set up the environment to run your LoRA model. Here’s an example of the initial configuration you’ll need to get going:

    User = ""  # Optional: add credentials to secure your Gradio interface
    Password = ""
    Download_SDXL_Model = True

    # -----------------
    configf = test(MDLPTH, User, Password, Download_SDXL_Model)
    !python /notebooks/sd/stable-diffusion-webui/webui.py $configf

    In this setup:

    • User and Password are optional parameters for adding credentials to your Gradio interface. If you need them, you can enter them here to secure access to the interface.
    • Download_SDXL_Model is set to True to automatically download the Stable Diffusion XL model. This is an essential step before running the Web UI.

    Next, for this demo, we’re using the AUTOMATIC1111 Web UI. To get started, scroll down to the second-to-last code cell and run it. This will automatically set up the Web UI and give you a shareable link. You can open this link in any web browser to access the interface.

    Once the Web UI is up, look for a small red and black symbol with a yellow circle under the “Generate” button. When you click on that icon, it’ll open the LoRA dropdown menu. From there, you can select the LoRA tab and choose the LoRA model you just trained. If you haven’t changed the session name, you’ll see your model listed as “Example-Session.”

    Now comes the fun part—testing the model! Just type a prompt and add your LoRA model at the end. Here’s an example of a prompt you can use to test your model:

    "a wizard with a colorful robe and staff, a red-haired man with freckles dressed up as Merlin lora:Example-Session:.6"

    As you can see from the generated image, the model does a great job of keeping the core characteristics of the original subject (in this case, someone who looks like Merlin). The model successfully applies the style and traits it learned during training to produce images that match your specifications.

    You can play around with different prompts, training subjects, and settings to see what works best. The great thing about the LoRA model is its flexibility, which lets you refine the results and explore various possibilities to generate high-quality images.

    For more insights on optimizing AI model interfaces and testing setups, check out this guide on model interfaces and testing strategies.

    Conclusion

    In conclusion, training a LoRA model with Stable Diffusion XL (SDXL) offers a powerful way to create customized text-to-image models. With the Fast Stable Diffusion project, setting up the necessary environment and fine-tuning models has never been easier. By leveraging tools like the AUTOMATIC1111 Web UI and ComfyUI, users can effectively manage their LoRA models and generate high-quality, contextually relevant images. As AI and text-to-image synthesis continue to evolve, the integration of LoRA with SDXL is poised to become even more essential for creators and developers. Whether you’re optimizing for specific styles or training unique subjects, the future of image generation is full of possibilities.
