
Stable Diffusion ComfyUI Note 02 - Build The Basic Workflow

Last Updated on 2024-08-12 by Clay

Introduction

Previously, we finished configuring ComfyUI; now we can try to build the most basic workflow. The workflow is the biggest difference from stable-diffusion-webui: ComfyUI uses a card-based graph that makes it easier to understand how the Stable Diffusion model actually performs inference, and also makes it easier to customize and achieve more advanced effects.

In short, when you open the interface, you should see the basic workflow configured by the system.

Default Workflow

However, at first glance, it does look quite complicated, so let's press the Clear button and start over step by step!

This is the workspace that appears on the right side of the interface. Press Clear to clear the screen.

Building a Basic Workflow

1. Create an Empty Latent Image Card

First, double-click anywhere on the interface, and a Search box will pop up. Then search for empty and bring up the Empty Latent Image card.

The Empty Latent Image represents a batch of Gaussian-distributed noise images in latent space, the raw input to Stable Diffusion. The model then generates a clear image from this noise through a series of reverse-diffusion steps, gradually denoising it.

Here, we can set the width and height of the images to be generated, as well as the batch size.
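
To make the shape of this latent concrete, here is a minimal sketch in PyTorch (an assumption for illustration; ComfyUI handles this internally). For SD v1.5 the latent has 4 channels and 1/8 of the pixel resolution, so a 512×512 image corresponds to a 64×64 latent:

```python
import torch

# Minimal sketch of the latent tensor behind an Empty Latent Image.
# For SD v1.5: 4 latent channels, spatial size = pixel size / 8.
batch_size, width, height = 1, 512, 512
latent = torch.randn(batch_size, 4, height // 8, width // 8)  # Gaussian noise
print(latent.shape)  # torch.Size([1, 4, 64, 64])
```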


2. Create a Load Checkpoint Card

This card is the most important one: it selects the Stable Diffusion model we want to use. If you have installed ComfyUI, it should come with the basic v1-5-pruned-emaonly.safetensors model by default. Let's use it for now! Later, I will write an article summarizing the Stable Diffusion resources available on the internet.
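
Under the hood, a checkpoint like this is just a file of named weight tensors covering the UNet, the CLIP text encoder, and the VAE. Here is a quick way to peek inside, sketched with the safetensors library (the file path is an assumption; ComfyUI itself looks under models/checkpoints):

```python
from safetensors.torch import load_file

# Peek inside the checkpoint: it is a flat dict of named weight tensors
# covering the UNet, the CLIP text encoder, and the VAE.
state_dict = load_file("v1-5-pruned-emaonly.safetensors")
print(len(state_dict), "tensors")
print(next(iter(state_dict)))  # prints one of the weight names
```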


3. Create Prompt Cards

Similarly, continue by creating the CLIP Text Encode card. Here, we need to create two: one for the positive prompt to tell the model what we want to generate, and one for the negative prompt to tell the model what we don't want to generate.

Right-click on the card, and you will see a menu with many editing options. Here, I usually rename and color the two cards differently to distinguish which one is the positive prompt and which one is the negative prompt.

Finally, don't forget to connect the yellow CLIP dot from the Load Checkpoint card to the inputs of both the positive and negative prompt cards, so the text encoder can parse these prompts.

By the way, I set the positive prompt to "a dog, run in the forest, day, sun, blue sky" and the negative prompt to "cat", hoping that only dogs will appear.
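
For intuition, here is roughly what CLIP Text Encode does with those prompts: tokenize the text and run it through the checkpoint's text encoder to obtain conditioning embeddings. This sketch uses Hugging Face transformers with openai/clip-vit-large-patch14, the text encoder SD v1.5 is built on (an assumption for illustration; ComfyUI uses the encoder bundled inside the checkpoint):

```python
from transformers import CLIPTokenizer, CLIPTextModel

# Roughly what the CLIP Text Encode card does with each prompt.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompts = ["a dog, run in the forest, day, sun, blue sky",  # positive
           "cat"]                                           # negative
tokens = tokenizer(prompts, padding="max_length", truncation=True,
                   return_tensors="pt")
embeddings = text_encoder(**tokens).last_hidden_state
print(embeddings.shape)  # torch.Size([2, 77, 768]): 77 tokens, 768 dims each
```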


4. Create a KSampler Card

This step is the most complex in this note. First, create the KSampler card, and you will see it accepts four inputs:

  • model
  • positive
  • negative
  • latent_image

However, it's not as difficult as it seems: positive and negative obviously connect to the two prompt cards, model connects to the MODEL output of the Load Checkpoint card, and latent_image connects to the LATENT output of our initial noise image.

Now we can start observing some rules to help us avoid connecting the workflow incorrectly:

  1. The uppercase letters on the right side of the card usually represent 'output,' while the corresponding lowercase letters on the left side represent the required 'input'
  2. The corresponding small circles, input and output, are the same color

In addition, the KSampler has some seemingly difficult parameters. Below, I list what each one means; they can be adjusted experimentally to observe their effects, and the sketch after this list shows how they map to more familiar names:

  • seed: The seed for random number generation; fixing it makes the generated images reproducible
  • control_after_generate: Specifies how the seed changes after each generation. The default is randomize, but I want reproducible results, so set it to fixed
  • steps: The number of denoising steps in the diffusion process. More steps usually produce higher-quality images but also increase computation time, and pushing too far can sometimes produce unexpected results. I recommend starting with 20 steps to confirm the results are heading in the desired direction, then gradually increasing
  • cfg: Short for Classifier-Free Guidance; this value balances how strictly the model follows the positive prompt (and avoids the negative prompt). Generally, the higher the value, the more closely the model adheres to the prompt, but setting it too high can produce strange artifacts. A common range is 6.0 to 9.0
  • sampler_name: Specifies the sampler to use. The sampler determines how the diffusion process moves through latent space; for now, use euler. Examined in detail, this topic could fill several articles
  • scheduler: Controls how the denoising steps are spaced across the diffusion process. Set it to normal for now; this is another big topic
  • denoise: The denoising strength. Its value ranges from 0 to 1 and determines how much of the latent is denoised overall; 1.00 means complete denoising, which is what we want for text-to-image
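
To see how these parameters map onto something runnable, here is a sketch using the diffusers library rather than ComfyUI's internals (an assumption; names like guidance_scale are diffusers' equivalents, not ComfyUI's own):

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

# The KSampler settings above, expressed as their diffusers equivalents.
pipe = StableDiffusionPipeline.from_single_file("v1-5-pruned-emaonly.safetensors")
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)  # sampler_name: euler

image = pipe(
    prompt="a dog, run in the forest, day, sun, blue sky",  # positive
    negative_prompt="cat",                                  # negative
    num_inference_steps=20,                                 # steps
    guidance_scale=7.0,                                     # cfg
    generator=torch.Generator().manual_seed(42),            # fixed seed
).images[0]  # denoise=1.0 is implicit for text-to-image
image.save("output.png")
```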


5. Create a VAE Decode Card

This is actually the last step of the inference process. Create the VAE Decode card, which decodes the final latent into an image. Its samples input takes the LATENT output from the KSampler, and its vae input connects to the VAE output of the Load Checkpoint card.
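
As a sketch of what happens inside this card: the latent is rescaled and passed through the VAE's decoder, going from a 4-channel 64×64 latent back to a 3-channel 512×512 image. This uses diffusers' AutoencoderKL, and the Hugging Face repo id is an assumption for illustration:

```python
import torch
from diffusers import AutoencoderKL

# Roughly what VAE Decode does: map the denoised latent back to pixel space.
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5",
                                    subfolder="vae")

latent = torch.randn(1, 4, 64, 64)  # stand-in for the KSampler's LATENT output
with torch.no_grad():
    image = vae.decode(latent / vae.config.scaling_factor).sample
print(image.shape)  # torch.Size([1, 3, 512, 512]): back to pixel space
```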


6. Create a Save Image Card

Now create the Save Image card and connect it to the output of the VAE Decode. Press the Queue Prompt button on the right work panel, and the Stable Diffusion framework will start generating images.

If there are any errors, ComfyUI will pop up an error message and highlight the problem nodes, allowing you to debug quickly. With that, the workflow is complete; the entire graph is shown below.
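
Incidentally, the Queue Prompt button just submits the graph to ComfyUI's backend over HTTP, so the same workflow can be queued from a script. A minimal sketch, assuming ComfyUI runs on its default address and that workflow_api.json was exported with "Save (API Format)" (available after enabling the dev mode option in the settings):

```python
import json
import requests

# Queue a saved workflow programmatically instead of clicking Queue Prompt.
with open("workflow_api.json") as f:
    workflow = json.load(f)

response = requests.post("http://127.0.0.1:8188/prompt",
                         json={"prompt": workflow})
print(response.json())  # includes a prompt_id for tracking the job
```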


Save Your Workflow

Now there's another issue: if we had to rebuild the workflow from scratch every time we wanted to run inference, wouldn't that be exhausting? And once we start playing with more advanced techniques, workflows can easily become more than ten times as complex.

So, yes, we should save the workflows we build. Clicking the Save button on the right work panel exports the workflow as a JSON file, and the Load button restores it exactly as it was.

Additionally, we can refer to the workflows created by various experts on the internet and learn some techniques. I often do this myself. I recommend the following website: Prompting Pixels, which contains almost all common techniques.

Next time, I will document some online resources for Stable Diffusion, hoping to be helpful.

