ComfyUI + SD 3.5 Medium: A Practical Guide

Introduction

AI image generation has shifted from simple prompt boxes to fully customizable pipelines. ComfyUI sits at the center of this evolution, offering a node‑based environment where every part of the diffusion process is visible, editable, and reusable. When paired with Stable Diffusion 3.5 Medium and a 12 GB RTX 4070 Ti, it becomes a powerful platform for creators who enjoy learning, iterating, and refining their workflows.

Why ComfyUI Fits a Work‑In‑Progress Approach

ComfyUI is designed for creators who want to understand how their images are made. Instead of hiding the diffusion pipeline, it exposes each stage as a node you can rewire at any time.

Key Advantages

  • Full transparency into the generation process

  • Modular, swappable components

  • Reproducible workflows saved as graphs

  • Clear debugging when something breaks

  • Extensibility through custom nodes and community tools

This makes ComfyUI ideal for anyone who prefers hands‑on control over their creative pipeline.

Understanding Stable Diffusion 3.5 Medium

Stable Diffusion 3.5 Medium strikes a balance between capability and efficiency. It offers:

  • Stronger text understanding than SDXL

  • More consistent faces and anatomy

  • Faster sampling than SD3.5 Large

  • Lower VRAM requirements than SD3.5 Large Turbo

  • A three‑encoder text stack (CLIP‑L, CLIP‑G and T5XXL)

This architecture allows SD3.5 Medium to interpret prompts with nuance while maintaining high visual fidelity.

What a 12 GB RTX 4070 Ti Can Handle

A 12 GB GPU is more capable than many expect. With SD3.5 Medium, it can reliably handle:

Image Generation

  • 1024×1024 at batch size 1

  • 1536×1536 using tiled VAE decode

  • FP8 text encoders for reduced VRAM load

Motion and Video

  • SVD motion workflows

  • AnimateDiff sequences

  • MP4 output using FFmpeg‑based nodes

Img2Img and Refinement

  • Style transfer

  • Prompt‑guided edits

  • High‑quality detail enhancement

With proper workflow design, the 4070 Ti becomes a dependable engine for both still images and short video sequences. Published requirements may suggest that the model cannot run effectively on only 12 GB of VRAM, or that it will run unbearably slowly if it does; in practice, that isn't necessarily true. With the right supporting hardware, it can generate images in batches quite rapidly, and the results have beauty and grace in spades.

Image: A woman in a glowing red dress standing in a mystical forest, surrounded by radiant red light and floating magical particles.

Core Components of an SD3.5 Medium Workflow

A typical SD3.5 Medium workflow in ComfyUI includes several key stages.

Text Encoding

SD3.5 uses three text encoders:

  • T5XXL (FP8 here) for semantic meaning

  • CLIP‑L and CLIP‑G for style and visual grounding

Their outputs are combined before sampling to create a unified conditioning signal.
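The encoder stage can be sketched as a fragment of a ComfyUI graph in API (JSON) format, built here as a Python dict. It assumes the SD3.5 text encoders (CLIP‑L, CLIP‑G and T5XXL) are loaded through ComfyUI's TripleCLIPLoader node, which merges them into a single CLIP output. The filenames and node IDs are placeholders, not required values.

```python
import json

# Hypothetical filenames: use whatever sits in your ComfyUI/models/clip (or text_encoders) folder.
CLIP_L = "clip_l.safetensors"
CLIP_G = "clip_g.safetensors"
T5XXL_FP8 = "t5xxl_fp8_e4m3fn.safetensors"


def text_encoder_nodes(positive: str, negative: str) -> dict:
    """Return API-format nodes that load the SD3.5 text encoders and encode two prompts."""
    return {
        # TripleCLIPLoader combines CLIP-L, CLIP-G and T5XXL into one CLIP output (index 0).
        "10": {
            "class_type": "TripleCLIPLoader",
            "inputs": {"clip_name1": CLIP_L, "clip_name2": CLIP_G, "clip_name3": T5XXL_FP8},
        },
        # Each CLIPTextEncode node turns one prompt into a CONDITIONING output.
        # A link is written as [source_node_id, output_index].
        "11": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["10", 0]}},
        "12": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["10", 0]}},
    }


if __name__ == "__main__":
    nodes = text_encoder_nodes("a woman in a glowing red dress in a mystical forest", "blurry, low quality")
    print(json.dumps(nodes, indent=2))
```

Node 11 feeds the KSampler's positive input and node 12 its negative input.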

Conditioning

Conditioning nodes define the intent of the image:

  • Positive and negative prompts

  • Optional adapters (depth, pose, style)

  • SVD_img2vid_Conditioning for motion workflows

Sampling

The KSampler is the heart of the workflow.

Recommended settings:

  • Sampler: DPM++ 2M Karras or Euler a

  • Steps: 20–35

  • CFG: 3–6

  • Seed: locked for reproducibility
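In ComfyUI the sampler and the scheduler are separate KSampler inputs, so "DPM++ 2M Karras" becomes `dpmpp_2m` plus `karras`. The sketch below turns the recommended settings into KSampler values and rejects anything outside the ranges listed above. Pairing "Euler a" (`euler_ancestral`) with the `normal` scheduler is an assumption, and the helper name is hypothetical.

```python
# Recommended ranges from the list above.
STEP_RANGE = (20, 35)
CFG_RANGE = (3.0, 6.0)

# Display name -> ComfyUI sampler_name/scheduler pair.
# The scheduler paired with "Euler a" is an assumption; adjust to taste.
PRESETS = {
    "DPM++ 2M Karras": {"sampler_name": "dpmpp_2m", "scheduler": "karras"},
    "Euler a": {"sampler_name": "euler_ancestral", "scheduler": "normal"},
}


def ksampler_inputs(preset: str, steps: int = 28, cfg: float = 4.5, seed: int = 123456789) -> dict:
    """Build KSampler widget values, rejecting settings outside the recommended ranges."""
    if not STEP_RANGE[0] <= steps <= STEP_RANGE[1]:
        raise ValueError(f"steps={steps} is outside {STEP_RANGE}")
    if not CFG_RANGE[0] <= cfg <= CFG_RANGE[1]:
        raise ValueError(f"cfg={cfg} is outside {CFG_RANGE}")
    # A fixed seed makes runs reproducible; in the UI, set "control after generate" to fixed.
    return {"seed": seed, "steps": steps, "cfg": cfg, "denoise": 1.0, **PRESETS[preset]}


if __name__ == "__main__":
    print(ksampler_inputs("DPM++ 2M Karras", steps=30, cfg=4.0))
```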

Latent Processing

Useful nodes include:

  • Empty Latent Image

  • Latent Upscale

  • Noise injection
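It helps to know how small the latent itself is. SD3‑family models use a VAE with 16 latent channels at 1/8 of the image resolution, so an empty latent for a 1024×1024 image has shape (1, 16, 128, 128). The arithmetic below is a sketch that assumes float32 storage. It shows that the latent takes about 1 MiB, which means VRAM pressure comes mainly from model weights and the VAE decode, not from the latent.

```python
def sd3_latent_shape(width: int, height: int, batch_size: int = 1) -> tuple:
    """Shape of an empty latent for an SD3-family model: 16 channels at 1/8 resolution."""
    if width % 8 or height % 8:
        raise ValueError("width and height should be multiples of 8")
    return (batch_size, 16, height // 8, width // 8)


def latent_bytes(shape: tuple, bytes_per_value: int = 4) -> int:
    """Memory for one latent tensor (float32 by default)."""
    total = bytes_per_value
    for dim in shape:
        total *= dim
    return total


if __name__ == "__main__":
    shape = sd3_latent_shape(1024, 1024)
    print(shape, latent_bytes(shape) / 2**20, "MiB")  # (1, 16, 128, 128) 1.0 MiB
```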

Decoding

The VAE converts latent space into pixel space.

Options:

  • Standard VAE Decode

  • Tiled VAE Decode for high resolutions
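Tiled decoding keeps peak VRAM roughly constant by decoding the image in overlapping pieces and blending the seams. The sketch below only shows how such a tiling can be laid out. It is a generic illustration, not the code behind ComfyUI's tiled VAE decode node, and the 512‑pixel tile and 64‑pixel overlap are assumed example values.

```python
def tile_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets along one axis so overlapping tiles cover [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be smaller than the tile size")
    if tile >= length:
        return [0]  # one tile covers the whole axis
    stride = tile - overlap
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # final tile snaps to the edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 512, overlap: int = 64) -> list[tuple]:
    """(left, top, right, bottom) boxes; each box would be decoded separately, then blended."""
    return [
        (x, y, x + min(tile, width), y + min(tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    boxes = tile_boxes(1536, 1536)
    print(len(boxes), "tiles, first:", boxes[0], "last:", boxes[-1])
```

Smaller tiles lower the peak memory per decode call at the cost of more calls and more seams to blend.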

Output

ComfyUI supports:

  • PNG/JPG via Save Image

  • Animated WEBP for previews

  • MP4 via FFmpeg‑based nodes

Note: It is not necessary to generate video if you do not wish to. Simply remove, or never add, the SVD or AnimateDiff nodes that create it. Generating only still imagery is remarkably simple. An AI companion can help you arrange the nodes properly and find the right ones in the ComfyUI node menu. To open the node menu, right-click on the ComfyUI canvas.

Example Workflows

High‑Quality Single Image

  • T5XXL → CLIP → Conditioning Combine

  • Empty Latent Image → KSampler

  • VAE Decode → Save Image
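The single‑image chain can also be written as a complete graph in ComfyUI's API format and queued from a script. In this sketch each prompt's conditioning goes straight into the KSampler instead of through Conditioning Combine, which is the simplest working arrangement. The checkpoint and encoder filenames are hypothetical, and the sketch assumes the checkpoint file holds the diffusion model and VAE while the text encoders are loaded separately. `queue_prompt` posts to the `/prompt` endpoint of a ComfyUI server on its default local port.

```python
import json
import urllib.request

# Hypothetical filenames: replace with the files in your ComfyUI/models folders.
CHECKPOINT = "sd3.5_medium.safetensors"
ENCODERS = ("clip_l.safetensors", "clip_g.safetensors", "t5xxl_fp8_e4m3fn.safetensors")


def build_txt2img(positive, negative, seed=42, width=1024, height=1024):
    """API-format graph: text encoders -> empty latent -> KSampler -> VAE Decode -> Save Image."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": CHECKPOINT}},
        "2": {"class_type": "TripleCLIPLoader",
              "inputs": dict(zip(("clip_name1", "clip_name2", "clip_name3"), ENCODERS))},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["2", 0]}},
        "4": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["2", 0]}},
        "5": {"class_type": "EmptySD3LatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "6": {"class_type": "KSampler", "inputs": {
            "seed": seed, "steps": 28, "cfg": 4.5, "sampler_name": "dpmpp_2m",
            "scheduler": "karras", "denoise": 1.0,
            "model": ["1", 0], "positive": ["3", 0], "negative": ["4", 0],
            "latent_image": ["5", 0]}},
        # CheckpointLoaderSimple outputs MODEL (0), CLIP (1) and VAE (2).
        "7": {"class_type": "VAEDecode", "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
        "8": {"class_type": "SaveImage", "inputs": {"filename_prefix": "sd35m", "images": ["7", 0]}},
    }


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the graph to a running ComfyUI server's /prompt endpoint and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(f"{server}/prompt", data=body,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    graph = build_txt2img("portrait of a young woman in a sparkly blue dress, blue sparkles", "blurry")
    print(json.dumps(graph, indent=2))
    # With ComfyUI running locally: print(queue_prompt(graph))
```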

Image‑to‑Image Refinement

  • Load Image → VAE Encode

  • Conditioning Combine

  • KSampler (denoise set below 1.0 for img2img)

  • VAE Decode → Save Image
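KSampler has no separate img2img switch. Img2img comes from feeding it the VAE‑encoded input image instead of an empty latent and setting `denoise` below 1.0. The sketch below shows that wiring in API format. The filenames are hypothetical, the image is assumed to be in ComfyUI's input folder, and 0.55 is only an example strength.

```python
import json

CHECKPOINT = "sd3.5_medium.safetensors"  # hypothetical filename
ENCODERS = ("clip_l.safetensors", "clip_g.safetensors", "t5xxl_fp8_e4m3fn.safetensors")  # hypothetical


def build_img2img(image_name, positive, negative, denoise=0.55, seed=42):
    """API-format img2img graph: the encoded input image replaces the empty latent."""
    if not 0.0 < denoise < 1.0:
        raise ValueError("img2img needs 0 < denoise < 1; at 1.0 the input is effectively replaced by noise")
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": CHECKPOINT}},
        "2": {"class_type": "TripleCLIPLoader",
              "inputs": dict(zip(("clip_name1", "clip_name2", "clip_name3"), ENCODERS))},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["2", 0]}},
        "4": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["2", 0]}},
        # LoadImage reads from ComfyUI/input and outputs IMAGE (0) and MASK (1).
        "5": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "6": {"class_type": "VAEEncode", "inputs": {"pixels": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "KSampler", "inputs": {
            "seed": seed, "steps": 28, "cfg": 4.5, "sampler_name": "dpmpp_2m",
            "scheduler": "karras", "denoise": denoise,
            "model": ["1", 0], "positive": ["3", 0], "negative": ["4", 0],
            "latent_image": ["6", 0]}},
        "8": {"class_type": "VAEDecode", "inputs": {"samples": ["7", 0], "vae": ["1", 2]}},
        "9": {"class_type": "SaveImage", "inputs": {"filename_prefix": "sd35m_img2img", "images": ["8", 0]}},
    }


if __name__ == "__main__":
    print(json.dumps(build_img2img("input.png", "same scene, oil painting style", "blurry"), indent=2))
```

Lower denoise values keep more of the original image; higher values give the prompt more room to change it.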

SVD Motion / Short Video

  • SVD_img2vid_Conditioning

  • KSampler

  • VAE Decode

  • Save frames → FFmpeg Frames to Video (MP4)
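If the frames are saved as PNGs, they can also be joined into an MP4 outside ComfyUI with the ffmpeg command-line tool, as an alternative to an FFmpeg‑based node. This sketch assumes ffmpeg with libx264 is on your PATH and that the frames use Save Image's default naming (`prefix_00001_.png`, `prefix_00002_.png`, ...). Use a fresh prefix, or adjust `start`, because the counter continues from earlier runs that used the same prefix.

```python
import shlex
import subprocess


def frames_to_mp4_cmd(prefix="ComfyUI", fps=8, start=1, output="output.mp4"):
    """Build an ffmpeg command that joins numbered PNG frames into an H.264 MP4."""
    return [
        "ffmpeg", "-y",                  # overwrite the output file without asking
        "-framerate", str(fps),          # frame rate of the input image sequence
        "-start_number", str(start),     # number of the first frame in the sequence
        "-i", f"{prefix}_%05d_.png",     # matches e.g. ComfyUI_00001_.png
        "-c:v", "libx264",               # widely supported H.264 encoder
        "-pix_fmt", "yuv420p",           # pixel format most players expect
        output,
    ]


def frames_to_mp4(**kwargs):
    """Run the command; the frames must be in the current working directory."""
    subprocess.run(frames_to_mp4_cmd(**kwargs), check=True)


if __name__ == "__main__":
    print(shlex.join(frames_to_mp4_cmd(prefix="svd", fps=6)))
```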

Image: Portrait of a beautiful young woman in a sparkly blue dress, surrounded by blue sparkles.

Optimizing for a 12 GB GPU

To avoid VRAM issues:

  • Use FP8 text encoders

  • Keep batch size at 1 (I have run it with a batch size of 3, but 1 is the safest setting)

  • Use tiled VAE decode for large resolutions

  • Avoid unnecessary mid‑graph upscalers

  • Use VRAM‑efficient samplers

  • Offload CLIP to CPU if needed

These optimizations allow SD3.5 Medium to run smoothly on a 12 GB card.
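A rough weight‑memory estimate shows why FP8 text encoders and offloading matter on 12 GB: memory is parameters times bytes per parameter. The parameter counts below are approximations (about 2.5B for the SD3.5 Medium diffusion model, about 4.7B for the T5XXL encoder, and smaller CLIP models) and should be checked against the model cards. The estimate ignores activations, the VAE and driver overhead. Because the text encoders are only needed to encode the prompt, ComfyUI's memory management can move them off the GPU before sampling.

```python
# Approximate parameter counts in billions (assumptions; check the model cards).
PARAMS_B = {"mmdit_sd35_medium": 2.5, "t5xxl_encoder": 4.7, "clip_g": 0.7, "clip_l": 0.12}
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weights_gb(precision: dict) -> float:
    """Weight memory in GB (1e9 bytes): billions of params x bytes per param."""
    return round(sum(PARAMS_B[name] * BYTES_PER_PARAM[precision[name]] for name in PARAMS_B), 2)


ALL_FP16 = {name: "fp16" for name in PARAMS_B}
T5_FP8 = {**ALL_FP16, "t5xxl_encoder": "fp8"}

if __name__ == "__main__":
    print("everything FP16:", weights_gb(ALL_FP16), "GB")   # 16.04 GB
    print("T5XXL in FP8:   ", weights_gb(T5_FP8), "GB")     # 11.34 GB
    print("diffusion model alone (FP16):", PARAMS_B["mmdit_sd35_medium"] * 2, "GB")  # 5.0 GB
```

Under these assumptions, loading everything at once in FP16 would not fit in 12 GB, while switching T5XXL to FP8 and keeping only the diffusion model resident during sampling leaves comfortable headroom.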

Troubleshooting Common Issues

Models Not Appearing

  • Wrong folder

  • Wrong file extension

  • Unsupported characters in filename

  • Placed in the wrong model category
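The first three causes can be checked with a small script that walks the ComfyUI `models` folder. This is a sketch under stated assumptions. The extension set is a subset of what ComfyUI accepts, the folder list covers only the model types this workflow uses, and the filename rule (letters, digits, dots, dashes and underscores only) is a deliberately conservative hypothetical check, not ComfyUI's own validation. A model placed in the wrong category (for example, a VAE in `checkpoints`) still has to be spotted by eye.

```python
import re
from pathlib import Path

# Subset of model-file extensions ComfyUI accepts (assumption; verify for your version).
MODEL_EXTENSIONS = {".safetensors", ".ckpt", ".pt", ".pth", ".bin", ".sft"}
# Folders under ComfyUI/models used by this workflow (clip and text_encoders both hold encoders).
EXPECTED_FOLDERS = {"checkpoints", "clip", "text_encoders", "vae"}
# Hypothetical, conservative filename rule.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def audit_models(models_dir) -> list[str]:
    """Return human-readable problems found under the given models directory."""
    problems = []
    root = Path(models_dir)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if rel.parts[0] not in EXPECTED_FOLDERS:
            problems.append(f"{rel}: outside the folders this workflow uses")
        if path.suffix.lower() not in MODEL_EXTENSIONS:
            problems.append(f"{rel}: unexpected extension {path.suffix or '(none)'}")
        if not SAFE_NAME.match(path.name):
            problems.append(f"{rel}: filename has spaces or unusual characters")
    return problems


if __name__ == "__main__":
    for problem in audit_models("ComfyUI/models"):
        print(problem)
```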

WEBP Instead of MP4

  • SaveAnimatedWEBP is not a video encoder

  • Use FFmpeg‑based nodes for MP4 output

VRAM Crashes

  • Resolution too high

  • VAE decode not tiled

  • Sampler using too many steps

Workflow JSON Errors

  • Missing models

  • Outdated custom nodes

  • Incorrect node IDs
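For a workflow exported in ComfyUI's API format (node IDs mapped to `class_type` and `inputs`), the last two causes can be caught before queuing. The sketch below flags links that point at node IDs that do not exist and, if you pass a set of installed node types (for example, the keys of the JSON returned by a running server's `/object_info` endpoint), node types your install does not know. It does not handle the regular UI‑format workflow file, which uses a different layout.

```python
from typing import Optional


def check_workflow(workflow: dict, known_types: Optional[set] = None) -> list[str]:
    """Return problems found in an API-format workflow dict."""
    errors = []
    for node_id, node in workflow.items():
        class_type = node.get("class_type")
        if class_type is None:
            errors.append(f"node {node_id}: missing class_type")
        elif known_types is not None and class_type not in known_types:
            errors.append(f"node {node_id}: unknown node type {class_type!r} "
                          "(missing or outdated custom node?)")
        for name, value in node.get("inputs", {}).items():
            # In API format, a link is [source_node_id, output_index].
            if isinstance(value, list) and len(value) == 2 and isinstance(value[1], int):
                if str(value[0]) not in workflow:
                    errors.append(f"node {node_id}: input {name!r} points at missing node {value[0]}")
    return errors


if __name__ == "__main__":
    broken = {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "model.safetensors"}},
        "2": {"class_type": "VAEDecode", "inputs": {"samples": ["9", 0], "vae": ["1", 2]}},
    }
    for error in check_workflow(broken):
        print(error)
```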

Conclusion

ComfyUI represents a shift toward transparent, modular, and reproducible AI workflows. It is designed for people who like to engineer their own workflows, or who prefer well‑built workflows that other creators and developers keep up to date. With Stable Diffusion 3.5 Medium and a 12 GB RTX 4070 Ti, creators can build pipelines that are fast, flexible, and capable of producing professional‑grade results. For anyone who enjoys learning, experimenting, and refining their craft, ComfyUI is more than a tool; it's a creative environment built for growth.

Article written in cooperation with Microsoft Copilot, which designed the article structure. Images prompt‑engineered and generated with SD 3.5 Medium by Michael Harleman using a custom workflow from Stable-Art-Diffusion (workflow unmodified).
