ComfyUI + SD 3.5 Medium: A Practical Guide - Michael Harleman

Introduction

AI image generation has shifted from simple prompt boxes to fully customizable pipelines. ComfyUI sits at the center of this evolution, offering a node‑based environment where every part of the diffusion process is visible, editable, and reusable. When paired with Stable Diffusion 3.5 Medium and a 12 GB RTX 4070 Ti, it becomes a powerful platform for creators who enjoy learning, iterating, and refining their workflows.

Why ComfyUI Fits a Work‑In‑Progress Approach

ComfyUI is designed for creators who want to understand how their images are made. Instead of hiding the diffusion pipeline, it exposes each stage as a node you can rewire at any time.

Key Advantages

Full transparency into the generation process
Modular, swappable components
Reproducible workflows saved as graphs
Clear debugging when something breaks
Extensibility through custom nodes and community tools

This makes ComfyUI ideal for anyone who prefers hands‑on control over their creative pipeline.

Understanding Stable Diffusion 3.5 Medium

Stable Diffusion 3.5 Medium strikes a balance between capability and efficiency. It offers:

Stronger text understanding than SDXL
More consistent faces and anatomy
Faster sampling than SD3.5 Large
Lower VRAM requirements than SD3.5 Large and Large Turbo
A multi‑encoder text stack (CLIP‑L, CLIP‑G, and T5XXL)

This architecture allows SD3.5 Medium to interpret prompts with nuance while maintaining high visual fidelity.

What a 12 GB RTX 4070 Ti Can Handle

A 12 GB GPU is more capable than many expect. With SD3.5 Medium, you can reliably generate:

Image Generation

1024×1024 at batch size 1
1536×1536 using tiled VAE decode
FP8 text encoders for reduced VRAM load
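
To put those resolutions in perspective, here is a minimal sketch of the latent sizes involved. It assumes the SD3 family's 16-channel latent space and 8x VAE downscale; the function names are made up for illustration and are not part of ComfyUI.

```python
def sd3_latent_shape(width, height, batch_size=1, channels=16, downscale=8):
    """Return the (batch, channels, height, width) latent shape for a pixel size."""
    if width % downscale or height % downscale:
        raise ValueError("width and height should be multiples of the downscale factor")
    return (batch_size, channels, height // downscale, width // downscale)


def tensor_megabytes(shape, bytes_per_value=2):
    """Size of a tensor in MiB, assuming 2 bytes per value (FP16) by default."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value / (1024 ** 2)


for side in (1024, 1536):
    shape = sd3_latent_shape(side, side)
    print(side, shape, tensor_megabytes(shape), "MiB")
```

The latent itself is tiny. On a 12 GB card the pressure comes from the model weights, the text encoders, and the VAE decode step, which is why the other two items in the list matter.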

Motion and Video

SVD motion workflows (Stable Video Diffusion is a separate model that you load for the video graph)
AnimateDiff sequences
MP4 output using FFmpeg‑based nodes

Img2Img and Refinement

Style transfer
Prompt‑guided edits
High‑quality detail enhancement

With proper workflow design, the 4070 Ti becomes a dependable engine for both still images and short video sequences. Published specifications may suggest that the model cannot run effectively on only 12 GB of VRAM, or that it will run unbearably slowly if it does; that isn't necessarily true. With the right supporting hardware, it can generate images in batches quite rapidly, and the results can be beautiful.

A woman in a glowing red dress standing in a mystical forest, surrounded by radiant red light and floating magical particles.

Core Components of an SD3.5 Medium Workflow

A typical SD3.5 Medium workflow in ComfyUI includes several key stages.

Text Encoding

SD3.5 ships with three text encoders:

T5XXL FP8 for semantic meaning
CLIP‑L for style and visual grounding
CLIP‑G, a larger CLIP model loaded alongside CLIP‑L

These are combined before sampling to create a unified conditioning signal.
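
The reason FP8 matters for the T5XXL encoder comes down to simple arithmetic on weight storage. The sketch below is a rough estimate only: the parameter count is an approximate figure assumed here, and real files carry some extra overhead.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "fp8": 1.0}

# Assumption: the T5XXL encoder has roughly 4.7 billion parameters.
T5XXL_ENCODER_PARAMS = 4.7e9


def weights_gib(param_count, dtype):
    """Approximate weight storage in GiB for a given parameter count and precision."""
    return param_count * BYTES_PER_PARAM[dtype] / (1024 ** 3)


for dtype in ("fp16", "fp8"):
    print(dtype, round(weights_gib(T5XXL_ENCODER_PARAMS, dtype), 1), "GiB")
```

Halving the bytes per weight halves the footprint, which is the difference between the encoder crowding a 12 GB card and leaving room for the diffusion model.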

Conditioning

Conditioning nodes define the intent of the image:

Positive and negative prompts
Optional adapters (depth, pose, style)
SVD_img2vid_Conditioning for motion workflows

Sampling

The KSampler is the heart of the workflow.

Recommended settings:

Sampler: DPM++ 2M Karras or Euler a
Steps: 20–35
CFG: 3–6
Seed: locked for reproducibility
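
As a sketch, the settings above can be written as a KSampler entry in ComfyUI's API-format workflow JSON, shown here as a Python dictionary. The helper name and the default values are illustrative choices, and mapping "DPM++ 2M Karras" to sampler_name "dpmpp_2m" with scheduler "karras" is my reading of how ComfyUI splits that name into two fields.

```python
def ksampler_node(model, positive, negative, latent, seed=42, steps=28, cfg=4.5,
                  sampler_name="dpmpp_2m", scheduler="karras", denoise=1.0):
    """Build an API-format KSampler node, checked against the ranges suggested above.

    Each link argument is a [source_node_id, output_index] pair.
    """
    if not 20 <= steps <= 35:
        raise ValueError("steps outside the suggested 20-35 range")
    if not 3 <= cfg <= 6:
        raise ValueError("cfg outside the suggested 3-6 range")
    return {
        "class_type": "KSampler",
        "inputs": {
            "seed": seed,  # fixed seed so the same graph reproduces the same image
            "steps": steps,
            "cfg": cfg,
            "sampler_name": sampler_name,
            "scheduler": scheduler,
            "denoise": denoise,
            "model": model,
            "positive": positive,
            "negative": negative,
            "latent_image": latent,
        },
    }


node = ksampler_node(["1", 0], ["2", 0], ["3", 0], ["4", 0])
print(node["inputs"]["sampler_name"], node["inputs"]["steps"])
```

The range checks are only a guard for this guide's recommendations; ComfyUI itself accepts values outside them.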

Latent Processing

Useful nodes include:

Empty Latent Image
Latent Upscale
Noise injection

Decoding

The VAE converts latent space into pixel space.

Options:

Standard VAE Decode
Tiled VAE Decode for high resolutions
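
The idea behind tiled decoding is to split the work into overlapping tiles so that only one tile's worth of activations sits in VRAM at a time. The sketch below shows only the tile layout arithmetic; it is not ComfyUI's implementation, and the tile size and overlap are example values.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of overlapping tiles that cover [0, length)."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_grid(width, height, tile=512, overlap=64):
    """Top-left (x, y) corner of every tile for an image of the given size."""
    return [(x, y)
            for y in tile_starts(height, tile, overlap)
            for x in tile_starts(width, tile, overlap)]


grid = tile_grid(1536, 1536)
print(len(grid), "tiles, first row:", grid[:4])
```

Peak memory then scales with the tile size rather than the full image, at the cost of more decode passes and some blending in the overlap regions.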

Output

ComfyUI supports:

PNG/JPG via Save Image
Animated WEBP for previews
MP4 via FFmpeg‑based nodes

Note: You do not have to generate video. Simply remove, or leave out, the SVD or AnimateDiff nodes that create it. Generating still images alone is much simpler. An AI companion can help you arrange the nodes properly and find the right ones in the ComfyUI node menu. To open the node menu, right-click on the ComfyUI canvas.

Example Workflows

High‑Quality Single Image

T5XXL → CLIP → Conditioning Combine
Empty Latent Image → KSampler
VAE Decode → Save Image
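
For readers who want to drive this workflow from a script, here is a minimal sketch of the same chain in ComfyUI's API format, plus a helper that posts it to a running server. The checkpoint filename is hypothetical, the sketch assumes a checkpoint that bundles its text encoders, and it feeds the prompts straight into the sampler instead of going through Conditioning Combine.

```python
import json
import urllib.request


def build_txt2img_graph(positive, negative, seed=42, width=1024, height=1024):
    """API-format graph; links are [source_node_id, output_index] pairs."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              # Hypothetical filename; use whatever sits in your checkpoints folder.
              "inputs": {"ckpt_name": "sd3.5_medium.safetensors"}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptySD3LatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"seed": seed, "steps": 28, "cfg": 4.5,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras",
                         "denoise": 1.0, "model": ["1", 0],
                         "positive": ["2", 0], "negative": ["3", 0],
                         "latent_image": ["4", 0]}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "sd35_medium"}},
    }


def queue_prompt(graph, server="http://127.0.0.1:8188"):
    """POST the graph to a running ComfyUI server (address is an assumption)."""
    data = json.dumps({"prompt": graph}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=data,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


graph = build_txt2img_graph("a misty forest at dawn", "blurry, low quality")
print(json.dumps(graph["5"], indent=2))
```

Calling queue_prompt(graph) only works while ComfyUI is running locally; the graph itself can be built and inspected without a server.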

Image‑to‑Image Refinement

Load Image → VAE Encode
Conditioning Combine
KSampler (denoise set below 1.0 for img2img)
VAE Decode → Save Image

SVD Motion / Short Video

SVD_img2vid_Conditioning
KSampler
VAE Decode
Save Frames → FFmpeg Frames to Video

Portrait AI image of a beautiful young woman surrounded by blue sparkles wearing a sparkly blue dress.

Optimizing for a 12 GB GPU

To avoid VRAM issues:

Use FP8 text encoders
Keep batch size at 1 as a safe default, although I have run it with a batch size of 3
Use tiled VAE decode for large resolutions
Avoid unnecessary mid‑graph upscalers
Use VRAM‑efficient samplers
Offload CLIP to CPU if needed

These optimizations allow SD3.5 Medium to run smoothly on a 12 GB card.

Troubleshooting Common Issues

Models Not Appearing

Wrong folder
Wrong file extension
Unsupported characters in filename
Placed in the wrong model category
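
The checks above are easy to automate. The helper below is a hypothetical sketch, not a ComfyUI tool; the extension list reflects formats commonly seen for model files and is an assumption, as are the example paths.

```python
from pathlib import PurePosixPath

# Assumption: extensions commonly used for model weights.
KNOWN_EXTENSIONS = {".safetensors", ".ckpt", ".pt", ".pth", ".bin"}


def model_file_problems(path, expected_folder):
    """Return a list of reasons a model file might not show up in a loader node."""
    p = PurePosixPath(path)
    problems = []
    if p.suffix.lower() not in KNOWN_EXTENSIONS:
        problems.append(f"unexpected file extension '{p.suffix}'")
    if not p.name.isascii():
        problems.append("non-ASCII characters in filename")
    if p.parent.name != expected_folder:
        problems.append(f"file is in '{p.parent.name}', expected '{expected_folder}'")
    return problems


print(model_file_problems("models/checkpoints/sd3.5_medium.safetensors", "checkpoints"))
print(model_file_problems("models/vae/sd3.5_medium.safetensor", "checkpoints"))
```

An empty list means none of these simple checks failed; it does not prove the file itself is valid.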

WEBP Instead of MP4

SaveAnimatedWEBP is not a video encoder
Use FFmpeg‑based nodes for MP4 output
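
If you already have numbered frames on disk, you can also encode them yourself. This is a minimal sketch that assumes ffmpeg is installed and on your PATH; the frame pattern, output name, and frame rate are placeholders.

```python
import shutil
import subprocess


def ffmpeg_frames_to_mp4_cmd(frame_pattern, output_path, fps=8):
    """Build an ffmpeg command that encodes numbered frames to H.264 MP4."""
    return [
        "ffmpeg", "-y",              # overwrite the output file if it exists
        "-framerate", str(fps),      # input frame rate
        "-i", frame_pattern,         # e.g. frame_%05d.png
        "-c:v", "libx264",           # H.264 encoder
        "-pix_fmt", "yuv420p",       # widely compatible pixel format
        output_path,
    ]


def encode_frames(frame_pattern, output_path, fps=8):
    """Run the encode, failing early if ffmpeg is not available."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(ffmpeg_frames_to_mp4_cmd(frame_pattern, output_path, fps), check=True)


print(" ".join(ffmpeg_frames_to_mp4_cmd("frame_%05d.png", "clip.mp4")))
```

Note that yuv420p needs even frame dimensions, which the resolutions used in this guide already satisfy.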

VRAM Crashes

Resolution too high
VAE decode not tiled
Batch size too large (step count mainly affects generation time, not peak VRAM)

Workflow JSON Errors

Missing models
Outdated custom nodes
Incorrect node IDs
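
Broken node references are straightforward to detect in an API-format workflow export, where every link is a [node_id, output_index] pair. The function below is a hypothetical sketch for that format only; it does not handle the layout-style JSON that the editor saves by default.

```python
def find_broken_links(graph):
    """Return (node_id, input_name, missing_id) for links to nodes that do not exist."""
    broken = []
    for node_id, node in graph.items():
        for input_name, value in node.get("inputs", {}).items():
            is_link = (isinstance(value, list) and len(value) == 2
                       and isinstance(value[0], str) and isinstance(value[1], int))
            if is_link and value[0] not in graph:
                broken.append((node_id, input_name, value[0]))
    return broken


example = {
    "1": {"class_type": "VAEDecode",
          "inputs": {"samples": ["9", 0], "vae": ["2", 2]}},
    "2": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "example.safetensors"}},  # hypothetical filename
}
print(find_broken_links(example))
```

Here node "1" points at a node "9" that is not in the graph, so that link is reported while the valid link to node "2" is not.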

Conclusion

ComfyUI represents a shift toward transparent, modular, and reproducible AI workflows. It is designed for people who like to engineer their own workflows, or to use accurate, well-maintained workflows from other creators and developers. With Stable Diffusion 3.5 Medium and a 12 GB RTX 4070 Ti, creators can build pipelines that are fast, flexible, and capable of producing professional‑grade results. For anyone who enjoys learning, experimenting, and refining their craft, ComfyUI is more than a tool; it is a creative environment built for growth.

Article written in cooperation with Microsoft Copilot, which designed the article structure. Images generated with SD 3.5 Medium by Michael Harleman using a custom workflow from Stable-Art-Diffusion. Workflow unmodified.
