Introduction
AI image generation has shifted from simple prompt boxes to fully customizable pipelines. ComfyUI sits at the center of this evolution, offering a node‑based environment where every part of the diffusion process is visible, editable, and reusable. When paired with Stable Diffusion 3.5 Medium and a 12 GB RTX 4070 Ti, it becomes a powerful platform for creators who enjoy learning, iterating, and refining their workflows.
Why ComfyUI Fits a Work‑In‑Progress Approach
ComfyUI is designed for creators who want to understand how their images are made. Instead of hiding the diffusion pipeline, it exposes each stage as a node you can rewire at any time.
Key Advantages
Full transparency into the generation process
Modular, swappable components
Reproducible workflows saved as graphs
Clear debugging when something breaks
Extensibility through custom nodes and community tools
This makes ComfyUI ideal for anyone who prefers hands‑on control over their creative pipeline.
Understanding Stable Diffusion 3.5 Medium
Stable Diffusion 3.5 Medium strikes a balance between capability and efficiency. It offers:
Stronger text understanding than SDXL
More consistent faces and anatomy
Faster sampling than SD3.5 Large
Lower VRAM requirements than SD3.5 Large Turbo
A multi‑encoder text stack (CLIP‑L, CLIP‑G, and T5XXL)
This architecture allows SD3.5 Medium to interpret prompts with nuance while maintaining high visual fidelity.
What a 12 GB RTX 4070 Ti Can Handle
A 12 GB GPU is more capable than many expect. With SD3.5 Medium, you can reliably generate:
Image Generation
1024×1024 at batch size 1
1536×1536 using tiled VAE decode
FP8 text encoders for reduced VRAM load
Motion and Video
SVD motion workflows (Stable Video Diffusion, loaded as its own checkpoint)
AnimateDiff sequences (with a compatible base model)
MP4 output using FFmpeg‑based nodes
Img2Img and Refinement
Style transfer
Prompt‑guided edits
High‑quality detail enhancement
With proper workflow design, the 4070 Ti becomes a dependable engine for both still images and short video sequences. Some technical specifications suggest that the model cannot run effectively in only 12 GB of VRAM, or that it will be extremely slow if it runs at all; in my experience, that isn't necessarily true. With the right supporting hardware, it can generate batches of images quite quickly, and the results are consistently attractive.
Core Components of an SD3.5 Medium Workflow
A typical SD3.5 Medium workflow in ComfyUI includes several key stages.
Text Encoding
SD3.5 conditions on three text encoders:
T5XXL (FP8 in this setup) for semantic meaning
CLIP‑L and CLIP‑G for style and visual grounding
Their outputs are combined before sampling to create a unified conditioning signal.
Conditioning
Conditioning nodes define the intent of the image:
Positive and negative prompts
Optional adapters (depth, pose, style)
SVD_img2vid_Conditioning for motion workflows
Sampling
The KSampler is the heart of the workflow.
Recommended settings:
Sampler: DPM++ 2M with the Karras scheduler, or Euler a (dpmpp_2m with karras, or euler_ancestral, in ComfyUI's dropdowns)
Steps: 20–35
CFG: 3–6
Seed: locked for reproducibility
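As a rough sketch, the settings above can be written down as the widget values of a KSampler node in an API‑format workflow. The field names (seed, steps, cfg, sampler_name, scheduler, denoise) follow the built‑in node; the helper make_ksampler_inputs, the default values, and the seed are hypothetical and only for illustration.

```python
def make_ksampler_inputs(seed, steps=28, cfg=4.5,
                         sampler_name="dpmpp_2m", scheduler="karras",
                         denoise=1.0):
    """Return KSampler widget values, checked against the ranges recommended above.

    Hypothetical helper: it only builds a dict, it does not talk to ComfyUI.
    """
    if not 20 <= steps <= 35:
        raise ValueError("steps outside the recommended 20-35 range")
    if not 3 <= cfg <= 6:
        raise ValueError("cfg outside the recommended 3-6 range")
    return {
        "seed": seed,              # fixed seed -> repeatable result
        "steps": steps,
        "cfg": cfg,
        "sampler_name": sampler_name,
        "scheduler": scheduler,
        "denoise": denoise,        # 1.0 means start from pure noise (txt2img)
    }


settings = make_ksampler_inputs(seed=123456789)
print(settings)
```

Keeping the seed in the same dict as the other settings makes it easy to save the exact combination that produced an image you like.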
Latent Processing
Useful nodes include:
Empty Latent Image
Latent Upscale
Noise injection
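To give a feel for what these latent nodes operate on, here is a small sketch of the latent tensor shape, assuming the 16‑channel, 8x‑downscaled latent used by SD3‑family VAEs. The function names are hypothetical, and the byte count covers only the latent itself, not model weights or decode activations.

```python
def sd3_latent_shape(width, height, batch_size=1, channels=16, downscale=8):
    """Shape (batch, channels, height, width) of the latent for a given pixel size."""
    if width % downscale or height % downscale:
        raise ValueError("width and height must be multiples of the downscale factor")
    return (batch_size, channels, height // downscale, width // downscale)


def latent_bytes(shape, bytes_per_value=2):
    """Memory for one latent tensor; 2 bytes per value corresponds to FP16."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value


shape = sd3_latent_shape(1024, 1024)
print(shape, latent_bytes(shape), "bytes")
```

The latent is small compared with the image it decodes to, which is why upscaling in latent space is cheap relative to working on pixels.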
Decoding
The VAE converts latent space into pixel space.
Options:
Standard VAE Decode
Tiled VAE Decode for high resolutions
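The idea behind tiled decoding can be sketched as follows: the latent is split into overlapping tiles so that only one tile has to be decoded at a time. This is an illustration of the general technique, not ComfyUI's actual implementation; the tile size, overlap, and function names are assumptions.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of overlapping tiles that cover [0, length) along one axis."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def tile_grid(width, height, tile=64, overlap=8):
    """Top-left (x, y) corners of every tile, in latent units."""
    return [(x, y)
            for y in tile_starts(height, tile, overlap)
            for x in tile_starts(width, tile, overlap)]


# A 1536x1536 image corresponds to a 192x192 latent at 8x downscale.
grid = tile_grid(192, 192)
print(len(grid), "tiles")
```

The overlap gives neighbouring tiles shared pixels to blend across, which is what hides the seams in the final image.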
Output
ComfyUI supports:
PNG/JPG via Save Image
Animated WEBP for previews
MP4 via FFmpeg‑based nodes
Example Workflows
High‑Quality Single Image
T5XXL → CLIP → Conditioning Combine
Empty Latent Image → KSampler
VAE Decode → Save Image
Image‑to‑Image Refinement
Load Image → VAE Encode
Conditioning Combine
KSampler (img2img mode)
VAE Decode → Save Image
SVD Motion / Short Video
SVD_img2vid_Conditioning
KSampler
VAE Decode
Save Frames → FFmpeg Frames to Video
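If you prefer to encode saved frames outside ComfyUI, the last step can be approximated with a plain FFmpeg call. The sketch below only builds the argument list; the frame pattern, output name, and frame rate are hypothetical, and it assumes an ffmpeg binary with libx264 is installed.

```python
def ffmpeg_frames_to_mp4_args(frame_pattern, output_path, fps=8):
    """Build an ffmpeg command that turns numbered frames into an H.264 MP4."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-framerate", str(fps),  # input frame rate
        "-i", frame_pattern,     # e.g. frames/frame_%05d.png
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",   # widely compatible pixel format
        output_path,
    ]


args = ffmpeg_frames_to_mp4_args("frames/frame_%05d.png", "clip.mp4", fps=8)
print(" ".join(args))
```

To actually run it, pass the list to subprocess.run(args, check=True). Note that yuv420p with libx264 expects even frame dimensions.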
Optimizing for a 12 GB GPU
To avoid VRAM issues:
Use FP8 text encoders
Keep batch size at 1 as the safe default (I have run it with a batch size of 3)
Use tiled VAE decode for large resolutions
Avoid unnecessary mid‑graph upscalers
Use VRAM‑efficient samplers
Offload CLIP to CPU if needed
These optimizations allow SD3.5 Medium to run smoothly on a 12 GB card.
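A back‑of‑the‑envelope calculation shows why the FP8 text encoder matters so much. The sketch below estimates weight memory only, from parameter count and bytes per parameter; the T5XXL encoder figure of about 4.7 billion parameters is an approximate assumption, and activations and other overhead are ignored.

```python
GIB = 1024 ** 3


def weight_gib(param_count, bytes_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return param_count * bytes_per_param / GIB


T5XXL_PARAMS = 4.7e9  # assumed, approximate size of the T5XXL text encoder

fp16 = weight_gib(T5XXL_PARAMS, 2)  # FP16: 2 bytes per parameter
fp8 = weight_gib(T5XXL_PARAMS, 1)   # FP8: 1 byte per parameter
print(f"T5XXL FP16: {fp16:.1f} GiB, FP8: {fp8:.1f} GiB")
```

With these assumed figures, the FP16 encoder alone would take roughly 8.8 GiB of a 12 GB card, while the FP8 version needs about half that.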
Troubleshooting Common Issues
Models Not Appearing
Wrong folder
Wrong file extension
Unsupported characters in filename
Placed in the wrong model category
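Two of these causes, the file extension and unusual characters in the name, can be checked with a few lines of Python. This is a minimal sketch: the extension list is an assumption covering common model formats rather than ComfyUI's exact list, and filename_problems is a hypothetical helper.

```python
import os

# Assumed set of common model file extensions; adjust to your setup.
KNOWN_EXTENSIONS = {".safetensors", ".ckpt", ".pt", ".pth", ".bin"}


def filename_problems(filename):
    """Return a list of human-readable problems found in a model filename."""
    problems = []
    _, ext = os.path.splitext(filename)
    if ext.lower() not in KNOWN_EXTENSIONS:
        problems.append(f"unexpected extension: {ext or '(none)'}")
    if not filename.isascii():
        problems.append("contains non-ASCII characters")
    if filename != filename.strip():
        problems.append("leading or trailing whitespace")
    return problems


for name in ["sd3.5_medium.safetensors", "sd3.5_medium.safetensors.txt"]:
    print(name, "->", filename_problems(name) or "looks fine")
```

A file that passes these checks can still be invisible if it sits in the wrong model folder, so check the folder as well.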
WEBP Instead of MP4
SaveAnimatedWEBP is not a video encoder
Use FFmpeg‑based nodes for MP4 output
VRAM Crashes
Resolution too high
VAE decode not tiled
Batch size too large (step count affects run time, not peak VRAM)
Workflow JSON Errors
Missing models
Outdated custom nodes
Incorrect node IDs
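Incorrect node IDs are easy to detect in an API‑format workflow, where each input is either a plain value or a [node_id, output_index] link. The sketch below assumes that format; find_dangling_links and the tiny example graph are hypothetical and only illustrate the check.

```python
def find_dangling_links(workflow):
    """Return (node_id, input_name, missing_id) for links to nodes that don't exist."""
    problems = []
    for node_id, node in workflow.items():
        for input_name, value in node.get("inputs", {}).items():
            # A link is a two-item list: [source node id (string), output index].
            is_link = (isinstance(value, list) and len(value) == 2
                       and isinstance(value[0], str))
            if is_link and value[0] not in workflow:
                problems.append((node_id, input_name, value[0]))
    return problems


# Hypothetical fragment: the VAEDecode node points at a node "9" that is missing.
example = {
    "3": {"class_type": "KSampler", "inputs": {"steps": 28, "cfg": 4.5}},
    "8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["9", 2]}},
}
print(find_dangling_links(example))
```

You could load a saved workflow with json.load and pass the resulting dict to the function before queuing it.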
Conclusion
ComfyUI represents a shift toward transparent, modular, and reproducible AI workflows. It is designed for people who like to engineer their own workflows, or to use accurate, well‑maintained workflows from other creators and developers. With Stable Diffusion 3.5 Medium and a 12 GB RTX 4070 Ti, creators can build pipelines that are fast, flexible, and capable of producing professional‑grade results. For anyone who enjoys learning, experimenting, and refining their craft, ComfyUI is more than a tool: it’s a creative environment built for growth.
Article written in cooperation with Microsoft Copilot, which designed the article structure. Prompt images engineered with SD 3.5 Medium by Michael Harleman using a custom workflow from Stable-Art-Diffusion. Workflow unmodified.




