How to Run Stable Diffusion on a Cloud GPU (Step-by-Step)

Stable Diffusion has become one of the most popular AI image generation tools in the world. From photorealistic portraits to fantasy landscapes, anime characters to architectural concepts, Stable Diffusion can create stunning images from text prompts in seconds — if you have the right hardware.

The problem? Running Stable Diffusion locally requires a decent NVIDIA GPU with at least 8 GB of VRAM, and for the best experience with SDXL or Stable Diffusion 3, you'll want 12–24 GB. Not everyone has that kind of hardware sitting on their desk. Even if you do, generation can be slow on older cards, and training custom models (LoRAs, Dreambooth) is even more demanding.

The solution is simple: rent a cloud GPU. In this step-by-step tutorial, you'll learn how to rent a GPU on Clore.ai, deploy a Stable Diffusion interface (ComfyUI or Automatic1111), and start generating images — all in under 15 minutes. No local GPU required.

What You'll Need

Before we start, here's what you'll need:

  • A Clore.ai account — sign up here (free, takes 30 seconds)
  • A small balance — $1–$5 is enough for several hours of image generation
  • A web browser — Chrome, Firefox, or any modern browser
  • Basic familiarity with web interfaces — no coding required for basic image generation

That's it. No local GPU, no Python installation, no driver headaches.

Step 1: Choose the Right GPU for Stable Diffusion

Not all GPUs are created equal for image generation. Here's what we recommend:

Best Budget Option: RTX 3090 ($0.06–$0.12/hr on Clore.ai)

The RTX 3090's 24 GB of VRAM handles everything Stable Diffusion can throw at it. SDXL, SD 3.0, large batch generation, and even LoRA training all fit comfortably. At Clore.ai's spot prices, you can generate images for pennies per hour.

Best Performance Option: RTX 4090 ($0.10–$0.25/hr on Clore.ai)

The RTX 4090 generates images roughly 1.5–2x faster than the 3090. If you're iterating quickly on prompts or generating large batches, the speed difference is worth the modest price increase.

Best for SD 3.0 and Flux: RTX 5090 ($0.30–$0.50/hr on Clore.ai)

The latest models like Stable Diffusion 3 and Flux benefit from the RTX 5090's 32 GB of GDDR7 VRAM and 5th-gen Tensor Cores. If you're working with the newest, most demanding checkpoints, the 5090 gives you headroom, and Clore.ai has strong availability for this card.

GPUs to Avoid for Stable Diffusion

  • RTX 3060 (12 GB): Works for basic SD 1.5 but struggles with SDXL and larger models
  • RTX 3070 (8 GB): Only 8 GB VRAM; you'll hit out-of-memory errors on most modern models
  • Any GPU with <8 GB VRAM: Not viable for Stable Diffusion

For a complete GPU pricing breakdown, check our Top 10 Cheapest GPUs for AI Training in 2025.

Step 2: Rent a GPU on Clore.ai

2.1: Log In and Add Funds

Log into your Clore.ai account. If you haven't added funds yet, deposit using:

  • $CLORE tokens (ERC-20) — the platform's native token
  • Bitcoin (BTC)
  • Credit/debit card

For a casual Stable Diffusion session, $2–$5 will last you several hours even on an RTX 4090.

2.2: Browse the Marketplace

Navigate to the marketplace and filter for your preferred GPU:

  1. Click on the GPU filter and select "RTX 4090" (or your chosen card)
  2. Sort by price (low to high) to find the cheapest available option
  3. Look for servers with good host ratings (4+ stars) and reasonable storage (at least 50 GB for model files)

2.3: Select a Server

Click on a server listing to see full details. Pay attention to:

  • Internet speed: Faster connections mean quicker model downloads (look for 500+ Mbps)
  • Storage: Stable Diffusion models can be 2–7 GB each. If you plan to use multiple models, ensure at least 50–100 GB of free space
  • RAM: 32 GB or more is recommended for smooth operation
  • Location: Choose a server geographically close to you for lower latency on the web interface
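
The checklist above can be expressed as a simple filter. The listing fields and sample values here are hypothetical; Clore.ai's actual listing data may use different names and units:

```python
# Minimal sketch: filter marketplace listings against the checklist above.
# Field names and sample data are hypothetical, not Clore.ai's actual schema.
def is_suitable(listing: dict) -> bool:
    return (
        listing["rating"] >= 4.0          # good host rating (4+ stars)
        and listing["storage_gb"] >= 50   # room for multi-GB model files
        and listing["ram_gb"] >= 32       # smooth operation
        and listing["mbps"] >= 500        # fast model downloads
    )

listings = [
    {"name": "host-a", "rating": 4.8, "storage_gb": 120, "ram_gb": 64, "mbps": 900},
    {"name": "host-b", "rating": 3.5, "storage_gb": 200, "ram_gb": 64, "mbps": 950},
    {"name": "host-c", "rating": 4.2, "storage_gb": 30,  "ram_gb": 32, "mbps": 600},
]
suitable = [l["name"] for l in listings if is_suitable(l)]
```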

Step 3: Deploy ComfyUI (Recommended)

ComfyUI is the most popular Stable Diffusion interface in 2025. It uses a node-based workflow system that's incredibly powerful and flexible. Here's how to deploy it on your rented GPU.

3.1: Choose the Docker Image

When configuring your rental on Clore.ai, you'll be asked to select a Docker image. Look for a ComfyUI pre-configured image in the available templates. If one is available, select it — this saves significant setup time.

If a pre-configured ComfyUI image isn't available, select a PyTorch base image (e.g., pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime) and we'll install ComfyUI manually.

3.2: Configure Ports

Ensure the following port is exposed:

  • Port 8188 — ComfyUI's default web interface port

Also enable SSH access so you can manage the server via terminal if needed.

3.3: Start the Rental

Click "Rent" and wait for the server to spin up. This typically takes 1–3 minutes.
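
Once the server reports as running, you can confirm it's reachable on the ComfyUI port with a quick TCP check (host and port are placeholders for your rental's details):

```python
# Quick reachability check for an exposed port (host/port are placeholders).
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: port_open("your-server-address", 8188)
```

Note that the port won't answer until ComfyUI itself is running (Step 3.4), so a refused connection at this stage just means nothing is listening yet.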

3.4: Install ComfyUI (If Using a Base Image)

If you chose a pre-configured ComfyUI image, skip ahead to Step 3.5. Otherwise, SSH into your server and run:

# Connect to your server
ssh root@your-server-address -p your-port

# Clone ComfyUI
cd /root
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Install dependencies
pip install -r requirements.txt

# Download a Stable Diffusion model (SDXL example)
cd models/checkpoints
wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

# Start ComfyUI
cd /root/ComfyUI
python main.py --listen 0.0.0.0 --port 8188

3.5: Access the Web Interface

Once ComfyUI is running, open your browser and navigate to:

http://your-server-address:8188

You should see the ComfyUI node editor. Congratulations — you're ready to generate images!
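
If the page doesn't respond right away, ComfyUI may still be loading models. A small poll helper can wait for the server to answer; it assumes ComfyUI's /system_stats endpoint (present in current builds, but verify on yours) and takes an injectable fetch so the retry logic is testable offline:

```python
# Minimal sketch: wait until ComfyUI's HTTP endpoint answers before opening
# the browser. The /system_stats path is assumed from current ComfyUI builds;
# `fetch` is injectable so the retry loop can be exercised without a server.
import time
import urllib.request

def _default_fetch(url: str) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=3) as r:
            return r.status == 200
    except OSError:
        return False

def wait_for_comfyui(base_url: str, tries: int = 30, delay: float = 2.0,
                     fetch=_default_fetch) -> bool:
    """Poll base_url/system_stats until it responds or tries run out."""
    for _ in range(tries):
        if fetch(base_url.rstrip("/") + "/system_stats"):
            return True
        time.sleep(delay)
    return False

# Example: wait_for_comfyui("http://your-server-address:8188")
```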

Step 4: Deploy Automatic1111 (Alternative)

If you prefer the classic Stable Diffusion WebUI (Automatic1111 / FORGE), here's how to set it up.

4.1: SSH Into Your Server

ssh root@your-server-address -p your-port

4.2: Install and Launch

# Clone the repository
cd /root
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui

# Download a model
cd models/Stable-diffusion
wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
cd /root/stable-diffusion-webui

# Launch with remote access enabled
python launch.py --listen --xformers --enable-insecure-extension-access

The first launch will take 5–10 minutes as it installs all dependencies. Subsequent launches are much faster.

4.3: Access the Interface

Open your browser and go to:

http://your-server-address:7860

You'll see the familiar A1111 interface with txt2img, img2img, and all the standard tabs.

Step 5: Generate Your First Image

Now for the fun part. Let's generate an image.

In ComfyUI:

  1. The default workflow should already be loaded (a simple text-to-image pipeline)
  2. Find the "CLIP Text Encode" node (the positive prompt)
  3. Type your prompt, for example: "a majestic dragon flying over a medieval castle at sunset, highly detailed, cinematic lighting, 8k"
  4. Find the negative prompt node and type: "blurry, low quality, distorted, watermark"
  5. Set the resolution to 1024x1024 for SDXL
  6. Click "Queue Prompt"
  7. Wait a few seconds — your image will appear in the preview node

In Automatic1111:

  1. Go to the txt2img tab
  2. Enter your prompt: "a majestic dragon flying over a medieval castle at sunset, highly detailed, cinematic lighting, 8k"
  3. Enter negative prompt: "blurry, low quality, distorted, watermark"
  4. Set width and height to 1024x1024
  5. Set sampling steps to 25–30
  6. Choose DPM++ 2M Karras as the sampler
  7. Click "Generate"
  8. Your image appears in seconds
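
These manual steps can also be scripted: A1111 exposes a JSON API at /sdapi/v1/txt2img when launched with the additional --api flag. The sketch below mirrors the settings above; the field names follow the A1111 API, but double-check them against your build (newer versions split the sampler and scheduler into separate fields):

```python
# Minimal sketch: drive the txt2img settings above through A1111's HTTP API.
# Requires launching the WebUI with:  python launch.py --listen --api
# Field names are the commonly documented A1111 API keys; verify on your build.
import json
import urllib.request

def txt2img_payload(prompt: str, negative: str) -> dict:
    """Build a txt2img request mirroring the manual settings above."""
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "width": 1024,
        "height": 1024,
        "steps": 25,
        "sampler_name": "DPM++ 2M Karras",
    }

def generate(base_url: str, payload: dict) -> dict:
    """POST the payload; the response JSON contains base64-encoded images."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())

# Example (against a live server):
# result = generate("http://your-server-address:7860",
#                   txt2img_payload("a majestic dragon flying over a medieval "
#                                   "castle at sunset", "blurry, low quality"))
```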

Generation Speed Benchmarks

Here's what to expect for a single 1024x1024 SDXL image at 25 steps:

| GPU      | Time per Image |
|----------|----------------|
| RTX 3090 | ~8–12 seconds  |
| RTX 4090 | ~4–6 seconds   |
| RTX 5090 | ~2–4 seconds   |
At these speeds, you can iterate rapidly on prompts and generate hundreds of images per hour.
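
Combining these timings with the hourly prices from Step 1 gives a rough cost per image. A quick back-of-the-envelope, using midpoint numbers from this guide:

```python
# Back-of-the-envelope cost per image from the benchmark table above.
# Prices and timings are midpoints of the ranges quoted in this guide.
def cost_per_image(price_per_hour: float, seconds_per_image: float) -> float:
    """Dollars per generated image at a given hourly rate and speed."""
    return price_per_hour * seconds_per_image / 3600

# RTX 4090: ~$0.18/hr midpoint, ~5 s per SDXL image
print(round(cost_per_image(0.18, 5) * 1000, 2))  # prints 0.25 -> ~$0.25 per 1000 images
```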

Step 6: Download and Install Additional Models

The Stable Diffusion ecosystem is rich with models, LoRAs, and extensions. Here's how to expand your setup.

Downloading Models from Hugging Face

# SSH into your server
ssh root@your-server-address -p your-port

# Navigate to the models directory
cd /root/ComfyUI/models/checkpoints  # for ComfyUI
# or
cd /root/stable-diffusion-webui/models/Stable-diffusion  # for A1111

# Download Stable Diffusion 3 Medium
# (gated repo: accept the license on its Hugging Face page first, then pass a
#  token, e.g. --header="Authorization: Bearer YOUR_HF_TOKEN")
wget https://huggingface.co/stabilityai/stable-diffusion-3-medium/resolve/main/sd3_medium.safetensors

# Download a popular community model (example: DreamShaper)
# (replace XXXXX with the version ID from the model's Civitai page)
wget "https://civitai.com/api/download/models/XXXXX" -O dreamshaper_xl.safetensors

Installing LoRAs

LoRAs (Low-Rank Adaptations) are small model add-ons that modify the style or subject of generated images:

# For ComfyUI
cd /root/ComfyUI/models/loras

# For A1111
cd /root/stable-diffusion-webui/models/Lora

# Download a LoRA (example)
wget "https://civitai.com/api/download/models/YYYYY" -O my_lora.safetensors

Installing Extensions (A1111)

cd /root/stable-diffusion-webui/extensions
git clone https://github.com/Mikubill/sd-webui-controlnet.git
# Restart the WebUI to activate

Installing Custom Nodes (ComfyUI)

cd /root/ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
# Restart ComfyUI to activate

The ComfyUI Manager extension is particularly useful — it provides a GUI for installing and managing other custom nodes directly from the interface.

Step 7: Advanced Techniques

Once you're comfortable with basic generation, explore these advanced workflows:

ControlNet

ControlNet lets you guide image generation using reference images — poses, depth maps, edge detection, and more. Install the ControlNet extension/nodes and download the appropriate ControlNet models for precise control over your outputs.

Inpainting

Modify specific regions of an existing image while keeping the rest intact. Both ComfyUI and A1111 support inpainting natively.

Batch Generation

Generate multiple images at once to explore variations:

  • In ComfyUI: Use the batch size setting in the sampler node
  • In A1111: Set "Batch count" or "Batch size" in the generation settings

Upscaling

Use AI upscalers (Real-ESRGAN, SwinIR) to increase image resolution after generation. This is especially useful for creating print-quality outputs.

Training Custom LoRAs

Want to generate images of a specific person, style, or object? Train a custom LoRA directly on your rented GPU:

# Clone a LoRA training tool
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
pip install -r requirements.txt

# Prepare your training images (10-30 images)
# Configure and run training
python sdxl_train_network.py --config_file your_config.toml

Training a LoRA takes 30–90 minutes on an RTX 4090 depending on dataset size and steps.

Optimizing Costs for Stable Diffusion

Use Spot Pricing

For image generation sessions, Clore.ai's GigaSPOT is perfect. Stable Diffusion doesn't require long, uninterrupted sessions — if your spot instance is interrupted, you've only lost your unsaved images, not hours of training progress.

Choose the Right GPU

Don't rent an A100 for Stable Diffusion — it won't be significantly faster than an RTX 4090 for image generation, and it costs 3–5x more. Stick with consumer GPUs (RTX 3090, 4090, or 5090) for the best price-to-performance ratio.

Download Models Once, Save to Persistent Storage

If Clore.ai offers persistent storage or volume mounts, use them to avoid re-downloading multi-GB model files every session. Alternatively, create a custom Docker image with your preferred models pre-installed.

Generate in Batches

Instead of generating one image at a time and inspecting it, generate batches of 4–8 images. This is more GPU-efficient and gives you more variations to choose from per generation cycle.

Use FP16 and xFormers

Both ComfyUI and A1111 support half-precision (FP16) inference and xFormers memory optimization. These reduce VRAM usage and increase generation speed at virtually no quality cost:

# A1111 with xFormers memory-efficient attention
python launch.py --listen --xformers

# Or, on PyTorch 2.x, use scaled-dot-product attention instead of xFormers
python launch.py --listen --opt-sdp-attention

ComfyUI uses FP16 by default for most operations.

Troubleshooting Common Issues

"CUDA out of memory" Error

  • Reduce image resolution (try 768x768 instead of 1024x1024)
  • Lower batch size to 1
  • Enable FP16 / xFormers optimizations
  • Choose a GPU with more VRAM

Slow Generation Speed

  • Ensure you're using the GPU (not CPU). Check with nvidia-smi
  • Enable xFormers attention
  • Reduce sampling steps (20 steps is often sufficient)
  • Use a faster sampler (DPM++ 2M Karras)

Cannot Access Web Interface

  • Ensure the correct port is exposed in your Clore.ai rental configuration
  • Check that the application is running with --listen 0.0.0.0 (not just localhost)
  • Try accessing via the IP address shown in your Clore.ai dashboard

Models Not Loading

  • Verify the model file downloaded completely (check file size)
  • Ensure the model is in the correct directory
  • For SDXL models, make sure you're using a compatible version of ComfyUI/A1111

Conclusion

Running Stable Diffusion on a cloud GPU is the fastest, cheapest, and most hassle-free way to generate AI images in 2025. With Clore.ai, you can rent an RTX 4090 for as little as $0.10/hour and deploy ComfyUI or Automatic1111 in minutes — no local GPU, no driver issues, no hardware investment.

Here's a quick recap of the workflow:

  1. Sign up on Clore.ai and add funds
  2. Rent an RTX 4090 or RTX 3090 from the marketplace
  3. Deploy a ComfyUI or A1111 Docker image (1-click or manual setup)
  4. Generate images through the web interface
  5. Save your creations and stop the rental when done

Whether you're creating art for fun, prototyping designs for a client, or training custom models, cloud GPU rental makes professional-grade AI image generation accessible to everyone.

Ready to start creating? Rent a GPU on Clore.ai and generate your first AI image today. For more AI tutorials, check out our guide on How to Fine-Tune LLaMA 3 on a Cloud GPU.
