How to Deploy ComfyUI as an API

Nathanael Chiang
June 28, 2026 · 6 min read

To deploy ComfyUI as an API, wrap it in a Beam Pod that exposes port 8000 and call create(). You get a public HTTPS URL backed by an H100 that scales to zero between requests and bills by the second. ComfyUI already serves its own HTTP API, so the Pod hands you a callable /prompt endpoint with no extra framework and no GPU box to babysit.

What this involves

ComfyUI is a desktop tool. To call it from an application you need it running on a GPU somewhere, reachable over HTTP, and ideally not burning money while no images are being generated. That is the whole problem: the model itself is the easy part.

The naive version is renting a GPU box, installing ComfyUI, leaving the server up, and paying for it 24/7. That works until traffic is bursty, which image generation almost always is. You want the GPU to appear when a request arrives and disappear when the queue drains.

There are two shapes of "ComfyUI as an API":

  • Expose ComfyUI's own HTTP API. ComfyUI already speaks HTTP on the port you pass with --port. You POST a workflow graph to /prompt and read results from /history. This is the fastest path and keeps full ComfyUI compatibility.
  • Wrap one workflow in your own REST endpoint. You hide the node graph behind a small handler that takes a prompt and returns an image URL. Cleaner contract, more code.

If ComfyUI is new to you, the install walkthrough and the workflow basics cover the desktop side before you put any of it behind an endpoint.

Export your workflow in API format first

Before any of this works, export your workflow in API format, not the UI format ComfyUI saves by default. The two JSON shapes look similar but the /prompt endpoint only accepts the API one, and feeding it the UI export is the most common cause of validation errors. Get this step right and the rest is plumbing.

In ComfyUI, open the settings gear and enable Dev mode options; a Save (API Format) button then appears in the menu. Click it to write workflow_api.json. That file is what you POST to /prompt; the node graph you see on screen is not.
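A quick check before you POST can catch the wrong export early. The sketch below is a heuristic, not ComfyUI's own validator: it assumes an API-format file is a flat mapping of node ids to objects with class_type and inputs, and that a UI export carries top-level nodes and links arrays. The function name and the sample graphs are made up for illustration.

```python
def looks_like_api_format(workflow):
    """Heuristic: API-format graphs map node ids to {"class_type", "inputs"}."""
    if not isinstance(workflow, dict) or not workflow:
        return False
    # UI exports describe the canvas with top-level "nodes" and "links" arrays.
    if "nodes" in workflow or "links" in workflow:
        return False
    return all(
        isinstance(node, dict) and "class_type" in node and "inputs" in node
        for node in workflow.values()
    )


# Hypothetical sample graphs, trimmed to two nodes each.
api_export = {
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a lighthouse at dusk", "clip": ["4", 1]}},
}
ui_export = {
    "last_node_id": 6,
    "nodes": [{"id": 4, "type": "CheckpointLoaderSimple"}],
    "links": [],
}

print(looks_like_api_format(api_export))
print(looks_like_api_format(ui_export))
```

Running this against workflow_api.json in CI or at handler start-up is cheaper than discovering the mistake from a validation error on a live GPU.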

Two more things bite people here:

  • Reference only checkpoints and LoRAs that are actually installed on the server. A path that exists on your laptop but not in the container fails validation.
  • Custom nodes must be installed in the image. If the workflow calls a node class the server has not loaded, the request errors before generation starts.
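Both checks can be run against the workflow file before it ever reaches the server. This is a minimal sketch under a few assumptions: the graph is in API format, checkpoints and LoRAs are referenced through inputs named ckpt_name and lora_name (as the stock loader nodes do), and you supply the installed file names and loaded node classes yourself. The function name and sample values are hypothetical.

```python
MODEL_INPUT_KEYS = ("ckpt_name", "lora_name")  # assumed input names for model files


def find_missing(workflow, installed_models, loaded_node_classes):
    """Return (missing_models, missing_node_classes) for an API-format graph."""
    missing_models, missing_classes = set(), set()
    for node in workflow.values():
        if node["class_type"] not in loaded_node_classes:
            missing_classes.add(node["class_type"])
        inputs = node.get("inputs", {})
        for key in MODEL_INPUT_KEYS:
            value = inputs.get(key)
            # Only literal file names count; links to other nodes are lists.
            if isinstance(value, str) and value not in installed_models:
                missing_models.add(value)
    return sorted(missing_models), sorted(missing_classes)


# Hypothetical workflow and server inventory.
workflow = {
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "laptop_only.safetensors"}},
    "10": {"class_type": "MyCustomUpscaler", "inputs": {"image": ["8", 0]}},
}
models, classes = find_missing(
    workflow,
    installed_models={"model.safetensors"},
    loaded_node_classes={"CheckpointLoaderSimple", "KSampler"},
)
print(models, classes)
```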

How to deploy ComfyUI on a serverless GPU

Beam runs your container on a GPU and gives it a public URL. The Pod described in the introduction (expose port 8000, call create()) is the whole "expose ComfyUI's HTTP API" path. A few details are worth expanding.

Build the image once. The Image chain installs comfy-cli, runs the ComfyUI installer pinned to a version, and is where you bake in model weights so they are not pulled on every start.
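As a rough sketch of what that build can look like, the code below separates the shell commands (plain strings you can inspect) from the Beam calls. Treat all of it as assumptions to check against the current Beam and comfy-cli docs: the Image and Pod method names, the .url attribute on the result, the "H100" GPU string, the install path, the version number, and the model repo are illustrative placeholders, not values taken from this post.

```python
def build_commands(comfy_version, hf_repo, weights_file,
                   models_dir="/root/comfy/ComfyUI/models/checkpoints"):
    """Shell commands run once at image build time (paths are assumptions)."""
    return [
        # Install ComfyUI non-interactively at a pinned version.
        f"comfy --skip-prompt install --nvidia --version {comfy_version}",
        # Download the checkpoint into the image so cold starts skip the network.
        f"huggingface-cli download {hf_repo} {weights_file} --local-dir {models_dir}",
    ]


def deploy(commands, port=8000, gpu="H100"):
    """Create the Pod and return its URL. Needs the Beam SDK and an account."""
    from beam import Image, Pod  # imported lazily: only needed when deploying

    image = (
        Image()
        .add_commands(["apt update && apt install -y git"])
        .add_python_packages(["comfy-cli", "huggingface_hub[cli]"])
        .add_commands(commands)
    )
    pod = Pod(
        image=image,
        ports=[port],
        gpu=gpu,
        entrypoint=["sh", "-c", f"comfy launch -- --listen 0.0.0.0 --port {port}"],
    )
    return pod.create().url  # assumed: the create() result exposes .url


# Placeholder version, repo, and file name; substitute your own.
commands = build_commands("0.3.10", "your-org/your-model", "model.safetensors")
for command in commands:
    print(command)
```

Calling deploy(commands) from a machine with Beam credentials would then build the image and start the Pod. Keeping the command list as data makes it easy to review exactly what gets baked in.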

Want a single-workflow REST endpoint instead of the whole ComfyUI server? Swap the Pod for an @asgi handler. It boots ComfyUI in the background on start, then serves your own FastAPI routes.

Deploy it with beam deploy api.py:handler. The rendered images come back as signed URLs from Output, so you do not have to stand up your own object storage.
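Whatever framework sits in front, the core of a single-workflow handler is turning request parameters into an API-format graph. The sketch below shows only that step and leaves out FastAPI and Beam entirely. It assumes the template's prompt node takes a text input and the sampler node takes a seed input, as the stock CLIPTextEncode and KSampler nodes do; the function name and node ids are hypothetical.

```python
import copy


def render_workflow(template, prompt_node, prompt_text, sampler_node=None, seed=None):
    """Return a copy of an API-format graph with request parameters filled in."""
    graph = copy.deepcopy(template)  # never mutate the shared template
    graph[prompt_node]["inputs"]["text"] = prompt_text
    if sampler_node is not None and seed is not None:
        graph[sampler_node]["inputs"]["seed"] = seed
    return graph


# Hypothetical template loaded once from workflow_api.json at start-up.
template = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 20}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 1]}},
}
graph = render_workflow(template, "6", "a lighthouse at dusk", sampler_node="3", seed=42)
print(graph["6"]["inputs"]["text"], graph["3"]["inputs"]["seed"])
```

The deep copy matters because the template is shared across requests: patching it in place would leak one caller's prompt into the next.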

Which ComfyUI endpoints you actually call

Once the Pod is up, you talk to ComfyUI's built-in HTTP server. You queue a workflow on /prompt, then poll /history/{prompt_id} until the render lands and pull the file from /view. Four HTTP endpoints plus a WebSocket cover almost every integration:

| Endpoint | Method | What it does |
|---|---|---|
| /prompt | POST | Queues an API-format workflow, returns a prompt_id |
| /history/{prompt_id} | GET | Returns status and output filenames for a run |
| /view | GET | Returns a rendered image by filename, subfolder, and type |
| /upload/image | POST | Uploads an input image for img2img or controlnet workflows |
| /ws | WebSocket | Streams live progress for long-running renders |
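The queue, poll, and fetch loop fits in a few lines of standard-library Python. This is a sketch under assumptions rather than a finished client: it assumes /history/{prompt_id} returns an empty object until the run finishes, that outputs are listed under outputs and images, and that your deployment needs no extra auth headers. The base URL and function names are placeholders.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode


def image_urls(base_url, history, prompt_id):
    """Map a /history/{prompt_id} response to /view URLs; [] while pending."""
    run = history.get(prompt_id)
    if run is None:
        return []
    urls = []
    for node_output in run.get("outputs", {}).values():
        for image in node_output.get("images", []):
            query = urlencode({
                "filename": image["filename"],
                "subfolder": image.get("subfolder", ""),
                "type": image.get("type", "output"),
            })
            urls.append(f"{base_url.rstrip('/')}/view?{query}")
    return urls


def queue_and_wait(base_url, workflow, poll_seconds=1.0, timeout=300.0):
    """POST the workflow to /prompt, then poll /history until outputs appear."""
    base = base_url.rstrip("/")
    request = urllib.request.Request(
        f"{base}/prompt",
        data=json.dumps({"prompt": workflow}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        prompt_id = json.load(response)["prompt_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base}/history/{prompt_id}") as response:
            history = json.load(response)
        if prompt_id in history:
            return image_urls(base, history, prompt_id)
        time.sleep(poll_seconds)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout}s")


# Offline example with a hand-written history response.
sample_history = {"abc": {"outputs": {"9": {"images": [
    {"filename": "ComfyUI_00001_.png", "subfolder": "", "type": "output"}]}}}}
print(image_urls("https://example.invalid/", sample_history, "abc"))
```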

For quick jobs, POST to /prompt and poll /history. For multi-minute workflows, open the /ws socket so your UI can show progress instead of a spinner.
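On the WebSocket side, most of the work is decoding status messages into something a progress bar can use. The sketch below handles only that decoding, so it runs without a socket library. The message shapes are assumptions based on common ComfyUI behavior and worth verifying against your version: a progress message carrying value and max, an executing message with a null node marking the end of a run, and a prompt_id field on both.

```python
import json


def progress_from_message(raw, prompt_id):
    """Return ("progress", fraction), ("done", 1.0), or None for other messages."""
    if isinstance(raw, bytes):
        return None  # binary frames carry preview images, not status
    message = json.loads(raw)
    data = message.get("data", {})
    if data.get("prompt_id") != prompt_id:
        return None  # queue status or another client's run
    if message.get("type") == "progress":
        return ("progress", data["value"] / data["max"])
    if message.get("type") == "executing" and data.get("node") is None:
        return ("done", 1.0)
    return None


# Hand-written sample frames for a run with the hypothetical id "abc".
step = json.dumps({"type": "progress",
                   "data": {"value": 5, "max": 20, "prompt_id": "abc"}})
finished = json.dumps({"type": "executing",
                       "data": {"node": None, "prompt_id": "abc"}})
print(progress_from_message(step, "abc"))
print(progress_from_message(finished, "abc"))
```

Feed each frame from your WebSocket client of choice through this function and update the UI whenever it returns something other than None.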

What to look for in a ComfyUI API platform

ComfyUI traffic is spiky: a user kicks off a batch of renders, then nothing for an hour. Beam fits that pattern because the endpoint costs nothing while idle, starts on the next request, and bills by the second once it does. Three specifics matter for image generation.

Scale to zero, per-second billing. You pay for render seconds, not for a parked GPU, and Beam does not charge for container spin-up.

GPU price. An H100 on Beam is $1.74/hr versus $4.18/hr on RunPod serverless and $3.95/hr on Modal (all checked June 2026). For GPU-bound image generation that is the dominant line on the bill. The same scale-to-zero economics apply to any serverless GPU workload, not just ComfyUI.
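To see what those hourly rates mean per image, divide by 3,600 and multiply by render time. The rates below are the ones quoted above; the ten-second render is a made-up figure for illustration, so plug in your own measurements.

```python
# List H100 rates quoted in this post, in dollars per hour.
RATES_PER_HOUR = {"Beam": 1.74, "RunPod serverless": 4.18, "Modal": 3.95}


def render_cost(rate_per_hour, render_seconds):
    """Cost in dollars of one render under per-second billing."""
    return rate_per_hour / 3600 * render_seconds


RENDER_SECONDS = 10  # hypothetical render time per image

for name, rate in RATES_PER_HOUR.items():
    per_thousand = render_cost(rate, RENDER_SECONDS) * 1000
    print(f"{name}: ${per_thousand:.2f} per 1,000 renders")
```

This counts only GPU seconds spent rendering; it leaves out storage, egress, and any cold-start time a provider bills for.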

Weights stay on the machine. Baking checkpoints into the image or a Volume means a cold start loads weights into VRAM rather than re-downloading multi-gigabyte safetensors, which is the slow part of starting ComfyUI cold.

You keep ComfyUI itself unchanged. The Pod route exposes the exact same HTTP API you would hit locally, so existing ComfyUI clients and your favorite ComfyUI workflows run without rewrites.

ComfyUI hosting options compared

Prices are list H100 rates checked June 2026. RunPod and Modal both scale to zero and bill per second; the gap is GPU price and how much glue code you write to expose the workflow. A raw GPU VM is the most flexible and the most work: you own the autoscaling, the URL, and the idle bill.

| Option | H100 $/hr | Scales to zero | How you call it | Weights |
|---|---|---|---|---|
| Beam Pod | $1.74 | Yes | ComfyUI's native /prompt API on a public URL | Baked into image or Volume |
| Beam @asgi | $1.74 | Yes | Your own REST route in front of the graph | Baked into image or Volume |
| RunPod Serverless | $4.18 | Yes | Custom worker handler you build | Network volume or baked |
| Modal | $3.95 | Yes | Your own web endpoint you build | Modal Volume |
| Raw GPU VM (EC2/EKS) | varies | No | Whatever you wire up plus ops | Your responsibility |

FAQ

How do I call the ComfyUI API once it is deployed?

Pod.create() returns a public HTTPS URL. ComfyUI already serves its own HTTP API on the port you exposed, so you POST an API-format workflow JSON to /prompt on that URL and poll /history for the result. No extra web framework is needed for the Pod route.

What is the difference between API-format and UI-format workflow JSON?

The UI format is what ComfyUI saves by default and describes the visual graph. The API format is a flatter structure the /prompt endpoint expects. Enable Dev mode in settings and use Save (API Format) to export the right one; posting the UI export is the usual cause of validation errors.

Does the endpoint scale to zero when no one is generating images?

Yes. Beam scales the container to zero when idle and starts it again on the next request. You are billed by the second for compute time, not for an always-on GPU, and Beam does not bill for container spin-up.

Which GPU should I pick for ComfyUI?

It depends on the model. SDXL and Flux-schnell run comfortably on an L40S or A100 80GB; large Flux-dev or video workflows benefit from an H100. You change one argument (gpu="A100-80") to switch.

How do I avoid re-downloading model weights on every cold start?

Bake the weights into the image with huggingface-cli during the build step, or mount a Beam Volume and download once. Both keep the weights on the machine so a cold start only loads them into VRAM instead of pulling them over the network.

Can I expose a single workflow as a clean REST endpoint instead of the whole ComfyUI server?

Yes. Use the @asgi route with a FastAPI handler that accepts your prompt parameters, runs the workflow through a background ComfyUI process, and returns the output image URL. That hides the ComfyUI graph behind your own API contract.

Get started

Put a ComfyUI workflow behind an HTTPS endpoint without managing a GPU box. Get started free on Beam — new accounts include $30 in credit refreshed monthly.
