Ideogram 4 was trained on JSON — plain prompts are second-class

Ideogram 4.0: open weights, JSON-first prompting, bounding-box layout, native 2K. API starts at $0.03/image.

Jun 12, 2026

Ideogram 4.0 landed on June 3, 2026, with weights you can download, native 2K output, and a prompt format that is JSON first rather than prose first. That last point changes how you talk to the model.

What Ideogram 4 unlocks that 3 didn't

Ideogram 4.0 is the company's first open-weight text-to-image foundation model: a 9.3-billion-parameter model trained from scratch and the first Ideogram release with a public model card. Versions 1 through 3 were closed and app-only, so this is the first time you can run the model on your own hardware.

Three shifts matter for builders:

Downloadable weights. An NF4 checkpoint fits on a single 24GB CUDA GPU with Diffusers support, and an FP8 build targets broader hardware without Diffusers.
Native 2K in one checkpoint. A single set of weights spans 256–2048 px per side across aspect ratios up to 6:1 or 1:6, eliminating the generate-then-upscale round-trip of earlier versions.
JSON-first prompting. 4.0 was trained exclusively on structured JSON captions. A json_prompt bypasses Magic Prompt expansion and reaches the diffusion model directly; plain text still works, but it is rewritten by Magic Prompt first and gives up that direct alignment.
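
The size envelope in the second point can be checked locally before a request is sent. This is a minimal sketch, assuming the 256–2048 px bounds and the 6:1 ratio limit are inclusive; `is_supported_size` is a hypothetical helper, not part of any Ideogram SDK.

```python
def is_supported_size(width: int, height: int) -> bool:
    """Return True if (width, height) fits the envelope described above.

    Assumptions: both bounds are inclusive, and the 6:1 / 1:6 limit applies
    to the ratio of the longer side to the shorter side.
    """
    min_side, max_side, max_ratio = 256, 2048, 6
    if not (min_side <= width <= max_side and min_side <= height <= max_side):
        return False
    longer, shorter = max(width, height), min(width, height)
    # Integer comparison avoids floating-point edge cases at exactly 6:1.
    return longer <= max_ratio * shorter


print(is_supported_size(2048, 2048))  # square 2K -> True
print(is_supported_size(1536, 256))   # exactly 6:1 -> True
print(is_supported_size(2048, 256))   # 8:1 -> False
```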

On standing: at release, 4.0 placed 9th overall in the DesignArena text-to-image arena and 1st among openly distributed models. Ideogram's own preference evaluation reports an Elo of 1062, 2nd of 9 models. That is useful context, but it is vendor-run and not independently reproduced.

Authenticating and picking your entry point

Ideogram 4.0 exposes three surfaces, and which one you pick depends on whether you want zero setup, programmatic access, or full control over weights. The hosted app at ideogram.ai needs only a sign-in: select model 4.0, write a prompt, generate, and download. Every plan can export JPEG, but PNG requires Basic Plan or above, and Batch Generation is gated to Pro and Team plans with uploads capped at 500 rows including the header.

The hosted API is the route for automation. Mint an Api-Key at developer.ideogram.ai, then POST multipart/form-data to /v1/ideogram-v4/generate. Pricing is per output image with no subscription (US $0.03 Turbo, $0.06 Default, $0.10 Quality), and the default rate limit is 10 in-flight requests.
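
The request shape described above can be sketched with the standard library alone. The path and the multipart encoding come from the text; the host `https://api.ideogram.ai`, the use of `Api-Key` as the header name, and sending `json_prompt` as a JSON-encoded string field are assumptions, so check them against developer.ideogram.ai before relying on this. The function only builds the request and sends nothing.

```python
import json
import uuid

API_BASE = "https://api.ideogram.ai"        # assumption: host is not stated above
GENERATE_PATH = "/v1/ideogram-v4/generate"  # path as given above


def build_generate_request(api_key: str, json_prompt: dict, rendering_speed: str = "TURBO"):
    """Return (url, headers, body) for a multipart/form-data POST; nothing is sent."""
    boundary = uuid.uuid4().hex
    fields = {
        "json_prompt": json.dumps(json_prompt),  # assumption: sent as a string field
        "rendering_speed": rendering_speed,
    }
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            "",
            value,
        ]
    lines += [f"--{boundary}--", ""]
    body = "\r\n".join(lines).encode("utf-8")
    headers = {
        "Api-Key": api_key,  # assumption: header name inferred from the text
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return API_BASE + GENERATE_PATH, headers, body


url, headers, body = build_generate_request(
    "YOUR_API_KEY", {"high_level_description": "A red mug"}
)
print(url)
print(headers["Content-Type"].split(";")[0])
print(len(body), "bytes")
```

To actually send it, the tuple can be passed to `urllib.request.Request(url, data=body, headers=headers, method="POST")`. With a default limit of 10 in-flight requests, a caller would also want to cap its own concurrency.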

For self-hosting, clone ideogram-oss/ideogram-4, run pip install ., and authenticate via hf auth login or an HF_TOKEN env var. The ideogram-ai/ideogram-4-nf4 and ideogram-4-fp8 repos are gated, so accept the license on each before the weights will download. The nf4 build fits a single 24GB GPU.

Surface	Cost	Hardware floor	Throughput cap	Output formats
Hosted app	Plan subscription	None (browser)	Batch ≤500 rows (Pro/Team)	JPEG all; PNG Basic+
Hosted API	$0.03–$0.10/image	None (HTTP)	10 concurrent requests	Ephemeral URL (download)
Self-hosted	Free weights + your compute	24GB GPU (nf4)	Your hardware	Local files
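
Because the hosted API bills per output image, the cost of a job is simple arithmetic over the tier prices in the table. A small sketch using the prices quoted above; `estimate_cost` is a hypothetical helper, and `Decimal` is used only to keep the cents exact.

```python
from decimal import Decimal

# Per-image prices in US dollars, as quoted above.
PRICE_PER_IMAGE = {
    "TURBO": Decimal("0.03"),
    "DEFAULT": Decimal("0.06"),
    "QUALITY": Decimal("0.10"),
}


def estimate_cost(image_counts: dict) -> Decimal:
    """Sum the cost of a job given {tier: number_of_images}."""
    total = Decimal("0")
    for tier, count in image_counts.items():
        if tier not in PRICE_PER_IMAGE:
            raise ValueError(f"no listed price for tier {tier!r}")
        total += PRICE_PER_IMAGE[tier] * count
    return total


# 200 Turbo drafts plus 20 Quality finals.
print(estimate_cost({"TURBO": 200, "QUALITY": 20}))  # 8.00
```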

Writing a JSON prompt: the format Ideogram 4 expects

A JSON prompt for Ideogram 4 has three top-level fields, and using them is what separates power-user control from prose guesswork: high_level_description (a one-to-two-sentence summary of the image), style_description (medium, lighting, and color palette), and compositional_deconstruction (the background element listed first, then foreground objects; order matters). The model was trained exclusively on structured JSON captions, so writing JSON minimizes train/inference mismatch and raises controllability versus plain text.

"4.0 was trained exclusively on structured JSON captions, so JSON minimizes train/inference mismatch and improves controllability." — Ideogram 4.0 prompt guidance (source: imagine.art prompt guide)

Declare medium explicitly. Set it to "photograph" and add a photo field (focal length, aperture, film stock, e.g. "35mm, f/2.8, shallow depth of field"), or set it to "graphic_design" and add art_style. Combining a photo field and art_style in the same prompt degrades output quality.
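
The medium rule lends itself to a quick lint step before submission. A minimal sketch: `check_medium` is a hypothetical helper, and it assumes `medium`, `photo`, and `art_style` sit together in one mapping, which the text does not spell out.

```python
def check_medium(style: dict) -> list:
    """Return a list of problems with the medium-related keys; empty means OK.

    `style` is assumed to be the mapping that holds `medium`, `photo`, and
    `art_style`; where that mapping sits in the full prompt is not shown here.
    """
    problems = []
    medium = style.get("medium")
    if medium is None:
        problems.append("medium is not declared")
    if "photo" in style and "art_style" in style:
        problems.append("photo and art_style should not be combined")
    if medium == "photograph" and "photo" not in style:
        problems.append("photograph medium has no photo field")
    if medium == "graphic_design" and "art_style" not in style:
        problems.append("graphic_design medium has no art_style field")
    return problems


print(check_medium({"medium": "photograph", "photo": "35mm, f/2.8, shallow depth of field"}))
print(check_medium({"medium": "photograph", "photo": "35mm", "art_style": "flat vector"}))
```

The first call prints an empty list; the second reports the photo/art_style conflict.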

Spatial control uses bounding boxes in normalized [y_min, x_min, y_max, x_max] coordinates on a 0–1000 canvas with a top-left origin; rough coordinates are sufficient and pixel precision is not required. For multi-line typography, give each text element its own non-overlapping box, because overlapping boxes are a common cause of garbled type. Colors must be uppercase #RRGGBB hex (e.g. #FF6B35), with up to 16 in the global palette and up to 5 per element; describing colors in words ("deep red", "warm orange") silently degrades output.
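
These constraints are mechanical enough to validate in code. The sketch below assumes that 0 and 1000 are both valid coordinates and that boxes sharing only an edge do not count as overlapping; all three helpers are hypothetical names.

```python
import re

HEX_COLOR = re.compile(r"#[0-9A-F]{6}")  # uppercase #RRGGBB only


def is_valid_box(box) -> bool:
    """Box is [y_min, x_min, y_max, x_max] on a 0-1000 canvas, top-left origin."""
    if len(box) != 4:
        return False
    y_min, x_min, y_max, x_max = box
    return 0 <= y_min < y_max <= 1000 and 0 <= x_min < x_max <= 1000


def boxes_overlap(a, b) -> bool:
    """True if two boxes share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def is_valid_palette(colors, limit=16) -> bool:
    """Check the count limit (16 global, pass limit=5 per element) and hex format."""
    return len(colors) <= limit and all(HEX_COLOR.fullmatch(c) for c in colors)


headline = [100, 100, 250, 900]
subhead = [250, 100, 350, 900]  # starts exactly where the headline ends
print(is_valid_box(headline), boxes_overlap(headline, subhead))                # True False
print(is_valid_palette(["#FF6B35", "#1A1A1A"]), is_valid_palette(["#ff6b35"]))  # True False
```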

For rendered type, separating the visible string from its spec is the mechanism that suppresses glyph errors and misspellings: put the literal string in text and the typographic spec (font, weight, size, color) in desc. The contrast is easiest to see by structuring the same intent both ways. The snippet below simply prints both forms, showing how prose collapses discrete fields into loose hints:

import json

# Illustrative intent only: these keys are shorthand for the comparison,
# not the Ideogram 4 prompt schema described above.
intent = {
    "subject": "a red enamel coffee mug",
    "style": "clean studio product photo",
    "text": "JSON FIRST",
    "composition": "centered, white background, soft shadow",
    "constraints": ["legible text", "no extra objects"],
}

# The same intent flattened into prose: every field becomes a loose hint.
plain_prompt = (
    "A clean studio product photo of a red enamel coffee mug that says "
    "'JSON FIRST', centered on a white background with a soft shadow. "
    "Make the text legible and add no extra objects."
)

# The same intent serialized as JSON: every field stays discrete.
json_prompt = json.dumps(intent, indent=2)

print("plain prompt:")
print(plain_prompt)
print("\nsame intent as JSON:")
print(json_prompt)
print("\npoint: keep prompt intent structured; plain prose collapses fields into hints.")
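
For comparison, here is the same mug expressed with the three top-level fields named earlier in this section. Treat it as a sketch rather than a schema reference: the top-level names, `medium`, `photo`, `text`, and `desc` come from the text, while `lighting`, `palette`, `bbox`, and the choice to nest style keys in a mapping are assumptions made for illustration.

```python
import json

json_prompt = {
    "high_level_description": "A red enamel coffee mug on a white studio background.",
    "style_description": {
        # Nested key names below are assumptions for illustration.
        "medium": "photograph",
        "photo": "35mm, f/2.8, shallow depth of field",
        "lighting": "soft studio light",
        "palette": ["#C62828", "#FFFFFF"],
    },
    # Background first, then foreground elements, in that order.
    "compositional_deconstruction": [
        {"desc": "seamless white background", "bbox": [0, 0, 1000, 1000]},
        {"desc": "red enamel mug, centered, soft shadow", "bbox": [250, 300, 800, 700]},
        {
            "text": "JSON FIRST",                      # literal string to render
            "desc": "bold sans-serif, white #FFFFFF",  # typographic spec
            "bbox": [450, 360, 560, 640],
        },
    ],
}

print(json.dumps(json_prompt, indent=2))
```

Boxes are [y_min, x_min, y_max, x_max] on the 0–1000 canvas, and colors are uppercase hex rather than color words.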

Known issues and incomplete parts in 4.0 today

Ideogram 4.0 is open-weight and current, but several surfaces are still in flux, so plan around them before shipping. The biggest live gap is rendering speed: FLASH is listed as "coming soon" and currently returns HTTP 400. The working tiers are TURBO (12 steps, V4_TURBO_12), DEFAULT (20 steps, V4_DEFAULT_20), and QUALITY (48 steps, V4_QUALITY_48).

Docs drift: the hosted app still describes some 3.0-era settings; treat developer.ideogram.ai and the ideogram-oss GitHub as authoritative for 4.0 field names and defaults.
Post-generation tools: transparent/alpha output, layerize, reframe, edit, and background removal may route to non-v4 endpoints, so validate each before wiring it into a pipeline.
Roadmap, not shipped: native alpha channels and editable text layers are "coming to 4.0," so treat both as forward-looking.
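
One way to plan around the FLASH gap is to resolve the rendering speed client-side before calling the API. A minimal sketch using the tier names, step counts, and identifiers listed above; `resolve_speed` is a hypothetical helper, and falling back to TURBO is a design choice of this example, not documented behavior.

```python
# Step counts and identifiers as listed above; FLASH is intentionally absent.
SPEED_TIERS = {
    "TURBO": {"steps": 12, "identifier": "V4_TURBO_12"},
    "DEFAULT": {"steps": 20, "identifier": "V4_DEFAULT_20"},
    "QUALITY": {"steps": 48, "identifier": "V4_QUALITY_48"},
}


def resolve_speed(requested: str, fallback: str = "TURBO") -> str:
    """Return a tier that is live today, substituting `fallback` for FLASH."""
    tier = requested.upper()
    if tier == "FLASH":
        return fallback  # not live yet; sending it would return HTTP 400
    if tier not in SPEED_TIERS:
        raise ValueError(f"unknown rendering speed: {requested!r}")
    return tier


print(resolve_speed("flash"))                          # TURBO
print(SPEED_TIERS[resolve_speed("quality")]["steps"])  # 48
```

Keeping the substitution in one place makes it a one-line change once FLASH ships.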

On benchmarks, stay skeptical. The DesignArena placement (top open-weight, 9th overall, 1st in quality mode) is third-party but preference-based, not an objective metric. Ideogram's own figures (0.97 OCR accuracy, 0.69 mIoU, 0.76 SpatialGenEval) are self-reported and not independently reproduced at publication time.

How to go further with Ideogram 4

The practical workflow is to start with text_prompt for fast ideation, then migrate to json_prompt once layout precision, brand hex colors, or multi-line typography matter. Two API endpoints scaffold that transition. POST /v1/ideogram-v4/magic-prompt converts a plain prompt into a full structured json_prompt, and setting aspect_ratio to AUTO lets it pick dimensions for you, which makes a useful first draft before hand-tuning. POST /v1/ideogram-v4/describe accepts any JPEG, PNG, or WebP up to 10MB and returns the structured JSON prompt for that reference, optionally preserving bounding boxes. That is handy for reverse-engineering a source asset or recreating a competitor layout.

If you prefer not to write JSON by hand, update ComfyUI to 0.24.0+, pull the image_ideogram4_t2i.json template and the Comfy-Org/Ideogram-4 checkpoint, and compose visually with nodes.

Takeaway: treat plain text as a sketch and JSON as the contract. Let magic-prompt and describe generate the structure, then refine the fields you actually care about.
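
The upload limits for the describe endpoint can be checked locally before a reference image is sent. A sketch under two assumptions: "10MB" is read as 10 MiB, and the file extension is a good enough proxy for the format (the bytes are not inspected). `describe_preflight` is a hypothetical helper.

```python
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def describe_preflight(filename: str, size_bytes: int) -> list:
    """Return reasons a reference image would likely be rejected; empty means OK."""
    reasons = []
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        reasons.append("format is not JPEG, PNG, or WebP")
    if size_bytes > MAX_BYTES:
        reasons.append("file is larger than 10MB")
    return reasons


print(describe_preflight("hero.webp", 2_000_000))
print(describe_preflight("poster.tiff", 12 * 1024 * 1024))
```

The first call returns an empty list; the second reports both the format and the size problem.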

Frequently asked questions

What is the difference between text_prompt and json_prompt in the Ideogram 4 API?

The two fields control whether your input gets rewritten before generation. text_prompt automatically enables Magic Prompt expansion: the model rewrites your plain prose into a structured prompt before rendering. json_prompt bypasses that step and feeds structured JSON straight to the diffusion model. They are mutually exclusive per request. Prefer json_prompt when layout, brand colors, or typography precision matter, because 4.0 was trained exclusively on structured JSON captions and reads them with less train/inference mismatch.
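
Since the two fields cannot be combined, it helps to enforce that in one place on the client. A minimal sketch; `prompt_fields` is a hypothetical helper, and serializing json_prompt to a string is an assumption about how the field is transmitted.

```python
import json


def prompt_fields(text_prompt=None, json_prompt=None) -> dict:
    """Return the single prompt field to send, enforcing mutual exclusivity."""
    if (text_prompt is None) == (json_prompt is None):
        raise ValueError("provide exactly one of text_prompt or json_prompt")
    if text_prompt is not None:
        return {"text_prompt": text_prompt}  # Magic Prompt will expand this
    # Assumption: the structured prompt travels as a JSON-encoded string.
    return {"json_prompt": json.dumps(json_prompt)}


print(prompt_fields(text_prompt="a red enamel mug on white"))
print(list(prompt_fields(json_prompt={"high_level_description": "A red mug"})))
```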

What GPU is required to run Ideogram 4 locally?

The NF4 quantized checkpoint fits on a single 24GB GPU with CUDA and Diffusers support; the FP8 checkpoint targets broader hardware and does not use Diffusers. To set it up, clone ideogram-oss/ideogram-4, run pip install ., then authenticate with hf auth login. The Hugging Face repositories are gated and require accepting the license before the weights download.

How do bounding boxes work in Ideogram 4 JSON prompts?

Bounding boxes are normalized [y_min, x_min, y_max, x_max] coordinates on a 0–1000 virtual canvas with the origin at the top-left. Inside compositional_deconstruction, list the background element first, then placed elements. Pixel precision is not required: rough placement is enough, since the model tolerates imprecise boxes. For multi-line text, split each line into a non-overlapping box to keep glyphs clean.
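
Splitting a text region into per-line boxes is easy to automate. The sketch below stacks equal-height boxes inside a region with a small gap between them, so no two lines can overlap; `stack_text_lines` and the default gap of 10 units are illustrative choices, not values from the Ideogram docs.

```python
def stack_text_lines(region, line_count, gap=10):
    """Split `region` into `line_count` stacked boxes separated by `gap` units.

    `region` and the results use [y_min, x_min, y_max, x_max] on the 0-1000 canvas.
    """
    y_min, x_min, y_max, x_max = region
    usable = (y_max - y_min) - gap * (line_count - 1)
    if line_count < 1 or usable < line_count:
        raise ValueError("region is too small for that many lines")
    height = usable // line_count
    boxes = []
    for i in range(line_count):
        top = y_min + i * (height + gap)
        boxes.append([top, x_min, top + height, x_max])
    return boxes


for box in stack_text_lines([100, 100, 400, 900], line_count=3):
    print(box)
# [100, 100, 193, 900]
# [203, 100, 296, 900]
# [306, 100, 399, 900]
```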

Why does FLASH rendering_speed return HTTP 400?

FLASH is marked "coming soon" in the 4.0 API docs and is not live yet, so requests using it currently return HTTP 400. The available speeds are TURBO (US $0.03 per image, 12 diffusion steps), DEFAULT (US $0.06, 20 steps), and QUALITY (US $0.10, 48 steps). Pick TURBO for ideation and QUALITY only when final fidelity matters.

Are Ideogram 4's benchmark claims independently verified?

Only partially. The DesignArena leaderboard position (9th overall and 1st among open-weight models at launch) is third-party, but it is based on human preference voting rather than a fixed objective metric. Ideogram's own designer-preference Elo (1062), text-rendering OCR accuracy (0.97), and SpatialGenEval spatial score (0.76) are self-reported from the company's technical page and have not been independently reproduced as of this writing. Validate against the published Hugging Face and GitHub artifacts before committing production workflows.
