GMI gpt-image-2 — debug report

A controlled sample: 18 test cells, each repeated 7 times with an identical request (126 requests in total). Goal: confirm that “faster and cheaper” is not a property of the model but a side effect of non-deterministic routing and a broken size parameter. The underlying model is genuine OpenAI gpt-image-2, as confirmed by its C2PA signature.

Bottom line:

At a glance — by quality tier

Key metrics grouped by quality (low / medium / high) and split by request type — this is where the tiers diverge.

1 · Wrong size depends SOLELY on the backend

1024×1024 was requested every time. Correct and wrong sizes are split by backend, with the backend identified from the C2PA signature.

Chart legend: correct 1024×1024 / wrong size

direct (“OpenAI gpt-image (direct)”): served straight from OpenAI's own API. Identified by the C2PA Content Credentials manifest in the PNG (claim_generator = gpt-image, urn:c2pa). In this test, this path ignored the requested size and returned the wrong resolution on a cheap token tier.

Azure (“Azure OpenAI ImageGen”): the same OpenAI model served via Microsoft Azure OpenAI. Identified by softwareAgent = Azure OpenAI ImageGen in the C2PA manifest. In this test it always returned the correct 1024×1024 with honest token billing.

unknown: the returned PNG carried no recognizable C2PA provenance (neither marker). In every case this coincided with the 0-token “free” images, most likely a cached or provenance-stripped path that is not token-billed.
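The three-way backend classification can be sketched in Python, the language of the report's own scripts. This is a hypothetical heuristic, not the logic of run_debug.py: it assumes the C2PA manifest store sits in a PNG chunk of type caBX (as the C2PA specification describes for PNG) and simply searches that chunk's bytes for the two markers named above. A compressed or otherwise encoded manifest would defeat a plain byte search, so a rigorous check should use a proper C2PA reader.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Marker strings named in the report; searching raw bytes is a heuristic.
AZURE_MARKER = b"Azure OpenAI ImageGen"   # softwareAgent value
DIRECT_MARKER = b"gpt-image"              # claim_generator value


def iter_png_chunks(data: bytes):
    """Yield (chunk_type, chunk_data) for each chunk of a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        yield chunk_type, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if chunk_type == b"IEND":
            break


def classify_backend(data: bytes) -> str:
    """Label a PNG as 'azure', 'direct' or 'unknown' from its caBX chunk(s)."""
    # Assumption: the C2PA manifest store is embedded in chunk type caBX.
    manifest = b"".join(body for ctype, body in iter_png_chunks(data)
                        if ctype == b"caBX")
    if AZURE_MARKER in manifest:   # check the more specific marker first
        return "azure"
    if DIRECT_MARKER in manifest:
        return "direct"
    return "unknown"               # no recognizable provenance
```

Checking the Azure marker first is a deliberate choice: if an Azure manifest also mentioned gpt-image somewhere, the more specific marker still decides the label.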

2 · What actually came back

Distribution of actual resolutions (all requested 1024×1024).
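Counting actual resolutions needs only the PNG header: IHDR is always the first chunk after the 8-byte signature, and its first 8 data bytes are the width and height as big-endian 32-bit integers. A minimal sketch, assuming each result is available as a (backend_label, png_bytes) pair; that record shape and the function names are hypothetical, not taken from results.jsonl.

```python
import struct
from collections import Counter

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
REQUESTED = (1024, 1024)  # every request in the report asked for this size


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from a PNG's IHDR chunk."""
    # Bytes 8-11 hold the chunk length, 12-15 the type (IHDR is always first),
    # 16-23 the width and height as big-endian unsigned 32-bit integers.
    if not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def size_report(records, requested=REQUESTED):
    """Tally actual sizes overall and correct/wrong verdicts per backend.

    records: iterable of (backend_label, png_bytes) pairs (hypothetical shape).
    """
    sizes = Counter()
    verdicts = Counter()
    for backend, data in records:
        width, height = png_size(data)
        sizes[f"{width}x{height}"] += 1
        verdict = "correct" if (width, height) == requested else "wrong"
        verdicts[(backend, verdict)] += 1
    return sizes, verdicts
```

Reading the size from the file itself, rather than trusting any size field in the API response, is what exposes a mismatch between the requested and the delivered resolution.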

3 · Same request → different cost

Cost min–max spread within a cell (7 identical repeats). Green dot = cheapest run, red dot = most expensive.
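The per-cell spread is a min/max over the seven repeats of each cell. A minimal sketch, assuming each run is a hypothetical (cell_id, cost_usd) pair:

```python
from collections import defaultdict


def cost_spread(runs):
    """Return {cell_id: (min_cost, max_cost, spread)} for (cell_id, cost_usd) runs."""
    costs_by_cell = defaultdict(list)
    for cell_id, cost_usd in runs:
        costs_by_cell[cell_id].append(cost_usd)
    return {
        cell_id: (min(costs), max(costs), max(costs) - min(costs))
        for cell_id, costs in costs_by_cell.items()
    }
```

The sketch reports the absolute spread rather than a max/min ratio, because a 0-token freebie run costs zero and would make the ratio undefined.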

4 · Lottery: cost × time by backend

Each dot is one request. Note the separate clusters: the cheap, fast “direct” path (wrong size) vs. the honest “Azure” path.

Chart legend: OpenAI gpt-image (direct) / Azure OpenAI ImageGen / unknown (freebie, 0 tokens)
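A rough numeric view of the same clusters is a per-backend median of cost and time. The keys backend, cost_usd and latency_s are hypothetical names, not the columns of results.csv; and since provider-side timestamps can be inconsistent between backends, the sketch assumes latency measured on the client side.

```python
from collections import defaultdict
from statistics import median


def backend_clusters(requests):
    """Per-backend request count plus median cost and median latency.

    requests: iterable of dicts with hypothetical keys
    'backend', 'cost_usd' and 'latency_s' (client-side wall-clock seconds).
    """
    groups = defaultdict(list)
    for row in requests:
        groups[row["backend"]].append(row)
    summary = {}
    for backend, rows in groups.items():
        summary[backend] = {
            "n": len(rows),
            "median_cost_usd": median(r["cost_usd"] for r in rows),
            "median_latency_s": median(r["latency_s"] for r in rows),
        }
    return summary
```

Medians are used instead of means so that a single slow or free outlier does not drag a backend's cluster toward another one.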

5 · Visual evidence

The edits themselves come out correctly; the problem is the output size and the billing, not the image quality.

6 · Full data (126 requests)


Pricing basis: $30 per 1M image-output tokens, $8 per 1M image-input tokens, $5 per 1M text-input tokens (identical to the original benchmark). The backend is identified from the C2PA manifest embedded in each PNG file. Provider-side timestamps can be inconsistent between backends. Raw data: results.jsonl, results.csv, full responses in responses/, images in outputs/. Scripts: run_debug.py, report.py.
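The pricing basis turns into a per-request cost as a weighted sum of token counts. A minimal sketch, assuming the three counts are image-output, image-input and text-input tokens for one request; the function and parameter names are hypothetical, not taken from report.py.

```python
# Pricing basis from the report, in USD per 1M tokens.
PRICE_PER_M = {"image_out": 30.0, "image_in": 8.0, "text_in": 5.0}


def request_cost(image_out_tokens: int, image_in_tokens: int,
                 text_in_tokens: int) -> float:
    """Cost in USD of one request under the report's pricing basis."""
    return (
        image_out_tokens * PRICE_PER_M["image_out"]
        + image_in_tokens * PRICE_PER_M["image_in"]
        + text_in_tokens * PRICE_PER_M["text_in"]
    ) / 1_000_000
```

Under this formula a 0-token response costs exactly $0, which is how the unknown-backend runs show up as freebies.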
