OpenAI: Migrating to GPT-5.6

Use this guide when you are preparing existing OpenAI traffic for GPT-5.6. GPT-5.6 is currently tracked as a limited preview. In AI Stats, the models are listed as coming soon until public gateway routing is active, so treat this as a readiness checklist rather than a same-day production cutover.

Choose the right GPT-5.6 model

Model	Use it for	Reasoning effort
`openai/gpt-5.6-sol`	Highest-capability reasoning, agentic coding, scientific analysis, and complex professional work	`none`, `low`, `medium`, `high`, `xhigh`, `max`
`openai/gpt-5.6-terra`	Balanced everyday work across reasoning, coding, and assistant workflows	`none`, `low`, `medium`, `high`, `xhigh`
`openai/gpt-5.6-luna`	Lower-latency and cost-sensitive GPT-5.6 workloads	`none`, `low`, `medium`, `high`, `xhigh`

Only Sol is currently marked for the new max reasoning effort. Do not send max to Terra or Luna unless their model metadata changes. AI Stats also tracks preview aliases for the latest model in each tier: openai/gpt-sol-latest, openai/gpt-terra-latest, and openai/gpt-luna-latest. Use the fixed GPT-5.6 IDs for controlled migrations, and use the tier aliases only when you deliberately want future Sol, Terra, or Luna releases to roll forward through the same route.
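One way to enforce the effort rules above on the client side is a small lookup that rejects an unsupported effort before the request is sent. This is a minimal sketch, assuming the table on this page is the source of truth; `ALLOWED_EFFORTS` and `check_effort` are hypothetical names, not part of any SDK, and the tier aliases are left out because their effort sets can change as they roll forward.

```python
# Allowed reasoning efforts per fixed model ID, copied from the table above.
STANDARD_EFFORTS = ("none", "low", "medium", "high", "xhigh")

ALLOWED_EFFORTS = {
    "openai/gpt-5.6-sol": STANDARD_EFFORTS + ("max",),
    "openai/gpt-5.6-terra": STANDARD_EFFORTS,
    "openai/gpt-5.6-luna": STANDARD_EFFORTS,
}


def check_effort(model: str, effort: str) -> None:
    """Raise ValueError if the effort is not listed for the model."""
    allowed = ALLOWED_EFFORTS.get(model)
    if allowed is None:
        # Tier aliases and unknown IDs are deliberately not guessed here.
        raise ValueError(f"no effort metadata recorded for {model!r}")
    if effort not in allowed:
        raise ValueError(f"{model} does not list reasoning effort {effort!r}")
```

Calling `check_effort("openai/gpt-5.6-terra", "max")` raises, while the same effort on `openai/gpt-5.6-sol` passes. If the model metadata changes, update the lookup rather than loosening the check.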

What changed

GPT-5.6 replaces the single default GPT route with three tiers: Sol, Terra, and Luna.
Sol adds reasoning.effort: "max" for the highest reasoning budget.
Terra and Luna keep the standard non-max GPT-5.6 reasoning effort set.
Prompt caching is priced with separate uncached input, cache read, cache write, and output meters.
Explicit cache breakpoints are supported during preview, with a 30-minute minimum cache life noted in the model metadata.

Update your request

Start by swapping only the model ID and keeping the rest of the request stable. The first examples use the Responses API-style input shape. If you are migrating Chat Completions traffic, keep using messages and the flat reasoning_effort field where the route supports it.

{
  "model": "openai/gpt-5.6-terra",
  "input": "Summarize the rollout risks in this migration plan.",
  "reasoning": {
    "effort": "medium"
  }
}

Use Sol’s max effort only for routes where the extra reasoning budget is worth the latency and cost.

{
  "model": "openai/gpt-5.6-sol",
  "input": "Review this multi-service incident report and propose a rollback plan.",
  "reasoning": {
    "effort": "max"
  }
}

If your integration still sends the flat OpenAI-compatible field, AI Stats also accepts reasoning_effort where the route supports it:

{
  "model": "openai/gpt-5.6-sol",
  "messages": [
    {
      "role": "user",
      "content": "Design a test plan for this agent workflow."
    }
  ],
  "reasoning_effort": "max"
}
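If you need to produce both request shapes from one source, the mapping between them can be sketched as a small helper. This is an illustration only: `to_chat_completions` is a hypothetical function, it handles just the plain-string `input` case shown in the first examples, and it assumes the string maps to a single user message.

```python
def to_chat_completions(body: dict) -> dict:
    """Convert a simple Responses-style body to the Chat Completions-style shape."""
    if not isinstance(body.get("input"), str):
        raise ValueError("this sketch only handles a plain-string input")
    converted = {
        "model": body["model"],
        "messages": [{"role": "user", "content": body["input"]}],
    }
    # Nested reasoning.effort becomes the flat reasoning_effort field.
    effort = (body.get("reasoning") or {}).get("effort")
    if effort is not None:
        converted["reasoning_effort"] = effort
    return converted
```

The helper leaves every other field out on purpose, so anything beyond model, input, and effort needs to be mapped explicitly for your integration.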

Review pricing

GPT-5.6 pricing is tracked per 1M tokens in the catalog.

Model	Input	Cache read	Cache write	Output
Sol	$5.00	$0.50	$6.25	$30.00
Terra	$2.50	$0.25	$3.125	$15.00
Luna	$1.00	$0.10	$1.25	$6.00

Cache reads are priced separately from cache writes. In the current catalog, cache reads use a 90% discount from uncached input, while cache writes are priced at 1.25x uncached input.
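To see how the four meters combine, the pricing table can be turned into a rough per-request estimate. This is a sketch under stated assumptions: the prices are copied from the table above, `estimate_cost` is a hypothetical helper, and the four token counts are assumed to be mutually exclusive (a token billed as a cache read is not also billed as uncached input).

```python
# USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "sol": {"input": 5.00, "cache_read": 0.50, "cache_write": 6.25, "output": 30.00},
    "terra": {"input": 2.50, "cache_read": 0.25, "cache_write": 3.125, "output": 15.00},
    "luna": {"input": 1.00, "cache_read": 0.10, "cache_write": 1.25, "output": 6.00},
}


def estimate_cost(tier: str, uncached_input: int = 0, cache_read: int = 0,
                  cache_write: int = 0, output: int = 0) -> float:
    """Return the estimated USD cost for one request's token counts."""
    p = PRICES[tier]
    total = (
        uncached_input * p["input"]
        + cache_read * p["cache_read"]
        + cache_write * p["cache_write"]
        + output * p["output"]
    )
    return total / 1_000_000  # prices are quoted per 1M tokens
```

Because a cache write costs 1.25x uncached input and a read costs 0.1x, one write followed by one read of the same block totals 1.35x, compared with 2x for sending it uncached twice, so under these prices caching starts to pay off from the first reuse within the cache lifetime.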

Use prompt caching deliberately

For repeated context, keep the stable part of the prompt in cacheable blocks and leave request-specific text uncached.

{
  "model": "openai/gpt-5.6-sol",
  "input": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "Stable policy document...",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "1h"
          }
        },
        {
          "type": "input_text",
          "text": "Apply the policy to this new customer request."
        }
      ]
    }
  ],
  "prompt_cache_retention": "24h"
}

Use cache_control when you want provider-neutral cache hints or explicit cache breakpoints. Use prompt_cache_retention when you want to pass OpenAI cache-retention options directly.
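The split between stable and request-specific text can be kept consistent by building the body in one place. The sketch below mirrors the JSON example above and nothing more: `build_cached_request` is a hypothetical helper, the default `ttl` of "1h" is taken from that example, and `prompt_cache_retention` is left for the caller to add where the route supports it.

```python
def build_cached_request(model: str, stable_text: str, request_text: str,
                         ttl: str = "1h") -> dict:
    """Build a request body with one cacheable block and one uncached block."""
    return {
        "model": model,
        "input": [
            {
                "role": "user",
                "content": [
                    {
                        # Stable context: marked as a cache breakpoint.
                        "type": "input_text",
                        "text": stable_text,
                        "cache_control": {"type": "ephemeral", "ttl": ttl},
                    },
                    {
                        # Request-specific text: no cache hint.
                        "type": "input_text",
                        "text": request_text,
                    },
                ],
            }
        ],
    }
```

Keeping the stable block first and byte-for-byte identical across requests is what makes it reusable; any per-request value inside it would turn each call into a fresh cache write.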

What to test

Reasoning and output quality

Sol at high, xhigh, and max on your hardest tasks
Terra and Luna at the effort levels you expect to expose to users
structured outputs and schema pass rate at each effort level
tool-call selection and argument quality

Cost and latency

latency at each reasoning effort
output token growth when moving from older GPT-5.x models
cache read/write mix on repeated prompts
cost per successful task, not just price per token
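The last item, cost per successful task, can be computed from your own evaluation logs. A minimal sketch, assuming each run is recorded as a `(cost_usd, succeeded)` pair; the function name and record shape are hypothetical.

```python
def cost_per_successful_task(runs) -> float:
    """runs: iterable of (cost_usd, succeeded) pairs, one per attempted task."""
    runs = list(runs)
    # Failed attempts still cost money, so every run counts toward the total.
    total_cost = sum(cost for cost, _ in runs)
    successes = sum(1 for _, ok in runs if ok)
    if successes == 0:
        raise ValueError("no successful tasks; cost per success is undefined")
    return total_cost / successes
```

A route with a lower price per token can still lose on this metric if it fails more often or needs more output tokens to reach the same pass rate.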

Rollback

keep your previous GPT-5.x route available as a fallback
keep max behind a config flag or preset until it is proven on production-like prompts
monitor cache write volume separately from cache read volume
do not make GPT-5.6 your default route until the model page shows active gateway providers
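The rollback items above can be combined into one routing decision driven by configuration. This is a sketch, not a prescribed design: `RolloutConfig` and `resolve_route` are hypothetical names, the fallback model and effort are whatever your previous GPT-5.x route already uses, and downgrading a gated max request to xhigh is an assumed policy you may want to change.

```python
from dataclasses import dataclass


@dataclass
class RolloutConfig:
    target_model: str              # e.g. a fixed GPT-5.6 ID
    fallback_model: str            # your previous GPT-5.x route
    fallback_effort: str           # an effort that route is known to accept
    gateway_active: bool = False   # flip once the model page shows active providers
    allow_max: bool = False        # keep max gated until it is proven


def resolve_route(cfg: RolloutConfig, requested_effort: str) -> tuple[str, str]:
    """Return the (model, effort) pair to send for this request."""
    if not cfg.gateway_active:
        # GPT-5.6 is not routable yet: stay on the previous route.
        return cfg.fallback_model, cfg.fallback_effort
    if requested_effort == "max" and not cfg.allow_max:
        # Assumed policy: downgrade gated max requests to xhigh.
        return cfg.target_model, "xhigh"
    return cfg.target_model, requested_effort
```

With both flags defaulting to off, a deployment that ships this code before the preview opens keeps sending traffic to the existing route until someone changes the configuration.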
