
OpenAI: Migrating to GPT-5.6

Use this guide when you are preparing existing OpenAI traffic for GPT-5.6. GPT-5.6 is currently tracked as a limited preview. In AI Stats, the models are listed as coming soon until public gateway routing is active, so treat this as a readiness checklist rather than a same-day production cutover.

Choose the right GPT-5.6 model

| Model | Use it for | Reasoning effort |
| --- | --- | --- |
| openai/gpt-5.6-sol | Highest-capability reasoning, agentic coding, scientific analysis, and complex professional work | none, low, medium, high, xhigh, max |
| openai/gpt-5.6-terra | Balanced everyday work across reasoning, coding, and assistant workflows | none, low, medium, high, xhigh |
| openai/gpt-5.6-luna | Lower-latency and cost-sensitive GPT-5.6 workloads | none, low, medium, high, xhigh |
Only Sol is currently marked for the new max reasoning effort. Do not send max to Terra or Luna unless their model metadata changes. AI Stats also tracks preview aliases for the latest model in each tier: openai/gpt-sol-latest, openai/gpt-terra-latest, and openai/gpt-luna-latest. Use the fixed GPT-5.6 IDs for controlled migrations, and use the tier aliases only when you deliberately want future Sol, Terra, or Luna releases to roll forward through the same route.
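The effort matrix above can also be enforced client-side, so a stray max never reaches Terra or Luna. This is a minimal sketch, assuming you keep the allowed sets in your own code or config; the names BASE_EFFORTS, ALLOWED_EFFORTS, and check_effort are hypothetical, and the values are copied from the table, so they need updating whenever the model metadata changes.

```python
# Hypothetical client-side guard. The effort sets are copied from the GPT-5.6
# table and must be updated if the model metadata changes.
BASE_EFFORTS = frozenset({"none", "low", "medium", "high", "xhigh"})

ALLOWED_EFFORTS: dict[str, frozenset[str]] = {
    "openai/gpt-5.6-sol": BASE_EFFORTS | {"max"},  # only Sol accepts "max"
    "openai/gpt-5.6-terra": BASE_EFFORTS,
    "openai/gpt-5.6-luna": BASE_EFFORTS,
}


def check_effort(model: str, effort: str) -> str:
    """Return `effort` if `model` accepts it; raise ValueError otherwise."""
    allowed = ALLOWED_EFFORTS.get(model)
    if allowed is None:
        raise ValueError(f"no effort metadata for model {model!r}")
    if effort not in allowed:
        raise ValueError(f"{model} does not accept reasoning effort {effort!r}")
    return effort
```

For example, check_effort("openai/gpt-5.6-sol", "max") returns "max", while check_effort("openai/gpt-5.6-terra", "max") raises. The sketch deliberately leaves out the -latest tier aliases, because the efforts they accept can change when a tier rolls forward.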

What changed

  • GPT-5.6 adds the new Sol/Terra/Luna split instead of one default GPT route.
  • Sol adds reasoning.effort: "max" for the highest reasoning budget.
  • Terra and Luna keep the standard non-max GPT-5.6 reasoning effort set.
  • Prompt caching is priced with separate uncached input, cache read, cache write, and output meters.
  • Explicit cache breakpoints are supported during preview, with a 30-minute minimum cache life noted in the model metadata.

Update your request

Start by swapping only the model id and keeping the rest of the request stable. The first two examples use the Responses API-style input shape. If you are migrating Chat Completions traffic, keep using messages and the flat reasoning_effort field where the route supports it.
{
  "model": "openai/gpt-5.6-terra",
  "input": "Summarize the rollout risks in this migration plan.",
  "reasoning": {
    "effort": "medium"
  }
}
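To keep a migration diff limited to the model id, a helper can copy an existing request body and change nothing else. This is a sketch under the assumption that your requests are built as plain dictionaries before serialization; swap_model is a hypothetical name, and "your-current-gpt-5.x-model" is a placeholder, not a real model id.

```python
import copy
from typing import Any


def swap_model(request: dict[str, Any], new_model: str) -> dict[str, Any]:
    """Return a deep copy of `request` with only the model id replaced."""
    migrated = copy.deepcopy(request)  # nested fields such as "reasoning" are copied, not shared
    migrated["model"] = new_model
    return migrated


# Placeholder for your current request; the model id is not a real model.
old_request = {
    "model": "your-current-gpt-5.x-model",
    "input": "Summarize the rollout risks in this migration plan.",
    "reasoning": {"effort": "medium"},
}
new_request = swap_model(old_request, "openai/gpt-5.6-terra")

# Confirm that the model id is the only field that changed.
changed = {key for key in old_request if old_request[key] != new_request[key]}
print(changed)  # {'model'}
```

The deep copy matters if the same base request is reused for both the old and new routes during a side-by-side comparison: editing one copy cannot leak into the other.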
Use Sol’s max effort only for routes where the extra reasoning budget is worth the latency and cost.
{
  "model": "openai/gpt-5.6-sol",
  "input": "Review this multi-service incident report and propose a rollback plan.",
  "reasoning": {
    "effort": "max"
  }
}
If your integration still sends the flat OpenAI-compatible field, AI Stats also accepts reasoning_effort where the route supports it:
{
  "model": "openai/gpt-5.6-sol",
  "messages": [
    {
      "role": "user",
      "content": "Design a test plan for this agent workflow."
    }
  ],
  "reasoning_effort": "max"
}
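If you run both request shapes during the migration, a small adapter keeps the effort value consistent between them. This sketch only handles a plain string input, which it maps to a single user message, and it assumes the target route accepts the flat reasoning_effort field; to_chat_completions is a hypothetical name, not part of any SDK.

```python
from typing import Any


def to_chat_completions(request: dict[str, Any]) -> dict[str, Any]:
    """Convert a simple Responses-style body into a Chat Completions-style body.

    Assumes `input` is a plain string; structured input arrays are out of scope.
    """
    if not isinstance(request.get("input"), str):
        raise TypeError("this sketch only converts string input")
    chat: dict[str, Any] = {
        "model": request["model"],
        "messages": [{"role": "user", "content": request["input"]}],
    }
    effort = request.get("reasoning", {}).get("effort")
    if effort is not None:
        # Flat OpenAI-compatible field; only send it where the route supports it.
        chat["reasoning_effort"] = effort
    return chat


chat_body = to_chat_completions({
    "model": "openai/gpt-5.6-sol",
    "input": "Design a test plan for this agent workflow.",
    "reasoning": {"effort": "max"},
})
```

Here chat_body matches the Chat Completions example above: the same model, one user message, and "reasoning_effort": "max".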

Review pricing

GPT-5.6 pricing is tracked per 1M tokens in the catalog.
| Model | Input | Cache read | Cache write | Output |
| --- | --- | --- | --- | --- |
| Sol | $5.00 | $0.50 | $6.25 | $30.00 |
| Terra | $2.50 | $0.25 | $3.125 | $15.00 |
| Luna | $1.00 | $0.10 | $1.25 | $6.00 |
Cache reads are priced separately from cache writes. In the current catalog, cache reads are discounted 90% from the uncached input price (0.1x), while cache writes cost 1.25x the uncached input price.
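Because each meter has its own rate, the cost of a request is the sum of four products. The sketch below applies the catalog prices from the table to illustrative token counts (not measurements); PRICES_PER_1M and request_cost_usd are hypothetical names.

```python
# Catalog prices in USD per 1M tokens, copied from the GPT-5.6 pricing table.
PRICES_PER_1M: dict[str, dict[str, float]] = {
    "openai/gpt-5.6-sol": {"input": 5.00, "cache_read": 0.50, "cache_write": 6.25, "output": 30.00},
    "openai/gpt-5.6-terra": {"input": 2.50, "cache_read": 0.25, "cache_write": 3.125, "output": 15.00},
    "openai/gpt-5.6-luna": {"input": 1.00, "cache_read": 0.10, "cache_write": 1.25, "output": 6.00},
}


def request_cost_usd(model: str, uncached_input: int, cache_read: int,
                     cache_write: int, output: int) -> float:
    """Price one request by applying each meter to its own token count."""
    p = PRICES_PER_1M[model]
    total = (
        uncached_input * p["input"]
        + cache_read * p["cache_read"]
        + cache_write * p["cache_write"]
        + output * p["output"]
    )
    return total / 1_000_000


# Illustrative token counts: a 20,000-token stable prefix plus 2,000 fresh input tokens.
first = request_cost_usd("openai/gpt-5.6-sol", 2_000, 0, 20_000, 1_000)   # writes the prefix
repeat = request_cost_usd("openai/gpt-5.6-sol", 2_000, 20_000, 0, 1_000)  # reads it back
print(f"{first:.4f} {repeat:.4f}")  # 0.1650 0.0500
```

Writing the prefix makes the first request more expensive than sending it uncached ($0.165 versus $0.14 for 22,000 uncached input tokens plus the same output), but each reuse drops to $0.05. Per token, one write plus one read costs 1.35x the uncached input price versus 2x for sending it uncached twice, so a cached prefix pays off as soon as it is read back once.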

Use prompt caching deliberately

For repeated context, keep the stable part of the prompt in cacheable blocks and leave request-specific text uncached.
{
  "model": "openai/gpt-5.6-sol",
  "input": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "Stable policy document...",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "1h"
          }
        },
        {
          "type": "input_text",
          "text": "Apply the policy to this new customer request."
        }
      ]
    }
  ],
  "prompt_cache_retention": "24h"
}
Use cache_control when you want provider-neutral cache hints or explicit cache breakpoints. Use prompt_cache_retention when you want to pass OpenAI cache-retention options directly.
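If many requests share the same stable context, building the body in one place keeps the cacheable block byte-for-byte identical across calls. This sketch assumes the field names shown in the JSON example above; build_cached_request is a hypothetical helper, and it performs no validation of ttl or retention values, so keep the 30-minute minimum cache life from the model metadata in mind when choosing them.

```python
from typing import Any, Optional


def build_cached_request(
    model: str,
    stable_text: str,
    request_text: str,
    ttl: str = "1h",
    retention: Optional[str] = None,
) -> dict[str, Any]:
    """Build a Responses-style body with a cacheable stable block and an uncached request block."""
    stable_block = {
        "type": "input_text",
        "text": stable_text,
        # Marks the end of the stable prefix as an explicit cache breakpoint.
        "cache_control": {"type": "ephemeral", "ttl": ttl},
    }
    request_block = {"type": "input_text", "text": request_text}  # changes per call, no cache hint
    body: dict[str, Any] = {
        "model": model,
        "input": [{"role": "user", "content": [stable_block, request_block]}],
    }
    if retention is not None:
        body["prompt_cache_retention"] = retention  # passed through as an OpenAI option
    return body


body = build_cached_request(
    "openai/gpt-5.6-sol",
    "Stable policy document...",
    "Apply the policy to this new customer request.",
    retention="24h",
)
```

The stable block comes first and the request-specific text last, because prompt caches generally match on the leading part of the prompt; putting changing text before the stable block would defeat the cache.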

What to test

Reasoning and output quality

  • Sol at high, xhigh, and max on your hardest tasks
  • Terra and Luna at the effort levels you expect to expose to users
  • structured outputs and schema pass rate at each effort level
  • tool-call selection and argument quality
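For the schema pass-rate item, tallying results per (model, effort) cell makes it easy to compare effort levels side by side. This is a minimal sketch: it uses a simplified check (the output parses as a JSON object containing every required key) as a stand-in for full JSON Schema validation, and the function names and sample outputs are hypothetical, not real model results.

```python
import json
from collections import defaultdict
from typing import Iterable


def passes_schema(raw: str, required_keys: set[str]) -> bool:
    """Simplified check: the output is a JSON object containing every required key."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys.issubset(parsed)


def pass_rates(results: Iterable[tuple[str, str, str]],
               required_keys: set[str]) -> dict[tuple[str, str], float]:
    """Group (model, effort, raw_output) results and return the pass rate per cell."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    passed: dict[tuple[str, str], int] = defaultdict(int)
    for model, effort, raw in results:
        key = (model, effort)
        totals[key] += 1
        passed[key] += passes_schema(raw, required_keys)  # True counts as 1
    return {key: passed[key] / totals[key] for key in totals}


# Illustrative outputs only, not real model results.
sample = [
    ("openai/gpt-5.6-terra", "medium", '{"verdict": "ok", "reason": "matches policy"}'),
    ("openai/gpt-5.6-terra", "medium", "Sure! Here is the JSON you asked for."),
    ("openai/gpt-5.6-sol", "max", '{"verdict": "reject", "reason": "missing approval"}'),
]
rates = pass_rates(sample, {"verdict", "reason"})
```

With this sample, the Terra medium cell passes one of two outputs (0.5) and the Sol max cell passes its only output (1.0). In a real run you would swap passes_schema for a proper validator against your response schema.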

Cost and latency

  • latency at each reasoning effort
  • output token growth when moving from older GPT-5.x models
  • cache read/write mix on repeated prompts
  • cost per successful task, not just price per token
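Cost per successful task and the cache read/write mix both fall out of a few run-level totals. This sketch assumes you already aggregate spend, attempts, successes, and cached token counts per route; RunStats and its fields are hypothetical, and the numbers in the example are illustrative, not measurements.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregates from one evaluation run on a single route (hypothetical structure)."""
    total_cost_usd: float
    attempts: int
    successes: int
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0

    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

    def cost_per_success(self) -> float:
        """Total spend divided by successful tasks; infinite if nothing succeeded."""
        return self.total_cost_usd / self.successes if self.successes else float("inf")

    def cache_read_share(self) -> float:
        """Fraction of cached tokens that were reads rather than writes."""
        cached = self.cache_read_tokens + self.cache_write_tokens
        return self.cache_read_tokens / cached if cached else 0.0


# Illustrative numbers, not measurements: route B costs twice as much per run
# but succeeds more often, so it is cheaper per successful task.
route_a = RunStats(total_cost_usd=1.00, attempts=100, successes=40)
route_b = RunStats(total_cost_usd=2.00, attempts=100, successes=90)
print(f"{route_a.cost_per_success():.4f} {route_b.cost_per_success():.4f}")  # 0.0250 0.0222
```

A low cache_read_share on repeated prompts usually means the cacheable prefix is changing between calls or expiring before it is reused.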

Rollback

  • keep your previous GPT-5.x route available as a fallback
  • keep max behind a config flag or preset until it is proven on production-like prompts
  • monitor cache write volume separately from cache read volume
  • do not make GPT-5.6 your default route until the model page shows active gateway providers
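The rollback items can be combined into one routing decision: stay on the previous route until GPT-5.6 is active, gate max behind a flag, and always keep the previous route as the next candidate. This is a sketch with hypothetical names (MigrationConfig, candidate_routes); the previous model id and effort are placeholders for values you already run in production, and the caller is assumed to try the returned routes in order.

```python
from dataclasses import dataclass

# Highest effort that all three GPT-5.6 tiers accept, per the model table.
GPT56_SHARED_MAX_EFFORT = "xhigh"


@dataclass(frozen=True)
class MigrationConfig:
    """Hypothetical rollout flags; the field names are illustrative."""
    previous_model: str         # your existing GPT-5.x route, kept as the fallback
    previous_effort: str        # an effort value you already use on that route
    gpt56_model: str = "openai/gpt-5.6-terra"
    gpt56_active: bool = False  # flip only once the model page shows active gateway providers
    allow_max: bool = False     # keep "max" off until proven on production-like prompts


def candidate_routes(cfg: MigrationConfig, requested_effort: str) -> list[tuple[str, str]]:
    """Return (model, effort) pairs in the order a caller should try them."""
    fallback = (cfg.previous_model, cfg.previous_effort)
    if not cfg.gpt56_active:
        return [fallback]
    effort = requested_effort
    if effort == "max" and not (cfg.allow_max and cfg.gpt56_model == "openai/gpt-5.6-sol"):
        effort = GPT56_SHARED_MAX_EFFORT  # downgrade instead of sending "max" where it is not enabled
    return [(cfg.gpt56_model, effort), fallback]


cfg = MigrationConfig(
    previous_model="your-current-gpt-5.x-model",  # placeholder, not a real model id
    previous_effort="high",
    gpt56_model="openai/gpt-5.6-sol",
    gpt56_active=True,
)
routes = candidate_routes(cfg, "max")
```

With allow_max left off, routes is [("openai/gpt-5.6-sol", "xhigh"), ("your-current-gpt-5.x-model", "high")]. Downgrading to xhigh rather than rejecting the request keeps traffic flowing while max is still being evaluated.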


Last modified on June 28, 2026