Service Tiers

Control cost and latency tradeoffs with service tier selection

The service_tier parameter lets you trade off cost against latency for requests sent through OpenRouter. Pass it in your request to select a processing tier; the response then reports which tier was actually used, and you are billed at the rate of that served tier.

Using Service Tiers

Pass service_tier as a top-level parameter in the request body. Supported values are flex (lower cost, higher latency) and priority (lower latency, higher cost). The example below requests the flex tier for OpenAI's gpt-5, which gives a 50% discount in exchange for higher latency and lower availability.

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer <OPENROUTER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5",
    "service_tier": "flex",
    "messages": [
      { "role": "user", "content": "What is the meaning of life?" }
    ]
  }'
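For readers calling the API from code, here is a minimal Python sketch of the same flex request using only the standard library. It assumes your key is stored in an OPENROUTER_API_KEY environment variable; the helper names build_chat_body and send_chat are hypothetical, not part of any OpenRouter SDK. The body is checked against the two request values listed above before anything is sent.

```python
import json
import os
import urllib.request

# Request values documented for service_tier.
ALLOWED_TIERS = {"flex", "priority"}
CHAT_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_chat_body(model: str, prompt: str, service_tier: str) -> dict:
    """Build a Chat Completions body with service_tier at the top level."""
    if service_tier not in ALLOWED_TIERS:
        raise ValueError(f"unsupported service_tier: {service_tier!r}")
    return {
        "model": model,
        "service_tier": service_tier,
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat(body: dict, api_key: str) -> dict:
    """POST the body to OpenRouter and return the decoded JSON response."""
    request = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    api_key = os.environ.get("OPENROUTER_API_KEY")
    if api_key:
        body = build_chat_body("openai/gpt-5", "What is the meaning of life?", "flex")
        result = send_chat(body, api_key)
        # Chat Completions reports the served tier at the top level.
        print("served tier:", result.get("service_tier"))
```

The request only goes out when the environment variable is set, so the helpers can be imported and exercised offline.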

The service_tier parameter is also accepted on the Responses API and the Anthropic Messages API; see API Response Differences below for where each format returns the field in the response.

Anthropic Messages API example:

curl https://openrouter.ai/api/v1/messages \
  -H "Authorization: Bearer <OPENROUTER_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5",
    "service_tier": "flex",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "What is the meaning of life?" }
    ]
  }'
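Since service_tier is a top-level field in every format, only the rest of the body differs between endpoints. The Python sketch below builds a body for each of the three APIs, including the Responses API, which has no example on this page. It assumes the Responses endpoint accepts a plain-string input as OpenAI's native Responses API does, and build_request is a hypothetical helper for illustration.

```python
import json

# Endpoint paths as listed on this page, relative to https://openrouter.ai.
ENDPOINTS = {
    "chat": "/api/v1/chat/completions",
    "responses": "/api/v1/responses",
    "messages": "/api/v1/messages",
}


def build_request(api: str, model: str, prompt: str, service_tier: str) -> tuple[str, dict]:
    """Return the endpoint path and JSON body for one of the three API formats."""
    if service_tier not in {"flex", "priority"}:
        raise ValueError(f"unsupported service_tier: {service_tier!r}")
    # service_tier is a top-level field in every format.
    body = {"model": model, "service_tier": service_tier}
    if api == "chat":
        body["messages"] = [{"role": "user", "content": prompt}]
    elif api == "responses":
        # Assumption: a plain-string input, as in OpenAI's Responses API.
        body["input"] = prompt
    elif api == "messages":
        # The Anthropic Messages format requires max_tokens.
        body["max_tokens"] = 1024
        body["messages"] = [{"role": "user", "content": prompt}]
    else:
        raise ValueError(f"unknown api format: {api!r}")
    return ENDPOINTS[api], body


if __name__ == "__main__":
    for api in ENDPOINTS:
        path, body = build_request(api, "openai/gpt-5", "What is the meaning of life?", "flex")
        print(path, json.dumps(body))
```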

Supported Providers

The following providers support flex and priority service tiers for select models:

  • OpenAI
  • Google Vertex
  • Google AI Studio

The response's service_tier field reports which tier was actually used. Possible values are default, flex, priority, or null when the upstream provider reports no service tier. OpenRouter normalizes provider-specific names for the base tier, such as Google's standard, to default.
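Since you are billed at the served tier's rate, it can be useful to compare the tier you requested with the one reported back. A minimal Python sketch, assuming you have already read the served value out of the response (None when the field is null); describe_served_tier is a hypothetical helper and its return strings are arbitrary labels.

```python
from typing import Optional

# Response values documented for service_tier (null maps to None).
RESPONSE_TIERS = {"default", "flex", "priority"}


def describe_served_tier(requested: str, served: Optional[str]) -> str:
    """Explain how the served tier relates to the requested one."""
    if served is None:
        # null: the upstream provider did not report a tier.
        return "unknown"
    if served not in RESPONSE_TIERS:
        raise ValueError(f"unexpected service_tier in response: {served!r}")
    if served == requested:
        return "honored"
    # The request ran on a different tier, so it is billed at that tier's rate.
    return f"served as {served}"
```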

Provider documentation:

  • OpenAI: Chat Completions, Responses, and pricing
  • Google Vertex: Flex and Priority
  • Google AI Studio: Flex and Priority

API Response Differences

The API response includes a service_tier field that indicates which capacity tier was actually used to serve your request. The placement of this field varies by API format:

  • Chat Completions API (/api/v1/chat/completions): service_tier is returned at the top level of the response object, matching OpenAI’s native format.
  • Responses API (/api/v1/responses): service_tier is returned at the top level of the response object, matching OpenAI’s native format.
  • Messages API (/api/v1/messages): service_tier is returned inside the usage object, matching Anthropic’s native format.
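A small Python sketch that reads the served tier from a decoded response body for each format listed above. The api labels chat, responses, and messages and the function name served_tier are hypothetical, and the sample bodies in the demo are abbreviated, made-up responses that keep only the fields relevant here.

```python
import json
from typing import Optional


def served_tier(response: dict, api: str) -> Optional[str]:
    """Read service_tier from a decoded response body.

    Chat Completions and Responses return it at the top level;
    the Messages API returns it inside the usage object.
    """
    if api in ("chat", "responses"):
        return response.get("service_tier")
    if api == "messages":
        return (response.get("usage") or {}).get("service_tier")
    raise ValueError(f"unknown api format: {api!r}")


if __name__ == "__main__":
    # Abbreviated, made-up response bodies showing only the relevant fields.
    chat = json.loads('{"id": "gen-1", "service_tier": "flex"}')
    messages = json.loads('{"id": "msg-1", "usage": {"input_tokens": 12, "service_tier": "flex"}}')
    print(served_tier(chat, "chat"))          # flex
    print(served_tier(messages, "messages"))  # flex
```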