I'm working with the OpenAI.Realtime C# SDK and logging detailed token‑usage metrics for a multi‑turn audio conversation. My setup is based on a custom RealtimeScenarioBase class and a derived scenario (RT_Test3_TwoVoices) that sends two audio inputs and receives two audio responses.
Everything works, but I’ve noticed a strange token‑usage pattern:
Problem
The first assistant response in the session shows cached input tokens, even though:
No previous conversation history exists
No prior user messages were sent
The session was just created
The model is gpt-realtime-mini (Azure deployment)
Example Logged Usage
Timestamp: 2026-03-22T02:49:15.7672185+00:00
Step: Voice1
Model: realtime-mini-test-3
InputTokens: 1459
InputTextTokens: 1349
InputAudioTokens: 110
CachedInputTokens: 1408
CachedTextInputTokens: 1344
CachedAudioInputTokens: 64
OutputTokens: 396
OutputTextTokens: 97
OutputAudioTokens: 299
TotalTokens: 1855

The second turn also shows cached tokens, which is expected, but the first turn should not, unless I'm misunderstanding how caching works.
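As a first sanity check on my own tracking logic, I verified that the logged fields are at least internally consistent: cached tokens never exceed their uncached counterparts, and the text/audio components sum to the totals. A minimal sketch (the UsageSnapshot record is a hypothetical type mirroring the logged fields, not an SDK type):

```csharp
// The values from the first turn above:
var firstTurn = new UsageSnapshot(1459, 1349, 110, 1408, 1344, 64, 396, 97, 299, 1855);
Console.WriteLine(IsConsistent(firstTurn)); // prints True — the numbers add up

static bool IsConsistent(UsageSnapshot u) =>
    u.InputTextTokens + u.InputAudioTokens == u.InputTokens
    && u.CachedTextInputTokens + u.CachedAudioInputTokens == u.CachedInputTokens
    && u.OutputTextTokens + u.OutputAudioTokens == u.OutputTokens
    && u.CachedInputTokens <= u.InputTokens
    && u.InputTokens + u.OutputTokens == u.TotalTokens;

// Hypothetical record mirroring the fields logged above; not an SDK type.
record UsageSnapshot(
    int InputTokens, int InputTextTokens, int InputAudioTokens,
    int CachedInputTokens, int CachedTextInputTokens, int CachedAudioInputTokens,
    int OutputTokens, int OutputTextTokens, int OutputAudioTokens,
    int TotalTokens);
```

So the per-field arithmetic is fine; the surprise is only that CachedInputTokens is non-zero on turn one.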
My Question
Why does the first assistant response in a Realtime session show cached token usage?
Is this expected behavior for gpt-realtime-mini models, or am I missing a usage event earlier in the session lifecycle (e.g., during session.created, session.updated, or instruction injection)?
Relevant Code
I have a large base class, but here are the key parts:
Session setup
var sessionOptions = new RealtimeConversationSessionOptions
{
    Instructions = firstLargeInstruction,
    AudioOptions = BuildStandardAudioOptions(),
};
var session = await CreateAndConfigureSessionAsync(sessionOptions);

First turn

await SendAudioFileAsync(session, DefaultInputAudioPath);
await session.StartResponseAsync();

Token usage extraction

case RealtimeServerUpdateResponseDone responseDone:
    var usageMetrics = ReadUsage(responseDone.Response.Usage);
    // ...

What I’ve Considered
Maybe the system instructions (firstLargeInstruction) are being cached?
Maybe the model preloads or embeds internal system prompts?
Maybe I’m missing a usage event during session.updated?
Maybe Azure’s gpt-realtime-mini deployment uses internal warm‑start caching?
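On the first point: OpenAI's prompt-caching documentation says caching applies once a prompt prefix reaches 1024 tokens, and cache hits are reported in 128-token increments. The logged value fits that pattern exactly, which makes an instruction-prefix cache hit plausible (e.g. from an earlier run of the same scenario against the same deployment a few minutes before, since cached prefixes persist across requests for a while). A quick check of the increment arithmetic (the 1024/128 figures are from the prompt-caching docs, not from my logs):

```csharp
// Assumption: figures from OpenAI's prompt-caching documentation —
// caching starts at a 1024-token prefix; hits come in 128-token increments.
const int MinCacheablePrefix = 1024;
const int CacheIncrement = 128;

int cached = 1408; // CachedInputTokens from the first turn
Console.WriteLine(cached >= MinCacheablePrefix); // prints True
Console.WriteLine(cached % CacheIncrement == 0); // prints True: 1408 = 11 * 128
```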
What I Need to Know
Is cached token usage in the first response expected for Realtime models?
Does the Realtime API internally cache system instructions or model preamble?
Should I be capturing token usage earlier in the session lifecycle?
Or is this a bug in my usage‑tracking logic?
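To rule out a missed usage event myself, I'm considering logging the runtime type of every server update the session emits before the first response.done, along the lines of the sketch below (ReceiveUpdatesAsync is an assumption on my part — adjust to however your SDK version exposes the update loop; RealtimeServerUpdateResponseDone and ReadUsage are from my code above):

```csharp
// Sketch: enumerate server updates and log each one's runtime type, so any
// usage-bearing event arriving before the first response.done becomes visible.
await foreach (var update in session.ReceiveUpdatesAsync())
{
    Console.WriteLine($"{DateTimeOffset.UtcNow:o} {update.GetType().Name}");

    if (update is RealtimeServerUpdateResponseDone responseDone)
    {
        var usageMetrics = ReadUsage(responseDone.Response.Usage); // helper from my base class
        // ... persist usageMetrics as before
    }
}
```

If an earlier event in that trace carries usage, that would answer question 3; if not, the cached tokens genuinely originate in the first response.done.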
Packages
<PackageReference Include="Microsoft.Extensions.AI.OpenAI" Version="10.4.0" />
<PackageReference Include="OpenAI" Version="2.9.1" />