Why does the first response in my OpenAI Realtime C# session show cached token usage (cached_input_tokens) even though no prior context exists?


I'm working with the OpenAI.Realtime C# SDK and logging detailed token‑usage metrics for a multi‑turn audio conversation. My setup is based on a custom RealtimeScenarioBase class and a derived scenario (RT_Test3_TwoVoices) that sends two audio inputs and receives two audio responses.

Everything works, but I’ve noticed a strange token‑usage pattern:

Problem

The first assistant response in the session shows cached input tokens, even though:

No previous conversation history exists

No prior user messages were sent

The session was just created

The model is gpt-realtime-mini (Azure deployment)

Example Logged Usage

```text
Timestamp:              2026-03-22T02:49:15.7672185+00:00
Step:                   Voice1
Model:                  realtime-mini-test-3
InputTokens:            1459
InputTextTokens:        1349
InputAudioTokens:       110
CachedInputTokens:      1408
CachedTextInputTokens:  1344
CachedAudioInputTokens: 64
OutputTokens:           396
OutputTextTokens:       97
OutputAudioTokens:      299
TotalTokens:            1855
```

The second turn also shows cached tokens, which is expected. But the first turn should not, unless I'm misunderstanding how prompt caching works.

My Question

Why does the first assistant response in a Realtime session show cached token usage?

Is this expected behavior for gpt-realtime-mini models, or am I missing a usage event earlier in the session lifecycle (e.g., during session.created, session.updated, or instruction injection)?

Relevant Code

I have a large base class, but here are the key parts:

Session setup

```csharp
var sessionOptions = new RealtimeConversationSessionOptions
{
    Instructions = firstLargeInstruction,
    AudioOptions = BuildStandardAudioOptions(),
};
var session = await CreateAndConfigureSessionAsync(sessionOptions);
```
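CreateAndConfigureSessionAsync is a thin wrapper in my base class. A simplified sketch of what it does (the real version adds logging and cancellation handling; `_client` and `_deploymentName` are fields of the base class, and the exact factory method names depend on the SDK version):

```csharp
// Sketch only: opens a Realtime session against the Azure deployment
// and pushes the session options (instructions, audio format) to the server.
private async Task<RealtimeConversationSession> CreateAndConfigureSessionAsync(
    RealtimeConversationSessionOptions options)
{
    // _client is the Realtime client for the gpt-realtime-mini deployment.
    var session = await _client.StartConversationSessionAsync(_deploymentName);

    // Sends a session.update carrying the instructions and audio options.
    await session.ConfigureSessionAsync(options);
    return session;
}
```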

First turn

```csharp
await SendAudioFileAsync(session, DefaultInputAudioPath);
await session.StartResponseAsync();
```
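SendAudioFileAsync just streams a local audio file into the session as input audio. Roughly (the real helper also validates the file against the session's configured input audio format):

```csharp
// Sketch: reads the audio file from disk and streams it to the session
// as input audio; the SDK chunks and base64-encodes it over the socket.
private static async Task SendAudioFileAsync(
    RealtimeConversationSession session, string path)
{
    using Stream audio = File.OpenRead(path);
    await session.SendInputAudioAsync(audio);
}
```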

Token usage extraction

```csharp
case RealtimeServerUpdateResponseDone responseDone:
    var usageMetrics = ReadUsage(responseDone.Response.Usage);
    // ...
```
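ReadUsage copies the usage fields from the `response.done` event into the flat metrics record shown in the log above. The member names on the usage object below are indicative only (the cached counts come from the nested input-token details, and exact property names vary between SDK versions):

```csharp
// Sketch: maps the SDK's usage object onto the flat metrics record I log.
// Property names on "usage" are illustrative; check your SDK version.
private UsageMetrics ReadUsage(ConversationTokenUsage usage) => new UsageMetrics
{
    InputTokens       = usage.InputTokenCount,
    InputTextTokens   = usage.InputTokenDetails.TextTokenCount,
    InputAudioTokens  = usage.InputTokenDetails.AudioTokenCount,
    CachedInputTokens = usage.InputTokenDetails.CachedTokenCount,
    OutputTokens      = usage.OutputTokenCount,
    TotalTokens       = usage.TotalTokenCount,
};
```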

What I’ve Considered

Maybe the system instructions (firstLargeInstruction) are being cached?

Maybe the model preloads or embeds internal system prompts?

Maybe I’m missing a usage event during session.updated?

Maybe Azure’s gpt-realtime-mini deployment uses internal warm‑start caching?

What I Need to Know

Is cached token usage in the first response expected for Realtime models?

Does the Realtime API internally cache system instructions or model preamble?

Should I be capturing token usage earlier in the session lifecycle?

Or is this a bug in my usage‑tracking logic?

Packages

```xml
<PackageReference Include="Microsoft.Extensions.AI.OpenAI" Version="10.4.0" />
<PackageReference Include="OpenAI" Version="2.9.1" />
```