Reference · Updated 2025-12-30

Streaming chat completions

Streaming chat completions explained with examples and guidance for OpenAI-compatible APIs.

Streaming chat completions send the output incrementally, a few tokens at a time, over a single HTTP response instead of waiting for the full output to be generated.

This reduces perceived latency and enables responsive chat UIs.

Definition

A streaming chat completion is a response delivered incrementally as a sequence of delta chunks rather than one final message.

Why it matters
  • Lower time-to-first-token for a better user experience.
  • Ability to cancel early once the output is sufficient.
  • Smoother progress indicators for long responses.
How it works
  1. Set stream: true in the request body.
  2. Consume chunks from the SDK iterator or the HTTP response stream.
  3. Append each chunk's delta content to build the final message.
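
The consume-and-append steps above can be sketched in Python. This is a minimal illustration, not the service's client code. It assumes the stream follows the usual OpenAI-style convention: each chunk arrives as a `data: {...}` line whose JSON carries `choices[0].delta.content`, and the stream ends with `data: [DONE]`. The helper names `iter_deltas` and `collect_message` and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragment carried by each streamed chunk."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators and anything that is not a data line.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        # Only the first choice is followed in this sketch.
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


def collect_message(lines: Iterable[str]) -> str:
    """Append every delta fragment to build the final message."""
    return "".join(iter_deltas(lines))


# Hand-written sample lines, not captured from a real response.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Salt wind "}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"on the tide"}}]}',
    "",
    "data: [DONE]",
]
print(collect_message(sample))
```

In a chat UI, each fragment from `iter_deltas` would usually be rendered as soon as it is yielded, with the joined string kept as the final message.
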
Example request
curl -N https://api.abliteration.ai/v1/chat/completions \
  -H "Authorization: Bearer $ABLIT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "abliterated-model",
    "messages": [{"role":"user","content":"Write a short haiku about the ocean."}],
    "stream": true
  }'
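
For readers who prefer Python over curl, the same request can be sketched with the standard library alone. The URL, model name, and headers are copied from the curl example. The function names `build_request` and `stream_lines` and the placeholder key are assumptions made for illustration, and `stream_lines` is defined but not called here because it needs a valid key and network access.

```python
import json
import urllib.request
from typing import Iterator

API_URL = "https://api.abliteration.ai/v1/chat/completions"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example."""
    body = {
        "model": "abliterated-model",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask for incremental chunks
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_lines(request: urllib.request.Request) -> Iterator[str]:
    """Yield decoded response lines one at a time as they are read."""
    # Leaving the with-block (including on early exit) closes the connection.
    with urllib.request.urlopen(request) as response:
        for raw in response:
            yield raw.decode("utf-8").rstrip("\r\n")


# Placeholder key for illustration only; nothing is sent here.
req = build_request("Write a short haiku about the ocean.", "sk-placeholder")
print(req.get_method(), req.full_url)
```

With a real key, iterating over `stream_lines(req)` would read the response line by line, which plays the same role as the `-N` (no buffering) flag in the curl command.
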
FAQ

Do I need WebSockets to stream?

No. Streaming uses a long-lived HTTP response, so standard HTTPS works. OpenAI-compatible APIs typically format that response as server-sent events (SSE).

Can I cancel a stream?

Yes. Close the connection or abort the request to stop generation early.
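
A client-side sketch of early cancellation, assuming the streamed text fragments are exposed as a Python iterator. The names `take_until` and `fake_stream` are hypothetical, and `fake_stream` stands in for a real network stream so the example runs offline. Whether the server stops generating as soon as the connection closes depends on the server.

```python
from typing import Iterable, Iterator


def take_until(fragments: Iterable[str], max_chars: int) -> str:
    """Accumulate fragments, stopping once max_chars have been collected."""
    parts = []
    total = 0
    iterator = iter(fragments)
    try:
        for text in iterator:
            parts.append(text)
            total += len(text)
            if total >= max_chars:
                break  # output is sufficient; stop reading
    finally:
        # Closing the source lets a generator that wraps a live HTTP
        # response run its cleanup and release the connection.
        close = getattr(iterator, "close", None)
        if callable(close):
            close()
    return "".join(parts)


closed = []


def fake_stream() -> Iterator[str]:
    """Stand-in for a network stream; records when it gets closed."""
    try:
        for word in ["one ", "two ", "three ", "four "]:
            yield word
    finally:
        closed.append(True)


result = take_until(fake_stream(), max_chars=8)
print(repr(result), closed)
```

The remaining fragments are never read: the loop breaks after the second fragment and the source is closed immediately afterwards.
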

Does streaming change billing?

No. Usage is still counted by tokens generated and processed.