Reference
Updated 2025-12-30
Streaming chat completions
Streaming chat completions explained with examples and guidance for OpenAI-compatible APIs.
Streaming chat completions deliver tokens incrementally over a single HTTP response as they are generated, instead of waiting for the full output.
This improves perceived latency and enables responsive chat UIs.
Definition
A streaming chat completion is a response delivered incrementally as a sequence of delta chunks rather than one final message.
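To make "delta chunk" concrete, here is a minimal sketch of pulling the text fragment out of one chunk. It assumes the chunk layout that OpenAI-compatible servers commonly use (`choices[0].delta.content`); the sample payload and the `delta_text` helper are hypothetical and only illustrate the shape.

```python
import json

# Hypothetical chunk, shaped like those OpenAI-compatible servers commonly emit.
SAMPLE_CHUNK = '{"choices":[{"index":0,"delta":{"content":"Waves"},"finish_reason":null}]}'


def delta_text(chunk_json: str) -> str:
    """Return the text fragment carried by one chunk, or "" if it has none."""
    chunk = json.loads(chunk_json)
    choices = chunk.get("choices") or []
    if not choices:
        return ""
    # Some chunks carry only a role or a finish reason, so content may be absent.
    delta = choices[0].get("delta") or {}
    return delta.get("content") or ""


print(delta_text(SAMPLE_CHUNK))
```

Chunks without text (for example, a first chunk that only sets the role) yield an empty string, so callers can append the result unconditionally.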
Why it matters
- Lower time-to-first-token for a better user experience.
- Ability to cancel early once the output is sufficient.
- Smoother progress indicators for long responses.
How it works
1. Set stream: true in the request body.
2. Consume chunks from the SDK iterator or the HTTP response stream.
3. Append delta content to build the final message.
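The steps above can be sketched in Python with only the standard library. This is an illustration under assumptions, not the API's reference client: it assumes the stream arrives as server-sent-event lines of the form `data: {json}` ending with `data: [DONE]`, as is common for OpenAI-compatible APIs. The helper names `stream_lines` and `assemble_message` are hypothetical; the URL, model name, and `ABLIT_KEY` variable are taken from the example request.

```python
import json
import os
import urllib.request
from typing import Iterable, Iterator

API_URL = "https://api.abliteration.ai/v1/chat/completions"  # from the example request


def stream_lines(messages: list, model: str = "abliterated-model") -> Iterator[str]:
    """Steps 1 and 2: send stream: true, then yield response lines as they arrive."""
    body = json.dumps({"model": model, "messages": messages, "stream": True}).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['ABLIT_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        for raw in response:  # the response object iterates line by line
            yield raw.decode("utf-8")


def assemble_message(lines: Iterable[str]) -> str:
    """Step 3: append each delta's content to build the final message."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        choices = json.loads(payload).get("choices") or []
        if choices:
            text = (choices[0].get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)
```

With a valid key in `ABLIT_KEY`, `assemble_message(stream_lines([{"role": "user", "content": "Write a short haiku about the ocean."}]))` would return the full reply. A chat UI would typically render each fragment as it arrives rather than joining at the end.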
Example request
curl -N https://api.abliteration.ai/v1/chat/completions \
  -H "Authorization: Bearer $ABLIT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "abliterated-model",
    "messages": [{"role":"user","content":"Write a short haiku about the ocean."}],
    "stream": true
  }'
FAQ
Frequently asked questions.
Do I need WebSockets to stream?
No. Streaming uses a long-lived HTTP response, so standard HTTPS works.
Can I cancel a stream?
Yes. Close the connection or abort the request to stop generation early.
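As a rough sketch of early cancellation, the hypothetical helper below stops reading once enough text has been collected and then closes the source. It assumes the fragments come from an object with a `close()` method, such as a generator that wraps the HTTP response, so that closing it also releases the connection.

```python
from typing import Iterable


def take_until(fragments: Iterable[str], max_chars: int) -> str:
    """Collect text fragments until max_chars is reached, then close the source."""
    collected = []
    total = 0
    try:
        for fragment in fragments:
            collected.append(fragment)
            total += len(fragment)
            if total >= max_chars:
                break  # the output is sufficient; stop reading
    finally:
        # Closing a generator runs its cleanup code (for example, leaving a
        # `with` block around the HTTP response), which drops the connection.
        close = getattr(fragments, "close", None)
        if callable(close):
            close()
    return "".join(collected)
```

The `finally` block means the source is closed whether the loop ends by reaching the limit, by exhausting the stream, or by raising an error.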
Does streaming change billing?
No. Usage is still counted by tokens generated and processed.