Streaming chat completions

Streaming chat completions send output tokens as they are generated, over a single HTTP response, instead of waiting for the full output.

This reduces perceived latency and enables responsive chat UIs.

Definition of streaming chat completions

A streaming chat completion is a response delivered incrementally as a sequence of delta chunks rather than one final message.

Why streaming chat completions matter

Lower time-to-first-token for a better user experience.
Ability to cancel early once the output is sufficient.
Smoother progress indicators for long responses.

How it works

Set "stream": true in the JSON request body.
Consume chunks from the SDK iterator or the HTTP response stream.
Append delta content to build the final message.
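
The consume-and-append steps above can be sketched in Python. This is a minimal sketch, not a definitive client: this page does not specify the wire format, so the code assumes server-sent-event style lines of the form data: {...}, a choices[0].delta.content field in each chunk, and a data: [DONE] end marker. The function name collect_stream is hypothetical.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join delta content from an SSE-style stream into the final message.

    Assumed (not confirmed by this page): each chunk arrives as a
    `data: {...}` line, text lives in choices[0].delta.content, and the
    stream ends with `data: [DONE]`.
    """
    parts = []
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # chunk carries no text
        delta = choices[0].get("delta", {})
        parts.append(delta.get("content") or "")  # role-only deltas add nothing
    return "".join(parts)
```

Any iterable of byte lines can be passed in, for example the response object returned by urllib.request.urlopen, which yields one line per iteration.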

Example request

curl -N https://api.abliteration.ai/v1/chat/completions \
  -H "Authorization: Bearer $ABLIT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "abliterated-model",
    "messages": [{"role":"user","content":"Write a short haiku about the ocean."}],
    "stream": true
  }'
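
The same request can be built in Python with only the standard library. This is a sketch under the same assumptions as the curl command above (same URL, model name, and bearer-token header); the helper name build_stream_request is hypothetical, and the function only constructs the request without sending it.

```python
import json
import urllib.request

API_URL = "https://api.abliteration.ai/v1/chat/completions"


def build_stream_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a streaming chat completion request."""
    body = {
        "model": "abliterated-model",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask for incremental chunks instead of one final message
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen and iterate over the response, which yields the body one line at a time as bytes.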

FAQ

Do I need WebSockets to stream?

No. Streaming uses a long-lived HTTP response, so standard HTTPS works.

Can I cancel a stream?

Yes. Close the connection or abort the request to stop generation early.
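
One way to cancel from client code is to stop reading and close the stream as soon as the output is sufficient. The sketch below assumes a hypothetical stream object that yields already-decoded text pieces and exposes a close() method; the name read_until_enough and the is_enough callback are illustrative, not part of any SDK.

```python
from typing import Callable


def read_until_enough(stream, is_enough: Callable[[str], bool]) -> str:
    """Accumulate text from `stream` and close it once `is_enough` returns True.

    `stream` is assumed to be an iterable of str pieces with a close() method.
    """
    text = ""
    try:
        for piece in stream:
            text += piece
            if is_enough(text):
                break  # stop reading; the finally block closes the connection
    finally:
        stream.close()  # always release the connection, even on errors
    return text
```

For example, read_until_enough(stream, lambda t: len(t) >= 200) stops after roughly 200 characters and closes the connection instead of waiting for the full response.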

Does streaming change billing?

No. Usage is still counted by tokens generated and processed.