Streaming chat completions
Stream chat completions with abliteration.ai. Set stream: true and iterate over delta chunks.
Updated 2025-12-30
Streaming reduces time-to-first-token and delivers partial output as it is generated.
Use the OpenAI SDK with stream: true and iterate over chunks to render tokens immediately.
Streaming is ideal for chat UIs, typing indicators, and long-form generation where early feedback matters.
```javascript
import OpenAI from "openai";

// Point the OpenAI SDK at the abliteration.ai endpoint.
const client = new OpenAI({
  apiKey: process.env.ABLIT_KEY,
  baseURL: "https://api.abliteration.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "abliterated-model",
  messages: [{ role: "user", content: "Write a short haiku about the ocean." }],
  stream: true,
});

// Write each delta as it arrives; some chunks carry no content.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```

When to stream
Stream when you want lower perceived latency or need to show partial output as it is generated.
How streaming works
The response is sent as a series of chunks. Each chunk contains a delta that you append to the final message.
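The append step can be sketched without a network call. The example below is a minimal illustration, assuming chunks shaped like the SDK objects used elsewhere on this page (`choices[0].delta.content`). The `Delta`, `Choice`, and `Chunk` dataclasses and the `assemble_message` helper are hypothetical stand-ins written for this sketch, not part of any SDK.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


# Hypothetical stand-ins that mimic the shape of a streamed chunk.
@dataclass
class Delta:
    content: Optional[str] = None


@dataclass
class Choice:
    delta: Delta
    finish_reason: Optional[str] = None


@dataclass
class Chunk:
    choices: List[Choice] = field(default_factory=list)


def assemble_message(chunks: Iterable[Chunk]) -> str:
    """Append each delta's content, in arrival order, to build the final message."""
    parts: List[str] = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # tolerate chunks that carry no choices
        piece = chunk.choices[0].delta.content
        if piece:  # skip None or empty deltas
            parts.append(piece)
    return "".join(parts)


# Simulated stream: an empty first delta, three content deltas, a final stop chunk.
simulated = [
    Chunk([Choice(Delta(None))]),
    Chunk([Choice(Delta("Waves "))]),
    Chunk([Choice(Delta("fold "))]),
    Chunk([Choice(Delta("softly"))]),
    Chunk([Choice(Delta(None), finish_reason="stop")]),
]

message = assemble_message(simulated)
print(message)
```

In a real client you would typically render each piece as it arrives and keep the joined string as the message you store in the conversation history.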
Python streaming example
The Python SDK yields chunks you can iterate over. Append delta content as it arrives.
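```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.abliteration.ai/v1",
    api_key="YOUR_ABLIT_KEY",
)

stream = client.chat.completions.create(
    model="abliterated-model",
    messages=[{"role": "user", "content": "Write a short haiku about the ocean."}],
    stream=True,
)

for chunk in stream:
    # Skip chunks with no choices; treat missing content as an empty string.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    # flush=True shows each piece immediately instead of waiting on the output buffer.
    print(delta, end="", flush=True)
```

UI and reliability tips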
Streaming is best-effort over long-lived HTTP connections, so plan for reconnects and graceful fallbacks.
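One possible shape for reconnects and fallbacks is sketched below. It is an illustration under stated assumptions, not a prescribed pattern: `stream_with_fallback`, `open_stream`, and `fetch_full` are hypothetical names, and the built-in `ConnectionError` stands in for whatever error your client raises on a dropped connection (with the OpenAI Python SDK that would likely be `openai.APIConnectionError`). The sketch retries only when the connection fails before any text arrives, then falls back to a single non-streaming request.

```python
import time
from typing import Callable, Iterable, Iterator


def stream_with_fallback(
    open_stream: Callable[[], Iterable[str]],  # hypothetical: starts a stream of text pieces
    fetch_full: Callable[[], str],             # hypothetical: non-streaming request
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> Iterator[str]:
    """Yield streamed pieces; retry early failures, then fall back to one full reply."""
    for attempt in range(max_attempts):
        received = False
        try:
            for piece in open_stream():
                received = True
                yield piece
            return  # stream finished normally
        except ConnectionError:
            if received:
                # Partial text is already on screen; restarting would duplicate it,
                # so surface the error and let the caller decide what to do.
                raise
            if attempt < max_attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    # Every streaming attempt failed before producing output.
    yield fetch_full()


# Demo with a simulated stream that fails once, then succeeds.
attempts = {"count": 0}


def flaky_stream() -> Iterable[str]:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ConnectionError("simulated drop")
    return iter(["Hello, ", "world"])


result = "".join(stream_with_fallback(flaky_stream, lambda: "full reply", base_delay=0))
print(result)
```

Re-raising on a mid-stream drop is a deliberate choice here: the UI can keep the partial text and offer a retry instead of silently repeating output the user has already seen.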