Streaming chat completions

Stream chat completions with abliteration.ai. Set stream: true and iterate over delta chunks.

Updated 2025-12-30

Streaming reduces time-to-first-token and delivers partial output as it is generated.

Use the OpenAI SDK with stream: true and iterate over chunks to render tokens immediately.

Streaming is ideal for chat UIs, typing indicators, and long-form generation where early feedback matters.

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.ABLIT_KEY,
  baseURL: "https://api.abliteration.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "abliterated-model",
  messages: [{ role: "user", content: "Write a short haiku about the ocean." }],
  stream: true,
});

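// Each chunk carries the next fragment of the reply in choices[0].delta.content.
// Some chunks have no text, so fall back to an empty string.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}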

When to stream

Stream when you want lower perceived latency or need to show partial output while the rest of the response is still being generated.

How streaming works

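The response is sent as a series of chunks. Each chunk carries a delta, the next fragment of the assistant message, in choices[0].delta.content. Append the deltas in the order they arrive to build the final message.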
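The accumulation step can be sketched without a network call. This is a minimal illustration, not the SDK's own implementation: assemble_message and fake_chunk are hypothetical helpers, and the simulated chunks only mimic the choices[0].delta.content shape used in the examples on this page.

```python
from types import SimpleNamespace


def assemble_message(chunks):
    """Concatenate the delta content of every chunk into the final message text."""
    parts = []
    for chunk in chunks:
        # Skip chunks that carry no choices at all.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        # Skip chunks whose delta has no text (content is None or empty).
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in object shaped like a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream: three text fragments and one chunk without text.
simulated_stream = [
    fake_chunk("Waves "),
    fake_chunk(None),
    fake_chunk("fold "),
    fake_chunk("softly."),
]

final_text = assemble_message(simulated_stream)
print(final_text)
```

In a real client, the same loop would run over the stream object returned by the SDK, typically rendering each fragment as it arrives and keeping the joined text for storage or logging.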

Python streaming example

The Python SDK yields chunks you can iterate over. Append delta content as it arrives.

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.abliteration.ai/v1",
    # Read the key from the environment instead of hardcoding it.
    api_key=os.environ["ABLIT_KEY"],
)

stream = client.chat.completions.create(
    model="abliterated-model",
    messages=[{"role": "user", "content": "Write a short haiku about the ocean."}],
    stream=True,
)

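for chunk in stream:
    # Skip chunks that carry no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    # flush=True writes each fragment immediately instead of waiting for a newline.
    print(delta, end="", flush=True)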

UI and reliability tips

Streaming is best-effort over long-lived HTTP connections, so plan for reconnects and graceful fallbacks.
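One way to plan for reconnects and fallbacks is sketched below, under stated assumptions. stream_with_fallback and its parameters are hypothetical names, not part of any SDK. The sketch catches Python's built-in ConnectionError so that it runs without a network; with the OpenAI SDK you would likely catch the SDK's own connection error type instead. On each failure it clears the partial output, waits with exponential backoff, and retries. After the last attempt it falls back to a single non-streaming request.

```python
import time


def stream_with_fallback(start_stream, fetch_full, on_text, on_reset,
                         max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Try streaming up to max_attempts times, then fall back to one full response.

    start_stream: callable returning an iterable of text fragments (hypothetical wrapper).
    fetch_full:   callable returning the complete text in one call (hypothetical wrapper).
    on_text:      callback that displays a piece of text.
    on_reset:     callback that clears partially displayed text before a retry.
    """
    for attempt in range(max_attempts):
        try:
            pieces = []
            for piece in start_stream():
                pieces.append(piece)
                on_text(piece)
            return "".join(pieces)
        except ConnectionError:
            # Discard partial output so a retry does not duplicate text.
            on_reset()
            if attempt < max_attempts - 1:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    # Every streaming attempt failed: degrade to a non-streaming request.
    text = fetch_full()
    on_text(text)
    return text


# Demo with a simulated stream that drops the connection on its first attempt.
calls = {"n": 0}


def flaky_stream():
    calls["n"] += 1
    yield "Hello, "
    if calls["n"] == 1:
        raise ConnectionError("simulated drop")
    yield "world."


shown = []
result = stream_with_fallback(
    flaky_stream,
    lambda: "Hello, world.",
    shown.append,
    shown.clear,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result)
```

Restarting from scratch keeps the sketch simple. A UI that cannot afford to clear partial text could keep it on screen and show a retry indicator instead.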