How to Stream Perplexity API Responses With Server-Sent Events

Streaming Perplexity API responses using Server-Sent Events allows your application to display real-time answers as they are generated instead of waiting for the full response. SSE is a standard protocol that sends data from the server to the client over a single HTTP connection. This article explains how to configure your API request to enable streaming and how to handle the incoming event stream in JavaScript and Python. You will learn the exact parameters, event types, and code patterns required for a working implementation.

Key Takeaways: Streaming Perplexity API With Server-Sent Events

  • API request parameter stream: true: Enables Server-Sent Events mode for real-time token delivery.
  • Event types text_chunk and done: Parse these events to capture partial answer text and detect stream completion.
  • SSE clients (JavaScript EventSource, or Python requests with stream=True): EventSource parses events for you but cannot send custom headers, so it needs a backend proxy; requests exposes the raw lines, which you parse with a few lines of code.

Understanding Server-Sent Events for the Perplexity API

Server-Sent Events is a web standard that lets a server push data to a client over a long-lived HTTP connection. Unlike WebSockets, SSE is unidirectional: once the connection is open, only the server sends data. The Perplexity API supports SSE when you set the stream parameter to true in your request body. Instead of receiving a single JSON response, the client receives multiple events, each containing a fragment of the generated text. This approach reduces perceived latency because users see the answer appear piece by piece instead of waiting for the whole response.

The SSE stream uses the standard text/event-stream content type. Each event has a data field and an optional event field. Perplexity emits two event types: text_chunk for each piece of generated text and done to signal the end of the stream. The data field for text_chunk events contains a JSON object with the partial answer, while the done event includes the full answer and metadata.
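
To make the wire format concrete, here is a minimal sketch of how SSE lines group into events: a blank line ends an event, and the event and data fields are read from the lines before it. The event names follow the description above. The helper names (SSEEvent, parse_sse), the sample lines, and the "text" key in the done payload are assumptions for illustration, so check the official API reference for the exact payload shape.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class SSEEvent(NamedTuple):
    event: str  # value of the "event:" field; "message" when the field is absent
    data: str   # all "data:" lines of the event, joined with newlines


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Group already-decoded SSE lines into events; a blank line ends an event."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:  # events without data are not dispatched
                yield SSEEvent(event_type, "\n".join(data_lines))
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format allows one optional space after the colon
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)


# Hypothetical sample stream; the payload keys are assumptions for illustration.
sample_lines = [
    "event: text_chunk",
    'data: {"text": "Hello"}',
    "",
    "event: text_chunk",
    'data: {"text": ", world"}',
    "",
    "event: done",
    'data: {"text": "Hello, world"}',
    "",
]

events = list(parse_sse(sample_lines))
answer = "".join(json.loads(e.data)["text"] for e in events if e.event == "text_chunk")
print(answer)  # prints: Hello, world
```

An event that is not followed by a blank line is never yielded, which matches how SSE clients discard an incomplete trailing event.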

What You Need Before You Start

To stream responses from the Perplexity API, you need a valid API key from the Perplexity developer portal. You also need a development environment that can make HTTP requests and parse SSE streams. This guide covers JavaScript in the browser and Python 3.7 or later. The browser approach needs no additional libraries, while the Python approach uses the third-party requests library. You may also use an EventSource polyfill for older browsers or the sseclient library in Python for convenience.

Steps to Enable Streaming and Parse Server-Sent Events

The following steps show how to set the stream parameter and handle the incoming events in JavaScript and Python. Both methods follow the same logical flow: open a connection, listen for events, process each text chunk, and close the stream when the done event arrives.

Method 1: JavaScript in the Browser

  1. Create an EventSource object
    Use the EventSource constructor to open the stream. EventSource only issues GET requests and cannot set custom headers, so it cannot send the Authorization header itself. Point it at your own backend proxy, which adds the API key and forwards the request to the Perplexity API. Avoid putting the API key in a URL or in client-side code, where it is visible to users and can end up in logs. Alternatively, use the fetch API with a ReadableStream for full control over headers and the request body.
  2. Listen for the text_chunk event
    Attach an event listener to the EventSource object for the text_chunk event. Inside the callback, parse the event.data property as JSON. Extract the text field from the parsed object and append it to the display element in your UI. Each event contains a small piece of the answer.
  3. Listen for the done event
    Add a second event listener for the done event. When this event fires, parse the JSON data to retrieve the full answer and any metadata such as usage statistics or citations. After processing, call eventSource.close() to terminate the connection.
  4. Handle errors
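    Use the onerror callback on the EventSource object to detect connection failures or server errors. The readyState property indicates the connection status. By default, EventSource tries to reconnect on its own after a dropped connection. To control retries yourself, call eventSource.close() in the error handler and create a new EventSource after a delay.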

Method 2: Python With the Requests Library

  1. Send a POST request with stream enabled
    Use the requests.post method with the stream=True parameter. Set the Authorization header to Bearer YOUR_API_KEY. Include the stream: true field in the JSON body along with the model name and prompt.
  2. Iterate over the response lines
    Use response.iter_lines() to process each line of the SSE stream. The method yields bytes by default, so decode each line as UTF-8 before inspecting it. Each line that starts with data: contains a JSON payload. Skip empty lines, and skip lines that start with event: if you prefer to read the event type from the data payload.
  3. Parse each data line
    Strip the data: prefix and parse the remaining string as JSON. Check the type field of the parsed object. If the type is text_chunk, extract the text field and yield or print it. If the type is done, break the loop and store the final answer and metadata.
  4. Close the response
    After the loop ends, call response.close() to free the connection. Alternatively, use a context manager with the with statement to ensure the response is closed automatically.
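
The four steps above can be sketched as follows. This is a minimal sketch, not a definitive client: the function names (consume_stream, stream_answer) are hypothetical, the type and text fields follow the description in this article, and the endpoint URL and request body are passed in by the caller because their exact shape should be taken from the official API reference.

```python
import json
from typing import Any, Callable, Dict, Iterable, Optional


def consume_stream(
    lines: Iterable[bytes], on_chunk: Callable[[str], None]
) -> Optional[Dict[str, Any]]:
    """Pass each text fragment to on_chunk; return the 'done' payload, or None if it never arrived."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, "event:" lines, and comments
        try:
            payload = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        if not isinstance(payload, dict):
            continue
        kind = payload.get("type")  # field name as described in this article
        if kind == "text_chunk":
            on_chunk(payload.get("text", ""))
        elif kind == "done":
            return payload
    return None  # the stream ended without a completion event


def stream_answer(
    url: str,
    api_key: str,
    body: Dict[str, Any],
    on_chunk: Callable[[str], None],
    timeout: float = 60.0,
) -> Optional[Dict[str, Any]]:
    """POST the request with streaming enabled and consume the SSE lines."""
    import requests  # third-party; imported here so consume_stream works without it

    headers = {"Authorization": f"Bearer {api_key}", "Accept": "text/event-stream"}
    # The context manager closes the connection even if parsing raises.
    with requests.post(
        url,
        headers=headers,
        json={**body, "stream": True},
        stream=True,
        timeout=timeout,
    ) as response:
        response.raise_for_status()
        return consume_stream(response.iter_lines(), on_chunk)
```

A call might look like stream_answer(API_URL, API_KEY, request_body, lambda text: print(text, end="", flush=True)), where API_URL, API_KEY, and request_body are placeholders you fill in from the API reference. A return value of None means the stream closed before the done event, so the printed text may be incomplete.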

Common Issues and Limitations When Streaming

EventSource Does Not Support Custom Headers

The browser EventSource API does not allow setting custom HTTP headers, such as Authorization. To work around this, you must proxy the request through your own backend server that adds the API key. Alternatively, use the fetch API with a ReadableStream and parse the SSE manually. The fetch approach gives you full control over headers and request body.
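
Manual parsing mostly comes down to buffering: raw chunks can end in the middle of a line, an event, or even a multi-byte character, so the parser has to hold incomplete data until the rest arrives. The logic is the same whether the chunks come from a fetch reader in the browser or from response.iter_content() in Python. It is sketched here in Python to match the other examples in this guide. The class name SSEChunkDecoder and the sample chunks are hypothetical.

```python
import codecs
from typing import List


class SSEChunkDecoder:
    """Turn arbitrarily split byte chunks into complete SSE data payloads."""

    def __init__(self) -> None:
        # The incremental decoder holds back bytes of a split UTF-8 character.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._buffer = ""

    def feed(self, chunk: bytes) -> List[str]:
        """Add one chunk and return the data payloads of events completed so far."""
        text = self._buffer + self._decoder.decode(chunk)
        # Normalise CRLF line endings; bare CR endings are not handled in this sketch.
        text = text.replace("\r\n", "\n")
        # Everything before the last blank line is complete; keep the remainder.
        *blocks, self._buffer = text.split("\n\n")
        payloads = []
        for block in blocks:
            data_lines = []
            for line in block.split("\n"):
                if line.startswith("data:"):
                    value = line[len("data:"):]
                    data_lines.append(value[1:] if value.startswith(" ") else value)
            if data_lines:
                payloads.append("\n".join(data_lines))
        return payloads


# Hypothetical chunks: the first event is split mid-line and mid-character.
chunks = [
    b'data: {"text": "caf',
    b"\xc3",
    b'\xa9"}\n',
    b'\ndata: {"text": "!"}\n\n',
]

decoder = SSEChunkDecoder()
received = []
for chunk in chunks:
    received.extend(decoder.feed(chunk))
```

Each returned payload is a complete JSON string that is safe to pass to a JSON parser, which is the property a hand-written fetch reader loop also needs to guarantee.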

Streaming Stops Before the Answer Is Complete

If the connection drops or the server times out, the stream may end prematurely. Treat any stream that closes without a done event, or that triggers the onerror callback, as incomplete. In Python, catch requests.exceptions.ChunkedEncodingError and requests.exceptions.ConnectionError, then retry the request with the same prompt. The Perplexity API does not support resuming a partial stream, so you must restart the request and discard the partial text from the failed attempt.
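
One way to structure the restart logic is sketched below, under stated assumptions: run_stream is a hypothetical callable you supply that performs one full streaming request, passes each text fragment to the callback it receives, and returns True only if the done event arrived. The retry count and backoff delays are illustrative defaults, not recommended values.

```python
import time
from typing import Callable, List, Tuple, Type


class IncompleteStream(Exception):
    """Raised when no attempt produced a complete stream."""


def collect_with_restart(
    run_stream: Callable[[Callable[[str], None]], bool],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Restart the whole request until one attempt finishes, then return its text."""
    for attempt in range(1, max_attempts + 1):
        parts: List[str] = []  # fresh buffer: partial output from a failed attempt is dropped
        try:
            finished = run_stream(parts.append)
        except retry_on:
            finished = False  # connection failed mid-stream
        if finished:
            return "".join(parts)
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff: 1s, 2s, 4s, ...
    raise IncompleteStream(f"stream did not complete after {max_attempts} attempts")
```

When the request is made with requests, pass retry_on=(requests.exceptions.ChunkedEncodingError, requests.exceptions.ConnectionError), since those exceptions do not derive from the built-in ConnectionError used as the default here.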

Events Arrive Out of Order

SSE events are delivered in the order the server sends them. Individual network packets may arrive out of order, but TCP reassembles them before your code sees the data, so the event stream itself preserves order. If you observe out-of-order text, verify that you are not processing events from multiple concurrent connections. Each stream is independent and sequential.

Perplexity API Streaming vs Non-Streaming Response

Item | Streaming (SSE) | Non-Streaming
Response delivery | One event per token or small chunk | Single JSON response after full generation
Perceived latency | Low: user sees text appear immediately | Higher: user waits for the complete answer
Connection duration | Open until all events are sent | Closed after the single response
Error handling | Partial output may be lost on failure | No partial output; retry the whole request
Client complexity | Requires event parsing and reconnection logic | Simple: parse one JSON object

Streaming is best for chat interfaces and real-time displays. Non-streaming is simpler for batch processing or server-side aggregation where latency is not critical.

Conclusion

You can now stream Perplexity API responses using Server-Sent Events by setting stream: true in your request and parsing the text_chunk and done events. In JavaScript, use the EventSource API or fetch with a ReadableStream for full header control. In Python, use the requests library with stream=True and iterate over lines. As a next step, implement reconnection logic to handle network interruptions gracefully. For advanced use, combine streaming with the citation parameter to receive citation data in the done event.
