How to Switch Perplexity API Model in a Single Request

Perplexity offers several AI models through its API, including llama-3.1-sonar-small-128k-online, llama-3.1-sonar-large-128k-online, and llama-3.1-sonar-huge-128k-online. The models differ in speed, cost, and answer quality, so you may want to switch between them for different tasks, such as quick lookups versus deep research. This article explains how to set the model parameter on a per-request basis without changing the rest of your configuration.

Key Takeaways: Switching Perplexity API Models Per Request

  • API request parameter model: Set the model field in your JSON payload to switch models for each call.
  • Available model IDs: Use exact strings like llama-3.1-sonar-small-128k-online for small, llama-3.1-sonar-large-128k-online for large, and llama-3.1-sonar-huge-128k-online for huge.
  • No need to re-authenticate: The API key stays the same across model switches; only the request body changes.


Understanding the Perplexity API Model Parameter

The Perplexity API uses a RESTful interface where each request is independent. This means you can change the model in every request without affecting previous or subsequent calls. The model is specified in the JSON body of the POST request to the chat completions endpoint.

The API endpoint is https://api.perplexity.ai/chat/completions. You send a POST request with headers including your API key and Content-Type, and a body containing the model name, messages, and optional parameters like temperature or max_tokens. The model field is a string that matches one of the supported model IDs.

Perplexity currently offers three main online models. The small model is fastest and cheapest, suitable for simple Q&A. The large model balances speed and quality for general use. The huge model provides the most thorough answers but costs more and responds slower. You can switch between them per request based on the complexity of each question.

There is no need to create multiple API clients or maintain separate sessions. Each request is stateless. The model you choose only applies to that single response. This flexibility lets you optimize cost and performance dynamically.
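The stateless per-request pattern can be sketched in Python. The helper names `build_payload` and `ask` are illustrative, not part of any Perplexity SDK; only the endpoint, headers, and JSON shape come from the API described above.

```python
API_URL = "https://api.perplexity.ai/chat/completions"

def build_payload(model, question):
    """Assemble the JSON body for one chat-completions call.
    Only the "model" field changes between requests."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

def ask(api_key, model, question):
    """Send a single stateless request with the chosen model."""
    import requests  # imported here so build_payload stays dependency-free
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    resp = requests.post(API_URL, json=build_payload(model, question), headers=headers)
    resp.raise_for_status()
    return resp.json()

# Back-to-back calls can use different models with the same key:
# ask(key, "llama-3.1-sonar-small-128k-online", "Define entropy.")
# ask(key, "llama-3.1-sonar-huge-128k-online", "Compare entropy formulations.")
```

Because each call builds its own payload, nothing carries over between requests except the API key.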

Steps to Specify the Model in a Single API Request

Follow these steps to send a request with a specific model. You can use any programming language that supports HTTP requests. The examples below use Python with the requests library, but the same JSON structure works with cURL, JavaScript, or other tools.

Method 1: Using cURL from the Command Line

  1. Prepare your API key
    Get your API key from the Perplexity API dashboard at https://www.perplexity.ai/settings/api. Copy the key and keep it secure.
  2. Write the cURL command
    Use the following template. Replace YOUR_API_KEY with your actual key and set the model to llama-3.1-sonar-large-128k-online for this example.
    curl -X POST "https://api.perplexity.ai/chat/completions" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "llama-3.1-sonar-large-128k-online", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
  3. Execute the command
    Run the command in your terminal. The API returns a JSON response with the model’s answer. The response includes the model field confirming which model was used.
  4. Change the model for the next request
    To switch to the huge model, replace the model string with llama-3.1-sonar-huge-128k-online and send the request again.

Method 2: Using Python with the Requests Library

  1. Install the requests library
    If you don’t have it, run pip install requests in your terminal.
  2. Write the Python script
    Create a file named perplexity_request.py with the following content:
    import requests

    url = "https://api.perplexity.ai/chat/completions"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }
    payload = {
        "model": "llama-3.1-sonar-small-128k-online",
        "messages": [
            {"role": "user", "content": "Explain quantum computing in simple terms."}
        ]
    }
    response = requests.post(url, json=payload, headers=headers)
    print(response.json()["choices"][0]["message"]["content"])

  3. Run the script
    Execute python perplexity_request.py. The script prints the model’s answer. To use a different model, change the value of the model key in the payload dictionary.

Method 3: Using JavaScript with Fetch (Node.js or Browser)

  1. Set up the fetch call
    Use the following code in Node.js (version 18 and later ships a built-in fetch; older versions need the node-fetch package) or directly in the browser:
    const url = "https://api.perplexity.ai/chat/completions";
    const headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    };
    const body = JSON.stringify({
        model: "llama-3.1-sonar-huge-128k-online",
        messages: [{ role: "user", content: "What are the benefits of renewable energy?" }]
    });
    fetch(url, { method: "POST", headers, body })
        .then(res => res.json())
        .then(data => console.log(data.choices[0].message.content));
  2. Run the code
    Execute the script with Node.js. The console displays the response. Change the model string to switch models in subsequent calls.


Common Mistakes When Switching Models

The API Returns a 400 Error for Unknown Model Names

If you receive a 400 Bad Request error, the model name is likely incorrect. Double-check the spelling and ensure you use the exact model ID from the Perplexity documentation. Common mistakes include typos like ‘llama-3.1-sonar-large’ missing the context length suffix or using outdated model names.

Model Switch Does Not Affect the Response Quality

If you switch models but see no difference in answers, verify that the model field in the request body is being sent correctly. Some API clients may cache the previous request or ignore the model field if it’s placed incorrectly. Always print or log the full response to confirm the model field returned by the API matches what you requested.
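One way to perform that check programmatically is to compare the model echoed in the response against the one you requested. The helper `confirm_model` is hypothetical, and the sample dict only mimics the response shape described above.

```python
def confirm_model(response_json, requested_model):
    """Return True when the API echoed back the model we asked for.
    A mismatch suggests the client dropped or cached the model field."""
    return response_json.get("model") == requested_model

# A response shaped like the one the chat-completions endpoint returns:
sample = {"model": "llama-3.1-sonar-huge-128k-online", "choices": []}
assert confirm_model(sample, "llama-3.1-sonar-huge-128k-online")
assert not confirm_model(sample, "llama-3.1-sonar-small-128k-online")
```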

Rate Limits Are Reached After Switching Models

Each model has its own rate limit based on your subscription plan. Switching to a larger model may consume more tokens per request, causing you to hit your rate limit sooner. Check your plan limits and adjust the model selection accordingly. You can also implement retry logic with exponential backoff.
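A retry loop with exponential backoff can be sketched as below. `RuntimeError` stands in for whatever exception your HTTP client raises on a rate-limit response (with the requests library you would typically check for status code 429 yourself); the function and parameter names are illustrative.

```python
import time

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based): base * 2^attempt, capped."""
    return min(base * (2 ** attempt), cap)

def post_with_retries(send, attempts=4, base=1.0):
    """Call `send()` (any zero-argument function doing the HTTP POST).
    On a rate-limit error, sleep with exponential backoff and retry."""
    for attempt in range(attempts):
        try:
            return send()
        except RuntimeError:  # stand-in for your client's rate-limit error
            if attempt == attempts - 1:
                raise
            time.sleep(backoff_delay(attempt, base))
```

Wrapping the actual POST in a closure keeps the retry logic independent of which model the request uses.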

Perplexity API Models: Small vs Large vs Huge

In the comparison below, Small, Large, and Huge refer to llama-3.1-sonar-small-128k-online, llama-3.1-sonar-large-128k-online, and llama-3.1-sonar-huge-128k-online respectively.

  • Context window: 128k tokens for all three models.
  • Speed: Small is fastest, Large is balanced, Huge is slowest.
  • Cost per 1M input tokens: $0.20 (Small), $1.00 (Large), $5.00 (Huge).
  • Cost per 1M output tokens: $0.80 (Small), $4.00 (Large), $20.00 (Huge).
  • Best use case: simple Q&A and quick definitions (Small); general research and summaries (Large); deep analysis and complex reasoning (Huge).
  • Availability: all plans (Small and Large); Pro and Enterprise plans (Huge).

Each model supports the same online search capability. The difference lies in the underlying neural network size and the resulting answer depth. The small model produces concise answers, while the huge model generates detailed explanations with more citations. Choose based on the trade-off between cost and required response quality.

You can switch models per request without any additional setup. This lets you use the small model for routine queries and the huge model for critical research within the same application. The API key and authentication remain unchanged across all model switches.
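This routing idea can be sketched with a simple heuristic. The word-count threshold and the `choose_model` helper are assumptions for illustration, not anything the API provides; a real application might route on query type, user tier, or cost budget instead.

```python
SMALL = "llama-3.1-sonar-small-128k-online"
HUGE = "llama-3.1-sonar-huge-128k-online"

def choose_model(question, word_threshold=20):
    """Crude routing heuristic: short questions go to the cheap small
    model, longer ones to the huge model. The threshold is arbitrary."""
    return HUGE if len(question.split()) > word_threshold else SMALL

# Each request then just uses the chosen string in its "model" field:
# model = choose_model(user_question)
```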
