You want the Perplexity API to only search or retrieve data from certain websites, not the entire internet. This is useful for internal knowledge bases, trusted news sources, or curated content sets. The Perplexity API does not offer a built-in domain whitelist filter like some search engines do. This article explains how to achieve domain restriction by combining the API with a custom Python script that validates sources before returning results.
Key Takeaways: Domain Restriction for Perplexity API
- Python script with urllib.parse: Extracts the domain from each API result URL and compares it against your allowed list.
- Allowed domains list: A simple Python list or set you define at the start of your script with the exact domains you trust.
- Response filtering loop: Iterates through API results, keeps only those matching your allowed domains, and returns the filtered list.
Why the Perplexity API Does Not Have a Built-In Domain Filter
The Perplexity API is designed as a general-purpose search and answer engine. It returns the most relevant sources across the entire web based on your query. The API does not include a parameter like site:example.com that you can pass in the request body. This is a deliberate design choice to maintain simplicity and broad utility.
However, many business use cases require restricting results to specific domains. For example, a legal research tool may only want results from government websites. A customer support bot may only want results from your company’s help center. Without a native filter, you must implement domain restriction on the client side. This means you write code that receives the API response, inspects the source URLs, and discards any that do not match your allowed list.
What You Need Before You Start
You need a Perplexity API key with an active subscription, Python 3.7 or later installed on your machine, and the requests library, which you can install with pip install requests. You also need a list of allowed domains. Write each one exactly as it appears in URLs, such as example.com or support.microsoft.com. Do not include the protocol or a trailing slash.
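Allowlist entries pasted from a browser often carry a protocol, a www. prefix, or a trailing slash. As a minimal sketch, the helper below reduces each entry to a bare host name before it goes into the set. The name normalize_allowed_domain and the sample entries are illustrative assumptions, not part of the Perplexity API.

```python
from urllib.parse import urlparse


def normalize_allowed_domain(entry: str) -> str:
    """Reduce an allowlist entry to a bare lowercase host name."""
    entry = entry.strip().lower()
    # urlparse only fills in hostname when "//" is present,
    # so prepend it to bare entries such as "example.com/".
    if "//" not in entry:
        entry = "//" + entry
    host = urlparse(entry).hostname or ""
    # Drop a leading "www." so entries match hosts that had it stripped.
    if host.startswith("www."):
        host = host[4:]
    return host


# Hypothetical raw entries, cleaned into the set used for filtering.
ALLOWED_DOMAINS = {
    normalize_allowed_domain(d)
    for d in ["https://Example.com/", "support.microsoft.com", "www.bbc.com"]
}
ALLOWED_DOMAINS.discard("")  # ignore entries that had no usable host
print(sorted(ALLOWED_DOMAINS))
```

Normalizing once at startup keeps the later membership check a plain set lookup.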
Steps to Build a Python Script That Filters API Results by Domain
The following steps create a script that sends a query to the Perplexity API, receives the response, and keeps only those results whose domain is in your allowed list.
- Create a new Python file
Open your code editor and create a file named perplexity_domain_filter.py.
- Import required libraries
Add import requests and from urllib.parse import urlparse at the top of the file.
- Define your allowed domains
Create a Python set with the exact domains you allow. Example: ALLOWED_DOMAINS = {"wikipedia.org", "bbc.com", "reuters.com"}
- Set your API key and endpoint
Define API_KEY = "your-key-here" and API_URL = "https://api.perplexity.ai/search". Replace your-key-here with your actual key.
- Define the search function
Create a function named search_perplexity(query). Inside, build the request headers and payload. The headers should include "Authorization": f"Bearer {API_KEY}" and "Content-Type": "application/json". The payload should be a Python dictionary with at least the key "query" set to the query string.
- Send the request and parse the response
Use requests.post(API_URL, headers=headers, json=payload). Check that response.status_code == 200. Parse the JSON with data = response.json().
- Extract sources from the response
The API response typically contains a key like "sources" or "results". Iterate over each source item. For each source, extract the URL using source.get("url", "").
- Parse the domain from each URL
Use parsed = urlparse(url) to get the netloc. Then extract the domain with domain = parsed.netloc.lower(). Remove any www. prefix if present.
- Check the domain against your allowed list
Write an if statement: if domain in ALLOWED_DOMAINS:. If true, append the source to a new list called filtered_sources. If false, skip the source.
- Return only the filtered results
After the loop, return or print filtered_sources. Your script now returns only results from your allowed domains.
Complete Example Script
Here is the full script you can copy and adapt:
import requests
from urllib.parse import urlparse

# Domains you trust, written without protocol, "www." prefix, or trailing slash.
ALLOWED_DOMAINS = {"wikipedia.org", "bbc.com", "reuters.com"}
API_KEY = "your-api-key-here"
API_URL = "https://api.perplexity.ai/search"


def search_perplexity(query):
    """Query the API and return only the sources from allowed domains."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    payload = {
        "query": query,
        "model": "sonar-pro",
    }
    # A timeout keeps the script from hanging if the API does not respond.
    response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    if response.status_code != 200:
        return []
    data = response.json()
    # The key that holds the sources may be "sources" or "results".
    sources = data.get("sources") or data.get("results") or []
    filtered = []
    for source in sources:
        url = source.get("url", "")
        if not url:
            continue
        parsed = urlparse(url)
        domain = parsed.netloc.lower()
        # Strip a leading "www." so www.bbc.com matches bbc.com.
        if domain.startswith("www."):
            domain = domain[4:]
        if domain in ALLOWED_DOMAINS:
            filtered.append(source)
    return filtered


results = search_perplexity("latest technology news")
print(results)
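To check the filtering logic without spending an API call, it can help to split it into pure functions that work on plain dictionaries. The sketch below is one way to do that. The names extract_domain and filter_sources and the sample data are hypothetical. It reads the standard hostname attribute of the urlparse result instead of netloc, because hostname omits any port or user info and is already lowercased.

```python
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the lowercase host of a URL without a leading "www."."""
    # hostname is None for strings that have no network location.
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def filter_sources(sources, allowed_domains):
    """Keep only the source dicts whose URL host is in allowed_domains."""
    return [
        s for s in sources
        if extract_domain(s.get("url", "")) in allowed_domains
    ]


# Hypothetical sample data standing in for a parsed API response.
sample = [
    {"url": "https://www.bbc.com/news/technology"},
    {"url": "https://example.org/post"},
    {"url": "https://reuters.com:443/world"},
    {"title": "no url here"},
]
kept = filter_sources(sample, {"bbc.com", "reuters.com"})
print([s["url"] for s in kept])
```

With this split, the network call and the filtering can be tested separately, and the port in the third sample URL no longer causes a false rejection.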
Common Issues With Domain Filtering and Their Solutions
The API Response Does Not Include URLs
Some Perplexity API endpoints return only an answer text with no separate sources list. Check your API documentation to confirm you are using an endpoint that returns source URLs. If not, switch to the /search endpoint or the /chat/completions endpoint with source citations enabled.
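Because the location of source URLs can differ between endpoints, a defensive extractor that looks in several places may be useful. The sketch below is an assumption-heavy illustration: the key names it checks ("sources", "results", "search_results", "citations") are guesses to verify against the response your endpoint actually returns, and collect_source_urls is a hypothetical helper name.

```python
def collect_source_urls(data: dict) -> list:
    """Gather source URLs from a parsed JSON response, whichever key holds them."""
    urls = []
    # Assumed key names; adjust to what your endpoint actually returns.
    for key in ("sources", "results", "search_results", "citations"):
        for item in data.get(key) or []:
            # Items may be dicts with a "url" field or plain URL strings.
            url = item.get("url", "") if isinstance(item, dict) else str(item)
            if url and url not in urls:
                urls.append(url)
    return urls


# Hypothetical response shape for illustration only.
example = {
    "results": [{"url": "https://b.example/page"}, {"title": "no url"}],
    "citations": ["https://a.example/x", "https://b.example/page"],
}
print(collect_source_urls(example))
```

If the function returns an empty list for a real response, the endpoint is probably not returning sources at all and the request needs to change.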
Subdomain Mismatches
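If you allow example.com but the result comes from blog.example.com, the strict check will reject it. To include all subdomains, compare the result's domain against each entry in your allowed list and accept it if it equals the entry or ends with a dot followed by the entry. For a single entry named allowed, the test is if domain == allowed or domain.endswith("." + allowed). The leading dot matters: without it, notexample.com would pass as a match for example.com.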
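A minimal sketch of that subdomain-aware check, applied across the whole allowed set, could look like this. The function name is_allowed is illustrative.

```python
def is_allowed(domain: str, allowed_domains) -> bool:
    """Return True if domain equals an allowed entry or is a subdomain of one."""
    # Normalize case and drop a trailing dot from fully qualified names.
    domain = domain.lower().rstrip(".")
    return any(
        domain == allowed or domain.endswith("." + allowed)
        for allowed in allowed_domains
    )


allowed = {"example.com", "bbc.com"}
print(is_allowed("blog.example.com", allowed))
print(is_allowed("notexample.com", allowed))
```

In the complete script, this call would take the place of the domain in ALLOWED_DOMAINS test. It checks every entry, so for very large lists it is slower than a set lookup.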
Rate Limits and Quotas
Filtering on the client side does not reduce your API call count. You still pay for every query, even if most results are discarded. To save costs, consider pre-filtering your query by including domain hints in the query text. For example, include site:nature.com in the query string. This is not guaranteed to work but can reduce irrelevant results.
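One way to add such hints is to build them from the same allowlist, as in the sketch below. The helper add_site_hints is hypothetical, and joining hints with OR follows a common search-engine convention that the API may or may not honor, so client-side filtering should still run on the results.

```python
def add_site_hints(query: str, domains) -> str:
    """Append site: hints for each domain to the query text."""
    # Sort so the same allowlist always yields the same query string.
    hints = " OR ".join(f"site:{d}" for d in sorted(domains))
    return f"{query} {hints}" if hints else query


print(add_site_hints("gene editing", {"nature.com"}))
```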
| Item | Client-Side Filtering | Query-Level Hints |
|---|---|---|
| Reliability | Always works | May be ignored by the API |
| Cost | Full API call cost | Same cost but fewer results discarded |
| Implementation effort | Medium, requires Python code | Low, just modify query string |
| Flexibility | High, supports any domain list | Low, depends on API behavior |
You can now restrict Perplexity API results to any set of domains using a short Python script. Next, consider adding a caching layer to avoid repeated API calls for the same query. An advanced tip is to use a Bloom filter for very large domain allowlists to keep memory usage low.
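As a rough sketch of the caching idea, the standard library's functools.lru_cache can wrap any search function that takes a query string. The wrapper name make_cached_search and the stand-in search function are assumptions for illustration. An in-memory cache like this is lost when the process exits and never expires entries on its own.

```python
from functools import lru_cache


def make_cached_search(search_fn, maxsize=256):
    """Wrap search_fn so repeated queries reuse the first result."""
    @lru_cache(maxsize=maxsize)
    def cached(query: str):
        # Return a tuple so callers cannot append to or reorder the cached result.
        return tuple(search_fn(query))
    return cached


# Stand-in for a real search function; it records each call it receives.
calls = []

def fake_search(query):
    calls.append(query)
    return [{"url": "https://bbc.com/" + query}]


cached_search = make_cached_search(fake_search)
cached_search("news")
cached_search("news")  # served from the cache, fake_search is not called again
print(len(calls))
```

In the script above, passing search_perplexity in place of fake_search would cache real results for the lifetime of the process.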