How to Restrict Perplexity API to Specific Domains
🔍 WiseChecker

You want the Perplexity API to search or retrieve data only from certain websites, not the entire internet. This is useful for internal knowledge bases, trusted news sources, or curated content sets. Depending on the endpoint you call, the API may offer no built-in domain allowlist, or one that does not fit your needs (recent API references describe a search_domain_filter request parameter, so check whether it covers your case). This article explains how to enforce domain restriction yourself by combining the API with a Python script that validates sources before returning results.

Key Takeaways: Domain Restriction for Perplexity API

  • Python script with urllib.parse: Extracts the domain from each API result URL and compares it against your allowed list.
  • Allowed domains list: A simple Python list or set you define at the start of your script with the exact domains you trust.
  • Response filtering loop: Iterates through API results, keeps only those matching your allowed domains, and returns the filtered list.

Why Filter Domains on the Client Side

The Perplexity API is designed as a general-purpose search and answer engine. By default it returns the most relevant sources from across the web for your query. Operators such as site:example.com belong in the query text rather than the request body, and the API does not guarantee that it honors them. Even where a domain filter parameter is available, checking sources in your own code gives you a guarantee that does not depend on API behavior.

However, many business use cases require restricting results to specific domains. For example, a legal research tool may only want results from government websites. A customer support bot may only want results from your company’s help center. When the API cannot enforce the restriction you need, you implement it on the client side: your code receives the API response, inspects the source URLs, and discards any that do not match your allowed list.

What You Need Before You Start

You need a Perplexity API key with active API access and Python 3.7 or later. You also need the requests library, which you can install with pip install requests. Finally, you need a list of allowed domains. Write each one as a bare host name, exactly as it appears in URLs, such as example.com or support.microsoft.com. Do not include the protocol (https://), a trailing slash, or a www. prefix, because the script strips www. from result URLs before comparing.
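If your allowlist comes from a config file or user input, entries may not follow that format. The following is a minimal sketch, assuming you want to accept entries like https://www.example.com/ and reduce them to the bare form the script compares against. normalize_domain_entry and build_allowlist are hypothetical helper names, not part of any library.

```python
from urllib.parse import urlparse

def normalize_domain_entry(entry: str) -> str:
    """Reduce an allowlist entry to a bare lowercase host without "www.".

    Accepts entries such as "https://www.BBC.com/" or " reuters.com " and
    returns "bbc.com" or "reuters.com". Returns "" if no host can be found.
    """
    entry = entry.strip().lower()
    # urlparse only fills in the host part when a scheme or leading "//"
    # is present, so add "//" to bare entries like "bbc.com".
    if "://" not in entry:
        entry = "//" + entry
    host = urlparse(entry).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host

def build_allowlist(entries):
    """Normalize every entry, drop blanks, and return a set for fast lookups."""
    return {d for d in (normalize_domain_entry(e) for e in entries) if d}

ALLOWED_DOMAINS = build_allowlist(["https://www.Wikipedia.org/", "bbc.com", " reuters.com "])
print(sorted(ALLOWED_DOMAINS))  # ['bbc.com', 'reuters.com', 'wikipedia.org']
```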

Steps to Build a Python Script That Filters API Results by Domain

The following steps create a script that sends a query to the Perplexity API, receives the response, and keeps only those results whose domain is in your allowed list.

  1. Create a new Python file
    Open your code editor and create a file named perplexity_domain_filter.py.
  2. Import required libraries
    Add import requests and from urllib.parse import urlparse at the top of the file.
  3. Define your allowed domains
    Create a Python set with the exact domains you allow. Example: ALLOWED_DOMAINS = {"wikipedia.org", "bbc.com", "reuters.com"}
  4. Set your API key and endpoint
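    Define API_KEY = "your-key-here" and API_URL = "https://api.perplexity.ai/search". Replace your-key-here with your actual key, or better, add import os and load the key with os.environ.get("PERPLEXITY_API_KEY") (or another environment variable name of your choice) so it stays out of your source code.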
  5. Define the search function
    Create a function named search_perplexity(query). Inside, build the request headers and payload. The headers should include "Authorization": f"Bearer {API_KEY}" and "Content-Type": "application/json". The payload should be a Python dictionary with at least the key "query" set to the query string.
  6. Send the request and parse the response
    Use requests.post(API_URL, headers=headers, json=payload, timeout=30) so a stalled connection cannot hang the script. Check that response.status_code == 200, then parse the JSON with data = response.json().
  7. Extract sources from the response
    Depending on the endpoint, the list of sources sits under a key such as "results" or "sources"; check the API reference for the exact name. Iterate over each source item and extract its URL with source.get("url", "").
  8. Parse the domain from each URL
    Use parsed = urlparse(url), then take the host with domain = parsed.hostname or "". Unlike parsed.netloc, hostname excludes any port or login details and is already lowercase. Remove any www. prefix if present.
  9. Check the domain against your allowed list
    Write an if statement: if domain in ALLOWED_DOMAINS:. If true, append the source to a new list called filtered_sources. If false, skip the source.
  10. Return only the filtered results
    After the loop, return or print filtered_sources. Your script now returns only results from your allowed domains.
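Steps 8 and 9 can be isolated into a small helper that is easy to test on its own. This is a minimal sketch; extract_domain is a hypothetical name. It shows why hostname is safer than netloc: netloc keeps the port and the original capitalization, so an exact set lookup would miss "www.BBC.com:443".

```python
from urllib.parse import urlparse

def extract_domain(url: str) -> str:
    """Return the lowercase host of url without a leading "www.", or "" if none.

    parsed.hostname is used instead of parsed.netloc because netloc can also
    carry a port ("bbc.com:443") or credentials ("user@bbc.com"), either of
    which would make an exact set lookup fail.
    """
    host = urlparse(url).hostname or ""  # hostname is already lowercased
    if host.startswith("www."):
        host = host[4:]
    return host

print(extract_domain("https://www.BBC.com:443/news"))   # bbc.com
print(urlparse("https://www.BBC.com:443/news").netloc)  # www.BBC.com:443
```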

Complete Example Script

Here is the full script you can copy and adapt:

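import os
import requests
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"wikipedia.org", "bbc.com", "reuters.com"}
# Set the PERPLEXITY_API_KEY environment variable, or replace the fallback value.
API_KEY = os.environ.get("PERPLEXITY_API_KEY", "your-api-key-here")
API_URL = "https://api.perplexity.ai/search"

def search_perplexity(query):
    """Send query to the API and return only sources from ALLOWED_DOMAINS."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    # Payload fields depend on the endpoint; check the current API reference.
    payload = {"query": query}
    try:
        response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    except requests.RequestException:
        return []  # network error or timeout
    if response.status_code != 200:
        return []
    data = response.json()
    # Depending on the endpoint, sources are listed under "results" or "sources".
    sources = data.get("results") or data.get("sources") or []
    filtered = []
    for source in sources:
        url = source.get("url", "")
        if not url:
            continue
        # hostname (unlike netloc) excludes any port or credentials and is lowercase.
        domain = urlparse(url).hostname or ""
        if domain.startswith("www."):
            domain = domain[4:]
        if domain in ALLOWED_DOMAINS:
            filtered.append(source)
    return filtered

if __name__ == "__main__":
    results = search_perplexity("latest technology news")
    print(results)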
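Because the script mixes the network call with the filtering, it is hard to check the filter without spending API calls. A minimal sketch, assuming you split the loop into a pure function (filter_sources is a hypothetical name) and feed it a hand-written response shaped like the "sources" list the script reads:

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"wikipedia.org", "bbc.com", "reuters.com"}

def filter_sources(sources, allowed=ALLOWED_DOMAINS):
    """Keep only source dicts whose "url" host (minus "www.") is in allowed."""
    kept = []
    for source in sources:
        host = urlparse(source.get("url", "")).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host in allowed:
            kept.append(source)
    return kept

# Hand-written test data; no network call or API key is needed.
fake_response = {
    "sources": [
        {"title": "A", "url": "https://www.bbc.com/news/technology"},
        {"title": "B", "url": "https://example.net/blog"},
        {"title": "C", "url": "https://en.wikipedia.org/wiki/Python"},
        {"title": "D"},  # no URL at all
    ]
}
print([s["title"] for s in filter_sources(fake_response["sources"])])  # ['A']
```

Note that en.wikipedia.org is dropped even though wikipedia.org is allowed: the exact match treats subdomains as different domains, which is the issue covered under Subdomain Mismatches.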

Common Issues With Domain Filtering and Their Solutions

The API Response Does Not Include URLs

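Some Perplexity API endpoints return mainly answer text, and the source URLs, if present, may sit under a different key than the script expects. Check the API reference to confirm which field of your endpoint's response holds source URLs. The /search endpoint returns a list of results with URLs, while the /chat/completions endpoint returns the answer along with the URLs it cites (in recent versions, fields such as citations or search_results). Adjust the key your script reads to match.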
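If you switch between endpoints, a tolerant helper can gather URLs from whichever field is present. This is a sketch under assumptions: the key names "results", "sources", "search_results" (lists of objects with a "url" field) and "citations" (a list of URL strings) are guesses to verify against the current API reference, and collect_urls is a hypothetical name.

```python
def collect_urls(data: dict) -> list:
    """Gather source URLs from a parsed JSON response of uncertain shape.

    Assumed shapes: lists of dicts with a "url" field under "results",
    "sources", or "search_results", and a list of URL strings under
    "citations". Unknown or malformed entries are skipped.
    """
    urls = []
    for key in ("results", "sources", "search_results"):
        for item in data.get(key) or []:
            if isinstance(item, dict) and item.get("url"):
                urls.append(item["url"])
    for item in data.get("citations") or []:
        if isinstance(item, str):
            urls.append(item)
    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(urls))

# Illustrative response; real field names may differ.
sample = {
    "citations": ["https://bbc.com/a", "https://reuters.com/b"],
    "search_results": [{"title": "A", "url": "https://bbc.com/a"}],
}
print(collect_urls(sample))  # ['https://bbc.com/a', 'https://reuters.com/b']
```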

Subdomain Mismatches

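If you allow example.com but a result comes from blog.example.com, the exact-match check rejects it. To accept all subdomains, change the check so a domain passes if it equals an allowed entry or ends with a dot followed by that entry: if any(domain == d or domain.endswith("." + d) for d in ALLOWED_DOMAINS):. The leading dot matters, because it stops look-alike domains such as evilexample.com from matching example.com.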
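A minimal sketch of that subdomain-aware check as a standalone function (is_allowed is a hypothetical name), including a few look-alike domains that should be rejected:

```python
ALLOWED_DOMAINS = {"example.com", "reuters.com"}

def is_allowed(domain: str, allowed=ALLOWED_DOMAINS) -> bool:
    """True if domain equals an allowed entry or is a subdomain of one.

    The "." prefix matters: "blog.example.com" ends with ".example.com",
    but "evilexample.com" does not, so look-alike domains are rejected.
    """
    domain = domain.lower().rstrip(".")  # tolerate a trailing root dot
    return any(domain == d or domain.endswith("." + d) for d in allowed)

for d in ["example.com", "blog.example.com", "evilexample.com", "example.com.attacker.net"]:
    print(d, is_allowed(d))
# example.com True
# blog.example.com True
# evilexample.com False
# example.com.attacker.net False
```

The any() loop is linear in the size of the allowlist. For large lists, you could instead check each parent suffix of the domain (blog.example.com, then example.com) against the set.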

API Costs and Quotas

Filtering on the client side does not reduce your API call count. You still pay for every query, even if most results are discarded. To get more usable results per call, consider adding domain hints to the query text, for example site:nature.com. The API is not guaranteed to honor these hints, but they can reduce the number of irrelevant results you throw away.
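A minimal sketch of adding such hints programmatically. add_site_hints is a hypothetical helper, and the "OR" syntax is a general search-engine convention, not something confirmed for the Perplexity API, so keep the client-side filter in place as well.

```python
def add_site_hints(query: str, domains) -> str:
    """Append site: hints for each allowed domain to the query text.

    This only changes the query string. Whether the API honors site: or OR
    operators is an assumption; results must still be filtered client-side.
    """
    hints = " OR ".join(f"site:{d}" for d in sorted(domains))
    return f"{query} ({hints})" if hints else query

print(add_site_hints("latest technology news", {"reuters.com", "bbc.com"}))
# latest technology news (site:bbc.com OR site:reuters.com)
```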

| Aspect | Client-side filtering | Query-level hints |
|---|---|---|
| Reliability | Always enforced by your code | May be ignored by the API |
| Cost | Full API call cost | Same cost, but fewer results discarded |
| Implementation effort | Medium: requires Python code | Low: just modify the query string |
| Flexibility | High: supports any domain list | Low: depends on API behavior |

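You can now restrict Perplexity API results to any set of domains using a short Python script. Next, consider adding a caching layer to avoid repeated API calls for the same query. For very large domain allowlists, a Bloom filter can keep memory usage low, but Bloom filters can report false positives, which for an allowlist means a domain that is not on the list may occasionally pass. Use one only if that trade-off is acceptable.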
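A caching layer can be as small as a dictionary with expiry times. The following is a minimal in-memory sketch; TTLCache and get_or_fetch are hypothetical names, and fetch stands for any function that takes a query and returns a list, such as search_perplexity from the script above. It is not thread-safe and does not persist between runs.

```python
import time

class TTLCache:
    """Tiny in-memory cache from query string to filtered results.

    Entries expire after ttl_seconds so results are eventually refreshed.
    Not thread-safe; a sketch rather than production code.
    """

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._store = {}  # query -> (time stored, results)

    def get_or_fetch(self, query, fetch):
        """Return cached results for query, or call fetch(query) and cache them."""
        now = time.monotonic()
        hit = self._store.get(query)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        results = fetch(query)
        # The example script returns [] on errors, so empty lists are not
        # cached; otherwise a single failure would be served until it expires.
        if results:
            self._store[query] = (now, results)
        return results

# Usage with the script's function (needs a valid API key and network access):
# cache = TTLCache(ttl_seconds=600)
# results = cache.get_or_fetch("latest technology news", search_perplexity)
```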
