Microsoft Copilot Agents: How They Discover Knowledge Sources

When you build a Copilot agent in Microsoft Copilot Studio, the agent needs to find the right information to answer user questions. Without a clear method for discovering knowledge sources, the agent may return outdated, incomplete, or incorrect answers. This article explains the technical process Copilot agents use to locate and pull data from Microsoft Graph, SharePoint, and other connected services. You will learn how the discovery pipeline works and how to configure it for accurate results.

Key Takeaways: Copilot Agent Knowledge Discovery Pipeline

  • Microsoft Copilot Studio > Create agent > Topics > Knowledge sources: Define which SharePoint sites, OneDrive folders, or web URLs the agent can query.
  • Microsoft Graph connectors > External connections: Index third-party data from services like ServiceNow or Salesforce into the Microsoft 365 search index.
  • Agent settings > Grounding > Limit to selected sources: Restrict the agent to specific knowledge sources to prevent irrelevant or unauthorized data from appearing in responses.

How Copilot Agents Discover Knowledge Sources

A Copilot agent does not search the internet by default, and it starts from its configured knowledge sources rather than from your whole Microsoft 365 tenant (unless grounding is left unrestricted, as covered in the troubleshooting section). Discovery runs as a pipeline with three main stages: indexing, retrieval, and grounding. In the indexing stage, the agent scans the defined sources and creates a searchable vector index. In the retrieval stage, when a user asks a question, the agent converts the query into an embedding and finds the most similar chunks in the index. In the grounding stage, the agent checks the retrieved content against the user’s permissions and the agent’s configuration to decide what to include in the response.
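
To make the indexing stage more concrete, here is a minimal Python sketch of how source text might be split into overlapping chunks before it is embedded. The article does not say how Copilot Studio chunks content, so the Chunk class, the chunk_text function, and the size and overlap values are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str    # hypothetical source identifier, e.g. a site URL
    position: int  # index of the chunk within its source
    text: str


def chunk_text(source, text, size=40, overlap=10):
    """Split text into overlapping word windows (sizes are assumed values)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for position, start in enumerate(range(0, len(words), step)):
        window = words[start:start + size]
        chunks.append(Chunk(source, position, " ".join(window)))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of a somewhat larger index.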

Knowledge Sources Supported by Copilot Agents

Copilot agents can discover knowledge from the following source types:

  • Microsoft Graph data: Emails, files, calendar events, Teams messages, and SharePoint sites that the user has access to. The agent uses the Microsoft Graph search API to find this data.
  • SharePoint sites and pages: Specific SharePoint site collections, document libraries, or pages you add to the agent. The agent indexes the content and uses it for grounding.
  • OneDrive folders: Personal OneDrive folders shared with the user. The agent respects sharing permissions.
  • Web URLs: Public or authenticated web pages you specify. The agent crawls these pages and indexes their text content.
  • Microsoft Graph connectors: External services like ServiceNow, Salesforce, Jira, or SQL databases. You configure a connector in the Microsoft 365 admin center, which indexes the external data into the Microsoft 365 search index.
  • Custom datasets: CSV, JSON, or PDF files uploaded directly to the agent in Copilot Studio. These files are stored in the agent’s internal storage.
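
For the Microsoft Graph and Graph connector source types above, the public Microsoft Graph search API accepts a POST to /search/query with a list of requests. The sketch below only builds the request body and sends nothing. Whether a Copilot agent issues exactly this request internally is not stated in this article, and the function name, default size, and connection IDs are assumptions for illustration.

```python
GRAPH_SEARCH_URL = "https://graph.microsoft.com/v1.0/search/query"


def build_search_request(query, entity_types, connection_ids=None, size=10):
    """Build a JSON-serializable body for the Microsoft Graph search API."""
    request = {
        "entityTypes": list(entity_types),   # e.g. ["driveItem"] or ["externalItem"]
        "query": {"queryString": query},
        "from": 0,
        "size": size,
    }
    if connection_ids:
        # Graph connector content is addressed through contentSources.
        request["contentSources"] = [
            f"/external/connections/{cid}" for cid in connection_ids
        ]
    return {"requests": [request]}


# Files in SharePoint and OneDrive are searched as driveItem entities.
files_body = build_search_request("travel policy", ["driveItem"])

# "servicenowkb" is a hypothetical connection ID.
external_body = build_search_request(
    "reset password", ["externalItem"], connection_ids=["servicenowkb"]
)
```

Sending the body requires an access token with the appropriate search permissions, which is omitted here.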

The Discovery Process in Detail

When a user sends a message to the agent, the following steps occur:

  1. Query parsing
    The agent uses a large language model to understand the user’s intent and extract key entities such as project names, dates, or document types.
  2. Source selection
    The agent checks its configured knowledge sources. If the agent has multiple sources, it applies a relevance scoring algorithm to pick the most likely sources for the query.
  3. Embedding generation
    The agent converts the query into a vector embedding using a text embedding model. This embedding represents the semantic meaning of the query.
  4. Vector search
    The agent searches the vector index of the selected sources for chunks of text with similar embeddings. The index is built during the indexing stage using Azure AI Search.
  5. Permission filtering
    The agent checks each retrieved chunk against the user’s access rights. If the user does not have read permission on a file, the agent discards that chunk.
  6. Grounding and response generation
    The agent passes the filtered chunks to the language model along with the user’s query. The model is instructed to generate its response from the grounded content rather than from its general training data.
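
Steps 3 through 6 can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the agent's actual implementation: the embedding is a simple word count instead of a text embedding model, the index is an in-memory list instead of Azure AI Search, and IndexedChunk, retrieve, and build_prompt are hypothetical names.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IndexedChunk:
    text: str
    source: str                                 # hypothetical source label
    readers: set = field(default_factory=set)   # users with read permission


def embed(text):
    """Toy embedding: word counts. A real agent would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, index, user, top_k=2):
    """Vector search first, then permission filtering, as in steps 4 and 5."""
    q = embed(query)
    scored = [(cosine(q, embed(c.text)), c) for c in index]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = [c for _, c in scored[:top_k]]
    return [c for c in candidates if user in c.readers]


def build_prompt(query, chunks):
    """Assemble the grounded prompt that would be passed to the language model."""
    context = "\n".join(f"[{c.source}] {c.text}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Because filtering happens after the top results are selected, a user with narrow permissions can receive fewer than top_k chunks. A production system might instead apply the permission filter inside the search query.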

Steps to Configure Knowledge Sources for a Copilot Agent

To ensure your agent discovers the correct sources, follow these steps in Copilot Studio.

Add SharePoint Sites as Knowledge Sources

  1. Open your agent in Copilot Studio
    Go to Copilot Studio and sign in with your work or school account. Select the agent you want to configure.
  2. Navigate to the Knowledge tab
    In the left navigation pane, click Knowledge. This opens the knowledge source management page.
  3. Add a SharePoint source
    Click Add knowledge source and select SharePoint. Paste the URL of the SharePoint site you want to include. You can add multiple sites by repeating this step.
  4. Save and sync
    Click Save. The agent starts indexing the SharePoint content. Indexing time depends on the amount of content. You can monitor progress in the Sync status column.

Add Web URLs as Knowledge Sources

  1. Open the Knowledge tab
    In Copilot Studio, go to your agent and click Knowledge.
  2. Select Public website
    Click Add knowledge source and choose Public website. Enter the full URL of the website you want to index. The agent will crawl all pages under that domain.
  3. Set the page limit
    In the settings, set the maximum number of pages to crawl. A limit of 50 pages is typical for a documentation site. Click Save.
  4. Review indexed content
    After syncing, you can view the indexed pages in the Knowledge tab. Remove any pages that contain irrelevant information.
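
The effect of a page limit on a same-domain crawl can be illustrated with a small breadth-first sketch. It walks an in-memory link graph rather than fetching real pages, and the crawl function and example URLs are hypothetical. The article does not describe the crawler Copilot Studio actually uses.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start, links, max_pages):
    """Breadth-first walk over same-domain links, stopping at max_pages."""
    domain = urlparse(start).netloc
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for nxt in links.get(url, []):
            # Skip links that leave the starting domain or were already queued.
            if urlparse(nxt).netloc == domain and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return visited


# Hypothetical site structure used only for illustration.
links = {
    "https://docs.example.com/": [
        "https://docs.example.com/a",
        "https://docs.example.com/b",
        "https://other.example.org/x",
    ],
    "https://docs.example.com/a": ["https://docs.example.com/c"],
}
pages = crawl("https://docs.example.com/", links, max_pages=3)
```

With a breadth-first order, a low limit favors pages close to the start URL, so deeply nested pages are the first to be left out of the index.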

Configure Microsoft Graph Connectors for External Data

  1. Go to Microsoft 365 admin center
    Open the Microsoft 365 admin center and sign in with an account that has the Global Administrator or Search Administrator role.
  2. Open Microsoft Search
    In the left navigation, expand Settings and select Search & intelligence. Then click Data sources.
  3. Add a Graph connector
    Click Add connector and select the service you want to connect, such as ServiceNow or Salesforce. Follow the connector wizard to authenticate and configure the data source.
  4. Assign the connector to your agent
    In Copilot Studio, go to the Knowledge tab and click Add knowledge source. Select Microsoft Graph connector and choose the connector you created. Save the configuration.

If the Agent Cannot Find the Right Knowledge

Agent Returns No Results for a Known File

The agent may not find a file if the file is not included in any configured knowledge source. Verify that the file’s parent SharePoint site or OneDrive folder is added to the agent. Also check that the file is not excluded by a retention policy or sensitivity label. Go to the Knowledge tab and confirm the source sync status shows Synced.

Agent Returns Results from Unauthorized Sources

If the agent pulls data from a source you did not intend, the grounding configuration may be too broad. Open the agent settings and navigate to Grounding. Enable the option Limit to selected knowledge sources. This forces the agent to use only the sources you explicitly added. Without this setting, the agent may fall back to the entire Microsoft 365 search index available to the user.
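
The behavior of this setting can be pictured as an allow-list applied to search results. The sketch below is a simplified model, not the product's logic: limit_to_selected, the result dictionaries, and the example URLs are assumptions.

```python
def limit_to_selected(results, selected_sources, limit_enabled):
    """Keep only results under a selected source when the limit is on."""
    if not limit_enabled:
        return list(results)  # unrestricted: every result passes through
    prefixes = [s.rstrip("/") for s in selected_sources]
    kept = []
    for result in results:
        url = result["url"]
        # Match the source itself or anything beneath it, so that
        # ".../sites/hr" does not accidentally match ".../sites/hr-private".
        if any(url == p or url.startswith(p + "/") for p in prefixes):
            kept.append(result)
    return kept


# Hypothetical results and sources for illustration.
results = [
    {"url": "https://contoso.sharepoint.com/sites/hr/policy.docx"},
    {"url": "https://contoso.sharepoint.com/sites/hr-private/salaries.xlsx"},
    {"url": "https://contoso.sharepoint.com/sites/finance/budget.xlsx"},
]
selected = ["https://contoso.sharepoint.com/sites/hr"]
limited = limit_to_selected(results, selected, limit_enabled=True)
```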

Agent Ignores a Graph Connector Source

A Graph connector may not appear in the agent’s knowledge source list if it was not assigned correctly. In Copilot Studio, go to the Knowledge tab and click Add knowledge source. If the connector is missing, return to the Microsoft 365 admin center and verify that the connector is enabled and has indexed data. Also confirm that the connector’s permissions include the users who will interact with the agent.
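
One way to check connector health outside the admin center is to list the tenant's connections through Microsoft Graph (GET /external/connections) and inspect each connection's state property, where ready indicates a connection that can serve content. The sketch below only parses an already-fetched response. The sample payload and the find_unready_connections helper are hypothetical.

```python
def find_unready_connections(response):
    """Return (id, state) pairs for connections that are not in the ready state."""
    problems = []
    for conn in response.get("value", []):
        state = conn.get("state", "unknown")
        if state != "ready":
            problems.append((conn.get("id", "?"), state))
    return problems


# Hypothetical response body shaped like a Graph collection result.
sample_response = {
    "value": [
        {"id": "servicenowkb", "name": "ServiceNow KB", "state": "ready"},
        {"id": "salesforcecrm", "name": "Salesforce CRM", "state": "draft"},
    ]
}
unready = find_unready_connections(sample_response)
```

A connection that is still in draft, or that reports no state at all, is a likely reason for a connector being missing from the agent's source list.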

| Item | Default Agent Knowledge Discovery | Custom Agent Knowledge Discovery |
|---|---|---|
| Description | Uses Microsoft Graph data for the signed-in user | Uses user-defined sources including SharePoint, web URLs, and Graph connectors |
| Source scope | User’s own files, emails, and calendar | Any SharePoint site, OneDrive folder, web URL, or external system |
| Permission model | Respects existing Microsoft 365 permissions | Respects existing permissions plus source-level restrictions |
| Indexing frequency | Near real-time via Microsoft Graph | Configurable via sync schedule in Copilot Studio |
| Grounding control | No limit to specific sources | Can limit to selected sources via Grounding settings |

Now you understand how Copilot agents discover knowledge sources through indexing, vector search, and grounding. Start by adding your most important SharePoint sites and web URLs to the agent’s Knowledge tab. Then set the grounding option to limit the agent to those sources. For external data, use Microsoft Graph connectors and assign them to the agent in Copilot Studio.
