When you upload a video to Microsoft Stream, Copilot can answer questions based on the video’s transcript and spoken content. However, the transcript does not become available to Copilot instantly. Users often see a message that says Copilot is not ready yet or that the transcript is still being processed. This delay is caused by the indexing pipeline that Azure AI Speech and Microsoft 365 use to generate and store the transcript. This article explains the technical steps behind the latency, what affects processing time, and how to plan your workflow around this delay.
Key Takeaways: Copilot Transcript Indexing in Microsoft Stream
- Azure AI Speech transcription service: Processes audio from uploaded videos and generates a text transcript, which is the first step before Copilot can access the content.
- Microsoft Graph indexing pipeline: Stores the transcript and makes it searchable by Copilot, typically taking 30 minutes to several hours depending on video length and system load.
- Video duration and format: Longer videos and complex audio formats increase indexing time, while pre-existing captions (for example, an uploaded WebVTT file) can substantially reduce the delay.
Why Copilot Cannot Access Stream Transcripts Immediately
Copilot relies on the Microsoft Graph to retrieve transcript data from Stream videos. When you upload a video, the following steps must complete before Copilot can answer questions:
Step 1: Audio Extraction and Speech Recognition
Azure AI Speech first extracts the audio track from the video file. It then converts the spoken words into a text transcript using automatic speech recognition. This process is compute-intensive, especially for videos with background noise, multiple speakers, or non-English languages. The service processes the audio in chunks, and the total time depends on the video length. A 10-minute video may take 5 to 15 minutes to transcribe. A 60-minute video can take 30 to 60 minutes.
Step 2: Transcript Storage and Indexing in Microsoft Graph
Once the transcript is generated, Microsoft 365 stores it as a metadata file attached to the video object in Microsoft Graph. The system then indexes the transcript so that Copilot can search and retrieve relevant segments. Indexing includes breaking the transcript into time-stamped chunks, mapping speaker labels, and building a semantic search index. This step adds another 15 to 30 minutes for most videos. During peak usage hours, the queue can cause additional delays.
Step 3: Copilot Query Processing
After indexing is complete, Copilot can accept queries about the video. When you ask a question, Copilot searches the indexed transcript using Microsoft Graph and returns the most relevant segment with a timestamp. If indexing is still in progress, Copilot returns a message saying the video is not ready or the transcript is unavailable. The entire pipeline from upload to Copilot readiness typically takes 30 minutes to 2 hours for standard videos. For very long videos or those with poor audio quality, it can take up to 4 hours.
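As a rough worked example, the per-step figures quoted above can be added together. The sketch below only combines the article's own numbers for the 10-minute and 60-minute cases; the function and constant names are illustrative, and the result is an estimate, not a guarantee from Microsoft.

```python
# Figures quoted in this article, in minutes: (best case, worst case).
TRANSCRIPTION_MINUTES = {10: (5, 15), 60: (30, 60)}  # keyed by video length
INDEXING_MINUTES = (15, 30)


def estimate_total(video_minutes):
    """Add the transcription range and the indexing range for a quoted video length."""
    low, high = TRANSCRIPTION_MINUTES[video_minutes]
    return (low + INDEXING_MINUTES[0], high + INDEXING_MINUTES[1])


if __name__ == "__main__":
    for length in sorted(TRANSCRIPTION_MINUTES):
        low, high = estimate_total(length)
        print(f"{length}-minute video: about {low} to {high} minutes until Copilot is ready")
```

Both results fall inside the 30-minute to 2-hour window given for standard videos; queue delays during peak hours come on top of these figures.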
Factors That Affect Indexing Latency
Several variables influence how quickly Copilot can access a transcript:
- Video duration: Longer videos take more time to transcribe and index. A 5-minute video may be ready in 20 minutes, while a 2-hour video could take 3 to 4 hours.
- Audio quality: Videos with clear speech, minimal background noise, and a single speaker transcribe faster. Poor audio quality or multiple overlapping speakers increases processing time.
- Language and accent: Azure AI Speech supports over 100 languages, but some languages or regional accents may require more processing cycles.
- System load: During business hours when many organizations upload videos, the indexing queue can slow down processing. Uploading during off-peak hours may reduce wait time.
- Pre-existing captions: If the video already has manually uploaded captions or a pre-generated transcript file in WebVTT format, Copilot can use that data much sooner. The indexing pipeline skips the speech recognition step and goes directly to indexing, which takes only 5 to 10 minutes.
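If you already have a script or timed notes for a video, a WebVTT file is simple to produce. The following is a minimal sketch that writes cues in standard WebVTT syntax. The Cue class and function names are hypothetical, and whether Stream or Copilot honors the optional voice tag used here for speaker names is an assumption you should test with your own tenant.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # cue start, in seconds
    end: float    # cue end, in seconds
    text: str
    speaker: str = ""  # optional speaker name


def format_timestamp(seconds: float) -> str:
    """Render seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues) -> str:
    """Return the text of a WebVTT file containing the given cues."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for cue in cues:
        if cue.end <= cue.start:
            raise ValueError(f"cue ends before it starts: {cue}")
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        # A WebVTT voice span (<v Name>) carries the speaker label.
        lines.append(f"<v {cue.speaker}>{cue.text}" if cue.speaker else cue.text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [Cue(0.0, 2.5, "Welcome to the strategy review.", "Ana"),
            Cue(2.5, 5.0, "Here is the agenda.")]
    print(build_webvtt(demo))
```

Save the returned text with a .vtt extension and UTF-8 encoding before uploading it.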
How to Check the Status of Transcript Indexing
You can verify whether Copilot has indexed a video by using the Stream web app or the Microsoft Graph API.
Using the Stream Web App
- Open the video in Microsoft Stream
Navigate to the video you uploaded and click the video title to open the playback page.
- Check the transcript pane
Click the Transcript icon in the bottom toolbar. If a full transcript with timestamps appears, indexing is complete. If you see a message that says "Transcript is being generated", the pipeline is still running.
- Test Copilot
Open Copilot in Microsoft 365 and ask a question about the video content. For example, ask "What were the key points in the marketing strategy video?" If Copilot returns a response with a timestamp, indexing is finished. If Copilot says it cannot find information, wait 30 minutes and try again.
Using Microsoft Graph API
- Send a GET request to the video metadata endpoint
Use the Graph API endpoint: https://graph.microsoft.com/v1.0/me/drive/items/{video-id}/microsoft.graph.video. Replace {video-id} with the actual file ID.
- Examine the transcriptProcessingStatus property
The response includes a property called transcriptProcessingStatus. The value completed means indexing is done, processing means the pipeline is still running, and failed indicates an error that requires re-uploading the video.
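The check described above could be scripted roughly as follows. This is a sketch under stated assumptions: the endpoint path and the transcriptProcessingStatus property are taken from this article as written and have not been verified against the Microsoft Graph reference, the helper names are hypothetical, and you are assumed to already hold a valid OAuth access token with permission to read the file.

```python
import json
import urllib.request

# Endpoint as quoted in this article (assumption: not verified against the Graph reference).
GRAPH_URL_TEMPLATE = (
    "https://graph.microsoft.com/v1.0/me/drive/items/{video_id}/microsoft.graph.video"
)


def build_status_url(video_id: str) -> str:
    """Fill the file ID into the endpoint quoted in the article."""
    return GRAPH_URL_TEMPLATE.format(video_id=video_id)


def interpret_status(payload: dict) -> str:
    """Map the article's status values to a suggested next action."""
    status = payload.get("transcriptProcessingStatus")  # property name from the article
    if status == "completed":
        return "ready"
    if status == "processing":
        return "wait"
    if status == "failed":
        return "re-upload"
    return "unknown"  # property missing or an unexpected value


def fetch_status(video_id: str, access_token: str) -> str:
    """Send the GET request and interpret the JSON response."""
    request = urllib.request.Request(
        build_status_url(video_id),
        headers={"Authorization": f"Bearer {access_token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return interpret_status(json.load(response))
```

Keeping interpret_status separate from the network call makes the decision logic easy to test without a tenant, and the "unknown" branch covers the case where the response does not contain the property at all.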
Common Issues with Transcript Indexing and Copilot
Copilot Returns No Results Even After Indexing Completes
If the transcript status shows completed but Copilot still cannot answer questions, the issue is likely a permission problem. The user must have at least read access to the video file in Stream. Verify that the video is stored in a SharePoint site or OneDrive folder where the user has View or Edit permissions. Also confirm that the Microsoft 365 Copilot license is assigned to the user and that the video is in a supported format (MP4, WMV, or MOV) with an audio track.
Transcript Shows Incomplete or Missing Sections
Azure AI Speech may fail to transcribe sections with heavy background music, very quiet speech, or rapid overlapping dialogue. In these cases, the transcript will have gaps. To fix this, upload a WebVTT caption file manually before or after uploading the video. Copilot will use the manual captions instead of the auto-generated transcript, and the indexing time drops to under 10 minutes. To add captions, go to Stream > video > Details > Captions > Upload captions and select a .vtt file.
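To see where a transcript has gaps before deciding whether manual captions are worth the effort, you can scan the cue timings. The sketch below assumes you have the transcript or caption text available in WebVTT form; the function name and the 5-second threshold are arbitrary illustrative choices.

```python
import re

# Matches a WebVTT cue timing line; the hours field is optional in WebVTT.
TIMING = re.compile(
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def find_gaps(vtt_text, min_gap=5.0):
    """Return (gap_start, gap_end) pairs, in seconds, where no cue covers the audio."""
    gaps = []
    previous_end = 0.0
    for line in vtt_text.splitlines():
        match = TIMING.match(line.strip())
        if not match:
            continue  # header, cue text, or blank line
        parts = match.groups()
        start = _to_seconds(*parts[:4])
        end = _to_seconds(*parts[4:])
        if start - previous_end >= min_gap:
            gaps.append((previous_end, start))
        previous_end = max(previous_end, end)
    return gaps


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "00:00:00.000 --> 00:00:04.000\nIntro\n\n"
        "00:00:30.000 --> 00:00:35.000\nAfter the music\n"
    )
    print(find_gaps(sample))
```

A long gap is not always an error, since it may simply be silence or music, so treat the output as a list of places to review by ear.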
Indexing Takes More Than 4 Hours
If a video exceeds 4 hours of processing time, the pipeline may have encountered an error. Check the transcriptProcessingStatus using the Graph API. If the status is failed, delete the video and re-upload it. Before re-uploading, convert the video to a standard format: MP4 with the H.264 video codec and AAC audio at 128 kbps or higher. Avoid variable frame rates and less common codecs such as VP9 or HEVC, as these can cause transcription failures.
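One way to perform that conversion is with the ffmpeg command-line tool. The sketch below assumes ffmpeg is installed and built with the libx264 encoder; the function names are hypothetical, and the 30 fps output rate is an arbitrary choice used to request a constant frame rate.

```python
import subprocess


def build_ffmpeg_command(source, target, frame_rate=30):
    """Build an ffmpeg command that re-encodes to H.264 video and AAC audio in MP4."""
    return [
        "ffmpeg",
        "-i", str(source),       # input file
        "-c:v", "libx264",       # H.264 video
        "-r", str(frame_rate),   # fixed output frame rate
        "-c:a", "aac",           # AAC audio
        "-b:a", "128k",          # 128 kbps audio bitrate
        str(target),             # output file; use an .mp4 extension
    ]


def convert(source, target):
    """Run the conversion; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_ffmpeg_command(source, target), check=True)


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_command("talk.mov", "talk.mp4")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.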
Copilot with Auto-Generated Transcript vs Copilot with Manual Captions
| Item | Auto-Generated Transcript | Manual Captions (WebVTT) |
|---|---|---|
| Indexing time | 30 minutes to 4 hours | 5 to 10 minutes |
| Accuracy | High for clear audio, lower for noisy or accented speech | As accurate as the captions you supply |
| Speaker identification | Automatic, may label speakers incorrectly | Manual, can assign exact speaker names |
| Language support | 100+ languages | Any language the captions are written in |
| Copilot response quality | Good, may miss context in noisy sections | Excellent, no gaps in the transcript |
For critical videos where you need Copilot to work within minutes of upload, prepare a WebVTT caption file in advance. For videos where timeliness is less important, the auto-generated transcript works well after the indexing delay.
Now you understand why Copilot cannot access Stream transcripts right after upload. The delay comes from Azure AI Speech processing the audio and Microsoft Graph indexing the result. To speed up the process, upload manual captions before or right after the video. For ongoing monitoring, check the transcript pane in Stream or use the Graph API to confirm when indexing completes. As a next step, review your organization’s video upload schedule and consider uploading long videos during off-peak hours to reduce queue wait time. If you frequently need Copilot access within minutes of upload, set up a workflow that automatically attaches a WebVTT file to every new video upload using Power Automate.