Why Copilot Truncates Output Even With Short Prompts

You type a short prompt into Copilot, but the response cuts off after a few paragraphs or even a few sentences. This truncation happens mid-sentence or at an unnatural break, forcing you to ask for the rest of the output. The cause is not the length of your prompt but a combination of token limits, context window constraints, and response generation settings in Copilot for Microsoft 365. This article explains the technical reasons behind output truncation and provides steps to reduce or prevent it.

Key Takeaways: Preventing Copilot Output Truncation

  • Copilot pane > Settings > Token limit slider: Adjusts the maximum number of tokens the model can generate per response, with higher values reducing truncation.
  • Context window limit of 8,192 tokens (GPT-4 Turbo): Shared between your prompt, system instructions, and the generated response, so shorter prompts leave more room for output.
  • Using the “Continue” button or typing “Continue” in the chat: Instructs Copilot to regenerate the truncated portion without losing the existing context.

Why Copilot Truncates Output: Token Limits and Context Windows

Copilot, like other large language models, processes text in units called tokens. A token is roughly four characters of English text, so by that heuristic a nine-character word such as “Microsoft” counts as about two tokens (real tokenizers often encode common words as a single token, so treat the rule as an estimate). Every Copilot interaction has a fixed maximum number of tokens it can handle at once, called the context window. For Copilot for Microsoft 365 using GPT-4 Turbo, the context window is 8,192 tokens.
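
To see how text maps to tokens, you can experiment with the open-source tiktoken library. Copilot’s exact tokenizer is not published, so the cl100k_base encoding used here is only an approximation and the counts are estimates:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the GPT-4-era encoding; Copilot's own tokenizer is not
# documented, so treat these counts as approximations.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["Microsoft", "Copilot truncates output even with short prompts"]:
    tokens = enc.encode(text)
    print(f"{text!r}: {len(text)} characters -> {len(tokens)} tokens")
```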

The context window includes three parts:

  • System prompt: Pre-set instructions that define Copilot’s behavior, such as “You are an AI assistant for Microsoft 365.” This typically consumes 200 to 500 tokens.
  • User prompt: The text you type, including any attached documents, emails, or file content. Even short prompts consume tokens if you attach large files.
  • Generated response: The output Copilot writes. When the sum of system prompt, user prompt, and generated response reaches 8,192 tokens, Copilot stops generating and truncates the output.
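
The arithmetic is simple subtraction. Here is a minimal sketch with illustrative numbers; the system-prompt and attachment figures are assumptions, not measured values:

```python
CONTEXT_WINDOW = 8192  # total tokens shared by one exchange

# Illustrative values; real figures vary by tenant and attachment size.
system_prompt = 400    # pre-set Copilot instructions
user_prompt = 50       # a short typed prompt
attachment = 5500      # a large attached document read into the window

room_for_output = CONTEXT_WINDOW - (system_prompt + user_prompt + attachment)
print(f"Tokens left for the response: {room_for_output}")  # 2242
```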

A second cause is the maximum response token limit set by your Microsoft 365 administrator. Admins can configure Copilot to generate at most a set number of tokens per response, typically between 1,024 and 4,096. This cap applies regardless of how much room the context window has left, so if the admin sets a low limit, even a short prompt produces a truncated answer.
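
The two limits combine as a minimum: the response stops at whichever is hit first. Continuing the sketch above, with an assumed admin cap:

```python
room_for_output = 2242     # what the context window has left (from above)
admin_max_response = 1024  # assumed tenant policy; typical range is 1,024-4,096

# Copilot stops generating at the lower of the two limits.
effective_limit = min(room_for_output, admin_max_response)
print(f"The response truncates after about {effective_limit} tokens")  # 1024
```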

Steps to Reduce Output Truncation in Copilot

These steps help you increase the length of Copilot’s responses by adjusting settings and changing how you write prompts.

  1. Check your current token limit in Copilot settings
    Open any Copilot interface, such as the Copilot pane in Microsoft Edge or the Copilot app in Teams. Click the three-dot menu or gear icon to open Settings. Look for a slider labeled Token limit or Maximum response length. Move the slider to the highest value available, usually 4,096 tokens. This gives Copilot more room to generate a complete response.
  2. Reduce the size of attached files or data
    If you attach a Word document, PDF, or email thread to your prompt, Copilot must read that content into the context window. Before attaching, trim the file to only the relevant pages or sections. For emails, forward only the specific message instead of the entire thread. Shorter attached content leaves more tokens for the output; a sketch for estimating a file’s token footprint appears after this list.
  3. Write shorter prompts with explicit length requests
    Instead of a long prompt that explains background context, write a concise prompt that states exactly what you want. Add a phrase like “Write a complete response of at least 500 words” or “Do not truncate the output.” This instruction tells Copilot to maximize the response within the token limit.
  4. Use the Continue command to regenerate the rest of the output
    When Copilot truncates a response, type the word Continue in the chat box and press Enter. Copilot uses the existing conversation context to generate the next segment of the response. Repeat this command until the output is complete. This method works because each Continue request opens a new generation cycle within the same conversation.
  5. Ask your admin to increase the tenant-wide token limit
    If the token limit slider is grayed out or set to a low value, your Microsoft 365 administrator has enforced a policy. Contact your admin and ask them to navigate to Microsoft 365 admin center > Copilot > Settings > Response limits and increase the Maximum tokens per response field to at least 4,096. Admins can also set this value per user group using PowerShell.

If Copilot Still Truncates Output After Adjusting Settings

Copilot stops generating at the same point every time

This pattern indicates a hard token limit enforced by the model or the admin policy. Verify your token limit slider is at maximum. If the slider is already at 4,096 and truncation still occurs, the model may be hitting the 8,192-token context window because your attached content is too large. Remove all attachments and retry the prompt. If the response completes fully, the issue is the size of the attached data.

Copilot truncates even with no attachments and a very short prompt

When no files are attached and the prompt is under 50 tokens, truncation is usually caused by the system prompt consuming a large portion of the context window. Some Microsoft 365 tenants have custom system prompts that include lengthy instructions, brand guidelines, or compliance disclaimers. You cannot change the system prompt yourself. Contact your Microsoft 365 administrator and ask them to review the Copilot system prompt in the admin center. A shorter system prompt frees tokens for the output.

Copilot truncates mid-word or mid-sentence

This behavior is normal when the model hits the token limit exactly. The model does not finish the current sentence before stopping. Use the Continue command as described in step 4. The new segment resumes from where the output stopped, completes the sentence, and continues generating.

Copilot Free vs Copilot for Microsoft 365: Token Limits Compared

Item                        | Copilot Free             | Copilot for Microsoft 365
Base model                  | GPT-4o mini or GPT-4o    | GPT-4 Turbo
Context window size         | 8,192 tokens             | 8,192 tokens
Default max response tokens | 2,048 tokens             | 4,096 tokens (adjustable by admin)
Attached file support       | Images only              | Documents, emails, meetings, images
System prompt overhead      | Minimal (200-300 tokens) | Higher (400-500 tokens with tenant policies)

Copilot for Microsoft 365 has a higher default max response token limit than Copilot Free, but the larger system prompt overhead reduces the effective output space. Both versions use the same 8,192-token context window, so attached files have the same impact on truncation.
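
Using the figures from the table, here is a rough comparison of how each version’s budget divides up; the overhead values are the upper ends of the table’s ranges:

```python
CONTEXT_WINDOW = 8192  # same for both versions

# Overhead and response figures taken from the comparison table above.
versions = {
    "Copilot Free": {"overhead": 300, "max_response": 2048},
    "Copilot for Microsoft 365": {"overhead": 500, "max_response": 4096},
}

for name, v in versions.items():
    # Whatever the system prompt and the output do not use is left over
    # for your prompt and attachments.
    prompt_budget = CONTEXT_WINDOW - v["overhead"] - v["max_response"]
    print(f"{name}: up to {v['max_response']} output tokens, "
          f"{prompt_budget} tokens left for prompt and attachments")
```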

To maximize output in Copilot for Microsoft 365, keep attached files under 2,000 tokens and use the Continue command for longer documents. In Copilot Free, avoid attaching images with complex text because the image description also consumes tokens. For both versions, the most reliable method to prevent truncation is to ask for a specific word count and use the Continue command when the response stops.