Discord bots can do more than send text messages and play audio files. You can build a bot that listens to a voice channel and processes audio in real time. This is useful for speech recognition, audio transcription, music analysis, or live voice moderation. Pycord is a modern Discord API wrapper for Python that includes a feature called sinks. Sinks allow your bot to receive and handle audio data as it arrives from voice channels. This article explains how to set up Pycord, create a voice client, and use sinks to stream real-time audio from Discord voice channels.

Key Takeaways: Pycord Sinks for Real-Time Audio Streaming

Pycord sink classes: Built-in classes in the discord.sinks module, such as PCMSink and WaveSink, collect the audio your bot receives. A custom Sink subclass can process that audio as it arrives instead of saving files to disk.
voice_client.start_recording() method: Starts receiving from a voice channel and passes decoded PCM audio to the write method of the sink you supply.
Finished callback in start_recording: A coroutine that Pycord runs after voice_client.stop_recording() is called, useful for cleanup or for chaining record and play actions.

What Pycord Sinks Do and Why You Need Them

Pycord is a maintained fork of discord.py that supports Discord’s voice API. A sink is a mechanism that captures audio data from a voice channel and delivers it to your code in chunks. Without sinks, your bot can only play audio or send audio files. With sinks, your bot can process live audio for speech-to-text, sound detection, or custom commands triggered by voice.

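The core concept is simple. When your bot joins a voice channel, it creates a VoiceClient object. Calling voice_client.start_recording(sink, callback) starts a background thread that receives audio from every user speaking in the channel, decodes it from Opus to raw PCM, and passes each decoded chunk to the sink’s write method along with the ID of the user who produced it. To process audio live, subclass discord.sinks.Sink and override write. Inside that method, you can analyze the audio, save it, or forward it to another service. The callback is a coroutine that runs once after voice_client.stop_recording() is called.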
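As a rough illustration of the per-chunk handling described above, the following sketch tracks how much audio each speaker has sent. SpeakerStats is a hypothetical helper, not part of Pycord; it only mirrors the write(data, user) shape of a sink and assumes that user is an integer user ID and that the audio is 48 kHz, 16-bit stereo PCM.

```python
from collections import defaultdict

# Assumed PCM format of decoded Discord voice audio.
SAMPLE_RATE = 48_000
CHANNELS = 2
BYTES_PER_SAMPLE = 2
BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE  # 192,000


class SpeakerStats:
    """Hypothetical helper that counts received audio per user."""

    def __init__(self):
        self.bytes_by_user = defaultdict(int)

    def write(self, data: bytes, user: int) -> None:
        # Same argument order as a sink's write method.
        self.bytes_by_user[user] += len(data)

    def seconds_spoken(self, user: int) -> float:
        # Convert the byte count back into a duration.
        return self.bytes_by_user[user] / BYTES_PER_SECOND


stats = SpeakerStats()
frame = bytes(3840)  # one 20 ms frame of silence
for _ in range(50):
    stats.write(frame, user=1111)
print(stats.seconds_spoken(1111))
```

In a real bot, the same bookkeeping would live inside the write method of a discord.sinks.Sink subclass.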

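Pycord ships several ready-made sinks in the discord.sinks module. PCMSink stores the raw PCM stream unchanged, and WaveSink wraps it in a WAV container. Sinks such as MP3Sink and OGGSink use FFmpeg to encode the captured audio once recording stops, so they need FFmpeg installed on your server or local machine. PCMAudio and FFmpegPCMAudio, despite the similar names, are playback sources for voice_client.play() and do not receive audio. For real-time streaming, a custom Sink subclass that works directly on raw PCM is the better choice because it avoids the overhead of encoding and decoding.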
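To make the difference between raw PCM and a container format concrete, here is a minimal sketch that wraps a PCM byte string in a WAV header using only the standard library. The function name pcm_to_wav is illustrative, and the defaults assume 48 kHz, 16-bit stereo audio; this is not how any Pycord sink is implemented internally.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 48_000,
               channels: int = 2, sample_width: int = 2) -> bytes:
    """Return the given raw PCM wrapped in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample, 2 = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


one_second_of_silence = bytes(48_000 * 2 * 2)
wav_bytes = pcm_to_wav(one_second_of_silence)
print(len(wav_bytes) - len(one_second_of_silence), "bytes of header added")
```

Raw PCM carries no metadata, so any consumer has to be told the sample rate, width, and channel count; the WAV header records them alongside the same samples.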

Prerequisites for Streaming Real-Time Audio

Before you write any code, make sure these items are in place:

Python 3.8 or newer installed on your system.
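Pycord library installed via pip: pip install py-cord[voice]. The [voice] extra installs PyNaCl, which voice connections need for encryption. The audioop module used for volume control is not a pip dependency; it is part of the Python standard library through Python 3.12.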
FFmpeg installed and added to your system PATH. Download from ffmpeg.org and place the executable in a folder that is in your PATH environment variable.
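A Discord bot token. Create a bot in the Discord Developer Portal, enable the Message Content Intent (prefix commands such as !join depend on it), and invite the bot to a server with the Connect permission. Add the Speak permission if the bot will also play audio. The Server Members Intent is not required for receiving audio.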
A text channel where the bot can send messages or a dedicated voice channel for testing.
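The checklist above can be partly verified with a short preflight script. This is a sketch using only the standard library; check_prerequisites is a hypothetical helper name, and it cannot confirm that the discord module it finds is Pycord rather than another library that uses the same import name.

```python
import importlib.util
import shutil
import sys


def check_prerequisites() -> dict:
    """Report which local prerequisites appear to be satisfied."""
    return {
        "Python 3.8+": sys.version_info >= (3, 8),
        # py-cord installs under the import name "discord".
        "discord module": importlib.util.find_spec("discord") is not None,
        # PyNaCl installs under the import name "nacl".
        "nacl module": importlib.util.find_spec("nacl") is not None,
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{'OK' if ok else 'MISSING':8}{name}")
```

The bot token and the intent settings in the Developer Portal still have to be checked by hand.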

Steps to Build a Bot That Streams Real-Time Audio

The following steps create a simple bot that joins a voice channel, listens to all audio, and prints the audio data size to the console. You can replace the callback with your own processing logic.

Create the bot script
Open a new Python file named voice_sink_bot.py. Import the required modules: discord and discord.ext.commands. Create an Intents object with message_content enabled, then define a bot instance with the commands.Bot class, passing the intents and setting the command prefix to !.
Define the sink callback class
Create a class that inherits from discord.sinks.Sink. Override the __init__ method to call super().__init__() and initialize a buffer. Override the write method. The write method receives two arguments: data (a bytes object of decoded PCM audio) and user (the speaker’s user ID as an integer). Inside write, process the audio data. For this example, print the length of the data and the user ID.
Create the join command
Add a command named join that takes a ctx parameter. Check that the author is in a voice channel. If not, send an error message. Use await ctx.author.voice.channel.connect() to make the bot join the channel. Store the voice_client for later use.
Create the listen command
Add a command named listen. Inside the command, get the voice_client from ctx.voice_client. If the bot is not connected, send a message asking the user to run !join first. Instantiate your custom sink class. Call voice_client.start_recording(sink_instance, finished_callback), where finished_callback is a coroutine that Pycord runs once recording stops. Send a confirmation message.
Create the stop command
Add a command named stop. Call voice_client.stop_recording() to stop the sink. Send a message that listening has stopped.
Run the bot
At the bottom of the script, add bot.run('YOUR_BOT_TOKEN'). Replace YOUR_BOT_TOKEN with your actual bot token. Run the script with python voice_sink_bot.py.

Complete Example Code

Here is the full working script for a bot that streams real-time audio and prints data sizes:

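import discord
from discord.ext import commands


class AudioSink(discord.sinks.Sink):
    """Custom sink that reports the size of every PCM chunk it receives."""

    def __init__(self):
        super().__init__()
        # Grows without limit; trim or flush it in a long-running bot.
        self.buffer = bytearray()

    def write(self, data, user):
        # Called from Pycord's receive thread, not from the event loop.
        # data is decoded PCM; user is the speaker's user ID.
        self.buffer.extend(data)
        print(f"Received {len(data)} bytes from user {user}")


async def on_listen_finished(sink, channel):
    # Pycord runs this coroutine after stop_recording() is called.
    await channel.send(f"Captured {len(sink.buffer)} bytes of audio")


# Prefix commands need the message content intent, which must also be
# enabled for the bot in the Discord Developer Portal.
intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix='!', intents=intents)

@bot.command()
async def join(ctx):
    if ctx.author.voice:
        channel = ctx.author.voice.channel
        await channel.connect()
        await ctx.send(f"Joined {channel.name}")
    else:
        await ctx.send("You are not in a voice channel")

@bot.command()
async def listen(ctx):
    voice = ctx.voice_client
    if not voice:
        await ctx.send("Bot is not in a voice channel. Use !join first")
        return
    if voice.recording:
        await ctx.send("Bot is already listening")
        return
    # Extra positional arguments are forwarded to the finished callback.
    voice.start_recording(AudioSink(), on_listen_finished, ctx.channel)
    await ctx.send("Listening to audio...")

@bot.command()
async def stop(ctx):
    voice = ctx.voice_client
    if voice and voice.recording:
        voice.stop_recording()
        await ctx.send("Stopped listening")
    else:
        await ctx.send("Bot is not currently listening")

bot.run('YOUR_BOT_TOKEN')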
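
As one possible replacement for the print statement, the sketch below computes a loudness value for a PCM chunk, which could serve as a crude voice activity check. The names rms_level and is_loud and the threshold of 500 are illustrative assumptions, and the code assumes 16-bit little-endian samples.

```python
import math
import sys
from array import array


def rms_level(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit signed little-endian PCM."""
    samples = array("h")
    # Drop a trailing odd byte so the data splits into whole samples.
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native byte order
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_loud(pcm: bytes, threshold: float = 500.0) -> bool:
    # The threshold is an arbitrary starting point; tune it by ear.
    return rms_level(pcm) >= threshold


print(rms_level(bytes(3840)))  # a silent frame
```

Calling is_loud(data) inside write would let the sink skip silent chunks before handing audio to heavier processing.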

Common Mistakes and Limitations When Using Pycord Sinks

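Bot Does Not Receive Audio

If the bot joins the channel but the sink’s write method is never called, first confirm that start_recording() actually ran without raising an exception. Then check that the bot’s intents include voice_states, which is part of Intents.default(), and that the bot is not deafened, since a deafened bot generally receives no audio. The Connect permission is what the bot needs in the voice channel; the Speak permission only matters for playback. If the bot ignores the !join and !listen commands entirely, the Message Content Intent is probably not enabled in the Discord Developer Portal or in your code.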

Audio Data Arrives in Small Chunks

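Discord delivers voice in 20-millisecond frames, and Pycord passes decoded audio to write as it arrives. At 48 kHz, 16-bit stereo, one frame is 3,840 bytes (960 samples × 2 channels × 2 bytes). This is normal for real-time streaming. If you need larger buffers for processing, accumulate the data in your sink’s write method until you have enough samples. Use a byte count or a frame counter to flush the buffer periodically.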
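The accumulation described above might look like the following sketch. FrameAccumulator is a hypothetical helper, not a Pycord class, and it assumes 48 kHz, 16-bit stereo PCM, which works out to 192 bytes per millisecond.

```python
from typing import List

BYTES_PER_MS = 48_000 * 2 * 2 // 1000  # 192 bytes per millisecond


class FrameAccumulator:
    """Collects small PCM frames and releases fixed-size chunks."""

    def __init__(self, chunk_ms: int = 1000):
        self.chunk_bytes = BYTES_PER_MS * chunk_ms
        self._buffer = bytearray()

    def add(self, frame: bytes) -> List[bytes]:
        """Append a frame and return every chunk that is now complete."""
        self._buffer.extend(frame)
        chunks = []
        while len(self._buffer) >= self.chunk_bytes:
            chunks.append(bytes(self._buffer[: self.chunk_bytes]))
            del self._buffer[: self.chunk_bytes]
        return chunks


acc = FrameAccumulator(chunk_ms=100)
for _ in range(12):
    for chunk in acc.add(bytes(3840)):  # 20 ms per frame
        print(f"flushing {len(chunk)} bytes")
```

With audio from several speakers, one accumulator per user ID keeps the streams from being mixed into the same chunk.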

Bot Crashes When Multiple Users Speak Simultaneously

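The write method is called once per received chunk, with the speaker’s user ID as the second argument, so one sink instance receives audio from all speakers. Keep a separate buffer per user ID rather than mixing everyone into one. Pycord calls write from a background receive thread, not from the event loop, so asyncio.create_task fails there with a "no running event loop" error. If your processing logic is slow, hand the work to the bot’s loop with asyncio.run_coroutine_threadsafe, or push the data onto a queue that a separate worker consumes. This prevents blocking the audio stream.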
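The thread-to-event-loop handoff can be sketched without Discord at all. In this example, fake_receive_thread stands in for the thread that would call a sink’s write method, and process_chunk is a placeholder for real processing; both names are hypothetical.

```python
import asyncio


async def process_chunk(user_id: int, data: bytes):
    # Placeholder for slow work such as a speech-to-text request.
    await asyncio.sleep(0)
    return user_id, len(data)


def fake_receive_thread(loop, futures):
    # Runs outside the event loop, as a sink's write method would.
    for user_id in (1, 2, 3):
        frame = bytes(3840)
        future = asyncio.run_coroutine_threadsafe(
            process_chunk(user_id, frame), loop
        )
        futures.append(future)


async def main():
    loop = asyncio.get_running_loop()
    futures = []
    # Run the stand-in receiver on a worker thread.
    await loop.run_in_executor(None, fake_receive_thread, loop, futures)
    # Wait for every scheduled coroutine to finish.
    return [await asyncio.wrap_future(f) for f in futures]


print(asyncio.run(main()))
```

In a bot, the loop reference would typically be captured when the sink is created, for example from the bot object, so that write can schedule coroutines on it.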

FFmpeg Not Found Error

If you use FFmpegPCMAudio or an FFmpeg-based sink such as MP3Sink and get an error saying that ffmpeg was not found, FFmpeg is not installed or not in your PATH. Download the correct version for your operating system from ffmpeg.org, extract the executable, and add the folder to your system’s PATH environment variable. Restart your terminal or IDE after making the change.

Bot Disconnects After Playing Audio

If the voice connection goes idle and drops when the bot neither plays nor records for a while, start the sink immediately after joining, or play a short silent track. discord.PCMAudio wraps a file-like object rather than a bytes value, so pass the silence as voice_client.play(discord.PCMAudio(io.BytesIO(silent_pcm_data))). The silent track keeps audio flowing without producing audible noise, but playback ends when the stream is exhausted, so replay it for longer sessions.
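A silent track is simply a run of zero bytes in the expected PCM format. The sketch below builds one with the standard library; make_silence is a hypothetical helper, and it assumes 48 kHz, 16-bit stereo audio in 20 ms frames of 3,840 bytes.

```python
import io

FRAME_BYTES = 3840  # one 20 ms frame at 48 kHz, 16-bit, stereo
FRAME_MS = 20


def make_silence(seconds: float) -> io.BytesIO:
    """Return a file-like object containing silent PCM audio."""
    frame_count = round(seconds * 1000 / FRAME_MS)
    return io.BytesIO(bytes(FRAME_BYTES * frame_count))


stream = make_silence(5)
print(len(stream.getvalue()), "bytes of silence")
# In a bot, this stream could be played with:
#   voice_client.play(discord.PCMAudio(stream))
```

Because the stream is finite, playback stops after the requested duration and has to be restarted to cover a longer idle period.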

Pycord Audio Sources: PCMAudio vs FFmpegPCMAudio

Both classes are playback sources passed to voice_client.play(), not sinks. The comparison matters when your bot plays audio back, for example after processing what a sink captured.

Item	PCMAudio	FFmpegPCMAudio
Input format	Raw PCM stream (16-bit, 48 kHz, stereo)	Any format supported by FFmpeg (MP3, AAC, OGG, etc.)
Transcoding overhead	None	Additional CPU usage for decoding
Installation requirement	None beyond Pycord	FFmpeg must be installed and in PATH
Latency	Lowest, since no subprocess is involved	Slightly higher, since audio passes through an FFmpeg subprocess
Use case	Replaying captured or generated PCM, such as sink output or silence	Playing music or other encoded files, format conversion

You now have a working bot that streams real-time audio from a Discord voice channel using Pycord sinks. The next step is to replace the print statement in the sink’s write method with actual audio processing logic. For example, you can feed the PCM data to a speech recognition library like speech_recognition or a sound analysis library like pydub. Remember to handle the audio data in chunks to keep the bot responsive. For advanced use, explore the documentation for Pycord’s discord.sinks.Sink class to learn about sink filters and about the finished callback of start_recording, which is useful for chaining record and play actions.
