Discord bots can do more than send text messages and play audio files. You can build a bot that listens to a voice channel and processes audio in real time. This is useful for speech recognition, audio transcription, music analysis, or live voice moderation. Pycord is a modern Discord API wrapper for Python that includes a feature called sinks. Sinks allow your bot to receive and handle audio data as it arrives from voice channels. This article explains how to set up Pycord, create a voice client, and use sinks to stream real-time audio from Discord voice channels.
Key Takeaways: Pycord Sinks for Real-Time Audio Streaming
- Pycord sink classes: The base discord.sinks.Sink class and built-in subclasses such as WaveSink receive audio from a voice channel. A custom subclass can process that audio in memory without saving files to disk.
- voice_client.start_recording() method: Starts receiving from a voice channel and passes decoded PCM audio data to the sink you provide.
- after parameter in voice_client.play(): Triggers a function after audio playback finishes, useful for chaining listen and play actions.
What Pycord Sinks Do and Why You Need Them
Pycord is a maintained fork of discord.py that supports Discord’s voice API. A sink is a mechanism that captures audio data from a voice channel and delivers it to your code in chunks. Without sinks, your bot can only play audio or send audio files. With sinks, your bot can process live audio for speech-to-text, sound detection, or custom commands triggered by voice.
The core concept is simple. When your bot joins a voice channel, it creates a VoiceClient object. Calling voice_client.start_recording(sink, callback) attaches a sink that receives audio data from every user speaking in the channel. The data arrives as decoded PCM frames, and Pycord passes each frame to the sink's write method together with the ID of the user who produced it. Inside that method, you can analyze the audio, save it, or forward it to another service. The callback is a coroutine that runs once recording stops.
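To make "analyze the audio" concrete, here is a minimal sketch of a loudness check on a single frame. It assumes the frames are 48 kHz, 16-bit, little-endian stereo PCM, which is the format Discord voice audio normally decodes to. The names frame_rms and is_speech and the threshold of 500 are illustrative choices, not part of Pycord.

```python
import array
import math
import sys

SAMPLE_RATE = 48_000   # samples per second per channel (assumed)
CHANNELS = 2           # stereo (assumed)
BYTES_PER_SAMPLE = 2   # signed 16-bit (assumed)
FRAME_MS = 20          # one frame covers 20 milliseconds

# 960 samples per channel * 2 channels * 2 bytes
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * CHANNELS * BYTES_PER_SAMPLE


def frame_rms(pcm: bytes) -> float:
    """Return the root-mean-square amplitude of a 16-bit PCM buffer."""
    samples = array.array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # the PCM data is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(pcm: bytes, threshold: float = 500.0) -> bool:
    """Crude voice activity check; the threshold is a guess to tune by ear."""
    return frame_rms(pcm) >= threshold
```

A sink could call is_speech(data) on each frame and skip further processing for quiet frames.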
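Pycord's sinks live in the discord.sinks module. The base Sink class receives decoded PCM data, and built-in subclasses such as WaveSink store it for export in a specific format. For real-time streaming, a custom Sink subclass that handles the raw PCM directly is the better choice because it avoids the overhead of encoding. The similarly named PCMAudio and FFmpegPCMAudio classes are audio sources for playback, not sinks: PCMAudio plays raw PCM data without transcoding, and FFmpegPCMAudio uses FFmpeg to convert other audio formats on the fly. You need FFmpeg installed on your server or local machine if you plan to use FFmpegPCMAudio.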
Prerequisites for Streaming Real-Time Audio
Before you write any code, make sure these items are in place:
- Python 3.8 or newer installed on your system.
- Pycord library installed via pip: pip install py-cord[voice]. The [voice] extra installs the required voice dependencies such as PyNaCl.
- FFmpeg installed and added to your system PATH. Download from ffmpeg.org and place the executable in a folder that is in your PATH environment variable.
- A Discord bot token. Create a bot in the Discord Developer Portal, enable the Server Members Intent and Message Content Intent, and invite the bot to a server with the Connect and Speak permissions.
- A text channel where the bot can send messages and a voice channel for testing.
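Two of these prerequisites can be checked from Python before the bot starts. The sketch below verifies the interpreter version and reads the token from an environment variable instead of hard-coding it. The variable name DISCORD_BOT_TOKEN and both helper functions are hypothetical names chosen for this example.

```python
import os
import sys

MIN_PYTHON = (3, 8)
TOKEN_ENV_VAR = "DISCORD_BOT_TOKEN"  # hypothetical variable name


def check_python(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def load_token(env=os.environ) -> str:
    """Read the bot token from the environment, failing loudly if it is unset."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set the {TOKEN_ENV_VAR} environment variable first")
    return token
```

With this in place, the final line of a bot script could pass load_token() to bot.run instead of a literal string.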
Steps to Build a Bot That Streams Real-Time Audio
The following steps create a simple bot that joins a voice channel, listens to all audio, and prints the audio data size to the console. You can replace the callback with your own processing logic.
- Create the bot script
  Open a new Python file named voice_sink_bot.py. Import the required modules: discord and discord.ext.commands. Define a bot instance with the commands.Bot class, set the command prefix to !, and pass intents with message_content enabled so the bot can read prefix commands.
- Define the sink class
  Create a class that inherits from discord.sinks.Sink. Override the __init__ method to initialize a buffer. Override the write method. The write method receives two arguments: data (a bytes object of PCM audio) and user (the ID of the user who is speaking). Inside write, process the audio data. For this example, print the length of the data and the user ID.
- Create the join command
  Add a command named join that takes a ctx parameter. Check that the author is in a voice channel. If not, send an error message. Use await ctx.author.voice.channel.connect() to make the bot join the channel. The resulting voice client is available later as ctx.voice_client.
- Create the listen command
  Add a command named listen. Inside the command, get the voice client from ctx.voice_client. If the bot is not connected, send a message asking the user to run !join first. Instantiate your custom sink class. Call voice_client.start_recording(sink_instance, callback) to start listening, where callback is a coroutine that Pycord runs once recording stops. Send a confirmation message.
- Create the stop command
  Add a command named stop. Call voice_client.stop_recording() to stop the sink. Send a message that listening has stopped.
- Run the bot
  At the bottom of the script, add bot.run('YOUR_BOT_TOKEN'). Replace YOUR_BOT_TOKEN with your actual bot token. Run the script with python voice_sink_bot.py.
Complete Example Code
Here is the full working script for a bot that streams real-time audio and prints data sizes:
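    import discord
    from discord.ext import commands


    class AudioSink(discord.sinks.Sink):
        """Custom sink that reports the size of each PCM frame it receives."""

        def __init__(self):
            super().__init__()
            self.buffer = bytearray()

        def write(self, data, user):
            # data: decoded PCM bytes; user: ID of the user who is speaking
            self.buffer += data
            print(f"Received {len(data)} bytes from {user}")


    # Prefix commands such as !join need the message content intent
    intents = discord.Intents.default()
    intents.message_content = True
    bot = commands.Bot(command_prefix='!', intents=intents)


    async def on_recording_finished(sink, *args):
        # Pycord runs this coroutine after stop_recording() is called
        print(f"Recording finished with {len(sink.buffer)} bytes buffered")


    @bot.command()
    async def join(ctx):
        if ctx.author.voice:
            channel = ctx.author.voice.channel
            await channel.connect()
            await ctx.send(f"Joined {channel.name}")
        else:
            await ctx.send("You are not in a voice channel")


    @bot.command()
    async def listen(ctx):
        voice = ctx.voice_client
        if not voice:
            await ctx.send("Bot is not in a voice channel. Use !join first")
            return
        # Attach the sink; write() is now called for every incoming frame
        voice.start_recording(AudioSink(), on_recording_finished)
        await ctx.send("Listening to audio...")


    @bot.command()
    async def stop(ctx):
        voice = ctx.voice_client
        if voice and voice.recording:
            voice.stop_recording()
            await ctx.send("Stopped listening")
        else:
            await ctx.send("Bot is not currently listening")


    bot.run('YOUR_BOT_TOKEN')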
Common Mistakes and Limitations When Using Pycord Sinks
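Bot Joins but Does Not Receive Audio
If the bot joins the channel but does not receive audio, first confirm that recording was actually started and that the bot is not server-deafened. Check that the bot has the Connect permission in the voice channel and that the voice states intent is enabled (it is part of the default intents). The Speak permission matters only if the bot also plays audio. Enabling the Server Members Intent in the Discord Developer Portal helps the bot resolve the user IDs it receives into member objects.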
Audio Data Arrives in Small Chunks
Pycord sends audio data in 20-millisecond frames. This is normal for real-time streaming. If you need larger buffers for processing, accumulate the data in your sink’s write method until you have enough samples. Use a timer or a frame counter to flush the buffer periodically.
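The frame-counter approach can be sketched as a small helper that a sink's write method feeds. FrameAccumulator, frames_per_chunk, and on_chunk are hypothetical names for this example. The default of 50 frames corresponds to roughly one second of audio at 20 ms per frame.

```python
from collections import defaultdict


class FrameAccumulator:
    """Collect frames per user and emit one larger chunk every N frames."""

    def __init__(self, frames_per_chunk=50, on_chunk=None):
        self.frames_per_chunk = frames_per_chunk
        # on_chunk(user, chunk_bytes) is called whenever a buffer is flushed
        self.on_chunk = on_chunk or (lambda user, chunk: None)
        self._buffers = defaultdict(bytearray)
        self._counts = defaultdict(int)

    def add_frame(self, user, data: bytes) -> None:
        """Append one frame; flush automatically once enough have arrived."""
        self._buffers[user] += data
        self._counts[user] += 1
        if self._counts[user] >= self.frames_per_chunk:
            self.flush(user)

    def flush(self, user) -> None:
        """Emit whatever is buffered for this user, even a partial chunk."""
        chunk = bytes(self._buffers.pop(user, b""))
        self._counts.pop(user, None)
        if chunk:
            self.on_chunk(user, chunk)
```

Inside write you would call accumulator.add_frame(user, data), and call flush for each user when listening stops so that the last partial chunk is not lost.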
Bot Crashes When Multiple Users Speak Simultaneously
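The write method is called separately for each user, so a single sink receives audio from all speakers. Pycord calls write from a background thread rather than from the event loop, so asyncio.create_task is not available inside it. If your processing logic is slow, hand each frame to a thread-safe queue.Queue that a worker consumes, or schedule a coroutine with asyncio.run_coroutine_threadsafe and a reference to the bot's event loop. This prevents blocking the audio stream.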
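One way to keep write fast is to hand frames to a background worker. The sketch below uses a bounded thread-safe queue and a worker thread rather than asyncio, on the assumption that write may be called from outside the event loop thread. FrameWorker and its methods are hypothetical names for this example.

```python
import queue
import threading


class FrameWorker:
    """Run a slow handler on a background thread so submit() returns at once."""

    def __init__(self, handler, max_frames=500):
        self._handler = handler  # called as handler(user, data)
        self._queue = queue.Queue(maxsize=max_frames)
        self.dropped = 0         # frames discarded because the queue was full
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, user, data) -> None:
        """Queue one frame without blocking; drop it if the worker is behind."""
        try:
            self._queue.put_nowait((user, data))
        except queue.Full:
            self.dropped += 1

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:     # sentinel placed by close()
                break
            self._handler(*item)

    def close(self) -> None:
        """Process everything still queued, then stop the worker thread."""
        self._queue.put(None)
        self._thread.join()
```

A sink would call worker.submit(user, data) from write and worker.close() when listening stops. Dropping frames when the queue is full is a design choice that favors staying live over completeness.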
FFmpeg Not Found Error
If you use FFmpegPCMAudio and get an error saying that ffmpeg was not found (Pycord raises discord.ClientException in this case), FFmpeg is not installed or not in your PATH. Download the correct version for your operating system from ffmpeg.org, extract the executable, and add the folder to your system's PATH environment variable. Restart your terminal or IDE after making the change.
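You can confirm whether Python can see FFmpeg before starting the bot. This sketch uses the standard library's shutil.which, which searches PATH the same way a shell would. The find_ffmpeg helper is a hypothetical name for this example.

```python
import shutil


def find_ffmpeg(executable: str = "ffmpeg"):
    """Return the full path to the FFmpeg executable, or None if not on PATH."""
    return shutil.which(executable)


# Example: report the result at startup
path = find_ffmpeg()
print(f"FFmpeg found at {path}" if path else "FFmpeg is not on PATH")
```

If this prints that FFmpeg is not on PATH even after installation, the terminal or IDE is probably still using the old environment.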
Bot Disconnects After Playing Audio
If the voice connection drops while the bot is idle, start the sink immediately after joining, or play a silent track with voice_client.play(discord.PCMAudio(io.BytesIO(silent_pcm_data))), where silent_pcm_data is a bytes object filled with zeros. PCMAudio expects a file-like stream rather than raw bytes, which is why the data is wrapped in io.BytesIO. The silent track keeps the connection active without producing audible noise.
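The silent PCM data itself is just zero bytes. The sketch below builds a stream of a given length, assuming 3,840-byte frames (20 ms of 48 kHz, 16-bit stereo audio). The make_silence helper is a hypothetical name for this example.

```python
import io

FRAME_BYTES = 3840       # 20 ms of 48 kHz, 16-bit stereo PCM (assumed format)
FRAMES_PER_SECOND = 50   # 1000 ms / 20 ms


def make_silence(seconds: float) -> io.BytesIO:
    """Return a file-like stream holding the given duration of digital silence."""
    frames = int(seconds * FRAMES_PER_SECOND)
    return io.BytesIO(bytes(FRAME_BYTES * frames))


# With a connected voice client, the stream could be played like this:
# voice_client.play(discord.PCMAudio(make_silence(5)))
```

The stream ends after the requested duration, so a long-lived bot would need to start a new silent track when the previous one finishes, for example from the after parameter of play.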
Pycord Audio Sources: PCMAudio vs FFmpegPCMAudio
Both classes are audio sources used with voice_client.play(). They handle the playback side and complement sinks, which handle the receiving side.
| Item | PCMAudio | FFmpegPCMAudio |
|---|---|---|
| Input format | Raw PCM audio (uncompressed) | Any format supported by FFmpeg (MP3, AAC, OGG, etc.) |
| Transcoding overhead | None | Additional CPU usage for decoding |
| Installation requirement | None beyond Pycord | FFmpeg must be installed and in PATH |
| Real-time streaming suitability | Best for low-latency playback of PCM you already hold in memory | Suitable when the source audio is in a compressed format |
| Use case | Playing generated or captured PCM, such as a silent keep-alive track or processed voice | Playing music files and streams, format conversion |
You now have a working bot that streams real-time audio from a Discord voice channel using Pycord sinks. The next step is to replace the print statement in the sink's write method with actual audio processing logic. For example, you can feed the PCM data to a speech recognition library like speech_recognition or a sound analysis library like pydub. Remember to handle the audio data in chunks to keep the bot responsive. For advanced use, explore the documentation for Pycord's discord.sinks.Sink class to learn about sink filters, and use the callback passed to start_recording together with the after parameter of play to chain listen and play actions.
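Many audio libraries expect a WAV container rather than bare PCM, so a small conversion step is often the first piece of processing logic. This sketch wraps a PCM buffer in a WAV header with the standard library's wave module. It assumes 48 kHz, 16-bit stereo input, and pcm_to_wav is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 48_000,
               channels: int = 2, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container and return the result in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

The returned bytes can be written to a .wav file or wrapped in io.BytesIO and passed to any library that reads WAV from a file-like object.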