Music Creator Agent
Learn how to build an AI agent that provides comprehensive music creation and audio processing capabilities. This tutorial shows how to create an agent that can generate music, analyze audio, and process sound files through natural language commands.
⚠️ Experimental Status: This agent is currently in experimental development. The tools have not been extensively tested in production environments and may have limitations or bugs. We're actively seeking feedback and improvements from users.
What You'll Build
A music creator agent that can:
- Generate melodies, chord progressions, and drum patterns
- Analyze audio for tempo, key, and musical features
- Convert between audio formats with quality control
- Apply audio effects and processing
- Mix multiple audio tracks with volume control
- Play audio and MIDI files with precise control
- Process both audio and MIDI files seamlessly
Understanding the Architecture
The music creator agent follows Saiki's framework design with clear separation of responsibilities:
- MCP Server: Sets up the server and exposes audio processing tools to the agent
- Agent: Orchestrates workflows and handles user interaction
- Tools: Contain the actual audio processing logic
This architecture allows the agent to focus on understanding musical intent while the tools handle the technical audio processing.
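The split between server, agent, and tools can be pictured with a small plain-Python sketch. This is an illustration only, not Saiki or FastMCP code: `TOOLS`, `tool`, and `dispatch` are hypothetical names, and the `detect_tempo` body is a stub that returns no real analysis.

```python
from typing import Callable, Dict

# Hypothetical registry standing in for the tools an MCP server exposes.
TOOLS: Dict[str, Callable[..., dict]] = {}


def tool(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Register a function under its own name (the 'server' role)."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def detect_tempo(file_path: str) -> dict:
    # Stub: a real tool would load and analyze the audio file here.
    return {"file": file_path, "bpm": None, "note": "stub result"}


def dispatch(name: str, **kwargs) -> dict:
    """Call a registered tool by name (what happens once the agent picks a tool)."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


result = dispatch("detect_tempo", file_path="song.mp3")
print(result["file"])
```

The point of the sketch is that the caller only needs a tool name and arguments; the audio logic stays inside the tool function.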
MCP Server Code
The core functionality is provided by the Music Agent MCP Server, a Python-based server built with FastMCP. To understand the complete MCP server implementation, refer to the mcp-servers repository:
- Music Server: src/music - Audio generation, processing, effects, and MIDI handling
Step 1: Setting Up the Project
First, let's understand the project structure:
agents/music-agent/
├── music-agent.yml # Agent configuration
└── README.md # Documentation
Step 2: Quick Setup
The music creator agent uses a published MCP server that's automatically installed:
# From the saiki project root
saiki --agent agents/music-agent/music-agent.yml
That's it! The MCP server (truffle-ai-music-creator-mcp) will be automatically downloaded and installed via uvx on first run.
What's Happening Behind the Scenes
The published MCP server includes these key dependencies:
- librosa: Audio analysis and music information retrieval
- pydub: Audio file manipulation and processing
- music21: Music notation and analysis
- pretty_midi: MIDI file handling
- FastMCP: Model Context Protocol server framework
- NumPy & SciPy: Numerical computing for audio processing
Step 3: Understanding the Agent Configuration
The agent is configured in music-agent.yml:
systemPrompt: |
  You are an AI assistant specialized in music creation, editing, and production. You have access to a comprehensive set of tools for working with audio and music including:
  - **Audio Analysis**: Analyze audio files for tempo, key, BPM, frequency spectrum, and audio characteristics
  - **Audio Processing**: Convert formats, adjust volume, normalize, apply effects (reverb, echo, distortion, etc.)
  - **Music Generation**: Create melodies, chord progressions, drum patterns, and complete compositions
  - **Audio Manipulation**: Trim, cut, splice, loop, and arrange audio segments
  - **Effects & Filters**: Apply various audio effects and filters for creative sound design
  - **Mixing & Mastering**: Balance levels, apply compression, EQ, and mastering effects
  - **File Management**: Organize, convert, and manage audio files in various formats
mcpServers:
  music_creator:
    type: stdio
    command: uvx
    args:
      - truffle-ai-music-creator-mcp
    connectionMode: strict
llm:
  provider: openai
  model: gpt-4o-mini
  apiKey: $OPENAI_API_KEY
Key Components Explained
- systemPrompt: Defines the agent's capabilities and behavior
- mcpServers: Connects to the Python MCP server
- llm: Configures the language model for intelligent interaction
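The `apiKey: $OPENAI_API_KEY` entry refers to an environment variable rather than a literal key. As a rough illustration of what resolving such a reference involves, here is a minimal sketch using the standard library. It is an assumption about the general idea, not Saiki's actual config loader, and `resolve_env` is a hypothetical helper.

```python
import os


def resolve_env(value: str) -> str:
    """Expand $VAR references from the current environment.

    Assumption: this mimics what a config loader might do with
    `apiKey: $OPENAI_API_KEY`; it is not Saiki's implementation.
    """
    expanded = os.path.expandvars(value)
    if "$" in value and expanded == value:
        # os.path.expandvars leaves unknown variables untouched.
        raise ValueError(f"environment variable not set: {value}")
    return expanded


if "OPENAI_API_KEY" in os.environ:
    key = resolve_env("$OPENAI_API_KEY")
    print("key loaded, length", len(key))
else:
    print("OPENAI_API_KEY is not set; export it before starting the agent")
```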
Step 4: Available Tools
The music creator agent provides 20+ powerful tools organized into categories:
Music Generation
- create_melody: Generate melodies in any key and scale
- create_chord_progression: Create chord progressions using Roman numerals
- create_drum_pattern: Generate drum patterns for different styles
Audio Analysis
- analyze_audio: Comprehensive audio analysis with spectral features
- detect_tempo: Detect BPM and beat positions
- detect_key: Identify musical key and mode
- get_audio_info: Get detailed audio file information
- get_midi_info: Get detailed MIDI file information
Audio Processing
- convert_audio_format: Convert between audio formats
- convert_midi_to_audio: Convert MIDI files to high-quality audio (WAV, 44.1kHz, 16-bit)
- adjust_volume: Adjust audio levels in dB
- normalize_audio: Normalize audio to target levels
- trim_audio: Cut audio to specific time ranges
- apply_audio_effect: Apply reverb, echo, distortion, filters
Mixing & Arrangement
- merge_audio_files: Combine multiple audio files
- mix_audio_files: Mix tracks with individual volume control (supports both audio and MIDI)
Playback
- play_audio: Play audio files with optional start time and duration
- play_midi: Play MIDI files with optional start time and duration
Utility
- list_available_effects: List all audio effects
- list_drum_patterns: List available drum patterns
Step 5: Running the Agent
Start the music creator agent:
# From the project root
saiki --agent agents/music-agent/music-agent.yml
Step 6: Testing with Example Prompts
Let's test the agent with some example prompts to understand how it works:
Music Generation
"Create a melody in G major at 140 BPM for 15 seconds"
What happens: The agent calls create_melody with the specified key, tempo, and duration.
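To see what "G major at 140 BPM for 15 seconds" means in numbers, the sketch below computes the scale's MIDI note numbers and the number of beats that fit in the requested duration. It illustrates the music theory only; how create_melody actually picks notes is not documented here, and `major_scale_midi` and `beats_in` are hypothetical helper names.

```python
from typing import List

# Pitch classes for natural note names (C = 0).
NOTE_TO_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
# Semitone offsets of the major scale from its tonic.
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]


def major_scale_midi(tonic: str, octave: int = 4) -> List[int]:
    """MIDI note numbers of a major scale (C4 = 60 convention)."""
    root = 12 * (octave + 1) + NOTE_TO_PC[tonic]
    return [root + step for step in MAJOR_STEPS]


def beats_in(duration_s: float, bpm: float) -> float:
    """Number of beats that fit in a duration at a given tempo."""
    return duration_s * bpm / 60.0


print(major_scale_midi("G"))   # [67, 69, 71, 72, 74, 76, 78]
print(beats_in(15, 140))       # 35.0
```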
"Create a I-IV-V-I chord progression in D major"
What happens: The agent calls create_chord_progression with the Roman numeral progression and key.
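Roman numerals name chords by scale degree, so I-IV-V-I in D major means the triads built on the first, fourth, and fifth notes of the D major scale. A minimal sketch of that mapping follows. It is not the tool's implementation: `triad` and `progression` are hypothetical helpers, only major keys are handled, and notes are spelled with sharps only.

```python
from typing import List

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]
NUMERALS = {"I": 0, "II": 1, "III": 2, "IV": 3, "V": 4, "VI": 5, "VII": 6}


def triad(tonic: str, numeral: str) -> List[str]:
    """Diatonic triad on a scale degree of a major key."""
    tonic_pc = NOTE_NAMES.index(tonic)
    scale = [(tonic_pc + step) % 12 for step in MAJOR_STEPS]
    degree = NUMERALS[numeral.upper()]
    # Stack thirds within the scale: degree, degree + 2, degree + 4.
    return [NOTE_NAMES[scale[(degree + k) % 7]] for k in (0, 2, 4)]


def progression(tonic: str, numerals: str) -> List[List[str]]:
    """Expand a dash-separated progression such as 'I-IV-V-I'."""
    return [triad(tonic, n) for n in numerals.split("-")]


print(progression("D", "I-IV-V-I"))
# [['D', 'F#', 'A'], ['G', 'B', 'D'], ['A', 'C#', 'E'], ['D', 'F#', 'A']]
```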
Audio Analysis
"Analyze the tempo and key of my song.mp3"
What happens: The agent calls analyze_audio to get comprehensive audio information.
"What's the BPM of this track?"
What happens: The agent calls detect_tempo to find the beats per minute (BPM).
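BPM and beat positions are two views of the same information: once beat times are known, the tempo follows from the spacing between them. The sketch below shows that arithmetic, assuming the beat times in seconds are already available. `bpm_from_beats` is a hypothetical helper, and the server may well estimate tempo differently.

```python
from statistics import median
from typing import Sequence


def bpm_from_beats(beat_times: Sequence[float]) -> float:
    """Estimate tempo from beat timestamps given in seconds."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beats to estimate tempo")
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    # The median is less sensitive to one mistimed beat than the mean.
    return 60.0 / median(intervals)


print(bpm_from_beats([0.0, 0.5, 1.0, 1.5]))  # 120.0
```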
Audio Processing
"Convert my song.wav to MP3 format"
What happens: The agent calls convert_audio_format to change the file format.
"Convert my MIDI melody to WAV format"
What happens: The agent calls convert_midi_to_audio to synthesize the MIDI file.
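Synthesizing MIDI means turning note numbers into sampled waveforms. The following sketch renders a single MIDI note as a sine tone and writes it as 44.1 kHz, 16-bit mono WAV using only the standard library. It is a deliberately simple stand-in, not the server's synthesis method, and `midi_to_hz`, `render_note`, and `write_wav` are hypothetical names.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second, matching the 44.1 kHz output above


def midi_to_hz(note: int) -> float:
    """Equal-temperament frequency of a MIDI note (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def render_note(note: int, seconds: float, amplitude: float = 0.5) -> bytes:
    """Render a sine tone as little-endian 16-bit PCM bytes."""
    freq = midi_to_hz(note)
    frames = bytearray()
    for i in range(int(SAMPLE_RATE * seconds)):
        sample = amplitude * math.sin(2.0 * math.pi * freq * i / SAMPLE_RATE)
        frames += struct.pack("<h", int(sample * 32767))
    return bytes(frames)


def write_wav(target, pcm: bytes) -> None:
    """Write mono 16-bit PCM to a path or file-like object."""
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm)


buffer = io.BytesIO()  # in-memory target; pass a filename to write to disk
write_wav(buffer, render_note(69, 0.5))
print(len(buffer.getvalue()), "bytes of WAV data")
```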
Audio Effects
"Add reverb to my guitar with 200ms reverb time"
What happens: The agent calls apply_audio_effect with reverb parameters.
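Time-based effects work on sample offsets, so a parameter like 200 ms has to be converted using the sample rate. The sketch below shows that conversion with a single delayed copy of the signal (an echo), which is much simpler than a real reverb. `apply_echo` is a hypothetical helper and does not reflect how apply_audio_effect is implemented.

```python
from typing import List, Sequence


def apply_echo(samples: Sequence[float], sample_rate: int,
               delay_ms: float = 200.0, decay: float = 0.5) -> List[float]:
    """Add one delayed, attenuated copy of the signal to itself."""
    delay = int(sample_rate * delay_ms / 1000.0)  # delay in samples
    # Extend the output so the echo tail is not cut off.
    out = [float(s) for s in samples] + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += decay * s
    return out


# At 44.1 kHz, 200 ms corresponds to 8820 samples.
print(int(44100 * 200.0 / 1000.0))
# Tiny example: 10 Hz sample rate, so 200 ms is 2 samples.
print(apply_echo([1.0, 0.0, 0.0, 0.0], sample_rate=10))
# [1.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```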
Mixing & Playback
"Mix my vocals, guitar, and drums together with the vocals at +3dB"
What happens: The agent calls mix_audio_files with volume levels for each track.
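A level such as +3 dB is a ratio, not an absolute volume: the amplitude gain is 10^(dB/20), so +3 dB multiplies the samples by roughly 1.41. The sketch below applies per-track gains and sums the tracks. It assumes samples are floats in the range -1.0 to 1.0; `db_to_gain` and `mix` are hypothetical helpers, not the mix_audio_files implementation.

```python
from typing import List, Sequence


def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mix(tracks: Sequence[Sequence[float]], gains_db: Sequence[float]) -> List[float]:
    """Sum tracks after applying a gain to each; shorter tracks are padded with silence."""
    length = max(len(track) for track in tracks)
    mixed = [0.0] * length
    for track, db in zip(tracks, gains_db):
        gain = db_to_gain(db)
        for i, sample in enumerate(track):
            mixed[i] += gain * sample
    # Hard-clip to the valid range; a real mixer would more likely normalize or limit.
    return [max(-1.0, min(1.0, s)) for s in mixed]


print(round(db_to_gain(3.0), 4))                 # 1.4125
print(mix([[0.5, 0.5], [0.25]], [0.0, 0.0]))     # [0.75, 0.5]
```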
"Create a melody in G major and play it for 5 seconds"
What happens: The agent calls create_melody followed by play_midi to generate and preview.
Step 7: Understanding the Workflow
Here's how the three components work together in a typical interaction:
- User Request: "Create a rock song with drums and a melody in C major"
- Agent: Interprets the request and orchestrates the workflow
- Tools: The agent calls the processing functions:
  - create_drum_pattern(): generates a rock drum pattern
  - create_melody(): creates a C major melody
  - mix_audio_files(): combines the tracks
- Response: Agent provides the result with musical context
Example Workflow
User: "Create a jazz melody in B minor, add some reverb, and play it for 10 seconds"
Agent Response:
"I'll help you create a jazz melody with reverb. Let me break this down:
1. First, I'll create a jazz melody in B minor
2. Then I'll add reverb to give it some space
3. Finally, I'll play it for you to hear
[Executes tools and provides results]"
Supported Formats
- Audio: MP3, WAV, FLAC, OGG, M4A, AIFF, WMA
- MIDI: MID, MIDI
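If you script around the agent, a quick extension check can catch unsupported files before a request is sent. This is a convenience sketch under the assumption that file extensions match the format names listed above; `classify` is a hypothetical helper, and the server may detect formats differently.

```python
from pathlib import Path

# Extensions derived from the format names listed above (an assumption).
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".ogg", ".m4a", ".aiff", ".wma"}
MIDI_EXTS = {".mid", ".midi"}


def classify(path: str) -> str:
    """Return 'audio', 'midi', or 'unsupported' based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO_EXTS:
        return "audio"
    if suffix in MIDI_EXTS:
        return "midi"
    return "unsupported"


for name in ("song.MP3", "melody.mid", "notes.txt"):
    print(name, "->", classify(name))
```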
Experimental Features
This agent is in active development. We encourage feedback on real-world usage, different genres, and various file sizes.
Common Use Cases
- Music Production: Create backing tracks, generate drum patterns, compose melodies
- Audio Editing: Clean up recordings, normalize levels, apply effects
- Music Analysis: Analyze tempo, key, and musical features
- Educational: Learn music theory through generation and experimentation
Ready to start? Run the command from Step 2 and begin creating intelligent music workflows!
💡 Tip: This is an experimental agent, so we encourage you to try different use cases and provide feedback to help improve the tools!