Vibe Coding with Goose and the Speech MCP
Imagine creating an app just by describing what you want out loud, like you’re talking to a friend. That’s the magic of vibe coding: turning natural language into working code with the help of an AI agent. And while typing a prompt gets the job done, saying it out loud hits different 🔥 The new Speech MCP server has quite literally entered the chat.
In a recent Wild Goose Case livestream, hosts Ebony Louis and Adewale Abati were joined by Max Novich from Block's AI tools team, who demonstrated an exciting new extension - the Speech MCP server.
During the livestream, Max built an entire web application using only voice commands - no keyboard or mouse required. The result was a vibrant, animated webpage with 3D effects, synthwave aesthetics, and interactive elements, all created through natural conversation with Goose.
The Speech MCP Server
Speech MCP is an open source MCP server that enables voice interaction with AI agents like Goose. What makes it special is that it runs entirely locally on your machine, making it:
- LLM-agnostic
- Privacy-focused
- Cost-effective compared to cloud-based alternatives
- Accessible without internet connectivity
Key Features
- Local Speech Processing: Uses two main models:
  - Faster Whisper: An efficient method to convert speech to text
  - Coqui TTS: A Japanese-engineered text-to-speech model with 54 natural-sounding voices
- Voice Selection: Choose from 54 different voices with varying characteristics and personalities
- Multi-Speaker Narration: Generate and play conversations between multiple voices
- Audio Transcription: Convert audio/video content to text with timestamps and speaker detection
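To make the timestamped transcription feature a bit more concrete, here is a minimal sketch of local speech-to-text using the open source faster-whisper Python package. This is not the Speech MCP server's own code: the helper names (`format_timestamp`, `format_segments`, `transcribe_file`), the `DemoSegment` stand-in, and the `"base"` model size are assumptions made for illustration. It covers timestamps only, not speaker detection.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DemoSegment:
    """Hypothetical stand-in with the fields we read from a transcription segment."""
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments: Iterable) -> List[str]:
    """Turn segments (anything with .start, .end, .text) into timestamped lines."""
    return [
        f"[{format_timestamp(s.start)} -> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]


def transcribe_file(path: str, model_size: str = "base") -> List[str]:
    """Transcribe an audio file locally and return timestamped lines.

    Assumes `pip install faster-whisper`; the model is downloaded on first use.
    """
    # Imported lazily so the formatting helpers work without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)
    return format_segments(segments)


if __name__ == "__main__":
    # Demo with a made-up segment, so no audio file or model is needed.
    for line in format_segments([DemoSegment(0.0, 1.25, " Hello Goose ")]):
        print(line)
```

With faster-whisper installed, calling `transcribe_file("recording.wav")` (a placeholder file name) would run the model on your own machine, which is the same local-first idea behind the Speech MCP server.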
Live Demo Highlights
During the demonstration, Max showcased several impressive capabilities:
- Voice-Controlled Development:
  - Created animated text effects
  - Implemented 3D transformations
  - Added synthwave aesthetics with gradients and grids
  - Integrated music controls
- System Integration:
  - Controlled applications like Discord using voice commands
  - Navigated file system and development environment
  - Generated and managed audio content
- Natural Interaction:
  - Fluid conversation with Goose
  - Real-time feedback and adjustments
  - Multi-voice narration for documentation
Getting Started
To try the Speech MCP server yourself:
- Install the required audio library (PortAudio):
  # For macOS
  brew install portaudio
  # For Linux (Debian/Ubuntu)
  apt-get install portaudio19-dev # or, on Fedora: dnf install portaudio-devel
- Install the extension directly using the one-click deep link install in Goose
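If you want a quick sanity check that PortAudio is visible on your system before installing the extension, a small script like the one below can help. It is only a sketch using Python's standard library, and the function names `find_portaudio` and `portaudio_status` are made up for this example. A "not found" result is a hint rather than proof, since libraries installed in non-default locations (some Homebrew setups, for instance) may not show up in this lookup.

```python
import ctypes.util
from typing import Optional


def find_portaudio() -> Optional[str]:
    """Return the name or path the system reports for PortAudio, or None."""
    return ctypes.util.find_library("portaudio")


def portaudio_status() -> str:
    """Build a human-readable message about whether PortAudio was located."""
    location = find_portaudio()
    if location is None:
        return "PortAudio not found: try installing it with your package manager."
    return f"PortAudio found: {location}"


if __name__ == "__main__":
    print(portaudio_status())
```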
Join the Development
The Speech MCP server is open-source and welcomes contributions. You can also connect with Max on Discord for questions and collaboration.
Voice interaction with AI agents like Goose, which have the power and tools to act on instructions, provides a different kind of vibe that makes the future feel closer than ever. Whether you're interested in vibe coding or accessibility improvements, or just want to feel a bit more like Tony Stark while getting Goose to pull a J.A.R.V.I.S., the Speech MCP server offers a glimpse into the future of human-AI collaboration - and it's available today.