Vibe Coding with Goose and the Speech MCP

March 28, 2025 · 3 min read

Staff Developer Advocate


Imagine creating an app just by describing what you want out loud, like you’re talking to a friend. That’s the magic of vibe coding: turning natural language into working code with the help of an AI agent. And while typing a prompt gets the job done, saying it out loud hits different 🔥 The new Speech MCP server has quite literally entered the chat.

In a recent Wild Goose Case livestream, hosts Ebony Louis and Adewale Abati were joined by Max Novich from Block's AI tools team, who showed off an exciting new extension: the Speech MCP server.

During the livestream, Max put the extension to work by building an entire web application using only voice commands, with no keyboard or mouse required. The result was a vibrant, animated webpage with 3D effects, synthwave aesthetics, and interactive elements, all created through natural conversation with Goose.

The Speech MCP Server

Speech MCP is an open source MCP server that enables voice interaction with AI agents like Goose. What makes it special is that it runs entirely locally on your machine, making it:

LLM-agnostic
Privacy-focused
Cost-effective compared to cloud-based speech services
Usable without an internet connection

Key Features

Local Speech Processing: Uses two main models:
- Faster Whisper: An efficient speech-to-text engine, built as a faster reimplementation of OpenAI's Whisper model
- Kokoro TTS: A Japanese-engineered text-to-speech model with 54 natural-sounding voices
Voice Selection: Choose from 54 different voices with varying characteristics and personalities
Multi-Speaker Narration: Generate and play conversations between multiple voices
Audio Transcription: Convert audio/video content to text with timestamps and speaker detection

Live Demo Highlights

During the demonstration, Max showcased several impressive capabilities:

Voice-Controlled Development:
- Created animated text effects
- Implemented 3D transformations
- Added synthwave aesthetics with gradients and grids
- Integrated music controls
System Integration:
- Controlled applications like Discord using voice commands
- Navigated file system and development environment
- Generated and managed audio content
Natural Interaction:
- Fluid conversation with Goose
- Real-time feedback and adjustments
- Multi-voice narration for documentation

Getting Started

To try the Speech MCP server yourself:

Install the required audio library (PortAudio):

# For macOS
brew install portaudio

# For Linux (Debian/Ubuntu)
sudo apt-get install portaudio19-dev  # or, on Fedora: sudo dnf install portaudio-devel
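If you switch between machines, the commands above can be wrapped in a small helper that suggests the right one. This is a minimal sketch, not something shipped with Speech MCP or Goose: the function names `portaudio_install_cmd` and `detect_pkg_manager` are hypothetical, and it assumes the usual package names on each platform (`portaudio` for Homebrew, `portaudio19-dev` for apt on Debian/Ubuntu, `portaudio-devel` for dnf on Fedora). It only prints the command, so nothing is installed until you run it yourself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Speech MCP): print, but do not run,
# the PortAudio install command for a given package manager.
# Package names are assumptions for common platforms:
#   Homebrew -> portaudio, apt (Debian/Ubuntu) -> portaudio19-dev,
#   dnf (Fedora) -> portaudio-devel.
portaudio_install_cmd() {
  case "$1" in
    brew)    echo "brew install portaudio" ;;
    apt-get) echo "sudo apt-get install portaudio19-dev" ;;
    dnf)     echo "sudo dnf install portaudio-devel" ;;
    *)       echo "no known PortAudio package for: $1" >&2; return 1 ;;
  esac
}

# Return the first supported package manager found on PATH.
detect_pkg_manager() {
  for pm in brew apt-get dnf; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}

# Show the suggested command for this machine so you can review it first.
if pm=$(detect_pkg_manager); then
  echo "Suggested: $(portaudio_install_cmd "$pm")"
else
  echo "No supported package manager found; install PortAudio manually." >&2
fi
```

Printing rather than executing keeps the helper safe to run anywhere; copy the suggested line into your terminal once it looks right.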

Install the extension using the one-click deep link in Goose.
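Before or after installing the extension, it can be worth confirming that the PortAudio library from the first step is actually discoverable on your system. The check below is a hypothetical preflight sketch, not part of Speech MCP or Goose: the function name `check_portaudio` is made up for illustration, it assumes `pkg-config` is installed, and it relies on PortAudio v19 registering itself with pkg-config under the name `portaudio-2.0`.

```shell
#!/bin/sh
# Hypothetical preflight check (not part of Speech MCP or Goose).
# Exit status: 0 = PortAudio found, 1 = not found, 2 = pkg-config missing.
check_portaudio() {
  if ! command -v pkg-config >/dev/null 2>&1; then
    echo "pkg-config not found; cannot verify PortAudio" >&2
    return 2
  fi
  # PortAudio v19 installs a pkg-config file named portaudio-2.0.pc.
  if pkg-config --exists portaudio-2.0; then
    echo "PortAudio $(pkg-config --modversion portaudio-2.0) found"
    return 0
  fi
  echo "PortAudio not found; install it before enabling Speech MCP" >&2
  return 1
}

# Run the check without aborting the script, then report the result.
status=0
check_portaudio || status=$?
echo "preflight exit status: $status"
```

A status of 0 means PortAudio was found, 1 means it was not, and 2 means `pkg-config` itself is missing, so the check could not run at all.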

Join the Development

The Speech MCP server is open-source and welcomes contributions. You can also connect with Max on Discord for questions and collaboration.

Voice interaction with an AI agent like Goose, one that has the power and tools to act on your instructions, brings a different kind of vibe that makes the future feel closer than ever. Whether you're into vibe coding, care about accessibility, or just want to feel a bit more like Tony Stark while getting Goose to pull a J.A.R.V.I.S., the Speech MCP server offers a glimpse of where human-AI collaboration is headed, and it's available today.
