Why Tool Descriptions Aren’t Enough

The first question I had when I heard about MCP sampling was:
“Can’t I just write better tool descriptions and tell the tool it’s an expert?”

The MCP ecosystem is standardizing how servers deliver interactive UIs to hosts, and goose is an early adopter. Today we're shipping support for the draft MCP Apps specification (SEP-1865), bringing goose in line with the emerging standard, as other hosts like Claude and ChatGPT move toward adoption.

In our previous blog post we detailed the Model Context Protocol (MCP) system and discussed some security concerns and mitigations. As a brief recap, MCP provides agents with a means to accomplish tasks using defined tools, reducing the burden on the agent of using complex and varied APIs and integrations.

Can you automate taste? The short answer is no, you cannot automate taste, but I did make my design preferences legible.
But for those interested in my experiment, I'll share the longer answer: I wanted to participate in Genuary, the annual challenge where people create one piece of creative coding every day in January.
My goal here wasn't to "outsource" my creativity. Instead, I wanted to use Genuary as a sandbox to learn agentic engineering workflows. These workflows are becoming the standard for how developers work with technology. To keep my skills sharp, I used goose to experiment with these workflows in small, daily bursts.

As AI agents grow in capability, more people feel empowered to code and contribute to open source. The ceiling feels higher than ever. That is a net positive for the ecosystem, but it also changes the day-to-day reality for maintainers. Maintainers like the goose team face a growing volume of pull requests and issues, often faster than they can realistically process.
We embraced this reality and put goose to work on its own backlog.

Every time there's a hot new development in AI, Tech Twitter™ declares a casualty.
This week's headline take is "Skills just killed MCP."
It sounds bold. It sounds confident. It's also wrong.

One day, we will tell our kids we used to have to wait for agents, but they won't know that world because the agents in their day will be so fast. I joked about this with Nick Cooper, an MCP Steering Committee Member from OpenAI, and Bradley Axen, the creator of goose. They both chuckled at the thought because they understand exactly how clunky and experimental our current "dial-up era" of agentic workflows can feel.
Model Context Protocol (MCP) has moved the needle by introducing a new norm: the ability to connect agents to everyday apps. However, the experience isn't perfect. We are still figuring out how to balance the power of these tools with the technical constraints of the models themselves.

To plan or not to plan, that's the wrong question. Rather than a binary yes/no, planning exists on a spectrum. The real question is which approach fits your current task and working style.
Different developers approach planning in different ways. One builder might draft detailed pseudocode before touching a keyboard, while another practices test-driven development to let the architecture emerge organically. You'll find teams sketching complex diagrams on whiteboards and others spinning up fast prototypes to "fail fast" and refactor later.
If planning is a spectrum when coding manually, why wouldn't it be a spectrum when using an agent to code as well?

We're excited to announce two new ways to interact with goose: a native iOS app for mobile access and native terminal integration. Both give you more flexibility in how and where you use your AI agent.

There is an emerging approach to MCP tool calling referred to as "sandbox mode" or "code mode". These ideas were initially presented by Cloudflare in their "Code Mode: the better way to use MCP" post and by Anthropic in their "Code execution with MCP: Building more efficient agents" post. Since the approach and the benefits are clearly laid out in those posts, I will summarize them here.