Smart Context Management

When working with Large Language Models (LLMs), there are limits to how much conversation history they can process at once. Goose provides smart context management features to help handle context and conversation limits so you can maintain productive sessions. Here are some key concepts:

  • Context Length: The amount of conversation history the LLM can consider
  • Context Limit: The maximum number of tokens the model can process
  • Context Management: How Goose handles conversations approaching these limits
  • Turn: One complete prompt-response interaction between Goose and the LLM

How Goose Manages Context

Goose uses a two-tiered approach to context management:

  1. Auto-Compaction: Proactively summarizes conversation when approaching token limits
  2. Context Strategies: Backup strategies used if the context limit is still exceeded after auto-compaction

This layered approach lets Goose handle token and context limits gracefully.

Automatic Compaction

Goose automatically compacts (summarizes) older parts of your conversation when approaching token limits, allowing you to maintain long-running sessions without manual intervention. Auto-compaction is triggered by default when you reach 80% of the token limit in Goose Desktop and the Goose CLI.

Control the auto-compaction behavior with the GOOSE_AUTO_COMPACT_THRESHOLD environment variable. Disable this feature by setting the value to 0.0.

# Automatically compact sessions when 60% of available tokens are used
export GOOSE_AUTO_COMPACT_THRESHOLD=0.6
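
To turn the feature off entirely, set the threshold to 0.0 as noted above:

# Disable auto-compaction
export GOOSE_AUTO_COMPACT_THRESHOLD=0.0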

When you reach the auto-compaction threshold:

  1. Goose automatically starts summarizing the conversation to make room.
  2. You'll see a message that says "Auto-compacted context: X → Y tokens (Z% reduction)".
  3. Once complete, previous messages remain visible in your conversation, but only the summary is included in Goose's active context.
  4. The session continues with the summarized context in place.

Manual Compaction

You can also trigger compaction manually before reaching context or token limits:

  1. Click the scroll text icon in the chat interface
  2. Confirm the summarization in the modal
  3. View or edit the generated summary if needed
Note: Before the scroll icon appears, you must send at least one message in the chat. Simply starting a new session won't trigger it.
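
In the Goose CLI, recent versions also let you trigger compaction from inside a session with a slash command (the command name below is based on recent releases; verify it with your version's help):

# Inside a running Goose CLI session
/summarize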

Context Limit Strategies

When auto-compaction is disabled, or if a conversation still exceeds the context limit, Goose offers different ways to handle it:

| Feature | Description | Best For | Availability | Impact |
| --- | --- | --- | --- | --- |
| Summarization | Condenses conversation while preserving key points | Long, complex conversations | Desktop and CLI | Maintains most context |
| Truncation | Removes oldest messages to make room | Simple, linear conversations | CLI only | Loses old context |
| Clear | Starts fresh while keeping session active | New direction in conversation | CLI only | Loses all context |
| Prompt | Asks user to choose from the above options | Control over each decision in interactive sessions | CLI only | Depends on choice made |

Goose Desktop exclusively uses summarization to manage context, preserving key information while reducing size.
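
In CLI sessions, you can typically pick a strategy ahead of time with an environment variable instead of being prompted each time. The variable name and values below are an assumption based on recent Goose releases; check your version's documentation:

# Select the fallback context strategy for CLI sessions
# (variable name and values assumed; verify for your Goose version)
export GOOSE_CONTEXT_STRATEGY=truncate # or: summarize, clear, prompt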

Maximum Turns

The Max Turns limit is the maximum number of consecutive turns Goose can take without user input (default: 1000). When the limit is reached, Goose stops and prompts: "I've reached the maximum number of actions I can do without user input. Would you like me to continue?" If you choose to continue, Goose runs until the limit is reached again and then prompts once more.

This feature gives you control over agent autonomy and prevents infinite loops and runaway behavior, which could have significant cost consequences or damaging impact in production environments. Use it for:

  • Preventing infinite loops and excessive API calls or resource consumption in automated tasks
  • Enabling human supervision or interaction during autonomous operations
  • Controlling loops while testing and debugging agent behavior

This setting is stored as GOOSE_MAX_TURNS in your config.yaml file. You can configure it in the Desktop app or set the environment variable directly (see the example after these steps). In Goose Desktop:

  1. Click the button in the top-left to open the sidebar
  2. Click the Settings button on the sidebar
  3. Click the Chat tab
  4. Scroll to Conversation Limits and enter a value for Max Turns
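
For CLI sessions and scripts, export the variable named above directly:

# Limit Goose to 50 consecutive turns without user input
export GOOSE_MAX_TURNS=50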

Choosing the Right Value

The appropriate max turns value depends on your use case and comfort level with automation:

  • 5-10 turns: Good for exploratory tasks, debugging, or when you want frequent check-ins. For example, "analyze this codebase and suggest improvements" where you want to review each step
  • 25-50 turns: Effective for well-defined tasks with moderate complexity, such as "refactor this module to use the new API" or "set up a basic CI/CD pipeline"
  • 100+ turns: More suitable for complex, multi-step automation where you trust Goose to work independently, like "migrate this entire project from React 16 to React 18" or "implement comprehensive test coverage for this service"

Remember that even simple-seeming tasks often require multiple turns. For example, asking Goose to "fix the failing tests" might involve analyzing test output (1 turn), identifying the root cause (1 turn), making code changes (1 turn), and verifying the fix (1 turn).

Token Usage

After sending your first message, Goose Desktop and Goose CLI display token usage.

The Desktop displays a colored circle next to the model name at the bottom of the session window. The color provides a visual indicator of your token usage for the session.

  • Green: Normal usage - Plenty of context space available
  • Orange: Warning state - Approaching limit (80% of capacity)
  • Red: Error state - Context limit reached

Hover over this circle to display:

  • The number of tokens used
  • The percentage of available tokens used
  • The total available tokens
  • A progress bar showing your current token usage

Model Context Limit Overrides

Context limits are automatically detected based on your model name, but Goose provides settings to override the default limits:

| Model | Description | Best For | Setting |
| --- | --- | --- | --- |
| Main | Set context limit for the main model (also serves as fallback for other models) | LiteLLM proxies, custom models with non-standard names | GOOSE_CONTEXT_LIMIT |
| Lead | Set larger context for planning in lead/worker mode | Complex planning tasks requiring more context | GOOSE_LEAD_CONTEXT_LIMIT |
| Worker | Set smaller context for execution in lead/worker mode | Cost optimization during execution phase | GOOSE_WORKER_CONTEXT_LIMIT |
| Planner | Set context for planner models | Large planning tasks requiring extensive context | GOOSE_PLANNER_CONTEXT_LIMIT |
Info: This setting only affects the displayed token usage and progress indicators. Actual context management is handled by your LLM, so the usage you experience may be higher or lower than the limit you set, regardless of what the display shows.

This feature is particularly useful with:

  • LiteLLM Proxy Models: When using LiteLLM with custom model names that don't match Goose's patterns
  • Enterprise Deployments: Custom model deployments with non-standard naming
  • Fine-tuned Models: Custom models with different context limits than their base versions
  • Development/Testing: Temporarily adjusting context limits for testing purposes

Goose resolves context limits with the following precedence (highest to lowest):

  1. Explicit context_limit in model configuration (if set programmatically)
  2. Specific environment variable (e.g., GOOSE_LEAD_CONTEXT_LIMIT)
  3. Global environment variable (GOOSE_CONTEXT_LIMIT)
  4. Model-specific default based on name pattern matching
  5. Global default (128,000 tokens)
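
For example, because a specific variable outranks the global one, a lead/worker setup can pin a larger limit for the lead model only (the values here are illustrative):

# GOOSE_LEAD_CONTEXT_LIMIT takes precedence for the lead model;
# everything else falls back to GOOSE_CONTEXT_LIMIT
export GOOSE_CONTEXT_LIMIT=128000
export GOOSE_LEAD_CONTEXT_LIMIT=400000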

Configuration

Model context limit overrides are not yet available in the Goose Desktop app.

Scenarios

  1. LiteLLM proxy with custom model name

# LiteLLM proxy with custom model name
export GOOSE_PROVIDER="openai"
export GOOSE_MODEL="my-custom-gpt4-proxy"
export GOOSE_CONTEXT_LIMIT=200000 # Override the 32k default

  2. Lead/worker setup with different context limits

# Different context limits for planning vs execution
export GOOSE_LEAD_MODEL="claude-opus-custom"
export GOOSE_LEAD_CONTEXT_LIMIT=500000 # Large context for planning
export GOOSE_WORKER_CONTEXT_LIMIT=128000 # Smaller context for execution

  3. Planner with large context

# Large context for complex planning
export GOOSE_PLANNER_MODEL="gpt-4-custom"
export GOOSE_PLANNER_CONTEXT_LIMIT=1000000

Cost Tracking

Goose Desktop can display real-time estimated costs of your session.

To manage live cost tracking:

  1. Click the button in the top-left to open the sidebar
  2. Click the Settings button on the sidebar
  3. Click the App tab
  4. Toggle Cost Tracking on/off

The session cost is shown at the bottom of the Goose window and updates dynamically as tokens are consumed. Hover over the cost to see a detailed breakdown of token usage. If multiple models are used in the session, this includes a cost breakdown by model. Ollama and local deployments always show a cost of $0.00.

Pricing data is regularly fetched from the OpenRouter API and cached locally. The Advanced settings tab shows when the data was last updated and allows you to refresh.

These costs are estimates only, and not connected to your actual provider bill. The cost shown is an approximation based on token counts and public pricing data.