If you've built anything with an LLM in the last year, you've hit the wall: the model can generate beautiful text, but it can't do anything. It can't check a database. It can't call your API. It can't look up the weather, send an email, or query your company's internal knowledge base. It's a brilliant conversationalist trapped in a room with no doors.

Function calling — also called tool use — is the door. It's the mechanism that lets an LLM output structured data to invoke external functions instead of just generating text. And in 2026, it's the single most important skill for any engineer building AI-powered applications.

This guide covers how function calling works under the hood, code examples for OpenAI and Anthropic Claude, a feature comparison that includes Google Gemini, the Model Context Protocol (MCP) that's standardizing tool access, and the production patterns that separate toy demos from real systems.

How Function Calling Works

The core idea is elegant. Instead of just returning text, the model can return a structured request to call a specific function with specific arguments. Your code executes the function, returns the result, and the model uses that result to generate its final response.

User message → LLM decides to call a tool → your code executes the tool → result is sent back → LLM generates the response

The critical insight is that the model never executes the function itself. It only generates the intent — a JSON object with the function name and arguments. Your application code is responsible for the actual execution. This is a safety feature: the model proposes actions, and your code validates and executes them.
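
The propose-then-execute split can be sketched without any provider SDK. The snippet below is a minimal illustration, assuming the model's request arrives as a function name plus a JSON string of arguments; `TOOL_REGISTRY`, `handle_tool_request`, and the stubbed `get_weather` are hypothetical names chosen for this example.

```python
import json

# Hypothetical local implementation; a real one would call a weather service.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Allowlist of functions the application is willing to run.
TOOL_REGISTRY = {"get_weather": get_weather}

def handle_tool_request(name: str, arguments_json: str) -> str:
    """Validate a model-proposed call, then execute it in application code."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        arguments = json.loads(arguments_json)
    except json.JSONDecodeError:
        return json.dumps({"error": "arguments were not valid JSON"})
    if not isinstance(arguments, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        result = TOOL_REGISTRY[name](**arguments)
    except TypeError as exc:  # wrong or missing argument names
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})

reply = handle_tool_request("get_weather", '{"city": "Tokyo"}')
```

Because the registry is an explicit allowlist, a request for a function the application never registered is rejected instead of executed.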

What you define for each tool

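Every tool definition includes three things: a name the model uses to refer to the function, a natural-language description of what it does, and a JSON Schema describing its parameters.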

Function Calling with OpenAI

OpenAI popularized function calling with GPT-3.5 in June 2023 and has since evolved the API significantly. As of 2026, GPT-4.1 achieves 97-99% accuracy on function calling benchmarks and supports up to 128 tools per request.

Python — OpenAI
from openai import OpenAI

client = OpenAI()

# Define the tools
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["city"]
        }
    }
}]

# Send message with tools
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

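# When the model decides to call the tool, it returns tool_calls instead of text
# (message.tool_calls is None if the model answers directly)
tool_call = response.choices[0].message.tool_calls[0]
# tool_call.function.name == "get_weather"
# tool_call.function.arguments == '{"city": "Tokyo"}'  (a JSON string; parse it with json.loads)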
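
The request above is only the first half of the round trip: the result still has to go back to the model. The sketch below builds the follow-up message list in the Chat Completions message format using plain dictionaries so it runs without an API key. The `append_tool_result` helper, the `call_123` ID, and the weather payload are illustrative assumptions; with the SDK you would read the ID, name, and arguments from the returned tool call object.

```python
import json

def append_tool_result(messages: list, tool_call: dict, result: dict) -> list:
    """Return a new message list containing the assistant's tool call and our result."""
    assistant_msg = {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": tool_call["id"],
            "type": "function",
            "function": {
                "name": tool_call["name"],
                "arguments": tool_call["arguments"],  # still a JSON string
            },
        }],
    }
    tool_msg = {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # ties the result to the request
        "content": json.dumps(result),
    }
    return messages + [assistant_msg, tool_msg]

history = [{"role": "user", "content": "What's the weather in Tokyo?"}]
call = {"id": "call_123", "name": "get_weather", "arguments": '{"city": "Tokyo"}'}
followup = append_tool_result(history, call, {"temp_f": 72, "condition": "sunny"})
```

Passing `followup` as `messages` in a second request lets the model turn the raw result into its final answer.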

Function Calling with Anthropic Claude

Anthropic calls it "tool use" and supports parallel tool calls natively, with a maximum of 64 tools per request. Claude Opus 4 and Sonnet 4 are both highly reliable at structured tool calling, and Claude's extended thinking mode gives it an edge on complex multi-step tool chains where reasoning about which tools to call matters.

Python — Anthropic Claude
import anthropic

client = anthropic.Anthropic()

# Define tools
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name"
            }
        },
        "required": ["city"]
    }
}]

# Send message with tools
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}]
)

# Claude returns tool_use content blocks
for block in response.content:
    if block.type == "tool_use":
        # block.name == "get_weather"
        # block.input == {"city": "Tokyo"}
        # execute_tool is a placeholder for your own dispatcher (not part of the SDK)
        result = execute_tool(block.name, block.input)
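
With Claude, the tool output goes back as a `tool_result` content block inside a user turn, matched to the request by its `tool_use_id`. The following is a minimal sketch using plain dictionaries, so it runs without the SDK. The `build_tool_result_turn` helper and the `toolu_01` ID are assumptions made for illustration.

```python
def build_tool_result_turn(tool_use_id: str, output, is_error: bool = False) -> dict:
    """Wrap a tool's output in the user turn that follows a tool_use block."""
    block = {
        "type": "tool_result",
        "tool_use_id": tool_use_id,  # must match the id of the tool_use block
        "content": str(output),
    }
    if is_error:
        block["is_error"] = True  # tells the model the call failed
    return {"role": "user", "content": [block]}

messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    # The assistant turn echoes back what the model returned.
    {"role": "assistant", "content": [{
        "type": "tool_use",
        "id": "toolu_01",
        "name": "get_weather",
        "input": {"city": "Tokyo"},
    }]},
    build_tool_result_turn("toolu_01", "72°F, sunny"),
]
```

Sending `messages` in a second request, with the same `tools` list, gives the model what it needs to write the final answer.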

Provider Comparison

Each provider implements function calling differently. Here's what matters in practice:

| Feature | OpenAI (GPT-4.1) | Anthropic (Claude Opus 4) | Google (Gemini 2.5 Pro) |
|---|---|---|---|
| Max tools per request | 128 | 64 | 64 |
| Parallel tool calls | Yes (default) | Yes (native) | Yes |
| Forced tool use | tool_choice: "required" | tool_choice: {"type": "any"} | tool_config: {mode: "ANY"} |
| Streaming support | Yes | Yes | Yes |
| Accuracy (benchmarks) | 97-99% | 96-98% | 93-96% |
| Schema format | JSON Schema | JSON Schema | JSON Schema (OpenAPI subset) |
| Token overhead per tool | ~100-300 tokens | ~100-300 tokens | ~150-350 tokens |
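
Because OpenAI and Anthropic both accept JSON Schema, converting a tool definition between them is mostly a matter of renaming keys. The helper below is a small sketch based on the two definitions shown earlier in this guide; `openai_tool_to_anthropic` is a hypothetical name, and it covers only the name, description, and schema fields.

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Convert an OpenAI-style tool definition to the Anthropic tool shape."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # OpenAI calls the schema "parameters"; Anthropic calls it "input_schema".
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }

openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}

anthropic_tool = openai_tool_to_anthropic(openai_tool)
```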

The MCP Protocol: Universal Tool Access

The Model Context Protocol (MCP), created by Anthropic and open-sourced in November 2024, is rapidly becoming the standard for connecting AI models to external tools. Think of it as USB-C for AI: instead of writing custom function definitions for each provider, you define your tools once as an MCP server, and any MCP-compatible client can use them.

MCP SDK monthly downloads: 97 million
Adopted by: OpenAI, Google, Microsoft, AWS
Governance: AAIF (vendor-neutral)
Transport: HTTP + SSE (streamable)

MCP solves three problems that direct function calling doesn't:

Python — MCP Server
from mcp.server.fastmcp import FastMCP

app = FastMCP("weather-server")

# Define a tool — any MCP client can call it
@app.tool()
async def get_weather(city: str) -> str:
    """Get current weather for a city."""
    # Your implementation here; fetch_weather_api is a placeholder for your own async client
    weather = await fetch_weather_api(city)
    return f"{city}: {weather.temp}°F, {weather.condition}"

@app.tool()
async def get_forecast(city: str, days: int = 5) -> str:
    """Get weather forecast for the next N days."""
    forecast = await fetch_forecast_api(city, days)
    return forecast.to_json()
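
On the wire, an MCP client invokes one of these tools with a JSON-RPC 2.0 request whose method is `tools/call`. The sketch below only builds that request body with the standard library; a real client would also handle the initialization handshake and the transport, which are omitted here. The `make_tools_call_request` helper is a hypothetical name.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids

def make_tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

request_body = make_tools_call_request("get_forecast", {"city": "Tokyo", "days": 3})
```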

Production Patterns

Building a demo with function calling takes an hour. Building a production system takes weeks of learning the hard way. Here are the patterns that matter:

1. Write better tool descriptions

The model chooses which tool to call based almost entirely on the description you write. A bad description — "Gets data" — leads to wrong tool selection. A good description explains what the tool does, when to use it, and what it returns.
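
To make the contrast concrete, here are a vague and a specific description side by side, plus a rough lint check. The checks in `description_warnings` are arbitrary heuristics invented for this example (a word count and two keyword tests), not a rule from any provider.

```python
VAGUE = {"name": "get_data", "description": "Gets data"}

SPECIFIC = {
    "name": "get_weather",
    "description": (
        "Get the current weather for a single city. Use this when the user asks "
        "about present conditions, not forecasts. Returns temperature in "
        "Fahrenheit and a short condition string such as 'sunny'."
    ),
}

def description_warnings(tool: dict, min_words: int = 12) -> list:
    """Flag descriptions that skip what the tool does, when to use it, or what it returns."""
    text = tool.get("description", "").lower()
    warnings = []
    if len(text.split()) < min_words:
        warnings.append("description is very short")
    if "when" not in text:
        warnings.append("does not say when to use the tool")
    if "return" not in text:
        warnings.append("does not say what the tool returns")
    return warnings

vague_warnings = description_warnings(VAGUE)
specific_warnings = description_warnings(SPECIFIC)
```

A check like this could run in CI so that new tools do not ship with placeholder descriptions.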

"The single most impactful thing you can do to improve tool calling accuracy is to write better descriptions. Engineers spend hours optimizing their prompts and five seconds on tool descriptions. Flip that ratio." (Anthropic Engineering, "Writing effective tools for AI agents")

2. Validate tool arguments before execution

The model generates JSON arguments, but it can hallucinate field names, use wrong types, or produce invalid values. Always validate against your schema before executing. Libraries like Pydantic (Python) or Zod (TypeScript) make this trivial.
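
As a rough illustration of what that validation step checks, here is a standard-library sketch that handles required fields, unexpected fields, and primitive types for a flat schema. It is deliberately incomplete (no nesting, enums, or ranges); in practice a Pydantic model or a full JSON Schema validator would replace it. `validate_arguments` is a hypothetical helper.

```python
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}

def validate_arguments(args: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the arguments look valid."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        if name not in properties:
            errors.append(f"unexpected field: {name}")  # likely a hallucinated name
            continue
        expected = properties[name].get("type")
        py_type = JSON_TYPES.get(expected)
        if py_type is None:
            continue  # type not covered by this sketch
        # bool is a subclass of int in Python, so reject it for numeric fields explicitly.
        is_bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if is_bool_as_number or not isinstance(value, py_type):
            errors.append(f"field {name} should be of type {expected}")
    return errors

WEATHER_SCHEMA = {
    "type": "object",
    "properties": {"city": {"type": "string", "description": "City name"}},
    "required": ["city"],
}

problems = validate_arguments({"town": "Tokyo"}, WEATHER_SCHEMA)
```

Returning the list of problems to the model, rather than raising, gives it a chance to correct the call on the next turn.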

3. Handle parallel tool calls

Modern models frequently call multiple tools in parallel — "What's the weather in Tokyo and New York?" produces two simultaneous tool calls. Your code must handle this: execute them concurrently, collect results, and send them all back in a single response.
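
One way to execute parallel calls concurrently is `asyncio.gather`, which preserves the order of its inputs so results can be matched back to call IDs. The sketch below assumes each call arrives as a dictionary with `id`, `name`, and `arguments` keys; that shape and the stubbed `get_weather` are assumptions for illustration.

```python
import asyncio

async def get_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a network call
    return f"{city}: sunny"

TOOLS = {"get_weather": get_weather}

async def run_one(call: dict) -> dict:
    """Run a single tool call, converting any failure into an error result."""
    try:
        output = await TOOLS[call["name"]](**call["arguments"])
        return {"id": call["id"], "output": output}
    except Exception as exc:
        return {"id": call["id"], "error": f"{type(exc).__name__}: {exc}"}

async def run_tool_calls(calls: list) -> list:
    """Execute all calls concurrently and return results in the same order."""
    return await asyncio.gather(*(run_one(call) for call in calls))

calls = [
    {"id": "call_1", "name": "get_weather", "arguments": {"city": "Tokyo"}},
    {"id": "call_2", "name": "get_weather", "arguments": {"city": "New York"}},
]
results = asyncio.run(run_tool_calls(calls))
```

Catching exceptions inside `run_one` means one failing tool does not discard the results of the others.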

4. Limit tool count for cost and accuracy

Each tool definition adds ~100-300 tokens to every API request. With 20 tools, you're burning 2,000-6,000 tokens before the conversation even starts. More importantly, accuracy degrades as tool count increases. Production systems use tool filtering: only send the 5-10 tools most relevant to the current conversation context.
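
The overhead estimate and a very simple form of tool filtering can be sketched as follows. The keyword-overlap scoring in `select_tools` is a naive stand-in chosen for brevity; production systems might use embeddings or routing rules instead. Both function names are hypothetical.

```python
def tool_token_overhead(tool_count: int, low: int = 100, high: int = 300) -> tuple:
    """Estimate the (min, max) tokens added per request by tool definitions."""
    return tool_count * low, tool_count * high

def select_tools(tools: list, user_message: str, limit: int = 5) -> list:
    """Keep only tools whose name or description shares a word with the message."""
    words = set(user_message.lower().split())

    def score(tool: dict) -> int:
        text = f"{tool['name']} {tool['description']}".lower().replace("_", " ")
        return len(words & set(text.split()))

    ranked = sorted(tools, key=score, reverse=True)
    return [tool for tool in ranked if score(tool) > 0][:limit]

catalog = [
    {"name": "send_email", "description": "Send an email to a recipient"},
    {"name": "get_weather", "description": "Get current weather for a city"},
]
low, high = tool_token_overhead(20)
selected = select_tools(catalog, "What's the weather in Tokyo?")
```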

5. Implement retry with fallback

Tool calls can fail: the external API times out, the database is down, rate limits kick in. Retry transient failures a bounded number of times, and if the call still fails, return a clear error message to the model instead of throwing an exception. The model can often recover gracefully: "The weather API is currently unavailable. Based on historical data for Tokyo in late May, temperatures typically range from 65-78°F."
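
A minimal sketch of this pattern wraps the tool in a bounded retry with exponential backoff and serializes the final failure as data the model can read. The `call_with_retry` helper, the retry count, and the delay values are illustrative assumptions, and the sketch retries on every exception type for simplicity.

```python
import json
import time

def call_with_retry(fn, *args, attempts: int = 3, base_delay: float = 0.1, **kwargs) -> str:
    """Run a tool, retrying on failure, and always return a JSON string."""
    last_error = None
    for attempt in range(attempts):
        try:
            return json.dumps({"ok": True, "result": fn(*args, **kwargs)})
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    # Out of attempts: report the failure as data instead of raising.
    return json.dumps({"ok": False, "error": f"{type(last_error).__name__}: {last_error}"})

def always_down(city: str) -> str:
    """Hypothetical tool that simulates an outage."""
    raise ConnectionError("weather API unavailable")

failure = call_with_retry(always_down, "Tokyo", attempts=2, base_delay=0)
```

The returned string can be sent back as the tool result, which lets the model explain the outage or fall back to general knowledge.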

Function Calling vs. MCP: When to Use What

Quick decision guide

Use direct function calling when: you have fewer than 10 tools, a single LLM provider, and a simple request-response pattern. It's simpler to set up and debug.

Use MCP when: you need tool portability across providers, have a large or dynamic tool catalog, or are building multi-agent systems where agents need to discover and share tools. MCP adds complexity upfront but pays off as the system scales.

Building Your First Agent with Tools

Here's the mental model for building a useful agent with function calling:

  1. Start with one tool. Pick the simplest, most useful tool for your use case. Get it working end-to-end before adding more.
  2. Build the loop. The agent needs a loop: receive user message → call model → if tool call, execute and loop → if text, return to user. Most frameworks handle this, but it's worth building manually once to understand the mechanics.
  3. Add error boundaries. Set a maximum number of tool calls per turn (typically 5-10) to prevent infinite loops. Implement timeouts on tool execution. Return structured errors the model can understand.
  4. Instrument everything. Log every tool call: which tool, what arguments, execution time, result. This is your debugging lifeline when the agent behaves unexpectedly.
  5. Test with adversarial inputs. Users will ask the agent to do things your tools can't handle. Test edge cases: empty inputs, invalid cities, SQL injection attempts in search queries. The model is generally good at handling these, but your tool implementations need to be robust.
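
The steps above can be sketched as a small loop. To keep it runnable offline, the LLM is replaced by `fake_model`, a stand-in that requests one tool call and then answers in text; the reply shape (`type`, `name`, `arguments`, `content`) is invented for this example and does not match any provider's response format.

```python
import json

MAX_TOOL_CALLS = 5  # error boundary: upper bound on tool calls per user turn

def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"{city}: sunny"

TOOLS = {"get_weather": get_weather}

def fake_model(messages: list) -> dict:
    """Stand-in for an LLM call: requests the weather once, then answers in text."""
    last = messages[-1]
    if last["role"] == "user":
        return {"type": "tool_call", "name": "get_weather", "arguments": {"city": "Tokyo"}}
    return {"type": "text", "content": f"Here is what I found: {last['content']}"}

def run_agent(user_message: str, model=fake_model, log=None) -> str:
    """Call the model in a loop, executing tools until it answers in text."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(MAX_TOOL_CALLS):
        reply = model(messages)
        if reply["type"] == "text":
            return reply["content"]
        messages.append({"role": "assistant", "content": json.dumps(reply)})
        try:
            output = TOOLS[reply["name"]](**reply["arguments"])
        except Exception as exc:  # structured error the model can read
            output = json.dumps({"error": f"{type(exc).__name__}: {exc}"})
        if log is not None:  # instrumentation hook
            log.append({"tool": reply["name"], "arguments": reply["arguments"], "output": output})
        messages.append({"role": "tool", "content": output})
    return "Stopped: tool-call limit reached."

answer = run_agent("What's the weather in Tokyo?")
```

Swapping `fake_model` for a function that calls a real provider, and translating its response into the same reply shape, would turn this into a working agent loop.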

The Job Market for Tool-Use Skills

Function calling and tool use are now table-stakes skills for AI engineering roles. Every major AI company — from OpenAI and Anthropic to startups building on top of their APIs — requires engineers who can build reliable tool-calling systems.

Roles that specifically require these skills include AI Engineer, ML Platform Engineer, AI Application Developer, and the increasingly common "AI Agent Engineer" title. Compensation for these roles ranges from $180K to $450K+ total comp depending on level and company.

Frequently Asked Questions

What is function calling in AI?
Function calling is the mechanism that lets an LLM generate structured data (usually JSON) to invoke external functions instead of plain text. When the model recognizes that a user's request requires an action — looking up data, calling an API, running a calculation — it outputs a structured function call with the name and arguments. Your code executes the function and returns the result.
What is the difference between function calling and tool use?
They are the same concept with different names. OpenAI calls the feature "function calling," although its API moved from the original functions parameter to a tools parameter. Anthropic uses "tool use." Google uses both terms. The underlying mechanism is identical: the model outputs structured data to invoke external capabilities.
What is MCP (Model Context Protocol)?
MCP is an open standard created by Anthropic that standardizes how AI models connect to external tools and data sources. Think of it as USB-C for AI — a universal plug that lets any model connect to any tool server. As of 2026, MCP has 97 million monthly SDK downloads and support from OpenAI, Google, and most major AI platforms.
Which AI model is best for function calling?
OpenAI's GPT-4.1 and Claude Opus 4 lead in function calling reliability (97-99% and 96-98% accuracy respectively, per the provider comparison above). For complex multi-step tool chains, Claude's extended thinking mode gives it an edge. For high-volume, simpler calls, GPT-4.1-mini offers the best cost-performance ratio.
How many tokens does function calling add?
Each tool definition typically costs 100-300 tokens. With 10 tools, you add 1,000-3,000 tokens per request. Production systems use tool filtering to only send relevant tools and keep costs manageable.
What skills do I need to learn function calling?
You need: basic Python or TypeScript, understanding of JSON Schema, familiarity with at least one LLM API, and knowledge of REST APIs. No machine learning expertise is required — function calling is an application-layer skill. Most engineers can build their first working tool in under an hour.