Jan 2025 • 7 min read
Function Calling in LLMs: OpenAI and Anthropic Claude
Understanding how to enable LLMs to interact with external tools, APIs, and databases through structured function calling.
What is Function Calling?
Function calling enables LLMs like Claude and GPT to achieve complex and practical tasks beyond generating text alone. It bridges the gap between the model's conversational intelligence and external APIs, databases, and custom code.
OpenAI introduced function calling first, and it quickly became a core pattern for building LLM applications. Anthropic later brought the same capability, which it calls tool use, to the Claude 3 model family.
How Function Calling Works
The Process
- Define Tools: You send a request to the API that includes the user's query and a list of available tools the model can use. Each tool is defined with a name, description, and input schema.
- Model Decides: The LLM analyzes the user query and determines which tool (if any) to use based on tool descriptions.
- Returns Function Call: The model returns the function name and arguments in a structured format (JSON).
- You Execute: Your code parses the response and actually calls the function with the provided arguments.
- Return Results: You send the function results back to the model.
- Model Responds: The LLM uses the function results to generate a natural language response to the user.
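The six steps above can be sketched end to end with plain dictionaries. Everything here is illustrative: the get_weather tool, its stubbed data, and the hard-coded model response stand in for what a real provider SDK would return.

```python
import json

# Step 1: the tool definitions you would send with the request
tools = [{
    "name": "get_weather",
    "description": "Get the current temperature for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Your local implementations, keyed by tool name
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}  # stubbed data for the sketch

IMPLEMENTATIONS = {"get_weather": get_weather}

# Steps 2-3: the model decides and returns a structured call
# (simulated here; a real response comes back from the provider's API)
model_response = {
    "type": "tool_use",
    "name": "get_weather",
    "input": {"city": "Berlin"},
}

# Step 4: your code parses the response and executes the function
fn = IMPLEMENTATIONS[model_response["name"]]
result = fn(**model_response["input"])

# Step 5: the result goes back to the model as a new message,
# so it can generate the final natural language answer (step 6)
tool_result_message = {
    "role": "user",
    "content": [{"type": "tool_result", "content": json.dumps(result)}],
}
```

The dispatch-by-name dictionary is the key piece: the model only ever names a function, and your code decides whether and how to run it.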
Critical Point: You Execute Functions
The model does not execute functions on the provider's side; it only returns the function name and arguments, which your own code must parse and execute. This is a security feature: the LLM suggests what to do, but you control what actually happens.
Model Support
Anthropic Claude
Only Claude 3 models and newer support function calling (which Anthropic calls tool use) via the Anthropic API. This includes:
- claude-3-opus-20240229
- claude-3-sonnet-20240229
- claude-3-haiku-20240307
- claude-opus-4-20250514 (latest)
OpenAI
Function calling is supported in:
- gpt-4 and all gpt-4-* variants
- gpt-3.5-turbo (snapshot 0613 and later)
- gpt-4o (optimized for function calling)
Anthropic's Implementation
Messages API Structure
The Anthropic Messages API provides a clear mechanism for function calling through its tools, tool_use, and tool_result structures, which keeps implementation straightforward.
Tool Definition Format
Tools are defined with:
- name: Unique identifier for the tool
- description: What the tool does (the model uses this to decide when to call it)
- input_schema: JSON Schema defining expected parameters
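A complete definition in Anthropic's format looks like the following. The tool name, description text, and ticker field are all made up for illustration; only the three-key structure (name, description, input_schema) is prescribed by the API.

```python
# One tool in Anthropic's definition format. The description is what the
# model reads when deciding whether to call the tool, so it spells out
# both the purpose and when to use it.
stock_price_tool = {
    "name": "get_stock_price",
    "description": (
        "Get the latest trading price for a stock. "
        "Use this whenever the user asks about a share price. "
        "Requires the ticker symbol."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "ticker": {
                "type": "string",
                "description": "Stock ticker symbol, e.g. 'AAPL'.",
            }
        },
        "required": ["ticker"],
    },
}
```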
Common Use Cases
Database Queries
Convert natural language questions into database queries. The model determines what data to fetch, you execute the query, and the model formats results into readable answers.
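One safe shape for a database tool is shown below with an in-memory SQLite table (the orders data and the total_spend function are invented for the sketch). The model supplies only the arguments; the SQL itself is a fixed, parameterized query, never raw SQL written by the model.

```python
import sqlite3

# In-memory demo table (illustrative data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 30.0), (2, "bob", 75.5), (3, "alice", 12.0)],
)

def total_spend(customer: str) -> float:
    """Tool implementation: a fixed parameterized query. The model chooses
    the customer argument; it never writes SQL."""
    row = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    return row[0]

# The model would supply {"customer": "alice"} as the tool arguments
print(total_spend("alice"))  # → 42.0
```

Keeping the query fixed and parameterized sidesteps SQL injection from model output entirely, at the cost of one tool per query shape.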
API Integration
Enable the LLM to interact with external services: send emails, create calendar events, fetch weather data, search the web, or call custom business APIs.
Data Processing
Define functions for complex calculations, data transformations, or analysis that would be difficult or unreliable for the LLM to do through text generation alone.
Multi-Step Workflows
Chain multiple function calls together. The model can call one function, analyze results, and decide to call another function based on those results.
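A multi-step workflow is just the single-call loop run repeatedly until the model stops requesting tools. In this sketch a scripted list of turns stands in for the model (the find_user and list_orders tools and their return values are invented); a real loop would call the provider's API at each iteration instead.

```python
# Scripted "model" turns: look up a user, then fetch their orders,
# then answer in plain text. A real loop would get each turn from the API.
scripted_turns = [
    {"type": "tool_use", "name": "find_user", "input": {"email": "a@example.com"}},
    {"type": "tool_use", "name": "list_orders", "input": {"user_id": 7}},
    {"type": "text", "text": "User 7 has 2 orders."},
]

def find_user(email):
    return {"user_id": 7}          # stubbed lookup

def list_orders(user_id):
    return {"orders": [101, 102]}  # stubbed fetch

TOOLS = {"find_user": find_user, "list_orders": list_orders}

transcript = []
for turn in scripted_turns:
    if turn["type"] == "text":     # the model is done calling tools
        transcript.append(turn["text"])
        break
    result = TOOLS[turn["name"]](**turn["input"])
    transcript.append((turn["name"], result))  # fed back as a tool_result
```

The loop structure is the important part: the model sees each result before deciding on the next call, which is what lets it chain find_user into list_orders.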
Implementation with LlamaIndex
LlamaIndex provides clean abstractions for function calling with both OpenAI and Anthropic. The FunctionCallingAgent class handles the complex orchestration automatically, letting you focus on defining tools and business logic.
LlamaIndex Benefits
- Automatic tool discovery from function signatures
- Built-in retry logic for failed function calls
- Conversation memory management
- Streaming support for function calling
- Easy integration with RAG pipelines
Best Practices
Write Clear Tool Descriptions
The model relies entirely on tool descriptions to decide when to use them. Be specific about what each tool does, when to use it, and what parameters it needs.
Validate Inputs
Even though models are good at following schemas, always validate function arguments before execution. Models can occasionally generate invalid parameters.
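A minimal validator against the tool's own input_schema can catch most bad arguments before execution. This is a stdlib-only sketch covering required fields and basic types; a production system might use a full JSON Schema library instead.

```python
def validate_args(schema: dict, args: dict) -> list:
    """Check model-generated arguments against a JSON-Schema-like tool
    schema. Returns a list of error strings (empty means valid)."""
    errors = []
    type_map = {"string": str, "number": (int, float), "integer": int,
                "boolean": bool, "object": dict, "array": list}
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, value in args.items():
        prop = schema.get("properties", {}).get(name)
        if prop is None:
            errors.append(f"unexpected argument: {name}")
        elif not isinstance(value, type_map[prop["type"]]):
            errors.append(f"{name} should be {prop['type']}")
    return errors

schema = {"type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]}

assert validate_args(schema, {"city": "Oslo"}) == []
assert validate_args(schema, {}) == ["missing required argument: city"]
assert validate_args(schema, {"city": 5}) == ["city should be string"]
```

Returning a list of errors rather than raising lets you send all of the problems back to the model in one tool_result, so it can correct its call in a single retry.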
Handle Errors Gracefully
When a function call fails, return clear error messages to the model. The model can often retry with different parameters or try an alternative approach.
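One way to do this with Anthropic's format is to package every outcome as a tool_result block and flag failures with is_error, so the error text reaches the model instead of crashing your loop. The divide function and the tool_use_id values here are illustrative.

```python
def run_tool(fn, args: dict, tool_use_id: str) -> dict:
    """Execute a tool and package the outcome as an Anthropic-style
    tool_result block; on failure the error message goes back to the
    model so it can retry or change approach."""
    try:
        output = fn(**args)
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": str(output)}
    except Exception as exc:
        return {"type": "tool_result", "tool_use_id": tool_use_id,
                "content": f"Error: {exc}", "is_error": True}

def divide(a: float, b: float) -> float:
    return a / b

ok = run_tool(divide, {"a": 6, "b": 2}, "toolu_01")
bad = run_tool(divide, {"a": 1, "b": 0}, "toolu_02")
# ok["content"] is "3.0"; bad carries is_error plus the error text
```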
Limit Function Scope
Each function should do one thing well. Don't create overly complex functions with many parameters—the model will struggle to use them correctly.
Return Structured Data
Function results should be structured (JSON) rather than free-form text when possible. This helps the model parse and use the data effectively.
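The difference is easy to see side by side (the order data is invented): the prose version forces the model to re-parse numbers out of a sentence, while the JSON version gives it named fields to work from.

```python
import json

# Free-form text: the model must re-extract each value from prose
text_result = "Found 3 orders totalling $117.50 for alice"

# Structured JSON: named, typed fields the model can use directly
structured_result = json.dumps({
    "customer": "alice",
    "order_count": 3,
    "total_usd": 117.50,
})
```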
Security Considerations
Never Trust Model Output Blindly
Always validate and sanitize function arguments. Implement authorization checks before executing sensitive operations.
Implement Rate Limiting
Prevent abuse by limiting how many function calls can be made per conversation or per user.
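A sliding-window limiter is enough for per-conversation limits. This is a sketch, not production code: the class name and the 3-calls-per-60-seconds numbers are arbitrary, and a multi-process deployment would need shared state (e.g. Redis) instead.

```python
import time
from collections import deque

class CallLimiter:
    """Allow at most max_calls tool executions per window seconds,
    tracked with a sliding window of timestamps."""
    def __init__(self, max_calls: int, window: float):
        self.max_calls, self.window = max_calls, window
        self.stamps = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window
        while self.stamps and now - self.stamps[0] > self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_calls:
            return False  # over the limit; refuse this call
        self.stamps.append(now)
        return True

limiter = CallLimiter(max_calls=3, window=60.0)
results = [limiter.allow() for _ in range(5)]  # → [True, True, True, False, False]
```

You would keep one limiter per conversation (or per user) and check allow() before dispatching each tool call the model requests.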
Audit Function Calls
Log all function calls with parameters and results for security auditing and debugging.
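A decorator keeps the audit concern out of the tool implementations themselves. The logger name and the get_weather example below are illustrative; the point is that every call records the tool name, arguments, and result in one place.

```python
import functools
import json
import logging

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("tool_audit")

def audited(fn):
    """Wrap a tool so every call is logged with its args and result."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        result = fn(**kwargs)
        audit_log.info("tool=%s args=%s result=%s",
                       fn.__name__, json.dumps(kwargs), json.dumps(result))
        return result
    return wrapper

@audited
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}  # stubbed
```

Because the wrapper logs after execution, failures surface as exceptions rather than silently missing log lines; logging attempted-but-failed calls too is a reasonable extension.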
Advanced Patterns
Parallel Function Calling
Modern models can request multiple independent function calls simultaneously. Execute them in parallel to reduce latency.
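When a turn contains several independent tool_use blocks, a thread pool executes them concurrently. The two stub tools and the simulated request list below are invented; the pattern of submitting in order and collecting results in the same order is what matters, since each result must be matched back to its request.

```python
from concurrent.futures import ThreadPoolExecutor

def get_weather(city):
    return {"city": city, "temp_c": 21}      # stubbed

def get_news(topic):
    return {"topic": topic, "headlines": 3}  # stubbed

TOOLS = {"get_weather": get_weather, "get_news": get_news}

# Two independent calls requested in a single model turn (simulated)
requested = [
    {"name": "get_weather", "input": {"city": "Paris"}},
    {"name": "get_news", "input": {"topic": "markets"}},
]

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(TOOLS[r["name"]], **r["input"]) for r in requested]
    results = [f.result() for f in futures]  # order matches the requests
```

Threads suit I/O-bound tools (HTTP calls, database queries); each result is then sent back as its own tool_result in the follow-up message.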
Conditional Tool Availability
Don't always provide all tools. Based on conversation context or user permissions, dynamically adjust which tools are available to the model.
Function Calling Chains
Design workflows where one function's output becomes another's input. The model can orchestrate complex multi-step operations automatically.
Testing Function Calling Agents
Create comprehensive test suites covering:
- Correct function selection for various queries
- Proper argument extraction and formatting
- Error handling when functions fail
- Behavior when no appropriate function exists
- Multi-step workflows requiring multiple function calls
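For the first and fourth items, a deterministic stand-in for the model keeps tests fast and repeatable. The keyword router below is a deliberately crude fake, not how the model actually selects tools; real suites would instead replay recorded model outputs or call the live API in a slower integration tier.

```python
# A tiny deterministic "router" stands in for the model so tool-selection
# behavior can be asserted without network calls. Tool names are invented.
def route(query):
    q = query.lower()
    if "weather" in q:
        return "get_weather"
    if "price" in q or "stock" in q:
        return "get_stock_price"
    return None  # no appropriate tool exists

assert route("What's the weather in Oslo?") == "get_weather"
assert route("AAPL stock price today") == "get_stock_price"
assert route("Tell me a joke") is None
```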
This article was generated with the assistance of AI technology and reviewed for accuracy and relevance.