Synthetic Data for Function Calling
This step-by-step tutorial shows how to build a system that generates function calls with different generation parameters for each row in a dataset. Along the way, we'll see how to override a language model's default generation parameters at the row level.
Introduction
In this tutorial, we'll learn how to:
Create a function call generator using Curator
Define different function tools (APIs)
Configure different generation parameters for each row in a dataset
Handle both successful function calls and regular message responses
Let's dive in!
Step 1: Import Required Libraries
First, let's set up our environment and import the necessary libraries:
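A minimal sketch of an environment check, run before any imports. The package list is an assumption rather than something this tutorial states: Curator is assumed to be importable as `bespokelabs` (installed with `pip install bespokelabs-curator`), and `datasets` and `pandas` are inferred from the dataset and DataFrame steps later on.

```python
import importlib.util

# Assumed mapping of import name -> pip package name (not confirmed by this tutorial).
REQUIRED = {
    "bespokelabs": "bespokelabs-curator",
    "datasets": "datasets",
    "pandas": "pandas",
}


def find_missing(required: dict) -> list:
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = find_missing(REQUIRED)
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

Once nothing is reported missing, the usual imports (`json`, the dataset library, and Curator) can follow.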
Step 2: Define the Function Call Generator
We'll create a custom LLM class that generates function calls based on user requests:
This class does two main things:
Generates a prompt asking the model to create a function call based on a user request
Parses the response to extract either the function call or regular message
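The two responsibilities above can be sketched as plain functions. This is an illustration under stated assumptions, not the tutorial's actual class: it assumes the response is an OpenAI-style chat completion dictionary, and the row keys `user_request` and `function_call` are hypothetical names. In Curator, the same logic would presumably live in the `prompt` and `parse` methods of a `curator.LLM` subclass.

```python
import json


def build_prompt(row: dict) -> str:
    """Turn a dataset row into the prompt sent to the model."""
    return (
        "Generate a function call for the following user request: "
        f"{row['user_request']}"
    )


def parse_response(row: dict, response: dict) -> dict:
    """Return a copy of the row with either the tool calls or the message text."""
    message = response["choices"][0]["message"]
    tool_calls = message.get("tool_calls")
    if tool_calls:
        # The model chose to call one or more functions.
        result = json.dumps([call["function"] for call in tool_calls])
    else:
        # The model answered with a regular message instead.
        result = message.get("content")
    return {**row, "function_call": result}


# Hand-written responses standing in for real model output.
tool_response = {"choices": [{"message": {
    "content": None,
    "tool_calls": [{"function": {"name": "get_weather",
                                 "arguments": '{"location": "Paris"}'}}],
}}]}
text_response = {"choices": [{"message": {"content": "I cannot help with that."}}]}

row = {"user_request": "What's the weather in Paris?"}
print(build_prompt(row))
print(parse_response(row, tool_response)["function_call"])
print(parse_response(row, text_response)["function_call"])
```

Checking for `tool_calls` before falling back to `content` keeps the parser from failing when the model declines to call a function.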
Step 3: Define Function Tools
Now, let's define two function tools that our model can use:
These function definitions describe:
A weather API that requires location and units parameters
A local time API that requires location and timezone parameters
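As a sketch, the two tools could be written as dictionaries in the OpenAI-style function tool schema. The function names `get_weather` and `get_local_time`, the descriptions, and the `enum` values are hypothetical; only the required parameters come from the description above.

```python
# Hypothetical weather tool: requires a location and units.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Paris"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "units"],
        },
    },
}

# Hypothetical local time tool: requires a location and a timezone.
time_tool = {
    "type": "function",
    "function": {
        "name": "get_local_time",
        "description": "Get the current local time for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Tokyo"},
                "timezone": {"type": "string", "description": "IANA name, e.g. Asia/Tokyo"},
            },
            "required": ["location", "timezone"],
        },
    },
}

for tool in (weather_tool, time_tool):
    spec = tool["function"]
    print(spec["name"], "requires", spec["parameters"]["required"])
```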
Step 4: Create an LLM Instance with Default Parameters
Let's instantiate our function call generator with default parameters:
This LLM instance has:
The "gpt-4o-mini" model
Both function tools available by default
Configuration for retries and response handling
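The defaults listed above might be collected as follows. This is a sketch only: the keyword names `model_name`, `generation_params`, and `backend_params`, and the backend keys `max_retries` and `require_all_responses`, are assumptions about Curator's constructor, and the tools are shortened stand-ins for the full definitions.

```python
# Shortened stand-ins for the two tool definitions (names are hypothetical).
weather_tool = {"type": "function", "function": {"name": "get_weather"}}
time_tool = {"type": "function", "function": {"name": "get_local_time"}}

DEFAULT_CONFIG = {
    # Model named in this tutorial.
    "model_name": "gpt-4o-mini",
    # Both tools are available unless a row overrides them.
    "generation_params": {"tools": [weather_tool, time_tool]},
    # Assumed retry and response-handling settings.
    "backend_params": {"max_retries": 1, "require_all_responses": False},
}

# With Curator installed, this would presumably be passed as keyword
# arguments to the generator class, e.g. FunctionCallGenerator(**DEFAULT_CONFIG).
default_tool_names = [
    tool["function"]["name"]
    for tool in DEFAULT_CONFIG["generation_params"]["tools"]
]
print(default_tool_names)
```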
Step 5: Create a Dataset with Row-Level Parameters
Now, let's create a dataset where each row has its own generation parameters:
Important notes:
The first row only has access to the weather function
The second row only has access to the time function
The generation_params values must be JSON strings to prevent dataset operations from expanding dictionary keys
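A sketch of such rows, built with `json.dumps` so that each `generation_params` value stays an opaque string. The example requests, the `user_request` column name, and the shortened tool stand-ins are hypothetical.

```python
import json

# Shortened stand-ins for the two tool definitions (names are hypothetical).
weather_tool = {"type": "function", "function": {"name": "get_weather"}}
time_tool = {"type": "function", "function": {"name": "get_local_time"}}

rows = [
    {
        "user_request": "What's the weather like in Tokyo?",
        # Row 1: only the weather tool is available.
        "generation_params": json.dumps({"tools": [weather_tool]}),
    },
    {
        "user_request": "What time is it in Berlin right now?",
        # Row 2: only the time tool is available.
        "generation_params": json.dumps({"tools": [time_tool]}),
    },
]

# Each value is a plain string and decodes back to the original dictionary.
for row in rows:
    decoded = json.loads(row["generation_params"])
    print(type(row["generation_params"]).__name__, decoded["tools"][0]["function"]["name"])

# With the Hugging Face datasets library installed, the rows could then be
# wrapped with Dataset.from_list(rows) (assumed to be the dataset type used here).
```

Storing a string instead of a nested dictionary means every row has the same column type, regardless of which keys its parameters contain.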
Step 6: Run the Generator and Display Results
Let's run our function call generator on the dataset:
This will:
Process each row with its specific generation parameters
Generate appropriate function calls for each user request
Display the results in a pandas DataFrame
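Conceptually, processing each row with its own parameters amounts to decoding the row's JSON string and layering it over the defaults. The helper below is a hypothetical illustration of that idea, assuming a shallow key-by-key override; how Curator merges the two internally may differ.

```python
import json


def resolve_generation_params(defaults: dict, row: dict) -> dict:
    """Layer a row's JSON-encoded parameters over the default parameters."""
    overrides = json.loads(row.get("generation_params") or "{}")
    # Shallow merge: a key set on the row replaces the default value entirely.
    return {**defaults, **overrides}


# Hypothetical defaults: both tools plus a sampling setting.
defaults = {"tools": ["get_weather", "get_local_time"], "temperature": 0}

rows = [
    {"user_request": "Weather in Tokyo?",
     "generation_params": json.dumps({"tools": ["get_weather"]})},
    {"user_request": "Tell me a joke."},  # no override: defaults apply
]

for row in rows:
    params = resolve_generation_params(defaults, row)
    print(row["user_request"], "->", params["tools"])
```

A row that sets `tools` sees only those tools, while keys it leaves out (here `temperature`) fall back to the defaults.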
Practical Applications
This technique is useful for:
Processing diverse user requests with specialized tools
A/B testing different function configurations
Creating targeted function call generators for specific domains
Building efficient pipelines that adapt to different input types
Conclusion
You've learned how to create a flexible function call generation system that can adapt to different rows in a dataset. This approach allows for more targeted and efficient use of language models when generating function calls, particularly when different requests require different tools or configurations.
Remember to properly configure both default and row-level parameters, and to handle both function call and regular message responses in your parsing logic.