Synthetic Data for Function Calling

This step-by-step tutorial will guide you through creating a system that generates customized function calls using different parameters for each row in a dataset. We'll explore how to override default generation parameters at the row level when using language models.

Introduction

In this tutorial, we'll learn how to:

  1. Create a function call generator using Curator

  2. Define different function tools (APIs)

  3. Configure different generation parameters for each row in a dataset

  4. Handle both successful function calls and regular message responses

Let's dive in!

Step 1: Import Required Libraries

First, let's set up our environment and import the necessary libraries:

# pip install bespokelabs-curator 

import json
from typing import Dict

from datasets import Dataset
from bespokelabs import curator

Step 2: Define the Function Call Generator

We'll create a custom LLM class that generates function calls based on user requests.

This class does two main things:

  • Generates a prompt asking the model to create a function call based on a user request

  • Parses the response to extract either the function call or regular message
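To make these two responsibilities concrete, here is a minimal, library-free sketch of the logic as two plain functions. It is not the tutorial's class: the column name user_request is hypothetical, and the response shape (choices[0]["message"] holding either tool_calls or content) is assumed to follow the OpenAI chat completions format. In a Curator class, the same logic would presumably live in the methods that build the prompt and parse the response.

```python
from typing import Any, Dict


def build_prompt(row: Dict[str, Any]) -> str:
    """Build the prompt that asks the model for a function call."""
    return (
        "You are a function calling expert. Given the user request:\n"
        f"{row['user_request']}\n"  # 'user_request' is a hypothetical column name
        "Generate a function call that satisfies the request."
    )


def parse_response(row: Dict[str, Any], response: Dict[str, Any]) -> Dict[str, Any]:
    """Attach either the requested function calls or the plain message to the row."""
    # Assumed shape: an OpenAI-style chat completion converted to a dict.
    message = response["choices"][0]["message"]
    tool_calls = message.get("tool_calls")

    result = dict(row)  # copy so the input row is not mutated
    if tool_calls:
        # The model asked for one or more tools: keep name and arguments.
        result["function_call"] = [call["function"] for call in tool_calls]
    else:
        # The model answered in plain text instead of calling a tool.
        result["function_call"] = message.get("content")
    return result
```

Checking for tool_calls before falling back to content keeps the output column populated in both cases, so rows where the model declines to call a tool are not lost.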

Step 3: Define Function Tools

Now, let's define two function tools that our model can use.

These function definitions describe:

  • A weather API that requires location and units parameters

  • A local time API that requires location and timezone parameters
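As an illustration, the two tools could be written in the OpenAI-style function tool schema, as sketched below. The function names (get_weather, get_local_time), the descriptions, and the allowed units values are assumptions made for this example; only the required parameters come from the descriptions above.

```python
from typing import Any, Dict, List

# Weather tool: requires a location and a units setting.
weather_tool: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical name
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Paris"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "units"],
        },
    },
}

# Local time tool: requires a location and a timezone.
local_time_tool: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_local_time",  # hypothetical name
        "description": "Get the current local time for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Tokyo"},
                "timezone": {"type": "string", "description": "IANA name, e.g. Asia/Tokyo"},
            },
            "required": ["location", "timezone"],
        },
    },
}

# Both tools together, ready to be passed as the default tool list.
function_tools: List[Dict[str, Any]] = [weather_tool, local_time_tool]
```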

Step 4: Create an LLM Instance with Default Parameters

Let's instantiate our function call generator with default parameters.

This LLM instance has:

  • The "gpt-4o-mini" model

  • Both function tools available by default

  • Configuration for retries and response handling
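The sketch below collects these defaults in a plain dictionary so the three pieces are easy to see. The key names (model_name, generation_params, backend_params, max_retries, require_all_responses) and the retry count are assumptions for illustration, and the tool entries are abbreviated stand-ins; check the Curator documentation for the exact constructor arguments before relying on them.

```python
from typing import Any, Dict, List

# Abbreviated stand-ins for the two tool definitions (names are hypothetical).
tool_names: List[str] = ["get_weather", "get_local_time"]
default_tools: List[Dict[str, Any]] = [
    {"type": "function", "function": {"name": name}} for name in tool_names
]

default_config: Dict[str, Any] = {
    # The model used for every row unless configured otherwise.
    "model_name": "gpt-4o-mini",
    # Defaults sent with each request: both tools are available.
    "generation_params": {"tools": default_tools},
    # Assumed option names for retries and response handling.
    "backend_params": {"max_retries": 3, "require_all_responses": False},
}
```

These values would then be passed as keyword arguments when constructing the generator class from Step 2.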

Step 5: Create a Dataset with Row-Level Parameters

Now, let's create a dataset where each row has its own generation parameters.

Important notes:

  • The first row only has access to the weather function

  • The second row only has access to the time function

  • The generation_params value in each row must be a JSON string rather than a dictionary; this prevents dataset operations from expanding the dictionary keys
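A minimal sketch of such rows is shown below, using json.dumps to serialize each row's parameters. The column name user_request, the example requests, and the abbreviated tool stand-ins are assumptions for illustration.

```python
import json
from typing import Any, Dict, List

# Abbreviated stand-ins for the two tool definitions (names are hypothetical).
weather_tool: Dict[str, Any] = {"type": "function", "function": {"name": "get_weather"}}
time_tool: Dict[str, Any] = {"type": "function", "function": {"name": "get_local_time"}}

rows: List[Dict[str, str]] = [
    {
        "user_request": "What's the weather like in Boston today?",
        # Row 1 may only use the weather tool.
        "generation_params": json.dumps({"tools": [weather_tool]}),
    },
    {
        "user_request": "What time is it in Tokyo right now?",
        # Row 2 may only use the time tool.
        "generation_params": json.dumps({"tools": [time_tool]}),
    },
]

# Every row holds two plain string columns, so the table schema stays uniform
# no matter which keys each row's parameters contain.
# With the `datasets` package installed, Dataset.from_list(rows) would wrap this list.
```

Calling json.loads on a row's generation_params recovers the original dictionary when the parameters are needed.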

Step 6: Run the Generator and Display Results

Let's run our function call generator on the dataset.

This will:

  • Process each row with its specific generation parameters

  • Generate appropriate function calls for each user request

  • Display the results in a pandas DataFrame
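To show how a row-level override can interact with the defaults, here is a library-free sketch of the processing loop. It assumes a shallow merge in which row-level keys replace default keys; Curator's actual merge behavior is internal to the library and may differ. The fake_model function is a stand-in for a real model call, and all names here are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List


def resolve_params(defaults: Dict[str, Any], row: Dict[str, Any]) -> Dict[str, Any]:
    """Merge defaults with a row's JSON-encoded overrides (row-level keys win)."""
    overrides = json.loads(row.get("generation_params") or "{}")
    return {**defaults, **overrides}


def run(
    rows: List[Dict[str, Any]],
    defaults: Dict[str, Any],
    call_model: Callable[[str, Dict[str, Any]], str],
) -> List[Dict[str, Any]]:
    """Process each row with its own resolved generation parameters."""
    results = []
    for row in rows:
        params = resolve_params(defaults, row)
        results.append(
            {
                "user_request": row["user_request"],
                "function_call": call_model(row["user_request"], params),
            }
        )
    return results


def fake_model(request: str, params: Dict[str, Any]) -> str:
    """Stand-in for a real model: 'calls' the first tool it is allowed to use."""
    tools = params.get("tools", [])
    return tools[0]["function"]["name"] if tools else "no tool available"


weather = {"type": "function", "function": {"name": "get_weather"}}
local_time = {"type": "function", "function": {"name": "get_local_time"}}
defaults = {"temperature": 0.0, "tools": [weather, local_time]}

rows = [
    # This row overrides the tool list; temperature still comes from the defaults.
    {
        "user_request": "What time is it in Tokyo?",
        "generation_params": json.dumps({"tools": [local_time]}),
    },
    # This row has no overrides, so it falls back to the defaults entirely.
    {"user_request": "What's the weather in Boston?"},
]

for result in run(rows, defaults, fake_model):
    print(f"{result['user_request']!r} -> {result['function_call']}")
```

The list of result dictionaries can be handed to pandas.DataFrame for tabular display if pandas is available.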

Practical Applications

This technique is useful for:

  • Processing diverse user requests with specialized tools

  • A/B testing different function configurations

  • Creating targeted function call generators for specific domains

  • Building efficient pipelines that adapt to different input types

Conclusion

You've learned how to create a flexible function call generation system that can adapt to different rows in a dataset. This approach allows for more targeted and efficient use of language models when generating function calls, particularly when different requests require different tools or configurations.

Remember to properly configure both default and row-level parameters, and to handle both function call and regular message responses in your parsing logic.
