Using Hugging Face Inference Providers with Curator
This guide demonstrates how to use Hugging Face Inference Providers with curator to generate synthetic data using various LLM providers available as Inference Providers on Hugging Face. We’ll walk through an example of generating synthetic recipes, but this approach can be adapted for any synthetic data generation task.
Hugging Face’s Inference Providers give developers streamlined, unified access to hundreds of machine learning models, powered by Hugging Face’s serverless inference partners. Using Inference Providers gives you access to a wide range of state-of-the-art models, with newly released models regularly added by providers.
Since Inference Providers are available through OpenAI-compatible APIs, you can use them as a drop-in replacement for OpenAI in your project. This is the approach we’ll take in this guide.
Ensure you have a Hugging Face account; you can sign up at huggingface.co if you don’t have one already.
Create a Hugging Face access token in your account settings (under Access Tokens). This will be used to authenticate your requests to the Inference Providers.
Ensure you have at least one Inference Provider enabled in your Hugging Face account. You can configure this on the Inference Providers settings page of your account.
First, install the necessary packages:
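A minimal setup with pip might look like the following; `bespokelabs-curator` is assumed here to be the PyPI package name for curator, and 🤗 Datasets is installed alongside it:

```bash
pip install bespokelabs-curator datasets
```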
Next, create a class that inherits from curator.LLM. You’ll need to implement two key methods (a sketch follows after this list):
prompt(): Generates the prompt for the LLM
parse(): Processes the LLM’s response into your desired format
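For the recipe example, a minimal generator might look like this sketch. The class name, prompt wording, and the cuisine/recipe column names are illustrative assumptions for this guide’s task:

```python
from bespokelabs import curator


class RecipeGenerator(curator.LLM):
    """Generates a recipe for a given cuisine (illustrative example)."""

    def prompt(self, input: dict) -> str:
        # Build the prompt for a single row of the input dataset.
        return f"Generate a detailed recipe for a classic {input['cuisine']} dish."

    def parse(self, input: dict, response: str) -> dict:
        # Turn the raw model response into a row of the output dataset.
        return {"cuisine": input["cuisine"], "recipe": response}
```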
Create a dataset of inputs using the HuggingFace Dataset class:
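For instance, a tiny toy dataset of cuisines; the column name just has to match what prompt() expects:

```python
from datasets import Dataset

# A small illustrative input dataset; replace with your own inputs.
cuisines = Dataset.from_dict({"cuisine": ["Italian", "Mexican", "Japanese", "Indian"]})
```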
Since Inference Providers are available through OpenAI-compatible APIs, we can use them as a drop-in replacement for OpenAI in our project. We just need to configure a few things (see the example after this list):
the base_url
the api_key
the model_name
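A sketch of how this can be wired up with curator’s OpenAI-compatible backend. The router URL, the backend_params keys, and the model name below are assumptions; copy the exact values from the model page of your chosen provider:

```python
import os

recipe_generator = RecipeGenerator(
    model_name="meta-llama/Llama-3.3-70B-Instruct",       # assumed example model
    backend="openai",                                      # use the OpenAI-compatible backend
    backend_params={
        "base_url": "https://router.huggingface.co/v1",   # assumed Inference Providers endpoint
        "api_key": os.environ["HF_TOKEN"],                 # your Hugging Face token
    },
)

# Run generation over the input dataset created above.
recipes = recipe_generator(cuisines)
```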
You can view the dataset in the Curator Viewer or on the Hugging Face Hub.
Curator Viewer
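Assuming the viewer command ships with your curator installation, you can launch it locally to browse your runs:

```bash
curator-viewer
```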
OR
Since curator is built on 🤗 Datasets, you can push the results to the Hub just like any other dataset!
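For example, assuming the generation call returned a 🤗 Dataset and you are logged in to the Hub, with a placeholder repository id:

```python
# Push the generated dataset to the Hugging Face Hub (placeholder repo id).
recipes.push_to_hub("your-username/synthetic-recipes")
```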
Since Inference Providers are available via a standard protocol, it’s easy to swap out models or providers in your pipeline depending on the needs of your project.
You can find this information on the model page of the Inference Provider. For example, here’s the information for the Together Inference Provider:
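Switching the same generator class to a model served via the Together provider might then only mean changing the connection details. The provider-specific route and model name below are assumptions, reusing RecipeGenerator and the HF_TOKEN variable from earlier:

```python
import os

together_generator = RecipeGenerator(
    model_name="Qwen/Qwen2.5-72B-Instruct",                      # assumed example model
    backend="openai",
    backend_params={
        "base_url": "https://router.huggingface.co/together/v1", # assumed provider-specific route
        "api_key": os.environ["HF_TOKEN"],
    },
)
```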
You can see what the dataset looks like on the Hub!
You can find a list of models that are available via a specific Inference Provider by using the Inference Provider filter on the Hub. For example, you can filter for models that are available via the Together Inference Provider.