Using kluster.ai for batch inference
You can use kluster.ai for batch inference in Curator to generate synthetic data. This example generates answers for the GSM8K dataset, but the approach can be adapted to any data generation task.
Prerequisites
Python 3.10+
Curator: Install via pip install bespokelabs-curator
kluster.ai API key: Get your key from https://www.kluster.ai/
Steps
1. Set up environment variables
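The kluster.ai key has to be available in the environment before a batch is submitted. The sketch below checks for it up front and fails with a clear message if it is missing. The variable name KLUSTERAI_API_KEY and the helper require_api_key are assumptions made for illustration; confirm the exact variable name against the Curator documentation.

```python
import os

# Assumed variable name; verify it against the Curator docs.
API_KEY_VAR = "KLUSTERAI_API_KEY"


def require_api_key(env=os.environ, var=API_KEY_VAR):
    """Return the API key from the environment or raise a clear error."""
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set. Export it in your shell before running, "
            f"for example: export {var}=<your key>"
        )
    return key
```

Calling require_api_key() at the top of a script surfaces a missing key immediately, rather than after the dataset has been loaded.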
2. Create a curator.LLM subclass
Create a class that inherits from curator.LLM. Implement two key methods:
prompt(): Generates the prompt for the LLM.
parse(): Processes the LLM's response into your desired format.
Here’s the implementation:
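A minimal sketch of the two methods, written as a plain class so that it runs without Curator installed. In real use the class would inherit from curator.LLM. The class name ReasonerSketch, the prompt wording, the model_answer field, and the assumption that the response arrives as plain text are all illustrative choices, not part of the Curator API.

```python
class ReasonerSketch:
    """Stand-in for a curator.LLM subclass; shows only prompt() and parse()."""

    def prompt(self, input):
        # GSM8K rows carry the problem text under the "question" key.
        return (
            "Solve the following math problem step by step.\n\n"
            f"{input['question']}"
        )

    def parse(self, input, response):
        # Assumes `response` is the model's reply as plain text.
        # "model_answer" is a hypothetical output field name.
        return {
            "question": input["question"],
            "model_answer": response.strip(),
        }


# Exercise both methods on a single hand-written row.
row = {"question": "What is 2 + 3?"}
reasoner = ReasonerSketch()
print(reasoner.prompt(row))
print(reasoner.parse(row, " 5 \n"))
```

Keeping parse() a pure function of the input row and the response makes it easy to test on hand-written rows before submitting a full batch.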
3. Configure Reasoner to use DeepSeek-R1 through kluster.ai
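As a rough sketch, the configuration comes down to a few constructor arguments. The values below (the model identifier deepseek-ai/DeepSeek-R1, the backend name klusterai, and the batch flag) are assumptions that this page does not confirm; check them against the Curator and kluster.ai documentation before use.

```python
# Assumed keyword arguments for the Reasoner constructor.
reasoner_kwargs = {
    "model_name": "deepseek-ai/DeepSeek-R1",  # assumed kluster.ai model ID
    "backend": "klusterai",                   # assumed Curator backend name
    "batch": True,                            # request batch mode
}

# With Curator installed and Reasoner defined as in step 2:
# llm = Reasoner(**reasoner_kwargs)
```

Holding the settings in a dictionary keeps the model and backend choice in one place, so switching models only touches a single entry.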
4. Generate data
Generate the structured data and output the results as a pandas DataFrame:
Example Output
Using the above example, the output might look like this:
Batch Configuration
For all available options, see the complete batch configuration documentation.