Using kluster.ai for batch inference

You can use kluster.ai for batch inference in Curator to generate synthetic data. In this example, we generate answers for the GSM8K dataset, but the approach can be adapted to any data-generation task. The following models are supported, with pricing per completion window:

| Model ID | Realtime | 24h | 48h | 72h |
| --- | --- | --- | --- | --- |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | $0.20/$0.80 | $0.25 | $0.20 | $0.15 |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | $0.08/$0.45 | $0.15 | $0.12 | $0.10 |
| deepseek-ai/DeepSeek-V3-0324 | $0.70/$1.40 | $0.63 | $0.50 | $0.35 |
| google/gemma-3-27b-it | $0.35 | $0.30 | $0.25 | $0.20 |
| deepseek-ai/DeepSeek-V3 | $1.25 | $0.63 | $0.50 | $0.35 |
| deepseek-ai/DeepSeek-R1 | $3.00/$5.00 | $3.50 | $3.00 | $2.50 |
| Qwen/Qwen2.5-VL-7B-Instruct | $0.30 | $0.15 | $0.10 | $0.05 |
| klusterai/Meta-Llama-3.1-405B-Instruct-Turbo | $3.50 | $0.99 | $0.89 | $0.79 |
| klusterai/Meta-Llama-3.3-70B-Instruct-Turbo | $0.70 | $0.20 | $0.18 | $0.15 |
| klusterai/Meta-Llama-3.1-8B-Instruct-Turbo | $0.18 | $0.05 | $0.04 | $0.03 |

Note: Prices are in $ per 1M tokens. For Realtime, some models have separate input and output prices, shown as input/output. Up-to-date pricing is available at https://api.kluster.ai/v1/models.

Prerequisites

  • Python 3.10+

  • Curator: Install via `pip install bespokelabs-curator`

  • kluster.ai API key: Get your key from https://www.kluster.ai/

Steps

1. Set up environment variables
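
Export your kluster.ai API key so Curator can pick it up. A minimal sketch, assuming Curator reads the key from an environment variable named `KLUSTERAI_API_KEY`:

```python
import os

# Assumed variable name; set this to the API key from your kluster.ai account
os.environ["KLUSTERAI_API_KEY"] = "your-kluster-api-key"
```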

2. Create a curator.LLM subclass

Create a class that inherits from curator.LLM. Implement two key methods:

  • prompt(): Generates the prompt for the LLM.

  • parse(): Processes the LLM's response into your desired format.

Here’s the implementation:
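
A minimal sketch of such a subclass, following the `prompt()`/`parse()` pattern described above. The `Reasoner` class name and the output field names are illustrative:

```python
from bespokelabs import curator


class Reasoner(curator.LLM):
    """Generates answers to GSM8K questions."""

    def prompt(self, input: dict) -> str:
        # Pass the GSM8K question straight through as the prompt
        return input["question"]

    def parse(self, input: dict, response: str) -> dict:
        # Pair the original question with the model's answer
        return {"question": input["question"], "deepseek_answer": response}
```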

3. Configure Reasoner to use DeepSeek-R1 through kluster.ai
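
Instantiate the class with kluster.ai as the batch backend. A sketch, assuming Curator accepts `backend`, `batch`, and `backend_params` arguments; the `completion_window` value should match one of the windows in the pricing table above:

```python
llm = Reasoner(
    model_name="deepseek-ai/DeepSeek-R1",
    backend="klusterai",  # assumed backend identifier for kluster.ai
    batch=True,  # submit requests through the batch API rather than realtime
    backend_params={"completion_window": "24h"},  # assumed parameter name
)
```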

4. Generate data

Generate the structured data and output the results as a pandas DataFrame:
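
A sketch of the generation step, assuming GSM8K is loaded from the Hugging Face Hub and that the call returns a dataset-like object with a `to_pandas()` method; the dataset name and split are illustrative:

```python
from datasets import load_dataset

# Load the GSM8K training split from the Hugging Face Hub
gsm8k = load_dataset("openai/gsm8k", "main", split="train")

# Submit the batch job through kluster.ai and wait for the results
answers = llm(gsm8k)

# Convert the resulting dataset to a pandas DataFrame for inspection
df = answers.to_pandas()
print(df.head())
```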

Example Output

Using the above example, the output is a DataFrame containing the original GSM8K questions alongside the generated answers.

Batch Configuration
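
The completion window trades cost against turnaround time: longer windows (24h, 48h, 72h) are progressively cheaper, as shown in the pricing table above. A sketch of selecting the cheapest tier, assuming the same `backend_params` key as in step 3:

```python
# A longer completion window lowers the per-token price
# (see the pricing table above); the parameter name is assumed
llm_cheapest = Reasoner(
    model_name="deepseek-ai/DeepSeek-R1",
    backend="klusterai",
    batch=True,
    backend_params={"completion_window": "72h"},
)
```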
