# Using kluster.ai for batch inference

You can use **kluster.ai** for batch inference in **Curator** to generate synthetic data. In this example, we generate answers for the GSM8K dataset, but the approach can be adapted to any data-generation task. The following models are supported, with pricing for different completion windows:

<table><thead><tr><th width="443.73046875">Model ID</th><th>Realtime</th><th>24h</th><th>48h</th><th>72h</th></tr></thead><tbody><tr><td>meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8</td><td>$0.20/$0.80</td><td>$0.25</td><td>$0.20</td><td>$0.15</td></tr><tr><td>meta-llama/Llama-4-Scout-17B-16E-Instruct</td><td>$0.08/$0.45</td><td>$0.15</td><td>$0.12</td><td>$0.10</td></tr><tr><td>deepseek-ai/DeepSeek-V3-0324</td><td>$0.70/$1.40</td><td>$0.63</td><td>$0.50</td><td>$0.35</td></tr><tr><td>google/gemma-3-27b-it</td><td>$0.35</td><td>$0.30</td><td>$0.25</td><td>$0.20</td></tr><tr><td>deepseek-ai/DeepSeek-V3</td><td>$1.25</td><td>$0.63</td><td>$0.50</td><td>$0.35</td></tr><tr><td>deepseek-ai/DeepSeek-R1</td><td>$3.00/$5.00</td><td>$3.50</td><td>$3.00</td><td>$2.50</td></tr><tr><td>Qwen/Qwen2.5-VL-7B-Instruct</td><td>$0.30</td><td>$0.15</td><td>$0.10</td><td>$0.05</td></tr><tr><td>klusterai/Meta-Llama-3.1-405B-Instruct-Turbo</td><td>$3.50</td><td>$0.99</td><td>$0.89</td><td>$0.79</td></tr><tr><td>klusterai/Meta-Llama-3.3-70B-Instruct-Turbo</td><td>$0.70</td><td>$0.20</td><td>$0.18</td><td>$0.15</td></tr><tr><td>klusterai/Meta-Llama-3.1-8B-Instruct-Turbo</td><td>$0.18</td><td>$0.05</td><td>$0.04</td><td>$0.03</td></tr></tbody></table>

*Note: Prices are shown in $ per 1M tokens. For Realtime, some models have separate input and output prices, shown as input/output. For up-to-date models and pricing, see:* <https://api.kluster.ai/v1/models>
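Because the endpoint above is OpenAI-compatible, you can also list the available model IDs programmatically. This is a minimal sketch, assuming the standard OpenAI list-models response shape (`{"data": [{"id": ...}, ...]}`) and the `KLUSTERAI_API_KEY` environment variable set up in the steps below:

```python
import os

import requests

# Query the kluster.ai models endpoint (OpenAI-compatible schema assumed)
resp = requests.get(
    "https://api.kluster.ai/v1/models",
    headers={"Authorization": f"Bearer {os.environ['KLUSTERAI_API_KEY']}"},
)
resp.raise_for_status()

for model in resp.json()["data"]:
    print(model["id"])
```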

## **Prerequisites**

* **Python 3.10+**
* **Curator**: Install via `pip install bespokelabs-curator`
* **kluster.ai API key:** Get your key from <https://www.kluster.ai/>

## **Steps**

#### **1. Set up environment variables**

```sh
export KLUSTERAI_API_KEY=<your_api_key>
```
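If you are working in a notebook rather than a shell, you can set the same key from Python before importing Curator; this is an equivalent convenience, not a kluster.ai-specific requirement:

```python
import os

# Equivalent to the shell export above; set before importing curator
os.environ["KLUSTERAI_API_KEY"] = "<your_api_key>"
```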

#### **2. Create a curator.LLM subclass**

Create a class that inherits from `curator.LLM`. Implement two key methods:

* `prompt()`: Generates the prompt for the LLM.
* `parse()`: Processes the LLM's response into your desired format.

Here’s the implementation:

```python
"""Example of generating answers for the GSM8K dataset using curator."""

import logging
import re

from bespokelabs import curator

# To see more detail about how batches are being processed
logger = logging.getLogger("bespokelabs.curator")
logger.setLevel(logging.INFO)


class Reasoner(curator.LLM):
    """Curator class for processing the GSM8K dataset."""

    def prompt(self, input):
        """Create a prompt for the LLM to reason about the problem."""
        return f"Answer the following question: {input['question']}"

    def parse(self, input, response):
        """Parse the LLM response to extract reasoning and solution.

        The response format is expected to be '<think>reasoning</think>answer'.
        """
        # Extract the reasoning enclosed in <think>...</think> tags
        reasoning_pattern = r"<think>(.*?)</think>"
        reasoning_match = re.search(reasoning_pattern, response, re.DOTALL)
        reasoning = reasoning_match.group(1).strip() if reasoning_match else ""

        # The answer is everything after the </think> tag
        answer = re.sub(reasoning_pattern, "", response, flags=re.DOTALL).strip()

        return [
            {
                "question": input["question"],
                "reasoning": reasoning,
                "deepseek_solution": answer,
                "gold_answer": input["answer"],
            }
        ]
```
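Before running any batches, you can sanity-check the extraction logic in `parse()` locally with a hand-written response. This sketch exercises only the regex; no API calls or Curator objects are involved:

```python
import re

sample = "<think>2 plus 3 equals 5.</think>The answer is 5."
reasoning_pattern = r"<think>(.*?)</think>"

# Reasoning is the text between the <think> tags
match = re.search(reasoning_pattern, sample, re.DOTALL)
print(match.group(1).strip())
# -> 2 plus 3 equals 5.

# The answer is what remains after stripping the <think> block
print(re.sub(reasoning_pattern, "", sample, flags=re.DOTALL).strip())
# -> The answer is 5.
```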

#### **3. Configure Reasoner to use DeepSeek-R1 through kluster.ai**

```python
reasoner = Reasoner(
    model_name="deepseek-ai/DeepSeek-R1",
    backend="klusterai",
    batch=True,
    backend_params={"max_retries": 1, "completion_window": "1h"},
)
```
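The `completion_window` setting is the lever behind the pricing table above: longer windows are cheaper per token. Here is a hedged variant, assuming the backend accepts a `"24h"` window matching the table's 24h column:

```python
# Same model, traded for the lower 24h batch price (the "24h" window value is
# assumed to match the 24h column in the pricing table above)
reasoner_24h = Reasoner(
    model_name="deepseek-ai/DeepSeek-R1",
    backend="klusterai",
    batch=True,
    backend_params={"max_retries": 1, "completion_window": "24h"},
)
```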

#### **4. Generate data**

Generate the structured data. The `.dataset` attribute on the result is a Hugging Face `Dataset`, which can be converted to a pandas DataFrame (shown below):

```python
from datasets import load_dataset

dataset = load_dataset("openai/gsm8k", name="main")
dataset_to_use = dataset["train"].take(3)  # small sample for this example
output = reasoner(dataset_to_use).dataset
```
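To inspect the results as a pandas DataFrame, convert the returned `Dataset` with its built-in `to_pandas()` method:

```python
# Convert the Hugging Face Dataset to pandas for quick inspection
df = output.to_pandas()
print(df[["question", "deepseek_solution", "gold_answer"]].head())
```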

### **Example Output**

Using the above example, the output might look like this:

```python
from IPython.display import Markdown, display

which = 0
question = output[which]['question']
gold_answer = output[which]['gold_answer']
model_answer = output[which]['deepseek_solution']
thought = output[which]['reasoning']

display(Markdown(
    "<h1>Question</h1>"
    f"<h3>{question}</h3>"
))
display(Markdown(
    "<h1>Model answer</h1>"
    f"<p>{model_answer}</p>"
))
display(Markdown(
    "<h1>Gold answer</h1>"
    f"<p>{gold_answer}</p>"
))
display(Markdown(
    "<h1>Model Thought</h1>"
    f"<p>{thought}</p>"
))
```

<figure><img src="https://1831689742-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FINVkBnpIXpC4135NE6Ex%2Fuploads%2FwcW8XO6RtdOyid7IiwlF%2FScreenshot%202025-01-30%20at%2012.42.39%E2%80%AFPM.png?alt=media&#x26;token=ebb826db-015f-446e-8081-8ed12a62e997" alt=""><figcaption></figcaption></figure>

## **Batch Configuration**

* Check out the complete [batch configuration reference](https://docs.bespokelabs.ai/bespoke-curator/api-reference/llm-api-documentation#batch-processing-parameters); a sketch of common parameters follows below.
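As a rough sketch, the batch-related knobs all go through `backend_params`. The parameter names below (`batch_size`, `batch_check_interval`) are assumptions drawn from the linked reference, so confirm them there before relying on this:

```python
# Hypothetical tuning example; see the linked reference for authoritative names
reasoner = Reasoner(
    model_name="deepseek-ai/DeepSeek-R1",
    backend="klusterai",
    batch=True,
    backend_params={
        "batch_size": 1_000,         # assumed: requests per submitted batch
        "batch_check_interval": 60,  # assumed: seconds between status polls
        "completion_window": "1h",
    },
)
```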
