# Save $$$ on LLM inference

Providers like OpenAI and Anthropic offer a batch mode, which lets you upload a large set of prompts to be processed asynchronously at lower cost (typically a 50% discount). However, these APIs are often cumbersome to manage:

* You typically have to prepare a batch file, upload it, and then poll periodically for responses.
* Large datasets often exceed batch size limits, so you have to split your dataset into multiple smaller batches, which adds even more bookkeeping.
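For context, the manual workflow looks roughly like the sketch below: build a JSONL file of requests in the provider's batch input format, and split it into chunks to respect the batch size limit. The prompts, file names, and the 1,000-request limit here are illustrative, not a specific provider's values; the request shape follows OpenAI's documented batch input format.

```python
import json

# Illustrative prompts; a real dataset would have thousands of real rows.
prompts = [f"Summarize document {i}" for i in range(2500)]

BATCH_LIMIT = 1000  # illustrative per-batch request limit


def make_request(i: int, prompt: str) -> dict:
    # One JSON request per line, in OpenAI's batch input format.
    return {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    }


# Split into multiple files because the dataset exceeds the batch limit.
chunks = [prompts[i : i + BATCH_LIMIT] for i in range(0, len(prompts), BATCH_LIMIT)]
for n, chunk in enumerate(chunks):
    with open(f"batch_{n}.jsonl", "w") as f:
        for i, prompt in enumerate(chunk):
            f.write(json.dumps(make_request(n * BATCH_LIMIT + i, prompt)) + "\n")

# Each batch_*.jsonl must then be uploaded, tracked, and polled separately.
```

And this is only the submission side; you still have to poll each batch for completion, download the result files, match responses back to inputs by `custom_id`, and retry failures.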

With Curator, you only need to toggle a single flag to save $$$, without any headache!

## Using batch mode

Let's look at a simple example of reannotating instructions from the [WildChat](https://huggingface.co/datasets/allenai/WildChat) dataset with new responses from gpt-4o-mini.

First, we need to load the WildChat dataset using HuggingFace:

```python
from datasets import load_dataset

dataset = load_dataset("allenai/WildChat", split="train")
dataset = dataset.select(range(3_000))  # Select a subset of 3,000 samples
```

We then define an `LLM` subclass and apply it to `dataset`. To enable batching, all you need to do is set `batch=True` when initializing your `LLM` object, and you're done!

```python
from bespokelabs import curator

class WildChatReannotator(curator.LLM):
    """A reannotator for the WildChat dataset."""

    def prompt(self, input: dict) -> str:
        """Extract the first message from a conversation to use as the prompt."""
        return input["conversation"][0]["content"]

    def parse(self, input: dict, response: str) -> dict:
        """Parse the model response along with the input to the model into the desired output format."""
        instruction = input["conversation"][0]["content"]
        return {"instruction": instruction, "new_response": response}
        
# Initialize the reannotator with batch processing
reannotator = WildChatReannotator(
    model_name="gpt-4o-mini",
    batch=True,  # Enable batch processing
    backend_params={"batch_size": 1_000},  # Specify batch size
)

reannotated_dataset = reannotator(dataset).dataset
```
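To make the row transformation concrete, here is a small standalone sketch of what `prompt` and `parse` do to a single WildChat-style row, using a made-up conversation and a made-up model response (no API calls involved):

```python
# A fake row in the WildChat conversation format (illustrative, not real data).
sample = {"conversation": [{"role": "user", "content": "Explain recursion."}]}
fake_response = "Recursion is when a function calls itself on a smaller input."

# prompt() extracts the first message's content to send to the model...
prompt = sample["conversation"][0]["content"]

# ...and parse() pairs that instruction with the model's new response.
row = {"instruction": prompt, "new_response": fake_response}

print(row)
```

Each row of `reannotated_dataset` has this `{"instruction": ..., "new_response": ...}` shape, one per input sample.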

## Supported Models

Check out how-to guides for using batch mode with our supported providers:

* [using-openai-for-batch-inference](https://docs.bespokelabs.ai/bespoke-curator/save-usdusdusd-on-llm-inference/using-openai-for-batch-inference "mention")
* [using-anthropic-for-batch-inference](https://docs.bespokelabs.ai/bespoke-curator/save-usdusdusd-on-llm-inference/using-anthropic-for-batch-inference "mention")
* [using-gemini-for-batch-inference](https://docs.bespokelabs.ai/bespoke-curator/save-usdusdusd-on-llm-inference/using-gemini-for-batch-inference "mention")
* [using-kluster.ai-for-batch-inference](https://docs.bespokelabs.ai/bespoke-curator/save-usdusdusd-on-llm-inference/using-kluster.ai-for-batch-inference "mention")

Feel free to tell us which providers you want us to add support for, or send a PR if you want to [contribute](https://github.com/bespokelabsai/curator/blob/main/CONTRIBUTING.md)!
