# Using OpenAI for batch inference

You can use **OpenAI** for batch inference in **Curator** to generate  synthetic data. In this example, we will generate reannotation of wildchat dataset, but the approach can be adapted for any data generation task.

## **Prerequisites**

* **Python 3.10+**
* **Curator**: Install via `pip install bespokelabs-curator`
* **OpenAI:** OpenAI API key&#x20;

## **Steps**

#### **1. Setup environment vars**

```sh
export OPENAI_API_KEY=<your_api_key>
```

**2.  Create a curator.LLM subclass**

Create a class that inherits from `curator.LLM`. Implement two key methods:

* `prompt()`: Generates the prompt for the LLM.
* `parse()`: Processes the LLM's response into your desired format.

Here’s the implementation:

```python
"""Example of reannotating the WildChat dataset using curator."""

import logging
from bespokelabs import curator

# To see more detail about how batches are being processed
logger = logging.getLogger("bespokelabs.curator")
logger.setLevel(logging.INFO)


class WildChatReannotator(curator.LLM):
    """A reannotator for the WildChat dataset."""

    def prompt(self, input: dict) -> str:
        """Extract the first message from a conversation to use as the prompt."""
        return input["conversation"][0]["content"]

    def parse(self, input: dict, response: str) -> dict:
        """Parse the model response along with the input to the model into the desired output format.."""
        instruction = input["conversation"][0]["content"]
        return {"instruction": instruction, "new_response": response}

```

#### **3. Configure the OpenAI model**

```python
distiller = WildChatReannotator(model_name="gpt-4o-mini", 
                                batch=True 
                                )
```

#### **4. Generate Data**

Generate the structured data and output the results as a pandas DataFrame:

```python
from datasets import load_dataset
dataset = load_dataset("allenai/WildChat", split="train")
dataset = dataset.select(range(100))

distilled_dataset = distiller(dataset)
print(distilled_dataset.dataset)
print(distilled_dataset.dataset[0])
```

### **Example Output**

Using the above example, the output might look like this:

| instruction                                       | new\_response                                   |
| ------------------------------------------------- | ----------------------------------------------- |
| Write a very long, elaborate, descriptive and ... | Scene: Omelette Apocalypse\n\n\*\*INT. DINER... |
| what are you?                                     | I am a large language model, trained by OpenAI  |

## **Batch Configuration**

* Check out complete [batch configuration ](https://docs.bespokelabs.ai/bespoke-curator/api-reference/llm-api-documentation#batch-processing-parameters)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.bespokelabs.ai/bespoke-curator/save-usdusdusd-on-llm-inference/using-openai-for-batch-inference.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
