# API Reference

## curator.LLM

The `LLM` class serves as the primary interface for prompting Large Language Models in Curator. It provides a flexible and extensible way to generate synthetic data using various LLM providers. Returns `CuratorResponse` which holds dataset, statistics (performance, token usage, cost etc) attributes.

### Class Definition

```python
class LLM:
    def __init__(
        self,
        model_name: str,
        response_format: Type[BaseModel] | None = None,
        batch: bool = False,
        backend: Optional[str] = None,
        generation_params: dict | None = None,
        backend_params: BackendParamsType | None = None,
    )
```

### Constructor Parameters

<table data-full-width="true"><thead><tr><th>Parameter</th><th>Type</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><code>model_name</code></td><td><code>str</code></td><td>Required</td><td>Name of the LLM to use</td></tr><tr><td><code>response_format</code></td><td><code>Type[BaseModel] | None</code></td><td><code>None</code></td><td>Pydantic model specifying the expected response format</td></tr><tr><td><code>batch</code></td><td><code>bool</code></td><td><code>False</code></td><td>Enable batch processing mode</td></tr><tr><td><code>backend</code></td><td><code>Optional[str]</code></td><td><code>None</code></td><td>LLM backend to use ("openai", "litellm", or "vllm"). Auto-determined if None</td></tr><tr><td><code>generation_params</code></td><td><code>dict | None</code></td><td><code>None</code></td><td>Additional parameters for the generation API</td></tr><tr><td><code>backend_params</code></td><td><code>BackendParamsType | None</code></td><td><code>None</code></td><td>Configuration parameters for request processor</td></tr></tbody></table>

### Backend Parameters Configuration

The `backend_params` dictionary supports various configuration options based on the execution mode. Here's a comprehensive breakdown:

#### Common Parameters

These parameters are available across all backends:

| Parameter                | Type             | Default                        | Description                                             |
| ------------------------ | ---------------- | ------------------------------ | ------------------------------------------------------- |
| `max_retries`            | `int`            | `3`                            | Maximum number of retry attempts for failed requests    |
| `require_all_responses`  | `bool`           | `False`                        | Whether to require successful responses for all prompts |
| `base_url`               | `Optional[str]`  | `None`                         | Optional base URL for API endpoint                      |
| `request_timeout`        | `int`            | `600`                          | Timeout in seconds for each request                     |
| `api_key`                | `Optional[str]`  | `None`                         | Api key for the selected model.                         |
| `in_mtok_cost`           | `Optional[int]`  | `None`                         | Optional cost per million input tokens.                 |
| out`_mtok_cost`          | `Optional[int]`  | `None`                         | Optional cost per million output tokens.                |
| `invalid_finish_reasons` | `Optional[list]` | `['content_filter', 'length'`] | List of api finish reasons which are considered failed. |

```python
# Example: Common parameters configuration
backend_params = {
    "max_retries": 3,
    "require_all_responses": True,
    "base_url": "https://custom-endpoint.com/v1",
    "request_timeout": 300
}
```

#### Online Mode Parameters

Parameters for online processor mode:

| Parameter                        | Type    | Description                                                                                                      |
| -------------------------------- | ------- | ---------------------------------------------------------------------------------------------------------------- |
| `max_requests_per_minute`        | `int`   | Maximum number of API requests per minute                                                                        |
| `max_tokens_per_minute`          | `int`   | Maximum number of tokens per minute                                                                              |
| `seconds_to_pause_on_rate_limit` | `float` | Duration to pause when rate limited                                                                              |
| `max_concurrent_requests`        | `int`   | Maximum number of concurrent requests.                                                                           |
| `max_input_tokens_per_minute`    | `int`   | Maximum number of input tokens allowed per minute. Note: Only valid with seperate token strategy i.e Anthropic.  |
| `max_output_tokens_per_minute`   | `int`   | Maximum number of output tokens allowed per minute. Note: Only valid with seperate token strategy i.e Anthropic. |

```python
# Example: Online mode configuration
backend_params = {
    "max_requests_per_minute": 2000,
    "max_tokens_per_minute": 4_000_000,
    "seconds_to_pause_on_rate_limit": 15.0
}
```

#### Batch Processing Parameters

Parameters available when `batch=True`:

| Parameter                       | Type    | Description                                                                                  |
| ------------------------------- | ------- | -------------------------------------------------------------------------------------------- |
| `batch_size`                    | `int`   | Number of prompts to process in each batch                                                   |
| `batch_check_interval`          | `float` | Time interval between batch completion checks                                                |
| `delete_successful_batch_files` | `bool`  | Whether to delete successful batch files                                                     |
| `delete_failed_batch_files`     | `bool`  | Whether to delete failed batch files                                                         |
| `completion_window`             | `str`   | <p>Time window to wait for batch completion. </p><p>Note: only valid for some providers.</p> |

```python
# Example: Batch processing configuration
backend_params = {
    "batch_size": 100,
    "batch_check_interval": 1.0,
    "delete_successful_batch_files": True,
    "delete_failed_batch_files": False,
}
```

#### Offline Mode Parameters (VLLM)

Parameters for local model deployment with VLLM:

| Parameter                | Type    | Description                                |
| ------------------------ | ------- | ------------------------------------------ |
| `tensor_parallel_size`   | `int`   | Number of GPUs for tensor parallelism      |
| `enforce_eager`          | `bool`  | Whether to enforce eager execution         |
| `max_model_length`       | `int`   | Maximum sequence length for the model      |
| `max_tokens`             | `int`   | Maximum tokens for generation              |
| `min_tokens`             | `int`   | Minimum tokens for generation              |
| `gpu_memory_utilization` | `float` | Target GPU memory utilization (0.0 to 1.0) |
| `batch_size`             | `int`   | Batch size for VLLM processing             |

```python
# Example: VLLM configuration
backend_params = {
    "tensor_parallel_size": 2,
    "max_model_length": 4096,
    "max_tokens": 2048,
    "min_tokens": 1,
    "gpu_memory_utilization": 0.85,
    "batch_size": 32
}
```

### Methods

### prompt()

```python
def prompt(self, input: _DictOrBaseModel) -> _DictOrBaseModel
```

Generates a prompt for the LLM based on the input data.

**Parameters**

* `input`: Input row used to construct the prompt

**Returns**

A prompt that can be either:

1. A string for a single user prompt
2. A list of dictionaries for multiple messages

**Example**

```python
def prompt(self, input: dict) -> str:
    return f"Generate a {input['type']} about {input['topic']}"
```

### parse()

```python
def parse(self, input: _DictOrBaseModel, response: _DictOrBaseModel) -> _DictOrBaseModel
```

Processes the LLM's response and optionally can be used to combine it with the input data.

**Parameters**

* `input`: Original input row used for the prompt
* `response`: Raw response from the LLM

**Returns**

A parsed output combining the input and response data

**Example**

```python
def parse(self, input: dict, response: str) -> dict:
    return {
        "prompt_topic": input["topic"],
        "generated_text": response,
        "timestamp": datetime.now().isoformat()
    }
```

### Returns

A `CuratorResponse` object which consists statistics about token usage, performance and cost along with the dataset and viewer link.

#### `CuratorResponse`

#### Attributes

#### **Core Data**

* dataset (Dataset): The curated dataset
* cache\_dir (Optional\[str]): Directory for caching results
* failed\_requests\_path (Optional\[Path]): Path to file containing failed requests
* viewer\_url (Optional\[str]): URL for Curator Viewer
* batch\_mode (bool): Whether the processing was done in batch mode

#### Model Information

* model\_name (str): Name of the LLM model used
* max\_requests\_per\_minute (int | None): Rate limit for requests per minute
* max\_tokens\_per\_minute (int | None): Rate limit for tokens per minute

#### Statistics

* token\_usage (TokenUsage): Statistics about token usage
* cost\_info (CostInfo): Information about processing costs
* request\_stats (RequestStats): Statistics about request processing
* performance\_stats (PerformanceStats): Performance metrics
* metadata (Dict\[str, Any]): Additional metadata

### Response Format (Optional)

The `response_format` class attribute can be set to a Pydantic model to enforce structured output:

```python
from pydantic import BaseModel

class RecipeResponse(BaseModel):
    title: str
    ingredients: List[str]
    instructions: List[str]

class RecipeGenerator(LLM):
    response_format = RecipeResponse
```

### Usage Examples

#### Basic Usage

```python

class Cuisines(BaseModel):
    """A list of cuisines."""

    cuisines_list: List[str] = Field(description="A list of cuisines.")


class CuisineGenerator(curator.LLM):
    """A cuisine generator that generates diverse cuisines."""

    response_format = Cuisines

    def prompt(self, input: dict) -> str:
        """Generate a prompt for the cuisine generator."""
        return "Generate 10 diverse cuisines."

    def parse(self, input: dict, response: Cuisines) -> dict:
        """Parse the model response along with the input to the model into the desired output format.."""
        return [{"cuisine": t} for t in response.cuisines_list]

```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.bespokelabs.ai/bespoke-curator/api-reference.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
