Save $$$ on LLM inference

Providers like OpenAI and Anthropic offer a batch mode, which lets you upload a large set of prompts to be processed asynchronously at a lower cost (typically a 50% discount). However, these APIs are often cumbersome to manage:

  • You typically have to prepare your batch file, upload it, and poll for responses periodically (a rough sketch of this manual workflow is shown after this list).

  • Large datasets will typically not fit in a single batch due to batch size limits, so you will need to split your dataset into multiple smaller batches, increasing the complexity you need to manage.
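
For contrast, here is a rough sketch of what that manual workflow looks like with the OpenAI Python SDK. The file name, prompts, and polling interval are placeholders, and error handling and result parsing are omitted:

import json
import time

from openai import OpenAI

client = OpenAI()
prompts = ["Example prompt 1", "Example prompt 2"]  # placeholder prompts

# 1. Prepare a JSONL batch file with one request per line.
with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")

# 2. Upload the file and create the batch job.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll until the batch reaches a terminal state, then download the output.
while True:
    batch = client.batches.retrieve(batch.id)
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)

results = client.files.content(batch.output_file_id).text

And this only covers a single batch; splitting a large dataset across several batches and tracking their state is still up to you.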

With Curator, you only need to toggle a single flag to save $$$, without any headache!

Using batch mode

Let's look at a simple example of reannotating instructions from the WildChat dataset with new responses from gpt-4o-mini.

First, we load the WildChat dataset using the Hugging Face datasets library:

from datasets import load_dataset

dataset = load_dataset("allenai/WildChat", split="train")
dataset = dataset.select(range(3_000))  # Select a subset of 3,000 samples
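
Each row's conversation field is a list of chat turns, and each turn is a dict that includes a content key; this is what the prompter below relies on. You can peek at one sample to confirm the structure:

# Inspect the first turn of the first conversation (the text we will re-prompt with).
print(dataset[0]["conversation"][0]["content"])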

We then define a curator.LLM subclass and apply it to the dataset. All you need to do to enable batching is set batch=True when initializing your LLM object, and you're done!

from bespokelabs import curator

class WildChatReannotator(curator.LLM):
    """A reannotator for the WildChat dataset."""

    def prompt(self, input: dict) -> str:
        """Extract the first message from a conversation to use as the prompt."""
        return input["conversation"][0]["content"]

    def parse(self, input: dict, response: str) -> dict:
        """Parse the model response along with the input to the model into the desired output format."""
        instruction = input["conversation"][0]["content"]
        return {"instruction": instruction, "new_response": response}
        
# Initialize the reannotator with batch processing
reannotator = WildChatReannotator(
    model_name="gpt-4o-mini",
    batch=True,  # Enable batch processing
    backend_params={"batch_size": 1_000},  # Specify batch size
)

reannotated_dataset = reannotator(dataset)
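
Once the batches complete, you can work with the result just like the dataset you loaded above. For example (assuming the output behaves like a Hugging Face Dataset; the repo id below is a placeholder):

# Inspect one reannotated sample.
print(reannotated_dataset[0]["instruction"])
print(reannotated_dataset[0]["new_response"])

# Persist the results locally, or push them to the Hub under your own repo id.
reannotated_dataset.save_to_disk("wildchat_reannotated")
# reannotated_dataset.push_to_hub("your-username/wildchat-reannotated")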

Supported Models

Check out the how-to guides for using batch mode with our supported providers:

  • Using OpenAI for batch inference

  • Using Anthropic for batch inference

  • Using Gemini for batch inference

  • Using kluster.ai for batch inference

Feel free to tell us which providers you want us to add support for, or send a PR if you want to contribute!
