Batch Processing
Prerequisites
Steps
1. Load and Prepare the Dataset
from datasets import load_dataset

dataset = load_dataset("allenai/WildChat", split="train")
dataset = dataset.select(range(3_000))  # Select a subset of 3,000 samples
2. Create a Curator.LLM Subclass
from bespokelabs import curator


class WildChatReannotator(curator.LLM):
    """A reannotator for the WildChat dataset."""

    def prompt(self, input: dict) -> str:
        """Extract the first message from a conversation to use as the prompt."""
        return input["conversation"][0]["content"]

    def parse(self, input: dict, response: str) -> dict:
        """Combine the model response with the original input into the desired output format."""
        instruction = input["conversation"][0]["content"]
        return {"instruction": instruction, "new_response": response}
3. Configure Batch Processing
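A minimal sketch of this step, assuming the `curator.LLM` constructor accepts `model_name`, `batch`, and `backend_params` keyword arguments. Those keyword names, the `gpt-4o-mini` model, and the `build_reannotator` helper are assumptions made for illustration and should be checked against the installed Curator version. The helper takes the subclass from step 2 as an argument, so the snippet can be read and run without the library installed.

```python
# Assumed batch settings; "batch_size" mirrors the "Batch Size" option
# listed under Batch Processing Configuration.
BATCH_PARAMS = {"batch_size": 1_000}


def build_reannotator(llm_cls, model_name="gpt-4o-mini"):
    """Instantiate an LLM subclass with batch mode switched on.

    llm_cls is expected to be a curator.LLM subclass such as
    WildChatReannotator. The keyword names below are assumptions about
    the constructor, not a confirmed signature.
    """
    return llm_cls(
        model_name=model_name,
        batch=True,  # request batch processing instead of per-request calls
        backend_params=dict(BATCH_PARAMS),  # copy so the shared default is not mutated
    )
```

With the class from step 2, this would be called as `build_reannotator(WildChatReannotator)`.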
4. Process the Dataset
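Processing amounts to running `prompt` on each row, sending the result to the model, and passing the row together with the response to `parse`. The sketch below imitates that flow locally, with a hypothetical `complete` callable standing in for the model. It is not Curator's implementation. With the library itself, the assumed call is to apply the configured object to the dataset, for example `reannotated = reannotator(dataset)`, where `reannotator` is a `WildChatReannotator` instance created with batch mode enabled.

```python
from typing import Callable


def prompt(row: dict) -> str:
    """Same logic as WildChatReannotator.prompt: first message of the conversation."""
    return row["conversation"][0]["content"]


def parse(row: dict, response: str) -> dict:
    """Same logic as WildChatReannotator.parse: pair the instruction with the new response."""
    return {"instruction": prompt(row), "new_response": response}


def process(rows: list[dict], complete: Callable[[str], str]) -> list[dict]:
    """Run prompt -> model -> parse over every row (local stand-in, no API calls)."""
    return [parse(row, complete(prompt(row))) for row in rows]


# Hypothetical stand-in for the model: returns the prompt in upper case.
rows = [{"conversation": [{"content": "hello"}, {"content": "hi there"}]}]
results = process(rows, complete=str.upper)
```

Only the first message of each conversation reaches the model, which is why the second message in the sample row never appears in `results`.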
5. Inspect the Results
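One way to inspect the output is to print a few rows and confirm that each one carries the two fields returned by `parse`. The sketch below works on any sequence of dicts; the `check_rows` helper and the sample rows are hypothetical and stand in for the processed dataset.

```python
EXPECTED_KEYS = {"instruction", "new_response"}


def check_rows(rows, limit=3):
    """Print up to `limit` rows and return the indices of rows missing expected keys."""
    bad = []
    for i, row in enumerate(rows):
        if i < limit:
            print(f"[{i}] instruction={row.get('instruction')!r}")
            print(f"    new_response={row.get('new_response')!r}")
        # issubset iterates over the dict, which yields its keys
        if not EXPECTED_KEYS.issubset(row):
            bad.append(i)
    return bad


sample = [
    {"instruction": "hello", "new_response": "Hi! How can I help?"},
    {"instruction": "broken row"},  # deliberately missing new_response
]
missing = check_rows(sample)
```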
Batch Processing Configuration:
1. Supported models
2. Batch Size
3. Batch Check Interval
4. Delete Successful Batch Files
5. Delete Failed Batch Files
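To make the batch size option concrete: assuming it caps the number of requests per batch job, the number of jobs is the sample count divided by the batch size, rounded up. The sketch below computes this for the 3,000-sample subset from step 1. The `num_batches` helper is hypothetical and the batch sizes are example values only.

```python
import math


def num_batches(num_samples: int, batch_size: int) -> int:
    """Return how many batches are needed to cover num_samples requests."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(num_samples / batch_size)


full = num_batches(3_000, 1_000)  # divides evenly into full batches
ragged = num_batches(3_000, 800)  # the last batch is only partly filled
```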
Example Configuration
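A sketch of what a full configuration might look like, with one key for each tunable option in items 2 to 5 above. The key names are inferred from the option titles and the values are placeholders, not documented defaults; verify both against the Curator version in use.

```python
# Key names are assumptions inferred from the option titles above;
# values are illustrative placeholders, not library defaults.
batch_config = {
    "batch_size": 1_000,                    # requests per batch job
    "batch_check_interval": 60,             # time between status checks (seconds assumed)
    "delete_successful_batch_files": True,  # clean up files from completed batches
    "delete_failed_batch_files": False,     # keep files from failed batches for debugging
}

# Assumed usage: pass the dict alongside batch=True when constructing the subclass, e.g.
# WildChatReannotator(model_name="gpt-4o-mini", batch=True, backend_params=batch_config)
```

Keeping the files from failed batches while deleting the successful ones is one reasonable choice, since it leaves something to examine when a batch goes wrong.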