Curate Reasoning data with Claude-3.7 Sonnet

You can use the Claude 3.7 Sonnet reasoning model in Curator to generate synthetic data. In this example, we answer a few questions and capture the reasoning traces from Claude 3.7 Sonnet, but the approach can be adapted to any data generation task.

Prerequisites

  • Python 3.10+

  • Curator: Install via pip install bespokelabs-curator

  • Anthropic: Anthropic API key

Steps

1. Set up environment variables

export ANTHROPIC_API_KEY=<your_api_key>
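
Before constructing the LLM, it can help to fail fast when the key is missing. The following is a minimal sketch that only reads the environment; `require_api_key` is a hypothetical helper name used for illustration and is not part of Curator.

```python
import os


def require_api_key(env=None, name="ANTHROPIC_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; it is not part of Curator.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; run `export {name}=<your_api_key>` first.")
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing key before any requests are attempted.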

2. Create a curator.LLM subclass

Create a class that inherits from curator.LLM. Implement two key methods:

  • prompt(): Generates the prompt for the LLM.

  • parse(): Processes the LLM's response into your desired format.

Here’s the implementation:

"""Example of reasoning on simple questions using curator."""

from bespokelabs import curator

class Reasoner(curator.LLM):
    # Hand parse() the full completion object so it can read the content blocks.
    return_completions_object = True

    def prompt(self, input):
        return input["question"]

    def parse(self, input, response):
        """Parse the LLM response to extract reasoning and solution."""
        content = response["content"]
        thinking = ""
        text = ""
        for content_block in content:
            if content_block["type"] == "thinking":
                thinking = content_block["thinking"]
            elif content_block["type"] == "text":
                text = content_block["text"]
            elif content_block["type"] == "redacted_thinking":
                print("Redacted thinking block! (notifying you for fun)")

        input["claude_thinking_trajectory"] = thinking
        input["claude_attempt"] = text
        return input
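
To see what the loop in parse() does without calling the API, the same logic can be run on a hand-written list of content blocks. This is a sketch under assumptions: `split_content_blocks` is a hypothetical standalone function, and `mock_content` is an invented stand-in for `response["content"]`, not real model output.

```python
def split_content_blocks(content):
    """Separate thinking and text blocks, mirroring the loop in parse()."""
    thinking, text = "", ""
    for block in content:
        if block["type"] == "thinking":
            thinking = block["thinking"]
        elif block["type"] == "text":
            text = block["text"]
        # redacted_thinking blocks carry no readable text, so they are skipped here
    return thinking, text


# Hand-written stand-in for response["content"]; not real model output.
mock_content = [
    {"type": "thinking", "thinking": "List the primes in order and count to fifteen."},
    {"type": "redacted_thinking", "data": "<encrypted>"},
    {"type": "text", "text": "The fifteenth prime number is 47."},
]

thinking, text = split_content_blocks(mock_content)
print(thinking)
print(text)
```

Note that, as in parse(), a later block of the same type overwrites an earlier one, so only the last thinking block and the last text block are kept.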

3. Configure the Anthropic model

llm = Reasoner(
    model_name="claude-3-7-sonnet-20250219",
    generation_params={"max_tokens": 20000, "thinking": {"type": "enabled", "budget_tokens": 18000}},
    batch=False,
    backend="anthropic",
    backend_params={"require_all_responses": False},
)
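
The configuration above sets budget_tokens (18000) below max_tokens (20000). Anthropic's extended thinking expects the budget to be at least 1,024 tokens and smaller than max_tokens, so a quick check before launching a run can catch a bad combination early. A minimal sketch, assuming a hypothetical helper `check_thinking_params` that is not part of Curator or the Anthropic SDK:

```python
MIN_THINKING_BUDGET = 1024  # minimum budget documented by Anthropic for extended thinking


def check_thinking_params(generation_params):
    """Raise ValueError if the thinking budget does not fit inside max_tokens."""
    thinking = generation_params.get("thinking", {})
    if thinking.get("type") != "enabled":
        return  # nothing to validate when thinking is off
    budget = thinking["budget_tokens"]
    max_tokens = generation_params["max_tokens"]
    if budget < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}, got {budget}")
    if budget >= max_tokens:
        raise ValueError(f"budget_tokens ({budget}) must be less than max_tokens ({max_tokens})")


params = {"max_tokens": 20000, "thinking": {"type": "enabled", "budget_tokens": 18000}}
check_thinking_params(params)  # passes silently for the values used in this guide
```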

4. Generate Data

Generate the data by calling the LLM on a list of inputs, then print the resulting dataset and its first row:

ds = llm([
    {"question": "How to solve for world peace?"},
    {"question": "What is the fifteenth prime number?"},
])
print(ds)
print(ds[0])

Example Output

Using the above example, the output might look like this:

question                            | claude_thinking_trajectory                        | claude_attempt
How to solve for world peace?       | This is a question about solving for world pea... | The Path to World Peace\n\nWorld peace is on...
What is the fifteenth prime number? | Let me list out the prime numbers in order to ... | The fifteenth prime number is 47.\n\nThe seque...
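
Reasoning traces are long, so a shortened preview like the one above is often easier to scan than the full text. A minimal sketch using only the standard library; `preview` is a hypothetical helper, and the row below is a hand-written stand-in rather than real model output.

```python
def preview(value, width=40):
    """Shorten a string to at most `width` characters, ending in '...'."""
    flat = value.replace("\n", "\\n")  # keep each row on a single line
    return flat if len(flat) <= width else flat[: width - 3] + "..."


# Hand-written stand-in for one output row; not real model output.
row = {
    "question": "What is the fifteenth prime number?",
    "claude_attempt": "The fifteenth prime number is 47.\n\nCounting the primes in order gives this result.",
}
for column, value in row.items():
    print(f"{column}: {preview(value)}")
```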

API Reference

Last updated 2 months ago

Check out complete

configuration