Data Curation Recipes

Last updated 2 months ago

Here are some simple data curation recipes to get you started with generating synthetic data at scale using Curator:

  • Generating a diverse QA dataset

  • Curate Reasoning data with Claude-3.7 Sonnet

  • Using SimpleStrat block for generating diverse data

  • Synthetic Data for function calling

In addition to these recipes, our GitHub repository includes the following larger examples:

| Task | Link(s) | Goal |
| --- | --- | --- |
| Reasoning dataset generation (Bespoke Stratos) | Code | Generate the Bespoke-Stratos-17k dataset, focusing on reasoning traces from math, coding, and problem-solving datasets. |
| Reasoning dataset generation (Open Thoughts) | Code | Generate the Open-Thoughts-114k dataset, focusing on reasoning traces from math, coding, and problem-solving datasets. |
| 3Blue1Brown video generation | Code | Generate videos similar to 3Blue1Brown and render them using code execution. |