
Getting Started


Last updated 4 months ago

Bespoke Curator makes it easy to create synthetic data pipelines. Whether you are training a model or extracting structured data, Curator helps you prepare high-quality data quickly and robustly.

  • Rich Python-based library for generating and curating synthetic data

  • Interactive viewer to monitor data while it is being generated

  • First-class support for structured outputs

  • Built-in performance optimizations for asynchronous operation, caching, and fault recovery at every scale

  • Support for a wide range of inference options via LiteLLM, vLLM, and popular batch APIs
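The structured-output support mentioned above is typically driven by a schema that model responses are validated against. As a minimal sketch of the schema side using plain Pydantic (how the schema is wired into a Curator pipeline is covered in the Structured Output guide; the class and field names below are illustrative, not part of Curator's API):

```python
from pydantic import BaseModel


class QAPair(BaseModel):
    """One question-answer example in a synthetic dataset."""
    question: str
    answer: str


class QADataset(BaseModel):
    """A batch of QA pairs returned by a single LLM call."""
    pairs: list[QAPair]


# A raw JSON response, as an LLM constrained to this schema might return it.
raw = '{"pairs": [{"question": "What is 2 + 2?", "answer": "4"}]}'

# Validation fails loudly if the response drifts from the schema,
# which is what makes structured outputs robust in data pipelines.
dataset = QADataset.model_validate_json(raw)
print(dataset.pairs[0].answer)  # → 4
```

Because every record is parsed into a typed object rather than free-form text, downstream steps (filtering, deduplication, export) can rely on the fields being present and well-typed.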

In addition, we are actively improving the library, with more features to come:

  1. Verifiers: filter outputs with verifier models or code executors to improve your data quality.

  2. MCTS: explore reasoning trajectories using Monte Carlo Tree Search.

  3. Data versioning: version your data along with the code that generates it.

  4. Diversity and data quality indicators: understand the quality of your data.

  5. Curator viewer: visualize and explore your generated data.

Next, let's take a Quick Tour of the Curator library!