Getting Started

Bespoke Curator makes it easy to create synthetic data pipelines. Whether you are training a model or extracting structure, Curator will prepare high-quality data quickly and robustly.

  • Rich Python-based library for generating and curating synthetic data

  • Interactive viewer to monitor data while it is being generated

  • First-class support for structured outputs

  • Built-in performance optimizations for asynchronous operations, caching, and fault recovery at every scale

  • Support for a wide range of inference options via LiteLLM, vLLM, and popular batch APIs

We are also actively improving the library. Planned additions include:

  1. Verifiers: filter outputs with verifier models or code executors to improve your data quality.

  2. MCTS: explore reasoning trajectories using Monte Carlo Tree Search.

  3. Data versioning: version your data along with the code that generates it.

  4. Diversity and data quality indicators: understand the quality of your data.

  5. Curator viewer: visualize and explore your generated data.

Next, let's take a Quick Tour of the Curator library!
