Getting Started
Bespoke Curator makes it easy to create synthetic data pipelines. Whether you are training a model or extracting structure, Curator will prepare high-quality data quickly and robustly.
Rich Python-based library for generating and curating synthetic data.
Interactive viewer to monitor data while it is being generated
First-class support for structured outputs
Built-in performance optimizations for asynchronous operations, caching, and fault recovery at every scale
Support for a wide range of inference options via LiteLLM, vLLM, and popular batch APIs
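To make the features above concrete, here is a minimal standard-library sketch of what asynchronous generation, caching, fault recovery, and structured outputs can look like in a small pipeline. It does not use Curator's API: `fake_llm`, `generate`, `run_pipeline`, and `cache_key` are hypothetical names invented for illustration, and the "model" is a local stub rather than a real inference backend.

```python
import asyncio
import hashlib
import json


async def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns a JSON string."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return json.dumps({"topic": prompt, "length": len(prompt)})


def cache_key(prompt: str) -> str:
    """Derive a stable cache key from the prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


async def generate(prompt: str, cache: dict, calls: list, retries: int = 3) -> dict:
    """Return a structured row for one prompt, using the cache when possible."""
    key = cache_key(prompt)
    if key in cache:
        return cache[key]  # cache hit: no model call is made
    last_error = None
    for _ in range(retries):
        try:
            calls.append(prompt)  # record every real model call
            raw = await fake_llm(prompt)
            row = json.loads(raw)  # parse the response into structured data
            cache[key] = row
            return row
        except (json.JSONDecodeError, OSError) as err:
            last_error = err  # retry on malformed output or I/O failure
    raise RuntimeError(f"gave up on prompt {prompt!r}") from last_error


async def run_pipeline(prompts: list, cache: dict, calls: list) -> list:
    """Process all prompts concurrently and keep the input order."""
    return await asyncio.gather(*(generate(p, cache, calls) for p in prompts))


cache: dict = {}
calls: list = []
rows = asyncio.run(run_pipeline(["rain", "snow"], cache, calls))
rows_again = asyncio.run(run_pipeline(["rain", "snow"], cache, calls))
```

In this sketch the second run is served entirely from the in-memory cache, so `calls` still holds only two entries. A real pipeline would persist the cache to disk and call an actual inference backend.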
We are also actively improving the library. Planned additions include:
Verifiers: filter outputs to improve your data quality with models like Bespoke-MiniCheck, or with code executors.
MCTS: explore reasoning trajectories using Monte Carlo Tree Search.
Data versioning: version your data along with the code that generates it.
Diversity and data quality indicators: understand the quality of your data.
Curator viewer: visualize and explore your generated data.
Next, let's take a Quick Tour of the Curator library!