Data Curation Recipes
Last updated
Here are some simple data curation recipes to get you started with generating synthetic data at scale using Curator:
In addition to these examples, the following larger examples are available in our GitHub repository:
Task | Link(s) | Goal
--- | --- | ---
Reasoning dataset generation (Bespoke Stratos) | | Generate the Bespoke-Stratos-17k dataset, focusing on reasoning traces from math, coding, and problem-solving datasets.
Reasoning dataset generation (Open Thoughts) | | Generate the Open-Thoughts-114k dataset, focusing on reasoning traces from math, coding, and problem-solving datasets.
3Blue1Brown video generation | | Generate videos similar to 3Blue1Brown and render them using code execution.
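To make the shape of a recipe concrete, here is a minimal sketch of the reasoning-trace pattern. It assumes the structure used by Curator's `curator.LLM` subclasses, where a `prompt` method builds the model input for each row and a `parse` method turns the response into an output row. It does not import Curator. `ReasoningRecipe` and `stub_model` are hypothetical stand-ins so the sketch runs offline without an API key; they are not part of the library.

```python
from typing import Callable


class ReasoningRecipe:
    """Hypothetical stand-in that mimics the prompt/parse shape of a curator.LLM subclass."""

    def __init__(self, model: Callable[[str], str]):
        # Any function that maps a prompt string to a response string.
        self.model = model

    def prompt(self, row: dict) -> str:
        # Build the model input for one dataset row.
        return f"Solve the following problem and show your reasoning.\n\n{row['problem']}"

    def parse(self, row: dict, response: str) -> dict:
        # Turn the raw response into one output row of the synthetic dataset.
        return {"problem": row["problem"], "reasoning": response}

    def __call__(self, rows: list[dict]) -> list[dict]:
        # Apply prompt -> model -> parse to every row, sequentially.
        return [self.parse(row, self.model(self.prompt(row))) for row in rows]


def stub_model(prompt: str) -> str:
    # Hypothetical offline "model": echoes the last line of the prompt.
    return "Reasoning about: " + prompt.splitlines()[-1]


recipe = ReasoningRecipe(stub_model)
dataset = recipe([{"problem": "What is 2 + 3?"}])
print(dataset)
```

With Curator itself, you would instead subclass `curator.LLM`, choose a real model, and let the library make the model calls. The prompt/parse split stays the same, which is why the larger examples above can reuse the pattern across math, coding, and video-generation tasks.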