Announcing DataChain
![](/static/46bac2c44c6d11914e456cdb734d58d2/ce93b/datachain.png)
Introducing DataChain, a new open-source tool to curate and process unstructured data using local ML models and LLM calls.
DataChain Open-Source Release
A New Way to Manage Your Unstructured Data
Data Version Control, and much more, for the GenAI era
Free and open source, forever.
Manage and version images, audio, video, and text files in storage and organize your ML modeling process into a reproducible workflow.
Explore and enrich annotated datasets with custom embeddings, auto-labeling, and bias removal at billion-file scale — without modifying your data.
Connect versioned data sources and code with pipelines, track experiments, and register models, all based on GitOps principles.
Get Started with
DataChain and DVC: Better Together
Build the datasets you need without modifying your data sources. Create pipelines that connect your versioned datasets, code, and models together for effective experiment tracking the GitOps way.