August '21 Heartbeat

This month you will find:

  • 🧑🏽‍💻 Data-centric for the win,
  • 🧐 Comparison of DVC, MLFlow and Metaflow,
  • 🛠 Tutorials and Tool Stacks,
  • 📈 DVC + Streamlit = ❤️,
  • 📖 Doc Updates,
  • 🎥 July Meetup Video available,
  • 🚀 and more!

Jeny De Figueiredo
August 17, 2021 • 9 min read

It's all about that Data!

Data! Data! Data!

From the Community

This month we follow up on a couple of pieces from the June Heartbeat and check out a use case, a tool stack, and some great tutorials from our Community members.

LJ Miranda synthesizes the MLOps space once again!

LJ Miranda has written another amazing article, following the series on the MLOps tools landscape that we covered in the June Heartbeat. This time he focuses on the data-centric wave taking over the space, reviewing the methods, approaches, and techniques for ensuring quality data in ML projects. If the adroit summaries of complex concepts don't thrill you, the links to no fewer than 63 (😱) resources will get you on your way to data-centric nirvana.

LJ Miranda's framework for putting data-centric machine learning into context (Source link)

Neda Sultova's Comparison of DVC, MLFlow and Metaflow

Also covered in the June Heartbeat was Neda Sultova's piece on the rubric she is using to decide which MLOps tools to use for the teams at Helmholtz AI. This next article reviews her research into DVC, MLFlow and Metaflow and offers a thorough analysis of the tools across multiple dimensions. Beyond the article, check out her MLOps Comparison repository as well as her Comparison Table. They will not disappoint!

Machine Learning Lifecycle (Source link)

Amit Kulkarni's Tutorials

Writing for the Analytics Vidhya Data Science Blogathon, Amit Kulkarni created two tutorials on DVC.
Tracking ML Experiments With Data Version Control reviews DVC and takes you through getting started, setup, fetching and pre-processing data, and the steps of an ML project. Next it sets up DVC and the pipeline, and shows how to run model metrics and plots. In MLOps | Versioning Datasets with Git & DVC, Amit continues with an explanation of how data and model versioning works when GitHub is paired with DVC.

In a previous article entitled Bring DevOps To Data Science With MLOps, Amit walks through a tutorial using CML to bring CI/CD functionality to your ML project and automate the process. All great posts to check out! 👇🏼

Tracking ML Experiments With Data Version Control

Amit Kulkarni's tutorial on getting started with DVC and tracking experiments

MLOps | Versioning Datasets with Git & DVC

Amit Kulkarni's tutorial on how DVC works with Git to version your datasets.

Bring DevOps To Data Science With MLOps

Amit Kulkarni's tutorial on how to use CML to bring the CI/CD functionality of DevOps to your data science projects.

Andreas Malekos' MLOps Tool Stack at Continuum Industries

Last but not least, we bring you a great article from Andreas Malekos, Chief Scientist at Continuum Industries. In the post he outlines the tool stack and MLOps platform they use to do their work automating and optimizing the design of linear infrastructure assets like water pipelines, overhead transmission lines, subsea power lines, or telecommunication cables.

Their tool stack includes DVC and CML, and the article outlines what they like (🙈 Spoiler alert! 🙊 DVC making repeatability achievable) and the things they don't like that still need to be improved.

Continuum Industries MLOps Tool Stack (Source link)

DVC News

Though the team has been taking some vacation time in the last month, there's still a lot going on!


Docs Updates

This month we are introducing docs updates so that you will always be aware of what has changed as our open source projects mature.

Our docs team, made up of Jorge Orpinel, Emre Şahin, Casper de la Costa, and David de la Iglesia-Castro, has been hard at work updating our docs to make sure you have what you need to be successful using our tools! Updates include:

Batuhan Taskaya's Refactor Project hits the First Page of Hacker News!

A refactor project created by team member Batuhan Taskaya (AKA @isidentical) was shared by someone on Hacker News and made it to the main page! You can catch all the comments here!

Explanation of the project:

refactor is an end-to-end refactoring framework that is built on top of the 'simple but effective refactorings' assumption. It is much easier to write a simple script with it rather than trying to figure out what sort of a regex you need in order to replace a pattern (if it is even matchable with regexes).

Every refactoring rule offers a single entrypoint, match(), which accepts an AST node (from the ast module in the standard library) and responds by returning either an action to refactor or nothing. If the rule succeeds on the input, the returned action builds a replacement node, and refactor simply replaces the code segment that belongs to the input with the new version.
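To make the match-and-replace idea concrete, here is a minimal sketch that uses only Python's standard library. It is not the refactor library's API: the names match and apply_rule, and the toy rule itself (rewriting `x + x` as `x * 2`), are hypothetical and chosen purely for illustration. It assumes Python 3.9+ (for ast.unparse) and ASCII source, so that the column offsets reported by ast line up with string indices.

```python
import ast
from typing import Optional


def match(node: ast.AST) -> Optional[ast.AST]:
    """Hypothetical rule: rewrite `x + x` as `x * 2`.

    Returns a replacement node when the rule applies, otherwise None.
    """
    if (
        isinstance(node, ast.BinOp)
        and isinstance(node.op, ast.Add)
        and isinstance(node.left, ast.Name)
        and isinstance(node.right, ast.Name)
        and node.left.id == node.right.id
    ):
        return ast.BinOp(
            left=node.left,
            op=ast.Mult(),
            right=ast.Constant(value=2, kind=None),
        )
    return None


def apply_rule(source: str) -> str:
    """Replace every source segment whose AST node matches the rule."""
    tree = ast.parse(source)

    # Absolute offset of the start of each line (assumes ASCII source, so
    # the byte-based column offsets from `ast` equal character offsets).
    line_starts = [0]
    for line in source.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    edits = []
    for node in ast.walk(tree):
        replacement = match(node)
        if replacement is not None:
            start = line_starts[node.lineno - 1] + node.col_offset
            end = line_starts[node.end_lineno - 1] + node.end_col_offset
            edits.append((start, end, ast.unparse(replacement)))

    # Apply edits back to front so earlier offsets stay valid.
    for start, end, text in sorted(edits, reverse=True):
        source = source[:start] + text + source[end:]
    return source


if __name__ == "__main__":
    print(apply_rule("total = price + price\nother = a + b\n"))
```

Only the matched segment is rewritten; everything else in the file, including formatting and comments, is left untouched, which is the appeal of this approach over regex-based replacement.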

Way to go Batuhan! 🚀

July Office Hour Meetup

If you missed our July Office Hours, good news! It's now available on our YouTube Channel, where you can see João Santiago talk about {dvthis} and how his team at Billie.io uses DVC to productionize rstats.

Also in the Meetup is a DVC Studio demo by Tapa Dipti Situala, Senior Product Engineer for Studio. You can catch the presentations along with great questions and discussion from the Community!

Next Meetup

So remember when I told you last month about DVC + Streamlit = ❤️? Well, at our August Office Hours Meetup, Antoine Toubhans of Sicara will be presenting his tutorial on how to do just that! Join us in the integration fun on August 19th at 3:00 pm UTC! RSVP at the link below! 👇🏼

DVC Office Hours - DVC and Streamlit Integration

Antoine Toubhans of Sicara shares his tutorial for using Streamlit with DVC to create a customizable web UI

Learning Opportunities

This week's DVC Learn Meetup (August 18th) will be the last in our series of DVC Learn Meetups designed to get teams up and running with DVC. We will digest our learnings from this first cohort and revamp for the next set of three classes that will begin in September. Subscribe to our Meetup group and follow us on Twitter and LinkedIn to stay in the know about all of our upcoming events!

If you are interested in weighing in on what kinds of educational content you would like to see from us, we'd be grateful if you'd fill out this survey to help us plan! 🙏🏼

DVC Online Course survey: Help us plan our Online Course! 🙏🏼 (Source link)

Open Positions

Looking for a great opportunity at an amazing company? Check out our open positions at this link to find details on all of them, including:

  • Senior Front-End Engineer (TypeScript, Node, React)
  • Senior Software Engineer (ML, Dev Tools, Python)
  • Senior Software Engineer (ML, Data Infra, GoLang)
  • Machine Learning Engineer/Field Data Scientist
  • Developer Advocate (ML)
  • Director/VP of Engineering (ML, DevTools)
  • Director/VP of Product (ML, Data Infra, SaaS)
  • Director/VP of Operations/Chief of Staff

Please pass this info on to anyone you know who may fit the bill. We look forward to new team members! 🎉

Tweet Love ❤️


Do you have any use case questions or need support? Join us in Discord!

Head to the DVC Forum to discuss your ideas and best practices.
