#48 Paris Women in Machine Learning and Data Science X MLOps: Lessons learned for ML in production

WiMLDS Paris
4 min read · Jul 3, 2024


Edouard, Maya, Marie, Catalina, Sofia, Anaël, Florent

Our last meetup on June 12th was generously hosted by The Fork in the heart of Paris. This event was particularly special as it was a collaborative effort with our friends from the MLOps Meetup. We were thrilled to have two talented female speakers share their insights on Machine Learning in Production.

Anaël Beaugnon

We began the evening with a talk by Anaël Beaugnon, a Full Stack Data Scientist at Alan. Anaël discussed her journey from earning a PhD in Machine Learning at ANSSI to deploying robust models in production at Roche and, more recently, at Alan. Although her PhD gave her essential foundational skills, transitioning to production posed significant challenges.

At Roche, a global pharmaceutical company, Anaël deployed Machine Learning models across more than 80 countries. As a Data Scientist and later an OPS Tech Lead, she focused on enhancing ML pipeline robustness and fostering collaboration between Software Engineers and Data Scientists. She shared two key lessons on deploying Machine Learning models in production:

How to iterate easily on a model that is already deployed

Anaël emphasized the importance of an automated evaluation framework for offline testing to quickly assess multiple candidates. She also highlighted the need for a real-world impact analysis, including manual checks to compare current and candidate solutions before deployment. Maintaining separate production and benchmarking code is crucial for stability while allowing efficient candidate testing.
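To make this concrete, here is a minimal sketch of what such an offline evaluation loop might look like, assuming scikit-learn-style models and a shared benchmark dataset; the function names and metric are illustrative, not the actual framework used at Roche:

```python
# Minimal sketch of an offline evaluation framework: score the current
# production model and each candidate on the same held-out benchmark set.
# The metric and the candidate dictionary are illustrative assumptions.
from sklearn.metrics import f1_score

def evaluate(model, X_bench, y_bench):
    """Compute the benchmark metric for a single model."""
    return f1_score(y_bench, model.predict(X_bench), average="macro")

def compare_candidates(current_model, candidates, X_bench, y_bench):
    """Score each candidate against the current production model."""
    report = {"current": evaluate(current_model, X_bench, y_bench)}
    for name, model in candidates.items():
        report[name] = evaluate(model, X_bench, y_bench)
    return report
```

Keeping this kind of benchmarking code separate from the serving code reflects the separation Anaël recommended: production stays stable while candidates can be evaluated as often as needed.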

Full Stack Data Scientists: Key to efficiently delivering great data products

Anaël advocated for reducing silos by employing full-stack data scientists who deploy and maintain their data pipelines, owning the entire process from development to production. This approach ensures mastery of pipelines in production and facilitates easier deployment of improvements. It addresses common issues where data scientists, detached from production deployments, struggle to troubleshoot and iterate on their models effectively.

If you want to know more, check out her slides below:

Sofia Calcagno

Next, Sofia Calcagno, an ML Engineer at Octo Technology, delved into the intricacies of CI/CD in the Machine Learning domain. As one of the two main authors of the book “Culture MLOps,” Sofia shared insights into the challenges unique to CI/CD for ML.

CI/CD in ML is more complex than traditional CI/CD due to the introduction of new artifacts and sources of change. Sofia highlighted three critical areas that vary depending on the project stage: where to train models, where to store model versions, and when to load the desired model version.

Prototyping or Demonstration Phase

During the prototyping phase, the goal is to demonstrate the potential of ML quickly; in Sofia’s running example, this means predicting product appeal for marketing teams. The main challenges here are speed and cost-efficiency, as the prototype may never make it to production. Models are trained within the inference API itself, versions are kept in memory, and the model is loaded from disk at inference time.
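As a rough illustration of this setup (our sketch, not code from the talk), a single small service can train the model when it starts and keep it in memory, with no external model store at all; the dataset and model below are placeholders:

```python
# Sketch of the prototyping setup: the model is trained inside the inference
# service at startup and held in memory. Dataset and model are placeholders.
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

app = Flask(__name__)

# Train once at service startup and keep the fitted model in memory.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    return jsonify({"prediction": int(model.predict([features])[0])})

if __name__ == "__main__":
    app.run()
```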

Developing a Product

As the project moves into product development, the aim is to serve initial models to users iteratively. This phase involves measuring product value and being able to pivot quickly based on user feedback. Models are trained on the Data Scientist’s computer, and if the model is smaller than 100 MB, it is stored in Git. The model and code are deployed together as a single artifact using a Dockerfile.
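One possible reading of this setup (our illustration, not Sofia’s exact code) is that the serialized model lives in the repository next to the code, so the Docker build copies both into one image; the path below is hypothetical:

```python
# Sketch of the "single artifact" idea: a small (<100 MB) serialized model is
# committed to Git and loaded from a path relative to the code, so the Docker
# image built from this repository contains both. The path is hypothetical.
from pathlib import Path

import joblib

MODEL_PATH = Path(__file__).parent / "models" / "model.joblib"  # versioned in Git

def load_model():
    """Load the model that ships alongside the code in the same image."""
    return joblib.load(MODEL_PATH)
```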

Scaling Phase

In the scaling phase, the focus shifts to serving a large number of users and updating models quickly. Maintaining a robust production environment, managing model drift, and enabling on-demand deployments are the key challenges. Training occurs on a dedicated production service, and models are stored in dedicated storage or artifact repositories like Azure Blob Storage, or in specialised solutions like Hugging Face and MLflow. The served model version is refreshed either when a change is detected or at each start of the service.
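As an illustration of that last point (an assumption on our part, using MLflow’s model registry as the store, one of the options mentioned), the serving code could fetch whichever version is currently promoted every time the service starts:

```python
# Sketch of the scaling setup: at each start of the service, pull the promoted
# model version from a registry (here MLflow's model registry).
# The registered model name and stage are hypothetical.
import mlflow.pyfunc

MODEL_URI = "models:/product-appeal/Production"  # hypothetical registered model

def load_production_model():
    """Fetch the currently promoted model version from the registry."""
    return mlflow.pyfunc.load_model(MODEL_URI)

model = load_production_model()  # refreshed at each restart of the service
```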

For more details, check out her slides below:

An exciting event is being planned for October. Stay tuned for more updates!

If you do not want to miss our events, you can:

🔗 follow us on Twitter, Meetup, and LinkedIn

📑 check our Google spreadsheet if you want to speak 📣, host 💙, help 🌠

📍 join our Slack channel for information, discussions, and opportunities

📩 send an email to the Paris WiMLDS team at paris@wimlds.org

🎬 subscribe to our WiMLDS Paris Medium page and YouTube channel

📸 follow the global WiMLDS on Instagram, LinkedIn, and Facebook

🔥 share your company or lab’s job offers for free on the global WiMLDS’ website.
