MLOps at ML6
March 15, 2021

ML in Production

Artificial Intelligence (AI) has become omnipresent in recent years. In reality, however, only a small percentage of models make it to production and stay there. In a series of blog posts on MLOps, we explain why and how companies can adopt MLOps practices to unlock the business value of AI.

At ML6, we see MLOps as indispensable for unlocking the true business value of machine learning models. That’s why MLOps best practices are strongly embedded in our ways of working. When we deliver custom machine learning solutions for our clients, MLOps is considered a de facto component of the overall solution.

Thanks to the wide range of machine learning projects we work on at ML6, we have been able to test and learn what generally works and doesn’t work when it comes to MLOps. As we often get questions about our MLOps best practices and the tech stack we use, we gladly give a short overview of how we apply MLOps at ML6 below. 

Our MLOps Best Practices

In ML6 fashion, you can find six of our MLOps best practices below:

  1. In order to be able to build a range of simple to complex machine learning workflows, we modularize code into logical machine learning related steps.
  2. We containerize code after the experimentation phase. Notebooks are great to quickly experiment at an early stage, but after that initial exploratory phase the experiments that return a positive result should be industrialized by applying engineering best practices. 
  3. We version control data and models for reproducibility and audit purposes. 
  4. We work with mixed, autonomous teams to ensure models are properly managed throughout their entire life cycle. 
  5. We conduct peer reviews, peer reviews and more peer reviews to ensure consistent quality and to check the readability and common understanding of code among team members. Typically, we tag team members who either need to be aware of the content of the pull request or are most knowledgeable about it. Pull requests are an important way to communicate with team members on the progress of the feature or project.
  6. To avoid forgetting about security best practices during the development phase and when setting up automated machine learning workflows, we integrate our security best practices into GCCP (Google Cookiecutter Platform) to be secure by design. This way we make sure to think about potential vulnerabilities with regard to access and exposure. We follow the principle of least privilege to safeguard against unintended access to resources and to avoid public exposure of infrastructure.
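
Practices 1 and 3 can be illustrated with a minimal, framework-free sketch. This is not ML6's actual code: the step names (`ingest`, `preprocess`, `train`), the toy one-parameter "model" and the `fingerprint` helper are all assumptions made for illustration. The idea is that each step is a small function with explicit inputs and outputs, and that data and model artifacts are identified by a content hash so a run can be traced back to exactly what it used.

```python
import hashlib
import json


def fingerprint(artifact) -> str:
    """Return a short, deterministic content hash of a JSON-serializable artifact."""
    payload = json.dumps(artifact, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def ingest() -> list:
    """Hypothetical ingestion step: return raw (x, y) records."""
    return [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}, {"x": 3.0, "y": 6.0}]


def preprocess(records: list) -> list:
    """Hypothetical preprocessing step: drop records with missing values."""
    return [r for r in records if r["x"] is not None and r["y"] is not None]


def train(records: list) -> dict:
    """Toy 'model': least-squares slope of y = a * x through the origin."""
    numerator = sum(r["x"] * r["y"] for r in records)
    denominator = sum(r["x"] ** 2 for r in records)
    return {"slope": numerator / denominator}


def run_pipeline() -> dict:
    """Chain the steps and record a version for the data and the model."""
    data = preprocess(ingest())
    model = train(data)
    return {
        "data_version": fingerprint(data),
        "model_version": fingerprint(model),
        "model": model,
    }


if __name__ == "__main__":
    print(run_pipeline())
```

Because every step only talks to its neighbours through its arguments and return value, a step can later be moved into its own container without changing the others. In a real project, a dedicated data and model versioning tool would typically replace the hand-rolled hash.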

These are some best practices that have proven to work for us so far. For more information about these best practices, we gladly point you to this post on Medium by our MLOps expert Sven Degroote. If you follow us on social media, you can also keep an eye out for our “ML in Production Tips”.

We want to emphasize that it is especially important to foster a mindset of continuous learning and to adapt where necessary. For us, companies like Google and Spotify, which rely heavily on machine learning, continue to be a great source of inspiration. Our ML in production chapter researches how to deploy machine learning models into a production environment. This includes technical areas such as model serving and automation, but also focuses heavily on best practices for ML development such as MLOps. This is how we track, evaluate and test new best practices and recommendations.

Our MLOps Tech Stack

As you might have noticed, there is a lot of tooling out there today to support MLOps practices and it is easy to get lost. 

At ML6, we use TensorFlow Extended (TFX) and Kubeflow Pipelines as the main tools to drive the MLOps aspects of our projects. Both are open source, backed by Google, and have a large community. The open-source aspect is very important as it allows us to easily port our code between different infrastructure settings (multi-cloud, on-premise) and thus avoid vendor lock-in. Moreover, because the code is exposed, there is transparency about what exactly is happening underneath, and we can alter it if needed.

Kubeflow Pipelines is in essence another orchestrator, similar to Apache Airflow for example, and its main purpose is to run the designed ML workflows. The differentiator for Kubeflow Pipelines is that it is strongly tailored to machine learning and data science workloads. The tool makes it easier for ML engineers to efficiently get their ML workflows running.
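
To make concrete what an orchestrator does at its core, here is a minimal sketch in plain Python (3.9 or later, for the standard-library `graphlib` module). It deliberately does not use the Kubeflow Pipelines or Airflow APIs: the `run_workflow` function and the step names are hypothetical and only illustrate the idea of running steps in dependency order and passing outputs downstream. Real orchestrators add containers, scheduling, retries, caching and a UI on top of this.

```python
from graphlib import TopologicalSorter


def run_workflow(steps: dict, dependencies: dict):
    """Run every step after all of its upstream steps have finished.

    steps:        step name -> callable taking a dict of upstream outputs
    dependencies: step name -> set of upstream step names
    Returns the execution order and the output of each step.
    """
    outputs = {}
    # static_order() yields each step only after all of its predecessors.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        upstream = {dep: outputs[dep] for dep in dependencies.get(name, ())}
        outputs[name] = steps[name](upstream)
    return order, outputs


# Hypothetical three-step ML workflow.
steps = {
    "ingest": lambda upstream: [3, 1, 2],
    "preprocess": lambda upstream: sorted(upstream["ingest"]),
    "train": lambda upstream: sum(upstream["preprocess"]) / len(upstream["preprocess"]),
}
dependencies = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "train": {"preprocess"},
}

order, outputs = run_workflow(steps, dependencies)
```

Declaring the dependencies separately from the step logic is the design choice that matters here: it lets the orchestrator, rather than the author of each step, decide when and where a step runs.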

The design of machine learning pipelines has been a significant engineering challenge in the last few years, and it should come as no surprise that many major Silicon Valley companies have developed their own pipeline frameworks. Since the frameworks originated from corporations, they were designed with specific engineering stacks in mind. TFX is no different in this regard. At this point, TFX architectures and data structures assume that you are using TensorFlow (or Keras) as your machine learning framework. Some TFX components can be used in combination with other machine learning frameworks. For example, data can be analyzed with TensorFlow Data Validation and later consumed by a scikit-learn model. However, the TFX framework is closely tied to TensorFlow or Keras models. Since TFX is backed by the TensorFlow community and more companies like Spotify are adopting TFX, we believe it is a stable and mature framework that will ultimately be adopted by a broader base of machine learning engineers.

We implement open-source tooling in production by adding just enough infrastructure as code, internally developed examples/templates and documented best practices. This increases efficiency and greatly reduces the risk of misconfiguration and vendor lock-in. Our tech stack also includes Weights & Biases. To learn more about how we use these tools, you can read the following blog posts on Medium.

There are many alternatives of course. And remember, MLOps tooling is just a means to help apply MLOps best practices.

Innovating Together 

Every organization is unique and needs to find its own optimal way of working. In our projects, we carefully give thought to how we can apply our learnings and best practices to our clients’ organizations and the way we collaborate on the project. 

We work together with internal Data Science and DevOps teams to align on DevSecOps/DataOps/MLOps best practices, and we configure the tools to monitor the MLOps pipeline and infrastructure. By working closely together on a tangible project, we help teams embrace MLOps. The MLOps building blocks that we use, based on Kubeflow Pipelines and TensorFlow Extended, can be reused in other ML projects and continue to serve as best practices to teams after our project is finished. This way we ensure optimal knowledge sharing and help upskill client teams.

More questions on MLOps?

Join our free Q&A Session - Register Here

This post is part of a series of blog posts on the topic of MLOps. In this series, we explain why and how companies can adopt MLOps practices to unlock the business value of AI. Find the other content here.