Artificial Intelligence (AI) has become omnipresent in recent years. In reality, however, only a small percentage of models makes it to production and stays there. In a series of blog posts on MLOps, we explain why and how companies can adopt MLOps practices to unlock the business value of AI.
At ML6, we see MLOps as indispensable to unlocking the true business value of machine learning models. That’s why MLOps best practices are strongly embedded in our ways of working: when we deliver machine learning solutions to our clients, MLOps is considered a de facto component of the overall solution.
Thanks to the wide range of machine learning projects we work on at ML6, we have been able to test and learn what generally does and doesn’t work when it comes to MLOps. As we often get questions about our MLOps best practices and the tech stack we use, we gladly give a short overview below of how we apply MLOps at ML6.
In ML6 fashion, you can find six of our MLOps best practices below:
These are some best practices that have proven to work for us so far. For more information about these best practices, we gladly point you to this post on Medium by our MLOps expert Sven Degroote. If you follow us on social media, you can also keep an eye out for our “ML in Production Tips”.
We want to emphasize that it is especially important to foster a mindset of continuous learning and to adapt where necessary. For us, companies like Google and Spotify, which rely heavily on machine learning, continue to be a great source of inspiration. Our ML in production chapter researches how to deploy machine learning models into a production environment. This includes technical areas such as model serving and automation, but also focuses heavily on best practices for ML development such as MLOps. This is how we track, evaluate and test new best practices and recommendations.
As you might have noticed, there is a lot of tooling out there today to support MLOps practices, and it is easy to get lost.
At ML6, we use TensorFlow Extended (TFX) and Kubeflow Pipelines as the main tools to drive the MLOps aspects of our projects. Both are open source, backed by Google and Spotify, and have a large community. The open-source aspect is very important, as it allows us to easily port our code between different infrastructure settings (multi-cloud, on-premise) and thus avoid vendor lock-in. Moreover, because the code is exposed, there is transparency about what exactly is happening underneath, and we can alter it if needed.
Kubeflow Pipelines is, in essence, another orchestrator, similar to Apache Airflow for example, and its main purpose is to run the designed ML workflows. The differentiator for Kubeflow Pipelines is that it is strongly tailored to machine learning and data science workloads. The tool makes it easier for ML engineers to efficiently get their ML workflows running.
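To make this concrete, here is a minimal sketch of what such a workflow can look like with the kfp Python SDK (v2-style). The component logic, names, and parameters are illustrative assumptions, not taken from an actual ML6 project:

```python
# Minimal Kubeflow Pipelines sketch using the kfp SDK (v2-style).
# Component logic is a placeholder; names are illustrative.
from kfp import compiler, dsl


@dsl.component
def preprocess(raw_text: str) -> str:
    # Placeholder preprocessing step.
    return raw_text.strip().lower()


@dsl.component
def train(clean_text: str) -> str:
    # Placeholder training step.
    return f"model trained on: {clean_text}"


@dsl.pipeline(name="example-ml-pipeline")
def example_pipeline(raw_text: str = "Some Raw Data"):
    # Kubeflow Pipelines derives the execution order from the
    # output-to-input wiring between the two steps.
    preprocess_task = preprocess(raw_text=raw_text)
    train(clean_text=preprocess_task.output)


if __name__ == "__main__":
    # Compile to a pipeline definition that can be submitted to a
    # Kubeflow Pipelines deployment.
    compiler.Compiler().compile(example_pipeline, "example_pipeline.yaml")
```

The compiled definition can then be uploaded to and run on any Kubeflow Pipelines deployment, which is exactly the portability argument made above.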
The design of machine learning pipelines has been a significant engineering challenge in recent years, and it should come as no surprise that many major Silicon Valley companies have developed their own pipeline frameworks. Since these frameworks originated inside corporations, they were designed with specific engineering stacks in mind. TFX is no different in this regard: at this point, TFX architectures and data structures assume that you are using TensorFlow (or Keras) as your machine learning framework. Some TFX components can be used in combination with other machine learning frameworks; for example, data can be analyzed with TensorFlow Data Validation and later consumed by a scikit-learn model. However, the TFX framework as a whole is closely tied to TensorFlow and Keras models. Since TFX is backed by the TensorFlow community and is being adopted by more companies such as Spotify, we believe it is a stable and mature framework that will ultimately be adopted by a broader base of machine learning engineers.
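As an impression of what this looks like in practice, below is a hedged sketch of a minimal TFX pipeline that ingests a CSV file and runs the standard data-validation components, executed locally. The file paths and pipeline name are illustrative assumptions, not ML6’s actual setup:

```python
# Minimal TFX pipeline sketch: ingest a CSV, compute statistics, infer a
# schema, and validate the data against it. Paths and names are placeholders.
from tfx import v1 as tfx


def create_pipeline(data_root: str, pipeline_root: str) -> tfx.dsl.Pipeline:
    # Ingest raw CSV data and convert it to TFX examples.
    example_gen = tfx.components.CsvExampleGen(input_base=data_root)

    # Compute dataset statistics (powered by TensorFlow Data Validation).
    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs["examples"])

    # Infer a schema from the statistics, then check the data for anomalies.
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs["statistics"])
    example_validator = tfx.components.ExampleValidator(
        statistics=statistics_gen.outputs["statistics"],
        schema=schema_gen.outputs["schema"])

    return tfx.dsl.Pipeline(
        pipeline_name="example_tfx_pipeline",
        pipeline_root=pipeline_root,
        components=[example_gen, statistics_gen, schema_gen, example_validator],
    )


# Run the pipeline locally; the same definition can also be handed to the
# Kubeflow Pipelines orchestrator on a cluster.
tfx.orchestration.LocalDagRunner().run(
    create_pipeline(data_root="./data", pipeline_root="./pipeline_root"))
```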
We put these open-source tools into production by adding just enough infrastructure as code, internally developed examples/templates, and documented best practices. This increases efficiency and greatly reduces the risk of misconfiguration and vendor lock-in. Our tech stack also includes Weights & Biases for experiment tracking. To learn more about how we use these tools, you can read the following blog posts on Medium.
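For illustration, this is roughly what experiment tracking with Weights & Biases looks like; the project name, hyperparameters, and logged metric are hypothetical placeholders:

```python
# Hedged sketch of experiment tracking with Weights & Biases.
# Project name, config values, and the logged metric are placeholders.
import wandb

run = wandb.init(
    project="example-project",  # hypothetical project name
    config={"learning_rate": 1e-3, "epochs": 5},
)

for epoch in range(run.config.epochs):
    # ... a real training step would go here ...
    placeholder_loss = 1.0 / (epoch + 1)
    wandb.log({"epoch": epoch, "loss": placeholder_loss})

run.finish()
```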
There are, of course, many alternatives. And remember: MLOps tooling is just a means to help apply MLOps best practices.
Every organization is unique and needs to find its own optimal way of working. In our projects, we carefully give thought to how we can apply our learnings and best practices to our clients’ organizations and the way we collaborate on the project.
We work together with internal Data Science and DevOps teams to align on DevSecOps/DataOps/MLOps best practices, and we configure the tools to monitor the MLOps pipeline and infrastructure. By working closely together on a tangible project, we help teams embrace MLOps. The MLOps building blocks we use, based on Kubeflow Pipelines and TensorFlow Extended, can be reused in other ML projects and continue to serve as best practices for teams after our project is finished. This way we ensure optimal knowledge sharing and help upskill client teams.
This post is part of a series of blog posts on the topic of MLOps. In this series, we explain why and how companies can adopt MLOps practices to unlock the business value of AI. Find the other content here.