From a small AI/ML experiment or proof of concept all the way to a production-grade system, the lifecycle and infrastructure of a machine learning solution cover far more than the ML model code alone. They typically span multiple stages and workflows and involve many different building blocks and components.
Maintaining multiple models in production requires well-established processes and workflows. At the same time, machine learning introduces fundamentally new challenges for traditional SDLC and CI/CD lifecycles, because its “data-driven” nature means that data, not only code, defines the behavior of the system:
- ML-specific operations that depend on the data (e.g. data versioning, feature extraction, model training, evaluation, tuning and serving)
- A complex landscape of ML tools, libraries, frameworks, platforms and hardware accelerators
- Large-scale ML workloads that often involve multiple data sources of different kinds and ownership
- Dependence on cooperation among multiple teams and stakeholders with poorly separated responsibilities and different workflow speeds
- Production ML deployments that require constant monitoring, quality control and the ability to debug and interpret critical issues
In this session, SoftServe will share design recommendations and CI/CD best practices for building large-scale AI and machine learning systems using open source cloud-native solutions, to address today's business and technical challenges and bridge the gap between data, science, IT, business stakeholders and end users.