Mirantis AI Factory Reference Architecture provides a guideline for secure, composable, scalable, and sovereign platforms
CAMPBELL, Calif.--(BUSINESS WIRE)--Mirantis, the Kubernetes-native AI infrastructure company enabling enterprises to build and operate scalable, secure, and sovereign AI infrastructure across any environment, today announced the industry’s first comprehensive reference architecture for IT infrastructure to support AI workloads.
The Mirantis AI Factory Reference Architecture, built on Mirantis k0rdent AI, provides a secure, composable, scalable, and sovereign platform for building, operating, and optimizing AI and ML infrastructure at scale. It enables:
- AI workloads to be deployed within days of hardware installation using k0rdent AI’s templated, declarative model for rapid provisioning (sketched in the example after this list);
- Faster prototyping, iteration, and deployment of models and services to dramatically shorten the AI development lifecycle;
- Curated integrations (via the k0rdent Catalog) for AI/ML tools, observability, CI/CD, security, and more, which leverage open standards.
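To make the declarative model concrete: in this approach, an operator describes the desired cluster as a Kubernetes custom resource and the platform reconciles infrastructure to match. The following is a minimal sketch (not Mirantis code) using the Kubernetes Python client; the API group, kind, template name, and credential name are illustrative assumptions modeled on k0rdent’s public conventions, so consult the k0rdent documentation for the exact schema.

```python
# Hedged sketch: declaratively request a GPU cluster as a Kubernetes
# custom resource. Group/version, kind, and field names are assumptions
# for illustration; check the k0rdent docs for the authoritative schema.
from kubernetes import client, config

config.load_kube_config()  # use the operator's local kubeconfig

cluster_deployment = {
    "apiVersion": "k0rdent.mirantis.com/v1alpha1",  # assumed API group/version
    "kind": "ClusterDeployment",
    "metadata": {"name": "gpu-training-cluster", "namespace": "kcm-system"},
    "spec": {
        "template": "aws-gpu-standalone",  # hypothetical reusable cluster template
        "credential": "aws-credential",    # hypothetical cloud credential object
        "config": {"region": "us-west-2", "workersNumber": 4},  # e.g. 4 GPU workers
    },
}

# Submitting the resource is the whole workflow: the management cluster
# reconciles actual infrastructure toward this declared state.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="k0rdent.mirantis.com",
    version="v1alpha1",
    namespace="kcm-system",
    plural="clusterdeployments",  # assumed plural form
    body=cluster_deployment,
)
```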
“We’ve built and shared the reference architecture to help enterprises and service providers efficiently deploy and manage large-scale multi-tenant sovereign infrastructure solutions for AI and ML workloads,” said Shaun O’Meara, chief technology officer, Mirantis. “This is in response to the significant increase in the need for specialized resources (GPU and CPU) to run AI models while providing a good user experience for developers and data scientists who don’t want to learn infrastructure.”
With the reference architecture, Mirantis addresses complex issues related to high-performance computing that include remote direct memory access (RDMA) networking, GPU allocation and slicing, sophisticated scheduling requirements, performance tuning, and Kubernetes scaling. The architecture can also integrate a choice of AI Platform Services, including Gcore Everywhere Inference and the NVIDIA AI Enterprise software ecosystem.
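To illustrate what GPU slicing looks like from the workload side: with NVIDIA’s device plugin and Multi-Instance GPU (MIG) partitioning enabled, a pod can request a fraction of a physical GPU as a named extended resource. The sketch below uses the Kubernetes Python client; the container image and namespace are placeholders, and the MIG profile shown is one common A100-class partition whose availability depends on how the cluster operator has sliced the hardware.

```python
# Hedged sketch: request a MIG slice ("nvidia.com/mig-1g.5gb") instead of
# a whole GPU, so multiple tenants can share one physical device.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="inference-worker"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="serve",
                image="nvcr.io/nvidia/tritonserver:24.01-py3",  # illustrative image
                resources=client.V1ResourceRequirements(
                    # One 1g.5gb MIG slice; the scheduler places the pod only
                    # on a node advertising this extended resource.
                    limits={"nvidia.com/mig-1g.5gb": "1"},
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="team-a", body=pod)
```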
Cloud native workloads, which are typically designed for scale-out, multi-core operation, differ significantly from AI workloads, which can require turning many GPU-based servers into a single supercomputer with aggregated memory, a pattern that depends on RDMA and ultra-high-performance networking.
The reference architecture leverages Kubernetes and supports multiple AI workload types (training, fine-tuning, inference) across: dedicated or shared servers; virtualized environments (KubeVirt/OpenStack); public cloud or hybrid/multi-cloud; and edge locations. It addresses the novel challenges of provisioning, configuring, and maintaining AI infrastructure, and of supporting the unique needs of AI workloads, including high-performance storage and ultra-high-speed networking (Ethernet, InfiniBand, NVLink, NVSwitch, CXL) to keep up with AI data movement. These challenges include:
- Fine-tuning and configuration, which typically take longer to implement and learn than on traditional compute systems;
- Hard multi-tenancy for data security and isolation, resource allocation, and contention management;
- Data sovereignty of AI and ML workloads that are typically data-driven or contain unique intellectual property in their models, which makes it critical to control how and where this data is used;
- Compliance with regional and regulatory requirements;
- Managing scale and sprawl, because the infrastructure used for AI and ML typically comprises a large number of compute systems that can be highly distributed for edge workloads;
- Resource sharing of GPUs and other vital compute resources that are scarce and expensive, and thus must be shared effectively and/or leveraged wherever they are available (see the sketch after this list);
- Skills availability because many AI and ML projects are run by data scientists or developers who are not specialists in IT infrastructure.
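As a sketch of the multi-tenancy and GPU-sharing points above: Kubernetes provides the basic isolation primitives in the form of per-tenant namespaces and ResourceQuota objects that cap scarce accelerator capacity. The example below uses illustrative names and limits, and is only one layer of a hard multi-tenancy story, which also needs RBAC, network policy, and node isolation.

```python
# Hedged sketch: give a tenant its own namespace and cap its GPU usage
# with a ResourceQuota. Names and limits are illustrative.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# One namespace per tenant is the starting point for isolation.
core.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name="team-a"))
)

# Extended resources such as GPUs are quota-ed via the "requests." prefix.
core.create_namespaced_resource_quota(
    namespace="team-a",
    body=client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="team-a-gpus"),
        spec=client.V1ResourceQuotaSpec(
            hard={"requests.nvidia.com/gpu": "8"},  # at most 8 GPUs requested
        ),
    ),
)
```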
The Mirantis AI Factory Reference Architecture is designed to be composable so that users can assemble infrastructure from reusable templates across compute, storage, GPU, and networking layers tailored to their specific AI workload needs. It includes support for NVIDIA, AMD, and Intel AI accelerators.
Access the complete reference architecture document, along with more information, on the Mirantis website.
About Mirantis
Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI, machine learning, and data-intensive applications. By combining open source innovation with deep expertise in Kubernetes orchestration, Mirantis empowers platform engineering teams to deliver composable, production-ready developer platforms across any environment: on-premises, in the cloud, at the edge, or in data centers. As enterprises navigate the growing complexity of AI-driven workloads, Mirantis delivers the automation, GPU orchestration, and policy-driven control needed to cost-effectively manage infrastructure with confidence and agility. Committed to open standards and freedom from lock-in, Mirantis ensures that customers retain full control of their infrastructure strategy.
Mirantis serves many of the world’s leading enterprises, including Adobe, Ericsson, Inmarsat, PayPal, and Societe Generale. Learn more at www.mirantis.com.
Contacts
Joseph Eckert for Mirantis
jeckert@eckertcomms.com