Infrastructure Design


Compute Architecture

The infrastructure runs on a container orchestration platform (e.g., Kubernetes), with workloads isolated into dedicated node pools:

Training Nodes: High-memory or GPU-enabled nodes for offline training.

Inference Nodes: CPU-optimized nodes for low-latency prediction.

System Nodes: Run control-plane services such as orchestration and monitoring.
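On Kubernetes, this pool separation is typically enforced with node labels, taints, and tolerations. The sketch below is a minimal, hypothetical example: the `node-pool` label, the `workload=training` taint, and the image name are all assumptions, not details from this document.

```yaml
# Hypothetical training job pinned to the GPU training pool.
apiVersion: v1
kind: Pod
metadata:
  name: train-job
spec:
  nodeSelector:
    node-pool: training           # assumed label on training nodes
  tolerations:
    - key: workload               # assumed taint keeping other pods off the pool
      operator: Equal
      value: training
      effect: NoSchedule
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1       # requires the NVIDIA device plugin on the node
```

A matching taint (`kubectl taint nodes <node> workload=training:NoSchedule`) on the training nodes keeps inference and system pods from landing there.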

Storage Architecture

Object Storage: Stores raw data, processed datasets, and model artifacts.

Feature Store:

Offline store serving historical features for training

Online store serving fresh, low-latency features for real-time inference

Metadata Store: Tracks experiments, pipelines, and lineage.
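The offline/online split above maps directly onto the configuration of common feature store tools. As one hedged illustration, a Feast `feature_store.yaml` declares both stores in one place; the Redis address, S3 path, and project name below are placeholders, and the document does not specify which feature store is in use.

```yaml
# Sketch of a Feast feature store config with an offline/online split.
project: ml_platform               # assumed project name
registry: s3://ml-bucket/registry.db   # placeholder registry path in object storage
provider: aws
offline_store:
  type: file                       # offline store backing training data
online_store:
  type: redis                      # online store backing real-time inference
  connection_string: redis.internal:6379   # placeholder address
```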

Networking and Security

Internal service communication is restricted using network policies.
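In Kubernetes, that restriction is expressed as a NetworkPolicy. The following is a minimal sketch, assuming a default-deny posture where only the gateway may reach pods in the inference namespace; the namespace and the `app: api-gateway` label are hypothetical.

```yaml
# Deny all ingress to inference pods except traffic from the API gateway.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-only
  namespace: inference            # assumed namespace
spec:
  podSelector: {}                 # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway    # assumed label on the gateway pods
```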

External access is routed through an API gateway.

All traffic is encrypted using TLS.
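Routing external traffic through the gateway with TLS termination can be sketched as a Kubernetes Ingress. The hostname, issuer annotation (which assumes cert-manager), and backend Service name below are illustrative placeholders.

```yaml
# Hypothetical TLS-terminating entry point for the prediction API.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: model-api
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # assumes cert-manager is installed
spec:
  tls:
    - hosts:
        - api.example.com         # placeholder hostname
      secretName: model-api-tls   # certificate stored as a Kubernetes Secret
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /predict
            pathType: Prefix
            backend:
              service:
                name: inference-svc   # assumed inference Service
                port:
                  number: 8080
```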

Secrets are stored in a centralized secrets manager.
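One common way to bridge a centralized secrets manager into the cluster is the External Secrets Operator, which syncs entries from the manager into Kubernetes Secrets. This is a sketch under that assumption; the store name, secret path, and key are hypothetical, and the document does not name the manager in use.

```yaml
# Sync a database password from the external secrets manager into the cluster.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend           # assumed SecretStore pointing at the manager
    kind: ClusterSecretStore
  target:
    name: db-credentials          # Kubernetes Secret created from the sync
  data:
    - secretKey: password
      remoteRef:
        key: prod/db              # assumed path in the secrets manager
        property: password
```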