Dive into ML Infrastructure with Kubernetes: Your First Concrete Step Today
ML Infrastructure Kubernetes: The Essentials in One Article — Real Code, Diagrams and Concrete Steps, Excerpts from a 41-Lesson Course.
The best way to learn ML infrastructure on Kubernetes is by doing. This article gives you a head start with practical excerpts from a 41-lesson course, enough to get your first result today.
- Install the Kubernetes environment
- Discover Kubernetes
- Essential Kubernetes Objects
- YAML Files and Configuration
- Deploy an ML API with Flask
Final Project – Complete Step-by-Step Guide
Guide • 5 parts • Backend • Frontend • Helm • Monitoring • CI/CD
Part 1: Backend API (FastAPI + ML Model)
1.1 Initialize the project
mkdir -p ml-prediction-platform/{backend/{app,train,tests},frontend,helm,k8s/{security,monitoring},.github/workflows,docs}
cd ml-prediction-platform
git init
1.2 Train the model
# backend/train/train_model.py
import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.4f}")

model_info = {
    "model": model,
    "feature_names": list(iris.feature_names),
    "target_names": list(iris.target_names),
    "accuracy": accuracy,
    "version": "1.0.0"
}
joblib.dump(model_info, "model.pkl")

cd backend/train
pip install scikit-learn joblib numpy
python train_model.py
1.3 Create the Pydantic schemas
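# backend/app/schemas.py
from typing import Dict, List

from pydantic import BaseModel, ConfigDict, Field


class PredictionRequest(BaseModel):
    # Example payload shown in the generated OpenAPI docs (Pydantic v2 style).
    model_config = ConfigDict(
        json_schema_extra={"example": {"features": [5.1, 3.5, 1.4, 0.2]}}
    )
    # Exactly four measurements, one per Iris feature.
    features: List[float] = Field(..., min_length=4, max_length=4)


class PredictionResponse(BaseModel):
    # Allow field names starting with "model_" without a namespace warning.
    model_config = ConfigDict(protected_namespaces=())
    prediction: str
    prediction_id: int
    confidence: float
    probabilities: Dict[str, float]
    model_version: str


class HealthResponse(BaseModel):
    model_config = ConfigDict(protected_namespaces=())
    status: str
    model_loaded: bool
    version: str
1.4 Create the model loading module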
# backend/app/model.py
import logging
import os
from typing import Optional

import joblib
import numpy as np

logger = logging.getLogger(__name__)


class MLModel:
    def __init__(self):
        self.model = None
        self.feature_names = None
        self.target_names = None
        self.version = None
        self.loaded = False

    def load(self, path: Optional[str] = None):
        # Explicit path first, then MODEL_PATH, then the default location.
        path = path or os.getenv("MODEL_PATH", "/models/model.pkl")
        info = joblib.load(path)
        self.model = info["model"]
        self.feature_names = info["feature_names"]
        self.target_names = info["target_names"]
        self.version = info["version"]
        self.loaded = True
        logger.info(f"Model v{self.version} loaded")

    def predict(self, features: list) -> dict:
        # scikit-learn expects a 2D array: one row per sample.
        X = np.array(features).reshape(1, -1)
        pred = self.model.predict(X)[0]
        proba = self.model.predict_proba(X)[0]
        return {
            "prediction": str(self.target_names[pred]),
            "prediction_id": int(pred),
            "confidence": float(max(proba)),
            "probabilities": {
                str(n): float(p) for n, p in zip(self.target_names, proba)
            }
        }


# Module-level instance shared by the API.
ml_model = MLModel()
1.5 Create the FastAPI application
# backend/app/main.py
import logging
import os
import time

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest
from starlette.responses import Response

from .model import ml_model
from .schemas import PredictionRequest, PredictionResponse, HealthResponse

logging.basicConfig(level=os.getenv("LOG_LEVEL", "INFO"))

app = FastAPI(title="ML Prediction API", version="1.0.0")
# Wide-open CORS is convenient for a demo; restrict allow_origins in production.
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])

PREDICTIONS = Counter("predictions_total", "Total predictions", ["status"])
LATENCY = Histogram("prediction_latency_seconds", "Prediction latency")


@app.on_event("startup")
async def startup():
    # Load the model once, when the process starts.
    ml_model.load()


@app.get("/health", response_model=HealthResponse)
async def health():
    return HealthResponse(
        status="healthy" if ml_model.loaded else "unhealthy",
        model_loaded=ml_model.loaded,
        version=ml_model.version or "unknown"
    )


@app.post("/predict", response_model=PredictionResponse)
async def predict(req: PredictionRequest):
    start = time.time()
    try:
        result = ml_model.predict(req.features)
        PREDICTIONS.labels(status="success").inc()
        LATENCY.observe(time.time() - start)
        return PredictionResponse(model_version=ml_model.version, **result)
    except Exception as e:
        PREDICTIONS.labels(status="error").inc()
        raise HTTPException(status_code=500, detail=str(e))


@app.get("/metrics")
async def metrics():
    # Serve metrics in the Prometheus text exposition format.
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)
ConfigMaps and Secrets
Learning objectives
1. Why externalize configuration?
In ML, your API needs parameters that vary by environment:
- Development
- Staging
- Production
With ConfigMaps, you change configuration without rebuilding the Docker image.
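Seen from the application's side, this can be sketched as follows, assuming the container reads its settings from environment variables (as the backend code does for LOG_LEVEL and MODEL_PATH). The `effective_config` helper and the override value are illustrative assumptions, not part of the course code.

```python
import os
from collections import ChainMap

# Defaults baked into the image (taken from the backend code's fallbacks).
DEFAULTS = {
    "LOG_LEVEL": "INFO",
    "MODEL_PATH": "/models/model.pkl",
}


def effective_config(environ=None):
    """Return settings where environment variables override the defaults."""
    environ = os.environ if environ is None else environ
    # ChainMap searches its mappings left to right, so environ wins.
    merged = ChainMap(environ, DEFAULTS)
    return {key: merged[key] for key in DEFAULTS}


if __name__ == "__main__":
    # Same code, two environments: only the injected variables differ.
    print(effective_config({}))                        # nothing injected
    print(effective_config({"LOG_LEVEL": "WARNING"}))  # hypothetical override
```

The image stays identical across environments; only the variables injected by the ConfigMap change the result.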
2. ConfigMaps: storing configuration
2.1 Create a ConfigMap from literals
# Create a ConfigMap with key-value pairs
kubectl create configmap ml-config \
  --from-literal=MODEL_NAME=iris_classifier \
  --from-literal=MODEL_VERSION=v2 \
  --from-literal=LOG_LEVEL=INFO \
  --from-literal=MAX_BATCH_SIZE=32
2.2 Create a ConfigMap from a file
First create a configuration file:
# config.properties
model.name=iris_classifier
model.version=v2
model.threshold=0.85
api.port=5000
api.workers=4
log.level=INFO

# Create the ConfigMap from the file
kubectl create configmap ml-config --from-file=config.properties

# Alternatively, create it from an entire directory
kubectl create configmap ml-config --from-file=./config/
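With --from-file, the file name becomes the ConfigMap key and the whole file content becomes its value, so the application still has to parse that content itself. Below is a minimal sketch of such a parser. `parse_properties` is a hypothetical helper that handles only plain key=value lines and # comments, not the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    result = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Not a key=value line: {raw_line!r}")
        result[key.strip()] = value.strip()
    return result


# Same content as the config.properties file above.
SAMPLE = """\
# config.properties
model.name=iris_classifier
model.version=v2
model.threshold=0.85
api.port=5000
api.workers=4
log.level=INFO
"""

if __name__ == "__main__":
    props = parse_properties(SAMPLE)
    # Every value arrives as a string and must be converted explicitly.
    threshold = float(props["model.threshold"])
    port = int(props["api.port"])
    print(props["model.name"], threshold, port)
```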
2.3 Declarative ConfigMap in YAML
apiVersion: v1
kind: ConfigMap
metadata:
  name: ml-config
  labels:
    app: ml-api
data:
  MODEL_NAME: "iris_classifier"
  MODEL_VERSION: "v2"
  LOG_LEVEL: "INFO"
  MAX_BATCH_SIZE: "32"
  FEATURE_COLUMNS: "sepal_length,sepal_width,petal_length,petal_width"
  config.yaml: |
    model:
      name: iris_classifier
      version: v2
      threshold: 0.85
    api:
      port: 5000
      workers: 4
The | symbol lets you include an entire file as a key value. This is very useful for complete configuration files.
3. Using ConfigMaps in Pods
3.1 As environment variables
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
        - name: ml-api
          image: monregistry/ml-api:v1
          ports:
            - containerPort: 5000
          envFrom:
            - configMapRef:
                name: ml-config
          env:
            - name: SPECIFIC_KEY
              valueFrom:
                configMapKeyRef:
                  name: ml-config
                  key: MODEL_NAME

| Method | Usage | Description |
|---|---|---|
| envFrom | All keys | Injects all keys from the ConfigMap as environment variables |
| valueFrom | Specific key | Injects a single key from the ConfigMap into a named variable |
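ConfigMap values are always strings, so the container receives MAX_BATCH_SIZE as "32" rather than 32. The sketch below shows one way the application could turn the injected variables into typed settings. The `Settings` class and `load_settings` function are hypothetical names, and the fallback defaults are assumptions.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    model_name: str
    model_version: str
    log_level: str
    max_batch_size: int
    feature_columns: List[str]


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build typed settings from string-valued environment variables."""
    columns = environ.get("FEATURE_COLUMNS", "")
    return Settings(
        model_name=environ["MODEL_NAME"],        # KeyError if missing
        model_version=environ["MODEL_VERSION"],  # KeyError if missing
        log_level=environ.get("LOG_LEVEL", "INFO"),
        max_batch_size=int(environ.get("MAX_BATCH_SIZE", "32")),
        feature_columns=[c.strip() for c in columns.split(",") if c.strip()],
    )


if __name__ == "__main__":
    # Simulates what envFrom would inject from the ml-config ConfigMap.
    injected = {
        "MODEL_NAME": "iris_classifier",
        "MODEL_VERSION": "v2",
        "LOG_LEVEL": "INFO",
        "MAX_BATCH_SIZE": "32",
        "FEATURE_COLUMNS": "sepal_length,sepal_width,petal_length,petal_width",
    }
    print(load_settings(injected))
```

Reading required keys with `environ[...]` makes the process fail at startup when a key is missing, which is usually easier to diagnose than a failure on the first request.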
3.2 As a mounted volume
apiVersion: v1
kind: Pod
metadata:
  name: ml-pod-config
spec:
  containers:
    - name: ml-api
      image: monregistry/ml-api:v1
      volumeMounts:
        - name: config-volume
          mountPath: /app/config
          readOnly: true
  volumes:
    - name: config-volume
      configMap:
        name: ml-config
Anatomy of a Kubernetes YAML file
Learning objectives
1. Introduction to YAML
YAML stands for “YAML Ain’t Markup Language”. It is a human-readable data serialization format widely used for configuration.
1.1 Basic YAML rules
| Rule | Description | Example |
|---|---|---|
| Indentation | Only spaces (never tabs), usually 2 spaces | key: value |
| Key-value | Separated by : followed by a space | name: my-pod |
| Lists | Prefixed by a dash - | - item1 |
| Comments | Start with # | # This is a comment |
| Strings | Quotes optional unless special characters | name: "my:pod" |
| Booleans | true / false | enabled: true |
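The indentation rule is the one most often broken by accident, because tabs and spaces look the same in many editors. As a small illustration, the sketch below flags lines whose indentation contains a tab. `find_tab_indented_lines` is a hypothetical helper, not a substitute for a real YAML linter.

```python
def find_tab_indented_lines(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is everything before the first non-blank character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad


GOOD = "server:\n  port: 5000\n"
BAD = "server:\n\tport: 5000\n"

if __name__ == "__main__":
    print(find_tab_indented_lines(GOOD))
    print(find_tab_indented_lines(BAD))
```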
1.2 Key-value pairs
The simplest structure in YAML — a key associated with a value:
name: flask-ml-api
version: "1.0"
replicas: 3
debug: false
1.3 Lists (sequences)
Lists use a dash - for each item:
frameworks:
  - scikit-learn
  - tensorflow
  - pytorch
  - fastapi
1.4 Nested maps (dictionaries)
Maps let you create hierarchical structures:
server:
  host: 0.0.0.0
  port: 5000
  options:
    debug: true
    workers: 4
1.5 YAML data types
Strings
simple: hello
quotes: "world"
multi: |
  line 1
  line 2
Numbers
integer: 42
float: 3.14
scientific: 1e+6
octal: 0o14
Special
enabled: true
disabled: false
empty: null
date: 2026-03-05
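Once parsed, these YAML structures become ordinary maps, lists and scalars in the host language. The sketch below writes out by hand the Python values that the key-value, list and nested-map examples from sections 1.2 to 1.4 correspond to. No YAML parser is invoked, and placing `options` under `server` is an assumption about the intended nesting. Dates, octal and scientific notation are left out because their handling differs between YAML 1.1 and YAML 1.2 parsers.

```python
import json

# Hand-written Python equivalent of the YAML examples (no parser involved).
config = {
    "name": "flask-ml-api",
    "version": "1.0",   # quoted in YAML, so it stays a string
    "replicas": 3,      # unquoted digits become an integer
    "debug": False,     # YAML false becomes a Python bool
    "frameworks": ["scikit-learn", "tensorflow", "pytorch", "fastapi"],
    "server": {
        "host": "0.0.0.0",  # not a valid number, so it is read as a string
        "port": 5000,
        "options": {"debug": True, "workers": 4},
    },
}

if __name__ == "__main__":
    # Nested maps are navigated key by key, lists by index.
    print(config["server"]["options"]["workers"])
    print(config["frameworks"][0])
    # YAML 1.2 is designed as a superset of JSON, so this output is also YAML.
    print(json.dumps(config, indent=2))
```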
This article covers the most useful excerpts — the complete ML Infrastructure Kubernetes course (12 chapters, 41 lessons, corrected exercises and final project) takes you all the way.