MLOps Fundamentals: The 9 Key Steps to Go from Zero to Operational
MLOps Fundamentals: the essentials in one article — real code, diagrams and concrete steps, excerpts from a 72-lesson course.
Anyone can learn MLOps fundamentals, provided they follow the steps in the right order. We have condensed a complete 72-lesson course into a clear path, with the most useful code snippets.
- Install the MLOps environment
- Discover MLOps
- Data and model versioning
- ML Pipelines with MLflow
- Containerize ML models
Complete Guide to the Final Project – Step by Step
MLOps Fundamentals Course • Credit Card Fraud Detection • End-to-End MLOps Pipeline
This detailed guide walks you step by step through the final project. Each section corresponds to a key stage of the MLOps pipeline. Follow the steps in order. All code is provided and explained. Estimated duration: 8–12 hours total.
① Step 1 – Project and environment initialization
Create the project structure and Conda environment:
mkdir fraud-detection-mlops
cd fraud-detection-mlops
git init
git config user.email "you@email.com"
git config user.name "Your Name"
conda create -n fraud-mlops python=3.10 -y
conda activate fraud-mlops
pip install scikit-learn xgboost pandas numpy mlflow dvc \
    fastapi uvicorn pydantic evidently \
    pytest pytest-cov httpx joblib matplotlib \
    seaborn imbalanced-learn flake8 black
conda env export > environment.yml
Create the requirements.txt file:
scikit-learn==1.4.0
xgboost==2.0.3
pandas==2.1.4
numpy==1.26.3
mlflow==2.10.0
dvc==3.38.1
fastapi==0.109.0
uvicorn==0.27.0
pydantic==2.5.3
evidently==0.4.16
pytest==7.4.4
pytest-cov==4.1.0
httpx==0.26.0
joblib==1.3.2
matplotlib==3.8.2
seaborn==0.13.1
imbalanced-learn==0.11.0
Create the folder structure:
mkdir -p data/raw data/processed src api monitoring/reports tests models .github/workflows
Create the .gitignore file:
__pycache__/
*.pyc
*.pyo
.env
models/
mlruns/
*.pkl
data/raw/
data/processed/
monitoring/reports/*.html
monitoring/reports/*.json
.coverage
htmlcov/
Initialize DVC and make the first commit:
dvc init
git add .
git commit -m "feat: init project structure and DVC"
② Step 2 – Data generation and versioning
Create src/generate_synthetic_data.py (if you do not have access to the Kaggle dataset):
"""Generate a synthetic credit card fraud dataset."""
import pandas as pd
import numpy as np


def generate_fraud_dataset(n_samples=50000, fraud_ratio=0.002, random_state=42):
    np.random.seed(random_state)
    n_fraud = int(n_samples * fraud_ratio)
    n_legit = n_samples - n_fraud

    # Legitimate transactions: 28 standard-normal features (V1..V28)
    legit = pd.DataFrame({
        f'V{i}': np.random.normal(0, 1, n_legit) for i in range(1, 29)
    })
    legit['Amount'] = np.abs(np.random.exponential(scale=88, size=n_legit))
    legit['Time'] = np.sort(np.random.uniform(0, 172800, n_legit))
    legit['Class'] = 0

    # Fraudulent transactions: each feature gets a shifted mean and a wider spread
    fraud = pd.DataFrame({
        f'V{i}': np.random.normal(np.random.uniform(-3, 3), 2, n_fraud) for i in range(1, 29)
    })
    fraud['Amount'] = np.abs(np.random.exponential(scale=122, size=n_fraud))
    fraud['Time'] = np.random.uniform(0, 172800, n_fraud)
    fraud['Class'] = 1

    # Concatenate and shuffle so that frauds are not grouped at the end
    df = pd.concat([legit, fraud], ignore_index=True)
    df = df.sample(frac=1, random_state=random_state).reset_index(drop=True)
    return df


if __name__ == "__main__":
    df = generate_fraud_dataset()
    df.to_csv("data/raw/creditcard.csv", index=False)
    print(f"Dataset generated: {len(df)} rows")
    print(f"Frauds: {df['Class'].sum()} ({df['Class'].mean()*100:.3f}%)")
Dockerfile for an ML model: build and run
MLOps Fundamentals Course — Chapter 04 — Containerize ML models
- Understand every instruction in a Dockerfile for an ML model
- Write a complete Dockerfile for a scikit-learn API with FastAPI
- Build a Docker image with docker build
- Run and test an ML container with docker run
- Debug common ML container issues
1. ML project structure to containerize
Before writing the Dockerfile, let’s look at the project structure we are going to containerize. It is a wine classification API based on a Random Forest model trained with scikit-learn, served via FastAPI.
wine-classifier/
├── Dockerfile              # Our Docker configuration file
├── .dockerignore           # Files to exclude from the image
├── requirements.txt        # Python dependencies
├── src/
│   ├── app.py              # FastAPI application (entry point)
│   ├── predict.py          # Prediction logic
│   └── preprocessing.py    # Feature preprocessing
└── models/
    ├── rf_classifier.pkl   # Serialized Random Forest model
    └── scaler.pkl          # Saved StandardScaler
The requirements.txt file
The requirements.txt lists all Python dependencies with their pinned versions. This is crucial for reproducibility: without a fixed version, pip install could download a newer, incompatible version.
# requirements.txt

# Web framework
fastapi==0.110.0
uvicorn[standard]==0.27.1

# Machine Learning
scikit-learn==1.4.1
numpy==1.26.4
pandas==2.2.1

# Model serialization
joblib==1.3.2

# Data validation
pydantic==2.6.3
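Pinning is easy to check automatically. The following is a minimal sketch, not part of the course material: the function name `unpinned_requirements` and the regular expression are illustrative assumptions, and the pattern only covers simple `name==version` lines (no URLs, environment markers or hashes).

```python
import re

# Accepts "name==version" or "name[extra]==version", optionally followed by a comment.
PINNED = re.compile(
    r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[A-Za-z0-9.*+!_-]+(\s+#.*)?$"
)


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned with '=='."""
    offenders = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not PINNED.match(line):
            offenders.append(line)
    return offenders


sample = """\
# Web framework
fastapi==0.110.0
uvicorn[standard]==0.27.1
scikit-learn>=1.0
numpy
"""
print(unpinned_requirements(sample))  # → ['scikit-learn>=1.0', 'numpy']
```

A check like this could run in CI before the image is built, so that a loose requirement is caught before it reaches a rebuild.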
Avoid scikit-learn>=1.0 or scikit-learn without a version. If scikit-learn releases version 2.0 with API changes, your container will break on the next rebuild. Always pin the exact version, for example scikit-learn==1.4.1.
The FastAPI application (src/app.py)
# src/app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np
import os

app = FastAPI(
    title="Wine Quality Classifier API",
    description="Random Forest based wine quality classification API",
    version="1.0.0"
)

# Load the model at startup (not on every request)
MODEL_PATH = os.getenv("MODEL_PATH", "/app/models/rf_classifier.pkl")
SCALER_PATH = os.getenv("SCALER_PATH", "/app/models/scaler.pkl")
model = joblib.load(MODEL_PATH)
scaler = joblib.load(SCALER_PATH)


class WineFeatures(BaseModel):
    fixed_acidity: float
    volatile_acidity: float
    citric_acid: float
    residual_sugar: float
    chlorides: float
    free_sulfur_dioxide: float
    total_sulfur_dioxide: float
    density: float
    pH: float
    sulphates: float
    alcohol: float


@app.get("/health")
def health_check():
    return {"status": "ok", "model": "rf_classifier", "version": "1.0.0"}


@app.post("/predict")
def predict(features: WineFeatures):
    try:
        # Feature order must match the order used when the model was trained
        X = np.array([[
            features.fixed_acidity, features.volatile_acidity,
            features.citric_acid, features.residual_sugar,
            features.chlorides, features.free_sulfur_dioxide,
            features.total_sulfur_dioxide, features.density,
            features.pH, features.sulphates, features.alcohol
        ]])
        X_scaled = scaler.transform(X)
        prediction = model.predict(X_scaled)[0]
        probability = model.predict_proba(X_scaled)[0].max()
        return {
            "quality": int(prediction),
            "confidence": round(float(probability), 4),
            "label": "good wine" if prediction >= 6 else "average wine"
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
2. The complete commented Dockerfile
Here is the complete Dockerfile. We will analyze each instruction in detail right after.
# ============================================================
# Dockerfile for a scikit-learn model served with FastAPI
# ============================================================

# Step 1: Base image
FROM python:3.11-slim

# Step 2: Image metadata
LABEL maintainer="mlops-team@exemple.com"
LABEL version="1.0.0"
LABEL description="Wine Quality Classifier - Random Forest API"

# Step 3: System environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Step 4: Working directory inside the container
WORKDIR /app

# Step 5: System dependencies (if needed)
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Step 6: Copy AND install Python dependencies (cached layer)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Step 7: Copy source code
COPY src/ ./src/

# Step 8: Copy the ML model
COPY models/ ./models/

# Step 9: Expose the API port
EXPOSE 8000

# Step 10: Non-root user for security
RUN useradd --create-home --shell /bin/bash appuser
USER appuser

# Step 11: Startup command
CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8000"]
3. Detailed analysis of each instruction
3.1 FROM python:3.11-slim
FROM defines the base image. python:3.11-slim is the lightweight variant of the official Python image. For scikit-learn, slim is the right compromise: small enough, yet compatible with the C extensions of numpy/scipy.
Never use :latest. FROM python:latest can jump from Python 3.11 to 3.12 overnight and break your code. Always specify the exact version: python:3.11-slim or even python:3.11.8-slim.
If your model uses the GPU (TensorFlow, PyTorch), use:
FROM nvidia/cuda:12.1-cudnn8-runtime-ubuntu22.04
3.2 Environment variables ENV
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

| Variable | Effect | Why enable it? |
|---|---|---|
| PYTHONDONTWRITEBYTECODE=1 | No .pyc files | Reduces image size, avoids unnecessary files |
| PYTHONUNBUFFERED=1 | Real-time logs | Essential to see logs in docker logs immediately |
| PIP_NO_CACHE_DIR=1 | No pip cache in the image | Reduces final image size (can save 100-200 MB) |
| PIP_DISABLE_PIP_VERSION_CHECK=1 | No pip update check | Speeds up the build, avoids spurious warnings |
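The effect of the first variable can be observed without Docker. The sketch below is an illustration only: the helper name `child_skips_bytecode` is an assumption, and it simply starts a child interpreter with or without `PYTHONDONTWRITEBYTECODE` and reads back `sys.dont_write_bytecode`, the flag Python derives from that variable.

```python
import os
import subprocess
import sys


def child_skips_bytecode(value=None):
    """Run a child interpreter and report whether it would skip writing .pyc files.

    `value` is what PYTHONDONTWRITEBYTECODE is set to in the child (None = unset).
    """
    env = dict(os.environ)
    env.pop("PYTHONDONTWRITEBYTECODE", None)  # start from a clean state
    if value is not None:
        env["PYTHONDONTWRITEBYTECODE"] = value
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


print("unset:   ", child_skips_bytecode())
print("set to 1:", child_skips_bytecode("1"))
```

Inside the container the same flag is set for every Python process, because ENV applies to all later build steps and to the running container.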
3.3 WORKDIR /app
WORKDIR defines the working directory inside the container. All subsequent instructions (COPY, RUN, CMD) run from this directory. If the directory does not exist, Docker creates it automatically.
Use /app as the working directory for Python applications. Avoid working directly in / or /root.
Optimizing ML Docker images
- Understand why ML Docker image size is a critical concern in production
- Master multi-stage builds to separate build and runtime
- Configure an effective .dockerignore for ML projects
- Leverage Docker layer caching to the fullest
- Apply best practices for production ML images
1. Why ML image size is critical
Unlike a classic web application (a few tens of MB), an ML Docker image can easily reach 2 to 8 GB. This size has direct consequences on your MLOps workflow:
🚬 Slow deployment
Pulling a 5 GB image onto a Kubernetes node can take 10 to 20 minutes on a slow link, and still close to 7 minutes at 100 Mbit/s. Under load with auto-scaling, this is unacceptable.
💸 Storage cost
Storing 50 versions of a 4 GB image in AWS ECR = 200 GB × $0.10/GB/month = $20/month just for storage.
🔐 Attack surface
Every extra tool (compilers, system utilities) is a potential security vulnerability. A minimal image is more secure.
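The storage figure quoted above is a simple product, which makes it easy to redo with your own numbers. This is a rough sketch: the function name is illustrative, the $0.10/GB/month rate is the one used in this article, and the estimate ignores the fact that registries store layers shared between versions only once, so the real bill may be lower.

```python
def monthly_storage_cost(versions, image_gb, usd_per_gb_month):
    """Upper-bound monthly cost in USD for `versions` images of `image_gb` GB each."""
    return versions * image_gb * usd_per_gb_month


cost = monthly_storage_cost(versions=50, image_gb=4, usd_per_gb_month=0.10)
print(f"${cost:.2f}/month")  # → $20.00/month
```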
2. Comparison: fat images vs slim images
| Approach | Base image | Typical size | Pull time (100 Mbit/s) | Vulnerabilities |
|---|---|---|---|---|
| 🔴 Fat image (naive) | python:3.11 + all tools | ~2.1 GB | ~170 s | Very numerous |
| 🟡 Slim image (good practice) | python:3.11-slim | ~700 MB | ~56 s | Moderate |
| 🟢 Multi-stage slim | Build: python:3.11-slim, Run: python:3.11-slim | ~350 MB | ~28 s | Few |
| 🆕 Distroless | gcr.io/distroless/python3 | ~180 MB | ~14 s | Very few |
| 🟢 Alpine (with precautions) | python:3.11-alpine | ~100 MB | ~8 s | Minimal |
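The pull times in the table follow from size and bandwidth alone. The sketch below reproduces that column under stated assumptions: decimal units (1 MB = 8 Mbit), a link that sustains its nominal speed, and no time counted for layer extraction. The function name is illustrative.

```python
def pull_seconds(size_mb, link_mbit_per_s=100):
    """Transfer time in seconds: megabytes * 8 bits per byte / link speed in Mbit/s."""
    return size_mb * 8 / link_mbit_per_s


# Sizes taken from the comparison table, in MB
images = [
    ("fat", 2100),
    ("slim", 700),
    ("multi-stage", 350),
    ("distroless", 180),
    ("alpine", 100),
]
for name, size_mb in images:
    print(f"{name:12s} {pull_seconds(size_mb):6.1f} s")
```

Changing `link_mbit_per_s` shows how quickly the gap between a fat and a slim image grows on slower networks.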
Alpine uses musl libc instead of glibc. Prebuilt wheels for NumPy, scikit-learn, pandas and PyTorch have traditionally targeted glibc; when no musl-compatible wheel is available, pip falls back to compiling from source, which takes 30 to 60 minutes and may fail. Reserve Alpine for microservices without C dependencies. For ML, prefer python:3.11-slim.
3. The .dockerignore: your first line of defense
Even before the multi-stage build, .dockerignore is the simplest and most immediate optimization. It prevents Docker from sending unnecessary (or even dangerous) files to the build daemon.
When you run docker build ., Docker sends the entire current directory to the Docker daemon (called the “build context”). Without .dockerignore, this includes your 10 GB dataset, Jupyter notebooks, Python virtual environments, secret configuration files…
# .dockerignore for an ML project
# Note: a pattern without a leading **/ only matches at the root of the build context

# Version control
.git/
.gitignore
.github/

# Python virtual environments (critical! can be hundreds of MB)
.venv/
venv/
env/
ENV/
**/__pycache__/
**/*.py[cod]
**/*.pyo
.pytest_cache/
.mypy_cache/

# ML data (never ship inside the image!)
data/
datasets/
**/*.csv
**/*.parquet
**/*.arrow
**/*.feather

# Jupyter notebooks (not needed in production)
**/*.ipynb
**/.ipynb_checkpoints/

# Local configuration files
.env
.env.local
.env.development
*.env

# Documentation and tests (useless in production)
# Remove tests/ from this list if a build stage copies it (see section 4.2)
docs/
tests/
README.md
CHANGELOG.md

# Build artifacts
dist/
build/
*.egg-info/
htmlcov/
.coverage

# IDEs and editors
.vscode/
.idea/
**/*.swp
**/*.swo
**/.DS_Store
**/Thumbs.db

# Logs and temporary files
logs/
**/*.log
tmp/
temp/

# MLflow files and experiments
mlruns/
mlflow.db

# Test models (keep only the production model)
models/experiments/
models/checkpoints/
Measuring the impact of .dockerignore
# Check build context size BEFORE .dockerignore
docker build -t test-before . 2>&1 | head -5
# Sending build context to Docker daemon  4.521GB   <-- without .dockerignore!

# Check build context size AFTER .dockerignore
docker build -t test-after . 2>&1 | head -5
# Sending build context to Docker daemon  12.34MB   <-- with .dockerignore
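The context size can also be estimated before invoking Docker at all. The following is a rough sketch, not a reimplementation of Docker's matcher: `is_ignored` and `context_size` are hypothetical helpers, matching relies on Python's `fnmatch` (where `*` also crosses `/`), and `!` negation rules are not handled. The demo directory and file names are made up for illustration.

```python
import fnmatch
import tempfile
from pathlib import Path


def is_ignored(rel_path, patterns):
    """True if the path, or one of its parent directories, matches a pattern.

    Simplification: fnmatch semantics, no '!' negation, unlike real .dockerignore.
    """
    parts = rel_path.split("/")
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    return any(
        fnmatch.fnmatch(prefix, pattern.rstrip("/"))
        for pattern in patterns
        for prefix in prefixes
    )


def context_size(root, patterns=()):
    """Total size in bytes of the files that would remain in the build context."""
    root = Path(root)
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not is_ignored(p.relative_to(root).as_posix(), patterns)
    )


# Demo on a throwaway directory: a 10 000-byte "dataset" and a 200-byte source file
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "src").mkdir()
    (root / "data" / "train.csv").write_bytes(b"x" * 10_000)
    (root / "src" / "app.py").write_bytes(b"y" * 200)
    before = context_size(root)
    after = context_size(root, ["data/", ".venv/"])
    print(before, after)  # → 10200 200
```

Running `context_size(".")` at the project root gives a quick order of magnitude; the figure reported by docker build remains the reference.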
4. Multi-stage builds for ML
A multi-stage build uses multiple FROM instructions in a single Dockerfile. Each stage produces an intermediate image, and only the final stage ends up in the resulting image. This lets you install compilers and other build tools in an early stage and leave them out of the image you ship.
4.1 Multi-stage build for a scikit-learn model
# ============================================================
# Multi-stage Dockerfile for Wine Classifier (production)
# ============================================================

# ---- STAGE 1: Builder ----
# This stage installs everything, including build tools
FROM python:3.11-slim AS builder

# Environment variables for the build
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1

WORKDIR /build

# Install system build tools (will NOT be in the final image)
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc \
        g++ \
        build-essential \
        libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Copy and install into an isolated local directory (--prefix)
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# ---- STAGE 2: Production image (final) ----
# Final ultra-light image: does NOT contain gcc, build-essential, etc.
FROM python:3.11-slim AS production

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

# Copy only the libraries installed from the builder
COPY --from=builder /install /usr/local

# Minimal runtime system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Copy source code and model
COPY src/ ./src/
COPY models/ ./models/

# Security: non-root user
RUN useradd --create-home --shell /bin/bash appuser && \
    chown -R appuser:appuser /app
USER appuser

EXPOSE 8000
CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
By removing gcc, g++, build-essential and intermediate compilation files from the final image, you typically save between 200 and 600 MB depending on the C/C++ dependencies used.
4.2 Multi-stage build with test stage
# ============================================================
# Multi-stage Dockerfile with integrated test stage
# ============================================================
FROM python:3.11-slim AS base
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 PIP_NO_CACHE_DIR=1
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# ---- Test stage ----
# Build with --target test to run the tests: BuildKit skips stages
# that the requested target does not depend on.
FROM base AS test
COPY requirements-dev.txt .
RUN pip install --no-cache-dir -r requirements-dev.txt
COPY src/ ./src/
COPY models/ ./models/
COPY tests/ ./tests/
RUN pytest tests/ -v --tb=short

# ---- Production stage ----
FROM base AS production
COPY src/ ./src/
COPY models/ ./models/
RUN useradd --create-home appuser && chown -R appuser /app
USER appuser
EXPOSE 8000
CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8000"]
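The contents of tests/ are not shown in this article. As a purely illustrative sketch of what the pytest command in the test stage could collect, here is a test of the labelling rule used by the /predict handler. The helper `quality_label` is hypothetical: in src/app.py the rule is written inline, so this assumes it has been extracted into a function.

```python
def quality_label(prediction):
    """Same threshold rule as the /predict handler: 6 and above is 'good wine'."""
    return "good wine" if prediction >= 6 else "average wine"


def test_label_at_threshold():
    # The boundary value itself counts as good
    assert quality_label(6) == "good wine"


def test_label_below_threshold():
    assert quality_label(5) == "average wine"
```

Keeping such rules in small pure functions makes them testable without loading the model files or starting the API.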
This article covers the most useful snippets — the complete MLOps Fundamentals course (13 chapters, 72 lessons, corrected exercises and final project) takes you all the way.
FAQ
How long does it take to learn MLOps Fundamentals?
Are there any prerequisites?
Where to start concretely?