MLOps Fundamentals: The 9 Key Steps to Go from Zero to Operational

MLOps Fundamentals: the essentials in one article — real code, diagrams and concrete steps, excerpts from a 72-lesson course.


Everyone can learn MLOps Fundamentals — provided they follow the steps in the right order. We have condensed a complete 72-lesson course into a clear path, with the most useful code snippets.

tl;dr
  • Install the MLOps environment
  • Discover MLOps
  • Data and model versioning
  • ML Pipelines with MLflow
  • Containerize ML models
Course outline: MLOps Fundamentals (12 chapters)
01 Install the MLOps environment
  → Install Python, conda and essential tools
  → Configure Git, GitHub and best practices
  + 1 more lesson
02 Discover MLOps
  → What is MLOps and why is it essential?
  → The lifecycle of an ML model in production
  + 1 more lesson
03 Data and model versioning
  → Why version data and models?
  → DVC: installation and first steps
  + 1 more lesson
04 ML Pipelines with MLflow
  → MLflow Tracking: log ML experiments
  → MLflow Model Registry: manage the model lifecycle
  + 1 more lesson
05 Containerize ML models
  → Why Docker is essential for MLOps
  → Dockerfile for an ML model: build and run
  + 1 more lesson
06 Deploy with FastAPI
  → FastAPI to serve ML models
  → Build a complete prediction API
  + 1 more lesson
07 CI/CD for ML
  → What is CI/CD for ML?
  → GitHub Actions: automate ML tests
  + 2 more lessons
08 Monitoring models in production
  → Why do models degrade in production?
  → Detect Data Drift with Evidently
  + 1 more lesson
🏁 Final project (+ 4 chapters along the way)
  → You leave with a concrete and demonstrable project

Complete Guide to the Final Project – Step by Step

MLOps Fundamentals Course • Credit Card Fraud Detection • End-to-End MLOps Pipeline

NOTE📚 About this guide
This detailed guide walks you step by step through the final project. Each section corresponds to a key stage of the MLOps pipeline. Follow the steps in order. All code is provided and explained. Estimated duration: 8–12 hours total.

① Step 1 – Project and environment initialization

Create the project structure and Conda environment:

bash
mkdir fraud-detection-mlops
cd fraud-detection-mlops
git init
git config user.email "you@email.com"
git config user.name "Your Name"

conda create -n fraud-mlops python=3.10 -y
conda activate fraud-mlops

pip install scikit-learn xgboost pandas numpy mlflow dvc \
            fastapi uvicorn pydantic evidently \
            pytest pytest-cov httpx joblib matplotlib \
            seaborn imbalanced-learn flake8 black

conda env export > environment.yml

Create the requirements.txt file:

text
scikit-learn==1.4.0
xgboost==2.0.3
pandas==2.1.4
numpy==1.26.3
mlflow==2.10.0
dvc==3.38.1
fastapi==0.109.0
uvicorn==0.27.0
pydantic==2.5.3
evidently==0.4.16
pytest==7.4.4
pytest-cov==4.1.0
httpx==0.26.0
joblib==1.3.2
matplotlib==3.8.2
seaborn==0.13.1
imbalanced-learn==0.11.0

Create the folder structure:

bash
mkdir -p data/raw data/processed src api monitoring/reports tests models .github/workflows
Create the .gitignore file:

text
__pycache__/
*.pyc
*.pyo
.env
models/
mlruns/
*.pkl
data/raw/
data/processed/
monitoring/reports/*.html
monitoring/reports/*.json
.coverage
htmlcov/
Initialize DVC and make the first commit:

bash
dvc init
git add .
git commit -m "feat: init project structure and DVC"

② Step 2 – Data generation and versioning

Create src/generate_synthetic_data.py (if you do not have access to the Kaggle dataset):

python
"""Generate a synthetic credit card fraud dataset (stand-in for the Kaggle data)."""
import os

import numpy as np
import pandas as pd


def generate_fraud_dataset(n_samples=50000, fraud_ratio=0.002, random_state=42):
    np.random.seed(random_state)  # global seed for reproducible draws
    n_fraud = int(n_samples * fraud_ratio)  # 100 frauds with the defaults
    n_legit = n_samples - n_fraud

    # Legitimate transactions: 28 anonymized features V1..V28 ~ N(0, 1)
    legit = pd.DataFrame({
        f'V{i}': np.random.normal(0, 1, n_legit) for i in range(1, 29)
    })
    legit['Amount'] = np.abs(np.random.exponential(scale=88, size=n_legit))
    legit['Time'] = np.sort(np.random.uniform(0, 172800, n_legit))  # seconds over 48 h
    legit['Class'] = 0

    # Frauds: each feature gets its own random mean in [-3, 3] and a wider spread
    fraud = pd.DataFrame({
        f'V{i}': np.random.normal(np.random.uniform(-3, 3), 2, n_fraud) for i in range(1, 29)
    })
    fraud['Amount'] = np.abs(np.random.exponential(scale=122, size=n_fraud))
    fraud['Time'] = np.random.uniform(0, 172800, n_fraud)
    fraud['Class'] = 1

    # Concatenate, then shuffle so frauds are not grouped at the end
    df = pd.concat([legit, fraud], ignore_index=True)
    df = df.sample(frac=1, random_state=random_state).reset_index(drop=True)
    return df


if __name__ == "__main__":
    df = generate_fraud_dataset()
    # data/raw/ is git-ignored, so it may be missing in a fresh clone
    os.makedirs("data/raw", exist_ok=True)
    df.to_csv("data/raw/creditcard.csv", index=False)
    print(f"Dataset generated: {len(df)} rows")
    print(f"Frauds: {df['Class'].sum()} ({df['Class'].mean()*100:.3f}%)")
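Before versioning the file, it can help to confirm that what landed on disk matches the printed summary. With the default parameters the generator writes int(50000 * 0.002) = 100 frauds out of 50,000 rows. The sketch below recounts the labels using only the standard library. `summarize_labels` is a hypothetical helper, not part of the course code, and it assumes the label column is named Class and holds 0/1 values.

```python
import csv
from pathlib import Path


def summarize_labels(csv_path, label_column="Class"):
    """Count rows and positive (fraud) labels in a CSV without pandas.

    Assumes the label column contains 0/1 integers, as written by
    generate_fraud_dataset().
    """
    n_rows = 0
    n_positive = 0
    with Path(csv_path).open(newline="") as f:
        for row in csv.DictReader(f):
            n_rows += 1
            n_positive += int(row[label_column])
    ratio = n_positive / n_rows if n_rows else 0.0
    return {"rows": n_rows, "frauds": n_positive, "fraud_ratio": ratio}
```

After running the generator with its defaults, `summarize_labels("data/raw/creditcard.csv")` should report 50,000 rows and 100 frauds.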

Dockerfile for an ML model: build and run

MLOps Fundamentals Course — Chapter 04 — Containerize ML models

NOTE🎯 Learning objectives
  • Understand every instruction in a Dockerfile for an ML model
  • Write a complete Dockerfile for a scikit-learn API with FastAPI
  • Build a Docker image with docker build
  • Run and test an ML container with docker run
  • Debug common ML container issues

1. ML project structure to containerize

Before writing the Dockerfile, let’s look at the project structure we are going to containerize. It is a wine classification API based on a Random Forest model trained with scikit-learn, served via FastAPI.

text
wine-classifier/
├── Dockerfile              # Our Docker configuration file
├── .dockerignore           # Files to exclude from the image
├── requirements.txt        # Python dependencies
├── src/
│   ├── app.py              # FastAPI application (entry point)
│   ├── predict.py          # Prediction logic
│   └── preprocessing.py    # Feature preprocessing
└── models/
    ├── rf_classifier.pkl   # Serialized Random Forest model
    └── scaler.pkl          # Saved StandardScaler

The requirements.txt file

The requirements.txt lists all Python dependencies with their pinned versions. This is crucial for reproducibility: without a fixed version, pip install could download a newer, incompatible version.

text
# requirements.txt
# Web framework
fastapi==0.110.0
uvicorn[standard]==0.27.1

# Machine Learning
scikit-learn==1.4.1
numpy==1.26.4
pandas==2.2.1

# Model serialization
joblib==1.3.2

# Data validation
pydantic==2.6.3
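WARNING⚠ Always pin versions!
Avoid scikit-learn>=1.0 or an unversioned scikit-learn. If scikit-learn releases version 2.0 with API changes, your container will break on the next rebuild. Always use an exact pin such as scikit-learn==1.4.1. Pinning matters doubly here because the model ships as a pickle: scikit-learn only supports loading a pickled estimator with the same version that saved it.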
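To enforce this rule automatically (for example in CI), a short script can flag every requirement line that is not an exact `==` pin. This is a minimal sketch with a deliberately simple regular expression, not a full PEP 508 parser. `find_unpinned` is a hypothetical name.

```python
import re

# A line counts as pinned when it uses an exact "==" specifier,
# e.g. "scikit-learn==1.4.1" or "uvicorn[standard]==0.27.1".
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*"   # package name
    r"(\[[A-Za-z0-9,._-]+\])?"       # optional extras, e.g. [standard]
    r"\s*==\s*[^=\s;]+"              # exact version (rejects "===")
)


def find_unpinned(lines):
    """Return requirement lines that are not pinned with '=='.

    Blank lines and comments are ignored. Options such as '-r other.txt'
    are reported as unpinned so they can be reviewed by hand.
    """
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if not line:
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For the file above, `find_unpinned(pathlib.Path("requirements.txt").read_text().splitlines())` returns an empty list. A non-empty result could be used to fail the CI job.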

The FastAPI application (src/app.py)

python
# src/app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np
import os

app = FastAPI(
    title="Wine Quality Classifier API",
    description="Random Forest based wine quality classification API",
    version="1.0.0"
)

# Load the model at startup (not on every request)
MODEL_PATH = os.getenv("MODEL_PATH", "/app/models/rf_classifier.pkl")
SCALER_PATH = os.getenv("SCALER_PATH", "/app/models/scaler.pkl")

model = joblib.load(MODEL_PATH)
scaler = joblib.load(SCALER_PATH)

class WineFeatures(BaseModel):
    fixed_acidity: float
    volatile_acidity: float
    citric_acid: float
    residual_sugar: float
    chlorides: float
    free_sulfur_dioxide: float
    total_sulfur_dioxide: float
    density: float
    pH: float
    sulphates: float
    alcohol: float

@app.get("/health")
def health_check():
    return {"status": "ok", "model": "rf_classifier", "version": "1.0.0"}

@app.post("/predict")
def predict(features: WineFeatures):
    try:
        X = np.array([[
            features.fixed_acidity, features.volatile_acidity,
            features.citric_acid, features.residual_sugar,
            features.chlorides, features.free_sulfur_dioxide,
            features.total_sulfur_dioxide, features.density,
            features.pH, features.sulphates, features.alcohol
        ]])
        X_scaled = scaler.transform(X)
        prediction = model.predict(X_scaled)[0]
        probability = model.predict_proba(X_scaled)[0].max()
        return {
            "quality": int(prediction),
            "confidence": round(float(probability), 4),
            "label": "good wine" if prediction >= 6 else "average wine"
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
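Once the API is running, any HTTP client can call it. The sketch below uses only the Python standard library (urllib), so it works without extra packages. The base URL, the helper names and the sample values are illustrative assumptions. The JSON keys must match the WineFeatures field names exactly; if a required field is missing or misspelled, FastAPI rejects the request with a 422 validation error.

```python
import json
import urllib.request

# Hypothetical base URL: assumes the API is published on localhost:8000.
BASE_URL = "http://localhost:8000"


def build_predict_request(base_url, features):
    """Build (but do not send) a POST /predict request with a JSON body."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_remote(base_url, features, timeout=5.0):
    """Send the request and decode the JSON response returned by the API."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Illustrative input: one value per WineFeatures field (11 in total).
sample = {
    "fixed_acidity": 7.4, "volatile_acidity": 0.70, "citric_acid": 0.0,
    "residual_sugar": 1.9, "chlorides": 0.076, "free_sulfur_dioxide": 11.0,
    "total_sulfur_dioxide": 34.0, "density": 0.9978, "pH": 3.51,
    "sulphates": 0.56, "alcohol": 9.4,
}
```

With the server listening on port 8000, `predict_remote(BASE_URL, sample)` returns a dictionary with the quality, confidence and label keys produced by the /predict endpoint.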

2. The complete commented Dockerfile

Here is the complete Dockerfile. We will analyze each instruction in detail right after.

dockerfile
# ============================================================
# Dockerfile for a scikit-learn model served with FastAPI
# ============================================================

# Step 1: Base image
FROM python:3.11-slim

# Step 2: Image metadata
LABEL maintainer="mlops-team@exemple.com"
LABEL version="1.0.0"
LABEL description="Wine Quality Classifier - Random Forest API"

# Step 3: System environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1

# Step 4: Working directory inside the container
WORKDIR /app

# Step 5: System dependencies (if needed)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Step 6: Copy AND install Python dependencies (cached layer)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Step 7: Copy source code
COPY src/ ./src/

# Step 8: Copy the ML model
COPY models/ ./models/

# Step 9: Expose the API port
EXPOSE 8000

# Step 10: Non-root user for security
RUN useradd --create-home --shell /bin/bash appuser
USER appuser

# Step 11: Startup command
CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8000"]
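You could then build and start the container, for example with `docker build -t wine-classifier:1.0.0 .` followed by `docker run -d -p 8000:8000 wine-classifier:1.0.0`. The image tag is an assumption. EXPOSE only documents the port; it is the `-p` flag that publishes it on the host. The API needs a moment to load the model, so a small readiness check can poll the /health endpoint defined in src/app.py before sending predictions. This is a sketch that uses only the standard library. `wait_for_health` is a hypothetical helper, and it bypasses any HTTP proxy configured in the environment because the target is local.

```python
import http.client
import json
import time
import urllib.request

# Opener that ignores HTTP(S)_PROXY variables: the container is reached directly.
_DIRECT_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_health(url, timeout=30.0, interval=1.0):
    """Poll `url` until it answers HTTP 200 with {"status": "ok"}.

    Returns True as soon as the API is healthy, False once `timeout`
    seconds have elapsed without a healthy answer.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _DIRECT_OPENER.open(url, timeout=interval) as response:
                payload = json.loads(response.read().decode("utf-8"))
                healthy = (
                    response.status == 200
                    and isinstance(payload, dict)
                    and payload.get("status") == "ok"
                )
                if healthy:
                    return True
        except (OSError, http.client.HTTPException, ValueError):
            # Refused, reset, timed out, HTTP error status or non-JSON body:
            # the container is not ready yet, so try again.
            pass
        time.sleep(interval)
    return False
```

In a CI job, a False result from `wait_for_health("http://localhost:8000/health")` is a good moment to print `docker logs` for the container and fail the job.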

3. Detailed analysis of each instruction

3.1 FROM python:3.11-slim

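FROM defines the base image. python:3.11-slim is the lightweight variant of the official Python image. It is built on Debian and keeps glibc, but drops compilers, documentation and most system utilities.

For scikit-learn, slim is the right compromise. It is small, and because it is glibc-based, the prebuilt (manylinux) wheels of numpy, scipy and scikit-learn install without any compilation.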

WARNING⚠ Avoid :latest
FROM python:latest can jump from Python 3.11 to 3.12 overnight and break your code. Always specify the exact version: python:3.11-slim or even python:3.11.8-slim.
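TIP💡 For GPU
If your model uses the GPU (TensorFlow, PyTorch), start from a CUDA base image instead, for example:
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
NVIDIA's tags include the full CUDA version (major.minor.patch). Pick the one that matches the CUDA version your framework build expects.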

3.2 Environment variables ENV

dockerfile
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1
| Variable | Effect | Why enable it? |
|---|---|---|
| PYTHONDONTWRITEBYTECODE=1 | Python does not write .pyc files | Avoids unnecessary files in the container |
| PYTHONUNBUFFERED=1 | stdout/stderr are not buffered | Logs appear in docker logs immediately |
| PIP_NO_CACHE_DIR=1 | pip keeps no download cache | Smaller final image (can save 100-200 MB) |
| PIP_DISABLE_PIP_VERSION_CHECK=1 | No pip update check | Faster builds, no spurious warnings |
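To check that such a variable is actually picked up, you can ask the interpreter inside the running container, for example with `docker exec <container> python -c "import sys; print(sys.dont_write_bytecode)"`, where the container name is a placeholder. The sketch below runs the same one-liner locally in a child process, with and without PYTHONDONTWRITEBYTECODE, to show the flag flipping. `bytecode_writing_disabled` is a hypothetical helper.

```python
import os
import subprocess
import sys


def bytecode_writing_disabled(env_value):
    """Report whether a fresh interpreter skips writing .pyc files.

    env_value is the value given to PYTHONDONTWRITEBYTECODE, or None to unset it.
    """
    # Start from the current environment, minus any existing setting.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONDONTWRITEBYTECODE"}
    if env_value is not None:
        env["PYTHONDONTWRITEBYTECODE"] = env_value
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

Here `bytecode_writing_disabled("1")` returns True and `bytecode_writing_disabled(None)` returns False.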

3.3 WORKDIR /app

WORKDIR defines the working directory inside the container. All subsequent instructions (COPY, RUN, CMD) run from this directory. If the directory does not exist, Docker creates it automatically.

TIP💡 Best practice: Always use /app as the working directory for Python applications. Avoid working directly in / or /root.

Optimizing ML Docker images


NOTE🎯 Learning objectives
  • Understand why ML Docker image size is a critical concern in production
  • Master multi-stage builds to separate build and runtime
  • Configure an effective .dockerignore for ML projects
  • Leverage Docker layer caching to the fullest
  • Apply best practices for production ML images

1. Why ML image size is critical

Unlike a classic web application (a few tens of MB), an ML Docker image can easily reach 2 to 8 GB. This size has direct consequences on your MLOps workflow:

🚬 Slow deployment

Pulling a 5 GB image onto a Kubernetes node can take 10 to 20 minutes on a modest or congested link. When auto-scaling under load, new pods wait that long before they can serve traffic, which is unacceptable.

💸 Storage cost

Storing 50 versions of a 4 GB image in AWS ECR = 200 GB × $0.10/GB/month = $20/month just for storage.

🔐 Attack surface

Every extra tool (compilers, system utilities) is a potential security vulnerability. A minimal image is more secure.
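The figures in this section follow from simple arithmetic, sketched below under explicit assumptions: decimal units (1 GB = 1,000 MB), a constant link speed, and no layer reuse between image versions. At a steady 100 Mbit/s, a 5 GB image needs about 400 seconds (under 7 minutes). The 10 to 20 minutes quoted above therefore corresponds to an effective throughput of roughly 33 to 67 Mbit/s. The same formula gives the pull times in the comparison table below. The function names are illustrative.

```python
def pull_time_seconds(size_mb, bandwidth_mbit_s):
    """Ideal transfer time: megabytes * 8 bits / megabits per second.

    Ignores decompression, registry latency and layers already cached on
    the node, so real pulls are usually slower.
    """
    return size_mb * 8 / bandwidth_mbit_s


def monthly_storage_cost(n_versions, size_gb, price_per_gb_month):
    """Upper-bound registry cost: every version stored in full, no shared layers."""
    return n_versions * size_gb * price_per_gb_month
```

For example, `pull_time_seconds(2100, 100)` gives 168 seconds, matching the ~170 s of the fat image in the table. `monthly_storage_cost(50, 4, 0.10)` reproduces the $20/month estimate. Registries that deduplicate layers shared between versions will usually cost less.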

2. Comparison: fat images vs slim images

| Approach | Base image | Typical size | Pull time (100 Mbit/s) | Vulnerabilities |
|---|---|---|---|---|
| 🔴 Fat image (naive) | python:3.11 + all tools | ~2.1 GB | ~170 s | Very numerous |
| 🟡 Slim image (good practice) | python:3.11-slim | ~700 MB | ~56 s | Moderate |
| 🟢 Multi-stage slim | Build: python:3.11-slim / Run: python:3.11-slim | ~350 MB | ~28 s | Few |
| 🆕 Distroless | gcr.io/distroless/python3 | ~180 MB | ~14 s | Very few |
| 🟢 Alpine (with precautions) | python:3.11-alpine | ~100 MB | ~8 s | Minimal |
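WARNING⚠ Alpine and ML libraries
Alpine uses musl libc instead of glibc. The standard prebuilt Linux wheels (manylinux) of NumPy, scikit-learn, pandas and PyTorch target glibc. Some of these projects now also publish musllinux wheels, but coverage is incomplete (PyTorch publishes none). Whenever no compatible wheel exists, pip compiles from source, which can take 30 to 60 minutes and may fail. Reserve Alpine for microservices without C dependencies. For ML, prefer python:3.11-slim.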
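Whether a base image is glibc- or musl-based decides if pip can use the standard manylinux wheels. One way to find out is to run a small script inside the image, for example with `docker run --rm` on python:3.11-slim and python:3.11-alpine. The sketch below follows the approach used by the `packaging` library, which queries `os.confstr("CS_GNU_LIBC_VERSION")`. The function names are hypothetical.

```python
import os


def glibc_version():
    """Return the glibc version (e.g. "2.36"), or None when not running on glibc.

    os.confstr does not exist on Windows, and the CS_GNU_LIBC_VERSION
    query fails on musl-based systems such as Alpine.
    """
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")  # e.g. "glibc 2.36"
    except (AttributeError, OSError, ValueError):
        return None
    if not value:
        return None
    parts = value.split()
    if len(parts) != 2 or parts[0] != "glibc":
        return None
    return parts[1]


def likely_wheel_family():
    """manylinux wheels need glibc; otherwise expect musllinux wheels or source builds."""
    if glibc_version():
        return "manylinux (glibc)"
    return "not glibc: musllinux wheels or source builds"
```

On python:3.11-slim the first function returns a version string. On python:3.11-alpine it returns None, a hint that some ML dependencies may have to be compiled.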

3. The .dockerignore: your first line of defense

Even before the multi-stage build, .dockerignore is the simplest and most immediate optimization. It prevents Docker from sending unnecessary (or even dangerous) files to the build daemon.

TIP💡 Docker build context
When you run docker build ., Docker sends the entire current directory to the Docker daemon (called the “build context”). Without .dockerignore, this includes your 10 GB dataset, Jupyter notebooks, Python virtual environments, secret configuration files…
text
# .dockerignore for an ML project
# Unlike .gitignore, patterns are matched from the root of the build
# context: use the **/ prefix to match a name at any depth.

# Version control
.git/
.gitignore
.github/

# Python virtual environments (critical! can be hundreds of MB)
.venv/
venv/
env/
ENV/
**/__pycache__/
**/*.py[cod]
**/.pytest_cache/
**/.mypy_cache/

# ML data (never ship inside the image!)
data/
datasets/
**/*.csv
**/*.parquet
**/*.arrow
**/*.feather

# Jupyter notebooks (not needed in production)
**/*.ipynb
**/.ipynb_checkpoints/

# Local configuration files
**/.env
**/.env.local
**/.env.development
**/*.env

# Documentation and tests (useless in production)
# If you use the test stage from section 4.2, remove tests/ from this list
docs/
tests/
README.md
CHANGELOG.md

# Build artifacts
dist/
build/
**/*.egg-info/
htmlcov/
.coverage

# IDEs and editors
.vscode/
.idea/
**/*.swp
**/*.swo
**/.DS_Store
**/Thumbs.db

# Logs and temporary files
logs/
**/*.log
tmp/
temp/

# MLflow files and experiments
mlruns/
mlflow.db

# Test models (keep only the production model)
models/experiments/
models/checkpoints/

Measuring the impact of .dockerignore

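bash
# Build once WITHOUT a .dockerignore and look at the build context size
docker build -t test-before . 2>&1 | grep -i "context"
# Legacy builder: Sending build context to Docker daemon  4.521GB  <-- without .dockerignore!
# BuildKit (the default builder in recent Docker versions) prints a
# "transferring context: ..." line instead

# Build again WITH the .dockerignore in place
docker build -t test-after . 2>&1 | grep -i "context"
# Sending build context to Docker daemon  12.34MB  <-- with .dockerignore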
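If the context is still large, the next question is which directories are responsible. The sketch below ranks the top-level entries of the build context by size. It measures files on disk and does not reimplement Docker's .dockerignore matching, so treat it as a guide for choosing what to exclude. `largest_entries` and `tree_size` are hypothetical helper names.

```python
from pathlib import Path


def tree_size(path):
    """Total size in bytes of a file, or of all regular files under a directory."""
    if path.is_symlink():
        return 0  # sketch simplification: symlinks are not followed
    if path.is_file():
        return path.stat().st_size
    total = 0
    for child in path.rglob("*"):
        if child.is_file() and not child.is_symlink():
            total += child.stat().st_size
    return total


def largest_entries(root=".", top=5):
    """Rank the top-level entries of a build context by size, largest first."""
    sizes = [(entry.name, tree_size(entry)) for entry in Path(root).iterdir()]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]
```

Run from the project root, `largest_entries(".", top=5)` might show data/ or .venv/ at the top. Those are the first candidates for .dockerignore.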

4. Multi-stage builds for ML

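A multi-stage build uses multiple FROM instructions in a single Dockerfile. Each stage starts from its own base image, and only the final stage ends up in the resulting image. Earlier stages are discarded unless you explicitly copy files out of them with COPY --from. This lets you install compilers and build tools in a builder stage while shipping a runtime image that contains none of them.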

4.1 Multi-stage build for a scikit-learn model

dockerfile
# ============================================================
# Multi-stage Dockerfile for Wine Classifier (production)
# ============================================================

# ---- STEP 1: Builder ----
# This stage installs everything, including build tools
FROM python:3.11-slim AS builder

# Environment variables for the build
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_NO_CACHE_DIR=1

WORKDIR /build

# Install system build tools (will NOT be in the final image)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    build-essential \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Copy and install into an isolated local directory (--prefix)
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt


# ---- STEP 2: Production image (final) ----
# Final ultra-light image: does NOT contain gcc, build-essential, etc.
FROM python:3.11-slim AS production

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

# Copy only the libraries installed from the builder
COPY --from=builder /install /usr/local

# Minimal runtime system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Copy source code and model
COPY src/ ./src/
COPY models/ ./models/

# Security: non-root user
RUN useradd --create-home --shell /bin/bash appuser && \
    chown -R appuser:appuser /app
USER appuser

EXPOSE 8000

CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
TIP💡 Typical size gain with multi-stage
By removing gcc, g++, build-essential and intermediate compilation files from the final image, you typically save between 200 and 600 MB depending on the C/C++ dependencies used.

4.2 Multi-stage build with test stage

dockerfile
# ============================================================
# Multi-stage Dockerfile with integrated test stage
# ============================================================

FROM python:3.11-slim AS base
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 PIP_NO_CACHE_DIR=1
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt


# ---- Test stage ----
FROM base AS test
COPY requirements-dev.txt .
RUN pip install --no-cache-dir -r requirements-dev.txt
COPY src/ ./src/
COPY models/ ./models/
COPY tests/ ./tests/
RUN pytest tests/ -v --tb=short


# ---- Production stage ----
FROM base AS production
COPY src/ ./src/
COPY models/ ./models/
RUN useradd --create-home appuser && chown -R appuser /app
USER appuser
EXPOSE 8000
CMD ["uvicorn", "src.app:app", "--host", "0.0.0.0", "--port", "8000"]
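With BuildKit, the default builder in recent Docker releases (Docker Engine 23.0 and later on Linux), `docker build` only builds the stages its target depends on. The production stage starts `FROM base`, not `FROM test`. A plain `docker build .` therefore skips the test stage, and pytest never runs. One way to keep the tests as a gate is to build the test target explicitly before the production one. The sketch below does this with subprocess. The function names and the image tag are assumptions. The `runner` parameter exists only so the logic can be exercised without Docker.

```python
import subprocess


def build_commands(tag, context="."):
    """Docker commands for a test-gated build: test stage first, then production."""
    return [
        ["docker", "build", "--target", "test", "-t", f"{tag}-test", context],
        ["docker", "build", "--target", "production", "-t", tag, context],
    ]


def build_with_tests(tag, context=".", runner=subprocess.run):
    """Run the builds in order; check=True makes a failing pytest stop the pipeline."""
    for command in build_commands(tag, context):
        runner(command, check=True)  # raises CalledProcessError on failure
```

Calling `build_with_tests("wine-classifier:1.0.0")` builds the test image first. If pytest fails inside it, `subprocess.CalledProcessError` is raised and the production image is never built.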

This article covers the most useful snippets — the complete MLOps Fundamentals course (13 chapters, 72 lessons, corrected exercises and final project) takes you all the way.


FAQ

How long does it take to learn MLOps Fundamentals?
With a structured progression (13 chapters, 72 short practical lessons), you reach an operational level in a few weeks at 30 to 60 minutes per day. The key is to practice each concept immediately.
Are there any prerequisites?
It is best to be comfortable with the fundamentals of the domain: this content goes in depth, with real-world cases.
Where to start concretely?
Reproduce the commands in this article, then follow the complete MLOps Fundamentals course: it chains the 72 lessons in order, with exercises and a final project.
