EDA with pandas, NumPy, Matplotlib & Seaborn: The 9 Key Steps from Zero to Operational

EDA with pandas, NumPy, Matplotlib and Seaborn: the essentials in one article, with real code, diagrams and concrete steps excerpted from a 44-lesson course.

REHOUMA Haythem

12 Jun 2026 • 12 min read

Anyone can learn EDA with pandas, NumPy, Matplotlib and Seaborn, provided they follow the steps in the right order. We have condensed a complete 44-lesson course into a clear path, with the most useful code snippets.

tl;dr

Introduction to Data Analysis
Introduction and installation
Getting started with Pandas DataFrames
Cleaning and Preparing Data
Descriptive Statistics and Aggregation

~$ cat ./parcours.md # EDA pandas NumPy Matplotlib Seaborn — 9 chapters

Introduction to Data Analysis

→ Data Analysis: The Job of the Century
→ Chapter 00: Course data sources

Introduction and installation

→ Why EDA and these four libraries?
→ Install your working environment
+ 2 more lessons

Getting started with Pandas DataFrames

→ Create and load a DataFrame (CSV, Excel, JSON)
→ Explore a DataFrame: head, info, describe, shape
+ 1 more lesson

Cleaning and Preparing Data

→ Detect and handle missing values
→ Remove duplicates and correct data types
+ 2 more lessons

Descriptive Statistics and Aggregation

→ Central tendency and dispersion: mean, median, standard deviation
→ Correlation and covariance between variables
+ 1 more lesson

Visualization with Matplotlib

→ Introduction to Matplotlib: Figure, Axes and subplots
→ Essential charts: bars, lines, scatter
+ 1 more lesson

Advanced Visualization with Seaborn

→ Introduction to Seaborn: histplot, boxplot, violinplot
→ Visualize relationships: scatterplot and correlation heatmap
+ 2 more lessons

Complete Exploratory Analysis

→ EDA Methodology: the 5 steps of a good analysis
→ Detect outliers and anomalies in the data
+ 1 more lesson

🏁

Final project (+ 1 chapter along the way)

→ You leave with a concrete and demonstrable project

Set up your working environment

NOTE (What you will learn): Choose between Google Colab (zero installation, in the browser) and Anaconda + Jupyter (local installation), then install NumPy, Pandas, Matplotlib and Seaborn, and verify that everything works with a test script.

0. Google Colab — The zero-installation option

Google Colaboratory (Colab) is a free Jupyter environment that runs directly in your browser, with nothing to install. It runs on Google’s servers and comes with NumPy, Pandas, Matplotlib and Seaborn pre-installed.

TIP (Analogy): Google Colab is like working in a fully equipped office that Google lends you for free. You bring nothing: the desk, tools and libraries are already there. You open your browser and start immediately.

How to get started with Google Colab

Check the pre-installed versions in Colab

In the first cell of your Colab notebook, copy and run this code:

import numpy as np
import pandas as pd
import matplotlib
import seaborn as sns

print("NumPy     :", np.__version__)
print("Pandas    :", pd.__version__)
print("Matplotlib:", matplotlib.__version__)
print("Seaborn   :", sns.__version__)
print("\nEverything is ready. Happy analysis!")

Three ways to load a CSV file in Colab:

# Method 1: Upload a file from your computer
from google.colab import files
uploaded = files.upload()   # a file-selection dialog opens

import pandas as pd
import io
df = pd.read_csv(io.BytesIO(uploaded['mon_fichier.csv']))

# Method 2: Read from Google Drive
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_csv('/content/drive/MyDrive/mon_fichier.csv')

# Method 3: Read directly from a public URL
df = pd.read_csv('https://raw.githubusercontent.com/exemple/repo/main/data.csv')
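In Colab, `files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes, so Method 1 raises a `KeyError` if the uploaded file is not named exactly `mon_fichier.csv`. The sketch below simulates that dictionary so it can run outside Colab; the file name `example.csv` and its contents are invented for illustration.

```python
import io
import pandas as pd

# Stand-in for the dict returned by files.upload() in Colab:
# {file name: raw bytes}. The file name and contents are made up.
uploaded = {"example.csv": b"city,population\nParis,2.1\nLyon,0.5\n"}

# Take whichever file was uploaded instead of hard-coding its name.
filename = next(iter(uploaded))
df = pd.read_csv(io.BytesIO(uploaded[filename]))

print(filename, df.shape)
```

Reading the key from the dictionary keeps the cell working whatever the file is called.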

Python only

Anaconda (recommended)

TIP (Analogy): Choosing between plain Python and Anaconda is like choosing between buying IKEA furniture piece by piece or buying a fully furnished apartment. Both work, but Anaconda saves you a lot of time at the start.

2. Step 1 — Download and install Anaconda

Download

Installation on Windows

WARNING (Windows only): If you do not check “Add Anaconda to PATH”, always use the Anaconda Prompt (not the regular Windows terminal) to run your conda and jupyter commands.

Verify the installation

Open Anaconda Prompt (Windows) or Terminal (macOS/Linux) and type:

conda --version

Then create and activate a dedicated environment:

# Create an environment named "eda-cours" with Python 3.11
conda create -n eda-cours python=3.11

# Activate the environment
conda activate eda-cours

# Verify that the environment is active (the name appears in parentheses)
# (eda-cours) C:\Users\your_name>
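Besides reading the prompt, you can ask Python itself which interpreter and environment it is running in. This is a minimal sketch using only the standard library; the helper name `describe_python_environment` is hypothetical, and it relies on conda setting the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```python
import os
import sys

def describe_python_environment():
    """Return basic facts about the interpreter running this code."""
    return {
        "python_version": sys.version.split()[0],
        "executable": sys.executable,
        "prefix": sys.prefix,
        # conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # the variable is absent when conda is not in use.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(none)"),
    }

for key, value in describe_python_environment().items():
    print(f"{key:15}: {value}")
```

If `conda_env` shows `eda-cours` and `executable` points inside that environment's folder, the activation worked.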

Option A — With conda (recommended)

# Install all libraries in one command
conda install numpy pandas matplotlib seaborn jupyter -y
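One way to check the result of the installation from Python, as a sketch: look up each library without importing it, so a missing package is reported instead of crashing the script. The helper name `check_libraries` is hypothetical; only standard-library calls are used.

```python
import importlib.util
from importlib import metadata

REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]

def check_libraries(names):
    """Map each library name to its installed version, or None if absent."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None          # not importable in this environment
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"     # importable, but no version metadata
    return report

for name, version in check_libraries(REQUIRED).items():
    print(f"{name:12}: {version or 'NOT INSTALLED'}")
```

Run it with the `eda-cours` environment active; any line marked `NOT INSTALLED` means the install command needs to be repeated in that environment.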

Launch from the terminal

# Make sure your environment is active
conda activate eda-cours

# Launch Jupyter Notebook
jupyter notebook

Chapter 08 – Introduction to data-science libraries

NOTE: Module objectives

Understand what a Python library is
Import a library (import)
Import a specific module from a library (from ... import)
Use aliases (import numpy as np)
Use the math library as a first example
Install, update and verify a library’s configuration with pip

1. What is a library?

Libraries are collections of ready-made modules that let you perform complex operations in just a few lines. There are many of them:

💻 CPU libraries: standard

🌞 GPU libraries: NVIDIA RAPIDS

2. Importing a library — the `math` library

The math library is the perfect example for understanding imports. It is built into Python; no installation is required.

Official documentation: docs.python.org/3/library/math.html

2.1 Full import

import math

# math.ceil rounds up to the nearest integer
print(math.ceil(0.1))    # prints 1
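The other two import styles listed in the module objectives can be sketched with the same `math` library. The alias `m` is chosen here only for illustration.

```python
import math                 # full import: names are accessed as math.<name>
from math import sqrt       # import one specific function
import math as m            # alias: shorter prefix, same module

print(math.ceil(0.1))       # 1
print(sqrt(16))             # 4.0
print(m.floor(2.9))         # 2
print(m is math)            # True: the alias points to the same module object
```

The alias form is the one used for the data libraries, as in `import numpy as np` and `import pandas as pd`.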

NOTE (Rule): In Jupyter, a magic command written %command applies to a single line, while %%command applies to the entire cell. A %% command must be on the first line of the cell.

6.1 Measuring execution time

Command	Description	Example
`%time`	Measures the time of a single line	`%time sum(range(1_000_000))`
`%%time`	Measures the time of the entire cell	Place on the first line of the cell
`%timeit`	Runs the line N times, returns the average	`%timeit sum(range(1_000_000))`
`%%timeit`	Runs the cell N times, returns the average	Place on the first line of the cell

Time the whole cell once:

%%time
# %%time measures the TOTAL time of the cell (single execution)
import numpy as np
a = np.random.randn(1_000_000)
result = np.sort(a)

Time a single line over several runs:

%timeit np.random.randn(1_000_000)
# %timeit runs the line multiple times for an accurate measurement

Time the whole cell over several runs:

%%timeit
# %%timeit gives a precise measurement of the entire cell (multiple executions)
import numpy as np
a = np.random.randn(10_000)
np.sort(a)

TIP: When to use what?
• %%time → to quickly measure a cell (1 execution)
• %%timeit → for a reliable benchmark of a cell (multiple executions, average)
• %timeit → to benchmark a single-line expression (run it once per expression you want to compare)
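The magic commands above only work in Jupyter or IPython. In a plain Python script, the standard-library `timeit` module offers a comparable measurement. This is a sketch, and the repeat and run counts are arbitrary choices.

```python
import timeit

RUNS = 20

# Time one expression: 5 repeats of RUNS executions each.
timings = timeit.repeat("sum(range(100_000))", repeat=5, number=RUNS)

# Each entry is the total time for RUNS executions; divide to get time per run.
per_run = [t / RUNS for t in timings]
print(f"best: {min(per_run) * 1e3:.3f} ms per run")
print(f"mean: {sum(per_run) / len(per_run) * 1e3:.3f} ms per run")
```

Repeating the measurement several times smooths out noise from other processes running on the machine.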

6.2 Profiling — detailed performance analysis

Profile a single line:

%prun sum(range(1_000_000))
# Displays the time spent in each called function

Profile the whole cell:

%%prun
# Profiling of the entire cell
import numpy as np
a = np.random.randn(100_000)
b = np.sort(a)
c = np.cumsum(b)
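Outside a notebook, a similar per-function report can be produced with the standard-library `cProfile` and `pstats` modules. The workload `build_squares` below is a toy function invented for this sketch.

```python
import cProfile
import io
import pstats

def build_squares(n):
    """Toy workload (invented for this example)."""
    return sorted(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
build_squares(200_000)
profiler.disable()

# Collect the 5 entries with the highest cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that dominate the total runtime at the top of the report.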

Chapter 08 – Practice 2: Pandas — DataFrame manipulation (CPU)

NOTE: Pandas

Extremely popular in data science
Allows manipulation of very large data tables (a kind of Excel on steroids)
Enormous number of features (filters, transformations, analyses…)
Bridges to other libraries (ML, data viz…)

1. Create a DataFrame

1.1 From a dictionary

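import pandas as pd

# Nested dictionary: outer keys are products, inner keys are attributes
produitsDict = {
    'smartphone': {'prix': 1000, 'enStock': True},
    'chaussures':  {'prix': 100,  'enStock': False},
    'console':     {'prix': 400,  'enStock': True}
}
print(produitsDict)

# Outer keys become columns; inner keys ('prix', 'enStock') become the row index
df = pd.DataFrame(produitsDict)
df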
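With a nested dictionary, the default constructor puts the products in columns. If you would rather have one row per product, one option is `pd.DataFrame.from_dict` with `orient='index'`. The sketch below reuses the same dictionary; the variable names `by_column` and `by_row` are chosen for illustration.

```python
import pandas as pd

produitsDict = {
    'smartphone': {'prix': 1000, 'enStock': True},
    'chaussures': {'prix': 100, 'enStock': False},
    'console': {'prix': 400, 'enStock': True},
}

# Default: outer keys (products) become columns.
by_column = pd.DataFrame(produitsDict)

# orient='index': outer keys become rows, inner keys become columns.
by_row = pd.DataFrame.from_dict(produitsDict, orient='index')

print(by_column.shape)         # (2, 3)
print(by_row.shape)            # (3, 2)
print(list(by_row.columns))    # ['prix', 'enStock']
```

The row-per-product layout is usually easier to filter and aggregate, because each attribute keeps a single data type in its own column.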

1.2 From a list of lists

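import pandas as pd

pays = [
    [70, 55, 85],           # Population in millions
    [0.901, 0.922, 0.936],  # HDI
    [2091, 2077, 3045]      # GDP
]
# Each inner list is one row (an indicator); each column is a country
df = pd.DataFrame(
    pays,
    columns=['France', 'England', 'Germany'],
    index=['Population', 'HDI', 'GDP'],
)
df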

1.3 Import a CSV file

import pandas as pd

data = pd.read_csv('metal-bands.csv', encoding='latin-1', sep=';')
data.head()
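The `sep=';'` argument matters: with the default comma separator, a semicolon-delimited file is read as a single column. The sketch below shows the difference on a tiny in-memory sample; the two rows are made up, and only the column names `band_name` and `fans` come from the dataset used in this article.

```python
import io
import pandas as pd

# Made-up sample in the same style: semicolon-separated values.
raw = "band_name;fans\nBand A;120\nBand B;45\n"

wrong = pd.read_csv(io.StringIO(raw))            # default sep=','
right = pd.read_csv(io.StringIO(raw), sep=';')   # correct separator

print(wrong.shape)   # (2, 1): each line was read as one single column
print(right.shape)   # (2, 2)
print(right.dtypes)
```

A DataFrame with only one column after `read_csv` is a common sign that the separator is wrong.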

2. First look at the data

# Run each line in its own cell: a notebook only displays the last expression
data.head(3)          # first 3 rows
data.info()           # column types, non-null counts, memory usage
data.dtypes           # type of each column
data['fans'].dtype    # type of one specific column
data.shape            # (rows, columns)
len(data)             # number of rows

3. Navigating a DataFrame — `iloc` and `loc`

NOTE (Rule): iloc = numeric index (position). loc = label index (row/column name).

3.1 Select one or more columns

data['band_name'].head(10)             # 1 column
data[['band_name', 'fans']].head(15)   # multiple columns

3.2 `iloc` — by numeric position

data.iloc[0, 0]        # row 0, column 0
data.iloc[0:5, 0]      # rows 0-4, column 0
data.iloc[0, 0:5]      # row 0, columns 0-4
data.iloc[0:3, 0:5]    # 3-row × 5-column block
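For the `loc` counterpart, here is a minimal sketch on a small DataFrame with labeled rows. The band labels, the values and the `formed` column are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'fans': [120, 45, 300], 'formed': [1990, 2005, 1984]},
    index=['Band A', 'Band B', 'Band C'],
)

print(df.iloc[0, 0])                 # 120: first row, first column by position
print(df.loc['Band A', 'fans'])      # 120: same cell, selected by labels

# iloc slices exclude the end position; loc slices include the end label.
by_position = df.iloc[0:2]              # positions 0 and 1 (2 is excluded)
by_label = df.loc['Band A':'Band B']    # 'Band B' is included
print(by_position.equals(by_label))     # True
```

The different slicing rules are the usual source of off-by-one surprises when switching between `iloc` and `loc`.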

go-further

This article covers the most useful snippets. The complete course on EDA with pandas, NumPy, Matplotlib and Seaborn (12 chapters, 44 lessons, corrected exercises and final project) takes you all the way.

FAQ

How long does it take to learn EDA with pandas, NumPy, Matplotlib and Seaborn?

With a structured progression (12 chapters, 44 short practical lessons), you reach an operational level in a few weeks at 30–60 minutes per day. The key is to practice each concept immediately.

Are there any prerequisites?

Basic computer knowledge is enough. If you can use a terminal and read simple code, you are ready.

Where to start concretely?

Reproduce the commands in this article, then follow the complete course on EDA with pandas, NumPy, Matplotlib and Seaborn: it chains the 44 lessons in order, with exercises and a final project.
