Pandas is the standard Python library for tabular data manipulation. It is built on top of NumPy and centers on the DataFrame: a table of rows and columns in which the column names act as labels and each row is identified by an index. A useful mental model is a spreadsheet living inside a Python program.

The conventional import:

import pandas as pd

Reading data into a DataFrame is one method call per format:

df = pd.read_csv("my_data.csv")           # CSV
df = pd.read_json("my_data.json")         # JSON
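df = pd.read_excel("my_data.xlsx")        # Excel (needs an engine such as openpyxl)
df = pd.read_sql(query, connection)       # SQL database; query and connection are defined elsewhere
df = pd.read_hdf("my_data.h5", "key")     # HDF5 (needs the PyTables package)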
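
To try a reader without a file on disk, the following minimal sketch feeds read_csv an in-memory text buffer. The column names and values are made up for illustration and stand in for whatever my_data.csv would contain.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the contents of my_data.csv
csv_text = """Name,Height,Age
Ana,5.5,31
Ben,6.1,45
Cy,5.9,28
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # column types inferred from the text
```

Checking shape and dtypes right after loading is a cheap way to confirm that the header row and the numeric columns were parsed as intended.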

Three patterns cover most of what we do with DataFrames in practice:

Column access by name. Returns a one-dimensional Pandas Series:

df['Name']
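
As a small illustration of column access, the sketch below builds a toy DataFrame (the names and heights are invented) and contrasts single brackets, which give a Series, with a list of column names, which gives a DataFrame.

```python
import pandas as pd

# Toy data; the values are illustrative only
df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy"],
    "Height": [5.5, 6.1, 5.9],
})

names = df["Name"]                # single label: a one-dimensional Series
subset = df[["Name", "Height"]]   # list of labels: a DataFrame

print(type(names).__name__)
print(type(subset).__name__)
```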

Position-based indexing with .iloc. NumPy-style slicing on numeric row and column positions:

df.iloc[0:2, 0]          # rows 0 and 1, column 0
df.iloc[:, -1]           # all rows, last column
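
The sketch below shows that .iloc works on positions rather than labels. It uses a toy DataFrame whose index labels (10, 20, 30) are deliberately chosen to differ from the row positions; the data are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Ana", "Ben", "Cy"], "Height": [5.5, 6.1, 5.9]},
    index=[10, 20, 30],   # labels deliberately differ from positions
)

# Positions 0 and 1 of column 0; the end of the slice is exclusive, as in NumPy
first_two_names = df.iloc[0:2, 0]

# All rows of the last column
last_column = df.iloc[:, -1]

print(first_two_names)
print(last_column)
```

The result of df.iloc[0:2, 0] still carries the original index labels 10 and 20, which is why position-based and label-based access are worth keeping apart.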

Conditional filtering with .loc and a Boolean expression:

df.loc[df['Height'] > 5.8, :]    # rows where Height > 5.8, all columns

The comparison df['Height'] > 5.8 is evaluated element-wise and produces a Boolean Series with one True/False value per row; .loc keeps the rows where that value is True.
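
A minimal sketch of the filtering step, using an invented toy DataFrame. It stores the Boolean result in a variable named mask (a name chosen here for illustration) and also shows one way to combine two conditions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ana", "Ben", "Cy"],
    "Height": [5.5, 6.1, 5.9],
})

# The comparison yields a Boolean Series aligned with df's index
mask = df["Height"] > 5.8

# Keep the rows where the mask is True, with all columns
tall = df.loc[mask, :]

# Combine conditions with & (and) or | (or); each comparison needs parentheses
tall_not_ben = df.loc[(df["Height"] > 5.8) & (df["Name"] != "Ben"), ["Name"]]

print(tall)
print(tall_not_ben)
```

Python's and/or keywords do not work on a Series, so the element-wise operators & and | are used instead.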

Pandas also provides the rolling method for windowed computations such as moving averages and other rolling features, and fillna, interpolate, and dropna for handling missing data. The pd.merge() and pd.concat() functions handle table joins and concatenation.
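
The following sketch runs each of these tools on small invented inputs. The window size of 3, the join key id, and the inner join are arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])

filled = s.fillna(0.0)     # replace the missing value with a constant
interp = s.interpolate()   # linear interpolation by default
dropped = s.dropna()       # discard the missing entry

# 3-point moving average; the first two positions lack a full window and are NaN
rolling_mean = interp.rolling(window=3).mean()

left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
right = pd.DataFrame({"id": [2, 3, 4], "height": [6.1, 5.9, 5.7]})

# Inner join: only ids present in both tables survive
joined = pd.merge(left, right, on="id", how="inner")

# Stack two tables vertically and renumber the rows
stacked = pd.concat([left, left], ignore_index=True)

print(rolling_mean)
print(joined)
print(stacked)
```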

Pandas pairs naturally with Matplotlib (DataFrames have a .plot() method that wraps Matplotlib calls), scikit-learn (most sklearn estimators accept DataFrames directly), and NumPy (under the hood, columns are backed by NumPy arrays by default, though recent pandas versions also support PyArrow-backed dtypes). It’s essentially the lingua franca for tabular data in the Python data-science stack.
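
The NumPy side of this interoperability can be sketched without extra dependencies. The example below, on invented data, converts columns to NumPy arrays and applies a NumPy function directly to a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Height": [5.5, 6.1, 5.9],
    "Age": [31, 45, 28],
})

heights = df["Height"].to_numpy()   # one column as a 1-D NumPy array
matrix = df.to_numpy()              # whole table as a 2-D array with a common dtype

# NumPy ufuncs accept a Series and return a Series with the index preserved
log_age = np.log(df["Age"])

print(heights)
print(matrix.shape)
print(log_age)
```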