BLOB (Binary Large Object) is the SQL data type for storing raw bytes inside a database: images, PDFs, audio files, serialized objects, or anything else that doesn’t fit a text or numeric type. MySQL has four BLOB variants of increasing size: TINYBLOB (up to 255 bytes), BLOB (up to 65,535 bytes), MEDIUMBLOB (up to 16,777,215 bytes, about 16 MB), and LONGBLOB (up to 4,294,967,295 bytes, about 4 GB).
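Each limit is one byte short of a power of two (2^8, 2^16, 2^24, 2^32), because MySQL stores a value's length in a 1-, 2-, 3-, or 4-byte prefix. The Python sketch below picks the smallest variant that can hold a given file size. The function name `smallest_blob_type` is hypothetical, and the sketch considers only the column-type limits listed above.

```python
# Maximum length in bytes of each MySQL BLOB type, smallest first.
MYSQL_BLOB_LIMITS = [
    ("TINYBLOB", 2**8 - 1),     # 255
    ("BLOB", 2**16 - 1),        # 65,535
    ("MEDIUMBLOB", 2**24 - 1),  # 16,777,215 (about 16 MB)
    ("LONGBLOB", 2**32 - 1),    # 4,294,967,295 (about 4 GB)
]


def smallest_blob_type(num_bytes: int) -> str:
    """Return the smallest MySQL BLOB type that can hold num_bytes bytes."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    for type_name, max_len in MYSQL_BLOB_LIMITS:
        if num_bytes <= max_len:
            return type_name
    raise ValueError(f"{num_bytes:,} bytes exceeds the LONGBLOB maximum")


for size in (255, 256, 5_000_000, 100_000_000):
    print(f"{size:>11,} bytes -> {smallest_blob_type(size)}")
```

A 5 MB PDF, for example, lands in MEDIUMBLOB.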
Other dialects spell it differently. PostgreSQL has BYTEA (a single type, up to 1 GB per value) plus a separate “large object” system with its own API. SQLite has a single BLOB storage class whose only size cap is the per-value and per-row maximum, SQLITE_MAX_LENGTH (1,000,000,000 bytes by default). SQL Server uses VARBINARY(MAX) for up to 2 GB. The example below uses MySQL syntax, which is one dialect among several.
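```sql
-- MySQL syntax: one row per stored file, with its raw bytes in a MEDIUMBLOB column
CREATE TABLE attachments (
    attachment_id INT NOT NULL AUTO_INCREMENT,
    file_name     VARCHAR(255),
    content_type  VARCHAR(100),  -- MIME type, e.g. 'application/pdf'
    bytes         MEDIUMBLOB,    -- raw file content, up to ~16 MB
    PRIMARY KEY (attachment_id)
);
```

A design question that comes up often is whether to store binary files inside the database, or to keep them on the filesystem and store only their paths in the database.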
Inside the database (BLOB):
- Transactional with the rest of the row: backups, restores, and replication treat the file as part of the data.
- No risk of files going missing while their database row still exists.
- Permissions, encryption, and replication apply uniformly.
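- But: it bloats the database and its backups, slows any query that reads the file column without needing it (a SELECT * over the table, for instance), and complicates streaming large files to clients.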
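The in-database trade-offs above can be seen in a minimal Python sketch. It uses the standard-library sqlite3 module instead of MySQL so that it runs without a server, it declares the file column as plain BLOB (SQLite has no MEDIUMBLOB), and the helper names `save_attachment` and `load_attachment` are hypothetical. It shows a round trip through a bound parameter, a listing query that names only metadata columns, and a failed transaction that discards a row and its bytes together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite analogue of the MySQL attachments table. SQLite has no MEDIUMBLOB,
# so the file column is declared as plain BLOB.
conn.execute(
    """
    CREATE TABLE attachments (
        attachment_id INTEGER PRIMARY KEY AUTOINCREMENT,
        file_name     TEXT,
        content_type  TEXT,
        bytes         BLOB
    )
    """
)


def save_attachment(file_name: str, content_type: str, data: bytes) -> int:
    """Insert one file; the bytes are a bound parameter, never pasted into SQL."""
    cur = conn.execute(
        "INSERT INTO attachments (file_name, content_type, bytes) VALUES (?, ?, ?)",
        (file_name, content_type, data),
    )
    return cur.lastrowid


def load_attachment(attachment_id: int) -> bytes:
    row = conn.execute(
        "SELECT bytes FROM attachments WHERE attachment_id = ?", (attachment_id,)
    ).fetchone()
    return row[0]


# Round trip: arbitrary bytes, including NUL bytes, come back unchanged.
pdf_like = b"%PDF-1.4\x00\x01\x02 not a real PDF"
with conn:  # commits if the block succeeds
    new_id = save_attachment("report.pdf", "application/pdf", pdf_like)
assert load_attachment(new_id) == pdf_like

# Listing: name only the metadata columns, so no file bytes reach Python.
# SELECT * here would return every stored file's bytes as well.
listing = conn.execute(
    "SELECT attachment_id, file_name, content_type FROM attachments"
).fetchall()
print(listing)

# Transactional: if the transaction fails, the row and its bytes vanish together.
try:
    with conn:  # rolls back if the block raises
        save_attachment("draft.png", "image/png", b"\x89PNG\r\n\x1a\n")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass
remaining = conn.execute("SELECT COUNT(*) FROM attachments").fetchone()[0]
print(remaining)
```

Binding the bytes through the `?` placeholders is what lets the driver store them as a BLOB verbatim; building the SQL string by hand would break on quotes and NUL bytes.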
On the filesystem, path in the database:
- The database stays small and fast.
- Streaming large files is easy through the filesystem or a CDN.
- But: the file store and the database have to be kept in sync by hand. Deleting a row can leave an orphan file, deleting a file out from under a row leaves the row pointing at nothing, and transactional consistency has to be enforced by the application.
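The synchronization burden of the filesystem approach can be sketched the same way. In this hypothetical Python sketch, sqlite3 again stands in for MySQL, a temporary directory stands in for the real file store, and all helper names are invented for illustration. Files are written before their rows are committed and deleted only after their rows are gone, so a crash between the two steps leaves an orphan file rather than a row pointing at nothing. A `reconcile` pass then reports both kinds of drift.

```python
import sqlite3
import tempfile
import uuid
from pathlib import Path

storage_dir = Path(tempfile.mkdtemp())  # stand-in for the real file store
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachments ("
    " attachment_id INTEGER PRIMARY KEY,"
    " file_name TEXT,"
    " content_type TEXT,"
    " file_path TEXT NOT NULL)"
)


def save_attachment(file_name: str, content_type: str, data: bytes) -> int:
    # Write the file first, under a unique name, then record its path.
    path = storage_dir / f"{uuid.uuid4().hex}.bin"
    path.write_bytes(data)
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO attachments (file_name, content_type, file_path)"
                " VALUES (?, ?, ?)",
                (file_name, content_type, str(path)),
            )
    except sqlite3.Error:
        path.unlink()  # the row was never committed, so remove the file too
        raise
    return cur.lastrowid


def delete_attachment(attachment_id: int) -> None:
    row = conn.execute(
        "SELECT file_path FROM attachments WHERE attachment_id = ?", (attachment_id,)
    ).fetchone()
    if row is None:
        return
    with conn:
        conn.execute("DELETE FROM attachments WHERE attachment_id = ?", (attachment_id,))
    # Remove the file only after the row is gone; if this step fails, the
    # leftover is an orphan file that reconcile() will report.
    Path(row[0]).unlink(missing_ok=True)


def reconcile():
    """Return (orphan_files, dangling_rows): files with no row, rows with no file."""
    recorded = {
        path: attachment_id
        for attachment_id, path in conn.execute(
            "SELECT attachment_id, file_path FROM attachments"
        )
    }
    recorded_paths = set(recorded)
    on_disk = {str(p) for p in storage_dir.iterdir()}
    orphan_files = sorted(on_disk - recorded_paths)
    dangling_rows = sorted(recorded[p] for p in recorded_paths - on_disk)
    return orphan_files, dangling_rows


a = save_attachment("a.txt", "text/plain", b"alpha")
b = save_attachment("b.txt", "text/plain", b"beta")
delete_attachment(a)  # row and file both removed: no drift
# Simulate drift: b's file is deleted behind the database's back,
# and a stray file with no row appears in the store.
(b_path,) = conn.execute(
    "SELECT file_path FROM attachments WHERE attachment_id = ?", (b,)
).fetchone()
Path(b_path).unlink()
(storage_dir / "stray.bin").write_bytes(b"??")
orphans, dangling = reconcile()
print(orphans, dangling)
```

The ordering choice does not remove the need for reconciliation; it only makes the common failure (an unreferenced file) the one that is safe to clean up later.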
In the context of the Introduction to Data Science course, the textbook generally favors keeping large binary data outside the database; BLOB is what you would use to keep it inside. For production binary data, modern object stores (S3, GCS, Azure Blob Storage) are usually the right answer, with the database storing each object’s identifier and metadata.
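The object-store pattern looks much like the filesystem one, except that the path becomes an object key. In the Python sketch below, `InMemoryObjectStore` is a hypothetical stand-in for a real S3, GCS, or Azure client (their actual SDK calls differ), the `attachments/` key prefix is an illustrative naming choice, and sqlite3 again stands in for the application database. The row holds only the key and metadata such as size, so listing files never touches the object store.

```python
import sqlite3
import uuid


class InMemoryObjectStore:
    """Hypothetical stand-in for S3, GCS, or Azure Blob Storage.

    A real client would upload and download over the network; here a dict maps
    object keys to bytes so the sketch runs without cloud credentials.
    """

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = InMemoryObjectStore()
conn = sqlite3.connect(":memory:")
# The database keeps only the object's identifier and metadata, never the bytes.
conn.execute(
    "CREATE TABLE attachments ("
    " attachment_id INTEGER PRIMARY KEY,"
    " object_key TEXT NOT NULL UNIQUE,"
    " file_name TEXT,"
    " content_type TEXT,"
    " size_bytes INTEGER)"
)


def save_attachment(file_name: str, content_type: str, data: bytes) -> int:
    key = f"attachments/{uuid.uuid4().hex}"  # hypothetical key naming scheme
    store.put(key, data)  # upload first, so a committed row never points at nothing
    with conn:
        cur = conn.execute(
            "INSERT INTO attachments (object_key, file_name, content_type, size_bytes)"
            " VALUES (?, ?, ?, ?)",
            (key, file_name, content_type, len(data)),
        )
    return cur.lastrowid


def load_attachment(attachment_id: int) -> bytes:
    (key,) = conn.execute(
        "SELECT object_key FROM attachments WHERE attachment_id = ?", (attachment_id,)
    ).fetchone()
    return store.get(key)


jpeg_like = b"\xff\xd8\xff" + b"\x00" * 10  # fake image bytes
photo_id = save_attachment("cat.jpg", "image/jpeg", jpeg_like)
rows = conn.execute(
    "SELECT file_name, content_type, size_bytes FROM attachments"
).fetchall()
print(rows)
```

As with files on disk, an upload that succeeds followed by a failed insert leaves an unreferenced object, so some periodic cleanup of orphans is still needed.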