Dive Deep into IMDB: SQL Analysis of Top Movies with Python & DuckDB
Want to explore the world of cinema? This project provides a fun and insightful SQL analysis of top-rated IMDB movies using Python, DuckDB, and Quarto. Discover hidden trends and fascinating facts about your favorite films!
Why Analyze IMDB Movie Data with SQL?
Uncover hidden insights that go beyond simple ratings. SQL allows you to:
- Filter and sort movies based on specific criteria (genre, year, director).
- Calculate averages for scores, runtime, and more.
- Identify correlations between different movie attributes.
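The three kinds of query listed above can be sketched on a tiny in-memory table. This is only an illustration: it uses Python's built-in sqlite3 module in place of DuckDB so that it runs without extra installs, the rows are invented placeholders rather than real IMDB data, and `pearson` is a hypothetical helper written for this example. The SQL itself is standard and should carry over to DuckDB largely unchanged.

```python
import math
import sqlite3

# Placeholder rows, invented purely for illustration (not real IMDB data).
ROWS = [
    ("Film A", "Director X", 1994, 142, 9.0, 80.0),
    ("Film B", "Director Y", 2008, 152, 8.5, 84.0),
    ("Film C", "Director X", 1972, 175, 8.0, 100.0),
    ("Film D", "Director Z", 2010, 148, 7.5, 74.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (title TEXT, director TEXT, year INTEGER, "
    "runtime INTEGER, imdb_score REAL, metacritic_score REAL)"
)
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# 1. Filter and sort: films released from 1990 onward, best score first.
recent = conn.execute(
    "SELECT title, imdb_score FROM movies "
    "WHERE year >= 1990 ORDER BY imdb_score DESC"
).fetchall()

# 2. Averages: mean score and mean runtime per director.
per_director = conn.execute(
    "SELECT director, AVG(imdb_score), AVG(runtime) FROM movies "
    "GROUP BY director ORDER BY director"
).fetchall()


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# 3. Correlation: IMDB score against Metacritic score, skipping missing values.
pairs = conn.execute(
    "SELECT imdb_score, metacritic_score FROM movies "
    "WHERE imdb_score IS NOT NULL AND metacritic_score IS NOT NULL"
).fetchall()
r = pearson([p[0] for p in pairs], [p[1] for p in pairs])

print(recent)
print(per_director)
print(round(r, 3))
```

With only four placeholder rows the correlation value means nothing; the point is the shape of each query.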
Project Features: Your Toolkit for Cinematic Discovery
This project isn't just about running queries; it's about making data accessible and engaging:
- SQL-Powered Analysis: Harness the power of SQL to delve into movie data.
- Interactive HTML Tables: Explore results in dynamic, sortable tables using 'itables'.
- DuckDB Integration: Benefit from the speed and efficiency of DuckDB for local SQL querying.
- Reproducible Research: Quarto ensures your analysis is well-documented and easily shareable.
Getting Started: Your Adventure into Movie Data Analysis
Ready to explore? Here's how to set up the project:
- Clone the Repository:
- Create a Virtual Environment: Ensure a clean and isolated environment for the project dependencies:
- Dataset: You must place your SQLite database movies.db inside the data/ folder. It should include the following columns: title, director, year, rating, genres, runtime, country, language, imdb_score, imdb_votes, metacritic_score.
- Render the Notebook: Transform your Quarto file into a browsable HTML report:
  Alternatively, open it interactively in Jupyter:
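Before rendering, it can help to confirm that your database really has the columns listed in the Dataset step. The following is a minimal sketch, not part of the project itself: `missing_columns` is a hypothetical helper, it assumes the table is named movies, and it reads the file with Python's built-in sqlite3 module. The demo at the bottom runs against a throwaway database so the snippet is self-contained; in practice you would point it at your own file in the data/ folder.

```python
import os
import sqlite3
import tempfile

# Column names taken from the Dataset step of this README.
EXPECTED_COLUMNS = {
    "title", "director", "year", "rating", "genres", "runtime",
    "country", "language", "imdb_score", "imdb_votes", "metacritic_score",
}


def missing_columns(db_path, table="movies"):
    """Return the expected columns that are absent from `table`, sorted."""
    conn = sqlite3.connect(db_path)
    try:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        found = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    finally:
        conn.close()
    return sorted(EXPECTED_COLUMNS - found)


# Demo on a throwaway database that deliberately lacks most columns.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "movies.db")
    conn = sqlite3.connect(demo_path)
    conn.execute("CREATE TABLE movies (title TEXT, director TEXT, year INTEGER)")
    conn.commit()
    conn.close()
    demo_missing = missing_columns(demo_path)

print(demo_missing)
```

An empty list means every expected column is present. Note that the check also reports every column as missing when the table itself does not exist.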
Diving Deeper: Understanding the Data
The heart of the analysis lies in the movies.db SQLite database. This database fuels your exploration of IMDB movie data, containing a movies table with crucial information, including:
- title: The movie's title.
- director: The director of the movie.
- year: The release year.
- rating: The movie's rating label (e.g., PG-13, R).
- genres: The movie's genres.
- runtime: The movie's duration in minutes.
- country: The country of origin.
- language: The language spoken.
- imdb_score: The IMDB score.
- imdb_votes: The number of IMDB votes.
- metacritic_score: The Metacritic score.
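To make the column list concrete, here is one way the movies table could be declared and queried. Treat it as a sketch under assumptions: the column types are guesses, the genres column is assumed to hold a comma-separated string, the rows are invented placeholders, and Python's built-in sqlite3 module stands in for DuckDB so the snippet runs on its own.

```python
import sqlite3

# Assumed column types; the real movies.db may declare them differently.
SCHEMA = """
CREATE TABLE movies (
    title TEXT, director TEXT, year INTEGER, rating TEXT, genres TEXT,
    runtime INTEGER, country TEXT, language TEXT,
    imdb_score REAL, imdb_votes INTEGER, metacritic_score REAL
)
"""

# Placeholder rows, invented purely for illustration (not real IMDB data).
ROWS = [
    ("Film A", "Director X", 1994, "R", "Drama",
     142, "Country P", "Language M", 9.0, 1000, 80.0),
    ("Film B", "Director Y", 2008, "PG-13", "Action, Drama",
     152, "Country P", "Language M", 8.5, 2000, 84.0),
    ("Film C", "Director Z", 2010, "PG-13", "Action, Sci-Fi",
     148, "Country Q", "Language N", 7.5, 3000, 74.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# Number of films and average IMDB score for each rating label.
by_rating = conn.execute(
    "SELECT rating, COUNT(*), AVG(imdb_score) FROM movies "
    "GROUP BY rating ORDER BY rating"
).fetchall()

# Films whose genres string mentions Drama, oldest first.
dramas = conn.execute(
    "SELECT title FROM movies WHERE genres LIKE '%Drama%' ORDER BY year"
).fetchall()

print(by_rating)
print(dramas)
```

If genres is stored in some other form, the LIKE filter would need to change accordingly.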
License & Contributions
This project is open-source under the MIT License. Feel free to contribute, fork, and adapt it to your own SQL movie database analysis needs!