This project builds and visualizes a knowledge graph of movies, directors, and genres from The Movies Dataset on Kaggle. The data is processed in Python with pandas, and the graph is built with networkx and plotted with matplotlib.
The project was developed and tested on Google Colab: https://colab.research.google.com/drive/1Kp0fB5VcTfnefknErd8omrzohUlwOZeC?usp=sharing
We use the following files from The Movies Dataset on Kaggle:
- movies_metadata.csv
- credits.csv

Together, these files provide movie metadata, including:
- Movie titles
- Genres
- Directors (from crew information)
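In the raw CSVs, the genres column of movies_metadata.csv and the crew column of credits.csv hold strings that look like Python lists of dicts. A minimal sketch of how they can be parsed, assuming that format; the helper names parse_genres and parse_directors are hypothetical and not necessarily what the notebook uses:

```python
import ast


def _parse_list(raw):
    """Safely turn a stringified list of dicts into a Python list.

    Missing or malformed values (NaN, empty strings, bad syntax) become [].
    """
    if not isinstance(raw, str) or not raw.strip():
        return []
    try:
        value = ast.literal_eval(raw)  # evaluates literals only, never arbitrary code
    except (ValueError, SyntaxError, TypeError):
        return []
    return value if isinstance(value, list) else []


def parse_genres(raw):
    """Return the genre names from a stringified genres list."""
    return [g["name"] for g in _parse_list(raw) if isinstance(g, dict) and "name" in g]


def parse_directors(raw):
    """Return the names of crew members whose job is 'Director'."""
    return [
        member["name"]
        for member in _parse_list(raw)
        if isinstance(member, dict) and member.get("job") == "Director" and "name" in member
    ]


# Sample strings in the assumed format
genres_raw = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"
crew_raw = "[{'job': 'Director', 'name': 'John Lasseter'}, {'job': 'Screenplay', 'name': 'Joss Whedon'}]"
print(parse_genres(genres_raw))   # ['Animation', 'Comedy']
print(parse_directors(crew_raw))  # ['John Lasseter']
```

ast.literal_eval is used instead of eval so that a malformed or malicious cell cannot execute code.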
✅ Parses genres and directors from the dataset
✅ Builds triples:
- Movie → has_genre → Genre
- Director → directed → Movie
✅ Keeps a small sample of movies that have at least one director, so the plot stays readable
✅ Builds a directed knowledge graph
✅ Visualizes the graph with nodes colored by type:
- Movies (pink)
- Directors (purple)
- Genres (blue)
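The filtering, triple-building, and graph-building steps above can be sketched with pandas and networkx. This assumes the data has already been merged into one DataFrame with a title column and two list columns, genre_names and directors (hypothetical names); the sample size of 10 and the random seed are illustrative, not taken from the notebook:

```python
import networkx as nx
import pandas as pd


def build_triples(df, n_movies=10, seed=42):
    """Return (subject, relation, object) triples for a small sample of movies."""
    # Keep only movies with at least one known director, then sample a few of them
    with_directors = df[df["directors"].map(len) > 0]
    sample = with_directors.sample(n=min(n_movies, len(with_directors)), random_state=seed)

    triples = []
    for _, row in sample.iterrows():
        for genre in row["genre_names"]:
            triples.append((row["title"], "has_genre", genre))
        for director in row["directors"]:
            triples.append((director, "directed", row["title"]))
    return triples


def build_graph(triples):
    """Build a directed graph whose nodes carry a 'type' attribute."""
    graph = nx.DiGraph()
    for subj, relation, obj in triples:
        if relation == "has_genre":
            graph.add_node(subj, type="movie")
            graph.add_node(obj, type="genre")
        else:  # "directed"
            graph.add_node(subj, type="director")
            graph.add_node(obj, type="movie")
        # Store the relation name on the edge so it can be drawn as a label later
        graph.add_edge(subj, obj, relation=relation)
    return graph


df = pd.DataFrame({
    "title": ["Toy Story", "Heat", "Untitled"],
    "genre_names": [["Animation", "Comedy"], ["Crime"], ["Drama"]],
    "directors": [["John Lasseter"], ["Michael Mann"], []],
})
triples = build_triples(df)
graph = build_graph(triples)
print(graph.number_of_nodes(), graph.number_of_edges())  # 7 5
```

Because nodes are keyed by name alone, a movie whose title matches a director or genre name would collapse into a single node; prefixing node IDs (for example "movie:Heat") is one way to avoid that.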
📁 Movie_Knowledge_Graph/
├── credits.csv
├── Example.png
├── Movie_Knowledge_Graph.ipynb
├── movies_metadata.csv
└── README.md
1️⃣ Download the two CSV files from Kaggle and save them to your computer.
2️⃣ Open Google Colab and upload:
- movies_metadata.csv
- credits.csv
3️⃣ Upload and run the Movie_Knowledge_Graph.ipynb file in Colab.
4️⃣ The notebook will:
- Process and clean the data
- Build the graph
- Visualize it as a plot
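The "Process and clean the data" step might look roughly like the following sketch. It assumes both CSVs share a movie id column named id and that some id values in movies_metadata.csv may not be valid integers, so they are coerced and dropped before merging; the function name load_movies is hypothetical:

```python
import pandas as pd


def load_movies(movies, credits):
    """Merge movie metadata with crew data on the shared 'id' column."""
    movies = movies.copy()
    credits = credits.copy()
    # Coerce ids to numbers; malformed values become NaN and those rows are dropped
    movies["id"] = pd.to_numeric(movies["id"], errors="coerce")
    credits["id"] = pd.to_numeric(credits["id"], errors="coerce")
    movies = movies.dropna(subset=["id"]).astype({"id": "int64"})
    credits = credits.dropna(subset=["id"]).astype({"id": "int64"})
    # Keep only the columns the graph needs, matching rows by id
    return movies[["id", "title", "genres"]].merge(
        credits[["id", "crew"]], on="id", how="inner"
    )


# In Colab these would come from, e.g.:
#   movies = pd.read_csv("movies_metadata.csv", low_memory=False)
#   credits = pd.read_csv("credits.csv")
# Tiny inline frames stand in for the real files here.
movies = pd.DataFrame({
    "id": ["862", "949", "1997-08-20"],
    "title": ["Toy Story", "Heat", "Broken row"],
    "genres": ["[{'id': 16, 'name': 'Animation'}]", "[{'id': 80, 'name': 'Crime'}]", "[]"],
})
credits = pd.DataFrame({
    "id": [862, 949],
    "crew": ["[{'job': 'Director', 'name': 'John Lasseter'}]", "[]"],
})
merged = load_movies(movies, credits)
print(merged[["id", "title"]])
```

An inner merge keeps only movies that appear in both files, which is what the director and genre parsing needs.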
You’ll get a graph like the one in Example.png:
- Pink nodes: movies
- Purple nodes: directors
- Blue nodes: genres
Edges show the has_genre and directed relationships.
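One way to draw the graph with these colors, assuming each node carries a type attribute (movie, director, or genre) and each edge a relation attribute; these attribute names, the output filename, the layout, and the figure size are illustrative choices. The Agg backend is set only so the sketch runs without a display; in Colab you can drop that line and call plt.show() instead of saving:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in Colab
import matplotlib.pyplot as plt
import networkx as nx

# Node type -> color, matching the legend above; unknown types fall back to gray
NODE_COLORS = {"movie": "pink", "director": "purple", "genre": "blue"}


def draw_knowledge_graph(graph, path="knowledge_graph.png"):
    """Draw nodes colored by their 'type' and label edges with their 'relation'."""
    colors = [NODE_COLORS.get(graph.nodes[n].get("type"), "gray") for n in graph.nodes]
    pos = nx.spring_layout(graph, seed=42)  # fixed seed for a reproducible layout
    plt.figure(figsize=(12, 8))
    nx.draw(graph, pos, node_color=colors, with_labels=True, font_size=8, arrows=True)
    nx.draw_networkx_edge_labels(
        graph, pos, edge_labels=nx.get_edge_attributes(graph, "relation"), font_size=7
    )
    plt.savefig(path, bbox_inches="tight")
    plt.close()
    return colors


g = nx.DiGraph()
g.add_node("Heat", type="movie")
g.add_node("Michael Mann", type="director")
g.add_node("Crime", type="genre")
g.add_edge("Michael Mann", "Heat", relation="directed")
g.add_edge("Heat", "Crime", relation="has_genre")
print(draw_knowledge_graph(g))  # ['pink', 'purple', 'blue']
```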
- Google Colab (recommended) or Python 3.x
- Python packages:
  - pandas
  - matplotlib
  - networkx
All packages are already available in Colab!
Pull requests are welcome!
Feel free to open an issue if you have ideas, questions, or improvements.
This project is open-source and free to use under the MIT License.
⭐ If you like it, please give the repo a ⭐ on GitHub!
