Analyzing Trends in Movie Durations on Netflix

This project examines whether movie durations on Netflix are changing over time, drawing on my skills in exploratory data analysis to uncover trends in the entertainment industry.

Project Overview

Objectives:

To test whether movie durations are decreasing, using exploratory data analysis techniques, and to identify factors that could be contributing to changes in movie lengths.

Methods:

  • Utilized Python for data manipulation and analysis, focusing on cleaning and structuring the data for effective analysis.
  • Conducted statistical analyses to understand the central tendencies and distributions within the dataset.
  • Applied visualization techniques to illustrate trends and patterns related to movie durations.
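The statistical summaries listed under Methods can be sketched with pandas' describe() and groupby(). This is a minimal example, not the code used in the project. The five rows are copied from the step 3 output, so the numbers only illustrate the calls. On the full netflix_movies DataFrame the same calls apply unchanged.

```python
import pandas as pd

# Sample rows copied from the step 3 output (title, genre, release_year, duration)
movies = pd.DataFrame({
    "title": ["7:19", "23:59", "9", "21", "122"],
    "genre": ["Dramas", "Horror Movies", "Action", "Dramas", "Horror Movies"],
    "release_year": [2016, 2011, 2009, 2008, 2019],
    "duration": [93, 78, 80, 123, 95],
})

# Central tendency and spread of duration: count, mean, std, min, quartiles, max
summary = movies["duration"].describe()
print(summary)

# Median duration per genre, a simple way to compare distributions between groups
median_by_genre = movies.groupby("genre")["duration"].median()
print(median_by_genre)
```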

Results:

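The analysis found suggestive but inconclusive evidence that movie lengths on Netflix are shortening, because durations vary widely within each release year. Genre appears to play a role, since children's titles and documentaries appear frequently among the shortest movies. Other possible factors, such as production constraints and changing viewer preferences, may also contribute but were not tested directly.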

Conclusion:

This project tested the initial hypothesis, sharpened my analytical skills, and provided insights into the dynamics of content duration on major streaming platforms. Although the results are not conclusive, they have implications for content creators and marketers in the entertainment industry.

The Dataset:

netflix_data

The columns in the dataset are described below:

Column | Description
show_id | The ID of the show
type | Type of title (Movie or TV Show)
title | Title of the show
director | Director of the show
cast | Cast of the show
country | Country of origin
date_added | Date the title was added to Netflix
release_year | Year the title was originally released
duration | Duration in minutes for movies; number of seasons for TV shows
description | Description of the show
genre | Genre of the show
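The distinction between date_added and release_year matters for a trend analysis, because an older film can be added to Netflix years after its release. The sketch below shows one way to measure that gap. It is a minimal example and assumes date_added is stored as "Month day, year" strings, as in the sample output. The five values are copied from the first five rows shown in step 1.

```python
import pandas as pd

# Values copied from the first five rows of the dataset output (step 1)
sample = pd.DataFrame({
    "show_id": ["s1", "s2", "s3", "s4", "s5"],
    "date_added": ["August 14, 2020", "December 23, 2016", "December 20, 2018",
                   "November 16, 2017", "January 1, 2020"],
    "release_year": [2020, 2016, 2011, 2009, 2008],
})

# Parse the "Month day, year" strings into datetimes (assumed format)
sample["date_added"] = pd.to_datetime(sample["date_added"], format="%B %d, %Y")

# Years between the original release and the title's arrival on Netflix
sample["years_until_added"] = sample["date_added"].dt.year - sample["release_year"]
print(sample[["show_id", "release_year", "years_until_added"]])
```

Grouping by release_year therefore describes when films were made, not when Netflix acquired them.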

Python Code and Outputs:

1. Load the CSV file and store as netflix_df.

import pandas as pd

# Load the CSV file and store as netflix_df
netflix_df = pd.read_csv("netflix_data.csv")

# Display the first few rows of the dataframe
print(netflix_df.head())
                    
Output:
show_id | type | title | director | cast | country | date_added | release_year | duration | description | genre
s1 | TV Show | 3% | null | João Miguel, Bianca Comparato, Michel Gomes, Rodolfo Valente, Vaneza Oliveira, Rafael Lozano, Viviane Porto, Mel Fronckowiak, Sergio Mamberti, Zezé Motta, Celso Frateschi | Brazil | August 14, 2020 | 2020 | 4 | In a future where the elite inhabit an island paradise far from the crowded slums, you get one chance to join the 3% saved from squalor. | International TV
s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, Azalia Ortiz, Octavio Michel, Carmen Beato | Mexico | December 23, 2016 | 2016 | 93 | After a devastating earthquake hits Mexico City, trapped survivors from all walks of life wait to be rescued while trying desperately to stay alive. | Dramas
s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence Koh, Tommy Kuan, Josh Lai, Mark Lee, Susan Leong, Benjamin Lim | Singapore | December 20, 2018 | 2011 | 78 | When an army recruit is found dead, his fellow soldiers are forced to confront a terrifying secret that's haunting their jungle island training camp. | Horror Movies
s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly, Christopher Plummer, Crispin Glover, Martin Landau, Fred Tatasciore, Alan Oppenheimer, Tom Kane | United States | November 16, 2017 | 2009 | 80 | In a postapocalyptic world, rag-doll robots hide in fear from dangerous machines out to exterminate them, until a brave newcomer joins the group. | Action
s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aaron Yoo, Liza Lapira, Jacob Pitts, Laurence Fishburne, Jack McGee, Josh Gad, Sam Golzari, Helen Carey, Jack Gilpin | United States | January 1, 2020 | 2008 | 123 | A brilliant group of students become card-counting experts with the intent of swindling millions out of Las Vegas casinos by playing blackjack. | Dramas
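Step 4 compares duration against 60, which works as intended only if read_csv inferred a numeric dtype for that column. The sketch below checks this. It uses a tiny in-memory CSV as a stand-in for netflix_data.csv, with two rows copied from the output above and most columns omitted. The to_numeric call is a defensive step that is an assumption on my part, not something the original data required.

```python
import io
import pandas as pd

# Tiny stand-in for netflix_data.csv; two rows copied from the output above
csv_text = """show_id,type,title,duration
s1,TV Show,3%,4
s2,Movie,7:19,93
"""
df = pd.read_csv(io.StringIO(csv_text))

# Inspect the inferred dtypes before filtering on duration
print(df.dtypes)

# If duration had been read as text, coerce it; unparseable entries would become NaN
df["duration"] = pd.to_numeric(df["duration"], errors="coerce")
```

The TV show row also shows why step 2 matters: its duration of 4 counts seasons, not minutes.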

2. Filter the data to remove TV shows and store as netflix_subset.

# Filter the data to remove TV shows and store as netflix_subset
netflix_subset = netflix_df[netflix_df["type"] != 'TV Show']

# Display the first few rows of the subsetted dataframe
print(netflix_subset.head())
                    
Output:
show_id | type | title | director | cast | country | date_added | release_year | duration | description | genre
s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, Azalia Ortiz, Octavio Michel, Carmen Beato | Mexico | December 23, 2016 | 2016 | 93 | After a devastating earthquake hits Mexico City, trapped survivors from all walks of life wait to be rescued while trying desperately to stay alive. | Dramas
s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence Koh, Tommy Kuan, Josh Lai, Mark Lee, Susan Leong, Benjamin Lim | Singapore | December 20, 2018 | 2011 | 78 | When an army recruit is found dead, his fellow soldiers are forced to confront a terrifying secret that's haunting their jungle island training camp. | Horror Movies
s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly, Christopher Plummer, Crispin Glover, Martin Landau, Fred Tatasciore, Alan Oppenheimer, Tom Kane | United States | November 16, 2017 | 2009 | 80 | In a postapocalyptic world, rag-doll robots hide in fear from dangerous machines out to exterminate them, until a brave newcomer joins the group. | Action
s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aaron Yoo, Liza Lapira, Jacob Pitts, Laurence Fishburne, Jack McGee, Josh Gad, Sam Golzari, Helen Carey, Jack Gilpin | United States | January 1, 2020 | 2008 | 123 | A brilliant group of students become card-counting experts with the intent of swindling millions out of Las Vegas casinos by playing blackjack. | Dramas
s7 | Movie | 122 | Yasir Al Yasiri | Amina Khalil, Ahmed Dawood, Tarek Lotfy, Ahmed El Fishawy, Mahmoud Hijazi, Jihane Khalil, Asmaa Galal, Tara Emad | Egypt | June 1, 2020 | 2019 | 95 | After an awful accident, a couple admitted to a grisly hospital are separated and must find each other to escape — before death finds them. | Horror Movies
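The filter above excludes TV shows rather than selecting movies. The two approaches agree while type contains only "Movie" and "TV Show", but they diverge if a row has a missing or unexpected type. The sketch below shows the difference. The third row, "Mystery Title", is hypothetical and does not come from the dataset.

```python
import pandas as pd

# First two rows from the data; the third is a hypothetical row with a missing type
df = pd.DataFrame({
    "title": ["3%", "7:19", "Mystery Title"],
    "type": ["TV Show", "Movie", None],
})

# Exclusion keeps every row that is not a TV show, including the missing type
not_tv = df[df["type"] != "TV Show"]

# Inclusion keeps only rows explicitly labeled as movies
movies_only = df[df["type"] == "Movie"]

print(not_tv["title"].tolist())
print(movies_only["title"].tolist())
```

Selecting with == "Movie" is the stricter choice when the goal is an analysis of movies only.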

3. Subset the Netflix movie data, keeping only the columns "title", "country", "genre", "release_year", "duration" into a new DataFrame netflix_movies.

# Subset the Netflix movie data, keeping only the columns "title", "country", "genre", "release_year", "duration"
netflix_movies = netflix_subset[['title', 'country', 'genre', 'release_year', 'duration']]

# Display the first few rows of the new dataframe
print(netflix_movies.head())
                    
Output:
title | country | genre | release_year | duration
7:19 | Mexico | Dramas | 2016 | 93
23:59 | Singapore | Horror Movies | 2011 | 78
9 | United States | Action | 2009 | 80
21 | United States | Dramas | 2008 | 123
122 | Egypt | Horror Movies | 2019 | 95
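Steps 2 and 3 can also be combined into a single .loc call that filters rows and selects columns at once. Adding .copy() makes netflix_movies independent of netflix_df, so adding columns to it later will not raise pandas' SettingWithCopyWarning. This is a minimal sketch on three rows copied from step 1. The duration_hours column is hypothetical and is added only to show that the copy is safe to modify.

```python
import pandas as pd

# Three rows copied from the step 1 output (director, cast, etc. omitted)
netflix_df = pd.DataFrame({
    "show_id": ["s1", "s2", "s3"],
    "type": ["TV Show", "Movie", "Movie"],
    "title": ["3%", "7:19", "23:59"],
    "country": ["Brazil", "Mexico", "Singapore"],
    "genre": ["International TV", "Dramas", "Horror Movies"],
    "release_year": [2020, 2016, 2011],
    "duration": [4, 93, 78],
})

cols = ["title", "country", "genre", "release_year", "duration"]

# Row filter and column selection in one .loc call; .copy() detaches the result
netflix_movies = netflix_df.loc[netflix_df["type"] == "Movie", cols].copy()

# Hypothetical helper column: modifying the copy leaves netflix_df untouched
netflix_movies["duration_hours"] = netflix_movies["duration"] / 60
print(netflix_movies)
```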

4. Filter movies that are shorter than 1 hour and save as short_movies.

# Filter movies that are shorter than 1 hour and save as short_movies
short_movies = netflix_movies[netflix_movies['duration'] < 60]

# Display the first 10 rows of the filtered dataframe
print(short_movies.head(10))
                    
Output:
title | country | genre | release_year | duration
#Rucker50 | United States | Documentaries | 2016 | 56
100 Things to do Before High School | United States | Uncategorized | 2014 | 44
13TH: A Conversation with Oprah Winfrey & Ava DuVernay | null | Uncategorized | 2017 | 37
3 Seconds Divorce | Canada | Documentaries | 2018 | 53
A 3 Minute Hug | Mexico | Documentaries | 2019 | 28
A Christmas Special: Miraculous: Tales of Ladybug & Cat Noir | France | Uncategorized | 2016 | 22
A Family Reunion Christmas | United States | Uncategorized | 2019 | 29
A Go! Go! Cory Carson Christmas | United States | Children | 2020 | 22
A Go! Go! Cory Carson Halloween | null | Children | 2020 | 22
A Go! Go! Cory Carson Summer Camp | null | Children | 2020 | 21
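The listing suggests that short movies cluster in a few genres. Counting them makes that observation explicit. The sketch below uses the ten genres and durations listed above. On the full data, comparing short_movies["genre"].value_counts(normalize=True) with the same call on netflix_movies would show which genres are over-represented among short films.

```python
import pandas as pd

# Genres and durations of the ten short movies listed in the step 4 output
short_movies = pd.DataFrame({
    "genre": ["Documentaries", "Uncategorized", "Uncategorized", "Documentaries",
              "Documentaries", "Uncategorized", "Uncategorized", "Children",
              "Children", "Children"],
    "duration": [56, 44, 37, 53, 28, 22, 29, 22, 22, 21],
})

# Number of short movies per genre, sorted from most to least frequent
genre_counts = short_movies["genre"].value_counts()
print(genre_counts)
```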

5. Create a scatter plot for movie duration by release year using genre-based colors.

import matplotlib.pyplot as plt

# Create colors list based on genre
colors = []
for lab, row in netflix_movies.iterrows():
    if row['genre'] == "Children":
        colors.append('yellow')
    elif row['genre'] == "Documentaries":
        colors.append('brown')
    elif row['genre'] == "Stand-Up":
        colors.append('blue')
    else:
        colors.append('grey')
        
# Initialize a matplotlib figure object called fig and create a scatter plot
fig = plt.figure()
plt.scatter(netflix_movies['release_year'], netflix_movies['duration'], color=colors)
plt.xlabel('Release year')
plt.ylabel('Duration (min)')
plt.title('Movie Duration by Year of Release')
plt.show()
                    
Output:
[Figure: scatter plot "Movie Duration by Year of Release"; x-axis: Release year; y-axis: Duration (min); points colored by genre]
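The iterrows() loop in step 5 can be replaced by a dictionary lookup with Series.map(). This version is shorter and faster on large frames. It assigns the same colors, because genres that are missing from the dictionary, or missing altogether, fall back to grey just like the loop's else branch. This is a minimal sketch using five movies copied from the outputs above. The Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Five movies copied from the step 3 and step 4 outputs
movies = pd.DataFrame({
    "genre": ["Dramas", "Children", "Documentaries", "Uncategorized", "Horror Movies"],
    "release_year": [2016, 2020, 2019, 2014, 2011],
    "duration": [93, 22, 28, 44, 78],
})

# Same genre-to-color rules as the loop in step 5
genre_colors = {"Children": "yellow", "Documentaries": "brown", "Stand-Up": "blue"}

# Look up each genre's color; unmapped or missing genres become grey
colors = movies["genre"].map(genre_colors).fillna("grey")

fig, ax = plt.subplots()
ax.scatter(movies["release_year"], movies["duration"], color=colors.tolist())
ax.set_xlabel("Release year")
ax.set_ylabel("Duration (min)")
ax.set_title("Movie Duration by Year of Release")
```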

6. Answer the question "Are we certain that movies are getting shorter?"

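Based on the scatter plot, we cannot be certain that movies are getting shorter. Durations fluctuate widely within each release year, and movies under an hour are concentrated in a few genres (Children, Documentaries, and Uncategorized in the step 4 listing). This suggests that other factors, such as genre and production constraints, may play a significant role in determining movie lengths. A firmer conclusion would need summary statistics or a statistical test, not only a visual check.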
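One way to go beyond eyeballing the scatter plot is to compare the median duration for each decade alongside the number of movies in that decade, because a median drawn from only a few titles is less reliable. This is a minimal sketch, not part of the original analysis. The five movies are the sample rows from step 3, so the output only demonstrates the calls. Applied to the full netflix_movies DataFrame, the same code gives the per-decade picture.

```python
import pandas as pd

# Sample rows from the step 3 output (release_year, duration)
movies = pd.DataFrame({
    "release_year": [2016, 2011, 2009, 2008, 2019],
    "duration": [93, 78, 80, 123, 95],
})

# Bucket release years into decades, e.g. 2016 -> 2010
movies["decade"] = movies["release_year"] // 10 * 10

# Number of movies and median duration for each decade
by_decade = movies.groupby("decade")["duration"].agg(["count", "median"])
print(by_decade)
```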