This data science project investigates how Netflix's content has evolved, using statistical analysis and data visualization to examine temporal patterns and strategic shifts in the streaming entertainment industry.
The objective is to analyze the evolution of Netflix content, in particular how movie duration varies with release year and genre, and to consider what those patterns imply for content strategy in the entertainment industry.
The investigation surfaced temporal patterns in Netflix content duration and points to relationships between content length, genre, and release year that can inform content decisions.
This analytical framework demonstrates the application of data science methodologies to entertainment industry challenges, providing actionable insights for content strategy optimization and stakeholder decision-making in the streaming media landscape.
The dataset used for this analysis contains the following columns:
| Column | Description |
|---|---|
| show_id | The ID of the show |
| type | Type of show |
| title | Title of the show |
| director | Director of the show |
| cast | Cast of the show |
| country | Country of origin |
| date_added | Date added to Netflix |
| release_year | Year the title was released |
| duration | Duration in minutes (for TV shows, the value appears to be a season count instead) |
| description | Description of the show |
| genre | Show genre |
import pandas as pd
# Load the CSV file and store as netflix_df
netflix_df = pd.read_csv("netflix_data.csv")
# Display the first few rows of the dataframe
print(netflix_df.head())
| show_id | type | title | director | cast | country | date_added | release_year | duration | description | genre |
|---|---|---|---|---|---|---|---|---|---|---|
| s1 | TV Show | 3% | null | João Miguel, Bianca Comparato, Michel Gomes, Rodolfo Valente, Vaneza Oliveira, Rafael Lozano, Viviane Porto, Mel Fronckowiak, Sergio Mamberti, Zezé Motta, Celso Frateschi | Brazil | August 14, 2020 | 2020 | 4 | In a future where the elite inhabit an island paradise far from the crowded slums, you get one chance to join the 3% saved from squalor. | International TV |
| s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, Azalia Ortiz, Octavio Michel, Carmen Beato | Mexico | December 23, 2016 | 2016 | 93 | After a devastating earthquake hits Mexico City, trapped survivors from all walks of life wait to be rescued while trying desperately to stay alive. | Dramas |
| s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence Koh, Tommy Kuan, Josh Lai, Mark Lee, Susan Leong, Benjamin Lim | Singapore | December 20, 2018 | 2011 | 78 | When an army recruit is found dead, his fellow soldiers are forced to confront a terrifying secret that's haunting their jungle island training camp. | Horror Movies |
| s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly, Christopher Plummer, Crispin Glover, Martin Landau, Fred Tatasciore, Alan Oppenheimer, Tom Kane | United States | November 16, 2017 | 2009 | 80 | In a postapocalyptic world, rag-doll robots hide in fear from dangerous machines out to exterminate them, until a brave newcomer joins the group. | Action |
| s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aaron Yoo, Liza Lapira, Jacob Pitts, Laurence Fishburne, Jack McGee, Josh Gad, Sam Golzari, Helen Carey, Jack Gilpin | United States | January 1, 2020 | 2008 | 123 | A brilliant group of students become card-counting experts with the intent of swindling millions out of Las Vegas casinos by playing blackjack. | Dramas |
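The sample rows show that `date_added` is stored as text and that it can differ from `release_year`. The sketch below, built on three rows copied from the output above rather than the full file, shows one way to parse the date and compare the two. The `years_to_netflix` column is a hypothetical helper name, and the `"%B %d, %Y"` format is an assumption based on the rows shown.

```python
import pandas as pd

# Three rows copied from the head() output above; not the full dataset
sample = pd.DataFrame({
    "title": ["3%", "7:19", "23:59"],
    "date_added": ["August 14, 2020", "December 23, 2016", "December 20, 2018"],
    "release_year": [2020, 2016, 2011],
})

# Parse "Month day, year" strings; errors="coerce" turns unparseable values into NaT
sample["date_added"] = pd.to_datetime(
    sample["date_added"], format="%B %d, %Y", errors="coerce"
)

# Hypothetical helper column: years between release and addition to the catalog
sample["years_to_netflix"] = sample["date_added"].dt.year - sample["release_year"]
print(sample[["title", "release_year", "years_to_netflix"]])
```

On the real data, the same two statements could be applied to `netflix_df` directly, provided every `date_added` value follows this format.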
# Filter the data to remove TV shows and store as netflix_subset
netflix_subset = netflix_df[netflix_df["type"] != 'TV Show']
# Display the first few rows of the subsetted dataframe
print(netflix_subset.head())
| show_id | type | title | director | cast | country | date_added | release_year | duration | description | genre |
|---|---|---|---|---|---|---|---|---|---|---|
| s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, Azalia Ortiz, Octavio Michel, Carmen Beato | Mexico | December 23, 2016 | 2016 | 93 | After a devastating earthquake hits Mexico City, trapped survivors from all walks of life wait to be rescued while trying desperately to stay alive. | Dramas |
| s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence Koh, Tommy Kuan, Josh Lai, Mark Lee, Susan Leong, Benjamin Lim | Singapore | December 20, 2018 | 2011 | 78 | When an army recruit is found dead, his fellow soldiers are forced to confront a terrifying secret that's haunting their jungle island training camp. | Horror Movies |
| s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly, Christopher Plummer, Crispin Glover, Martin Landau, Fred Tatasciore, Alan Oppenheimer, Tom Kane | United States | November 16, 2017 | 2009 | 80 | In a postapocalyptic world, rag-doll robots hide in fear from dangerous machines out to exterminate them, until a brave newcomer joins the group. | Action |
| s5 | Movie | 21 | Robert Luketic | Jim Sturgess, Kevin Spacey, Kate Bosworth, Aaron Yoo, Liza Lapira, Jacob Pitts, Laurence Fishburne, Jack McGee, Josh Gad, Sam Golzari, Helen Carey, Jack Gilpin | United States | January 1, 2020 | 2008 | 123 | A brilliant group of students become card-counting experts with the intent of swindling millions out of Las Vegas casinos by playing blackjack. | Dramas |
| s7 | Movie | 122 | Yasir Al Yasiri | Amina Khalil, Ahmed Dawood, Tarek Lotfy, Ahmed El Fishawy, Mahmoud Hijazi, Jihane Khalil, Asmaa Galal, Tara Emad | Egypt | June 1, 2020 | 2019 | 95 | After an awful accident, a couple admitted to a grisly hospital are separated and must find each other to escape — before death finds them. | Horror Movies |
# Subset the Netflix movie data, keeping only the columns "title", "country", "genre", "release_year", "duration"
netflix_movies = netflix_subset[['title', 'country', 'genre', 'release_year', 'duration']]
# Display the first few rows of the new dataframe
print(netflix_movies.head())
| title | country | genre | release_year | duration |
|---|---|---|---|---|
| 7:19 | Mexico | Dramas | 2016 | 93 |
| 23:59 | Singapore | Horror Movies | 2011 | 78 |
| 9 | United States | Action | 2009 | 80 |
| 21 | United States | Dramas | 2008 | 123 |
| 122 | Egypt | Horror Movies | 2019 | 95 |
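The row filter and the column selection above can also be combined into a single `.loc` call. This is only a sketch of an alternative, using three rows copied from the earlier output as stand-in data; the `.copy()` is an optional addition that avoids pandas warnings if the result is modified later.

```python
import pandas as pd

# Stand-in data: three rows copied from the head() output above
netflix_df = pd.DataFrame({
    "type": ["TV Show", "Movie", "Movie"],
    "title": ["3%", "7:19", "23:59"],
    "country": ["Brazil", "Mexico", "Singapore"],
    "genre": ["International TV", "Dramas", "Horror Movies"],
    "release_year": [2020, 2016, 2011],
    "duration": [4, 93, 78],
})

columns = ["title", "country", "genre", "release_year", "duration"]

# Select rows (not TV shows) and columns in one step, then take an independent copy
netflix_movies = netflix_df.loc[netflix_df["type"] != "TV Show", columns].copy()
print(netflix_movies)
```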
# Filter movies that are shorter than 1 hour and save as short_movies
short_movies = netflix_movies[netflix_movies['duration']<60]
# Display the first ten rows of the filtered dataframe
print(short_movies.head(10))
| title | country | genre | release_year | duration |
|---|---|---|---|---|
| #Rucker50 | United States | Documentaries | 2016 | 56 |
| 100 Things to do Before High School | United States | Uncategorized | 2014 | 44 |
| 13TH: A Conversation with Oprah Winfrey & Ava DuVernay | null | Uncategorized | 2017 | 37 |
| 3 Seconds Divorce | Canada | Documentaries | 2018 | 53 |
| A 3 Minute Hug | Mexico | Documentaries | 2019 | 28 |
| A Christmas Special: Miraculous: Tales of Ladybug & Cat Noir | France | Uncategorized | 2016 | 22 |
| A Family Reunion Christmas | United States | Uncategorized | 2019 | 29 |
| A Go! Go! Cory Carson Christmas | United States | Children | 2020 | 22 |
| A Go! Go! Cory Carson Halloween | null | Children | 2020 | 22 |
| A Go! Go! Cory Carson Summer Camp | null | Children | 2020 | 21 |
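To see which genres account for the short titles, the rows can be counted and summarized by genre. The sketch below uses only the ten rows shown above, so the numbers describe this sample and not the full dataset; on the real data the same calls would be made on `short_movies`.

```python
import pandas as pd

# The ten rows shown above (genre and duration only); not the full dataset
short_sample = pd.DataFrame({
    "genre": [
        "Documentaries", "Uncategorized", "Uncategorized", "Documentaries",
        "Documentaries", "Uncategorized", "Uncategorized", "Children",
        "Children", "Children",
    ],
    "duration": [56, 44, 37, 53, 28, 22, 29, 22, 22, 21],
})

# Number of short titles per genre, most frequent first
genre_counts = short_sample["genre"].value_counts()
print(genre_counts)

# Median duration in minutes per genre
median_by_genre = short_sample.groupby("genre")["duration"].median()
print(median_by_genre)
```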
import matplotlib.pyplot as plt
# Create colors list based on genre
colors = []
for lab, row in netflix_movies.iterrows():
    if row['genre'] == "Children":
        colors.append('yellow')
    elif row['genre'] == "Documentaries":
        colors.append('brown')
    elif row['genre'] == "Stand-Up":
        colors.append('blue')
    else:
        colors.append('grey')
# Initialize a matplotlib figure object called fig and create a scatter plot
fig = plt.figure()
plt.scatter(netflix_movies['release_year'], netflix_movies['duration'], color = colors)
plt.xlabel('Release year')
plt.ylabel('Duration (min)')
plt.title('Movie Duration by Year of Release')
plt.show()
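The color list can also be built without an explicit loop. The sketch below is an alternative, not the approach used above: it maps the same three genres to the same colors with `Series.map` and fills every other genre with grey. The `genre_colors` dictionary and the five-row sample are illustrative names and data.

```python
import pandas as pd

# Same genre-to-color assignments as the loop above
genre_colors = {"Children": "yellow", "Documentaries": "brown", "Stand-Up": "blue"}

# Small illustrative sample; on the real data this would be netflix_movies
sample = pd.DataFrame(
    {"genre": ["Dramas", "Children", "Documentaries", "Stand-Up", "Horror Movies"]}
)

# map() returns NaN for genres missing from the dictionary; fillna() sets those to grey
colors = sample["genre"].map(genre_colors).fillna("grey").tolist()
print(colors)
```

The resulting list can be passed to `plt.scatter` in the same way as the list built by the loop.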
The scatter plot does not clearly show that movies are getting consistently shorter over time. Durations fluctuate widely across release years, which suggests that other factors, such as genre and production constraints, may play a significant role in determining movie length.
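A visual impression like this can be checked numerically by summarizing duration per release year and computing a correlation. The sketch below is illustrative only: it uses ten rows copied from the outputs above, which are far too few to support a conclusion, and the names `yearly` and `r` are assumptions. On the real data the same calls would be made on `netflix_movies`.

```python
import pandas as pd

# Ten rows copied from the outputs above; illustrative, not representative
movies_sample = pd.DataFrame({
    "release_year": [2016, 2011, 2009, 2008, 2019, 2016, 2014, 2017, 2018, 2019],
    "duration": [93, 78, 80, 123, 95, 56, 44, 37, 53, 28],
})

# Median duration and number of titles for each release year
yearly = movies_sample.groupby("release_year")["duration"].agg(
    median_duration="median", n_titles="count"
)
print(yearly)

# Pearson correlation between release year and duration (pandas default method)
r = movies_sample["release_year"].corr(movies_sample["duration"])
print(f"Correlation between release year and duration: {r:.2f}")
```

A per-year median is less sensitive to a handful of very short or very long titles than the raw scatter, which is why it is paired with the title count here.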