BCG Customer Churn Analysis
Complete end-to-end data science project developing customer churn prediction models with advanced feature engineering and Random Forest optimization.
View ProjectThis analytical framework applies data science methodologies to evaluate NYC public school SAT performance data, using Python-based analysis to identify excellence patterns and provide evidence-based insights for educational stakeholders and policy decision-making.
To apply data science techniques to NYC public school SAT performance data, developing analytical frameworks that identify performance patterns and support evidence-based educational policy and resource allocation decisions.
The analytical dataset contains comprehensive SAT performance metrics across NYC public schools, including subject-specific scores and geographic classifications enabling data-driven educational insights.
This data science framework demonstrates the application of analytical methodologies to educational assessment data, providing evidence-based insights that support strategic decision-making in public education policy, resource optimization, and academic program development.
import pandas as pd
# Importing the data
schools = pd.read_csv("schools.csv")
# Top Schools for Math Performance
best_math_schools = schools[["school_name", "average_math"]]
best_math_schools = best_math_schools[best_math_schools["average_math"] >= 800 * 0.8].sort_values("average_math", ascending=False)
print(best_math_schools.head())
| School Name | Average Math Score |
|---|---|
| Stuyvesant High School | 754 |
| Bronx High School of Science | 714 |
| Staten Island Technical High School | 711 |
| Queens High School for the Sciences at York College | 701 |
| High School for Mathematics, Science, and Engineering at City College | 683 |
# Creating a column for total SAT scores
schools["total_SAT"] = schools["average_math"] + schools["average_writing"] + schools["average_reading"]
top_10_schools = schools[["school_name", "total_SAT"]].sort_values("total_SAT", ascending=False).head(10)
print(top_10_schools)
| School Name | Total SAT Score |
|---|---|
| Stuyvesant High School | 2144 |
| Bronx High School of Science | 2041 |
| Staten Island Technical High School | 2041 |
| High School of American Studies at Lehman College | 2013 |
| Townsend Harris High School | 1981 |
| Queens High School for the Sciences at York College | 1947 |
| Bard High School Early College | 1914 |
| Brooklyn Technical High School | 1896 |
| Eleanor Roosevelt High School | 1889 |
| High School for Mathematics, Science, and Engineering at City College | 1889 |
# Calculating the standard deviation for each borough
largest_std_dev = schools.groupby("borough")["total_SAT"].agg(["count", "mean", "std"]).round(2).rename(columns={"count": "num_schools", "mean": "average_SAT", "std": "std_SAT"}).sort_values("std_SAT", ascending=False).head(1)
largest_std_dev.reset_index(inplace=True)
print(largest_std_dev)
| Borough | Number of Schools | Average SAT Score | Standard Deviation of SAT Scores |
|---|---|---|---|
| Manhattan | 89 | 1340.13 | 230.29 |
Explore more data science projects demonstrating end-to-end analytical workflows and advanced visualization techniques.
Complete end-to-end data science project developing customer churn prediction models with advanced feature engineering and Random Forest optimization.
View Project
Interactive business intelligence platform built with Tableau for pandemic monitoring and epidemiological pattern analysis.
View Project