Educational Data Science: NYC School Performance Analysis

This project applies Python-based data analysis to SAT performance data for NYC public schools. It identifies the top performers in math and in combined scores, and measures how much results vary within each borough, to provide evidence-based insights for educational stakeholders and policy decision-making.

Project Overview

Problem/Goal:

To apply data science techniques to NYC public school SAT performance data, developing analytical frameworks that identify performance patterns and support evidence-based educational policy and resource allocation decisions.

Data Source:

schools_data

The dataset contains average SAT scores for NYC public schools, one row per school: the school name (school_name), its borough (borough), and the average math, reading, and writing section scores (average_math, average_reading, average_writing).
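Before running the analyses, it can help to confirm that the data has the columns they rely on and to see where scores are missing. The sketch below is a minimal illustration under stated assumptions: the column names come from this project's code, but the helper name check_schools and the three-row sample frame are hypothetical, used only so the snippet runs without schools.csv.

```python
import pandas as pd

# Columns referenced by the three analyses in this project
REQUIRED_COLUMNS = ["school_name", "borough", "average_math", "average_reading", "average_writing"]


def check_schools(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: raise if expected columns are missing,
    otherwise return the count of missing values per score column."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # The last three required columns are the section scores
    return df[REQUIRED_COLUMNS[2:]].isna().sum()


# Hypothetical sample rows (not real results); in practice use pd.read_csv("schools.csv")
sample = pd.DataFrame({
    "school_name": ["School A", "School B", "School C"],
    "borough": ["Manhattan", "Bronx", "Queens"],
    "average_math": [700, 550, None],
    "average_reading": [680, 520, 600],
    "average_writing": [690, 510, 590],
})

print(check_schools(sample))
```

Missing section scores matter later: adding the three sections with + yields NaN for any school missing one of them.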

Key Analytical Insights:

  • Performance Excellence Identification: Schools averaging at least 640 in math (80% of the 800-point section maximum) are identified; Stuyvesant High School leads with an average of 754.
  • Comprehensive Ranking Framework: Schools are ranked by combined SAT score (math + reading + writing); Stuyvesant High School again ranks first, at 2144.
  • Geographic Performance Distribution Analysis: Variability in combined scores is compared across the five boroughs; Manhattan shows the largest standard deviation (230.29 across 89 schools), which can inform resource allocation strategies.

Conclusion:

This project shows how straightforward analytical steps (filtering, ranking, and grouped summary statistics) applied to educational assessment data can produce evidence that supports strategic decision-making in public education policy, resource allocation, and academic program development.

Python Code and Outputs:

1. Which NYC schools have the best math results?


import pandas as pd

# Importing the data
schools = pd.read_csv("schools.csv")

# Top schools for math performance:
# keep schools averaging at least 80% of the 800-point maximum (640),
# then sort from highest to lowest score
best_math_schools = schools[["school_name", "average_math"]]
best_math_schools = best_math_schools[best_math_schools["average_math"] >= 800 * 0.8].sort_values("average_math", ascending=False)

# Show the five highest-scoring schools
print(best_math_schools.head())
Output:
School Name | Average Math Score
Stuyvesant High School | 754
Bronx High School of Science | 714
Staten Island Technical High School | 711
Queens High School for the Sciences at York College | 701
High School for Mathematics, Science, and Engineering at City College | 683
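The filter above keeps schools whose average math score is at least 80% of the 800-point section maximum, i.e. 0.8 × 800 = 640. A minimal sketch that makes the maximum score, the share, and the number of rows explicit parameters; the function name top_scorers and its defaults are hypothetical, and the sample rows are placeholders rather than real scores.

```python
import pandas as pd


def top_scorers(df, column="average_math", max_score=800, share=0.8, n=5):
    """Hypothetical helper: return the n highest-scoring schools
    whose score is at or above share * max_score."""
    threshold = max_score * share  # 640 with the defaults
    above = df.loc[df[column] >= threshold, ["school_name", column]]
    return above.sort_values(column, ascending=False).head(n)


# Placeholder data for illustration only
sample = pd.DataFrame({
    "school_name": ["School A", "School B", "School C", "School D"],
    "average_math": [720, 640, 639, 660],
})

# School C (639) falls just below the 640 cut-off and is excluded
print(top_scorers(sample))
```

Because the comparison is >=, a school scoring exactly at the threshold (School B here) is kept.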

2. What are the top 10 performing schools based on the combined SAT scores?


# Creating a column for total SAT scores (sum of the three 800-point sections)
schools["total_SAT"] = schools["average_math"] + schools["average_writing"] + schools["average_reading"]

# Rank schools by combined score and keep the top 10
top_10_schools = schools[["school_name", "total_SAT"]].sort_values("total_SAT", ascending=False).head(10)
print(top_10_schools)
Output:
School Name | Total SAT Score
Stuyvesant High School | 2144
Bronx High School of Science | 2041
Staten Island Technical High School | 2041
High School of American Studies at Lehman College | 2013
Townsend Harris High School | 1981
Queens High School for the Sciences at York College | 1947
Bard High School Early College | 1914
Brooklyn Technical High School | 1896
Eleanor Roosevelt High School | 1889
High School for Mathematics, Science, and Engineering at City College | 1889
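The top-10 list contains ties (2041 for Bronx High School of Science and Staten Island Technical High School, and 1889 for the last two schools), so row position alone does not convey rank. A minimal sketch using pandas Series.rank with method="min", which gives tied schools the same rank; the scores below are placeholders, not the project's results.

```python
import pandas as pd

# Placeholder totals for illustration only
totals = pd.DataFrame({
    "school_name": ["School A", "School B", "School C", "School D"],
    "total_SAT": [2100, 2000, 2000, 1900],
})

# method="min": tied schools share the lowest rank of their group,
# and the next school skips ahead (ranks 1, 2, 2, 4)
totals["rank"] = totals["total_SAT"].rank(method="min", ascending=False).astype(int)
print(totals.sort_values("rank"))
```

The astype(int) conversion assumes no missing totals; with NaN values, rank leaves those rows unranked (NaN) by default, so the cast would fail.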

3. Which single borough has the largest standard deviation in the combined SAT score?


# Per-borough count, mean, and standard deviation of combined SAT scores,
# keeping only the borough with the largest standard deviation
largest_std_dev = (schools.groupby("borough")["total_SAT"].agg(["count", "mean", "std"]).round(2)
    .rename(columns={"count": "num_schools", "mean": "average_SAT", "std": "std_SAT"})
    .sort_values("std_SAT", ascending=False).head(1))
largest_std_dev.reset_index(inplace=True)
print(largest_std_dev)
Output:
Borough | Number of Schools | Average SAT Score | Standard Deviation of SAT Scores
Manhattan | 89 | 1340.13 | 230.29
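By default, pandas computes std as the sample standard deviation (ddof=1), s = sqrt(Σ(x − mean)² / (n − 1)), separately for each borough. When only the borough's name is needed, calling idxmax on the grouped result returns it directly without sorting. A minimal sketch on placeholder data (not the project's results):

```python
import pandas as pd

# Placeholder totals for illustration only
df = pd.DataFrame({
    "borough": ["Manhattan", "Manhattan", "Manhattan", "Bronx", "Bronx"],
    "total_SAT": [1200, 1500, 1800, 1300, 1400],
})

# Sample standard deviation (ddof=1) of total_SAT within each borough
std_by_borough = df.groupby("borough")["total_SAT"].std()

# Label of the borough with the largest spread
print(std_by_borough.idxmax())  # Manhattan
```

Here Manhattan's deviations from its mean of 1500 are −300, 0 and 300, so s = sqrt(180000 / 2) = 300, well above the Bronx's spread of about 70.7.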

Related Projects

Explore more data science projects demonstrating end-to-end analytical workflows and advanced visualization techniques.

BCG Customer Churn Analysis

Complete end-to-end data science project developing customer churn prediction models with advanced feature engineering and Random Forest optimization.

Public Health Dashboard: COVID-19 Data Visualization

Interactive business intelligence platform built with Tableau for pandemic monitoring and epidemiological pattern analysis.