COVID-19 Project Showcase

Summary:

This COVID-19 project demonstrates my data science expertise by leveraging SQL and Tableau for epidemiological analysis. I engineered SQL queries using Google BigQuery to transform raw health data into analytical frameworks, then created interactive Tableau dashboards for real-time pandemic monitoring. Key insights from January 2023 reveal that approximately 1% of infected global population succumbed to COVID-19, with Europe showing the highest infection and mortality rates. The project showcases my proficiency in data engineering, business intelligence, and translating complex health data into actionable epidemiological intelligence.

Author

Table of Contents

Problem

This epidemiological case study applies data science methodologies to COVID-19 infection and mortality data. The objective is to develop predictive insights into viral spread patterns and demographic trends to support evidence-based public health policy decisions. Raw datasets are transformed through SQL data engineering, then visualized using business intelligence tools to enable data-driven pandemic response strategies.

Data Source

https://ourworldindata.org/covid-deaths

Methods

  • Google BigQuery SQL was employed to engineer complex data pipelines and derive 4 analytical frameworks for epidemiological analysis
  • Tableau business intelligence platform was utilized to transform SQL outputs into interactive dashboards for stakeholder decision support

Data Cleaning

Table 1:

Wrote a query to calculate the death percentage on the global scale by summing up all the 'new cases' column and 'new deaths' column from the entire world and then calculating the percentage of deaths from the cases. I specified WHERE continent IS NOT NULL because every country belongs to a continent, So by specifying WHERE continent IS NOT NULL will ensure that only the rows for the countries that have a continent specified will be included in the final results. This will give accurate results for the death percentage on the global scale as it will not consider the rows where the continent is not specified.

--1. compare the overall deaths to cases ratio for the entire world
SELECT 
  SUM(new_cases) as total_cases,
  SUM(new_deaths) as total_deaths,
  (SUM(new_deaths)/SUM(new_cases))*100 AS death_percentage
FROM
  `sql-portfolio-project-375703.Portfolio_Project.Covid Deaths `
WHERE 
  continent IS NOT NULL
ORDER BY
  1,2
image

Table 2:

Wrote a query to pull the total death count for each continent while filtering out specific values in the 'location' column ('income'(upper, middle, lower class), 'World', 'European Union' and 'International') that is displayed in the 'location' column. This query gives the total death count for continents and makes sure that the rows with 'income'(upper, middle, lower class), 'World', 'European Union' and 'International' are not considered in the final results.

--2. Pull the 'total_death_count' for each continent in the location column while filtering out the income (upper, middle, lower class), World, European Union, and International rows that's displayed in the location column
SELECT 
 location,
 SUM(new_deaths) AS total_death_count
FROM 
 `sql-portfolio-project-375703.Portfolio_Project.Covid Deaths `
WHERE
 continent IS NULL AND 
 location NOT LIKE '%income%' AND 
 location NOT LIKE '%World%' AND
 location NOT LIKE '%European Union%' AND
 location NOT LIKE '%International%' 
GROUP BY
 location
ORDER BY 
 total_death_count DESC
image

Table 3:

Wrote a query to pull the data for the countries with the highest infection rate per population. I then computed the infection percent per population by dividing the total cases by population and multiplying the result by 100. It filters the rows where the location contains 'income' and groups the results by location and population and orders the results by the highest infection percent per population.

--3. Countries with highest infection rate per population
SELECT
 location,
 population,
 MAX(total_cases) AS highest_infection_count,
 MAX((total_cases/population)) * 100 AS infection_percent_per_population
FROM
 `sql-portfolio-project-375703.Portfolio_Project.Covid Deaths `
WHERE
 location NOT LIKE '%income%'
GROUP BY
 location,
 population
ORDER BY
 infection_percent_per_population DESC
image

Table 4:

This last query is similiar to the previous one but it also includes the 'date' column in the SELECT statement and in the GROUP BY clause. It is pulling the data for the countries with the highest infection rate per population grouped by date. It is computing the infection percent per population by dividing the total cases by population and multiplying the result by 100. It filters the rows where the location contains 'income' and groups the results by location, population, and date and orders the results by the highest infection percent per population.

--4. Countries with highest infection rate per population grouped by date 
SELECT
 location,
 population,
 date,
 MAX(total_cases) AS highest_infection_count,
 MAX((total_cases/population)) * 100 AS infection_percent_per_population
FROM
 `sql-portfolio-project-375703.Portfolio_Project.Covid Deaths `
WHERE
 location NOT LIKE '%income%'
GROUP BY
 location,
 population,
 date
ORDER BY
 infection_percent_per_population DESC
image

Data Visualization

Table 1:

image

Table 2:

image

Table 3:

image

Table 4

image

Conclusion

Critical epidemiological insights from this data science analysis (as of January 2023) reveal that approximately 1% of the global population infected with COVID-19 succumbed to the virus, with Europe demonstrating the highest infection rates and mortality burden worldwide. These findings enable evidence-based decision-making for public health policymakers and healthcare systems. The analytical framework remains scalable for ongoing pandemic monitoring and response optimization, demonstrating the power of data science in transforming health data into life-saving intelligence.

Solutions and Next Steps

Based on the data findings vizualized by the Tableau Dashboard, here are 5 solutions that can be proposed to address the COVID-19 pandemic:

  1. Increased Vaccination Efforts in Europe: Europe has the highest death count among the continents, and many countries in Europe have high infection rates. By increasing vaccination efforts in Europe, we can protect the population and potentially reduce the death count.
  2. Targeted Testing and Contact Tracing in High-Risk Countries: Cyprus and South Korea have some of the highest infection rates in the world. By targeting testing and contact tracing efforts in these countries, we can identify and isolate infected individuals more effectively, slowing the spread of the virus.
  3. Hospital preparedness: Since Europe has high death rate, it's important to ensure that hospitals and health care systems in Europe are prepared to handle the influx of patients. This includes having enough beds, oxygen, and other resources to provide adequate care for those who are critically ill.
  4. Public awareness campaigns : Public awareness campaigns can be targeted in Europe, Cyprus and South Korea to educate people about the importance of vaccination, testing, and contact tracing. This can help increase compliance with public health guidelines and slow the spread of the virus.
  5. Remote work: Encourage remote work in Europe, Cyprus and South Korea, to reduce the spread of the virus.

Project Link

Related Projects

Explore more data science projects demonstrating end-to-end analytical workflows and advanced visualization techniques.

COVID-19 Dashboard

Public Health Dashboard: COVID-19 Data Visualization

Interactive business intelligence platform built with Tableau for pandemic monitoring and epidemiological pattern analysis.

View Project
BCG Logo

BCG Customer Churn Analysis

Complete end-to-end data science project developing customer churn prediction models with advanced feature engineering and Random Forest optimization.

View Project