April 2023 | Explore and Clean Data with SQL

In this project I practiced SQL for data cleaning and initial exploration using SQL Server Management Studio. Starting with an untidy dataset, I reviewed the table structure; identified column definitions, data types, and constraints; checked for missing and erroneous data; and produced summary statistics: ranges and measures of central tendency for continuous variables, and category counts for categorical variables. I cleaned the dataset by converting dates, splitting text fields, removing duplicates, and recoding categorical variables. Finally, I created a view containing a subset of records that may be better suited to tasks such as predictive modelling. Along the way I used intermediate to advanced SQL concepts such as subqueries, self joins, window functions, common table expressions, and views.
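
The cleaning steps described above can be sketched roughly as follows. This is an illustration only, not the project's code: it runs SQLite through Python's `sqlite3` module instead of SQL Server so that it is self-contained, and the `sales` table, its columns, its sample rows, and the `sales_clean` view are all hypothetical. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (
    row_id         INTEGER PRIMARY KEY,
    parcel_id      TEXT,
    address        TEXT,  -- street and city packed into one field
    sale_date      TEXT,  -- datetime stored as text
    sold_as_vacant TEXT   -- inconsistent coding: 'Y', 'N', 'Yes', 'No'
);
INSERT INTO sales (parcel_id, address, sale_date, sold_as_vacant) VALUES
    ('A1', '12 Oak St, Springfield', '2013-04-09 00:00:00', 'Y'),
    ('A1', '12 Oak St, Springfield', '2013-04-09 00:00:00', 'Y'),
    ('B2', '7 Elm Ave, Shelbyville', '2014-11-30 00:00:00', 'No');
""")

# Remove duplicates: a CTE numbers the rows inside each group of identical
# values with a window function, and every row after the first is deleted.
conn.execute("""
WITH ranked AS (
    SELECT row_id,
           ROW_NUMBER() OVER (
               PARTITION BY parcel_id, address, sale_date
               ORDER BY row_id
           ) AS rn
    FROM sales
)
DELETE FROM sales
WHERE row_id IN (SELECT row_id FROM ranked WHERE rn > 1)
""")

# Expose the cleaned data as a view: convert the date, split the address
# at the comma, and recode the categorical flag to consistent labels.
conn.execute("""
CREATE VIEW sales_clean AS
SELECT parcel_id,
       date(sale_date)                                   AS sale_date,
       trim(substr(address, 1, instr(address, ',') - 1)) AS street,
       trim(substr(address, instr(address, ',') + 1))    AS city,
       CASE sold_as_vacant
            WHEN 'Y' THEN 'Yes'
            WHEN 'N' THEN 'No'
            ELSE sold_as_vacant
       END                                               AS sold_as_vacant
FROM sales
""")

cleaned_rows = conn.execute(
    "SELECT * FROM sales_clean ORDER BY parcel_id"
).fetchall()
for row in cleaned_rows:
    print(row)
```

In T-SQL the same ideas would typically be written with `CONVERT` for the date and `SUBSTRING` with `CHARINDEX` for the split; the CTE with `ROW_NUMBER()` carries over largely unchanged.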

Data and code are available on GitHub.