In this project I practiced SQL for data cleaning and initial exploration using SQL Server Management Studio.
Starting with an untidy dataset, I gained an overview of the table structure; identified column definitions, data types, and constraints;
checked for missing and erroneous data; and produced summary statistics such as ranges and measures of central tendency for continuous variables
and category counts for categorical variables.
I cleaned the dataset by converting dates, splitting text fields, removing duplicates, and recoding categorical variables.
Finally, I produced a view containing a subset of records that may be more useful in certain cases, such as predictive modelling.
I used intermediate to advanced SQL concepts such as subqueries, self joins, window functions, common table expressions, and views.
Data and code are available on GitHub.
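
As a rough illustration of the cleaning steps described above, here is a minimal sketch of duplicate removal with a common table expression and a window function, recoding of a categorical column, splitting of a text field, and a view over a subset of records. It is not the project's actual code: the table `sales_raw`, its columns, the sample rows, and the view's filter are all hypothetical, and SQLite (through Python's built-in `sqlite3` module) stands in for SQL Server so that the example runs anywhere. T-SQL differs in places; for instance, it would use `CHARINDEX` and `SUBSTRING` rather than `INSTR` and `SUBSTR`.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical untidy table with one exact duplicate and inconsistent category labels.
conn.executescript("""
    CREATE TABLE sales_raw (
        record_id      INTEGER PRIMARY KEY,
        parcel_id      TEXT,
        sale_date      TEXT,
        address        TEXT,
        sold_as_vacant TEXT
    );
    INSERT INTO sales_raw (parcel_id, sale_date, address, sold_as_vacant) VALUES
        ('A1', '2016-04-09', '12 Oak St, Springfield', 'Y'),
        ('A1', '2016-04-09', '12 Oak St, Springfield', 'Y'),
        ('B2', '2016-05-01', '7 Elm Rd, Shelbyville', 'No'),
        ('C3', '2016-06-15', '3 Pine Ave, Springfield', 'N');
""")

# Remove duplicates: number the rows within each group of identical values
# and delete every row after the first one in its group.
conn.execute("""
    WITH ranked AS (
        SELECT record_id,
               ROW_NUMBER() OVER (
                   PARTITION BY parcel_id, sale_date, address
                   ORDER BY record_id
               ) AS row_num
        FROM sales_raw
    )
    DELETE FROM sales_raw
    WHERE record_id IN (SELECT record_id FROM ranked WHERE row_num > 1)
""")

# Recode the categorical column so 'Y'/'N' become 'Yes'/'No'.
conn.execute("""
    UPDATE sales_raw
    SET sold_as_vacant = CASE sold_as_vacant
        WHEN 'Y' THEN 'Yes'
        WHEN 'N' THEN 'No'
        ELSE sold_as_vacant
    END
""")

# View over a subset of records, splitting the address at the first comma.
# The filter on sold_as_vacant is only an example criterion.
conn.execute("""
    CREATE VIEW sales_clean AS
    SELECT record_id,
           parcel_id,
           sale_date,
           TRIM(SUBSTR(address, 1, INSTR(address, ',') - 1)) AS street,
           TRIM(SUBSTR(address, INSTR(address, ',') + 1))    AS city,
           sold_as_vacant
    FROM sales_raw
    WHERE sold_as_vacant = 'No'
""")

rows = conn.execute(
    "SELECT street, city, sold_as_vacant FROM sales_clean ORDER BY record_id"
).fetchall()
for row in rows:
    print(row)
```

Deleting by primary key through the ranked CTE keeps the first row of each duplicate group and leaves the rest of the table untouched. Putting the text split in the view rather than in the base table keeps the raw column available for checking the result.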