Machine learning using Turtle Games data

During the third module of my Data Analytics Career Accelerator course with LSE, I learned how to use Python for simple and multiple linear regression, decision tree and random forest regression and classification, and k-means and hierarchical clustering, using scikit-learn and statsmodels. I also learned NLP and sentiment analysis using NLTK and VADER. We were also introduced to R, and I was impressed by the polished visuals produced by ggplot2.

The module 3 assignment gave me the opportunity to put this into practice and to deepen my knowledge of Matplotlib and Seaborn. I compared multiple linear regression and decision tree regression for modelling the relationship between loyalty points accumulated, spending score and customer salary, using data from a fictional games company, Turtle Games. Comparing the R-squared, mean absolute error (MAE) and root mean squared error (RMSE) of the two models showed that linear regression was the better choice for this data.
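
As a rough illustration of this kind of comparison, here is a minimal sketch using scikit-learn. It does not use the Turtle Games data: the values are synthetic, the variable names (spending_score, remuneration, loyalty_points) are my own placeholders, and the data is generated from a linear relationship on purpose, so the numbers it prints say nothing about the real results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption, NOT the Turtle Games dataset).
rng = np.random.default_rng(42)
n = 500
spending_score = rng.uniform(1, 100, n)
remuneration = rng.uniform(10, 110, n)
loyalty_points = 30 * spending_score + 25 * remuneration + rng.normal(0, 200, n)

X = np.column_stack([spending_score, remuneration])
y = loyalty_points

# Hold out 30% of the rows so both models are scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)


def evaluate(model):
    """Fit the model and return R-squared, MAE and RMSE on the test set."""
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    return {
        "r2": r2_score(y_test, predictions),
        "mae": mean_absolute_error(y_test, predictions),
        "rmse": float(np.sqrt(mean_squared_error(y_test, predictions))),
    }


results = {
    "multiple linear regression": evaluate(LinearRegression()),
    "decision tree regression": evaluate(DecisionTreeRegressor(random_state=42)),
}

for name, metrics in results.items():
    print(
        f"{name}: R2={metrics['r2']:.3f}, "
        f"MAE={metrics['mae']:.1f}, RMSE={metrics['rmse']:.1f}"
    )
```

A higher R-squared and lower MAE and RMSE on the held-out rows point to the better-fitting model. Scoring on a test split rather than the training data matters here, because an unpruned decision tree can fit its training data almost perfectly.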

I also used k-means clustering to identify “customer personas” based on spending score and remuneration, and NLP to analyse customer reviews and explore their sentiment and common themes.
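
The clustering step can be sketched along the following lines. Again this is only an illustration under assumptions: the data is synthetic, the three groups and the choice of k = 3 are made up for the example, and the feature names are placeholders rather than the real column names.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic customers in three loose groups (an assumption, not real data).
rng = np.random.default_rng(0)
group_centres = [(20, 20), (50, 50), (80, 85)]  # (remuneration, spending score)
features = np.vstack(
    [rng.normal(loc=centre, scale=5.0, size=(100, 2)) for centre in group_centres]
)

# Put both features on the same scale so neither dominates the distances.
scaled = StandardScaler().fit_transform(features)

# k = 3 is chosen only because the synthetic data has three groups.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Describe each cluster by its average values in the original units.
for cluster in range(3):
    members = features[labels == cluster]
    mean_remuneration, mean_spending = members.mean(axis=0)
    print(
        f"Cluster {cluster}: {len(members)} customers, "
        f"mean remuneration {mean_remuneration:.1f}, "
        f"mean spending score {mean_spending:.1f}"
    )
```

On real data the number of clusters is not known in advance, so it is usually chosen by comparing several values of k, for example with the elbow method or silhouette scores.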

It was a challenging but very interesting module. I am now working on the final project section of the course in my spare time, having recently started my first role as a data analyst at London Drainage Facilities!

Jonathan Shields