
Assignment 2 - UML with Pokemon

Description

This time you will work with Pokemon data. No data munging needed. Just old-school (U)ML.

Submission

Submission as PDF (notebook and output)

Deadline: Monday 20.09.2021, 23:59:00, via Peergrade.io

Data

The data is available through the URL: https://sds-aau.github.io/SDS-master/00_data/pokemon.csv. It contains data on 800 Pokemon from the 1st to the 6th generation.
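As a starting point for the overview, the file can be read with pandas and the numeric columns summarised. The following is only a minimal sketch: the helper name `load_and_describe` is hypothetical, and the inline sample (column names and values) is made up so that the snippet runs offline. It is not the real Pokemon data.

```python
import io

import pandas as pd

# URL taken from the assignment text above.
URL = "https://sds-aau.github.io/SDS-master/00_data/pokemon.csv"


def load_and_describe(source):
    """Read a CSV (path, URL, or file-like object) and summarise its numeric columns."""
    df = pd.read_csv(source)
    numeric = df.select_dtypes(include="number")
    # One row per numeric variable: location, spread, and range.
    summary = numeric.describe().T[["mean", "std", "min", "max"]]
    return df, summary


# Made-up sample with hypothetical column names, used only to run the sketch offline.
sample = io.StringIO(
    "Name,Type,HP,Attack\n"
    "A,Grass,45,49\n"
    "B,Fire,39,52\n"
    "C,Water,44,48\n"
)
df, summary = load_and_describe(sample)
print(df.dtypes)
print(summary)

# For the real data, pass the URL instead:
# df, summary = load_and_describe(URL)
```

Comparing the `std`, `min`, and `max` columns gives a first impression of whether the variables are on comparable scales.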

Tasks

Give a brief overview of the data: which variables are there, how are they scaled, and how much do the data columns vary?
Run a PCA on all numerical variables in the dataset. Hint: Don’t forget to scale them first. Use 4 components. What is the cumulative explained variance ratio? Hint: This terminology and code may not have been introduced in class; look into cumulative explained variance in the sklearn package and see if you can work out the code needed.
Use a different dimensionality reduction method (e.g. UMAP or NMF). Do the findings differ?
Perform a cluster analysis (KMeans) on all numerical variables (scaled, before PCA). Pick a realistic number of clusters; the choice is up to you, as long as the large clusters remain mostly stable.
Visualize the first 2 principal components and color the datapoints by cluster.
Inspect the distribution of the variable Type1 across clusters. Does the algorithm separate the different types of Pokemon?
Perform a cluster analysis on all numerical variables, scaled and AFTER dimensionality reduction, and visualize the first 2 principal components.
Again, inspect the distribution of the variable “Type 1” across clusters. Does it differ from the distribution before dimensionality reduction?
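
The core steps in the tasks above (scaling, PCA with 4 components, cumulative explained variance, KMeans before and after PCA, and the type-by-cluster table) can be sketched roughly as follows. This is not a reference solution. It runs on randomly generated stand-in data, and the column names, the type labels, and the choice of `k = 3` are all assumptions made for illustration, so the printed numbers say nothing about the real Pokemon data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Pokemon table (NOT the real data): six numeric
# stats and one categorical type column, all with hypothetical names.
rng = np.random.default_rng(42)
n = 300
stats = ["HP", "Attack", "Defense", "SpAtk", "SpDef", "Speed"]
df = pd.DataFrame(rng.normal(70, 25, size=(n, len(stats))), columns=stats)
df["Type1"] = rng.choice(["Water", "Fire", "Grass"], size=n)

# Scale the numeric columns to zero mean and unit variance.
X = StandardScaler().fit_transform(df[stats])

# PCA with 4 components; the cumulative explained variance ratio is the
# running sum of the per-component explained variance ratios.
pca = PCA(n_components=4)
X_pca = pca.fit_transform(X)
cum_var = np.cumsum(pca.explained_variance_ratio_)
print("Cumulative explained variance ratio:", cum_var.round(3))

# KMeans on the scaled data (before PCA) and on the PCA scores (after PCA).
k = 3  # assumed here; on real data, compare several values
km_before = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
km_after = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_pca)
df["cluster_before"] = km_before.labels_
df["cluster_after"] = km_after.labels_

# Distribution of the type variable across clusters, before and after PCA.
tab_before = pd.crosstab(df["Type1"], df["cluster_before"])
tab_after = pd.crosstab(df["Type1"], df["cluster_after"])
print(tab_before)
print(tab_after)
```

For the visualization tasks, one option is a matplotlib scatter plot of `X_pca[:, 0]` against `X_pca[:, 1]`, passing `df["cluster_before"]` or `df["cluster_after"]` as the `c` argument to color the points by cluster.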

Solutions

R team: HERE
Py team: HERE (also includes some SML)