Workshop 2

In this workshop, we will explore the nomad data (cities) using unsupervised machine learning (UML) techniques. We used these data in a research project some years ago; you can check out a conference presentation from that project below:

Things to do!

You will find the data for today’s session here: https://sds-aau.github.io/SDS-master/M1/data/cities.csv
Use the template from the Penguin analysis with UML to build a notebook that covers the following steps:

  • Startup
    • Load up and explore the data (a bit)
    • Clean up the data if you think you need to
    • Select the numerical variables to be used for UML
    • Preprocess the data for UML
  • Dimensionality reduction
    • Use PCA for dimensionality reduction
    • Explore the variable loadings
    • Plot the 1st vs. the 2nd component
    • If you want, use another algorithm for the same steps and compare the results
  • Clustering
    • Perform clustering on the reduced data
    • Add the clusters to the component plot from above
    • Explore the results (clusters vs. components / clusters vs. aggregated variables of interest)
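
A minimal Python sketch of the steps above, assuming pandas and scikit-learn are available (the Penguin template itself is not reproduced here). The helper names `run_uml` and `plot_clusters` are hypothetical, and dropping rows with missing values, standardizing, keeping two components, and k-means with three clusters are illustrative defaults rather than settings prescribed by the workshop. Because the exact columns of cities.csv are not listed here, the sketch simply keeps every numeric column.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def run_uml(df: pd.DataFrame, n_components: int = 2, n_clusters: int = 3, seed: int = 42):
    """Hypothetical helper: numeric selection -> scaling -> PCA -> k-means on PCA scores."""
    # Startup: keep only numerical columns and drop rows with missing values
    # (a simple cleaning choice; imputation would be an alternative).
    num = df.select_dtypes(include="number").dropna()

    # Preprocessing: standardize each variable to mean 0 and unit variance so that
    # variables measured on large scales do not dominate the components.
    X = StandardScaler().fit_transform(num)

    # Dimensionality reduction with PCA.
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(X)

    # Variable loadings: weight of each original variable on each component.
    pc_names = [f"PC{i + 1}" for i in range(n_components)]
    loadings = pd.DataFrame(pca.components_.T, index=num.columns, columns=pc_names)

    # Clustering on the reduced data (k-means is one option among many).
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(scores)

    # Attach component scores and cluster labels to the cleaned numeric data.
    result = num.copy()
    for i, name in enumerate(pc_names):
        result[name] = scores[:, i]
    result["cluster"] = labels

    # Clusters vs. aggregated variables: mean of every column per cluster.
    profile = result.groupby("cluster").mean()
    return result, loadings, pca.explained_variance_ratio_, profile


def plot_clusters(result: pd.DataFrame):
    """Hypothetical helper: scatter of PC1 vs. PC2 coloured by cluster."""
    # Imported here so run_uml works even where matplotlib is not installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    sc = ax.scatter(result["PC1"], result["PC2"], c=result["cluster"], cmap="tab10", s=15)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend(*sc.legend_elements(), title="cluster")
    return fig
```

On the workshop data this might be called as `df = pd.read_csv("https://sds-aau.github.io/SDS-master/M1/data/cities.csv")`, followed by `result, loadings, evr, profile = run_uml(df)` and `plot_clusters(result)`. Which numeric columns the file contains, and whether dropping incomplete rows is sensible for it, should be checked during the exploration step. For the optional comparison, PCA could be swapped for another method such as `sklearn.manifold.TSNE`, keeping in mind that t-SNE produces no variable loadings.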

Schedule for the workshop

Time          Activity
9:10-9:45     Work in groups on Dimensionality reduction
10:00-10:15   Explore issues and discuss
10:15-11:00   Work in groups on Clustering questions
11:10-11:45   Discuss solutions and explore alternative analyses split
11:45-12:00   Hand out Peergrade assignment

In-class notebooks

Videos

Assignment 2 handout

R: Recording

coming soon

Python group recording