Day 1

Practical info

Place: DHØ 1.23
Time: 8:15 (we start 15 min later) - 13:20

Schedule for the day

Time                     Activity          Data
Session 1, 8:15-9:45     EDA and DataViz   Open Policing
Session 2, 10:00-11:30   EDA and UML       Digital Nomads
Session 3, 12:00-13:20   UML continued     Digital Nomads

Datasets & Context

Open policing project

The Stanford Open Policing Dataset records traffic stops by US police, including data on the vehicle, driver, violation, outcome, and many more variables. It has been used in research to investigate racial bias and other issues.

Digital Nomad Dataset

In this workshop we are going to explore the nomad data using unsupervised ML (UML) techniques. This time we will focus on the city data, which you have not seen so far.

We used the data in a research project some years ago and you can check out a conference presentation below:

Suggested Workflow

You will find the data for today’s session here: https://sds-aau.github.io/SDS-master/M1/data/cities.csv

  • Startup
    • Load up and explore the data (a bit)
    • Clean up if you think you need to
    • Select the numerical variables to be used for UML
    • Preprocess the data for UML
  • Dimensionality reduction
    • Use PCA for dimensionality reduction
    • Explore the variable loadings
    • Plot the 1st vs. the 2nd component
    • If you want, use another algorithm for the same steps and compare results
  • Clustering
    • Perform a clustering on the (reduced?) data
    • Plot the clusters into the above visualization
    • Explore the results (clusters vs components / clusters vs aggregated variables of interest)
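
The workflow above can be sketched roughly as follows. This is a minimal sketch, not the official solution: it assumes pandas and scikit-learn are available, uses k-means as one possible clustering algorithm, clusters on the PCA-reduced data (the list leaves that choice open), and drops rows with missing values as a simple cleaning step. The function names `prepare_numeric` and `reduce_and_cluster`, the number of clusters, and the toy column names are all hypothetical; the toy data only stands in for the city data so the example runs offline.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def prepare_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Keep numeric columns only and drop rows with missing values.

    Dropping rows is just one simple cleaning choice (an assumption here).
    """
    return df.select_dtypes(include="number").dropna()


def reduce_and_cluster(df, n_components=2, n_clusters=3, seed=42):
    """Scale, run PCA, then cluster the reduced data with k-means."""
    numeric = prepare_numeric(df)

    # Preprocess: standardize so every variable has mean 0 and variance 1.
    scaled = StandardScaler().fit_transform(numeric)

    # Dimensionality reduction.
    pca = PCA(n_components=n_components, random_state=seed)
    components = pca.fit_transform(scaled)
    pc_names = [f"PC{i + 1}" for i in range(n_components)]

    # Loadings: one row per original variable, one column per component.
    loadings = pd.DataFrame(pca.components_.T, index=numeric.columns, columns=pc_names)

    # Clustering on the reduced data (clustering on `scaled` is the alternative).
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(components)

    result = pd.DataFrame(components, index=numeric.index, columns=pc_names)
    result["cluster"] = labels
    return result, loadings, pca.explained_variance_ratio_


# Toy stand-in for the city data: three groups in four hypothetical variables.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5]])
toy = pd.DataFrame(
    np.vstack([c + rng.normal(size=(30, 4)) for c in centers]),
    columns=["var_a", "var_b", "var_c", "var_d"],
)
toy["name"] = [f"city_{i}" for i in range(len(toy))]  # non-numeric, gets dropped

result, loadings, explained = reduce_and_cluster(toy)
print(loadings.round(2))
print(explained.round(2))
print(result.groupby("cluster")[["PC1", "PC2"]].mean().round(2))
```

To run this on the workshop data, replace `toy` with `pd.read_csv("https://sds-aau.github.io/SDS-master/M1/data/cities.csv")`. To plot the clusters into the component plot, something like `plt.scatter(result["PC1"], result["PC2"], c=result["cluster"])` with matplotlib should work. Comparing cluster means of the original variables (joined back via the index) is one way to explore clusters vs. variables of interest.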