Day 1 - NLP

Practical info

Place: DHØ 1.52

Time: 11:40

Schedule for the day

  • Session 1: Review the hate-speech classifier notebook
  • Session 2: Work on the NLP case in groups
  • Session 3 (15:00-16:00): Guest webinar on AI Implementations in SMEs (in collaboration with AI Denmark and AI 4 The People)

Context - Exercise: Presidential Debate 2020

Yes, we are going back in time to the 2020 US presidential debate, a time of lots of unhappy tweeting. It’s just too good a dataset and case to let go…

Data

  • Political tweets: https://github.com/SDS-AAU/SDS-master/raw/master/M2/data/pol_tweets.gz (source: https://github.com/alexlitel/congresstweets). We’ve preprocessed the data a bit to make things easier. Labels: 1 = Democrat, 0 = Republican.

  • Tweets from around the time of the debate in October 2020 (8,000 tweets): https://github.com/SDS-AAU/SDS-master/raw/master/M2/data/pres_debate_2020.gz

Both datasets are in JSON format.

Task: Build a classifier that can distinguish Democrat tweets from Republican tweets.

Bonus: 1. Explore the topics being discussed; 2. Find out what drives the predictions.
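To make the task concrete, here is a minimal sketch of one possible baseline: a from-scratch multinomial Naive Bayes classifier using only the Python standard library. It is a stand-in for illustration, not the method used in the class notebooks. The names `load_gzipped_json`, `tokenize`, and `NaiveBayes` are hypothetical, the four training tweets are invented toy data, and the loader assumes the file has already been downloaded and contains a single JSON document (the exact layout of the real files is not specified here, so inspect them before relying on it). The `top_words` method is a small nod to the second bonus question.

```python
import gzip
import json
import math
import re
from collections import Counter, defaultdict


def load_gzipped_json(path):
    """Parse a locally saved, gzip-compressed JSON file (assumed single document)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def tokenize(text):
    """Lowercase and keep word-like tokens, including hashtags and mentions."""
    return re.findall(r"[#@]?\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of tweets
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def _log_likelihood(self, word, label):
        # Smoothed log P(word | label).
        count = self.word_counts[label][word]
        return math.log((count + 1) / (self.totals[label] + len(self.vocab)))

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n in self.doc_counts.items():
            score = math.log(n / n_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:    # ignore words never seen in training
                    score += self._log_likelihood(word, label)
            scores[label] = score
        return max(scores, key=scores.get)

    def top_words(self, label, other, k=3):
        """Words with the highest log-odds for `label` versus `other`."""
        def log_odds(w):
            return self._log_likelihood(w, label) - self._log_likelihood(w, other)
        return sorted(self.vocab, key=log_odds, reverse=True)[:k]


# Invented toy tweets, labelled as in the exercise: 1 = Democrat, 0 = Republican.
toy_texts = [
    "we must expand healthcare and protect voting rights",
    "climate action and healthcare for every family",
    "cut taxes and secure the border",
    "lower taxes and support our border patrol",
]
toy_labels = [1, 1, 0, 0]

model = NaiveBayes().fit(toy_texts, toy_labels)
print(model.predict("healthcare is a right"))
print(model.top_words(0, 1, k=2))
```

On the real data you would fit on the political tweets and then apply `predict` to the debate tweets. A library pipeline (for example TF-IDF features with a linear model) is likely to be a stronger and faster choice; the sketch only shows the moving parts.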

In-class notebooks