(Voluntary) Assignments

Assignment 1 (Network Analysis)

Introduction

Data: What do I get?

Background

Let the fun begin. You will analyze network data collected from the managers of a high-tech company. This dataset, originating from the paper below, is widely used in research on organizational networks. Time to give it a shot as well.

Krackhardt D. (1987). Cognitive social structures. Social Networks, 9, 109-134.

The company manufactured high-tech equipment on the west coast of the United States and had just over 100 employees, including 21 managers. Each manager was asked "To whom do you go for advice?" and "Who is your friend?"; "To whom do you report?" was taken from company documents.

Description

The dataset includes 4 files: 3x Krack-High-Tec and 1x High-Tec-Attributes. The Krack-High-Tec files contain three 21x3 text matrices, one for each of the relations described above (advice, friendship, and reports-to).

Column 1 contains the ID of the ego (from where the edge starts), and column 2 the alter (to which the edge goes). Column 3 indicates the presence (=1) or absence (=0) of an edge.
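As a minimal sketch of how this three-column format can be turned into a network, the base R snippet below builds a directed adjacency matrix and computes its density. The toy edge list and all object names (`edges`, `present`, `adj`) are made up for illustration and are not the Krackhardt files.

```r
# Toy edge list in the three-column format described above (made-up data):
# ego, alter, tie (1 = edge present, 0 = edge absent).
edges <- data.frame(
  ego   = c(1, 1, 2, 2, 3, 3),
  alter = c(2, 3, 1, 3, 1, 2),
  tie   = c(1, 0, 1, 1, 0, 0)
)

# Keep only the rows where an edge is actually present.
present <- edges[edges$tie == 1, c("ego", "alter")]

# Build a directed adjacency matrix: rows are egos, columns are alters.
ids <- sort(unique(c(edges$ego, edges$alter)))
adj <- matrix(0, nrow = length(ids), ncol = length(ids),
              dimnames = list(ids, ids))
adj[cbind(match(present$ego, ids), match(present$alter, ids))] <- 1

# Density of a directed network without self-loops: edges / (n * (n - 1)).
n <- length(ids)
density <- sum(adj) / (n * (n - 1))

print(adj)
print(density)  # 0.5 for this toy example
```

If you prefer a network package, the filtered edge list (`present`) should also work as input to, for example, `igraph::graph_from_data_frame(present, directed = TRUE)`.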

High-Tec-Attributes includes one 21x4 valued matrix.

Tasks

1. Create a network

2. Analysis

Carry out a brief analysis of:

A: Network level characteristics. Find the overall network level of:

… for the different networks. Describe and interpret the results. Answer the following questions:

B: Node level characteristics: Likewise, find out:

C: Relational Characteristics: Answer the following questions:

3. Visualization

Everything goes. Show us some pretty and informative plots. Choose what to plot, and how, on your own. Interpret the results and share some insights.

Example solution

Using the data below, consisting of tweets about the presidential debate in the US (autumn 2020) as well as tweets of US Congress members by party, we would like you to classify the debate tweets into conservative vs. liberal. Using techniques learned in the course, provide some insights into what the most conservative and the most liberal posts are talking about…

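library(jsonlite)

# Download the gzipped tweet data to a temporary file (binary mode keeps the archive intact)
tmp <- tempfile()
download.file("https://github.com/SDS-AAU/SDS-master/raw/master/M2/data/pol_tweets.gz", tmp, mode = "wb")

# Read the gzip-compressed file; stream_in() opens and closes the connection itself
tweets_raw <- stream_in(gzfile(tmp))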
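One possible way to approach the classification task described above is a bag-of-words multinomial naive Bayes classifier, sketched here in base R. This is only an illustration and not necessarily the technique used in the course; the training sentences, labels, and helper names (`tokenize`, `classify`) are made up and stand in for the labelled Congress tweets.

```r
# Made-up toy texts standing in for labelled Congress tweets (hypothetical data).
train_text  <- c("cut taxes and secure the border",
                 "lower taxes support small business",
                 "expand healthcare and protect the climate",
                 "climate action and healthcare for all")
train_label <- c("conservative", "conservative", "liberal", "liberal")

# Lowercase a text and split it into word tokens, dropping empty strings.
tokenize <- function(x) {
  tokens <- strsplit(tolower(x), "[^a-z]+")[[1]]
  tokens[nzchar(tokens)]
}

vocab   <- unique(unlist(lapply(train_text, tokenize)))
classes <- unique(train_label)

# Log prior probability of each class.
class_counts <- as.vector(table(factor(train_label, levels = classes)))
log_prior <- setNames(log(class_counts / length(train_label)), classes)

# Log word likelihoods per class with add-one (Laplace) smoothing.
# Result: a matrix with one row per vocabulary word and one column per class.
log_lik <- sapply(classes, function(cl) {
  tokens <- unlist(lapply(train_text[train_label == cl], tokenize))
  counts <- as.vector(table(factor(tokens, levels = vocab)))
  setNames(log((counts + 1) / (sum(counts) + length(vocab))), vocab)
})

# Score a new text under each class and return the most likely label.
# Words that never occurred in the training data are ignored.
classify <- function(text) {
  tokens <- tokenize(text)
  tokens <- tokens[tokens %in% vocab]
  scores <- log_prior + colSums(log_lik[tokens, , drop = FALSE])
  names(which.max(scores))
}

print(classify("we must lower taxes"))
print(classify("protect healthcare and the climate"))
```

On the real data, the same idea would be trained on the Congress tweets (with party as the label) and then applied to the debate tweets. The class scores can also be used to rank posts from most conservative to most liberal.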

Example solution

Final Assignment NLP (graded)

Introduction

In our sessions, online courses and assignments you learned how to work with text in different contexts and on various levels of aggregation. These skills allow you to access and process a vast range of data. Now it’s time to get creative. We would like you to carry out your own analysis on a self-chosen topic (and on self-chosen data). This analysis should be interesting and informative and contain different elements of natural language processing.

Task description

Data & Problem identification

In this exercise, you are asked to choose and obtain a dataset you consider interesting and appropriate for the tasks required. You are welcome to use existing datasets (e.g. this repo). You could also consider collecting your own data (e.g. via the Twitter API, Instagram, news repositories, etc.).

The data should be large enough and of proper granularity to be interesting for NLP techniques. If you are in doubt, please reach out.

What is expected from you:

Analysis pipeline

The analysis to be carried out by you has to contain elements of data manipulation, exploration, unsupervised and supervised ML as applied to language data.

Many of the steps are optional. So choose which methods you deem helpful and relevant to explore your chosen problem.

Note: Quality > Quantity. Consider which analyses, summaries, and visualizations add value. Excessive and unselective output (e.g. running 20 different models without giving a reason for doing so, or producing every possible plot without discussing and evaluating the insights gained from it) will not be considered helpful but rather distracting.

You are welcome to reuse code-elements from the course.

Delivery and Format

Please work in groups of around 3 members on the assignment.

Deliver a computational notebook (ipynb/rmd) together with an HTML or PDF file by mail to Roman and Daniel. Deadline for delivery: 31/3, noon.

If you want to host your project as a repo on GitHub, please do so (voluntary).

Let us know if you are interested in a (voluntary) seminar discussing your projects in April.