Working with unstructured data

Intro image: Franki Chamaki, Unsplash

This is the repository website for the 2021 course in Network Analysis and NLP in the UNISTRA DS programme. On this page you will find tutorial videos, assignments, and links to Q&A meetings. The page is updated throughout the course.

Part 1: Network Analysis
Contents
  • Network thinking
  • Types of Networks & Sources of Network Data
  • Network Visualization
  • Network-based indicators
  • Community Detection
  • Networks and SML

Ecosystem

R

  • Tidyverse: General data science ecosystem for R
  • Igraph: Main graph analytics package in R
  • Tidygraph: Igraph wrapper for tidy network analytics workflows
  • Ggraph: ggplot2 extension for network visualization

Python

  • NetworkX - standard library for network analysis (NA) in Python
  • pyvis - useful visualisation package
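
To give a feel for the Python side of this ecosystem, here is a minimal sketch (not course material) that uses NetworkX to compute one network-based indicator and to detect communities. The choice of the built-in Zachary karate club graph, of degree centrality as the indicator, and of greedy modularity maximisation as the community detection method are all assumptions made for illustration.

```python
import networkx as nx
from networkx.algorithms import community

# Assumption: a small built-in example graph stands in for real course data.
G = nx.karate_club_graph()

# Network-based indicator: degree centrality
# (share of the other nodes each node is directly connected to).
centrality = nx.degree_centrality(G)
most_central = max(centrality, key=centrality.get)

# Community detection: greedy modularity maximisation,
# one of several algorithms NetworkX provides.
communities = community.greedy_modularity_communities(G)

print(f"Nodes: {G.number_of_nodes()}, edges: {G.number_of_edges()}")
print(f"Most central node: {most_central}")
print(f"Number of communities found: {len(communities)}")
```

The resulting graph could then be passed on to a visualisation package such as pyvis.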
Part 2: Natural Language Processing
Contents
  • Simple frequency-based approaches
  • Co-occurrence and link to networks
  • Topic Modelling
  • Vectorisation and embeddings
  • NLP and SML
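
As a rough illustration of the first two topics (not course material), the sketch below counts term frequencies and then builds document-level word co-occurrence counts, which can be read as the weighted edge list of a word network. It uses only the Python standard library; the three example sentences, the regular-expression tokenizer, and the helper name `tokenize` are assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: three toy documents stand in for a real corpus.
docs = [
    "Networks connect people and ideas",
    "Text data connect ideas and topics",
    "People write text about networks",
]

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (hypothetical helper)."""
    return re.findall(r"[a-z]+", text.lower())

tokens_per_doc = [tokenize(d) for d in docs]

# Frequency-based approach: count every token across the corpus.
term_freq = Counter(tok for toks in tokens_per_doc for tok in toks)

# Co-occurrence: count word pairs that appear in the same document.
# Each (word_a, word_b) key is an undirected edge, the count is its weight.
cooc = Counter()
for toks in tokens_per_doc:
    for a, b in combinations(sorted(set(toks)), 2):
        cooc[(a, b)] += 1

print(term_freq.most_common(3))
print(cooc[("connect", "ideas")])
```

Sorting the unique tokens before pairing keeps each pair in one canonical order, so `("connect", "ideas")` and `("ideas", "connect")` are not counted separately.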

Ecosystem

R

  • tidytext - main NLP and text mining ecosystem for R
  • Quanteda - popular R package with further NLP functionality

Python

  • NLTK - Standard library for all traditional NLP work in Python
  • SpaCy - modern deep-learning-based library for many NLP tasks
  • Gensim - library for topic modelling and vectorisation
  • Textblob - library built on top of NLTK, with a simpler interface

Theme: workshop-template-b by evanwill is built using Jekyll on GitHub Pages. The site is styled using Bootstrap with FontAwesome icons.

Content: CC BY-SA Daniel Hain & Roman Jurowetzki 2021 (get source code).