Introduction to Network Analysis

Updated October 08, 2023

This session

Welcome to your first part of the introduction to network analysis. In this session you will learn:

Why applying network analysis is helpful to answer certain questions.
Why framing certain contexts as networks gives new insights.
The basic structure of relational data.
How to construct graph objects from different datasources.
How to analyse basic features of nodes, edges, and graphs.
How to identify groups and communities in graphs.
How to do simple network visualizations.

Introduction

Networks… so what?

So, before we talk about networks, one thing upfront… why should we? I mean, they undeniably look pretty, don’t they?
Somehow, the visualization of networks fascinates the human mind (find a short TED talk on networks and how they depict our world here).
They have even inspired an own art movement, networkism (see some examples here).

The basic jargon

The elements of a a network or graph are commonly referred to as nodes (system theory jargon) or vertices (graph theory jargon) of a graph.
The connections are edges or links.

Types of networks

Networks are a form of representing relational data.
This is a very general tool that can be applied to many different types of relationships between all kind of elements.
The content, meaning, and interpretation for sure depends on what elements we display, and which types of relationships. For example:
In Social Network Analysis:
- Nodes represent actors (which can be persons, firms and other socially constructed entities)
- Edges represent relationships between this actors (friendship, interaction, co-affiliation, similarity ect.)
Other types of network
- Chemistry: Interaction between molecules
- Computer Science: The world-wide-web, inter- and intranet topologies
- Biology: Food-web, ant-hives

The possibilities to depict relational data are manifold. For example:

Relations among persons
- Kinship: mother of, wife of…
- Other role based: boss of, supervisor of…
- Affective: likes, trusts…
- Interaction: give advice, talks to, retweets…
- Affiliation: belong to same clubs, shares same interests…
Relations among organizations
- As corporate entities, joint ventures, strategic alliances
- Buy from / sell to, leases to, outsources to
- Owns shares of, subsidiary of
- Via their members (Personnel flows, friendship…)

Relational data-structures

Edgelist

Common form of storing real-life relational data (eg. in relational databases)
An edgelist is a dataframe that contains a minimum of two columns, one of nodes that are the source of a connection and another that are the target of the connection.
The nodes in the data are typically identified by unique IDs.
If the distinction is not meaningful, the network is undirected (more on that later).
If the distinction between source and target is meaningful, the network is directed.
Can also contain additional columns that describe attributes of the edges such as a magnitude aspect for an edge, meaning the graph is weighted (e.g., number of interactions, strenght of friendship).

from	to
1	2
2	3
2	4
1	5
4	1
5	2
3	1

Adjacency Matrix

Represented as a \(n*n\) matrix, where \(n\) stands for the number of elements of which their relationships should be represented. T
The value in the cell that intercepts row \(n\) and column \(m\) indicates if an edge is present (=1) or absent (=0).
Tip: Given an edgelist, an adjacency matrix can easily be produced by crosstabulating.

1	2	3	4	5
0	1	0	0	1
0	0	1	1	0
1	0	0	0	0
1	0	0	0	0
0	1	0	0	0

Nodelists

Edgelists as well as adjacency matrices only stores connectivity pattern between nodes, but due to their structure cannot store informations on the nodes in which we might be interested.
Therefore, we in many cases also provide a a node list with these informations (such as the names of the nodes or any kind of groupings).

id	name	gender	group
1	Jesper	M	A
2	Pernille	F	B
3	Jacob	M	B
4	Dorte	F	A
5	Donald	M	C

Graph Objects

Tabular data
- In tabular data, summary statistics of variables are between observations (column-wise) interdependent, meaning changing a value of some observation will change the corresponding variables summary statistics.
- Likewise, variable values might be within observation interdependent (row-wise), meaning changing a variable value might change summary statistics of the observation
- Otherwise, values are (at least mathematically) independent.
Graph data
- Same holds true, but adittional interdependencies due to the relational structure of the data.
- Sepperation between node and edge data, which is interdependent. Removing a node might alos impy the removal of edges, removal of edges changes the characteristics of nodes
- In adittion, the relational structure makes that not only true for adjacent nodes and edges, but potentially multiple. Adding/Removing one node/edge could change the characteristics of every single other node/edge.
- That is less of a problem for local network characteristics (eg., a node’s degree on level 1). However, many node and edge characteristics such
- That’s mainly why graph computing is slightly more messy, and need own mathematical tools, and applications from graphical computing (graphical like graph, not like figure)

Network / Graph concepts & terminology

Before we start creating graphs, lets fix some more terminology related to graphs. Some of them might sound unintuitive by now, but we will come back to that later.

The vertices u and v are called the end vertices of the edge (u,v)
If two edges have the same end vertices they are Parallel
An edge of the form (v,v) is a loop
A Graph is simple if it has no parallel edges and loops
A Graph is said to be Empty if it has no edges. Meaning E is empty
A Graph is a Null Graph if it has no vertices. Meaning V and E is empty
Edges are Adjacent if they have a common vertex. Vertices are Adjacent if they have a common edge
The degree of the vertex v, written as d(v), is the number of edges with v as an end vertex. By convention, we count a loop twice and parallel edges contribute separately
Isolated Vertices are vertices with degree 1.
A Graph is Complete if its edge set contains every possible edge between ALL of the vertices
A Walk in a GraphG = (V,E) is a finite, alternating sequence of the form ViEiViEi consisting of vertices and edges of the graph G
A Walk is Open if the initial and final vertices are different. A Walk is Closed if the initial and final vertices are the same
A Walk is a Path if ANY vertex is appear atmost once (Except for a closed walk)

Network analysis and measures

Node-Level measures

Often, we are interested in ways to summarize the pattern of node connectivity to infer something on their characteristics.

Centralities

One of the simplest concepts when computing node level measures is that of centrality, i.e. how central is a node or edge in the graph.
As this definition is inherently vague, a lot of different centrality scores exists that all treat the concept of “central” a bit different.

We in the following well briefly illustrate the idea behind three of the most popular centrality measures, namely:

Degree centrality
Eigenvector centrality
Betweenness centrality

Degree centrality

The degree centrality is probably the most intuitive node measure, which basically just counts the number of edges adjacent to a node.
Formally, degree of node \(i\) is the n. of existing edges \(e_{ij}\) with other nodes \(j\) in a network with \(n\) nodes:

\[c^{dgr}_{j} =\sum\limits_{j=1}^{n} e_{ij} ~ where: ~ i \neq j\]

Eigenvector centrality

Eigenvector centrality takes idea of characterizing nodes by their importance in a network further.
The basic idea is to weight a node’s degree centrality by the centrality of the nodes adjacent to it (and their centrality in turn by their centrality).
The eigenvector here is just a clever mathematical trick to solve such a recurrent problem.

\[c^{ev}_{j}={\frac {1}{\lambda }}\sum _{t\in M(i)}x_{t}={\frac {1}{\lambda }}\sum _{t\in G}a_{i,t}x_{t}\]

Betweenness centrality

The betweenness centrality of a node measures the extent to which it lies on short paths.
A higher betweenness indicates that it lies on more short paths and hence should somehow be important for traversing between different parts of a network
How many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?

\[c^{btw}_{j} = \sum_{s,t \in G} \frac{ \Psi_{s,t}(i) }{\Psi_{s,t}}\]

where vertices \(s,t,i\) are all different from each other

\(\Psi_{s,t}\) denotes the number of shortest paths (geodesics) between vertices \(s\) and \(t\)
\(\Psi_{s,t}(i)\) denotes the number of shortest paths (geodesics) between vertices \(s\) and \(t\) that pass through vertex \(i\).
The geodesic betweenness \(B_n\) of a network is the mean of \(B_n(i)\) over all vertices \(i\)

Neighborhood of a Node

Lastly, we can look at the surrounding of a node, meaning the ones it is connected to, its neighborhood. Here, we can look at the ego-network of a node. That means how many nodes are in a certain geodesic distance. Plainly speaking, how many nodes are not more than x-steps away.

Clustering (Community detection)

Another common operation is to group nodes based on the graph topology, sometimes referred to as community detection based on its commonality in social network analysis.
The main logic: Form groups which have a maximum within-connectivity and a minimum between-connectivity.
Consequently, nodes in the same community should have a higher probability of being connected than nodes from different communities.

Example: The Louvain Algorithm

I will illustrate the idea using the Louvain Method, one of the most widely used community detection algorithms.
It optimises a quantity called modularity:

\[ \sum_{ij} (A_{ij} - \lambda P_{ij}) \delta(c_i,c_j) \]

\(A\) - The adjacency matrix

\(P_{ij}\) - The expected connection between \(i\) and \(j\).

\(\lambda\) - Resolution parameter

Can use lots of different forms for \(P_{ij}\) but the standard one is the so called configuration model:

\(P_{ij} = \frac{k_i k_j}{2m}\)

Loosely speaking, in an iterative process it

You take a node and try to aggregate it to one of its neighbours.
You choose the neighbour that maximizes a modularity function.
Once you iterate through all the nodes, you will have merged few nodes together and formed some communities.
This becomes the new input for the algorithm that will treat each community as a node and try to merge them together to create bigger communities.
The algorithm stops when it’s not possible to improve modularity any more.

This is the original paper, for those interested in further reads:

Blondel, Vincent D; Guillaume, Jean-Loup; Lambiotte, Renaud; Lefebvre, Etienne (9 October 2008). “Fast unfolding of communities in large networks”. Journal of Statistical Mechanics: Theory and Experiment. 2008 (10): P10008

(Global) Network structure

Finally, it is often also informative to look at the overal characteristics of the network. We will do this in more detail next time, but just so you know:

The density of a measure represents the share of all connected to all possible connections in the network
Transistivity, also called the Clustering Cefficient indicates how much the network tends to be locally clustered. That is measured by the share of closed triplets. Again,w e will dig into that next time.
The diameter is the longest of the shortest paths between two nodes of the network.
The mean distance, or average path lenght represents the mean of all shortest paths between all nodes. It is a measure of diffusion potential within a network.

Summing up

In this session we talked about:

What are networks and why might it be interesting to study them.
What are commong datastructures to represent networks.
What are basic definitions and concepts relevant for network analysis
What are common measures of local network structure?
What are common measures of global network structure?

This session

Introduction

Networks… so what?

The basic jargon

Types of networks

Relational data-structures

Edgelist

Adjacency Matrix

Nodelists

Graph Objects

Network / Graph concepts & terminology

Network analysis and measures

Node-Level measures

Centralities

Degree centrality

Eigenvector centrality

Betweenness centrality

Neighborhood of a Node

Clustering (Community detection)

Example: The Louvain Algorithm

(Global) Network structure

(Global) Network structure

Summing up

Summing up

Further readings