### Load standard packages
library(tidyverse) # Collection of all the good stuff like dplyr, ggplot2 etc.
library(magrittr) # For extra piping operators (e.g. %<>%)

library(tidygraph)
library(ggraph)

Welcome to your second part of the introduction to network analysis. In this session you will learn:

  1. What directed networks are, and when that matters.
  2. How different measures have to be calculated in directed networks.
  3. What multidimensional networks are, and how they matter.
  4. How to compare network measures between graphs, and with random graphs.

Introduction

Hello so far :)

Directed networks

  • Up to now, we did not pay attention to the direction of edges and assumed them to be symmetric (A->B == B->A). This makes sense in a lot of settings, for instance when we look at co-occurrence networks.
  • However, in many cases, such as friendship networks, that might not be the case (the person you name as a close friend does not necessarily think the same about you).
  • In such cases, we would like to take this directionality into account and analyse directed networks.
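
To make the symmetry assumption concrete, here is a minimal sketch in base R. The three people and their ties are made up for illustration. In an adjacency matrix, a 1 in row i and column j means that i names j, and the network can only be treated as undirected if the matrix equals its transpose.

```r
# Hypothetical toy data: who names whom as a friend (row -> column)
people <- c("Ann", "Bob", "Cat")
adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(people, people))

# An undirected network has a symmetric adjacency matrix
is_symmetric <- isSymmetric(adj)

# Ties that exist in one direction only
one_way <- which(adj == 1 & t(adj) == 0, arr.ind = TRUE)
one_way_ties <- paste(people[one_way[, "row"]], "->", people[one_way[, "col"]])

is_symmetric
one_way_ties
```

Here Ann and Bob name each other, while Bob names Cat without being named back, so the matrix is not symmetric and direction carries information.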

Let’s look at a brief example: data on high school students who were asked to name their close friends.

highschool %>%
  head()

Again, here it sometimes happens that friendship is not reciprocal, so we will create a directed friendship graph.

g <- highschool %>% 
  as_tbl_graph(directed = TRUE)
g
# A tbl_graph: 70 nodes and 506 edges
#
# A directed multigraph with 1 component
#
# Node Data: 70 x 1 (active)
  name 
  <chr>
1 1    
2 2    
3 3    
4 4    
5 5    
6 6    
# … with 64 more rows
#
# Edge Data: 506 x 3
   from    to  year
  <int> <int> <dbl>
1     1    13  1957
2     1    14  1957
3     1    20  1957
# … with 503 more rows
set.seed(1337)
# The names were anonymized, which is a bit boring. So I will just give them some random names to associate with.
library(randomNames)

g <- g %N>%
  mutate(gender = rbinom(n = n(), size = 1, prob = 0.5),
         label= randomNames(gender = gender, name.order = "first.last"))
g %N>% as_tibble()
  • Let’s plot this network briefly to get a sense of it.
  • Notice that we have edges for two years, so we can do a facet plot for every year.
set.seed(1337)
g %E>% 
  ggraph(layout = "nicely") + 
    geom_edge_link(arrow = arrow()) + 
    geom_node_point() +
    theme_graph() +
    facet_edges(~year)

We indeed see that the friendship structure alters slightly between years. To make it less complicated for now, we will only look at the 1958 network.

g <- g %E>% 
  filter(year == 1958) %N>%
  filter(!node_is_isolated()) 
set.seed(1337)
g %E>% 
  ggraph(layout = "nicely") + 
    geom_edge_link(arrow = arrow()) + 
    geom_node_point() +
    theme_graph() 

Centrality measures

Our network is now directed, meaning the two nodes of an edge now play different roles:

  • Ego: The node the edge originates from.
  • Alter: The node the edge leads to.

Consequently, most network metrics have to take this directionality into account. For example, degree centrality is now split into in-degree centrality (how many edges lead to the node) and out-degree centrality (how many edges originate from the node).

g <- g %N>%
  mutate(cent_dgr_in = centrality_degree(mode = "in"),
         cent_dgr_out = centrality_degree(mode = "out")) 
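
As a rough illustration of what the two degree variants count, the following base R sketch computes them by hand from a small adjacency matrix. The matrix is invented for the example. Rows are read as senders (ego) and columns as receivers (alter).

```r
# Hypothetical toy network: rows send ties, columns receive them
adj <- matrix(c(0, 1, 1,
                0, 0, 1,
                0, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

out_degree <- rowSums(adj)  # ties each node sends (ego role)
in_degree  <- colSums(adj)  # ties each node receives (alter role)

data.frame(node = rownames(adj), in_degree, out_degree)
```

Every edge has exactly one sender and one receiver, so the in-degrees and out-degrees always sum to the same total even though they differ node by node.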

Community Structures

Now it is getting a bit more complicated. Most community detection algorithms implemented in igraph only work with undirected networks. So, we could now do two things:

  1. Convert the network into an undirected one.
  2. Use the “edge betweenness” algorithm, one of the few implemented algorithms that can take edge direction into account.
g <- g %N>%
  mutate(community = group_edge_betweenness(directed = TRUE) %>% as.factor())
FALSE Modularity is implemented for undirected graphs only.
g %E>% 
  ggraph(layout = "nicely") + 
    geom_edge_link(arrow = arrow()) + 
    geom_node_point(aes(col = community, size = cent_dgr_in)) +
    theme_graph() 
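
For the first option listed above, converting to an undirected network, one has to decide what counts as an undirected tie. The base R sketch below shows two possible rules on a made-up matrix. As far as I know they correspond to what igraph calls the "collapse" and "mutual" modes when converting a graph, but treat that mapping as an assumption and check the documentation.

```r
# Hypothetical directed toy network: 1 <-> 2 and 2 -> 3
adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 0, 0),
              nrow = 3, byrow = TRUE)

# Rule 1: keep a tie if it exists in at least one direction
adj_any <- (adj + t(adj) > 0) * 1

# Rule 2: keep a tie only if both directions are present
adj_mutual <- adj * t(adj)

ties_any    <- sum(adj_any) / 2     # each undirected tie appears twice
ties_mutual <- sum(adj_mutual) / 2

c(any_direction = ties_any, mutual_only = ties_mutual)
```

The choice matters for community detection: the first rule keeps the network dense, while the second keeps only confirmed friendships and can split the network into more, smaller groups.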

Case: Lawyers, Friends & Foes

Introduction to the case

  • Emmanuel Lazega, The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership, Oxford University Press (2001).

Data

  • This data set comes from a network study of a corporate law partnership, carried out 1988-1991 in a Northeastern US (New England) corporate law firm referred to as SG&R.
  • It includes (among others) measurements of networks among the 71 attorneys (partners and associates) of this firm, i.e. their strong-coworker network, advice network, friendship network, and indirect control networks.
  • Various members’ attributes are also part of the dataset, including seniority, formal status, office in which they work, gender, law school attended, individual performance measurements (hours worked, fees brought in), attitudes concerning various management policy options, etc.
  • This dataset was used to identify social processes such as bounded solidarity, lateral control, quality control, knowledge sharing, balancing powers, regulation, etc. among peers.

Setting

  • What do corporate lawyers do? Litigation and corporate work.
  • Division of work and interdependencies.
  • Three offices, no departments, built-in pressures to grow, intake and assignment rules.
  • Partners and associates: hierarchy, up or out rule, billing targets.
  • Partnership agreement (sharing benefits equally, 90% exclusion rule, governance structure, elusive committee system) and incompleteness of the contracts.
  • Informal, unwritten rules (ex: no moonlighting, no investment in buildings, no nepotism, no borrowing to pay partners, etc.).
  • Huge incentives to behave opportunistically; thus the dataset is appropriate for the study of social processes that make cooperation among rival partners possible.
  • Sociometric name generators used to elicit coworkers, advice, and ‘friendship’ ties at SG&R: “Here is the list of all the members of your Firm.”

The networks were created according to the following questionnaire:

  • Strong coworkers network: “Because most firms like yours are also organized very informally, it is difficult to get a clear idea of how the members really work together. Think back over the past year, consider all the lawyers in your Firm. Would you go through this list and check the names of those with whom you have worked with. By “worked with” I mean that you have spent time together on at least one case, that you have been assigned to the same case, that they read or used your work product or that you have read or used their work product; this includes professional work done within the Firm like Bar association work, administration, etc.”
  • Basic advice network: “Think back over the past year, consider all the lawyers in your Firm. To whom did you go for basic professional advice? For instance, you want to make sure that you are handling a case right, making a proper decision, and you want to consult someone whose professional opinions are in general of great value to you. By advice I do not mean simply technical advice.”
  • ‘Friendship’ network: “Would you go through this list, and check the names of those you socialize with outside work. You know their family, they know yours, for instance. I do not mean all the people you are simply on a friendly level with, or people you happen to meet at Firm functions.”

Data preparation

Load the data

Let’s load the data! The three networks refer to cowork, friendship, and advice. The first 36 respondents are the partners in the firm.

mat_friendship
   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
1  0 1 0 1 0 0 0 1 0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  1  1  0  0  0  0  1  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4  0 1 1 0 0 0 0 0 1  0  1  1  0  1  0  1  1  0  1  0  1  1  1  0  0  1  1  0  1  0  0  0  0  0  0  0  0  0  0  0
5  0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7  0 0 1 0 1 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8  0 0 0 0 0 0 0 0 0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9  0 0 0 1 0 0 0 0 0  0  1  0  0  0  0  0  0  0  0  0  1  0  1  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
10 0 1 0 0 0 0 0 1 1  0  1  1  1  0  0  0  1  0  0  0  1  0  0  1  0  1  1  0  1  0  0  0  0  1  0  0  0  1  0  0
11 0 0 0 0 0 0 0 1 1  0  0  0  0  0  0  0  1  0  0  0  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
12 1 1 0 1 1 0 0 1 1  1  1  0  1  0  1  1  1  0  1  0  1  1  0  1  1  1  1  0  1  0  0  0  0  1  0  0  0  1  0  0
13 0 0 0 1 0 0 0 0 1  0  1  1  0  0  0  0  0  0  0  1  1  0  1  1  0  1  1  0  0  0  0  0  0  0  0  0  0  1  0  0
14 0 0 1 0 0 1 0 0 0  0  0  0  0  0  0  1  1  0  0  1  0  0  0  0  1  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0
   41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
1   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
3   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
4   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
5   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
6   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
7   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
8   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
9   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
10  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
11  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
12  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
13  1  0  1  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
14  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
 [ reached getOption("max.print") -- omitted 57 rows ]

We also load a set of nodes.

nodes %>% head()

Cleaning up

The variables in nodes are unnamed, but from the paper I know how they are coded, so we can give them names.

colnames(nodes) <- c("name", "seniority", "gender", "office", "tenure", "age", "practice", "school")

We can also recode the numeric codes in the data into something more intuitive. Again, I know the coding from the data description in the paper.

  • seniority status (1=partner; 2=associate)
  • gender (1=man; 2=woman)
  • office (1=Boston; 2=Hartford; 3=Providence)
  • years with the firm
  • age
  • practice (1=litigation; 2=corporate)
  • law school (1: harvard, yale; 2: ucon; 3: other)
nodes %<>%
  mutate(name = name %>% as.numeric(),
         seniority = recode(seniority, "1" = "Partner", "2" = "Associate"),
         gender = recode(gender, "1" = "Man", "2" = "Woman"),
         office = recode(office, "1" = "Boston", "2" = "Hartford", "3" = "Providence"),
         practice = recode(practice, "1" = "Litigation", "2" = "Corporate"),
         school = recode(school, "1" = "Harvard, Yale", "2" = "Ucon", "3" = "Others"))   
nodes %>% head()

Generate the graph

  • Since we now have a multidimensional network (= different types of edges), we first load them into separate networks.
  • We could also directly load them into one network with labeled edges, but that’s a bit more complicated, so for the sake of clarity we keep them separated for now.
g_friendship <- mat_friendship %>% as_tbl_graph(directed = TRUE) %E>%
  mutate(type = "friendship") %N>%
  mutate(name = name %>% as.numeric()) %>%
  left_join(nodes, by = "name")

g_advice <- mat_advice %>% as_tbl_graph(directed = TRUE) %E>%
  mutate(type = "advice") %N>%
  mutate(name = name %>% as.numeric()) %>%
  left_join(nodes, by = "name")

g_work <- mat_work %>% as_tbl_graph(directed = TRUE) %E>%
  mutate(type = "work") %N>%
  mutate(name = name %>% as.numeric()) %>%
  left_join(nodes, by = "name")

# Notice: The node names are taken from the matrices' dimnames as strings and therefore need to be converted to numeric.
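
To illustrate what happens when an adjacency matrix is turned into a graph, the following base R sketch converts a small made-up matrix into an edge list by hand. It also shows why the character dimnames need converting before they can be matched against a numeric node id. This is only a sketch of the idea, not what as_tbl_graph() does internally.

```r
# Hypothetical 3 x 3 adjacency matrix with character dimnames
adj <- matrix(c(0, 1, 0,
                0, 0, 1,
                1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("1", "2", "3"), c("1", "2", "3")))

# Each non-zero cell (row i, column j) becomes one edge i -> j
idx <- which(adj != 0, arr.ind = TRUE)
edges <- data.frame(from = as.numeric(rownames(adj)[idx[, "row"]]),
                    to   = as.numeric(colnames(adj)[idx[, "col"]]))
edges <- edges[order(edges$from, edges$to), ]
edges
```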

First inspection

# We could also join all the networks together.
g_all <- g_friendship %>%
  graph_join(g_advice, by = "name") %>%
  graph_join(g_work, by = "name")
g_all %E>%
  as_tibble() %>%
  head()
# Then we could plot them jointly via an edge facet...
g_all %>% 
  ggraph(layout = 'fr') + 
  geom_edge_fan(aes(col = type), 
                arrow = arrow(angle = 30, length = unit(0.25, 'cm'),type = "closed"), 
                alpha = 0.25) + 
  geom_node_point(col = 'purple') +
  geom_node_text(aes(label = name)) + 
  theme_graph() +
  theme(legend.position = "none") +
  facet_edges(~type) 

This is convenient, yet somewhat of a compromise, since the layout is optimized on the full network of all edges. So it kind of fits all of them, but none of them fully…

Network effects & structures

For the following, we will only look at the friendship network, while I leave the analysis of the others up to you.

g <- g_friendship

Let’s take a look.

set.seed(1337)
# Plot the friendship network without isolated nodes
g %N>% 
  filter(!node_is_isolated()) %>%
  ggraph(layout = 'stress') + 
  geom_edge_fan(arrow = arrow(angle = 30, length = unit(0.25, 'cm'),type = 'closed'), alpha = 0.25) + 
  geom_node_point(aes(col = office, size = centrality_eigen(directed = TRUE))) +
  geom_node_text(aes(label = name, size = centrality_eigen(directed = TRUE))) + 
  theme_graph() +
  theme(legend.position = "bottom") +
  facet_edges(~type) 

Node level (local)

  • We could look at all the node level characteristics (degree, betweenness etc.) again, but for the sake of time I skip that for now, since it’s all already in the last notebook.

Network level (global)

library(igraph)
  • Ok, let’s do the whole exercise of getting the main determinants of the network structure again. We can look at the classical structural determinants.
# Get density of a graph
edge_density(g)
[1] 0.1156942
# Get the diameter of the graph g
diameter(g, directed = TRUE)
[1] 7
# Get the average path length of the graph g
mean_distance(g, directed = TRUE)
[1] 2.505126
# Transitivity
transitivity(g, type ="global")
[1] 0.4486229
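
To see what density, mean distance, and diameter measure in a directed network, here is a hand calculation in base R on a made-up four-node cycle. Shortest paths are found with the Floyd-Warshall algorithm and averaged over reachable ordered pairs only, which is how I understand igraph's default behaviour. Please treat that as an assumption.

```r
# Hypothetical directed toy network: 1 -> 2 -> 3 -> 4 -> 1
adj <- matrix(0, nrow = 4, ncol = 4)
adj[cbind(c(1, 2, 3, 4), c(2, 3, 4, 1))] <- 1

n <- nrow(adj)
m <- sum(adj)

# Density: observed edges over the n * (n - 1) possible directed edges
density <- m / (n * (n - 1))

# Shortest directed path lengths via Floyd-Warshall
dist <- ifelse(adj == 1, 1, Inf)
diag(dist) <- 0
for (k in seq_len(n)) {
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      dist[i, j] <- min(dist[i, j], dist[i, k] + dist[k, j])
    }
  }
}

# Only ordered pairs of distinct nodes that are connected by a path
reachable <- is.finite(dist) & row(dist) != col(dist)
mean_dist <- mean(dist[reachable])  # average shortest path length
diam <- max(dist[reachable])        # longest shortest path

c(density = density, mean_distance = mean_dist, diameter = diam)
```

Because edges can only be followed in their direction, node 1 needs three steps to reach node 4 here, although node 4 reaches node 1 in a single step.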

Network level (global directed)

  • Since we have a directed network here, a couple of interesting additional metrics are available that explicitly take the direction of edges into account.
  • While there are many more, we will just take a look at some of the most important ones, which are also known to be popular mechanisms in social networks.

  • Reciprocity measures the extent to which edges are reciprocal, meaning an edge from i to j is matched by an edge from j to i.
# reciprocity
reciprocity(g)
[1] 0.6121739
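
The reciprocity figure can be reproduced by hand as the share of edges that are matched by an edge in the opposite direction. The base R sketch below does this on an invented matrix. It assumes the default definition (reciprocated edges divided by all edges), which is how I read the igraph documentation.

```r
# Hypothetical toy network with edges 1->2, 1->3, 2->1, 3->2
adj <- matrix(c(0, 1, 1,
                1, 0, 0,
                0, 1, 0),
              nrow = 3, byrow = TRUE)

# adj * t(adj) is 1 only where both i -> j and j -> i exist
reciprocated <- sum(adj * t(adj))
recip <- reciprocated / sum(adj)
recip
```

Only the tie between nodes 1 and 2 goes both ways, so two of the four directed edges are reciprocated.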
  • We have another important concept that often explains edge formation in directed (social) networks: Assortativity, also called homophily.
  • This is a measure of how preferentially attached vertices are to other vertices with identical attributes. In other words: How much “birds of a feather flock together”.
  • Let’s first look at whether people of the same tenure flock together.
assortativity(g, V(g)$tenure, directed = TRUE)
[1] 0.5499482
  • What about people from elite universities?
assortativity(g, V(g)$school == "Harvard, Yale", directed = TRUE)
[1] 0.1808466
  • Lastly, what about the popularity (or “Matthew”) effect?
assortativity(g, degree(g, mode = "in"), directed = TRUE)
[1] 0.1771873
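
For a numeric attribute, assortativity can be read as a correlation: take the attribute value at the sending end and at the receiving end of every edge and correlate the two. The sketch below does this in base R with made-up tenure values and edges. As far as I understand the definition igraph follows, this matches the directed assortativity coefficient when the same values are used for both ends, but this is an assumption to verify rather than a guarantee.

```r
# Hypothetical node attribute (years of tenure) and directed edge list
tenure <- c(1, 2, 3, 10, 11, 12)
edges <- data.frame(from = c(1, 2, 3, 4, 5, 6, 3),
                    to   = c(2, 3, 1, 5, 6, 4, 4))

# Attribute value at the sending and the receiving end of every edge
x_from <- tenure[edges$from]
x_to   <- tenure[edges$to]

# Positive values: ties tend to connect nodes with similar tenure
r <- cor(x_from, x_to)
r
```

In this toy example junior people mostly name juniors and senior people mostly name seniors, so the coefficient is clearly positive. A value near zero would indicate that tenure plays no role in who names whom.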
  • One more thing we didn’t talk about yet: Small worlds.
  • Small worlds are an interesting network structure, combining short path lengths between the nodes with a high clustering coefficient.
  • That means that we have small interconnected clusters, which are in turn connected by gatekeepers (the edges we call bridges, which span structural holes).

This leads to an interesting setup, which has proven to be conducive to efficient communication and fast diffusion of information in social networks.

We calculate it for now in an easy way:

transitivity(g, type ="global") / mean_distance(g, directed = TRUE)
[1] 0.179082

However, by now you probably wonder how to interpret these numbers. Are they high, low, or whatever? What is the reference? In fact, it’s very hard to say. The best way to say something about that is to compare them with what a random network would look like.
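
One commonly used way to build such a comparison directly into the small-world measure is to divide both clustering and path length by their values in a comparable random graph before taking the ratio. The helper below is a sketch of that idea. The function name is hypothetical and the random-graph values in the call are made-up placeholders, not results from this data.

```r
# Small-world index: clustering and path length, each relative to a random baseline
# (hypothetical helper, not part of igraph or tidygraph)
small_world_sigma <- function(cc_obs, dist_obs, cc_rand, dist_rand) {
  (cc_obs / cc_rand) / (dist_obs / dist_rand)
}

# Observed values from the text; cc_rand and dist_rand are placeholders
sigma <- small_world_sigma(cc_obs = 0.4486229, dist_obs = 2.505126,
                           cc_rand = 0.12, dist_rand = 2.2)
sigma
```

Values well above 1 would suggest much more clustering than in a random graph at a similar path length, which is the usual reading of a small world.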

So, let’s create a random network. Here, we use the play_erdos_renyi() function, which creates a network with a given number of nodes and edges (and therefore the same density), but where the edges are placed completely at random.

g_r <- play_erdos_renyi(n = g %>% gorder(),  
                        m  = g %>% gsize(), 
                        directed = TRUE, 
                        loops = FALSE)
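
To get a feeling for what such a generator does when it is given a number of nodes and a number of edges, here is a minimal base R sketch that draws exactly m directed edges uniformly from all possible ordered pairs without self-loops. The function name random_digraph is hypothetical, and this is only an illustration of the idea, not the implementation behind play_erdos_renyi().

```r
# Draw a random directed graph with n nodes and exactly m edges (no self-loops)
random_digraph <- function(n, m) {
  adj <- matrix(0, nrow = n, ncol = n)
  possible <- which(row(adj) != col(adj))  # all off-diagonal cells
  adj[sample(possible, m)] <- 1            # pick m of them uniformly
  adj
}

set.seed(1337)
adj_r <- random_digraph(n = 10, m = 20)

n_edges <- sum(adj_r)
density_r <- n_edges / (10 * 9)
c(edges = n_edges, density = density_r)
```

Node and edge counts, and hence density, are fixed by construction. Everything else (reciprocity, clustering, path lengths) is left to chance, which is what makes such graphs useful as a baseline.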

Looks kind of different. However, one randomly created network doesn’t provide a good baseline. So, let’s instead create a bunch of them and compare our network to the values of the randomly generated ones.

# Generate n random graphs
n = 1000
g_l <- vector('list', n)
  
for(i in 1:n){
  g_l[[i]] <- play_erdos_renyi(n = g %>% gorder(),  
                        m  = g %>% gsize(), 
                        directed = TRUE, 
                        loops = FALSE)
}

Now we can see how meaningful our observed network statistics are, by comparing them with the distribution of the same statistics across the random networks.

# Calculate mean distance, clustering, and reciprocity for each of the 1000 random graphs
dist_r <- g_l %>% lapply(mean_distance, directed = TRUE) %>% unlist()
cc_r <- g_l %>% lapply(transitivity, type = "global") %>% unlist()
rp_r <- g_l %>% lapply(reciprocity) %>% unlist()

Let’s see:

stats_friend <- tibble(density = g %>% edge_density(),
                       diameter = g %>% diameter(directed = TRUE),
                       reciprocity = g %>% reciprocity(),
                       reciprocity_score = mean(reciprocity(g) > rp_r),
                       distance = g %>% mean_distance(directed = TRUE),
                       distance_score = mean(mean_distance(g, directed = TRUE) > dist_r),
                       clustering = g %>% transitivity(type = "global"),
                       clustering_score = mean(transitivity(g, type = "global")  > cc_r),
                       small_world = mean(transitivity(g, type = "global")  > cc_r) / mean(mean_distance(g, directed = TRUE) > dist_r) )
stats_friend
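
The *_score columns above are empirical percentiles: the share of random graphs whose statistic lies below the observed one. The base R sketch below isolates that logic. The null distribution is a made-up placeholder drawn with rnorm() and stands in for a vector such as rp_r. The observed value is the reciprocity reported earlier. The z-score is an extra, hypothetical summary that is not part of the original table.

```r
set.seed(1337)

# Placeholder null distribution standing in for statistics of 1000 random graphs
null_stats <- rnorm(1000, mean = 0.12, sd = 0.02)
observed <- 0.6121739  # reciprocity reported above

# Share of random graphs whose statistic lies below the observed one
score <- mean(observed > null_stats)

# Standardized distance of the observed value from the random-graph mean
z <- (observed - mean(null_stats)) / sd(null_stats)

c(score = score, z = z)
```

A score close to 1 or 0 says the observed value is higher or lower than in nearly all random graphs. Note that a ratio of two such scores, like small_world above, becomes unstable when the score in the denominator is close to 0.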

Your turn

Please do Exercise 1 in the corresponding section on GitHub.

Endnotes

Suggestions for further study

Literature

Classics

  • Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications (Vol. 8). Cambridge University Press.
  • Carrington, P. J., Scott, J., & Wasserman, S. (Eds.). (2005). Models and methods in social network analysis (Vol. 28). Cambridge University Press.

Own work on directed networks

  • Hain, D., Buchmann, T., Kudic, M., & Müller, M. (2018). Endogenous dynamics of innovation networks in the German automotive industry: analysing structural network evolution using a stochastic actor-oriented approach. International Journal of Computational Economics and Econometrics, 8(3-4), 325-344.
  • Hain, Daniel S., and Roman Jurowetzki. “Incremental by Design? On the Role of Incumbents in Technology Niches.” Foundations of Economic Change. Springer, Cham, 2017. 299-332.

Session Info

sessionInfo()
