### Load standard packages
library(tidyverse) # Collection of all the good stuff like dplyr, ggplot2 etc.
library(magrittr) # For extra piping operators (e.g. %<>%)
library(tidygraph)
library(ggraph)
Welcome to your second part of the introduction to network analysis. In this session you will learn:
Hello so far :)
Treating edges as symmetric (A->B == B->A) makes sense in a lot of settings, for instance when we look at co-occurrence networks. Let’s look at a brief example: data on high school students who had to name their close friends.
highschool %>%
head()
Again, here it sometimes happens that friendship is not reciprocal, so we will create a directed friendship graph.
g <- highschool %>%
as_tbl_graph(directed = TRUE)
g
# A tbl_graph: 70 nodes and 506 edges
#
# A directed multigraph with 1 component
#
# Node Data: 70 x 1 (active)
name
<chr>
1 1
2 2
3 3
4 4
5 5
6 6
# … with 64 more rows
#
# Edge Data: 506 x 3
from to year
<int> <int> <dbl>
1 1 13 1957
2 1 14 1957
3 1 20 1957
# … with 503 more rows
set.seed(1337)
# The names were anonymized, which is a bit boring. So I will just give them some random names to associate with.
library(randomNames)
g <- g %N>%
mutate(gender = rbinom(n = n(), size = 1, prob = 0.5),
label= randomNames(gender = gender, name.order = "first.last"))
g %N>% as_tibble()
set.seed(1337)
g %E>%
ggraph(layout = "nicely") +
geom_edge_link(arrow = arrow()) +
geom_node_point() +
theme_graph() +
facet_edges(~year)
We indeed see that the friendship structure alters slightly between years. To make it less complicated for now, we will only look at the 1958 network.
g <- g %E>%
filter(year == 1958) %N>%
filter(!node_is_isolated())
set.seed(1337)
g %E>%
ggraph(layout = "nicely") +
geom_edge_link(arrow = arrow()) +
geom_node_point() +
theme_graph()
Our network is now directed, meaning a node-pair now has two different roles:
Consequently, most network metrics have to take this directionality into account. For example, degree centrality is now split into in-degree centrality (how many edges lead to the node) and out-degree centrality (how many edges lead away from the node).
g <- g %N>%
mutate(cent_dgr_in = centrality_degree(mode = "in"),
cent_dgr_out = centrality_degree(mode = "out"))
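To make the in/out distinction concrete, here is a minimal sketch on a made-up graph (the object `toy` and its four edges are illustrative assumptions, not part of the highschool data). It calls igraph's `degree()` directly; `centrality_degree()` in tidygraph should, to my understanding, give the same counts for an unweighted graph.

```r
library(igraph)

# Made-up directed friendship graph:
# 1 names 2 and 3, 2 names 3, 3 names 1
toy <- make_graph(c(1, 2,  1, 3,  2, 3,  3, 1), directed = TRUE)

deg_in  <- degree(toy, mode = "in")   # nominations received
deg_out <- degree(toy, mode = "out")  # nominations sent

data.frame(node = 1:3, deg_in = deg_in, deg_out = deg_out)
```

Node 3 is named by two others but names only one person itself, so its in-degree and out-degree differ. An undirected degree would hide this difference.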
Now it is getting a bit more complicated. Most community detection algorithms implemented in igraph only work with undirected networks. So, now we could do two things:
g <- g %N>%
mutate(community = group_edge_betweenness(directed = TRUE) %>% as.factor())
FALSE Modularity is implemented for undirected graphs only.
g %E>%
ggraph(layout = "nicely") +
geom_edge_link(arrow = arrow()) +
geom_node_point(aes(col = community, size = cent_dgr_in)) +
theme_graph()
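Since most community detection algorithms expect undirected input, one possible workaround is to drop the edge direction before clustering. The following is only a sketch under that assumption, on a made-up graph (`toy` and `toy_comm` are hypothetical names); it is not necessarily the route taken in this session. It uses tidygraph's `to_undirected` morpher together with `group_louvain()`.

```r
library(igraph)
library(tidygraph)

# Made-up directed graph: two directed triangles joined by one edge (3 -> 4)
toy <- make_graph(c(1, 2,  2, 3,  3, 1,
                    4, 5,  5, 6,  6, 4,
                    3, 4), directed = TRUE) %>%
  as_tbl_graph()

toy_comm <- toy %>%
  convert(to_undirected) %>%   # assumption: direction is not needed for grouping
  activate(nodes) %>%
  mutate(community = as.factor(group_louvain()))

toy_comm
```

The price of this shortcut is that who named whom is ignored when the groups are formed, so it is best seen as a rough first look.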
The networks were created according to the following questionnaire:
Let’s load the data! The three networks refer to cowork, friendship, and advice. The first 36 respondents are the partners in the firm.
mat_friendship
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 1 1 0 0 0 0 0 1 0 1 1 0 1 0 1 1 0 1 0 1 1 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 1 0 0 0 0 0 1 1 0 1 1 1 0 0 0 1 0 0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0
11 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
12 1 1 0 1 1 0 0 1 1 1 1 0 1 0 1 1 1 0 1 0 1 1 0 1 1 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0
13 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 1 1 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
14 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[ reached getOption("max.print") -- omitted 57 rows ]
We also load a set of nodes
nodes %>% head()
The variables in nodes are unnamed, but from the paper I know how they are coded, so we can give them names.
colnames(nodes) <- c("name", "seniority", "gender", "office", "tenure", "age", "practice", "school")
We can also recode the numeric codes in the data into something more intuitive. Again, I know the coding from the data description in the paper.
nodes %<>%
mutate(name = name %>% as.numeric(),
seniority = recode(seniority, "1" = "Partner", "2" = "Associate"),
gender = recode(gender, "1" = "Man", "2" = "Woman"),
office = recode(office, "1" = "Boston", "2" = "Hartford", "3" = "Providence"),
practice = recode(practice, "1" = "Litigation", "2" = "Corporate"),
school = recode(school, "1" = "Harvard, Yale", "2" = "Ucon", "3" = "Others"))
nodes %>% head()
g_friendship <- mat_friendship %>% as_tbl_graph(directed = TRUE) %E>%
mutate(type = "friendship") %N>%
mutate(name = name %>% as.numeric()) %>%
left_join(nodes, by = "name")
g_advice <- mat_advice %>% as_tbl_graph(directed = TRUE) %E>%
mutate(type = "advice") %N>%
mutate(name = name %>% as.numeric()) %>%
left_join(nodes, by = "name")
g_work <- mat_work %>% as_tbl_graph(directed = TRUE) %E>%
mutate(type = "work") %N>%
mutate(name = name %>% as.numeric()) %>%
left_join(nodes, by = "name")
# Notice: The node names are taken from the matrices' dimnames as strings and therefore need to be converted to numeric
# We could also join all the networks together.
g_all <- g_friendship %>%
graph_join(g_advice, by = "name") %>%
graph_join(g_work, by = "name")
g_all %E>%
as_tibble() %>%
head()
# Then we could plot them jointly via an edge facet...
g_all %>%
ggraph(layout = 'fr') +
geom_edge_fan(aes(col = type),
arrow = arrow(angle = 30, length = unit(0.25, 'cm'),type = "closed"),
alpha = 0.25) +
geom_node_point(col = 'purple') +
geom_node_text(aes(label = name)) +
theme_graph() +
theme(legend.position = "none") +
facet_edges(~type)
This is convenient, yet somewhat of a compromise, since the layout is optimized on the full network of all edges. So it kind of fits all of them, but none of them fully…
For the following, we will only look at the friendship network, while I leave the analysis of the others up to you.
g <- g_friendship
Let’s take a look.
set.seed(1337)
# Plot the friendship network (isolated nodes removed)
g %N>%
filter(!node_is_isolated()) %>%
ggraph(layout = 'stress') +
geom_edge_fan(arrow = arrow(angle = 30, length = unit(0.25, 'cm'),type = 'closed'), alpha = 0.25) +
geom_node_point(aes(col = office, size = centrality_eigen(directed = TRUE))) +
geom_node_text(aes(label = name, size = centrality_eigen(directed = TRUE))) +
theme_graph() +
theme(legend.position = "bottom") +
facet_edges(~type)
library(igraph)
# Get density of a graph
edge_density(g)
[1] 0.1156942
# Get the diameter of the graph g
diameter(g, directed = TRUE)
[1] 7
# Get the average path length of the graph g
mean_distance(g, directed = TRUE)
[1] 2.505126
# Transitivity
transitivity(g, type ="global")
[1] 0.4486229
# reciprocity
reciprocity(g)
[1] 0.6121739
assortativity(g, V(g)$tenure, directed = TRUE)
[1] 0.5499482
assortativity(g, V(g)$school == "Harvard, Yale", directed = TRUE)
[1] 0.1808466
assortativity(g, degree(g, mode = "in"), directed = TRUE)
[1] 0.1771873
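As a rough aid for reading the reciprocity and assortativity values, here is a small sketch on made-up data (the graph `toy` and the attribute vector `grp` are illustrative assumptions). Reciprocity is the share of directed edges that are returned. Assortativity is a correlation between the attribute values at the two ends of each edge.

```r
library(igraph)

# Made-up directed graph: two mutual friendships (1 <-> 2, 3 <-> 4)
# plus one nomination that is not returned (1 -> 3)
toy <- make_graph(c(1, 2,  2, 1,  3, 4,  4, 3,  1, 3), directed = TRUE)

# 4 of the 5 edges have a counterpart in the opposite direction
rec <- reciprocity(toy)
rec  # 0.8

# Hypothetical numeric attribute: nodes 1 and 2 share one value, nodes 3 and 4 another
grp <- c(1, 1, 2, 2)
ass <- assortativity(toy, grp, directed = TRUE)
ass
```

Most edges here connect nodes with the same attribute value, so the assortativity is positive. The single cross-group edge keeps it below 1.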
This leads to an interesting setup, which has proven to be conducive to efficient communication and fast diffusion of information in social networks.
We calculate it for now in an easy way:
transitivity(g, type ="global") / mean_distance(g, directed = TRUE)
[1] 0.179082
However, by now you probably wonder how to interpret these numbers. Are they high, low, or whatever? What is the reference? In fact, it’s very hard to say. The best way to say something about that is to compare them with what a random network would look like.
So, let’s create a random network. Here, we use the play_erdos_renyi() function, which creates a network with a given number of nodes and edges (and hence a given edge density), but where the edges are placed completely at random.
g_r <- play_erdos_renyi(n = g %>% gorder(),
m = g %>% gsize(),
directed = TRUE,
loops = FALSE)
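The idea of the random baseline can be sketched with igraph's own `sample_gnm()`, which draws a graph with a fixed number of nodes and edges. The graph `obs` below is made up for illustration, and using `sample_gnm()` in place of `play_erdos_renyi()` is my substitution here, not the call used in this session.

```r
library(igraph)
set.seed(1337)

# Made-up "observed" directed graph with 4 nodes and 5 edges
obs <- make_graph(c(1, 2,  2, 1,  1, 3,  3, 4,  4, 1), directed = TRUE)

# Random graph with the same number of nodes and edges, edges placed at random
rnd <- sample_gnm(n = gorder(obs), m = gsize(obs),
                  directed = TRUE, loops = FALSE)

# Same size and density by construction ...
c(nodes = gorder(rnd), edges = gsize(rnd))
c(observed = edge_density(obs), random = edge_density(rnd))

# ... but the structure (here: reciprocity) is free to differ
c(observed = reciprocity(obs), random = reciprocity(rnd))
```

Because size and density are held fixed, any difference in reciprocity, clustering, or path length reflects how the edges are arranged rather than how many there are.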
Looks kind of different. However, one randomly created network doesn’t provide a good baseline. So, let’s rather create a bunch, and compare our network to the average values of the randomly generated ones.
# Generate n random graphs
n = 1000
g_l <- vector('list', n)
for(i in 1:n){
g_l[[i]] <- play_erdos_renyi(n = g %>% gorder(),
m = g %>% gsize(),
directed = TRUE,
loops = FALSE)
}
Now we can see how meaningful our observed network statistics are, by comparing them with the mean of the statistics in the random network.
# Calculate average path length, clustering, and reciprocity for each of the 1000 random graphs
dist_r <- g_l %>% lapply(mean_distance, directed = TRUE) %>% unlist() #%>% mean()
cc_r <- g_l %>% lapply(transitivity, type = "global") %>% unlist() #%>% mean()
rp_r <- g_l %>% lapply(reciprocity) %>% unlist() #%>% mean()
Let’s see:
stats_friend <- tibble(density = g %>% edge_density(),
diameter = g %>% diameter(directed = TRUE),
reciprocity = g %>% reciprocity(),
reciprocity_score = mean(reciprocity(g) > rp_r),
distance = g %>% mean_distance(directed = TRUE),
distance_score = mean(mean_distance(g, directed = TRUE) > dist_r),
clustering = g %>% transitivity(type = "global"),
clustering_score = mean(transitivity(g, type = "global") > cc_r),
small_world = mean(transitivity(g, type = "global") > cc_r) / mean(mean_distance(g, directed = TRUE) > dist_r) )
stats_friend
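The `*_score` columns are the share of random graphs whose statistic lies below the observed value. Here is a minimal sketch of that comparison on a made-up graph (`obs`, `n_sim`, and the other names are illustrative assumptions, and igraph's `sample_gnm()` stands in for `play_erdos_renyi()`):

```r
library(igraph)
set.seed(1337)

# Made-up directed ring of 4 nodes in which every nomination is returned
obs <- make_graph(c(1, 2,  2, 1,  2, 3,  3, 2,
                    3, 4,  4, 3,  4, 1,  1, 4), directed = TRUE)
rp_obs <- reciprocity(obs)

# Reciprocity of n_sim random graphs with the same number of nodes and edges
n_sim <- 200
rp_rnd <- vapply(seq_len(n_sim), function(i) {
  reciprocity(sample_gnm(gorder(obs), gsize(obs), directed = TRUE))
}, numeric(1))

# Share of random graphs with lower reciprocity than the observed graph
score <- mean(rp_obs > rp_rnd)
score
```

A score close to 1 means that almost no random graph of the same size reaches the observed value, while a score around 0.5 means the observed value is about what chance alone would produce.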
Please do Exercise 1 in the corresponding section on GitHub.
Classics
Own work on directed networks
sessionInfo()