### Load standard packages
library(tidyverse) # Collection of all the good stuff like dplyr, ggplot2 etc.
library(magrittr) # For extra piping operators (e.g. %<>%)
In this applied session, you will:
We create an example edge list with the tibble() function.
edge_list <- tibble(from = c(1, 2, 2, 1, 4, 3, 5),
                    to = c(2, 3, 4, 5, 1, 2, 5))
edge_list
adj_matrix <- edge_list %>%
table() %>%
as.matrix()
adj_matrix
to
from 1 2 3 4 5
1 0 1 0 0 1
2 0 0 1 1 0
3 0 1 0 0 0
4 1 0 0 0 0
5 0 0 0 0 1
Note: For sparse networks, a dgCMatrix object from the Matrix package can be helpful.
library(Matrix)
sparse_matrix <- edge_list %>%
table() %>%
Matrix(sparse = TRUE)
sparse_matrix
5 x 5 sparse Matrix of class "dgCMatrix"
to
from 1 2 3 4 5
1 . 1 . . 1
2 . . 1 1 .
3 . 1 . . .
4 1 . . . .
5 . . . . 1
sparse_matrix %>% glimpse()
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:7] 3 0 2 1 1 0 4
..@ p : int [1:6] 0 1 3 4 5 7
..@ Dim : int [1:2] 5 5
..@ Dimnames:List of 2
.. ..$ from: chr [1:5] "1" "2" "3" "4" ...
.. ..$ to : chr [1:5] "1" "2" "3" "4" ...
..@ x : num [1:7] 1 1 1 1 1 1 1
..@ factors : list()
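To make the slots in the glimpse() output less opaque, here is a minimal sketch, assuming the same seven edges as above, of how the compressed-column layout maps back to an edge list. The names m, rows, cols and edges_back are illustrative and not part of the original session.

```r
library(Matrix)

# Build the same 5 x 5 pattern directly from (row, column) pairs;
# with numeric values in x, sparseMatrix() returns a dgCMatrix
m <- sparseMatrix(i = c(1, 2, 2, 1, 4, 3, 5),
                  j = c(2, 3, 4, 5, 1, 2, 5),
                  x = rep(1, 7),
                  dims = c(5, 5))

# @i holds the 0-based row index of every non-zero cell, stored column by column;
# @p holds, for each column, the offset at which its entries start
rows <- m@i + 1L
cols <- rep(seq_len(ncol(m)), times = diff(m@p))

# Recover the edge list (in column-major order, not the original order)
edges_back <- data.frame(from = rows, to = cols)
edges_back
```

Only the seven non-zero cells are stored, which is why this layout scales much better than a dense matrix for sparse networks.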
node_list <- tibble(id = 1:5,
name = c("Jesper", "Pernille", "Jacob", "Dorte", "Donald"),
gender = c("M", "F", "M", "F", "M"),
group = c("A", "B", "B", "A", "C"))
node_list
igraph
A popular network analysis framework in R and Python alike is igraph. It provides a powerful toolbox for analysis as well as plotting. Let's take a peek.
To create an igraph object from an edge-list data frame, we can use the graph_from_data_frame() function.
The graph_from_data_frame() function takes three arguments: d, vertices, and directed. Here, d is the edge list, vertices is the node list, and directed is TRUE or FALSE depending on whether the data is directed or undirected.
graph_from_data_frame() treats the first two columns of the edge list as the edge endpoints and any remaining columns as edge attributes.
library(igraph)
g <- graph_from_data_frame(d = edge_list, vertices = node_list, directed = FALSE)
# g <- graph_from_adjacency_matrix(adj_matrix, mode = "undirected") # Same for the adjacency matrix
g
IGRAPH 0e6ab66 UN-- 5 7 --
+ attr: name (v/c), gender (v/c), group (v/c)
+ edges from 0e6ab66 (vertex names):
[1] Jesper --Pernille Pernille--Jacob Pernille--Dorte Jesper --Donald Jesper --Dorte
[6] Pernille--Jacob Donald --Donald
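As a small sketch of what the directed argument changes, the same data can be loaded as a directed graph. The object name g_directed is illustrative, and the data frames are re-created here only so the block runs on its own.

```r
library(igraph)

edge_list <- data.frame(from = c(1, 2, 2, 1, 4, 3, 5),
                        to   = c(2, 3, 4, 5, 1, 2, 5))
node_list <- data.frame(id = 1:5,
                        name = c("Jesper", "Pernille", "Jacob", "Dorte", "Donald"),
                        gender = c("M", "F", "M", "F", "M"))

# Same data, but the edge direction (from -> to) is kept
g_directed <- graph_from_data_frame(d = edge_list, vertices = node_list, directed = TRUE)

is_directed(g_directed)                         # TRUE for a directed graph
out_deg <- degree(g_directed, mode = "out")     # outgoing edges per node
in_deg  <- degree(g_directed, mode = "in")      # incoming edges per node
out_deg
in_deg
```

In the directed version, in-degree and out-degree differ per node, whereas the undirected graph used in the rest of this section only has a single degree.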
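Let's inspect the resulting object. The igraph graph object summary reveals some interesting information.
First, the graph type: undirected UN, or directed DN.
Then the number of nodes (5) and edges (7).
Next, the node attributes (attr: name (v/c), gender (v/c), group (v/c)).
Finally, the list of edges, where n--m indicates an undirected and n->m a directed edge.
Let's take a look at the structure of the object: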
g[[1:2]] %>% glimpse() # Note the double brackets (g is a list object)
List of 2
$ Jesper : 'igraph.vs' Named int [1:3] 2 4 5
..- attr(*, "names")= chr [1:3] "Pernille" "Dorte" "Donald"
..- attr(*, "env")=<weakref>
..- attr(*, "graph")= chr "0e6ab66c-8b3d-4871-8dac-6fa47bff7829"
$ Pernille: 'igraph.vs' Named int [1:4] 1 3 3 4
..- attr(*, "names")= chr [1:4] "Jesper" "Jacob" "Jacob" "Dorte"
..- attr(*, "env")=<weakref>
..- attr(*, "graph")= chr "0e6ab66c-8b3d-4871-8dac-6fa47bff7829"
Each node's entry holds the vertices adjacent to it (e.g. $ Jesper: 'igraph.vs' Named int [1:3] 2 4 5).
The igraph object can be directly used with the plot() function.
plot(g)
Note: We will not venture further into the igraph plotting functionality, since we will go all in with ggraph. However, there is a very neat tutorial that will tell you everything you need to know, in case you are interested.
We can inspect and manipulate the nodes via V(g) (V for vertices, it's graph-theory slang), and the edges via E(g).
V(g)
+ 5/5 vertices, named, from 0e6ab66:
[1] Jesper Pernille Jacob Dorte Donald
E(g)
+ 7/7 edges from 0e6ab66 (vertex names):
[1] Jesper --Pernille Pernille--Jacob Pernille--Dorte Jesper --Donald Jesper --Dorte
[6] Pernille--Jacob Donald --Donald
We can also use most of the base-R slicing and dicing.
V(g)[1:3]
+ 3/5 vertices, named, from 0e6ab66:
[1] Jesper Pernille Jacob
E(g)[2:4]
+ 3/7 edges from 0e6ab66 (vertex names):
[1] Pernille--Jacob Pernille--Dorte Jesper --Donald
Remember, it's a list object. So, if we just want the values, we have to use the double bracket [[x]].
V(g)[[1:3]]
+ 3/5 vertices, named, from 0e6ab66:
We can also use the $ notation.
V(g)$name
[1] "Jesper" "Pernille" "Jacob" "Dorte" "Donald"
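The $ notation works for reading as well as assigning attributes, and attribute names can be used inside the index brackets. A minimal sketch on a toy ring graph (g_demo and the weight values are made up for illustration):

```r
library(igraph)

g_demo <- make_ring(5)   # toy graph: 5 nodes connected in a circle

# Assign node and edge attributes with the $ notation
V(g_demo)$name   <- c("Jesper", "Pernille", "Jacob", "Dorte", "Donald")
V(g_demo)$gender <- c("M", "F", "M", "F", "M")
E(g_demo)$weight <- c(1, 2, 3, 4, 5)

# Attribute names can be used directly when indexing nodes or edges
females <- V(g_demo)[gender == "F"]$name
heavy   <- E(g_demo)[weight > 3]

females
length(heavy)
```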
There is obviously a lot more to say about igraph and its rich functionality. You will learn much of the base functionality of igraph in your DC assignments. Furthermore, Katya Ognyanova has a brilliant tutorial that can be studied.
tidygraph
While the igraph functionality still represents the core of R's network analysis toolbox, recent developments have made network analytics much more accessible and intuitive.
Thomas Lin Pedersen (also known for ggforce, gganimate, and the R implementation of lime) has recently released the tidygraph package.
It exposes the functionality of igraph in a manner consistent with the tidyverse workflow.
It is a lightweight wrapper around the igraph object and functionality, which makes it accessible for much of the traditional dplyr workflows.
He tops it off with ggraph, a network visualization package with a consistent ggplot2 look and feel.
We will therefore mostly work within the tidygraph framework, while in a few cases we still need to draw on the base igraph functionality.
All tidygraph functions are excellently documented.
tbl_graph
library(tidygraph)
We can create a tbl_graph directly from the igraph object.
g %<>% as_tbl_graph()
g
# A tbl_graph: 5 nodes and 7 edges
#
# An undirected multigraph with 1 component
#
# Node Data: 5 x 3 (active)
name gender group
<chr> <chr> <chr>
1 Jesper M A
2 Pernille F B
3 Jacob M B
4 Dorte F A
5 Donald M C
#
# Edge Data: 7 x 2
from to
<int> <int>
1 1 2
2 2 3
3 2 4
# … with 4 more rows
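The second printout of g includes an id column in the node data, which is what one would get when the graph is built from the node and edge tables rather than converted from the igraph object. A minimal sketch under that assumption, using the tbl_graph() constructor (the name g_direct is illustrative):

```r
library(tidygraph)
library(tibble)

node_list <- tibble(id = 1:5,
                    name = c("Jesper", "Pernille", "Jacob", "Dorte", "Donald"),
                    gender = c("M", "F", "M", "F", "M"),
                    group = c("A", "B", "B", "A", "C"))
edge_list <- tibble(from = c(1, 2, 2, 1, 4, 3, 5),
                    to   = c(2, 3, 4, 5, 1, 2, 5))

# Numeric from/to values are interpreted as row positions in the node table
g_direct <- tbl_graph(nodes = node_list, edges = edge_list, directed = FALSE)
g_direct
```

Because all columns of the node table are kept, the id column shows up next to name, gender and group.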
g
# A tbl_graph: 5 nodes and 7 edges
#
# An undirected multigraph with 1 component
#
# Node Data: 5 x 4 (active)
id name gender group
<int> <chr> <chr> <chr>
1 1 Jesper M A
2 2 Pernille F B
3 3 Jacob M B
4 4 Dorte F A
5 5 Donald M C
#
# Edge Data: 7 x 2
from to
<int> <int>
1 1 2
2 2 3
3 2 4
# … with 4 more rows
The tbl_graph class is a thin wrapper around an igraph object that provides methods for manipulating the graph using the tidy API.
As it is still an igraph object, every igraph method and its syntax will work as expected and can be used if necessary. However, it might convert it back into an original igraph object.
V(g)
+ 5/5 vertices, named, from cfdd05e:
[1] Jesper Pernille Jacob Dorte Donald
The as_tbl_graph() function can also transform different types of network data from objects such as data.frame, matrix, dendrogram, igraph, etc.
How can a graph, which holds both node and edge data, be manipulated with dplyr syntax?
g %>% filter(name == "Pernille") would be confusing, since it is unclear if we refer to nodes or edges.
tidygraph's solution is selective activation pipes:
%N>% activates nodes (short for the longer alternative %>% activate(nodes) %>%)
%E>% activates edges (short for the longer alternative %>% activate(edges) %>%)
g %N>%
filter(gender == "F")
# A tbl_graph: 2 nodes and 1 edges
#
# An unrooted tree
#
# Node Data: 2 x 4 (active)
id name gender group
<int> <chr> <chr> <chr>
1 2 Pernille F B
2 4 Dorte F A
#
# Edge Data: 1 x 2
from to
<int> <int>
1 1 2
g %N>%
filter(group %in% c("A", "B")) %E>%
filter(to == 2)
# A tbl_graph: 4 nodes and 1 edges
#
# An unrooted forest with 3 trees
#
# Edge Data: 1 x 2 (active)
from to
<int> <int>
1 1 2
#
# Node Data: 4 x 4
id name gender group
<int> <chr> <chr> <chr>
1 1 Jesper M A
2 2 Pernille F B
3 3 Jacob M B
# … with 1 more row
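To illustrate that the activation pipe is only a shorthand, the following sketch compares %N>% with an explicit activate(nodes) call on a small copy of the example network. The names node_tbl, edge_tbl, g_demo, via_pipe and via_activate are illustrative.

```r
library(tidygraph)
library(tibble)

node_tbl <- tibble(name = c("Jesper", "Pernille", "Jacob", "Dorte", "Donald"),
                   gender = c("M", "F", "M", "F", "M"))
edge_tbl <- tibble(from = c(1L, 2L, 2L, 1L, 4L, 3L, 5L),
                   to   = c(2L, 3L, 4L, 5L, 1L, 2L, 5L))
g_demo <- tbl_graph(nodes = node_tbl, edges = edge_tbl, directed = FALSE)

# Shorthand: activate the node table and pipe in one step
via_pipe <- g_demo %N>%
  filter(gender == "F")

# Longer alternative: activate explicitly, then pipe as usual
via_activate <- g_demo %>%
  activate(nodes) %>%
  filter(gender == "F")

# Both routes return the same node table
all.equal(as_tibble(via_pipe), as_tibble(via_activate))
```

Filtering nodes also drops every edge that loses an endpoint, which is why only the single edge between the two remaining nodes survives.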
We can also pull the active table out of the tbl_graph and use it for tabular analysis.
g %N>%
filter(group == "B") %>%
as_tibble()
While igraph also provides powerful network visualization functionality, I will mostly go with Thomas' sister package, ggraph, which provides a network visualization interface compatible and consistent with ggplot2.
The syntax works like a standard ggplot function call, just that we use special geoms for our network, such as geom_edge_density() to draw a shadow where the edge density is higher, geom_edge_link() to connect nodes with a straight line, geom_node_point() to draw node points, and geom_node_text() to draw the labels. More options can be found in the ggraph documentation.
library(ggraph)
g %>% ggraph(layout = "nicely") +
geom_edge_link() +
geom_node_point() +
geom_node_text(aes(label = name))
Not very impressive up to now, but wait for the real stuff to come in later sessions.
# Generate a sample network: play_barabasi_albert() creates graphs based on the Barabási-Albert preferential attachment model.
set.seed(1337)
g <- play_barabasi_albert(n = 200, # Number of nodes
power = 0.75, # Power of preferential attachment effect
directed = FALSE # Undirected network
)
set.seed(1337)
g %>%
ggraph(layout = "fr") +
geom_edge_link() +
geom_node_point() +
theme_graph() # Adding `theme_graph()` applies a style better suited for graphs
Centralities can easily be computed at the node level with the centrality_[...] functions. All available centralities are listed in the tidygraph documentation.
g <- g %N>%
mutate(centrality_dgr = centrality_degree(),
centrality_eigen = centrality_eigen(),
centrality_between = centrality_betweenness())
g %N>%
as_tibble() %>%
head()
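On a network of 200 nodes the centrality values are hard to check by eye. As a small sanity check, here is a sketch on a star graph with five nodes, where the results can be worked out by hand (star and star_tbl are illustrative names).

```r
library(tidygraph)

# A star: node 1 is the hub and is connected to the 4 other nodes
star <- create_star(5, directed = FALSE) %>%
  mutate(degree  = centrality_degree(),
         between = centrality_betweenness())

star_tbl <- as_tibble(star)
star_tbl
```

The hub has degree 4 and lies on all 6 shortest paths between pairs of outer nodes, while every outer node has degree 1 and betweenness 0.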
set.seed(1337)
g %>%
ggraph(layout = "fr") +
geom_edge_link() +
geom_node_point(aes(size = centrality_dgr, colour = centrality_dgr)) +
scale_color_continuous(guide = "legend") +
theme_graph()
set.seed(1337)
g %>%
ggraph(layout = "fr") +
geom_edge_link() +
geom_node_point(aes(size = centrality_eigen, colour = centrality_eigen)) +
scale_color_continuous(guide = "legend") +
theme_graph()
set.seed(1337)
g %>%
ggraph(layout = "fr") +
geom_edge_link() +
geom_node_point(aes(size = centrality_between, colour = centrality_between)) +
scale_color_continuous(guide = "legend") +
theme_graph()
The community detection algorithms provided by igraph are available in tidygraph using the group_* prefix.
Let's illustrate.
set.seed(1337)
# We create an example network
g <- play_islands(n_islands = 5, # The number of densely connected islands
size_islands = 15, # The number of nodes in each island
p_within = 0.75, # The probability of edges within each island
m_between = 5 # The number of edges between groups/islands
)
set.seed(1337)
# As planned, we clearly see distinct communities
g %>%
ggraph(layout = 'kk') +
geom_edge_link() +
geom_node_point(size = 7) +
theme_graph()
set.seed(1337)
# We run a community detection simply with the group_* function of tidygraph. Here, the Louvain algorithm is a well-performing and fast choice.
g <- g %N>%
mutate(community = group_louvain() %>% as.factor())
set.seed(1337)
# Let's see how well it did...
g %>%
ggraph(layout = 'kk') +
geom_edge_link() +
geom_node_point(aes(colour = community), size = 7) +
theme_graph()
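To see the group_* mechanics on a case where the answer is obvious, the following sketch runs Louvain on two fully connected groups of five nodes that are joined by a single edge. The graph is constructed with base igraph functions, and all object names are illustrative.

```r
library(tidygraph)

# Two complete graphs of 5 nodes each, linked by one bridge between node 1 and node 6
ig <- igraph::disjoint_union(igraph::make_full_graph(5), igraph::make_full_graph(5))
ig <- igraph::add_edges(ig, c(1, 6))

set.seed(1337)
two_cliques <- as_tbl_graph(ig) %>%
  mutate(community = group_louvain() %>% as.factor())

comm_tbl <- as_tibble(two_cliques)
table(comm_tbl$community)
```

With such a clear structure, the algorithm should recover the two cliques as two communities of five nodes each.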
g %N>%
mutate(neighborhood_size = local_size(order = 2)) %>%
as_tibble() %>%
arrange(desc(neighborhood_size)) %>%
head()
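What local_size() counts is easiest to see on a simple chain of nodes. The sketch below uses a path of five nodes (path_tbl is an illustrative name); note that the neighborhood includes the node itself.

```r
library(tidygraph)

# A path: 1 - 2 - 3 - 4 - 5
path_tbl <- create_path(5) %>%
  mutate(neighborhood_size = local_size(order = 2)) %>%
  as_tibble()

# Number of nodes reachable within 2 steps, counting the node itself
path_tbl$neighborhood_size
```

The end nodes reach 3 nodes, their neighbors 4, and the middle node reaches all 5.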
We can not only inspect the ego-network but also produce a new sub-graph containing only it. Here, we need to use the base igraph function make_ego_graph(). Note that it produces an igraph object, so we have to convert it back into a tidygraph object afterwards.
g1 <- make_ego_graph(g, 2, nodes = 1)[[1]] %>% as_tbl_graph()
g50 <- make_ego_graph(g, 2, nodes = 50)[[1]] %>% as_tbl_graph()
set.seed(1337)
g1 %>%
ggraph(layout = 'kk') +
geom_edge_link() +
geom_node_point(aes(colour = community), size = 7) +
theme_graph()
set.seed(1337)
g50 %>%
ggraph(layout = 'kk') +
geom_edge_link() +
geom_node_point(aes(colour = community), size = 7) +
theme_graph()
Finally, it is often also informative to look at the overall characteristics of the network. We will do this in more detail next time, but just so you know:
The density of a network is the share of realized connections among all possible connections in the network.
edge_density(g)
[1] 0.1545946
Transitivity, also called the clustering coefficient, indicates how much the network tends to be locally clustered. It is measured by the share of closed triplets. Again, we will dig into that next time.
transitivity(g)
[1] 0.5551739
diameter(g, directed = FALSE, weights = NA)
[1] 4
mean_distance(g, directed = FALSE)
[1] 2.321441
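These four measures can be verified by hand on a tiny network. The sketch below uses a made-up graph of four nodes: a triangle (1, 2, 3) with one extra node attached to node 3. The name g_small is illustrative.

```r
library(igraph)

# Edges: 1-2, 2-3, 1-3 (a triangle) and 3-4 (a pendant node)
g_small <- make_graph(edges = c(1, 2,  2, 3,  1, 3,  3, 4), directed = FALSE)

dens  <- edge_density(g_small)   # 4 edges out of 4 * 3 / 2 = 6 possible ones
trans <- transitivity(g_small)   # 3 closed triplets out of 5 connected triplets
diam  <- diameter(g_small)       # longest shortest path, e.g. from node 1 to node 4
mdist <- mean_distance(g_small)  # summed distances 8 over 6 node pairs

c(density = dens, transitivity = trans, diameter = diam, mean_distance = mdist)
```

This gives a density of 4/6, a transitivity of 3/5, a diameter of 2 and a mean distance of 8/6.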
edges <- read_csv('https://sds-aau.github.io/SDS-master/00_data/GoT_network/asoiaf-all-edges.csv')
edges %>% head()
colnames(edges) <- tolower(colnames(edges))
Ok, let's see how many characters we have overall.
n_distinct(c(edges$source, edges$target))
[1] 796
head(chars_main)
edges %<>%
filter(source %in% chars_main$name & target %in% chars_main$name) %>%
select(source, target, weight) %>%
rename(from = source,
to = target)
# Note: Since it is small data, this way with %in% is ok. However, with large datasets I would filter via semi_join() instead (more efficient)
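The semi_join() alternative mentioned in the comment could look as follows. This is a sketch on made-up toy data; edges_toy and main_toy are hypothetical stand-ins for the edge list and the table of main characters.

```r
library(dplyr)

edges_toy <- tibble(source = c("A", "A", "B", "C"),
                    target = c("B", "C", "D", "D"),
                    weight = c(3, 1, 2, 5))
main_toy <- tibble(name = c("A", "B", "C"))

# Filtering with %in%, as done above
via_in <- edges_toy %>%
  filter(source %in% main_toy$name & target %in% main_toy$name)

# Filtering with semi_join(): keep rows whose source and target both appear in main_toy
via_join <- edges_toy %>%
  semi_join(main_toy, by = c("source" = "name")) %>%
  semi_join(main_toy, by = c("target" = "name"))

all.equal(via_in, via_join)
```

Both variants keep only the edges between A, B and C; the two edges pointing to D are dropped.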
Now we can convert our edge list into a tbl_graph object structure.
g <- edges %>% as_tbl_graph(directed = FALSE)
g
# A tbl_graph: 100 nodes and 798 edges
#
# An undirected simple graph with 1 component
#
# Node Data: 100 x 1 (active)
name
<chr>
1 Aemon-Targaryen-(Maester-Aemon)
2 Aeron-Greyjoy
3 Aerys-II-Targaryen
4 Alliser-Thorne
5 Arianne-Martell
6 Arya-Stark
# … with 94 more rows
#
# Edge Data: 798 x 3
from to weight
<int> <int> <dbl>
1 1 4 7
2 1 13 4
3 1 28 3
# … with 795 more rows
We can use some of the tidygraph helpers to briefly clean the graph. Check ?node_is_* and ?edge_is_* for options.
# Filtering out multiple edges and isolated nodes (unconnected), in case there are some
g <- g %E>%
filter(!edge_is_multiple()) %N>%
filter(!node_is_isolated())
g %E>%
as_tibble() %>%
ggplot(aes(x = weight)) +
geom_histogram()
We see a right-skewed distribution with many weak and some very strong edges. Let's take a look at the edges with the highest weight (meaning here: the characters with the most interaction).
g %E>%
as_tibble() %>%
arrange(desc(weight)) %>%
head()
tidygraph always uses numeric IDs for nodes, which also label the edges. This is not very helpful for gaining insights. So, let's use the node names instead.
# We access the nodes directly via .N(). The same can be done for edges with .E() and the graph with .G(). Check ?context_accessors for more infos
g %E>%
mutate(name_from = .N()$name[from],
name_to = .N()$name[to]) %>%
as_tibble() %>%
select(name_from, name_to, weight) %>%
arrange(desc(weight)) %>%
head()
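How the .N() accessor resolves the numeric from and to IDs is easier to follow on a three-node toy graph. This is a sketch with made-up data; g_demo and edge_names are illustrative names.

```r
library(tidygraph)
library(tibble)

g_demo <- tbl_graph(nodes = tibble(name = c("A", "B", "C")),
                    edges = tibble(from = c(1L, 2L),
                                   to = c(2L, 3L),
                                   weight = c(5, 1)),
                    directed = FALSE)

# With the edges active, .N() returns the node table,
# so the numeric IDs can be used as row positions into its name column
edge_names <- g_demo %E>%
  mutate(name_from = .N()$name[from],
         name_to = .N()$name[to]) %>%
  as_tibble()

edge_names
```

The edge 1-2 is labeled A and B, and the edge 2-3 is labeled B and C.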
g <- g %N>%
mutate(centrality_dgr = centrality_degree(weights = weight),
centrality_eigen = centrality_eigen(weights = weight),
centrality_between = centrality_betweenness(weights = weight))
bind_cols(g %N>%
select(name, centrality_dgr) %>%
arrange(desc(centrality_dgr)) %>%
as_tibble(),
g %N>%
select(name, centrality_eigen) %>%
arrange(desc(centrality_eigen)) %>%
as_tibble(),
g %N>%
select(name, centrality_between) %>%
arrange(desc(centrality_between)) %>%
as_tibble()) %>%
mutate_if(is.numeric, round, 1) %>%
head()
g <- g %N>%
mutate(community = group_louvain() %>% as.factor())
g %N>%
select(name, community, centrality_dgr) %>%
as_tibble() %>%
arrange(community, desc(centrality_dgr)) %>%
group_by(community) %>%
slice_head(n = 5) %>% mutate(n = row_number()) %>% # row_number() also works for communities with fewer than 5 members
ungroup() %>%
select(-centrality_dgr) %>%
spread(community, name)
Ok, let's give it a first minimal shot:
g %>% ggraph(layout = "fr") +
geom_edge_link() +
geom_node_point() +
geom_node_text(aes(label = name))
Not very exciting. Maybe we can do a bit better, using more options in the ggraph functionality to visualize aspects of the network.
g %E>%
filter(weight >= quantile(weight, 0.5)) %N>%
filter(!node_is_isolated()) %>%
ggraph(layout = "fr") +
geom_edge_link(aes(width = weight), alpha = 0.2) +
geom_node_point(aes(color = community, size = centrality_eigen)) +
geom_node_text(aes(label = name, size = centrality_eigen), repel = TRUE) +
scale_color_brewer(palette = "Set1") +
theme_graph() +
labs(title = "A Song of Ice and Fire character network",
subtitle = "Nodes are colored by community")
Please do Exercise 1 in the corresponding section on GitHub.
sessionInfo()