Network analysis allows us to take a qualitative look at a system
Network analysis is the study of interactions within a system. A network, or graph, is comprised of nodes, the objects of study, and edges, the interactions betwee nodes.
We’re going to start learning about network theory with the Zachary’s Karate Club dataset:
library(igraph)
library(igraphdata)
library(tidyverse)
data(karate)
theme_set(theme_light())
src("src/utils.R")
(Note: you can reproduce Zachary’s Karate network with igraph::make_graph("Zachary")
, but it doesn’t have as much metadata as the version that comes with igraphdata
)
Let’s do some basic igraph stuff. Let’s see what an igraph
object looks like:
karate
## IGRAPH 4b458a1 UNW- 34 78 -- Zachary's karate club network
## + attr: name (g/c), Citation (g/c), Author (g/c), Faction (v/n),
## | name (v/c), label (v/c), color (v/n), weight (e/n)
## + edges from 4b458a1 (vertex names):
## [1] Mr Hi --Actor 2 Mr Hi --Actor 3 Mr Hi --Actor 4
## [4] Mr Hi --Actor 5 Mr Hi --Actor 6 Mr Hi --Actor 7
## [7] Mr Hi --Actor 8 Mr Hi --Actor 9 Mr Hi --Actor 11
## [10] Mr Hi --Actor 12 Mr Hi --Actor 13 Mr Hi --Actor 14
## [13] Mr Hi --Actor 18 Mr Hi --Actor 20 Mr Hi --Actor 22
## [16] Mr Hi --Actor 32 Actor 2--Actor 3 Actor 2--Actor 4
## [19] Actor 2--Actor 8 Actor 2--Actor 14 Actor 2--Actor 18
## + ... omitted several edges
There’s a lot of information here, but let’s focus on the attr
line. This tells us the following things:
name (g/c)
is an attribute of the graph itself, stored as characters. The same applies for Citation
and Author
.Faction (v/n)
tells us that the vertices have a numeric attribute corresponding to the faction the vertex belongs to. Similarly, vertices have names, labels, and colors.weight (e/n)
tells us that the edges have a number assigned to them according to some weight.Let’s start examining more attributes of the graph. We can get the order of the graph, which tells us the number of vertices it has:
gorder(karate)
## [1] 34
Similarly, the size of the graph tells us how many edges there are:
gsize(karate)
## [1] 78
Let’s type in We can get a list of the nodes in the graph by doing the following:
V(karate)
## + 34/34 vertices, named, from 4b458a1:
## [1] Mr Hi Actor 2 Actor 3 Actor 4 Actor 5 Actor 6 Actor 7
## [8] Actor 8 Actor 9 Actor 10 Actor 11 Actor 12 Actor 13 Actor 14
## [15] Actor 15 Actor 16 Actor 17 Actor 18 Actor 19 Actor 20 Actor 21
## [22] Actor 22 Actor 23 Actor 24 Actor 25 Actor 26 Actor 27 Actor 28
## [29] Actor 29 Actor 30 Actor 31 Actor 32 Actor 33 John A
We can access individual vertices by index…
V(karate)[1:5]
## + 5/34 vertices, named, from 4b458a1:
## [1] Mr Hi Actor 2 Actor 3 Actor 4 Actor 5
…or by name.
V(karate)["Mr Hi"]
## + 1/34 vertex, named, from 4b458a1:
## [1] Mr Hi
We can also access multiple vertices at once by name:
V(karate)[c("Mr Hi", "John A", "Actor 7")]
## + 3/34 vertices, named, from 4b458a1:
## [1] Mr Hi John A Actor 7
We can see a list of the attributes of each vertex with vertex.attributes
…
vertex.attributes(karate)
## $Faction
## [1] 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 2 1 1 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2
##
## $name
## [1] "Mr Hi" "Actor 2" "Actor 3" "Actor 4" "Actor 5" "Actor 6"
## [7] "Actor 7" "Actor 8" "Actor 9" "Actor 10" "Actor 11" "Actor 12"
## [13] "Actor 13" "Actor 14" "Actor 15" "Actor 16" "Actor 17" "Actor 18"
## [19] "Actor 19" "Actor 20" "Actor 21" "Actor 22" "Actor 23" "Actor 24"
## [25] "Actor 25" "Actor 26" "Actor 27" "Actor 28" "Actor 29" "Actor 30"
## [31] "Actor 31" "Actor 32" "Actor 33" "John A"
##
## $label
## [1] "H" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14"
## [15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
## [29] "29" "30" "31" "32" "33" "A"
##
## $color
## [1] 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 2 1 1 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2
…and we can access that attribute from a particular node by doing using $
:
V(karate)[34]$name
## [1] "John A"
We can access edges in a similar way:
E(karate)
## + 78/78 edges from 4b458a1 (vertex names):
## [1] Mr Hi --Actor 2 Mr Hi --Actor 3 Mr Hi --Actor 4
## [4] Mr Hi --Actor 5 Mr Hi --Actor 6 Mr Hi --Actor 7
## [7] Mr Hi --Actor 8 Mr Hi --Actor 9 Mr Hi --Actor 11
## [10] Mr Hi --Actor 12 Mr Hi --Actor 13 Mr Hi --Actor 14
## [13] Mr Hi --Actor 18 Mr Hi --Actor 20 Mr Hi --Actor 22
## [16] Mr Hi --Actor 32 Actor 2--Actor 3 Actor 2--Actor 4
## [19] Actor 2--Actor 8 Actor 2--Actor 14 Actor 2--Actor 18
## [22] Actor 2--Actor 20 Actor 2--Actor 22 Actor 2--Actor 31
## [25] Actor 3--Actor 4 Actor 3--Actor 8 Actor 3--Actor 9
## [28] Actor 3--Actor 10 Actor 3--Actor 14 Actor 3--Actor 28
## + ... omitted several edges
And to find edge attributes:
edge.attributes(karate)
## $weight
## [1] 4 5 3 3 3 3 2 2 2 3 1 3 2 2 2 2 6 3 4 5 1 2 2 2 3 4 5 1 3 2 2 2 3 3 3
## [36] 2 3 5 3 3 3 3 3 4 2 3 3 2 3 4 1 2 1 3 1 2 3 5 4 3 5 4 2 3 2 7 4 2 4 2
## [71] 2 4 2 3 3 4 4 5
We can do a basic visualization:
plot.igraph(karate,
main = "Zachary's Karate Network")
We can have the size of the node reflect its degree:
plot(karate,
vertex.size = degree(karate) * 1,
main = "Zachary's Karate Club"
)
Finally, we can try out other layouts:
plot(karate,
layout = layout_as_tree(karate),
main = "Zachary's Karate Club")
karate
?karate[]
into your console. What is that?karate
again using layout_in_circle
and scaling the width of each edge by its weight.A common question in network analysis is to figure out how the degrees in a graph are distributed. We say two nodes are connected if there is an edge between them - that is, if the two objects represented by the nodes interact with each other. We define the degree of a node as the number of other nodes that the graph is connected it. We can look at the degrees of each node in the graph using degree
:
degree(karate)
## Mr Hi Actor 2 Actor 3 Actor 4 Actor 5 Actor 6 Actor 7 Actor 8
## 16 9 10 6 3 4 4 4
## Actor 9 Actor 10 Actor 11 Actor 12 Actor 13 Actor 14 Actor 15 Actor 16
## 5 2 3 1 2 5 2 2
## Actor 17 Actor 18 Actor 19 Actor 20 Actor 21 Actor 22 Actor 23 Actor 24
## 2 2 2 3 2 2 2 5
## Actor 25 Actor 26 Actor 27 Actor 28 Actor 29 Actor 30 Actor 31 Actor 32
## 3 3 2 4 3 4 4 6
## Actor 33 John A
## 12 17
There are a few ways we can access the highest degree nodes. If we just want the single highest node, we can do:
which.max(degree(karate))
## John A
## 34
If we wanted, say, the 5 highest nodes, we can do:
sort(degree(karate)) %>%
tail(5)
## Actor 2 Actor 3 Actor 33 Mr Hi John A
## 9 10 12 16 17
Suppose we wanted all those nodes with degree higher than 4. We can than do:
degree(karate)[degree(karate) > 4]
## Mr Hi Actor 2 Actor 3 Actor 4 Actor 9 Actor 14 Actor 24 Actor 32
## 16 9 10 6 5 5 5 6
## Actor 33 John A
## 12 17
iGraph
has a degree.distribution
function that returns the proportion of nodes of each degree.
degree.distribution(karate)
## [1] 0.00000000 0.02941176 0.32352941 0.17647059 0.17647059 0.08823529
## [7] 0.05882353 0.00000000 0.00000000 0.02941176 0.02941176 0.00000000
## [13] 0.02941176 0.00000000 0.00000000 0.00000000 0.02941176 0.02941176
We can get a quick visualization of this distribution via plot
:
plot(degree.distribution(karate))
We can use plot_dd
from the utils
file to make it a bit prettier.
plot_dd(karate)
The neighborhood of a node is the set of nodes that are connected to it. For instance, to see the neighborhood of Mr Hi
, we can do:
neighbors(karate, "Actor 7")
## + 4/34 vertices, named, from 4b458a1:
## [1] Mr Hi Actor 5 Actor 6 Actor 17
intersect
to find the overlap between these two neighborhoods. What did you find? Does this make sense? What are some questions you may have about this overlap?karate
network, it is easy to pick out those nodes that have a “high” degree connectivity. However, suppose that your degree distribution looked like the graph below. How would you decide a cutoff for a node to have “high” degree connectivity?Another common technique in network analysis is graph clustering. The aim of graph clustering is to find subsets of nodes within a graph that are related. *iGraph
comes with several clustering algorithms. For example, we can use cluster_fast_greedy
to cluster the karate
network:
karate.cfg <- cluster_fast_greedy(karate)
plot(karate.cfg,
karate)
Paul Villanueva
Ph.D. Student - Bioinformatics and Computational Biology
Iowa State University, Ames, IA.