Small World Exercise
Background
Small world networks were inspired by the sociological phenomenon of
"six degrees of separation": the network of human friendships is
thought to link random pairs of humans through a remarkably small
number of first-name basis relationships (you know someone who knows
someone who ... who knows someone who was struck by a meteor, just
like the dinosaurs). The same phenomena happens in movies -- hence the
famous "Bacon number".
This simple model has been studied thoroughly, but remains a nice
introduction both to random network construction, to breadth-first
search algorithms, and to scaling collapses. Your resulting software
can then also be applied to real-world networks.
You will apply your algorithms to some randomly generated
networks. Many interesting problems arise from studying properties of
randomly generated networks. A network is a collection of nodes and
edges, with each edge connected to two nodes, but with each node
potentially connected to any number of edges. A random network is
constructed probabilistically according to some definite rules;
studying such a random network usually is done by studying the entire
ensemble of networks, each weighted by the probability that it was
constructed. Thus these problems naturally fall within the broad
purview of statistical mechanics.
In particular, in this exercise we will generate some random networks
and calculate the distribution of distances (path lengths) between
pairs of nodes. We will study small world networks, a
theoretical model that suggests how a small number of shortcuts (e.g.,
unusual international and intercultural friendships in a social
network) can dramatically shorten the tyical path lengths. Finally
we will study how a simple, universal scaling behavior emerges for
large networks with few shortcuts.
Learning Goals
Science:
You will determine how the connectivity of a network changes when a small
number of random connections are added, and explore scaling collapses
that summarize path length data for networks of different sizes and
numbers of near-neighbor connections.
Computation:
You will learn some basic algorithms for efficiently traversing and computing
path length distributions of arbitrary networks. You will
write a program to produce small world networks, i.e., 1D ring
shaped networks with random "short circuits". Write routines for
generating statistics of shortest connection paths between nodes in
these networks.
Procedure
By now, you should have completed the
Introduction to Networks exercise, and should have working code to
create undirected graphs. In what follows, you will continue
with the study of small world networks:
Creating
Small World Networks, Finding the Statistical Properties of Networks,
Finding the Statistical Properties of Small World Networks, and two
optional excercises: applying your program to the study of real
networks, and a study of "betweenness".
Creating Small World Networks
- Read the paper by Duncan Watts and Steven Strogatz,
Collective Dynamics of "small world" Networks.
(You might also be interested in the writeup by
Jon Kleinberg and David Easley,
"Handout on the small-world
phenomenon",
Networks: Economics 204 / Sociology 209 / Information Science 204.)
- Download the file SmallWorldNetworksHints.py (Hints file for small world nets) and
rename it as "SmallWorldNetworks.py". It is easiest if you store this file
in the same directory as the networks files you worked with earlier
(Networks.py and NetGraphics.py), since those files will be needed for
this exercise.
- Also download the MultiPlot.py software for use in making scaling collapses.
- Start up a new ipython session.
(This is not strictly necessary, but you will probably save
yourself grief, so that anything you did in previous sessions won't
interact or conflict with what you are doing now.)
- Construct a function MakeRingGraph that takes
two arguments, the number of nodes (num_nodes), and the number of edges (Z)
that each node has, and returns an UndirectedGraph object.
This really only makes
sense for even Z, in which case there will be edges connecting each node
to the Z/2 nearest neighbors to the left and the Z/2 nearest neighbors
to the right. This is a ring, so there should be periodic boundary
conditions. (Hint -- the Python operator that implements
modular arithmetic is %:
see the
python documentation)
- Test by typing run SmallWorldNetworks.py,
generating some networks, and plotting them using the commands in NetGraphics.
- Construct a function AddRandomEdges that adds a
given number of random edges to a graph. Test. (There are some subleties
here. You might imagine that, at random, the same pair of nodes might be
selected for an edge, in which case the number of unique edges in the graph
might be less than you initially specify. This will become less likely as
the size of the graph increases. Alternatively, you could check for the
number of unique edges, and stop once you've reached the specified number.)
- Construct the function MakeSmallWorldNetwork that
calls your two previous functions to generate a ring network of L
sites with Z edges per site, then add p*Z*L/2
random edges. Test. (p is typically a number less than 1.)
- As per the hints file construct MakeSmallWorldSimple.
Test.
Computing Path Lengths of Networks
- You will now write routines for computing path lengths between
pairs of nodes in a network. Although we are interested in this question
in the context of small world networks, the computation is applicable
to all networks. Therefore, you will add code to your network
infrastructure file "Networks.py" rather that the "SmallWorldNetworks.py"
file. In "Networks.py", write and debug the following routines:
FindPathLengthsFromNode, FindAllPathLengths, and
FindAveragePathLength.
- The algorithms are described in
Small World Exercise.
- The basic algorithmic approach is to use breadth-first search
to traverse the graph starting from a root node. Central to graph
traversal algorithms is the need to mark nodes as they are visited, so
they need not be revisited. The use of a Python dictionary serves a
dual purpose in that regard: both to store the pathlength to a given node,
and to mark that node as visited.
- Test.
Finding Statistical Properties of Small World Networks
- Go back to "SmallWorldNetworks.py".
Make and debug the routine MakePathLengthHistograms.
- Plot histograms for L=1000, Z=2, and p=0.02.
Repeat for p=0.2. Play with other p's.
What value of p's gives "six degrees of separation"?
- Use FindAverageAveragePathLength
on several random small world networks with L=100,Z=2, and p=0.1.
How wildly do these fluctuate?
- Quantify the fluctuations by writing the
FindAverageAveragePathLength routine,
which will calculate the mean and standard deviation of a set of random
small world networks. Calculate the mean with different sample sizes.
Does this agree with your previous observations? How many samples do you
need for the mean calculated this way to be representitive of the mean of
an infinite set of samples? [This is the concept in statistics of
"
standard error of the mean", which is distinct from
"
standard deviation".]
- Now that you know how to get the mean and standard
deviation for some fixed p, you want to make a table of the average
path length (and standard deviation) as a function of p.
Make the functions GetPathLength_vs_p and PlotPathLength_vs_p.
- Make the graph of mean vs p for L=50,Z=2, and p between
0.001 and 1.
- Compare with figure 2 of Watts and Strogatz
Collective Dynamics of "small world" Networks.
Since they are working with a different L, you should find that you
need to shift p for the results to coincide. See
Small World Exercise
for an explanation.
- Create and display a circle graph with (Z=2, L=50) at p=0.1.
Create the histogram of path lengths.
- Create and display a circle graph with (Z=10, L=1000) at p=0.1
and p=0.01. Create the histogram of path lengths.
How do these compare to your previous histogram?
- Write the routine PlotScaledPathLength_vs_pZL,
and plot Pi Z l/L (where l is the average path length) versus the total number
of shortcuts pLZ/2, for p ranging between 0.001 and 1. Use
L=100 and 200, and Z= 2 and 4. These scaling functions are described
and derived in the paper by Newman and Watts,
Renormalization--group analysis of the small world network model, and geometric intuition
is provided in
the Small World Exercise.
- Does this make sense?
Real World Networks (optional)
- Use some of your routines on real-world networks -- some good sources of data are the Barabasi group and the Community
structure in social and biological networks, and Mark Newman,
Scientific collaboration networks II: shortest paths, weighted
networks, and centrality.
Useful Constructions/Syntax
Science Concepts
Files
References
- Duncan Watts and Steven Strogatz,
Collective Dynamics of "small world" Networks
- Jon Kleinberg and David Easley,
"Handout on graph theory", Networks:
Economics 204 / Sociology 209 / Information Science 204
- Jon Kleinberg and David Easley,
"Handout on the small-world
phenomenon",
Networks: Economics 204 / Sociology 209 / Information Science 204
- Mark Newman,
The structure and function of complex networks
- Mark Newman,
Models of the Small World
- Mark Newman,
Scientific collaboration networks II: shortest paths, weighted
networks, and centrality.
- Mark Newman and Duncan Watts,
Renormalization--group
analysis of the small world network model
- Michelle Girvan and Mark Newman,
Community
structure in social and biological networks
- Yook, Oltvai, and Barabasi,
Comparing Different Protein Interaction Networks
Links
Notes
When you try to import a module in Python (e.g., via import somemodule), the interpreter looks through a sequence of directories to find the
specified module. This sequence (a Python list) is stored in the variable
sys.path: you can see what is in the default search path by
first importing the sys module (import sys) and then printing
the path (print sys.path). You will see that an empty directory
('') is in the path; this corresponds to your current directory.
For this reason, it is often easiest to put a group of related files together
in the same directory, such as is recommended here.
You will see that there are also directories in sys.path. These
indicate where various built-in and third-party packages are installed.
You can extend the default search path in a couple of different ways.
One is to define the shell variable PYTHONPATH. Any directories placed
in that list will be appended (i.e., stuck at the end) of sys.path.
In the bash shell, for example, one could type at the command line
or place in the ~/.bashrc file, a command like:
export PYTHONPATH=~/mypythonlibrary and
anything Python files in that directory would be accessible for import.
If you find yourself developing source files that you reuse for a number
of different projects, creating a central repository like this might be
useful. (Such a repository can be hierarchically structured as well.)
The other way to augment the path is to add to the sys.path
variable directly within your Python code: e.g.,
sys.path.append('~/mypythonlibrary') will add that directory
to your path, but only for the currently running session.
The multiplot software is available at:
Scaling collapse software (MultiPlot).
Conceptually, scaling collapses are extremely straightforward. In a
family of x-y data sets, the x axis in each set is scaled by one
formula, and the y-axis by another, where the formulas depend on
parameters that distinguish each data set from one another.
While conceptually simple, computing scaling collapses requires
managing multiple data sets, their associated parameter values, and
the functional form of the desired scaling collapse.
To address this complexity, we have written the MultiPlot module to
facilitate scaling collapses, although the collapses can certainly
be done without using MultiPlot.
The key to the MultiPlot module is
to store families of data sets in a Python dictionary, where the keys
to the dictionary are the parameter values for each data set, and the
values are scipy arrays containing the raw, unscaled data. The x-axis
data and y-axis data are stored separately in different dictionaries.
By using Python's capabilities for dynamic code evaluation, we can
encode (nearly) arbitrary functional forms for scaling collapses
in Python strings, rather than having to hardcode specific functional forms.
For example, let xdata and ydata describe data that were generated
for different values of parameters A and B, say, for A = 50 and 100,
and B = 2 and 4. Then xdata and ydata would be dictionaries
keyed on (A,B) tuples:
xdata[(50, 2)] = # x coordinates for A=50, B=2
xdata[(50, 4)] = # x coordinates for A=50, B=4
xdata[(100,2)] = # x coordinates for A=100, B=2
xdata[(100,4)] = # x coordinates for A=100, B=4
ydata[(50, 2)] = # y coordinates for A=50, B=2
ydata[(50, 4)] = # y coordinates for A=50, B=4
ydata[(100,2)] = # y coordinates for A=100, B=2
ydata[(100,4)] = # y coordinates for A=100, B=4
In MultiPlot, we can specify the functional form of the collapse as a
function of the parameters A and B. We do this by passing a
symbolic representation of the transformation function, encoded
in a Python string. The form of the transformation function is:
"unscaled_variable->scaled_variable".
For example, if we called the
unscaled coordinates "x" and
"y", and the scaling functions for those
variable were "x*A*B" and "y*A/B",
respectively, then the transformation
functions would be written as xform="x->x*A*B" and
yform="y->y*A/B".
(xform and yform are two arguments that are passed to the
MultiPlot.MultiPlot function.) In order for Python to be
able to numerically evaluate these symbolic expressions (which it does
using the eval() function), it needs to know what the values of
A and B are for each dataset. Of course, those values
make up the keys of the xdata and ydata dictionaries, but Python doesn't
know that. In order to associate the numerical values in the dictionary
keys with the symbolic string variables "A" and "B",
we use the keyNames argument to MultiPlot.MultiPlot.
keyNames is a tuple containing the string names of the keys
in the xdata and ydata dictionaries, in the order that they appear in
those keys. For our example here, therefore, keyNames=("A", "B").
To execute this scaling collapse, therefore, we would issue the
following command:
MultiPlot.MultiPlot(xdata, ydata, xform="x->x*A*B", yform="y->y*A/B", keyNames=("A","B"))
Additional features in MultiPlot: