Background
This course contains multiple exercises that address different aspects of complex networks. These are:
- Introduction to Networks (below): a prerequisite to the remaining exercises, introducing basic aspects of network construction and traversal.
- Small World Networks: explores the statistics of path lengths in random networks, and the importance of a small number of long-range shortcuts in creating small worlds.
- Percolation: explores the connectivity of random networks on regular lattices, and the universality of critical behavior near the percolation transition.
- Exploration of network growth, structure, etc.
- Random Network Generation (under construction)
Introduction to Networks
This is the first of several exercises on networks. A network (or graph, as it is often referred to in mathematics) is a data structure in which nodes are connected by edges, which provides a very general concept that plays a role in many scientific problems. This first exercise is focused on defining networks and on implementing algorithms to compute some of their properties. As such, it lays the groundwork for particular scientific applications in subsequent exercises, aimed at analyzing "small worlds" and percolation in networks.
In particular, in this exercise we will generate some random networks and calculate the distribution of distances (path lengths) between pairs of nodes. We will study small world networks, a theoretical model that suggests how a small number of shortcuts (e.g., unusual international and intercultural friendships in a social network) can dramatically shorten the tyical path lengths. Finally we will study how a simple, universal scaling behavior emerges for large networks with few shortcuts.
Learning Goals
Science: You will learn what a graph/network is. This exercise in largely about building infrastructure for use in more interesting scientific problems, such as the study of small worlds and percolation.
Computation: You will learn some basics of object-oriented programming: what a class is, how to use and manipulate them, and how they capture abstract relationships among data.
Procedure
This introductory exercise is focused on defining undirected graphs/networks.
The UndirectedGraph Class
- Create a new directory (named, for example, "Networks").
- Download the following files:
- Hints file for general network algorithms
-
Network graphics software
- NOTE: If you encounter errors regarding fonts in the Network Graphics software, download this file instead and save it as NetGraphics.py.
- NOTE: If you are running these programs on your own machine (i.e, not on the course lab machines), you might also need to download font file Vera.ttf and save it in the same directory that the Network graphics software is installed.
- Copy the hints file you downloaded to a new file that you will edit, e.g., cp NetworksHints.py Networks.py .
- If you are not familiar with graphs and networks, read Jon Kleinberg and David Easley, "Handout on graph theory", Networks: Economics 204 / Sociology 209 / Information Science 204.
- Open your Networks.py file in an editor, start up whichever ipython environment you'd like to use, and enter %run Networks.py in the ipython session. This will execute the contents of the file within your ipython session.
- In your eduitor, define a class called UndirectedGraph, which will create
objects that represents graphs.
- See the notes below for further discussion of different types of networks.
- See the notes below for further discussion of the role of objects and classes in object-oriented design.
- Define the following Method for your class.
- __init__
- This is the method or "subroutine" that is run everytime you create a new instance of the class. For this class, the initialization routine should take one object (called self) which represents the instance. We want this initialization routine to create the attribute self.connections, and set it equal to the empty dictionary {} .
- __init__
- In your eduitor, define a class called UndirectedGraph, which will create
objects that represents graphs.
- Test your routine by saving your "Networks.py" file and re-executing %run Networks.py in the ipython session. Create an instance of your graph class by typing testgraph1=UndirectedGraph().
- Type testgraph1.connections to see if your initialization worked right. testgraph1 is now an instance of your class, although at this point it is a rather uninteresting instance, since it only contains an initializer that creates an empty dictionary.
- Go back to your editing environment and complete your
UndirectedGraph class by defining the following methods.
Following each definition test the routine. This strategy of testing
every routine immediately after writing it is the key to rapid software
development. Ideally you should never have more than one routine which
you are not certain works correctly. Moreover, one typically
plays with the internal workings of the routine in an interactive session before encapsulating it.
The hints file gives a description of what each of the methods should do.
- HasNode
- This is hard to test fully until you have defined AddNode. Without any nodes added, however, HasNode should return False for any node that is queried.
- AddNode
- Test by typing testgraph1.AddNode(1), then look at the dictionary by typing testgraph1.connections (or, if you enjoy extra typing, print testgraph1.connections). The command testgraph1.HasNode(1) should now be True while testgraph1.HasNode(2) should be False. It should be clear how to test the rest of the methods.
- AddEdge
- GetNodes
- GetNeighbors
- HasNode
- Next it would be nice to visualize the graph. We have
written some visualization software. To see a graphical representation of
your test graph type: NetGraphics.DisplayCircleGraph(testgraph1).
- Here we've introduced the notion of accessing a function from another module. See the notes below to learn more about modules and namespaces.
- See the notes to consider further issues regarding the interplay of visualization tools and core data structures.
- For further testing, your Networks.py file contains
a predefined function called pentagraph. Type
testgraph2=pentagraph() to run the function and assign the
output to a new object called testgraph2.
This should create a graph
and display a figure shaped like a pentagram.
- See notes below for a discussion of class privacy issues in Python.
- You now have a working class, and we can start to use it for something. Proceed to either the Small Worlds Exercise or the Percolation Exercise.
Useful Constructions/Syntax
- class: Python tutorial, nanotutorial
- Scope, namespaces and use of the "dot" operator.
- Dictionaries, lists, tuples, and other data structures
- Looping (for x in list : dosomething(x))
- Comparison (if, elif, else)
- range
- Functions:
- Tutorial
- keywords: def, return
- Technical Documentation
- Assignment (=):
- Modules and import
- reload
Science Concepts
Notes
Different types of networks
Networks and graphs come in many flavors. Undirected graphs involve edges that have no intrinsic directionality, i.e., the two nodes connected by the edge are symmetric. In directed graphs, edges have directionality: the edge points from one node to the other, introducing an asymmetry. We will be studying edges without weights (i.e., all edges are equivalent), but in many important problems, edges have weights. In this exercise, we're using the specific name UndirectedGraph to be clear and avoid confusion.
Objects and classes
- Objects represent as dramatic a conceptual leap in programming as
functions. A good way to think about it is that functions are verbs,
and objects are nouns. Functions are designed so that all you need to
know about them is what the form of the input is and what the form of
the output is -- you do not need to know what the implementation is to
use a function. Well, objects are the same way. An object has some
attributes which allows it to represent something useful (such as a
graph). The object carries with it some methods of accessing and
changing its properties (these are functions called "methods"). As
long as you only talk to the object through those methods, you do not
need to know anything about the internal implementation.
A class can either be thought of as a template for an object, or perhapse more concretely as a function which creates an object. When you call a class you create an instance of the class. This instance is the object.
A quick introduction to classes can be found in NanoTutorial 1. In particular, if you are confused about the syntax of method (function) definitions in classes and the seemingly mysterious self variable, you should reach that NanoTutorial or the textbook to find out more.
- We use a class to represent our graphs, since this lets us manipulate our computer representation of the graph just as we think about manipulating real graphs. A network has nodes and edges. An UndirectedGraph object should carry with it the information needed to tell you what nodes and edges it has. It should also provide tools for adding nodes and edges.
- Encaspulation is an important concept in object-oriented programming, and in flexible software design generally. The idea with objects is that the user of the class should not need to know how things are implemented. As the programer, however, you have to choose some implementation.
- A class is used to define the space of all possible networks (UndirectedGraphs), while a network instance (or object) is a concrete entity that represents a particular network. The methods of the network class are the public face of the object. If you want to know what nodes the object testgraph1 contains you call the method GetNodes by typing testgraph1.GetNodes(). If you whant to know if there is a node labeled by the numeral 1, you would type testgraph1.HasNode(1). In cleanly written object-oriented programs, objects should always be manipulated through their methods, and the external interface embodied in the methods should be independent of the internal representation of data. For example, you could have written your network object so that the neighbors were stored in an adjacency matrix (table). The matrix would have as many columns and rows as the network has nodes, and it would have 1's and 0's representing which nodes were connected and which were not. Such a representation of the network should have the exact same methods. The user of the class should not see a difference between the behavior of the two objects.
- By using this idea of encapsulation you can change the implementation at a later time without changing the rest of your program.
- Another advantage of this approach is that suppose at some later time you work with a related object, say a "directed graph" (which means that the edges have directions, pointing from a source node to a destination node). This new object will have many of the same methods. Any tools that you write which interface with the object by using the common methods should work without any change.
- [At some later point in your programing career you can learn about "inheritance" which lets you reuse methods written for one object in another object.]
Modules and namespaces
A module is a way of grouping together a set of classes and functions that are used for a particular purpose. For example, in this exercise you generate a networks module that has functions and classes used for manipulating networks. In python, you load a module with the command import, although the ipython interpreter provides the shortcut function %run that imports everything in the specified file into the __main__ namespace.Namespaces are a mechanism for organizing the names of variables (objects, functions, classes, etc.), so as to avoid giving two variables/functions/classes/objects the same name. Suppose you are writing a program and you defined a function f. Suppose later you load the usefulstuff module (by typing import usefulstuff) and usefulstuff happens to also define a function f. There could be a conflict here. You don't want the f in usefulstuff to overwrite the f that you defined. The way python avoids this problem is that the two f's are in different namespaces. To access the usefulstuff function "f", you must type usefulstuff.f. You can think of the . in the same way you think of "/" when you are navigating directories. You can change the namespace that you import a module into by typing import usefulstuff as u -- which would put usefulstuff in the namespace u. This saves typing, but in the end can lead to more confusion. (You can also circumvent the protection that namespaces provide by typing from usefulstuff import *, in which case everything defined in usefulstuff will be imported into the space doing the importing, which could result in your previously defined functions being overwritten.)
Note that the syntax for calling the display function contained in the NetGraphics module is very similar to the syntax used to call methods of a class. This similarity is intentional: NetGraphics is a module, DisplayCircleGraph is a function inside that module, and the "dot" operator . is used to access the function within the module's namespace, just as the dot operator is used to access a method within an object's namespace.
Interplay of visualization tools and core data structures
As an aside, one could add to UndirectedGraph a method called Display, so that if you typed testgraph1.Display() it would call DisplayCircleGraph, creating a picture of the graph. One would say that the object knows how to display itself. In certain contexts this might be what you want. There are, however, several downsides:- Conceptually, displaying a graph is a complicated process. For example, you need to decide how to lay out the points. It is possible that in the future some user of your network class will want to play with different ways of laying out the points. To the extent that a network is usually only concerned with topology (how things are connected) and not geometry (where things are), requiring this layout information to be inside the graph object is undesirable. [On the other hand, one might at a later time create a subclass which knows about geometry, and that subclass might want to know how to display itself -- or perhaps not -- the display function could also simply ask the object for the geometry and use it.]
- It makes for a clean program to have one object and feed it into different display functions.
- You may be programing on different platforms -- s ay one runs X-windows, and another runs Microsoft Windows. You may need to use different display functions on the different machines.
- By separating the object and how it is displayed, you are more likely to produce reusable code.
Privacy in Python
Many object-oriented languages have a syntactic notion of privacy: data in a class that are declared private, for example, are incapable of being modified from outside the class. Python does not contain such constructs, generally relying instead on programmers' recognizing the value of encapsulating internal data and working instead through method interfaces. Although we have created the pentagraph using the externally defined methods like AddEdge, we could instead break the encapsulation of internal data and manipulate the connections dictionary directly. (If you're feeling adventurous, go ahead and try it: after creating the graph, now type testgraph2.connections[2].append(3). Any idea what this will do? Run NetGraphics.DisplayCircleGraph(testgraph2) to see. Looks can be deceiving, though. Even though node 3 is now a neighbor of node 2, is the converse true? Use GetNeighbors to check.) Breaking the data encapsulation makes for a fragile program. In scientific computing this may not be a bad thing. If you are only going to use a function once, then it might not be worth the time to figure out how to better design things. It is a good exercise to think of alternative ways to implement this function -- for example, you could add a SetNeighborDict method to write the entire neighbor dictionary by hand, or a SetNeighbors(node) to set the neighbors of a given node.
"Applying" a function to a tuple of arguments
Here is a useful construction that you might want to learn. Create a "tuple" of nodes: newnodes=(4,5). Suppose you wanted to add these nodes (and an edge between them) to testgraph1. The obvious way is to type testgraph1.AddEdge(newnodes[0],newnodes[1]). The shorter way is testgraph1.AddEdge(*newnodes). The asterix essentially "strips" the elements of the tuple so you can apply the function to them; that is, a function that accepts N arguments can be fed instead a tuple containing N elements. (This is equivalent to the Apply function in Mathematica.)