## 2011 Spies Prize: Robert Bradshaw

This year, Robert Bradshaw is the winner of the Annual Spies Sage Development Prize. Congratulations, Robert! Here is the prize citation:

Robert Bradshaw has been an extremely active and productive Sage developer for over five years. Additionally, he has been a leader, both in maintaining the community and in important design decisions.

He is probably best known for his work on Cython, which is critical for the performance of many key parts of Sage, and his work designing and implementing the coercion model, which makes many powerful mathematical constructions possible. However, his interests and significant contributions are wide-ranging, including: exact linear algebra, arithmetic of elliptic curves, L-functions, 3-D plotting and parallel building. A recent project is the patchbot tool, which automates testing contributions posted on trac. Moreover, he is an important contributor to trouble-shooting and design discussions in the sage-devel forum and is also the third most numerous poster of all time in the sage-support forum.

For his many important technical contributions, and his long-time and continuing involvement in the Sage community, Robert Bradshaw is awarded the 2011 Spies Sage Development Prize. This award carries a prize of $500 from the Sage Foundation (thanks to Jaap Spies).

## Documenta Mathematica now mirrored at sagemath.org

I’m delighted to announce that the journal Documenta Mathematica is now mirrored on the Sage website at http://www.sagemath.org/documenta/. Documenta Mathematica is an open access mathematics journal. It is open to all fields of mathematics. Articles are refereed in the traditional anonymous peer review model, which is what any respectable journal does.

## Statistical analysis of the Fisher-Yates shuffle

The Fisher-Yates shuffle, also known as the Knuth shuffle, is a procedure for producing a random permutation of a sequence. Here I provide a statistical analysis of an implementation of the Fisher-Yates shuffle. The central idea is that every permutation of a sequence should be equally likely to be an output of the shuffle. That is, over a large enough number of shuffles of a fixed sequence, the observed probability of each permutation produced by an implementation should cluster around the theoretical probability for that permutation, and converge to it as the number of shuffles increases. Otherwise there is something wrong with the implementation.

I used my implementation of the Fisher-Yates shuffle to produce random permutations of various simple sequences of digits. The output of the shuffles was used to perform frequency analyses of the behaviour of the implementation. Following are details on the particular sequences and the number of iterations for each sequence. An iteration here is one shuffle of the given sequence, so the number of iterations counts how many times I shuffled that sequence. An experiment on a sequence is then the totality of all shuffles performed on it.

- Sequence: 123. Iterations: 1,000,000
- Sequence: 1234. Iterations: 1,000,000
- Sequence: 12345. Iterations: 1,000,000
- Sequence: 123456. Iterations: 10,000,000
- Sequence: 1234567. Iterations: 100,000,000
- Sequence: 12345678. Iterations: 100,000,000
- Sequence: 123456789. Iterations: 100,000,000

Each sequence was initialized as follows. Let $v$ be a vector whose $i$-th entry holds the $i$-th digit of the sequence, so the first entry holds the digit 1, the second holds 2, and so on. Two versions of the experiment were performed on each sequence. In the first version, called version A, at the start of each iteration I initialized $v$ as described above and then randomly permuted it. In version B, I initialized $v$ once as above and then repeatedly permuted it. Thus if $v_{k-1}$ is the permutation obtained from iteration $k-1$, then during iteration $k$ I applied the Fisher-Yates shuffle to $v_{k-1}$ to obtain $v_k$. These two versions of each experiment were performed to see whether they would produce qualitatively identical results. The experimental results suggest so: the two versions of each experiment produced qualitatively similar results.
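To make the two versions concrete, here is a minimal Python sketch. It is not the C/igraph code used for the experiments; the names `fisher_yates`, `run_version_a`, and `run_version_b` are hypothetical and chosen only for illustration, and the random source is Python's standard `random` module.

```python
import random
from collections import Counter


def fisher_yates(seq, rng=random):
    """Shuffle the list seq in place with Fisher-Yates and return it."""
    for i in range(len(seq) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        seq[i], seq[j] = seq[j], seq[i]
    return seq


def run_version_a(digits, iterations, rng=random):
    """Version A: re-initialize the vector before every shuffle."""
    counts = Counter()
    for _ in range(iterations):
        v = list(digits)
        fisher_yates(v, rng)
        counts["".join(v)] += 1
    return counts


def run_version_b(digits, iterations, rng=random):
    """Version B: initialize once, then shuffle the previous output."""
    counts = Counter()
    v = list(digits)
    for _ in range(iterations):
        fisher_yates(v, rng)
        counts["".join(v)] += 1
    return counts
```

Both functions return a tally of how often each permutation appeared, which is the raw material for the frequency analysis.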

Source code for the experiments is provided here. Note that in order to compile the C files, you need to check out igraph trunk from Launchpad, apply the patch on this ticket, and then compile and install the resulting patched igraph version on your system. The C files containing the code for the experiments output the result of each shuffle to a file. For small sequences with, say, 3 to 4 digits, the resulting output files are a few MB in size. But for longer sequences, such as those with 5 or more digits, the output files can range from tens to hundreds of MB in size. The experimental data are easily generated from the above C files, so I do not provide the data. The data for each experiment were analyzed using the Python script fisherstat.py. If you intend to replicate the experiments, you need to adjust this script for each data file of each experiment. Given a data file for an experiment, the Python file is loaded from within the Sage command line interface; everything from then on is automated, from reading the experimental data to computing the frequency distribution. All experiments were run on the Sage cluster, in particular the sage.math compute node, whose purchase is supported by US National Science Foundation Grant No. DMS-0821725. Data analysis was performed using the Sage mathematics software system version 4.6.2.

Note that the Bitbucket.org project link also points to PDF files. These files plot the normalized frequency distributions of the experimental data. The horizontal axis of each plot shows the permutation IDs. Each permutation of a fixed sequence is assigned a unique ID starting from 0. For example, for the sequence “123” here are all the possible permutations together with their corresponding IDs:

- 123 -> 0
- 132 -> 1
- 213 -> 2
- 231 -> 3
- 312 -> 4
- 321 -> 5

The vertical axis contains the corresponding normalized frequency of each permutation. Each frequency count was normalized by the number of iterations for the corresponding experiment. See the script fisherstat.py for further details. The normalized frequency for a permutation can be thought of as the empirical probability of that permutation showing up as a result of a Fisher-Yates shuffle.
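The ID assignment and the normalization can be sketched in a few lines of Python. This is not the contents of fisherstat.py. It assumes that IDs follow lexicographic order, which agrees with the listing for “123” above, and the function names and the tallies are hypothetical.

```python
from collections import Counter
from itertools import permutations


def permutation_ids(digits):
    """Map each permutation of digits to an ID in lexicographic order."""
    ordered = sorted("".join(p) for p in permutations(digits))
    return {perm: idx for idx, perm in enumerate(ordered)}


def normalized_frequencies(counts, iterations, ids):
    """Return a list, indexed by permutation ID, of count / iterations."""
    freqs = [0.0] * len(ids)
    for perm, idx in ids.items():
        freqs[idx] = counts.get(perm, 0) / iterations
    return freqs


ids = permutation_ids("123")
# Hypothetical tallies for illustration only, not experimental data.
counts = Counter({"123": 2, "132": 1, "213": 1, "231": 2, "312": 3, "321": 1})
freqs = normalized_frequencies(counts, sum(counts.values()), ids)
```

Plotting `freqs` against the indices 0 to 5 gives the kind of normalized frequency distribution described above.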

And now comes the fun bit: plots of the experimental data. As I said above, both versions of each experiment produced qualitatively similar results. For this reason, below I only show some plots for version A of each experiment. To see all plots including plots for version B, refer to the Bitbucket.org project page. For kicks, each PDF file containing a plot was typeset using LaTeX and pgfplots.

As is evident from the above plots, for each sequence considered the empirical probabilities resulting from the experiments cluster around the theoretical probabilities. For a sequence of 3 or 4 digits, the empirical probabilities converge to the theoretical probability after a million or so experimental iterations. For example, the sequence “123” has six possible permutations, so each permutation has a theoretical probability of $1/6$ of occurring as a result of the Fisher-Yates shuffle. In the above plot for the sequence “123”, it can be seen that the empirical probabilities converge to the theoretical probability after one million iterations. But as the number of digits in a sequence increases, the number of experimental iterations needs to increase as well in order to observe a convergence of the empirical probabilities to the theoretical probability for that sequence. For example, after 100,000,000 iterations for the sequence “123456789”, observe that the empirical probabilities still only cluster around the theoretical probability of $1/9! = 1/362880$. As the number of iterations increases, the range of empirical probabilities should converge to the theoretical probability.
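A rough calculation shows why longer sequences need more iterations. The sketch below assumes independent, uniform shuffles, so that the count of each permutation is binomially distributed. That is a modelling assumption of mine, not something measured in the experiments, and the function names are hypothetical.

```python
import math


def theoretical_probability(n):
    """Probability of any one permutation of n distinct items."""
    return 1.0 / math.factorial(n)


def relative_spread(n, iterations):
    """Standard deviation of an empirical probability, relative to p.

    Treats each permutation's count as binomial(iterations, p).
    """
    p = theoretical_probability(n)
    return math.sqrt((1.0 - p) / (p * iterations))


for n, iterations in [(3, 10**6), (9, 10**8)]:
    print(n, theoretical_probability(n), relative_spread(n, iterations))
```

Under this assumption the expected relative spread is roughly 0.2% for “123” with one million iterations, but roughly 6% for “123456789” with 100,000,000 iterations, because each of the 362880 permutations is expected to appear only about 276 times.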

## Version 0.7 of book “Algorithmic Graph Theory” released

Here is version 0.7 of the book Algorithmic Graph Theory. The relevant download options are:

Version 0.7 fleshes out the chapter “Random Graphs”. Here is the content of the chapter in brief:

- Network statistics
- Binomial random graph model
- Erdos-Renyi model
- Small-world networks
- Scale-free networks

## Version 0.5 of the book “Algorithmic Graph Theory”

I’m happy as a clam to announce version 0.5 of the book Algorithmic Graph Theory for your reading pleasure.

The main focus of this release is to flesh out the chapter on trees and forests. Along the way, numerous problems/exercises were added to the introductory chapter “Introduction to Graph Theory” and the chapter “Graph Algorithms”. Needless to say, there is also a multitude of typo fixes throughout the book. We, the authors of the book, gratefully acknowledge contributions from the following people while preparing this release:

- Caroline Melles
- Pravin Paratey

See the section “Acknowledgments” in the book for full details on their contributions. Here is an outline of topics covered in the newly fleshed out chapter “Trees and Forests”:

- Definitions and examples relating to trees and forests.
- Various basic characterizations of trees.
- Techniques for constructing minimum spanning trees: a randomized spanning tree construction algorithm and the usual suspects including Kruskal’s algorithm, Prim’s algorithm, and Boruvka’s algorithm.
- Binary trees and an algorithm to construct a random binary tree. Application topics include coding theory, Gray code, and Huffman code.
- The usual suspects of tree traversal algorithms: level-order, pre-order, post-order, and in-order.

## Compiling Sage on Linux and Mac OS X

Compiling Sage can be a difficult process. The difficulty is compounded by the many versions of the GNU/Linux and Mac OS X operating systems. In this post, I don’t want to cover every possible way of compiling Sage on each operating system. My goal here is to provide a general overview of how to compile Sage on Ubuntu 10.10 and Mac OS X 10.6.x.

**Ubuntu 10.10**

For Linux, it should be sufficient to explain how to compile Sage under Ubuntu. The following commands were tested under Ubuntu 10.10. First, make sure you have the necessary tools for compiling Sage. To install these tools, execute the following command from a terminal window:

$ sudo apt-get install build-essential gfortran m4 readline-common libreadline-dev

Suppose you have downloaded the Sage source distribution and it is saved as the file named sage-x.y.z.tar. The file extension “.tar” signifies that sage-x.y.z.tar is a tar archive, also called a tar file or a tarball. To compile Sage, you need to launch a terminal window, extract the Sage source files from the tarball, and use development tools that come with Ubuntu to compile Sage. Suppose the Sage source tarball is located at

/home/username/sage-x.y.z.tar

where “username” is your username, the one you use to log in to your Ubuntu system; you should replace “username” with your actual username. Say you want to compile Sage under the directory /home/username/. From your terminal window, navigate to your home directory and extract the source tarball:

$ cd

$ tar -xf sage-x.y.z.tar

which will produce a directory called

/home/username/sage-x.y.z/

that contains all the packages that are distributed with the Sage source distribution. You are now ready to compile Sage as contained under the latter directory. Note that you can rename the latter directory to, say, /home/username/mysage/ as follows:

$ mv sage-x.y.z mysage

If you have a directory, say, /home/username/bin/ and you want to extract the Sage source tarball to that directory, you can do so as follows:

$ tar -xf sage-x.y.z.tar -C /home/username/bin

Here, it is assumed that you want to compile the Sage source as contained under

/home/username/sage-x.y.z/

This top-level directory is usually referred to as SAGE_ROOT and is where the compiled version of Sage resides. The next step is to navigate to the latter directory and start the automated compilation process:

$ cd /home/username/sage-x.y.z/

$ make

Usually, that is all you need to do. The rest of the compilation process is non-interactive and takes at most a few hours, depending on your system.

Note that compiling Sage is very taxing on your system’s resources, i.e. CPU, RAM, disk space, and so on. Make sure you have sufficient system resources; for example, you could close all running applications and ensure you have at least 2 GB of free disk space for Sage. Furthermore, the compilation process can take from a dozen or so minutes to a few hours, depending on your system’s hardware and how you customized the compilation process. By default, the compilation process uses one thread to compile Sage, which means that every component is compiled one after the other, i.e. serial compilation. Serial compilation of Sage can take an hour or more to complete.

On a multi-core or multi-CPU system, the compilation process can be parallelized. This means that many components of Sage can be compiled in parallel, hence reducing the time it takes for the compilation process to complete. The general steps to compile Sage in parallel are as follows. Determine the number of cores on your system and decide on how many threads you want to devote to a parallel compilation of Sage. On Ubuntu, detailed information about your system’s CPU is contained in the file

/proc/cpuinfo

Read through that file to determine the number of cores on your system. Say your system has two cores and you want to use two threads for compiling Sage in parallel. Then the necessary setup to compile Sage with two threads is:

$ SAGE_PARALLEL_SPKG_BUILD="yes"; export SAGE_PARALLEL_SPKG_BUILD

$ MAKE="make -j2"; export MAKE

$ make

This should initiate the process of compiling Sage in parallel. If your system allows for compiling Sage with more threads, e.g. 4 or 8 threads, then change “-j2” to “-j4” or “-j8” accordingly.
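If you would rather not read /proc/cpuinfo by hand, the count can be scripted. The following Python sketch is only one possible approach, with hypothetical function names. It counts the “processor” entries in /proc/cpuinfo, which are logical processors and may exceed the number of physical cores, and it falls back to `os.cpu_count()` on systems without that file.

```python
import os


def count_processors(cpuinfo_text):
    """Count the 'processor' entries in the text of /proc/cpuinfo."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def detect_jobs(path="/proc/cpuinfo"):
    """Return a job count, falling back to os.cpu_count() and then 1."""
    n = 0
    if os.path.exists(path):
        with open(path) as f:
            n = count_processors(f.read())
    return n or os.cpu_count() or 1


def make_command(jobs):
    """Build the value for the MAKE environment variable."""
    return "make -j%d" % jobs


print(make_command(detect_jobs()))
```

The printed string is what you would assign to MAKE in the commands above.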

Once all components of Sage are successfully compiled, the compilation process will automatically build the HTML version of the Sage standard documentation. This documentation includes a tutorial, an FAQ, a collection of thematic tutorials, a reference manual, and an installation guide, among other documents. Building the HTML version of the Sage standard documentation does not require LaTeX. However, for best results, it is recommended that you have LaTeX installed on your system. Furthermore, LaTeX is a prerequisite for building the PDF version of the Sage standard documentation. The following command installs the full TeX Live distribution that comes with Ubuntu:

$ sudo apt-get install texlive-full

This will also install necessary tools for building both the HTML and PDF versions of the Sage documentation. To manually build the HTML version of the documentation, do the following from within the SAGE_ROOT directory:

$ ./sage -docbuild all html

\# or

$ ./sage -docbuild --no-pdf-links all html

To build the PDF version of the documentation, do

$ ./sage -docbuild all pdf

For further options to customize your build of the Sage standard documentation, see the output of the following command:

$ ./sage -docbuild

You can also download the documentation from

http://www.sagemath.org/help.html

or view it online at

To customize the compilation process to your particular needs, refer to the file

SAGE_ROOT/Makefile

Once the compilation process completes, there is no separate installation process for your newly compiled Sage. You can think of Sage as being compiled and installed under SAGE_ROOT. To begin using your freshly compiled version of Sage, navigate to SAGE_ROOT and launch Sage as follows:

$ cd /home/username/sage-x.y.z/

$ ./sage

**Mac OS X 10.6.x**

For Mac OS X, most of the prerequisites for compiling Sage are bundled with Xcode, which is freely available from Apple’s website. Install Xcode as directed by Apple’s installation instructions and download the latest Sage stable source distribution from the Sage website. If you have Xcode installed on your system, then the following commands should not result in any errors:

$ which gcc

/usr/bin/gcc

$ which g++

/usr/bin/g++

$ which make

/usr/bin/make

$ which m4

/usr/bin/m4

$ which perl

/usr/bin/perl

$ which tar

/usr/bin/tar

$ which ranlib

/usr/bin/ranlib

Suppose you have downloaded the Sage source distribution and it is saved as the file named sage-x.y.z.tar. The file extension “.tar” signifies that sage-x.y.z.tar is a tar archive, also called a tar file or a tarball. To compile Sage, you need to launch a terminal window, extract the Sage source files from the tarball, and use tools that come with Xcode to compile Sage. A terminal window or a console is usually launched when you run the program Terminal, which is found at

/Applications/Utilities/Terminal.app

under Mac OS X 10.4.x. For Mac OS X 10.5.x and 10.6.x, use the application finder to locate Terminal and then launch Terminal. Suppose the Sage source tarball is located at

/Users/username/sage-x.y.z.tar

where “username” is your username, the one you use to log in to your Mac OS X system; replace “username” with your actual username. Say you want to compile Sage under the directory /Users/username/. From your terminal window, navigate to your home directory and extract the source tarball:

$ cd

$ tar -xf sage-x.y.z.tar

which will produce a directory called

/Users/username/sage-x.y.z/

that contains all the packages that are distributed with the Sage source distribution. You are now ready to compile Sage as contained under the latter directory. Note that you can rename the latter directory to, say, /Users/username/mysage/ as follows:

$ mv sage-x.y.z mysage

If you have a directory, say, /Users/username/bin/ and you want to extract the Sage source tarball to that directory, you can do so as follows:

$ tar -xf sage-x.y.z.tar -C /Users/username/bin

Here, it is assumed that you want to compile the Sage source as contained under

/Users/username/sage-x.y.z/

This top-level directory is usually referred to as SAGE_ROOT and is where the compiled version of Sage resides. The next step is to navigate to the latter directory and start the automated compilation process:

$ cd /Users/username/sage-x.y.z/

$ make

Usually, that is all you need to do. The rest of the compilation process is non-interactive and takes at most a few hours, depending on your system.

Note that compiling Sage is very taxing on your system’s resources, i.e. CPU, RAM, disk space, and so on. Make sure you have sufficient system resources; for example, you could close all running applications and ensure you have at least 2 GB of free disk space for Sage. Furthermore, the compilation process can take from a dozen or so minutes to a few hours, depending on your system’s hardware and how you customized the compilation process. By default, the compilation process uses one thread to compile Sage, which means that every component is compiled one after the other, i.e. serial compilation. Serial compilation of Sage can take an hour or more to complete.

On a multi-core or multi-CPU system, the compilation process can be parallelized. This means that many components of Sage can be compiled in parallel, hence reducing the time it takes for the compilation process to complete. The general steps to compile Sage in parallel are as follows. Determine the number of cores on your system and decide on how many threads you want to devote to a parallel compilation of Sage. On Mac OS X, detailed information about your system’s hardware is obtained via the command

$ system_profiler

Say your system has two cores and you want to use two threads for compiling Sage in parallel. Then the necessary setup to compile Sage with two threads is:

$ SAGE_PARALLEL_SPKG_BUILD="yes"; export SAGE_PARALLEL_SPKG_BUILD

$ MAKE="make -j2"; export MAKE

$ make

This should initiate the process of compiling Sage in parallel. If your system allows for compiling Sage with more threads, e.g. 4 or 8 threads, then change “-j2” to “-j4” or “-j8” accordingly.

Once all components of Sage are successfully compiled, the compilation process will automatically build the HTML version of the Sage standard documentation. This documentation includes a tutorial, an FAQ, a collection of thematic tutorials, a reference manual, and an installation guide, among other documents. Building the HTML version of the Sage standard documentation does not require LaTeX. However, for best results, it is recommended that you have LaTeX installed on your system. Furthermore, LaTeX is a prerequisite for building the PDF version of the Sage standard documentation. To manually build the HTML version of the documentation, do the following from within the SAGE_ROOT directory:

$ ./sage -docbuild all html

\# or

$ ./sage -docbuild --no-pdf-links all html

To build the PDF version of the documentation, do

$ ./sage -docbuild all pdf

For further options to customize your build of the Sage standard documentation, see the output of the following command:

$ ./sage -docbuild

You can also download the documentation from

http://www.sagemath.org/help.html

or view it online at

To customize the compilation process to your particular needs, refer to the file

SAGE_ROOT/Makefile

Once the compilation process completes, there is no separate installation process for your newly compiled Sage. You can think of Sage as being compiled and installed under SAGE_ROOT. To begin using your freshly compiled version of Sage, navigate to SAGE_ROOT and launch Sage as follows:

$ cd /Users/username/sage-x.y.z/

$ ./sage

## Is the Sage command line limiting?

The central issue I want to address in this post is: Does the Sage command line interface, which is text-based, limit what you could do with Sage?

My short answer is, “No”. With careful use and experience, the Sage command line interface can be a powerful tool. Say you want to step through a function to understand how it works. As of Sage 4.6, the Sage command line is the only place where you can interactively step through a calculation, via the function trace(). As another example, say you have a Sage script called

/home/username/myscript.sage

and you want to edit it and load it immediately after each edit. From the command line, do

sage: %edit /home/username/myscript.sage

to load your Sage script, and edit it. After you are done, quit your editor and you should be back at the Sage command line, with your updated script automatically loaded.

Or say you want to interact with your operating system without having to quit Sage. From the Sage command line, do

sage: !sh

and you should be presented with a shell for interacting with your system. When you are done, enter “exit” and you should be back to the Sage command line. From the Sage notebook interface, giving users a shell can open the door to many security issues. The issue becomes clear when you are managing a public Sage notebook server.

Or how do you view the Sage standard documentation from the command line? To view the documentation of function foo(), from the Sage command line do

sage: foo?

To view the source code of function bar(), from the Sage command line do

sage: bar??

To view the Sage tutorial, do

sage: tutorial()

To view the Sage reference manual, do

sage: reference()

How do you find that function you are looking for? Do

sage: search_src("the function name")

\# or

sage: search_doc("the function name")

\# or

sage: search_def("the function name")