Generation and sampling of set partitions

Consider a system of many components, and restrict attention to the set C of its components, ignoring other structure for the moment. In some applications we wish to optimize a global property that is a function of a partition of C. For example, the components may be nodes in a communication network over which we wish to distribute a certain number of functions, in such a way as to minimize the total distance of each node from the nearest provider of each function. The focus of the project would be algorithm and software design: we want practical methods that are as fast as possible, as well as easy to use.

The mathematical structures we need are as follows (without loss of generality, C is a set of integers, namely the labels of the nodes). A (set) partition of [n] = {0, 1, ..., n-1} is an assignment of each element to a subset called a block, without regard to the labelling of the blocks or the order of elements within a block. The block structure of a partition may be indicated by listing the elements and separating blocks with a symbol such as |. Thus for [5], 01|2|34 and 2|10|43 are the same partition. Contrast this concept with an integer partition, such as 5 = 1+2+2. To each integer partition of n there correspond one or more set partitions of [n].

If there are k blocks, we speak of a k-partition. The number of k-partitions of [n] is given by the Stirling number of the second kind S(n,k), which can be computed from the recursion S(n,k) = S(n-1,k-1) + k S(n-1,k), with S(n,1) = S(n,n) = 1. The total number of partitions of [n] is the sum of S(n,k) over k, and is called a Bell number B(n).

There are algorithms for generating all partitions in a Gray code order, due to Ehrlich and Ruskey (see fasc3b.ps at http://www-cs-faculty.stanford.edu/~knuth/news.html, if it's still there). In such a sequence, only one element moves to a new block on each step. Thus, for sufficiently small systems, we may solve our optimization problem by explicit enumeration of all partitions, and the Gray code property may be used to speed up evaluation of the objective function, since only the contribution of the moved element changes from one partition to the next. Part of the project will be to study the algorithmic complexity of these algorithms, and also to measure how large C may be in practice. We may want to generate all partitions, or just partitions with a specified number of blocks.

Beyond about n = 15, there are too many partitions to allow us to construct all of them in reasonable time. Thus, for large sets, we could instead use a uniform random sampling procedure to generate some of the partitions. This would not solve our optimization problem with certainty, but would enable us to find reasonably good partitions. Methods for this are known, but it is unclear how fast they are in practice.

This project would investigate all these methods and produce a unified, well-designed and documented C library, preferably for public release. As far as I know, no such library exists. Each function would in fact be quite small, but the art is in designing a uniform interface for them all. Some illustrative sketches of the core ingredients follow.
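First, the counting. The sketch below computes S(n,k) directly from the recursion given above; the function name stirling2 and the fixed-width arithmetic are my own choices, not part of any existing interface, and unsigned 64-bit integers overflow somewhere around n = 26, so a real library would need a big-integer type or a floating-point approximation.

    #include <stdio.h>

    /* Stirling numbers of the second kind, from the recursion
       S(n,k) = S(n-1,k-1) + k*S(n-1,k), with S(0,0) = 1 and
       S(n,k) = 0 for k < 1 or k > n.  Overflows unsigned long
       long somewhere around n = 26. */
    unsigned long long stirling2(int n, int k)
    {
        if (k < 1 || k > n)
            return (n == 0 && k == 0);      /* S(0,0) = 1, else 0 */
        unsigned long long s[n + 1][n + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= n; j++)
                s[i][j] = 0;
        s[0][0] = 1;
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= i; j++)
                s[i][j] = s[i-1][j-1] + (unsigned long long)j * s[i-1][j];
        return s[n][k];
    }

    int main(void)
    {
        unsigned long long bell = 0;        /* B(n) = sum over k of S(n,k) */
        for (int k = 1; k <= 5; k++)
            bell += stirling2(5, k);
        printf("S(5,2) = %llu, B(5) = %llu\n", stirling2(5, 2), bell);
        /* prints S(5,2) = 15, B(5) = 52 */
        return 0;
    }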
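The enumeration itself is usually organised around restricted growth strings: a[i] names the block containing element i, with a[0] = 0 and each a[i] at most one more than the largest label used so far, so that blocks are labelled in order of first appearance. The sketch below visits all B(N) strings in lexicographic order; it is essentially Algorithm H from the Knuth fascicle cited above, not the Ehrlich/Ruskey Gray-code order, but it shows the encoding a library interface might expose. N and the visit function are placeholders.

    #include <stdio.h>

    #define N 4

    /* Print one partition as its restricted growth string. */
    static void visit(const int a[], int n)
    {
        for (int i = 0; i < n; i++) printf("%d", a[i]);
        printf("\n");
    }

    int main(void)
    {
        int a[N] = {0};                /* a[i] = block label of element i */
        int b[N];                      /* b[i] = 1 + max(a[0..i-1])       */
        for (int i = 0; i < N; i++) b[i] = 1;   /* valid: a[] starts all zero */

        for (;;) {
            visit(a, N);
            if (N > 1 && a[N-1] < b[N-1]) {   /* cheap case: bump the last element */
                a[N-1]++;
                continue;
            }
            int j = N - 2;                    /* rightmost earlier position that can grow */
            while (j > 0 && a[j] == b[j]) j--;
            if (j <= 0) break;                /* 0123... was just visited: all singletons */
            a[j]++;
            int m = b[j] + (a[j] == b[j]);    /* new value of 1 + max(a[0..j]) */
            for (int k = j + 1; k < N; k++) {
                a[k] = 0;
                b[k] = m;
            }
        }
        return 0;
    }

For N = 4 this prints the 15 strings 0000, 0001, 0010, ..., 0123, one per partition of [4].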
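For the sampling side, one known approach draws a uniform random k-partition directly from the Stirling recursion: element n-1 starts a new block with probability S(n-1,k-1)/S(n,k), and otherwise joins one of the k blocks of a uniform k-partition of the remaining elements, chosen uniformly. A minimal sketch, assuming a precomputed table s[i][j] = S(i,j) as above; rand() is used only to keep the sketch short, and a real library would take a better random number generator as a parameter.

    #include <stdio.h>
    #include <stdlib.h>

    #define NMAX 25   /* S(n,k) fits in unsigned long long up to about here */

    static unsigned long long s[NMAX + 1][NMAX + 1];   /* s[i][j] = S(i,j) */

    static void fill_stirling(void)
    {
        s[0][0] = 1;
        for (int i = 1; i <= NMAX; i++)
            for (int j = 1; j <= i; j++)
                s[i][j] = s[i-1][j-1] + (unsigned long long)j * s[i-1][j];
    }

    /* Write a uniform random k-partition of [n] into a[0..n-1] as a
       restricted growth string.  Element n-1 starts a new block with
       probability S(n-1,k-1)/S(n,k); otherwise it joins one of the k
       blocks of a uniform k-partition of [n-1]. */
    static void random_k_partition(int n, int k, int a[])
    {
        if (k >= n) {                      /* only the all-singletons partition */
            for (int i = 0; i < n; i++) a[i] = i;
            return;
        }
        if (k <= 1) {                      /* only the one-block partition */
            for (int i = 0; i < n; i++) a[i] = 0;
            return;
        }
        double p = (double)s[n-1][k-1] / (double)s[n][k];
        if (rand() / ((double)RAND_MAX + 1.0) < p) {
            random_k_partition(n - 1, k - 1, a);
            a[n-1] = k - 1;                /* new block gets the next label */
        } else {
            random_k_partition(n - 1, k, a);
            a[n-1] = rand() % k;           /* slight modulo bias; fine for a sketch */
        }
    }

    int main(void)
    {
        int a[8];
        srand(12345);                      /* fixed seed, for reproducibility */
        fill_stirling();
        random_k_partition(8, 3, a);       /* uniform over the S(8,3) = 966 3-partitions */
        for (int i = 0; i < 8; i++) printf("%d", a[i]);
        printf("\n");
        return 0;
    }

The correctness of the split follows from the recursion itself: the S(n-1,k-1) partitions in which element n-1 is a singleton, plus k choices of block for each of the S(n-1,k) partitions in which it is not, together account for all S(n,k) possibilities exactly once.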