Non-uniform random number generators

by Agner Fog

The file stocc.zip contains a C++ class library of non-uniform random number generators providing variates with the following distributions: normal, bernoulli, poisson, binomial, hypergeometric, Wallenius' noncentral hypergeometric, and Fisher's noncentral hypergeometric distribution. Most distributions are available in both univariate and multivariate versions. There is also a function to shuffle numbers.

Most of these functions are fast and accurate, even for extreme values of the parameters.

The functions are based on the uniform random number generators in randomc.zip or randoma.zip.

File list

The archive stocc.zip contains the following files:

stocc.htm
This file. Instructions
stocc.h
C++ header file containing class definitions.
You must #include this in all C++ files that use this library.
stoc1.cpp
C++ source code for the non-uniform random number generators for the most common probability distributions.
stoc2.cpp
Alternative code for the distributions poisson, binomial and hypergeometric. Optimized for situations where these functions are called repeatedly with the same parameters.
stoc3.cpp
Source code for generating random variables with the univariate and multivariate Wallenius' and Fisher's noncentral hypergeometric distributions.
userintf.cpp
System-dependent user interface code
fnchyppr.cpp wnchyppr.cpp erfres.cpp
Additional source code used by stoc3.cpp
ex-stoc.cpp
Example showing how to use this class library to make random numbers with various distributions.
ex-lib.cpp
Example showing how to use a combined random number generator from the assembly language library as base for the non-uniform random number functions.
ex-cards.cpp
Example showing how to shuffle a list of items randomly. Shuffles a deck of cards.
ex-lotto.cpp
Example showing how to generate a sequence of random integers where no number occurs more than once.
testpois.cpp,  testbino.cpp,  testhype.cpp, testfnch.cpp, testwnch.cpp, testmfnc.cpp, testmwnc.cpp
Test programs for testing the poisson, binomial, hypergeometric, and univariate and multivariate Fisher's and Wallenius' noncentral hypergeometric distribution generators. Also useful for calculating mean and variance of these distributions.
distrib.pdf
Definition of the statistical distributions that can be generated with this package.
sampmet.pdf
Sampling methods. Theoretical explanation of the methods used for generating variates with these distributions.

Instructions

Unpack the files randomc.zip and stocc.zip into a new directory. Make sure all the unpacked files are in the same directory. To get started, just compile one of the example programs for console mode using any C++ compiler.

The C++ class StochasticLib can be derived from any of the random number generator classes in randomc.zip. Choose the one you like best. The examples use the "Mersenne Twister" class TRandomMersenne  because this is the random number generator that has the fewest portability problems. To use a different random number generator as base class, just change the 
  #define RANDOM_GENERATOR TRandomMersenne 
directive in the file stocc.h or make such a define before the 
  #include "stocc.h"
statement in all your modules.

The C++ class StochasticLib2 is derived from StochasticLib. It contains alternative versions of the functions poisson, binomial and hypergeometric.  StochasticLib is the best choice if the parameters to these functions often wary in your program, StochasticLib2  is faster when these functions are called many times with the same parameters.

The C++ class StochasticLib3 is derived from StochasticLib or StochasticLib2. It adds the Fisher's and Wallenius' noncentral hypergeometric distributions, and the multivariate versions of these, to the library.

The header file stocc.h must be included in all modules that use this class library. The source code file stoc1.cpp can be included in your project, either as an #include, or as a separate module. If you are using StochasticLib2,  then you need both stoc1.cpp and stoc2.cpp. If you are using StochasticLib3,  then you need both stoc1.cpp and stoc3.cpp.

It is recommended that you make only one instance of StochasticLib or  StochasticLib2 or StochasticLib3. These stochastic library classes inherit the uniform random number functions from their base class. Therefore, one object of StochasticLib3 gives you access to all the uniform and non-uniform random number generators. Example:

#include "stocc.h"                // header file for class library
#include <time.h>                 // define time function

void main () {
  int seed = time(0);             // seed for random number generator
  StochasticLib sto(seed);        // make instance of class
  int i = sto.IRandom(0, 99);     // make random integer with uniform distribution
  int b = sto.Binomial(0.2, 100); // make random integer with binomial distribution
  ...
  }

See the file ex-stoc.cpp for a more detailed example.

See the file ex-lib.cpp for an example of how to use an assembly language uniform random number generator as base for the non-uniform random number generators.

If you make more than one instance of these classes then make sure they don't have the same seed.

Portability

The C++ class library is supposed to work with all C++ compilers and all operating systems. It has been tested on several different systems.

There are, however, a few system differences that you need to be aware of:

  1. Error messages. There is no portable way of writing error messages. Systems with a graphical user interface (e.g. Windows) need a pop-up message box, while console mode programs and other line oriented systems need output to the standard output or error output. Therefore, you may have to modify the function FatalError  in the file userintf.cpp to fit your system. This function is called by the library functions in case of illegal parameter values or other fatal errors. Experience shows that these error messages are very useful when debugging a program that uses the non-uniform random variate generators. You may even enhance the FatalError function to output additional debug information about the state of your program. 
  2. Program exit. Windows-like environments may require that the program waits for the user to press a key before exiting, in order to prevent the output screen image from disappearing. Therefore, you may have to modify the function EndOfProgram in userintf.cpp to fit your system if you experience this problem.
  3. Multithreaded programs. The functions are not re-entrant. This is because they rely on static variables to remember the last parameters, so that they don't have to re-do set-up calculations that are the same as last time the function was called.

    If you want to call the same stochastic functions from more than one thread in a multi-threaded program then you have to make these functions re-entrant. There are two ways to do this:

    Method 1. Find all static local variables in the non-uniform random variate generating functions in stoc1.cpp and stoc3.cpp. Make these variables non-static and make the setup in these functions non-conditional. Don't use StochasticLib2 and stoc2.cpp.

    Method 2. Find all static local variables in the non-uniform random variate generating functions in stoc1.cpp, stoc2.cpp and stoc3.cpp. Make all these variables non-static members of the respective class. Those variables that need to be initialized to -1 (names ending in _last) must be so in the constructor.

    Whether you use method 1 or 2, you have to give each thread its own instance of the class and make sure they don't have the same seed. A difference of 1 in the seeds is sufficient.
  4. See randomc.htm for more portability issues.

Function descriptions

int Bernoulli(double p)
Bernoulli distribution with probability parameter p.
Returns 1 with probability p, or 0 with probability 1-p.
Error conditions:
Gives error message if p < 0 or p > 1.
double Normal(double m, double s)
Normal distribution with mean m and standard deviation s.
This distribution simulates the sum of many random factors.
Definition:
f(x) = (2*pi*s2)-0.5*exp(-0.5*s-2*(x-m)2)
Error conditions:
None.
long int Poisson (double L)
Poisson distribution with mean L.
This is the distribution of the number of events in a given time span or a given geographical area when these events are randomly scattered in time or space.
Definition:
f(x) = Lx/x! * exp(-L)
Error conditions:
Gives error message if L < 0 or L > 2*109.
long int Binomial (long int n, double p)
Binomial distribution with parameters n and p.
This is the distribution of the number of red balls you get when drawing n balls with replacement from an urn where p is the fraction of red balls in the urn.
Definition:
f(x) = B(n,x) * px * (1-p)n-x,
where the binomial coefficient B(a,b) = a! / (b! * (a-b)!).
Error conditions:
Gives error message if n < 0 or p < 0 or p > 1.
long int Hypergeometric (long int n, long int m, long int N)
Hypergeometric distribution with parameters n, m, N. (Note the order of the parameters).
This is the distribution of the number of red balls you get when drawing n balls without replacement from an urn where the urn contains N balls, where m balls are red and N-m balls are white.
Definition:
f(x) = B(m,x) * B(N-m,n-x) / B(N,n),
where the binomial coefficient B(a,b) = a! / (b! * (a-b)!).
Error conditions:
Gives error message if any parameter is negative or n > N or m > N.
long int WalleniusNCHyp(long int n, long int m, long int N, double odds)
The Wallenius noncentral hypergeometric distribution is the same as the hypergeometric distribution, but with bias. The bias can be seen as an odds ratio. A bias > 1 will favor the red balls, a bias < 1 will favor the white balls.
When bias = 1 we have the hypergeometric distribution.
Error conditions:
Gives error message if any parameter is negative or n > N or m > N.
long int FishersNCHyp(long int n, long int m, long int N, double odds)
The Fisher's noncentral hypergeometric distribution is a conditional binomial distribution resembling the Wallenius noncentral hypergeometric distribution. See the file distrib.pdf for a definition. Execution may be slow and inexact when N is high and bias is far from 1.
Error conditions:
Gives error message if any parameter is negative or n > N or m > N.
void Multinomial (long int * destination, double * source, long int n, int colors)
void Multinomial (long int * destination, long int * source, long int n, int colors)
Multivariate binomial distribution.
This is the distribution you get when drawing n balls from an urn with replacement, where there can be any number of colors. This is the same as the binomial distribution when colors = 2.
The number of balls of each color is returned in destination, which must be an array with colors places. source contains the number or fraction of balls of each color in the urn. source must be a double or long int array with colors places.
The sum of the values in source does not have to be 1, but it must be positive. The probability that a ball has color i is source[i] divided by the sum of all values in source.
Error conditions:
Gives an error message if any parameter is negative or if the sum of the values in source is zero. The behavior is unpredictable if source or destination has less than colors places.
void MultiHypergeo (long int * destination, long int * source, long int n, int colors)
Multivariate hypergeometric distribution.
This is the distribution you get when drawing n balls from an urn without replacement, where there can be any number of colors. This is the same as the hypergeometric distribution when colors = 2.
The number of balls of each color is returned in destination, which must be an array with colors places. source contains the number of balls of each color in the urn. source must be an array with colors places.
Error conditions:
Gives an error message if any parameter is negative or if the sum of the values in source is less than n. The behavior is unpredictable if source or destination has less than colors places.
void MultiWalleniusNCHyp(long int * destination, long int * source, double * weights, long int n, int colors)
Multivariate Wallenius noncentral hypergeometric distribution. This is the distribution you get when drawing colored balls from un urn without replacement, with bias. weights is an array with colors places containing the odds for each color. The probability of drawing a particular ball is proportional to its weight. The other parameters are the same as above.
This function may be inexact, but uses an approximation with an accuracy that is better than 1% in almost all cases.
Error conditions:
Gives an error message if any parameter is negative or if the total number of balls with nonzero weight is less than n. The behavior is unpredictable if any of the arrays has less than colors places.
void MultiFishersNCHyp(long int * destination, long int * source, double * weights, long int n, int colors)
The multivariate Fisher's noncentral hypergeometric distribution is a conditional binomial distribution resembling the multivariate Wallenius noncentral hypergeometric distribution. See the file distrib.pdf for a definition.
This function may be inexact, but uses an approximation with an accuracy that is better than 1% in most cases. The precision can be tuned at the expense of higher calculation times.
Error conditions:
Gives an error message if any parameter is negative or if the total number of balls with nonzero weight is less than n. The behavior is unpredictable if any of the arrays has less than colors places.
void Shuffle(int * list, int min, int n)
This function makes a list of the n numbers from  min  to  min+n-1  in random order.
The result is returned in list, which must be an array with n elements.
The array index goes from 0 to n-1.
If you want to shuffle something else than integers then use the integers in list as an index into a table of the items you want to shuffle.
Error conditions: none. The behavior is unpredictable if the size of the array list is less than n.
static double LnFac(long int n)
Log factorial. Mainly used internally.
Definition:
LnFac(n) = loge(n!).
This function uses a table when n < 1024 and Stirling's approximation for larger n. The table is generated by the constructor.
Error conditions:
Gives an error message if n < 0.

Theory

These distributions are defined in the file distrib.pdf. The methods used for generating variates with these distributions are described in the file sampmet.pdf.

A theoretical description of the univariate and multivariate Wallenius and Fisher's noncentral hypergeometric distributions, including calculation methods and sampling methods, is given in nchyp.pdf.

Examples showing how to simulate biological evolution using this class library can be found in evolc.zip.

Copyright and licence

© 2002-2005 by Agner Fog. General public license statement.

 

Back to random number generators page.