Non-uniform random number generators
by Agner Fog
The file stocc.zip contains a C++ class library of non-uniform random number generators
providing variates with the following distributions: normal,
bernoulli, poisson, binomial, hypergeometric, Wallenius' noncentral hypergeometric,
and Fisher's noncentral hypergeometric distribution. Most distributions are
available in both univariate and multivariate versions. There is also a function to
shuffle numbers.
Most of these functions are fast and accurate, even for extreme values of the
parameters.
The functions are based on the uniform random number generators in randomc.zip
or randoma.zip.
File list
The archive stocc.zip contains the following
files:
- stocc.htm
- This file. Instructions
- stocc.h
- C++ header file containing class definitions.
You must #include
this in all C++ files that use this library.
- stoc1.cpp
- C++ source code for the non-uniform random number generators for the most
common probability distributions.
- stoc2.cpp
- Alternative code for the distributions poisson, binomial and hypergeometric.
Optimized for situations where these functions are called repeatedly with
the same parameters.
- stoc3.cpp
- Source code for generating random variables with the univariate and
multivariate Wallenius' and Fisher's noncentral hypergeometric distributions.
- userintf.cpp
- System-dependent user interface code
- fnchyppr.cpp wnchyppr.cpp erfres.cpp
- Additional source code used by stoc3.cpp
- ex-stoc.cpp
- Example showing how to use this class library to make random numbers with
various distributions.
- ex-lib.cpp
- Example showing how to use a combined random number generator from the
assembly language library as base for the non-uniform random number
functions.
- ex-cards.cpp
- Example showing how to shuffle a list of items randomly. Shuffles a deck
of cards.
- ex-lotto.cpp
- Example showing how to generate a sequence of random integers where no
number occurs more than once.
- testpois.cpp, testbino.cpp, testhype.cpp, testfnch.cpp, testwnch.cpp,
testmfnc.cpp, testmwnc.cpp
- Test programs for testing the poisson, binomial, hypergeometric, and
univariate and multivariate Fisher's and Wallenius' noncentral
hypergeometric distribution generators. Also useful for calculating mean and variance of
these distributions.
- distrib.pdf
- Definition of the statistical distributions that can be generated with
this package.
- sampmet.pdf
- Sampling methods. Theoretical explanation of the methods used for
generating variates with these distributions.
Instructions
Unpack the files randomc.zip and stocc.zip
into a new directory. Make sure all the unpacked files are in the same
directory. To get started, just compile one of the example programs for console
mode using any C++ compiler.
The C++ class StochasticLib
can be derived from any of the
random number generator classes in randomc.zip. Choose
the one you like best. The examples use the "Mersenne Twister" class TRandomMersenne
because this is the random number generator that has the fewest portability
problems. To use a different random number generator as base class, just change
the
#define RANDOM_GENERATOR TRandomMersenne
directive in the file stocc.h or
make such a define before the
#include "stocc.h"
statement in all your
modules.
The C++ class StochasticLib2
is derived from StochasticLib
.
It contains alternative versions of the functions poisson, binomial and
hypergeometric. StochasticLib
is the best choice if the
parameters to these functions often wary in your program, StochasticLib2
is faster when these functions are called many times with the same parameters.
The C++ class StochasticLib3
is derived from StochasticLib
or StochasticLib2
. It adds the Fisher's and Wallenius' noncentral
hypergeometric distributions, and the multivariate versions of these, to the
library.
The header file stocc.h must be included in all modules that use this class
library. The source code file stoc1.cpp can be included in your project, either
as an #include
, or as a separate module. If you are using StochasticLib2
,
then you need both stoc1.cpp and stoc2.cpp. If you are using StochasticLib3
,
then you need both stoc1.cpp and stoc3.cpp.
It is recommended that you make only one instance of StochasticLib
or StochasticLib2
or StochasticLib3
. These stochastic library classes
inherit the uniform random number functions from their base class. Therefore,
one object of StochasticLib3
gives you access to all the uniform and non-uniform random
number generators.
Example:
#include "stocc.h" // header file for class library
#include <time.h> // define time function
void main () {
int seed = time(0); // seed for random number generator
StochasticLib sto(seed); // make instance of class
int i = sto.IRandom(0, 99); // make random integer with uniform distribution
int b = sto.Binomial(0.2, 100); // make random integer with binomial distribution
...
}
See the file ex-stoc.cpp for a more detailed example.
See the file ex-lib.cpp for an example of how to use an assembly language
uniform random number generator as base for the non-uniform random number
generators.
If you make more than one instance of these classes then make sure they don't
have the same seed.
Portability
The C++ class library is supposed to work with all C++ compilers and all
operating systems. It has been tested on several different systems.
There are, however, a few system differences that you need to be aware of:
- Error messages. There is no portable way of writing error messages.
Systems with a graphical user interface (e.g. Windows) need a pop-up message
box, while console mode programs and other line oriented systems need output to the standard output or error
output. Therefore, you may have to modify the function
FatalError
in the file userintf.cpp to fit your system. This function is called by the library functions in case of illegal parameter
values or other fatal errors. Experience shows that these
error messages are very useful when debugging a program that uses the
non-uniform random variate generators. You may even enhance the FatalError
function to output additional debug information about the state of your
program.
- Program exit. Windows-like environments may require that the program waits
for the user to press a key before exiting, in order to prevent the output
screen image from disappearing. Therefore, you may have to modify the function
EndOfProgram
in userintf.cpp to fit your system if you experience this problem.
- Multithreaded programs. The functions are not re-entrant. This
is because they rely on static variables to remember the last parameters, so
that they don't have to re-do set-up calculations that are the same
as last time the function was called.
If you want to call the same stochastic functions from more than one thread
in a multi-threaded program then you have to make these functions re-entrant. There are two ways to do
this:
Method 1. Find all static local variables in the non-uniform random variate
generating functions in stoc1.cpp and stoc3.cpp. Make these variables non-static and make the setup in these
functions non-conditional. Don't use StochasticLib2
and
stoc2.cpp.
Method 2. Find all static local variables in the non-uniform random variate
generating functions in stoc1.cpp, stoc2.cpp and stoc3.cpp. Make all these variables
non-static members of the respective class. Those variables that need to be
initialized to -1
(names ending in _last
) must be so in the constructor.
Whether you use method 1 or 2, you have to give each thread its own instance
of the class and make sure they don't have the same seed. A difference of 1
in the seeds is sufficient.
- See randomc.htm for more portability issues.
Function descriptions
- int Bernoulli(double p)
- Bernoulli distribution with probability parameter p.
Returns 1 with probability p, or 0 with probability 1-p.
Error conditions:
Gives error message if p < 0 or p > 1.
- double Normal(double m, double s)
- Normal distribution with mean m and standard deviation s.
This distribution simulates the sum of many random factors.
Definition:
f(x) = (2*pi*s2)-0.5*exp(-0.5*s-2*(x-m)2)
Error conditions:
None.
- long int Poisson (double L)
- Poisson distribution with mean L.
This is the distribution of the number of events in a given time span or a
given geographical area when these events are randomly scattered in time or
space.
Definition:
f(x) = Lx/x! * exp(-L)
Error conditions:
Gives error message if L < 0 or L > 2*109.
- long int Binomial (long int n, double p)
- Binomial distribution with parameters n and p.
This is the distribution of the number of red balls you get when drawing n
balls with replacement from an urn where p is the fraction of red
balls in the urn.
Definition:
f(x) = B(n,x) * px * (1-p)n-x,
where the binomial coefficient B(a,b) = a! / (b! * (a-b)!).
Error conditions:
Gives error message if n < 0 or p < 0 or p > 1.
- long int Hypergeometric (long int n, long int m, long int N)
- Hypergeometric distribution with parameters n, m, N. (Note the order of
the parameters).
This is the distribution of the number of red balls you get when drawing n
balls without replacement from an urn where the urn contains N balls,
where m balls are red and N-m balls are white.
Definition:
f(x) = B(m,x) * B(N-m,n-x) / B(N,n),
where the binomial coefficient B(a,b) = a! / (b! * (a-b)!).
Error conditions:
Gives error message if any parameter is negative or n > N or m > N.
- long int WalleniusNCHyp(long int n, long int m, long int N, double
odds)
- The Wallenius noncentral hypergeometric distribution is the same as the hypergeometric distribution, but with bias. The bias can be seen
as an odds ratio. A bias > 1 will favor the red balls, a bias < 1 will
favor the white balls.
When bias = 1 we have the hypergeometric distribution.
Error conditions:
Gives error message if any parameter is negative or n > N or m > N.
- long int FishersNCHyp(long int n, long int m, long int N, double
odds)
- The Fisher's noncentral hypergeometric distribution is a conditional binomial
distribution resembling the Wallenius noncentral hypergeometric distribution. See the
file distrib.pdf for a definition. Execution may
be slow and inexact when N is high and bias is far from 1.
Error conditions:
Gives error message if any parameter is negative or n > N or m > N.
- void Multinomial (long int * destination, double * source, long int n, int colors)
void Multinomial (long int * destination, long int * source, long int n, int colors)
- Multivariate binomial distribution.
This is the distribution you get when drawing n balls from an urn with
replacement, where there can be any number of colors. This is the same
as the binomial distribution when colors
= 2.
The number of balls of each color is returned in destination
,
which must be an array with colors
places. source
contains the number or fraction of balls of each color in the urn. source
must be a double
or long int
array with colors
places.
The sum of the values in source
does not have to be 1, but it
must be positive. The probability that a ball has color i
is source[i]
divided by the sum of all values in source
.
Error conditions:
Gives an error message if any parameter is negative or if the sum of the
values in source
is zero. The behavior is unpredictable if source
or destination
has less than colors
places.
- void MultiHypergeo (long int * destination, long int * source, long int n, int colors)
- Multivariate hypergeometric distribution.
This is the distribution you get when drawing n balls from an urn without
replacement, where there can be any number of colors. This is the same
as the hypergeometric distribution when colors
= 2.
The number of balls of each color is returned in destination
,
which must be an array with colors
places. source
contains the number of balls of each color in the urn. source
must be an array with colors
places.
Error conditions:
Gives an error message if any parameter is negative or if the sum of the
values in source
is less than n. The behavior is unpredictable
if source
or destination
has less than colors
places.
- void MultiWalleniusNCHyp(long int * destination, long int * source, double * weights, long int n, int
colors)
- Multivariate Wallenius noncentral hypergeometric distribution. This is the
distribution you get when drawing colored balls from un urn without
replacement, with bias.
weights
is an array with colors
places containing the odds for each
color. The probability of drawing a particular ball is proportional to its
weight. The other parameters are the same as above.
This function may be inexact, but uses an approximation with an accuracy that
is better than 1% in almost all cases.
Error conditions:
Gives an error message if any parameter is negative or if the total number
of balls with nonzero weight is less than n. The behavior is unpredictable
if any of the arrays has less than colors
places.
- void MultiFishersNCHyp(long int * destination, long int * source, double * weights, long int n, int
colors)
- The multivariate Fisher's noncentral hypergeometric distribution is a conditional
binomial distribution resembling the multivariate Wallenius noncentral hypergeometric
distribution. See the file distrib.pdf for a
definition.
This function may be inexact, but uses an approximation with an accuracy that
is better than 1% in most cases. The precision can be tuned at the expense
of higher calculation times.
Error conditions:
Gives an error message if any parameter is negative or if the total number
of balls with nonzero weight is less than n. The behavior is unpredictable
if any of the arrays has less than colors
places.
- void Shuffle(int * list, int min, int n)
- This function makes a list of the n numbers from
min
to min+n-1
in random order.
The result is returned in list
, which must be an array with n elements.
The array index goes from 0
to n-1
.
If you want to shuffle something else than integers then use the integers in
list
as an index into a table of the items you want to shuffle.
Error conditions: none. The behavior is unpredictable if the size of the
array list
is less than n
.
- static double LnFac(long int n)
- Log factorial. Mainly used internally.
Definition:
LnFac(n) = loge(n!).
This function uses a table when n < 1024 and Stirling's approximation for
larger n. The table is generated by the constructor.
Error conditions:
Gives an error message if n < 0.
Theory
These distributions are defined in the file distrib.pdf.
The methods used for generating variates with these distributions are described
in the file sampmet.pdf.
A theoretical description of the univariate and multivariate Wallenius and
Fisher's noncentral hypergeometric distributions, including calculation methods and
sampling methods, is given in nchyp.pdf.
Examples showing how to simulate biological evolution using this class
library can be found in evolc.zip.
Copyright and licence
© 2002-2005 by Agner Fog.
General public license statement.
Back to random number
generators page.