My Pseudo Random Number Generator: Too Big To Fail, or Snake Oil?

For PRNG background information, check the links here. Below are the best links for PRNGs. They also link to many other sites, web pages, and blogs. Many of these sites have very deep mathematical analysis, but all of them have some plain English explanations, and many just mention the math and go on to explain things in a way you can understand without a math degree.

If you read nothing else, read this one!

Bob Jenkins' primer on PRNG design

This is another explanation of randomness and PRNGs, and some discussion of test suites.

A Primer On Randomness

Melissa O'Neill's PCG site. Watch her video lecture too.

Melissa O'Neill's PCG64 site

Especially read the "Too Big to Fail" article. This applies to my PRNG. Then read all the others.

Melissa O'Neill's blog on PRNGs and PRNG testing.

Links to many PRNGs, a great explanation of the different types, and a comparison using Practrand to judge suitability for use in games and simulations. Links to source code and mathematical analysis papers. Especially read the link to the Romu family of PRNGs.

PRNG battle: many RNGs tested

Chris Wellons wrote a program to generate and test random hash functions. Then he refined it until it produced pretty good generators, and with more tweaking got it to produce genuinely good PRNGs that pass the tests.

Chris Wellons Prospecting for Hash Functions

Explanation of hash functions, which are basically the same thing as PRNGs except that all the input is compressed down to a fixed, small number of bytes.

Bret Mulvey's hash function article

How to break the RSA algorithm with a weak-key attack.

About weak key attack on RSA algorithm



I needed a Pseudo Random Number Generator (PRNG) for a computer game program I was writing on the Commodore 64 way back in the mid 1980's. The system PRNG was crappy, and I found a program in "The Transactor" magazine for an LFSR type PRNG. "The Transactor" was all about assembly language, and the LFSR was, as I recall, a simple 51 byte Fibonacci LFSR using an order 51 irreducible primitive polynomial modulo 2. (From now on I'll just call them primitive polynomials, but I always mean irreducible and modulo 2 as well.)

In implementation it was really 8 parallel binary LFSRs whose combined output is an 8 bit byte. That solved my immediate problem, but I was intrigued by how it worked. I had never seen the concept before, but it's so simple you can do it on a piece of paper. It just takes a long time by hand, and this is exactly the type of computation that computers are best at, so it works out well. Over the years, I discovered more information about LFSRs from cryptography and computer books, beginning with Kahn's early classic "The Codebreakers" and Knuth's "The Art of Computer Programming, Volume 2: Seminumerical Algorithms," and eventually culminating with Bruce Schneier's seminal book, "Applied Cryptography."

Eventually I thought I would write a cryptography program, which I did, using an LFSR. I didn't know how bad the numbers were at the time, but the program encrypted and decrypted using a very complicated set of transformations driven by the PRNG numbers.

I realized over the years that the PRNGs I had been using were woefully inadequate. I was unable to write any testing software myself, but I found out about Dieharder, a suite of programs that run various statistical tests on random numbers to find out if they really are random.

There are many sophisticated mathematical tests, and also some you can understand. For example, simply count the number of 0 and 1 bits in each position of the output numbers, over millions of pseudo random numbers generated. We know that if they truly are random, each bit position will get the same number of 0 and 1 bits over the long run. Statistically it's like flipping a coin over and over: you should get equal numbers of heads and tails. If you don't, then it's not a fair coin, and not truly random. We test dice the same way, except the odds are 1/6 instead of 1/2.

Some tests are as simple as using the PRNG to generate poker hands, and comparing the generated hands to the well known mathematical probabilities for the different types of poker hands. Anything along those lines is called a “Monte Carlo Simulation.” You just use the test PRNG to simulate some kind of gambling game like poker, roulette, craps, or whatever, and compare the results to the actual probabilities of the game. If you get way too many 4 of a kind hands, or never ever get snake eyes, then you know the PRNG is bad.

PRNG testing can't prove the PRNG is generating unbiased numbers, but if the PRNG does not generate unbiased numbers, the tests will be able to detect it by finding results outside the expected range. But since random means anything can happen, there will occasionally be some sequences of numbers that are biased for a (relatively) small amount of output, and then the bias can change and become a different bias, and eventually, after enough numbers, it gets back to the average. This is called “regression to the mean,” and it means that PRNGs, and even truly random number generators such as those based on radioactive decay or atmospheric noise, are allowed to deviate from the statistical averages a little bit, as long as they eventually regress back to the mean within the statistical probabilities.

It turns out, in real life testing, that if it deviates from the average too much, and stays deviated for a long time, that's not good. What tends to happen is that the deviation either stays small and never strays too far from normal, or, once it strays much beyond statistical chance, it tends to stray further and further from the mean, until eventually the bias is so great it cannot be ignored. This gives you a statistical cutoff, and you can make it as rigorous or as lenient as you want.

The test generates a result and calculates the probability of that result. That probability is called the P-value. So a P-value is the probability that a theoretically perfect, truly random number generator would produce those same results. P-values are a distillation of the test results down to a single number between 0 and 1. Any P-value at the extremes is bad: .999 is just as bad as .001, and .999999 is just as bad as .000001. What you are looking for with P-values is a uniform distribution. When you run the test over and over, the P-values for each run should be all over the place, but never extremely close to 1 or extremely close to 0.

In PRNG testing, any P-value with less than a .0001 chance of occurring is considered borderline, but good PRNGs can and do recover from that after outputting enough numbers to regress back to the mean. A result with a P-value less than .00001 is still not provably failed, but at some point you have to say "this is just not going to happen by random chance, at least not in my lifetime." That cutoff varies, but .0000001 or thereabouts is often used.

A one in a million chance can be confirmed by generating ten million more results to see if it happens about ten times. But when it's one in a hundred billion billion, you can't just generate ten times that many more numbers to see if it happens again, because there isn't that much time left in the universe. So it's always possible that the PRNG really is good, even with a very low (or very high) P-value, but we just didn't test enough numbers to know for sure. That's why mathematical analysis is important. It can fill in the empirical gaps with true or false mathematical propositions.

Here is an example of a PRNG successfully recovering from a very bad P-value. This is rare. It's the only example I've seen of regression to the mean after such an extreme P-value. 1.3e-6 = 0.0000013, or about 13 times in ten million test runs. The detected bias was rather extreme, but only for a while, until things evened out with more numbers.

 rng=RNG_stdin64, seed=unknown 
length= 64 gigabytes (2^36 bytes), time= 1313 seconds 
  no anomalies in 340 test result(s) 

rng=RNG_stdin64, seed=unknown 
length= 128 gigabytes (2^37 bytes), time= 2595 seconds 
  no anomalies in 355 test result(s) 

rng=RNG_stdin64, seed=unknown 
length= 256 gigabytes (2^38 bytes), time= 5139 seconds 
  no anomalies in 369 test result(s) 

rng=RNG_stdin64, seed=unknown 
length= 512 gigabytes (2^39 bytes), time= 10037 seconds 
  Test Name                         Raw       Processed     Evaluation 
  BCFN(2+0,13-0,T)                  R=  +9.1  p =  2.2e-4   unusual          
  BCFN(2+1,13-0,T)                  R=  +8.6  p =  3.9e-4   unusual          
  ...and 381 test result(s) without anomalies 

rng=RNG_stdin64, seed=unknown 
length= 1 terabyte (2^40 bytes), time= 19043 seconds 
  Test Name                         Raw       Processed     Evaluation 
  BCFN(2+0,13-0,T)                  R= +13.2  p =  1.3e-6   suspicious       
  ...and 396 test result(s) without anomalies 

rng=RNG_stdin64, seed=unknown 
length= 2 terabytes (2^41 bytes), time= 37388 seconds 
  no anomalies in 409 test result(s) 

rng=RNG_stdin64, seed=unknown 
length= 4 terabytes (2^42 bytes), time= 81077 seconds 
  no anomalies in 422 test result(s) 

The suspicious P-value is borderline: 1.3e-6 = .0000013. That means a truly random generator would only produce this result about 13 times in ten million runs of that particular test. I should never ever see a P-value that low again using the same test on the same PRNG. Here's what BCFN means; mine applies to

        BCFN(2+0,13-0,T)
        BCFN(2+1,13-0,T)

BCFN - checks for long range linear correlations (bit counting);
		in practice this often detects Fibonacci style RNGs that 
		rely upon large lags to defeat other statistical tests.  
		Two integer parameters are used - the first is the minimum 
		"level" it checks for bias at (it checks all higher levels 
		that it has enough data for), higher values are faster but 
		can miss shorter range correlations.  The recommended 
		minimum level is 2, as that helps it skip the slowest parts 
		and avoids redundancy with DC6 checking for the shortest 
		range linear correlations, while still doing a reasonable 
		amount of work for how much memory it has to scan.  
		The second integer parameter helps determine the amount 
		of memory and cache it will use in testing.  It is the 
		log-base-2 of the size of the tables it uses internally.  
		Recommended values are 10 to 15, larger values if cache is 
		large, lower values if cache is small.  
		Each individual "level" of this is a frequency test on 
		overlapping sets of hamming weights.  

This test detected an LFSR, and it's because my dispersion is not fast. This can happen if you inadvertently use a part of the state twice, giving it more weight than other parts. I've seen this from things like a nonfatal off by one error, using an allocated but uninitialized value, or some other mundane thing, but in this case it was just part of a much larger pseudo random sequence.

How I started developing the algorithm and learned to test using Practrand, GJrand, and TestU01

Briefly, I explored Fibonacci type LFSRs because that's what I can understand. It's a simple concept. Take a set of bytes, find a primitive polynomial, modulo 2, and use the exponents of the polynomial as the tap positions of the LFSR bytes. So each byte of the LFSR is numbered, 0 to n, and we select 4 of those positions to be taps. The positions must correspond to the exponents of a primitive polynomial, modulo 2. I don't know how to calculate those, it's complicated and I don't want to program it, but there are tables online that you can download, and Bruce Schneier's "Applied Cryptography" has a table.

For a length 51 LFSR I know of two choices for taps: {51, 15, 24, 46} and {51, 39, 25, 12}.

The number 51 is the length, and in this table it is always replaced with 0, so the very first tap is always 0. The remaining taps are the numbers given, and each tap is an index into the array of bytes we are using as the LFSR. The 51 bytes of the LFSR are called the "state," and they are the numbers that we apply math and logic operations to in order to generate pseudo random numbers.

The taps indicate which of the bytes to use to generate the next number. For each number generated, the taps are incremented by one, to advance to the next position in the array of bytes. When a tap increments past the last element at index 50, we reset it to 0, to start over at the beginning of the LFSR. Since each tap is unique, they will never reference the same byte of the LFSR at the same time. The taps are always incremented in lock step for every number generated.

And finally, the first tap is special. The first tap's LFSR element is updated with the value of the operations on all four taps. Whatever new number that produces is stored back into the LFSR at the tap1 position. This updates one element of the LFSR on every call. So as we call it to generate numbers, the tap1 position increments, gets assigned the value of the calculation, and updates each element of the LFSR as it cycles through them forever.

Eventually the output numbers will repeat. The count of numbers generated before it repeats is called the "period." Fibonacci LFSRs built with primitive polynomials, modulo 2, and using only one of XOR, plus, or minus, always have a period of 2^n - 1, where n is the length of the LFSR. That's 2 raised to the power of the length of the LFSR, minus 1. For the 51 element LFSR the period is 2^51 - 1 = 2251799813685247. That's how many numbers it will generate before it repeats the same sequence of numbers over again.

Unfortunately it will fail any randomness testing long before that. The numbers are not very random. To illustrate why a long period is meaningless, you can easily create a number generator with as long a period as you like simply by starting with 0 and incrementing by 1, forever. It will never repeat the same sequence of numbers, but it's obviously as non-random as you can get.

So you can then do some transformation on the numbers, and as long as the transformations map uniquely onto the same domain of numbers as the period, then you have a start. But doing something as simple as multiplying each number by some prime number and dividing by some other prime number, or something like that, won't work either. The numbers will look random to the human eye, but they will fail statistical testing as easily as the simple incrementing sequence.

So then you decide to not just start with 0 and count by 1 from there, but to seed the PRNG with a randomly chosen starting number. That's better, but you still have the same underlying problem of bias. So then you have to start looking at the transformations. You are more likely to give up and research known working methods of generating random numbers than to discover a new mathematical principle.

With Fibonacci LFSRs the mathematics is well understood, and they can be tweaked to produce semi-reasonable pseudo random numbers, but only good enough for inconsequential things that don't really matter that much, like selecting a random tip of the day. Anything more serious needs better numbers.

Eventually I stumbled across the DIEHARDER PRNG testing program. Not the original DIEHARD by Marsaglia, but the improved open source version by Robert Brown. I didn't know that DIEHARDER was faulty, other than the warnings from Robert Brown, so I used it to test my programs, and just ignored the tests that were suspected of being faulty by Robert Brown himself. DIEHARDER does work well for distinguishing garbage output from random, but it has many tests that produce false positives and tests that don't produce true positives. I didn't know that at the time, but I did compare my DIEHARDER output with output from DIEHARDER's builtin PRNGs and the pcg64 PRNG written by Melissa O'Neill.

I could get my LFSR based PRNGs to perform about as well statistically as pcg64, but never as fast as pcg64. But the only measure I had was DIEHARDER. Then after a year or two I started using Practrand and TestU01, and after that I got Gjrand, another test suite. It turns out that, in the PRNG community, these are the gold standard testing programs.

TestU01 is not as good as Practrand and Gjrand, but TestU01 is written by Pierre L'Ecuyer, a university mathematician, and the program is academically researched, analyzed, and published. This makes it the only PRNG test suite that is mathematically verified and confirmed to be correct. You can trust its results. The problem is that it does not do many tests, does not use many random numbers, and only examines the first 31 of each 32 bits of output. It simply ignores every 32nd bit. You have to get around that by testing the exact same sequence again, but with each 32 bits shifted 1 bit to the left (in C). Many of the DIEHARDER tests are also academically verified, but the implementation is suspect and the tests are not easily scalable. I modified my copy to exclude the useless tests, and it's very customizable from the command line.

Gjrand and Practrand are not academically analyzed, but they are widely considered to be better at detecting bias than TestU01, and they can take as many input numbers as memory allows them to process. Depending on test configurations they can run for months and consume hundreds or thousands of terabytes of random numbers. By contrast, DIEHARDER and TestU01 both use a fixed amount of input, and other than that, you just run them multiple times.

In the last few years I've been using mostly Practrand and Gjrand for quick testing, and I only use Crush and BigCrush to academically confirm Practrand and Gjrand. Gjrand is good because you can run a specific test. Well, I can, because I modified the source to use temp files in /tmp instead of a single file named “report.txt.” This allows me to run mcp or any of the individual Gjrand tests concurrently, as many times as my CPUs and memory can support. So if Gjrand fails a certain test, I fix and then run that test again until I can pass it, then run the test with ten times the amount of input, and just keep fixing and testing until I confirm that I have eradicated the correlation.

I have never used Gjrand to ten terabytes yet, but that's coming, for completeness, and eventually I plan to compare ecbs64 output to HC-256, pcg64, jsf64, Trivium, ChaCha, and Salsa. These tests take a long, long time, so this will take at least a year unless I can get a Ryzen Threadripper and run 64 tests at a time.

Anyway, any cryptography expert will tell you that anything written by an amateur is snake oil, garbage, doodoocahcah, and you are sternly advised to stay away from such things. Well, if you read on, you might find some interesting concepts here.

That's the background of my preparation for this algorithm I call ecbs64, where it stands for Eric Canzler's Big Secure, or Big Slow, being a play on Bob Jenkins' JSF64 (Small Fast). But you can use some other mnemonic if you want.

The algorithm consists of state, operations, and semi-constants. The semi-constants are variables that are initialized to some random value but never changed afterward. There are no true constants or magic numbers.

Design goals.

          1. statistically random output
          2. unpredictable output
          3. unpredictable preput
          4. resistance to cryptanalysis
      

I typically use the caret symbol, "^" for exponentiation and percent symbol "%" for modulo.

This is a linear feedback shift register using XOR, multiplication, addition, and a variable rotate. This creates a multicycle generator with no fixed period. Adding a counter fixes that problem. What the counter does is essentially switch to a new cycle on every call. If it hits another short cycle, that's OK, it's only going to use it one time and then change to another cycle. It may change to the same cycle again, but eventually it will break out of the short cycle. Additionally, we swap the taps around on every call, so you are guaranteed to break out of a short cycle sooner or later because of the adders and tap swaps.

Every element of the ecbs family of generators can be initialized to any possible state, except the taps. The taps are restricted to be unique integers in the range 0 to (lfsralen-1). Other than that restriction on tap values, no other value of the LFSR makes any difference to the operation; they are all equally valid. There are no weak keys. Even the raw cycle with period 1 is totally valid, because we are guaranteed to break out of it on the next pass.

LEMMA 1 Any LFSR can be made to have an arbitrarily long minimum period by adding a sufficiently large counter, provided the following conditions are met.

        1. The counter must alter the state.
        2. The state must NOT alter the counter.
        3. The counter is best implemented with a step of one.
           Any other step value must be chosen carefully or the
           amount of period increase may be minimal. 
      

LEMMA 2 An LFSR that produces uniformly distributed pseudo random output, and uses a counter to extend its minimum guaranteed period, will continue to produce uniformly distributed output.

When you XOR the counter, it spreads out the bit changes immediately, without the lag of plus or minus, which introduce a bias towards 0 or 1 bits at the ends of the cycle and a smaller cyclic bias throughout the range. Using XOR is not as good as entropy, since it's still deterministic, but it's the next best thing. Over the long run, counters do no harm, although they might not help anything except the period. This is just to say that if the output is unpredictable and indistinguishable from true random, then mixing anything with it will still be indistinguishable from true random. This is why ciphers work.

LEMMA 3 Changing the taps of a fixed tap LFSR is equivalent to changing the operations of the LFSR.

Consider

    lfsra[tap1] = lfsra[tapa2] + (lfsra[tapa3] * (lfsra[tapa4]|1))

If I swap the values of tapa2 and tapa3 it is the same as not swapping but using

    lfsra[tap1] = lfsra[tapa3] + (lfsra[tapa2] * (lfsra[tapa4]|1))

So, if I swap my taps on every call, it's the same as having many different fixed tap LFSRs. Since I have 8 taps to swap among, I have the equivalent of 8! = 40320 different fixed tap LFSRs built in. To swap, I pick a tap and swap values with its neighbor. This is essentially a "bubble" shuffle.

There is quite a bit of explanation in the source code available in the link below. I'm still trying to decide how to finalize the code for testing. All the tests I've run on subcomponents pass Practrand to 32 terabytes. So far.


Explanation and detailed description of the algorithms and why they are as they are.

How to use testecbs8 and testecbs64.

How ecbscryptogen works

     I have some files available for download, source code only.

     testecbs8.c     - Source code for the smallest version.

     testecbs64.c    - Source code for the largest version.

     ecbscrypto.c    - Source code for a standalone cipher program.
                       encrypts or decrypts, reads STDIN and writes STDOUT.

     ecbscryptogen.c - Source code for a "standalone, data dependent,
                       cipher program" generator program. This program
                       creates a new program and executes it to encrypt
                       or decrypt. Very mysterious, and harder than $%!^
                       to program and debug.

View or Download testecbs64.c, testecbs8.c, ecbscrypto.c, ecbscryptogen.c and some minor utilities.

Back to main index page.