This talks about 'bad' seeding quite a lot, but what counts as bad or good really depends on what you need. Sometimes you need a reproducible program, so you have to write the random number generator yourself and/or otherwise pin down the algorithm. Then you can use '5' as the seed, which is quite often good enough for simulations. Sometimes you want cryptographic randomness. Then you need to find, somewhere, a seed of true randomness that is not guessable. In a computer game where a level needs some random elements, just seeding the random number generator with the current time might be fine. And so on.
I don't think it disputes that sometimes using a fixed small number is a good seed for reproducibility. The point is just, if you do happen to be in the other camp, where you actually want to populate the full state of the RNG using hardware randomness, C++ does not give you an obvious way to do that.
Is it possible to initialize a prng in C++’s std correctly?
My standard code for doing this looks like this:
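(The snippet itself is missing at this point in the thread. The following is only a sketch consistent with the description below, assuming 624 32-bit words are drawn from std::random_device and fed through std::seed_seq; the helper name make_seeded_mt19937 is made up for illustration.)

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <functional>
#include <random>

// Hypothetical helper: returns an mt19937 seeded from 624 words of
// std::random_device output (624 * 32 = 19968 bits).
inline std::mt19937 make_seeded_mt19937() {
    std::random_device rd;
    std::array<std::uint32_t, 624> seed_data;
    // std::ref avoids copying rd, which is not copyable.
    std::generate(seed_data.begin(), seed_data.end(), std::ref(rd));
    std::seed_seq seq(seed_data.begin(), seed_data.end());
    return std::mt19937(seq);
}
```

Changing 624 to a smaller word count is the adjustment the comment describes for drawing less entropy.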
This generates a seed_seq with 19968 bits of random data, which is enough for the 19937 bits of Mersenne Twister internal state.
Note that 19968 bits of random data is overkill; something like 256 or 128 bits would probably be enough for practical purposes. But I believe there is no real need to limit the amount of data extracted from a random source. Modern operating systems are pretty good at generating large amounts of random data quickly. But if this is a concern, just change 624 to 4/8/16/32 for 128/256/512/1024 bits of entropy. In practice, I don't think you'll notice a difference either in randomness or initialization speed.
edit: also, if performance is a concern, consider changing mt19937 to mt19937_64, which is the 64-bit variant of mt19937 that is incompatible (generates different numbers) but is almost twice as fast on 64-bit platforms (i.e. most platforms today).
If you're aware of, or concerned about, seeding, you probably aren't using the C++ std prng (mt19937) anyway -- other prngs have desirable properties like vastly smaller state, better performance, or cryptographic security.
No one uses the <random> header as it's cursed and the usual cult of backwards compatibility ensures it'll stay that way.
There are several high quality alternatives that people use.
How did it get into the standard then?
It was better than the bad, C interface LCG rand(), I guess. (There are LCG parameters that make for ~objectively better PRNGs than MT, but rand()'s parameters aren't great and its state is too small.)
You need to understand PRNGs to answer that question. It is complicated and has nothing to do with the C++ language itself.
Here is cppreference on PRNGs (note the various engines available) - https://www.cppreference.com/w/cpp/numeric/random.html You have to "know" how to combine the various options available to get an optimal sequence.
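As a rough illustration of how the pieces in <random> fit together, the sketch below feeds three of the predefined engines into the same distribution. The helper names roll_d6 and sum_of_three_rolls are made up for this example, and it says nothing about which engine is the better choice.

```cpp
#include <random>

// Hypothetical helper: draw one die roll from any standard-conforming engine.
template <class Engine>
int roll_d6(Engine& engine) {
    std::uniform_int_distribution<int> dist(1, 6);
    return dist(engine);
}

// The same call works with an LCG, the Mersenne Twister and RANLUX,
// because the distribution only needs a uniform random bit generator.
inline int sum_of_three_rolls(unsigned seed) {
    std::minstd_rand lcg(seed);
    std::mt19937 mt(seed);
    std::ranlux24 ranlux(seed);
    return roll_d6(lcg) + roll_d6(mt) + roll_d6(ranlux);
}
```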
The Mersenne Twister (MT) was one of the best engines and was the default in many other languages/packages too. See the "Applications" section on Wikipedia - https://en.wikipedia.org/wiki/Mersenne_Twister
The author identified distribution problems with the 32-bit versions of MT (I am not sure whether similar problems exist with its 64-bit versions) and proposed a different one named "Permuted Congruential Generator (PCG)", which has now been adopted as the default by many of the languages/packages - https://en.wikipedia.org/wiki/Permuted_congruential_generato...
As you can now appreciate, the subject is mathematically complicated, and the defaults chosen by the language/package implementer become the "most commonly used" and hence the reference case. While this is good enough for most "normal" applications, if you are doing any special simulations (Monte Carlo or otherwise) and/or specific Numerical Computations, it is your responsibility to understand what it is that you need and program accordingly, using the various options (if available) or rolling your own.
> (I am not sure whether similar problems exist with its 64-bit versions) and proposed a different one named "Permuted Congruential Generator (PCG)"
The 64-bit version might be a bit faster (for certain workloads, on 64-bit hardware) than the 32-bit version, but still wastes the same space and has the same mathematical flaws.
PCG is still not perfect (128-bit math hurts, though the new DXSM variant at least reduces that to 128x64), but its mathematical properties are nicer than the xor* family (its main competitor), and both families are miles ahead of any other RNG out there.
Understanding the non-trivial statistical properties (even at a simple conceptual level) is what is of paramount importance. PRNG design is one of the most difficult subjects in Numerical Computation and has nothing whatever to do with any language/package/library etc.
From https://en.wikipedia.org/wiki/Pseudorandom_number_generator
> Even today, caution is sometimes required, as illustrated by the following warning in the International Encyclopedia of Statistical Science (2010).
> The list of widely used generators that should be discarded is much longer [than the list of good generators]. Do not trust blindly the software vendors. Check the default RNG of your favorite software and be ready to replace it if needed. This last recommendation has been made over and over again over the past 40 years. Perhaps amazingly, it remains as relevant today as it was 40 years ago.
I've found it easier to write my own PRNG than to use the std one. Code using the std PRNG ends up about as buggy as my own implementation, so the trade-off is reasonable. I usually don't need cryptographic strength, so xorshift128+ is sufficient.
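For readers who haven't seen it, a hand-rolled xorshift128+ can be quite small. This is a minimal sketch using the commonly published shift constants (23, 18, 5), not the commenter's actual code; the struct name and the two-word constructor are assumptions made for illustration.

```cpp
#include <cstdint>
#include <limits>

// Sketch of xorshift128+ with the commonly published shifts (23, 18, 5).
// The struct name is hypothetical. Not cryptographically secure.
struct Xorshift128Plus {
    using result_type = std::uint64_t;

    std::uint64_t s[2];

    // The two state words must not both be zero, otherwise every output is 0.
    Xorshift128Plus(std::uint64_t a, std::uint64_t b) : s{a, b} {}

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() {
        return std::numeric_limits<result_type>::max();
    }

    result_type operator()() {
        std::uint64_t s1 = s[0];
        const std::uint64_t s0 = s[1];
        const std::uint64_t result = s0 + s1;  // output is the sum of both words
        s[0] = s0;
        s1 ^= s1 << 23;
        s[1] = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5);
        return result;
    }
};
```

Because it exposes result_type, min(), max() and operator(), a type like this can be passed to the distributions in <random> in place of a standard engine.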
Sadly, people that don't know better use std::mt19937 all the time :-(.
And the only reason is its cool name. Humans are weird
I used it because it was recommended by Stephan T. Lavavej, maintainer of Visual Studio's C++ Standard Library, in his "rand() Considered Harmful" talk, back when <random> was introduced. See 11m30s. https://youtu.be/LDPMpc-ENqY?t=10m50s
Well, it's in std. So there's an appeal to authority (the C++ language authors should be smart, right?) and convenience.
There are still others in <random>. Yet they always use the cool twister one with the funny numbers
What alternative within <random> would you recommend and why?
Found one more source of cross-platform flakiness: seeding mt19937 the same way on Linux and Windows, with the same compiler and the same code, still gives different results. The problem is that std::random_device (or the libc internals behind it) differs under the hood. Some platforms back random_device with true hardware entropy, others fake it or seed from different system sources, so the seed you get isn't stable across platforms. That means mt19937 starts from different states and produces different random sequences.
It's not a bug in mt19937 itself; it's that random_device (or libc randomness) works differently across environments. That makes cross-platform tests flaky even when the logic is rock solid.
>>
#include <random>
std::random_device rd;                         // entropy source is implementation-defined
std::mt19937 gen(rd());                        // seed is a single value taken from rd
std::uniform_int_distribution<> dist(1, 100);  // inclusive range [1, 100]
int random_number = dist(gen);                 // differs on Linux vs Windows despite identical code
Why not just run your tests with a fixed seed? e.g.
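A minimal sketch of that idea, assuming the test only needs the raw engine output (the helper name nth_mt19937_output is made up). The C++ standard fixes the output sequence of mt19937 for a given seed, so this part is the same on every conforming platform. The distributions, such as std::uniform_int_distribution, are not specified that tightly and may still give different numbers on different standard libraries.

```cpp
#include <cstdint>
#include <random>

// Hypothetical test helper: returns the n-th raw output (1-based) of an
// mt19937 seeded with a fixed value, so every run sees the same sequence.
inline std::uint32_t nth_mt19937_output(std::uint32_t seed,
                                        unsigned long long n) {
    std::mt19937 gen(seed);  // fixed seed instead of std::random_device
    gen.discard(n - 1);      // skip the first n - 1 outputs
    return static_cast<std::uint32_t>(gen());
}
```

With seed 5489 (the engine's default seed), the 10000th output is 4123659995, which is the check value the standard itself requires.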
Relevant: An old Reddit discussion by the same author - https://old.reddit.com/r/cpp/comments/32u4m7/the_behavior_of...