Actually, the confidence interval calculation is what I use at work to validate math models against test results for the products we build. At my job, if a test data point falls outside the 90 percent confidence interval, we say it fails to validate the model. In this case the programmers are telling us 20 percent is the outcome we should see. As for how it is computed, I used the same method we use at work; it also agrees with my college text, and I see similar things on Wikipedia as well.

As for the people who are complaining that this is just an RNG "thing": the point of a confidence interval test is to define a band of results you can expect to see from a set of sample tests that are all independent of one another.
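
For anyone who wants to see what such a band looks like, here is a rough sketch in Python. It assumes the stated 20 percent rate, a hypothetical run of 100 independent trials (that count is only for illustration, not anyone's actual data), and the usual normal approximation to the binomial with z = 1.645 for a two-sided 90 percent band. The function name `expected_band` is made up for this example.

```python
import math

def expected_band(p, n, z=1.645):
    """Approximate two-sided band for the number of successes in n
    independent trials with success probability p (normal approximation)."""
    mean = n * p                         # expected number of successes
    sd = math.sqrt(n * p * (1 - p))      # binomial standard deviation
    return mean - z * sd, mean + z * sd

# Hypothetical example: 20 percent stated rate, 100 trials (assumed numbers)
low, high = expected_band(0.20, 100)
print(f"{low:.1f} to {high:.1f} successes out of 100")
# prints: 13.4 to 26.6 successes out of 100
```

Under those assumptions, a count outside roughly 13 to 27 would be the kind of result that "fails to validate" the 20 percent figure. The normal approximation gets shaky when n is small, so treat this as a ballpark.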
Your sample size is too small.

If you've ever applied math to gambling with dice or roulette and tried to calculate optimal gambling strategies (I have), you should know that you need a much larger sample size to have any confidence in your results.
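
To put rough numbers on the sample size point, here is a small Python sketch. It assumes a true rate of 20 percent, independent trials, and the normal approximation with z = 1.645 (a 90 percent band); the sample sizes and the name `half_width` are just picked for illustration.

```python
import math

def half_width(p, n, z=1.645):
    """Approximate half-width of the band around an observed rate
    after n independent trials with true success probability p."""
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes only (assumed, not taken from anyone's data)
for n in (20, 100, 1000, 10000):
    print(f"n = {n:>5}: 20% plus or minus {100 * half_width(0.20, n):.1f} points")
```

The band shrinks with the square root of n, so cutting the margin in half takes about four times as many trials.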

Applying standard deviation to "the products you build" is NOT the same as applying standard deviation for gambling results.