quickNir
03.04.2013 , 12:45 PM | #102
First off, to anybody reading through this thread who doesn't have a strong background in probability and statistics: please do not assume anything written in this thread is correct. Some (a minority) of the people posting here know what they're talking about (for instance, the post just above mine), but there's a lot of junk here. So your best bet is not to trust anything here, and to read about this stuff somewhere else if it interests you.

The testing technique in the original post is legitimate. I've seen tons of posts claiming that the sample size just isn't large enough, without actually responding to the numbers the OP put up. Most of those posts are simply incorrect. There is no fundamental sample size that you need. To find a discrepancy between a claimed hypothesis and reality, the size of the sample required depends on the size of the discrepancy and how certain you want to be. Sometimes 100 samples is enough.

Suppose I have a coin that comes up heads 100% of the time. How long will it take you to determine my coin is rigged? Not very long. After it comes up heads even just 20 times in a row, you will be very suspicious. After 50 times it's virtually a certainty. If, on the other hand, I have a coin that comes up heads 51% of the time, it will take a very large number of samples to prove anything.
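To put rough numbers on the coin example, here is a small Python sketch. The function names are made up for illustration, and the sample-size estimate is only a back-of-the-envelope normal approximation (how many flips until the expected gap is about two standard errors), not a proper power calculation.

```python
import math

def prob_all_heads(n, p_heads=0.5):
    """Chance that a coin with heads probability p_heads shows n heads in a row."""
    return p_heads ** n

def rough_flips_needed(p_true, p_null=0.5, z=2.0):
    """Rough number of flips before the expected gap between p_true and p_null
    reaches z standard errors, using the standard deviation under p_null."""
    sd = math.sqrt(p_null * (1 - p_null))
    return math.ceil((z * sd / abs(p_true - p_null)) ** 2)

# A fair coin producing a long run of heads gets very unlikely very fast.
print(prob_all_heads(20))         # on the order of 1e-6
print(prob_all_heads(50))         # on the order of 1e-15

# A 51% coin is a different story: roughly ten thousand flips just to
# reach a two-standard-error gap from 50%.
print(rough_flips_needed(0.51))
```

The point is only the contrast in scale: a huge discrepancy shows up in a couple dozen flips, while a one-point discrepancy needs thousands.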

To work out what these sample sizes are, there is no alternative except to crunch the numbers, which the original poster did. You start with the hypothesis that 20% is the probability of RE. You do the experiment. You see how likely it is that you would get the outcome you got, or one more extreme, if the probability really were 20%. If this probability is low, you are justified in discarding the hypothesis that 20% is the true probability.
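The procedure described above can be sketched in a few lines of Python using an exact binomial tail. The counts below (100 attempts, 10 successes) are invented purely for illustration and are not the original poster's data; the function names are my own as well.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def lower_tail(k, n, p):
    """P(X <= k): chance of seeing k or fewer successes."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def upper_tail(k, n, p):
    """P(X >= k): chance of seeing k or more successes."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# Hypothetical experiment: NOT the numbers from the original post.
n_attempts = 100
n_successes = 10
p_claimed = 0.20

# "The outcome you got or one more extreme": here, 10 or fewer successes
# if the true rate really were 20%.
p_value = lower_tail(n_successes, n_attempts, p_claimed)
print(p_value)
```

If the printed probability is small, the claimed 20% looks doubtful on this evidence alone. Use `upper_tail` instead when the observed count is above what the claimed rate predicts.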

Here is what (in my opinion) is missing from this discussion: Bayesian statistics. Some posters are right not to criticize the original methodology itself, and to say instead that it simply doesn't support the conclusions adequately. Why not? The confidence interval seems pretty convincing. The reason is that 20% isn't just another number. It's the number given to us by the game. Since it's really easy to generate heads randomly (pseudo-randomly, technically) at a 20% level, I tend to suspect that Bioware did not screw this up. In other words, I have prior beliefs about the likelihood that 20% is the RE rate, as opposed to other values.

Suppose I flip a coin I find on the street 100 times. I get 75 heads. This is a wildly improbable result for a fair coin. Yet I will not conclude that the coin I found on the street is unfair. Why? Because if you find a random coin on the street, it is many, many, many times more likely to be fair than unfair (note that when I say fair, I mean within a small tolerance of 50%, as real coins are). When you work out the math with a prior that strong, the final conclusion is still that the coin is likely fair, because it is more likely that I found a fair coin and had an unusual sequence of flips than that I found an unfair coin and had a usual sequence of flips.
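Here is a minimal sketch of that calculation in Python, boiled down to just two hypotheses: the coin is fair, or it lands heads 75% of the time. Both the two-hypothesis setup and the prior values are assumptions picked for illustration; a real analysis would spread the prior over a range of possible biases.

```python
from math import comb

def posterior_fair(heads, flips, prior_fair, p_unfair):
    """Posterior probability that the coin is fair, given the flips,
    when the only alternative considered is heads probability p_unfair."""
    ways = comb(flips, heads)
    like_fair = ways * 0.5 ** flips
    like_unfair = ways * p_unfair ** heads * (1 - p_unfair) ** (flips - heads)
    numerator = prior_fair * like_fair
    return numerator / (numerator + (1 - prior_fair) * like_unfair)

# Assumed prior: only 1 street coin in 10 million is biased like this.
strong_prior = posterior_fair(75, 100, prior_fair=1 - 1e-7, p_unfair=0.75)

# Assumed prior: 1 street coin in 1,000 is biased like this.
weak_prior = posterior_fair(75, 100, prior_fair=0.999, p_unfair=0.75)

print(strong_prior)  # still well above 0.9: "fair" remains the better bet
print(weak_prior)    # well below 0.01: the data overwhelm this prior
```

The two runs show how much the answer hinges on the prior: 75 heads in 100 is strong evidence, so "probably still fair" only holds when biased coins are believed to be extremely rare.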

The same applies here. The evidence would be convincing if there were nothing special about 20%. But I have pretty strong prior beliefs about Bioware programmers being able to do something so simple correctly. In other words, despite the evidence, I think it is more likely that Bioware got this right and that your test was a fluke than that Bioware screwed this up and your test is representative. So I will require much, much stronger evidence before I believe that 20% is not the true rate.