Right, but are you modeling probabilistic events?

Confidence intervals are useful when we have a definite hypothesis and definite test results. They allow us to represent things like our level of confidence that our measurements are accurate, and how confident we are that our results point to an actual phenomenon and not just statistical noise.

Confidence intervals are also useful when we have definite results from a subset of some larger population and we want to extrapolate from them. We know with complete certainty how people in exit polls voted. Confidence intervals allow us to represent our confidence that these exit polling numbers are an accurate reflection of all the ballots cast.

In your case, it sounds like confidence intervals allow you to represent the confidence with which you can say that a rocket landed where your model said it would because the model is right, and not because of measurement error or expected variability or whatever.

No, they aren't. I think that is precisely the problem here.

There is a very important difference between saying that something has a 20% chance to happen and saying that something will happen 20% of the time.

For the sake of illustration, let's say that we are going to reverse engineer five items, each with a 20% chance to teach us a new schematic. A simple probability table gives us the following percentage chances for each of the six possible outcomes:

0/5 Successfully teach us a new schematic = 32.77%

1/5 Successfully teach us a new schematic = 40.96%

2/5 Successfully teach us a new schematic = 20.48%

3/5 Successfully teach us a new schematic = 5.12%

4/5 Successfully teach us a new schematic = 0.64%

5/5 Successfully teach us a new schematic = 0.032%

We don't have a definite hypothesis because our model is probabilistic. Our chance of learning exactly one new schematic from five reverse engineering attempts (a perfect 20% success rate) is less than half. It is the most probable of the six outcomes - it should occur with greater frequency than any other individual outcome - but it is substantially less probable than all the other outcomes put together.
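The table above follows directly from the binomial distribution; here is a minimal sketch in Python, assuming the stated 20% chance per attempt:

```python
from math import comb

# Binomial model: five reverse engineering attempts, each with an
# assumed 20% chance of teaching a new schematic.
n, p = 5, 0.20
probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k, prob in enumerate(probs):
    print(f"{k}/{n} successes: {prob:.2%}")
```

Note that the probability of exactly one success (40.96%) against everything else (59.04%) is what gives the roughly 3:2 betting odds discussed next.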

If your friend said to you "I am going to reverse engineer five items. I bet you I learn exactly one new schematic, no more and no less," you would be smart to bet against him. Your odds of winning are roughly 3:2: if you and your friend made the same bet over and over and over, you ought to win about one and a half times as often as you lose.

It is not impossible to make an argument about probability by way of confidence intervals, but I think it is kind of a clunky way to do it.

Common sense tells us that if we perform four trials, and in each of the four trials we learn five new schematics from five reverse engineering attempts (a result we would expect to occur with a frequency of about 1 in 3,000 per trial, since 0.2^5 = 1/3125), we can have a high degree of confidence in our inference that something is probably biasing the results.

Things get a lot trickier when the results are less extreme though. How many times do you have to flip heads before you conclude that your coin is not working correctly?
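The coin question has a concrete answer once you pick a cutoff for "too unlikely to be chance." A sketch, using the 0.3% threshold that corresponds to three standard deviations on a normal distribution:

```python
# How many consecutive heads before a fair coin becomes suspect?
# A run of heads is "extreme" once its probability under a fair
# coin drops below 0.3% (the tail area outside three standard
# deviations of a normal distribution).
threshold = 0.003
flips = 1
while 0.5**flips >= threshold:
    flips += 1

print(flips)  # first run length less likely than the threshold
```

With this cutoff, nine consecutive heads (probability 1/512, about 0.2%) is the first run a fair coin would be unlikely to produce; a stricter or looser cutoff moves that number.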

Is a 13.4% success rate over 400+ trials evidence enough to conclude that the system is not working as it is supposed to? The answer to that question depends entirely on how much variance normally exists among measurements like this. The 99.7% rule says that 99.7% of all values in a normal distribution fall within three standard deviations of the mean. In other words, any data point which is more than three standard deviations from the mean is an extreme outlier and extremely unlikely to occur by mere chance. But to say that your results are three standard deviations from the mean, you need to know what the standard deviation for tests like this is. I have no idea what that would be, but my intuition suggests that 13.4% is probably within three of them.

This is admittedly outside my area of expertise, but if you do not know how much statistical variance normally exists across trials of that size I do not think it is even possible to make a meaningful claim about the significance of your results using a confidence interval calculation.

Look, I don't understand what your problem here is, so I will try to make it clear. You are trying to define the probability of an outcome across a number of events that are related to one another. If we defined it that way, then one type of situation would indeed have a different probability of occurring than another; that is true.

The point you're missing is that I am looking at each event: either I get a plan or I don't. This is called a binomial sequence or process. In a binomial process the observed rate of success should converge to the stated probability of each individual event, in this case 20 percent. Now, since we can only do so many tests, we need a way to check whether the test set matches the stated probability of an individual event. To do this you compute the confidence interval for a given sample size. (The interval changes based on the number of tests, just as you would expect.) For the number of tests that I have done, the rate of success is outside of the confidence interval. I know this is true. I know that being outside this interval means that the 20 percent rate Bioware is telling us is WRONG, with only a 0.3 percent chance that results like mine would fall outside the interval if the rate really were 20 percent. So this post was meant to do one of two things:
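For anyone who wants to reproduce this kind of check: here is a sketch using the normal approximation to the binomial. The exact trial count isn't given in this thread, so n = 400 and the 13.4% observed rate mentioned earlier are used as illustrative numbers:

```python
from math import sqrt

# Check whether an observed success rate is consistent with a stated
# 20% rate, using a normal approximation to the binomial distribution.
# n = 400 and observed = 0.134 are illustrative numbers from this thread.
p0 = 0.20        # the stated per-event probability
n = 400          # number of reverse engineering attempts (assumed)
observed = 0.134 # observed success rate

# 99.7% confidence interval: three standard errors around p0
se = sqrt(p0 * (1 - p0) / n)
low, high = p0 - 3 * se, p0 + 3 * se

print(f"99.7% interval for n={n}: [{low:.3f}, {high:.3f}]")
print("inside" if low <= observed <= high else "outside")
```

With these assumed numbers the interval is [0.140, 0.260] and the observed rate falls just outside it; a different trial count would widen or narrow the interval, which is why the exact sample size matters.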

1) Someone else generates their own test set and sees whether they get the same results. If they do, that is just more evidence that number two needs to happen.

2) The developers have said they like it when we back up problem reports with data when we can. I am telling them that the math says there is a problem.

I don't understand what you think, but I know what I have done and what it means. I also know how easy it is for Bioware to not have tested this enough. In fact, if you look at my sample you will see that I hit a 20 percent rate at one point and I thought that life was good, but that rate didn't hold and things have dropped off. The link should take you to a plot I have made.

http://www.flickr.com/photos/1043952...in/photostream
In fact, if you look at the plot you will see that the success rate hit 20 percent at one point. The red is the confidence interval and the blue is the rate at which I got a plan from reverse engineering my greens.