Reverse Engineering is not 20%

Darth_Sweets
02.10.2013 , 12:47 PM | #11
Quote: Originally Posted by Khevar View Post
Your sample size is too small.

If you've ever applied math to gambling with dice or roulette, and tried to calculate optimal gambling strategies (I have), you should know that you need a much larger sample size to have any confidence in your results.

Applying standard deviation to "the products you build" is NOT the same as applying standard deviation for gambling results.
Actually, the confidence interval test is valid with almost any sample size, since the tightness of the bounds changes with the number of samples used. As for calculating dice rolls, that is standard stuff done in any statistics class (even in high school). And as for the sample standard deviation: I am not using a sample standard deviation, I am testing against the 20 percent rate the developers are telling us we should see.

Since it seems to be an issue: I am a flight controls engineer with a master's in engineering, and my specialty is Kalman filtering, which is just a fancy statistical tool. I spend my days making math models of IMUs and then validating the models I make against test results for autonomous vehicles, missiles and rockets.

Arlanon
02.10.2013 , 02:27 PM | #12
Reverse Engineering? Crapshoot!

Cartel Boxes? Crapshoot!

Warzone PUGs? Crapshoot!

Pazaak Tables? Nope, can't have those! Gambling would make this a Mature-rated game!

Quellryloth
02.10.2013 , 04:06 PM | #13
Hmm... there doesn't seem to be anything wrong with what you did, but it is possible that you were just very, very unlucky. I am going to RE a bunch of stuff with two of my characters in the next few days. When I get to 200 items RE'ed, I will come back to this thread and report.

Khevar
02.10.2013 , 06:03 PM | #14
Quote: Originally Posted by Darth_Sweets View Post
Actually the confidence interval test is valid with almost any sample size ...
This is incorrect. Sample size affects the confidence interval test.

Going back to gambling, rolling a hard 10 hop is a one in 36 chance.

And yet, it is not hard to sit at a craps table for a few dozen rolls and not see a single hard 10. It's just too small a sample size.

In modeling the statistics of purely random events (e.g. a dice roll), you need a very large sample size before any deviation has real meaning.

Your post title claims "Reverse Engineering is not 20%". That conclusion is WAY too broad for the data you present.

Frostbyt
02.10.2013 , 10:22 PM | #15
Quote: Originally Posted by Shibbstah View Post
This can all be explained by a simple term: RNG. Just because it says 20% doesn't mean you'll RE 5 items and get a blue schematic, and so on. I've gotten the schematic on the first RE. Sometimes I've had to do 15+ to get it. Just unlucky.
Pretty sure the OP knows a lot more about probability and RNG than you.
The Harbinger, Krath Origin
Frost'byte PT • Frost'bite OpMed • Frostbit Sorc • Nomadis Mara • Fróstbite Sniper
Armor'mech CombatMedic • Armsdealer GS • $%^&2 Guard

Kaskali
02.11.2013 , 02:06 AM | #16
Quote: Originally Posted by Darth_Sweets View Post
Actually, the confidence interval calculation is what I use at work to validate math models against test results for products we build. At my job, if test data falls outside the 90 percent confidence interval, we say that it fails to validate the model.
Right, but are you modeling probabilistic events?

Confidence intervals are useful when we have a definite hypothesis and definite test results. They allow us to represent things like our level of confidence that our measurements are accurate, and how confident we are that our results point to an actual phenomenon and not just statistical noise.

Confidence intervals are also useful when we have definite results from a subset of some larger population and we want to extrapolate from them. We know with complete certainty how people in exit polls voted. Confidence intervals allow us to represent our confidence that these exit polling numbers are an accurate reflection of all the ballots cast.

In your case, it sounds like confidence intervals allow you to represent the confidence with which you can say that a rocket landed where your model said it would because the model is right, and not because of measurement error or expected variability or whatever.
Quote:
In this case the programers are telling us 20 percent is the outcome we should see.
No, they aren't. I think that is precisely the problem here.

There is a very important difference between saying that something has a 20% chance to happen and saying that something will happen 20% of the time.

For the sake of illustration, let's say that we are going to reverse engineer five items, each with a 20% chance to teach us a new schematic. A simple probability table gives us the following percentage chances for each of the six possible outcomes:

0/5 Successfully teach us a new schematic = 32.77%
1/5 Successfully teach us a new schematic = 40.96%
2/5 Successfully teach us a new schematic = 20.48%
3/5 Successfully teach us a new schematic = 5.12%
4/5 Successfully teach us a new schematic = 0.64%
5/5 Successfully teach us a new schematic = 0.032%

We don't have a definite hypothesis because our model is probabilistic. Our chance of learning exactly one new schematic from five reverse engineering attempts (a perfect 20% success rate) is less than half. It is the most probable of the six outcomes - it should occur with greater frequency than any other individual outcome - but it is substantially less probable than all the other outcomes put together.
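These numbers come straight from the binomial formula. If you want to check the arithmetic, here is a minimal Python sketch (standard library only, purely for verifying the table above, not anything from the game):
Code:
from math import comb

def binomial_pmf(k, n, p):
    # Probability of exactly k successes in n independent trials with success chance p.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Reproduce the table: five reverse engineering attempts at a 20% chance each.
for k in range(6):
    print(f"{k}/5 teach us a new schematic: {binomial_pmf(k, 5, 0.20):.2%}")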

If your friend said to you, "I am going to reverse engineer five items. I bet you I learn exactly one new schematic, no more and no less," you would be smart to bet against him. Your odds of winning are roughly 3:2; if you and your friend made the same bet over and over, you ought to win about one and a half times as often as you lose.

It is not impossible to make an argument about probability by way of confidence intervals, but I think it is kind of a clunky way to do it.

Common sense tells us that if we perform four trials, and in each of the four trials we learn five new schematics from five reverse engineering attempts (results that we would expect to occur with a frequency of about 1/3000), we can have a high degree of confidence in our inference that something is probably biasing the results.

Things get a lot trickier when the results are less extreme though. How many times do you have to flip heads before you conclude that your coin is not working correctly?

Is a 13.4% success rate over 400+ trials evidence enough to conclude that the system is not working as it is supposed to? The answer to that question depends entirely on how much variance normally exists among measurements like this. The 99.7% rule says that 99.7% of all values in a normal distribution fall within three standard deviations of the mean. In other words, any data point which is more than three standard deviations from the mean is an extreme outlier and extremely unlikely to occur by mere chance. But to say that your results are three standard deviations from the mean, you need to know what the standard deviation for tests like this is. I have no idea what that would be, but my intuition suggests that 13.4% is probably within three of them.

This is admittedly outside my area of expertise, but if you do not know how much statistical variance normally exists across trials of that size I do not think it is even possible to make a meaningful claim about the significance of your results using a confidence interval calculation.

Darth_Sweets
02.11.2013 , 04:38 AM | #17
Quote: Originally Posted by Kaskali View Post
Right, but are you modeling probabilistic events? ...
Look, I don't understand what your problem here is, so I will try to make it clear. You are trying to define the probability of an outcome across a number of events taken together. If we defined it that way, then one combined outcome would indeed have a different probability than another; that is true.

The point you're missing is that I am looking at each event: either I get a plan or I don't. This is called a binomial sequence or process. In a binomial process the rate of successes should converge to the stated probability of each individual event, in this case 20 percent. Now, since we can only do so many tests, we need a way to check whether the test set matches the stated probability of an individual event. To do this you compute the confidence interval for the given sample size (the interval changes with the number of tests, just as you would expect). For the number of tests that I have done, the rate of success is outside of the confidence interval. I know this is true. Being outside this interval means that the 20 percent rate Bioware is telling us is WRONG, with only a 0.3 percent chance that my result falls outside the confidence interval by luck alone. So this post was meant to do one of two things:

1) Someone else generates their own test set and sees if they get the same results. If they do, that is just more reason that number two needs to happen.

2) The developers have said they like it when we back up problem reports with data when we can. I am telling them that the math says there is a problem.

I don't know what you think, but I know what I have done and what it means. I also know how easy it would be for Bioware not to have tested this enough. In fact, if you look at my sample you will see that I hit a 20 percent rate at one point and thought that life was good, but that rate didn't hold and things dropped off. The link below should take you to a plot I have made.

http://www.flickr.com/photos/1043952...in/photostream

In the plot, the red is the confidence interval and the blue is the rate at which I got a plan from reverse engineering my greens.
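If anyone wants to run the same kind of check on their own numbers, here is a minimal Python sketch of the binomial confidence-interval test I am describing, using the usual normal approximation; the 54-out-of-400 tally is just an illustrative stand-in, not my exact data:
Code:
from math import sqrt

def binomial_interval(p, n, z=3.0):
    # Range the observed success rate should fall in after n attempts
    # if the true per-attempt chance really is p (z=3.0 is roughly 99.7%).
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Illustrative tally only: 54 successes in 400 attempts (about 13.5%).
successes, attempts = 54, 400
low, high = binomial_interval(0.20, attempts)
print(f"observed {successes / attempts:.1%}, expected range {low:.1%} to {high:.1%}")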

DataBeaver
02.11.2013 , 05:39 AM | #18
I generated ten random series of a thousand samples each with a simple Python script, and here's the result:

http://snag.gy/JPFI4.jpg

Apologies for the horrible colors. Notice how one of the lines is slightly below the 99.7% confidence interval for a while, before eventually climbing back up? Exactly like you are seeing in your experiment.

While 99.7% may seem like an impressive number, it only covers 997 of every 1000 cases. The remaining three fall outside it. And there are many thousands of players on each server; maybe even tens of thousands. So it's far from impossible for someone to have such a streak of bad luck.
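I haven't posted the script, but a reconstruction along the same lines is only a few lines of Python (random module, a flat 20% chance per attempt, plotting omitted):
Code:
import random

def running_rates(n_attempts=1000, p=0.20):
    # Running success rate after each of n_attempts simulated RE attempts.
    successes = 0
    rates = []
    for i in range(1, n_attempts + 1):
        successes += random.random() < p
        rates.append(successes / i)
    return rates

# Ten independent series of a thousand samples each, as described above.
for series in (running_rates() for _ in range(10)):
    print(f"final rate after 1000 attempts: {series[-1]:.1%}")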

Quellryloth
02.11.2013 , 07:28 AM | #19
Quote: Originally Posted by Kaskali View Post
But to say that your results are three standard deviations from the mean, you need to know what the standard deviation for tests like this is. I have no idea what that would be, but my intuition suggests that 13.4% is probably within three of them.
We know exactly what the standard deviation is. In fact, we know everything about this distribution, because this is the classic high school binomial distribution problem (and at these sample sizes it is approximately normal). And even if it were not so simple, a lot of other scenarios (almost all that anyone cares about) converge to the same thing. Read up on the central limit theorem if you want to know why.

By the way, here are my results so far: 74 tried, 15 successes.
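To put a number on my own tally, here is a quick Python sketch using nothing beyond the high school formula and the normal approximation:
Code:
from math import sqrt

# Tally reported above: 15 successes out of 74 attempts, against a claimed 20% rate.
successes, attempts, p = 15, 74, 0.20
observed = successes / attempts
sigma = sqrt(p * (1 - p) / attempts)   # standard deviation of the observed rate
print(f"observed {observed:.1%}, {(observed - p) / sigma:+.2f} standard deviations from 20%")

So 74 attempts at about 20.3% is entirely unremarkable. Plugging in something like the 13.4%-over-400 figure from earlier in the thread instead lands around three standard deviations below the mean, which is exactly what this thread is arguing about.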

Khevar
02.11.2013 , 01:57 PM | #20
@Darth_Sweets, I hope that you recognize that I'm NOT trying to defend the 20% tooltip as correct. I'm also not trying to say that your data is incorrect.

I'm simply saying that your conclusion is premature because your sample size is TOO DAMN SMALL.

You're dealing with a random number generator. As a software developer, I can say with confidence that implementing an RNG is very easy. You can pick a simple RNG that uses few calculations, or you can do a more complex calculation with cryptographic RNG functions. Every language has a toolset that provides for this.

Using such an RNG in your code is even easier. Example:
Code:
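// Compare a uniform random roll against the schematic's success chance
// (illustrative names only, not Bioware's actual code).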
if ( rng.GenerateValue() <= schematic.ChanceForSuccess ) {
    schematic.Success = true;
}
I'm not saying Bioware implemented their RNG correctly; I'm just saying that a functioning RNG solution is very, very easy to get right.

Now, if you want to validate any sort of RNG, you need to take a large enough sample size. The larger your sample size, the closer you should be to the expected results, and if there are deviations, you have enough data to present your case.

A few hundred tests isn't enough. Even 1,000 tests may not be enough. When modeling craps betting strategies and dice roll patterns, I had to get up to 10,000 or more iterations before I was seeing consistent results.
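If you want to see that for yourself, here is a throwaway Python sketch (a flat 20% chance and the standard library's RNG, obviously not Bioware's actual code) showing how the observed rate wanders at small sample sizes and settles down as the count grows:
Code:
import random

def observed_rate(n_attempts, p=0.20):
    # Fraction of successes over n_attempts simulated rolls at chance p.
    return sum(random.random() < p for _ in range(n_attempts)) / n_attempts

# Small samples swing widely; large samples settle near the true 20%.
for n in (100, 400, 1000, 10000):
    print(f"{n:>6} attempts: observed {observed_rate(n):.1%}")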