The OP makes one mistake that has nothing to do with sample size.

He assumes that the difference between observed and expected results is normally distributed. But this is only the case for the binomial model with a fair coin (and enough tries). Crafting in SW:ToR, however, follows a binomial model with a weighted coin, so the resulting distribution is skewed rather than normal.
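To put a number on that asymmetry, here is a small sketch using the standard skewness formula for a binomial distribution, (1 - 2p) / sqrt(n p (1 - p)). The function name and the batch size of 20 tries are my own choices for illustration. The skewness is zero for a fair coin and positive for a 20% chance, though it does shrink as the number of tries grows.

```python
import math

def binomial_skewness(n, p):
    """Skewness of a Binomial(n, p) distribution: (1 - 2p) / sqrt(n p (1 - p))."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

if __name__ == "__main__":
    # n = 20 tries per batch is an arbitrary illustrative choice.
    print("fair coin, n=20:     ", binomial_skewness(20, 0.5))
    print("20% chance, n=20:    ", binomial_skewness(20, 0.2))
    print("20% chance, n=2000:  ", binomial_skewness(2000, 0.2))
```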

Many common statistical tests assume a normal distribution and can give misleading results when you apply them to something that is not normally distributed.

In order to check whether the chance of a success in the game is truly 20%, you need to do more work, such as a series of SPRTs (sequential probability ratio tests).
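For anyone who wants to try this, below is a minimal sketch of a single Wald SPRT on success/failure outcomes. It is only an illustration under assumptions of mine: the alternative hypothesis p1 = 0.15, the error rates alpha = beta = 0.05, and the function name are all hypothetical choices, not anything taken from the game or from the OP's data.

```python
import math

def sprt_bernoulli(outcomes, p0=0.20, p1=0.15, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 against H1: p = p1.

    outcomes: iterable of truthy (success) / falsy (failure) values.
    p1, alpha and beta are illustrative assumptions.
    Returns (decision, number_of_observations_used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing this favours H1
    lower = math.log(beta / (1 - alpha))   # crossing this favours H0
    llr = 0.0                              # running log-likelihood ratio
    n = 0
    for n, success in enumerate(outcomes, start=1):
        if success:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "undecided", n
```

You would feed it your crafting attempts in the order they happened, for example `sprt_bernoulli([False, False, True, False, ...])`. An "undecided" result just means you need more attempts before either threshold is crossed.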

An alternative is to check whether the distribution resembles the expected result by applying more heuristic methods, like simply plotting the results of the experiments and comparing the resulting image with a graph of the expected distribution.
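A rough sketch of that comparison, using a text bar chart instead of a real plot so it runs with nothing but the standard library. The batch size of 20 attempts, the 500 batches, and the helper names are assumptions for illustration, and the "observed" data in the demo is simulated, not real crafting results.

```python
import math
import random
from collections import Counter

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n tries with success chance p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def compare_to_expected(batch_successes, n, p):
    """Return (k, observed_fraction, expected_probability) for k = 0..n.

    batch_successes: list with the number of successes in each batch of n tries.
    """
    counts = Counter(batch_successes)
    total = len(batch_successes)
    return [(k, counts.get(k, 0) / total, binomial_pmf(k, n, p))
            for k in range(n + 1)]

if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the demo is repeatable
    n, p = 20, 0.20                 # 20 attempts per batch, 20% claimed chance
    # Simulated stand-in for real data: replace with your own batch counts.
    batches = [sum(rng.random() < p for _ in range(n)) for _ in range(500)]
    for k, obs, exp in compare_to_expected(batches, n, p):
        print(f"{k:2d} obs {'#' * round(obs * 100):<30} exp {'*' * round(exp * 100)}")
```

If the two bar columns have clearly different shapes, that is a hint that the real chance differs from 20%, but it is only an eyeball check and not a formal test.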