You do not need 10,000 data points. 400 is more than enough.

Whenever someone mentions random numbers, you get all these people with no understanding of probability coming out and saying 'you need a bigger sample size'. You don't. The bigger issue, however, is that 20% is the mean; it doesn't tell you anything about the "distribution". Suppose the 'random number generator' goes like this: fail for the first 800 tries, then succeed for the next 200. The mean is still 20%, but it's not evenly distributed over time.
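To make this concrete, here's a quick sketch (the 800/200 split is taken straight from the example above; the second, evenly-spread sequence is my own for contrast):

```python
# Two sequences with the SAME 20% success rate but very different
# behavior over time.
streaky = [0] * 800 + [1] * 200      # all failures first, then all successes
spread = [0, 0, 0, 0, 1] * 200       # one success every five tries

print(sum(streaky) / len(streaky))   # 0.2
print(sum(spread) / len(spread))     # 0.2  -- identical means


def longest_failure_run(seq):
    """Length of the longest unbroken run of failures (zeros)."""
    best = run = 0
    for x in seq:
        run = run + 1 if x == 0 else 0
        best = max(best, run)
    return best


# The mean can't tell these apart; the streak length can.
print(longest_failure_run(streaky))  # 800
print(longest_failure_run(spread))   # 4
```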

The 99.7% confidence interval you'd build from the OP's data only applies if the trials are independent and evenly distributed. Many programs' default 'random' function doesn't guarantee that, which is why, when you run something in Python or Excel, you can get streakier results than the naive theory suggests.

True randomness is hard to do; most implementations use something 'pseudorandom'. For instance, you can start off with 1000 numbers in a box, scramble them up, and then draw them out one at a time until none are left. What number you get then depends on which numbers have already been drawn, so the draws are no longer independent, and the streak behavior differs from truly independent trials because of it.
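A minimal sketch of that box model (the 200/800 split and the seed are my own choices for illustration):

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

# A "box" of 1000 numbers: 200 successes, 800 failures (the 20% rate).
box = [1] * 200 + [0] * 800
random.shuffle(box)

# Drawing without replacement fixes the overall count exactly:
# however you shuffle, the whole box always yields exactly 200 successes.
print(sum(box))  # 200

# But the odds drift as numbers leave the box. After 100 draws with k
# successes among them, the next draw succeeds with probability
# (200 - k) / 900, not a constant 0.2.
k = sum(box[:100])
print(round((200 - k) / 900, 3))
```

Compare that with independent trials, where every draw succeeds with probability 0.2 no matter what came before.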