Given a set of data (~5000 values), I'd like to draw random samples from the same distribution as the original data. The problem is that there is no way to know for sure which distribution the original data comes from.
It makes sense to use a normal distribution in my case, although I'd like to be able to justify that decision, and of course I also need to estimate the $(\mu, \sigma)$ pair.
Any ideas on how to accomplish this, preferably within a Java environment? I have been using Apache Commons Math and recently stumbled upon the Colt library. I was hoping to get it done without bothering with MATLAB or R.
Best Answer
How big are the samples that you need? If they are substantially smaller than the 5000 points you have (say, at most 100 points or so), you could just take a random subset of your sample. Then you don't even need to assume normality: the subset is guaranteed to come from the distribution you want!
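A minimal sketch of the random-subset approach, using only the Java standard library rather than Commons Math. It performs a partial Fisher–Yates shuffle on a copy of the data and keeps the first k entries. The class and method names are hypothetical, and the synthetic data in `main` only stands in for your 5000 values.

```java
import java.util.Arrays;
import java.util.Random;

public class SubsampleSketch {

    /**
     * Returns k values drawn without replacement from data, using a partial
     * Fisher-Yates shuffle on a copy so the original array is left untouched.
     */
    public static double[] randomSubset(double[] data, int k, Random rng) {
        if (k < 0 || k > data.length) {
            throw new IllegalArgumentException("k must be between 0 and data.length");
        }
        double[] copy = Arrays.copyOf(data, data.length);
        for (int i = 0; i < k; i++) {
            // Pick a random index from the not-yet-chosen tail [i, n) and swap it into slot i.
            int j = i + rng.nextInt(copy.length - i);
            double tmp = copy[i];
            copy[i] = copy[j];
            copy[j] = tmp;
        }
        // The first k slots now hold a uniformly random subset of size k.
        return Arrays.copyOf(copy, k);
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed only to make the run repeatable
        double[] data = new double[5000];
        for (int i = 0; i < data.length; i++) {
            data[i] = rng.nextGaussian(); // placeholder standing in for the real data
        }
        double[] sample = randomSubset(data, 100, rng);
        System.out.println("drew " + sample.length + " values");
    }
}
```

Because the draw is without replacement, each original value is used at most once per subset; as k approaches 5000 the subset simply becomes a reshuffled copy of the data, which is why this works best for small k.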
Otherwise, it seems that the org.apache.commons.math.stat.descriptive.moment package has Mean and StandardDeviation classes, which compute the sample mean and the sample standard deviation (bias-corrected, i.e. dividing by $n-1$, by default). These give you estimates of $\mu$ and $\sigma$, respectively.
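For reference, the estimates are $\hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\hat\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat\mu)^2}$, and new values can then be generated as $\hat\mu + \hat\sigma z$ with $z \sim N(0,1)$. The sketch below computes both estimates by hand with the standard library (so it runs without Commons Math on the classpath) and uses `java.util.Random.nextGaussian()` for $z$. The class and method names are hypothetical, and the synthetic data in `main` is only a placeholder for the real 5000 values.

```java
import java.util.Random;

public class NormalFitSketch {

    /** Arithmetic mean: (1/n) * sum of x_i. */
    public static double mean(double[] x) {
        if (x.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Bias-corrected sample standard deviation: sqrt(sum (x_i - mean)^2 / (n - 1)). */
    public static double sampleStdDev(double[] x) {
        if (x.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            double d = v - m;
            ss += d * d;
        }
        return Math.sqrt(ss / (x.length - 1));
    }

    /** Draws k values from N(mu, sigma^2) by scaling and shifting standard normals. */
    public static double[] drawNormal(double mu, double sigma, int k, Random rng) {
        double[] out = new double[k];
        for (int i = 0; i < k; i++) {
            out[i] = mu + sigma * rng.nextGaussian();
        }
        return out;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed only to make the run repeatable
        double[] data = new double[5000];
        for (int i = 0; i < data.length; i++) {
            data[i] = 10.0 + 2.0 * rng.nextGaussian(); // placeholder for the real data
        }
        double mu = mean(data);
        double sigma = sampleStdDev(data);
        double[] simulated = drawNormal(mu, sigma, 10000, rng);
        System.out.printf("mu = %.3f, sigma = %.3f, drew %d values%n", mu, sigma, simulated.length);
    }
}
```

With $n \approx 5000$, the difference between dividing by $n$ and by $n-1$ is negligible, so either convention gives practically the same $\hat\sigma$.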
Similar Posts:
- Solved – Parameter estimation for normal distribution in Java
- Solved – Upper/lower bound and initial domain for lognormal distribution
- Solved – getting different intercept values in R and Java for simple linear regression
- Solved – the name for the distribution shape of a histogram with this kind of curvature