Import the packages first.
library(magrittr)
library(sn)
Situation Description
The central limit theorem is an important computational shortcut for generating and making inferences from the sampling distribution of the mean. Recall that the shortcut relies on a number of conditions, specifically:
 Independent observations
 Identically distributed observations
 Mean and variance exist
 Sample size large enough for convergence
In this simulation study, I'm going to compare the sampling distribution of the mean generated by simulation to the sampling distribution implied by the central limit theorem. I will compare the distributions graphically in QQplots.
This will be a 4 × 4 factorial experiment. The first factor will be the sample size, with N = 5, 10, 20, and 40. The second factor will be the degree of skewness in the underlying distribution. The underlying distribution will be the SkewNormal distribution. The SkewNormal distribution has three parameters: location \(\xi\), scale \(\omega\), and slant \(\alpha\). When the slant parameter is 0, the distribution reverts to the normal distribution. As the slant parameter increases, the distribution becomes increasingly skewed. In this simulation, the slant will be set to 0, 2, 10, 100. Set location and scale to 0 and 1, respectively, for all simulation settings.
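The 16 cells of the design can be listed up front. This is a minimal sketch using base R's expand.grid; the name `design` is introduced here for illustration only and is not used elsewhere in the analysis.

```r
# Enumerate the 4 x 4 factorial design: every slant paired with every sample size.
# `design` is an illustrative name, not part of the original analysis.
design <- expand.grid(N = c(5, 10, 20, 40), slant = c(0, 2, 10, 100))
nrow(design)     # number of simulation settings
head(design, 4)  # the four sample sizes for the first slant
```

Each row corresponds to one QQplot panel in the final figure.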
Plot preparation
First, set the parameters that stay fixed across all simulation settings. The slant of the SkewNormal distribution varies between settings, so only the number of replicates R, the location \(\xi\), and the scale \(\omega\) are set here.
R <- 5000      # number of simulated samples per setting
location <- 0  # xi
scale <- 1     # omega
location: \(\xi\), scale: \(\omega\), slant: \(\alpha\)
Before creating the function, here are the formulas for delta and for the population mean and standard deviation of the SkewNormal distribution, which the central limit theorem (CLT) approximation needs.

Delta: \(\delta =\frac{\alpha}{\sqrt{1+\alpha^2}}\)

Mean: \(\xi+\omega \delta \sqrt{\frac{2}{\pi}}\)

Standard deviation: \(\sqrt{\omega^2\left(1-\frac{2\delta^2}{\pi}\right)}\)
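To make the three formulas concrete, the sketch below evaluates them for the four slants used in this study, with location 0 and scale 1. The helper name `sn_moments` is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: delta, mean and standard deviation of a
# SkewNormal(xi, omega, alpha) distribution, following the formulas above.
sn_moments <- function(alpha, xi = 0, omega = 1) {
  delta <- alpha / sqrt(1 + alpha^2)
  c(delta = delta,
    mean  = xi + omega * delta * sqrt(2 / pi),
    sd    = sqrt(omega^2 * (1 - 2 * delta^2 / pi)))
}

# One column per slant, one row per quantity
moments <- sapply(c(0, 2, 10, 100), sn_moments)
colnames(moments) <- paste0("slant=", c(0, 2, 10, 100))
round(moments, 4)
```

With slant 0 the helper returns mean 0 and standard deviation 1, which matches the standard normal distribution.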
Then define a function that builds both sampling distributions, one from the CLT and one from simulation, using the formulas above, and draws the QQplot comparing them.
qqplot_creator <- function(slant, N) {
  delta <- slant / sqrt(1 + slant^2)
  # Quantities to calculate/generate
  pop_mean <- location + scale * delta * sqrt(2 / pi)
  pop_sd <- sqrt(scale^2 * (1 - (2 * delta^2) / pi))
  Z <- rnorm(R) # standard normal draws used as the base of the CLT approximation
  # CLT approximation: rescale Z to the mean and standard error of the sample mean
  sample_dist_clt <- Z * (pop_sd / sqrt(N)) + pop_mean
  # Simulation approximation: R samples of size N, one sample per row
  random.skew <- array(rsn(R * N, xi = location, omega = scale, alpha = slant),
                       dim = c(R, N))
  sample_dist_sim <- apply(random.skew, 1, mean)
  qqplot(sample_dist_clt, sample_dist_sim, axes = FALSE, frame.plot = TRUE, ann = FALSE)
  abline(0, 1) # reference line y = x
}
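The same comparison can be reduced to a single number per setting. The sketch below is not the method used in this study: it swaps in a base-R sampler built on the stochastic representation of the SkewNormal distribution instead of sn::rsn, and uses exact normal quantiles from qnorm instead of random normal draws. The names `r_skew` and `clt_gap`, the quantile grid, and the choice of 20000 replicates are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical base-R sampler using the stochastic representation
# X = xi + omega * (delta * |Z0| + sqrt(1 - delta^2) * Z1), with Z0, Z1 iid N(0, 1).
r_skew <- function(n, xi = 0, omega = 1, alpha = 0) {
  delta <- alpha / sqrt(1 + alpha^2)
  xi + omega * (delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n))
}

# Largest gap between simulated and CLT quantiles of the sample mean,
# expressed in units of the standard error (location 0, scale 1).
clt_gap <- function(slant, N, R = 5000, probs = seq(0.01, 0.99, 0.01)) {
  delta <- slant / sqrt(1 + slant^2)
  pop_mean <- delta * sqrt(2 / pi)
  pop_sd <- sqrt(1 - 2 * delta^2 / pi)
  draws <- matrix(r_skew(R * N, alpha = slant), nrow = R, ncol = N)
  sim_q <- quantile(rowMeans(draws), probs)
  clt_q <- qnorm(probs, mean = pop_mean, sd = pop_sd / sqrt(N))
  max(abs(sim_q - clt_q)) / (pop_sd / sqrt(N))
}

# Most skewed setting, smallest versus largest sample size
gap_small_n <- clt_gap(slant = 100, N = 5, R = 20000)
gap_large_n <- clt_gap(slant = 100, N = 40, R = 20000)
c(N5 = gap_small_n, N40 = gap_large_n)
```

A smaller gap means the QQplot points would sit closer to the y = x line for that setting.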
QQplot generation
Now set the slants and sample sizes to test. As specified above, N = 5, 10, 20, and 40, and the slant is 0, 2, 10, or 100. Then create a sequence of x values at which the density of each underlying distribution will be plotted.
slant <- c(0,2,10,100)
N <- c(5,10,20,40)
x <- seq(-2,2,0.01)
Set up a 4 × 5 grid of panels to hold all of the plots: the first column shows the density of the underlying distribution, and the qqplot_creator function fills the remaining four columns with QQplots.
par(mfrow=c(4,5),mai=c(0.1,0.1,0.1,0.1), oma = c(0, 4, 4, 0))
for(i in slant){
plot(dsn(x,
xi = location,
omega = scale,
alpha = i),
axes = FALSE,
frame.plot=TRUE,
type = "l",
xlab = NA, ylab = NA)
for(j in N){
qqplot_creator(i, j)
}
}
mtext(text="Distribution N=5 N=10 N=20 N=40",
side = 3,
outer = TRUE)
mtext(text="slant = 100 slant = 10 slant = 2 slant = 0",
side = 2,
outer = TRUE)
Conclusion
As N grows, the points in the QQplot fall closer to the y = x line, which means the CLT approximation matches the simulated sampling distribution of the mean more closely. As the slant grows, in other words as the SkewNormal distribution becomes more skewed, the CLT approximation gets worse at a given N, and a larger sample is needed for it to fit well.
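One way to see why both factors matter is that the skewness of the mean of N independent draws equals the skewness of the underlying distribution divided by \(\sqrt{N}\). The sketch below tabulates that residual skewness for the 16 settings, using the standard closed-form skewness of the SkewNormal distribution. The names `sn_skewness` and `mean_skew` are hypothetical and serve only this illustration.

```r
# Hypothetical helper: closed-form skewness of a SkewNormal distribution with slant alpha
sn_skewness <- function(alpha) {
  delta <- alpha / sqrt(1 + alpha^2)
  (4 - pi) / 2 * (delta * sqrt(2 / pi))^3 / (1 - 2 * delta^2 / pi)^(3 / 2)
}

slant <- c(0, 2, 10, 100)
N <- c(5, 10, 20, 40)

# Skewness of the sample mean: population skewness divided by sqrt(N)
mean_skew <- outer(sn_skewness(slant), 1 / sqrt(N))
dimnames(mean_skew) <- list(paste0("slant=", slant), paste0("N=", N))
round(mean_skew, 3)
```

Values shrink from left to right along each row and grow from top to bottom down each column, which mirrors the pattern seen in the QQplots.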