Free Mathematics Tutorials

Central limit theorem with examples and detailed solutions [1].

If, within a population with any distribution that has a mean \( \mu \) and a standard deviation \( \sigma \), we take random samples of size \( n \ge 30 \) with replacement, then the distribution of the sample means is close to a normal distribution with mean \( \mu_{\bar X} \) and standard deviation \( \sigma_{\bar X} \) given by: \[ \mu_{\bar X} = \mu \] \[ \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} \] It is important to note that the central limit theorem states that the distribution of the sample mean \( \bar X \) tends to a normal distribution regardless of the distribution of the population from which the random samples are drawn, provided that the population standard deviation \( \sigma \) is finite. Therefore the central limit theorem allows us to apply all normal distribution computational techniques to the distribution of the sample mean as long as the sample size \( n \) is large \( ( n \ge 30 ) \) and \( \mu \) and \( \sigma \) are known. Note 1) If the population has a normal distribution, the sample mean is normally distributed for any sample size \( n \), however small. 2) The central limit theorem also holds for populations with binomial distributions as long as \( np \ge 5 \) and \( n(1-p) \ge 5 \).
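The two formulas can be checked exactly on a small population by listing every possible sample drawn with replacement. The sketch below assumes a hypothetical population made of the integers 1 to 4 and samples of size \( n = 2 \) (neither value comes from the text). It uses the population standard deviation (`pstdev`), which is what \( \sigma \) denotes here.

```python
import math
from itertools import product
from statistics import fmean, pstdev

# Hypothetical population and sample size, chosen only for illustration.
population = [1, 2, 3, 4]
n = 2

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples).
sample_means = [fmean(s) for s in product(population, repeat=n)]

mu = fmean(population)
sigma = pstdev(population)  # population standard deviation

print("mean of sample means:", fmean(sample_means), "  mu:", mu)
print("sd of sample means:  ", pstdev(sample_means), "  sigma/sqrt(n):", sigma / math.sqrt(n))
```

Because every sample is listed, the agreement is exact (up to floating-point rounding) rather than approximate; the normal shape, by contrast, only appears as \( n \) grows.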

Sampling Distributions

We now make samples from this population by drawing 2 integers (with replacement) at a time. The list of all possible samples is

Examples Using the Central Limit Theorem with Detailed Solutions

Example 1 Let \( X \) be a random variable with mean \( \mu = 20 \) and standard deviation \( \sigma = 4\). A sample of size 64 is randomly selected from this population. What is the approximate probability that the sample mean \( \bar X \) of the selected sample is less than \( 19 \)? Solution to Example 1 No information about the population distribution is given. However, the mean and the standard deviation of the population are given. The sample size \( n = 64 \) is greater than \( 30 \) and we are asked a question related to the sample mean, so we may use the central limit theorem to answer the above question. According to the central limit theorem, the distribution of the sample mean \( \bar X \) is close to a normal distribution with the mean \( \mu_{\bar X} \) and standard deviation \( \sigma_{\bar X} \) given by \( \mu_{\bar X} = \mu = 20 \) \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{4}{\sqrt {64}} = 0.5 \) We are looking for the probability \( P ( \bar X \lt 19 ) \). The Z-score \( Z \) corresponding to \( \bar X = 19 \) is given by \( Z = \dfrac{\bar X - \mu_{\bar X}}{\sigma_{\bar X}} = \dfrac{19 - 20}{\dfrac{4}{\sqrt {64}}} = - 2 \) Use a table or a normal probability calculator to obtain the probability that the mean of the sample is less than \( 19 \). \( P ( \bar X \lt 19 ) = P ( Z \lt -2 ) \approx 0.0228\)
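As a sketch, the same computation can be reproduced with the standard-library class `statistics.NormalDist` (Python 3.8 or later), which here stands in for the table or normal probability calculator. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

mu, sigma, n = 20, 4, 64
sigma_xbar = sigma / math.sqrt(n)          # 4 / 8 = 0.5

# Approximate sampling distribution of the mean given by the CLT.
xbar_dist = NormalDist(mu, sigma_xbar)

z = (19 - mu) / sigma_xbar                 # -2.0
p = xbar_dist.cdf(19)                      # P(X̄ < 19)
print(f"z = {z:.2f}, P(X̄ < 19) ≈ {p:.4f}")
```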

Example 2 In the first semester of the year 2003, the average return for a group of 251 investing companies was \( 4.5\% \) and the standard deviation was \( 1.5\% \). If a sample of 40 companies is randomly selected from this group, what is the approximate probability that the average return of the companies in this sample was between \( 4\% \) and \( 5\% \) in the first semester of the year 2003? Solution to Example 2 The population is made up of 251 companies with average (mean) return equal to \( 4.5\% \) and standard deviation equal to \( 1.5\% \). The sample is large enough: \( n = 40 \ge 30 \). Since we are looking for a probability concerning the average (mean) return, we may use the central limit theorem. Let \( \bar X \) be the random variable representing the sample mean. According to the central limit theorem, the distribution of \( \bar X \) is close to a normal distribution with the mean and standard deviation given by \( \mu_{\bar X} = \mu = 4.5\% \) \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{1.5\%}{\sqrt {40}} \) We are looking for the probability \( P ( 4\% \lt \bar X \lt 5\% ) \). The Z-scores \( Z_1 \) and \( Z_2 \) corresponding to \( \bar X_1 = 4\% \) and \( \bar X_2 = 5\% \), respectively, are given by \( Z_1 = \dfrac{\bar X_1 - \mu_{\bar X}}{\sigma_{\bar X}} = \dfrac{4\% - 4.5\%}{\dfrac{1.5\%}{\sqrt {40}}} \approx -2.10818 \) \( Z_2 = \dfrac{\bar X_2 - \mu_{\bar X}}{\sigma_{\bar X}} = \dfrac{5\% - 4.5\%}{\dfrac{1.5\%}{\sqrt {40}}} \approx 2.10818\) Use a table or a normal probability calculator to obtain the probability that the average return of the companies in the sample was between \( 4\% \) and \( 5\% \). \( P ( 4\% \lt \bar X \lt 5\% ) = P ( -2.10818 \lt Z \lt 2.10818 ) \approx 0.965\)
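For a "between" question, the probability is the difference of two values of the cumulative distribution function. A minimal sketch with `statistics.NormalDist`, working in percent units (an assumption made only to keep the numbers readable):

```python
import math
from statistics import NormalDist

mu, sigma, n = 4.5, 1.5, 40               # returns in percent
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))

# P(4% < X̄ < 5%) = F(5) - F(4), where F is the CDF of X̄.
p_between = xbar_dist.cdf(5) - xbar_dist.cdf(4)
z1 = (4 - mu) / xbar_dist.stdev
z2 = (5 - mu) / xbar_dist.stdev
print(f"z1 = {z1:.5f}, z2 = {z2:.5f}, P ≈ {p_between:.3f}")
```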

Example 3 A pension fund company carries out a study of a large group of mutual funds and finds that their average return over a period of 5 years was \( 80\% \) with a standard deviation equal to \( 30\% \). If a sample of \( 50 \) mutual funds is randomly selected from the group, what is the approximate probability that the sample had an average return greater than \( 90\% \) over the 5-year period? Solution to Example 3 The question is related to the average (mean) return and the sample size \( n = 50 \) is large enough \( (\ge 30) \), so we may use the central limit theorem. Let \( \bar X \) be the random variable representing the mean of the sample. According to the central limit theorem, the distribution of \( \bar X \) is close to a normal distribution with the mean and standard deviation given by \( \mu_{\bar X} = \mu = 80\% \) \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{30\%}{\sqrt {50}} \) We are looking for the probability \( P ( \bar X \gt 90\% ) \). The Z-score \( Z \) corresponding to \( 90\% \) is given by \( Z = \dfrac{90\% - 80\%}{\dfrac{30\%}{\sqrt {50}}} \approx 2.35702\) Use a table or a normal probability calculator to obtain the probability that the average return of the mutual funds in the sample was greater than \( 90\% \). \( P ( \bar X \gt 90\% ) = P ( Z \gt 2.35702 ) \approx 0.0092 \)
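A "greater than" probability is the complement of the cumulative distribution function. A short sketch, again assuming `statistics.NormalDist` as the calculator:

```python
import math
from statistics import NormalDist

mu, sigma, n = 80, 30, 50                 # 5-year returns in percent
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))

# P(X̄ > 90%) = 1 - P(X̄ <= 90%).
p_greater = 1 - xbar_dist.cdf(90)
print(f"P(X̄ > 90%) ≈ {p_greater:.4f}")
```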

Example 4 The daily number of tools produced by a company is 2000. The average length of the tools is \( 10 \) centimeters with a standard deviation equal to \( 0.3 \) centimeters. If a sample of \( 200 \) tools is selected at random, what is the approximate probability that the average length of the tools in the sample is within \( 0.05 \) centimeter of the average length? Solution to Example 4 The question is related to the average (mean) length of the tools and the sample size \( n = 200 \) is large enough, so we may use the central limit theorem. Let \( \bar X \) be the random variable representing the average (mean) of the sample. According to the central limit theorem, the distribution of \( \bar X \) is close to a normal distribution with the mean and standard deviation given by \( \mu_{\bar X} = \mu = 10 \) \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{0.3}{\sqrt {200}} \) Saying that \( \bar X \) is within \( 0.05 \) centimeter of the average length means that we are looking for the probability \( P ( 10 - 0.05 \le \bar X \le 10 + 0.05) \). The Z-score \( Z_1 \) corresponding to \( \bar X = 10 - 0.05 = 9.95 \) is given by \( Z_1 = \dfrac{9.95 - 10}{\dfrac{0.3}{\sqrt {200}}} \approx -2.35702\) The Z-score \( Z_2 \) corresponding to \( \bar X = 10 + 0.05 = 10.05 \) is given by \( Z_2 = \dfrac{10.05 - 10}{\dfrac{0.3}{\sqrt {200}}} \approx 2.35702\) Use a table or a normal probability calculator to obtain the probability that the average length of the tools in the sample is within \( 0.05 \) centimeter of the average length. \( P ( 9.95 \le \bar X \le 10.05 ) = P ( -2.35702 \le Z \le 2.35702 ) \approx 0.9816 \)
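Because the interval is symmetric about the mean, only one table lookup is needed: \( P(|\bar X - \mu| \le d) = 2\Phi(d/\sigma_{\bar X}) - 1 \), where \( \Phi \) is the standard normal CDF. A minimal sketch of that shortcut (the name `tol` for the tolerance \( d \) is illustrative):

```python
import math
from statistics import NormalDist

mu, sigma, n, tol = 10, 0.3, 200, 0.05    # lengths in centimeters
sigma_xbar = sigma / math.sqrt(n)

# By symmetry, P(|X̄ - mu| <= tol) = 2 * Phi(tol / sigma_xbar) - 1.
z = tol / sigma_xbar
p_within = 2 * NormalDist().cdf(z) - 1
print(f"z = {z:.5f}, P(9.95 <= X̄ <= 10.05) ≈ {p_within:.4f}")
```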

Example 5 An airplane has a capacity of 200 seats and a total baggage limit of 6000 kilograms. Assume the total weight \( X \) checked by each passenger is a random variable with a mean of 28 kilograms and a standard deviation of 15 kilograms. If 200 passengers board a flight, what is the approximate probability that the total weight of their baggage will not exceed the limit? Solution to Example 5 For the luggage of the 200 passengers not to exceed 6000 kilograms, the average weight \( X \) checked per passenger must not exceed \( \dfrac{6000}{200} = 30 \) kilograms. Therefore the problem reduces to finding the probability \( P (\bar X \le 30) \), where \( \bar X \) is the sample mean of the weight \( X \). Since the sample size is \( n = 200 \), the distribution of \( \bar X \) is close to a normal distribution with mean \( \mu_{\bar X} = \mu = 28 \) and standard deviation \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{15}{\sqrt {200}} \). The Z-score \( Z \) corresponding to \( \bar X = 30 \) is given by \( Z = \dfrac{30 - 28}{\dfrac{15}{\sqrt {200}}} \approx 1.88561\) Use a table or a normal probability calculator to obtain the probability that the total weight of their baggage will not exceed the limit. \( P ( \bar X \le 30) = P ( Z \le 1.88561 ) \approx 0.9703 \)
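The solution works with the mean weight per passenger. As an alternative sketch, the CLT can be applied directly to the total weight \( T \) of the 200 bags, which is approximately normal with mean \( n\mu \) and standard deviation \( \sigma\sqrt{n} \). The code below checks that both formulations give the same probability.

```python
import math
from statistics import NormalDist

mu, sigma, n, limit = 28, 15, 200, 6000   # kilograms

# Formulation 1: sample mean X̄ ~ N(mu, sigma / sqrt(n)), threshold limit / n.
p_mean = NormalDist(mu, sigma / math.sqrt(n)).cdf(limit / n)

# Formulation 2: total weight T ~ N(n * mu, sigma * sqrt(n)), threshold limit.
p_total = NormalDist(n * mu, sigma * math.sqrt(n)).cdf(limit)

print(f"via mean: {p_mean:.4f}, via total: {p_total:.4f}")
```

Both standardize to the same z-score, since \( (6000 - 200 \cdot 28)/(15\sqrt{200}) = (30 - 28)/(15/\sqrt{200}) \).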

Central Limit Theorem | Formula, Definition & Examples

Published on July 6, 2022 by Shaun Turney. Revised on June 22, 2023.

The central limit theorem states that if you take sufficiently large samples from a population, the samples’ means will be approximately normally distributed, even if the population isn’t normally distributed.

The central limit theorem relies on the concept of a sampling distribution, which is the probability distribution of a statistic for a large number of samples taken from a population.

Imagining an experiment may help you to understand sampling distributions:

  • Suppose that you draw a random sample from a population and calculate a statistic for the sample, such as the mean.
  • Now you draw another random sample of the same size, and again calculate the mean.
  • You repeat this process many times, and end up with a large number of means, one for each sample.

The distribution of the sample means is an example of a sampling distribution.
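The thought experiment above is easy to run as a simulation. The sketch below assumes a right-skewed exponential population with mean 1 and standard deviation 1 (an illustrative choice, not from the text), draws 10,000 samples of size \( n = 30 \), and compares the spread of the sample means with \( \sigma/\sqrt{n} \). The fixed seed only makes the run reproducible.

```python
import math
import random
import statistics


def sample_means(n, num_samples, rng):
    """Draw num_samples samples of size n from an Exponential(1) population
    and return the mean of each sample."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(num_samples)]


rng = random.Random(42)
n = 30
means = sample_means(n, 10_000, rng)

print("mean of sample means:", round(statistics.fmean(means), 3))  # near mu = 1
print("sd of sample means:  ", round(statistics.stdev(means), 3))  # near 1/sqrt(30)
print("sigma / sqrt(n):     ", round(1 / math.sqrt(n), 3))
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the exponential population itself is strongly skewed.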

The central limit theorem says that the sampling distribution of the mean will always be approximately normally distributed, as long as the sample size is large enough. Regardless of whether the population has a normal, Poisson, binomial, or any other distribution with finite variance, the sampling distribution of the mean will be approximately normal.

A normal distribution is a symmetrical, bell-shaped distribution, with increasingly fewer observations the further from the center of the distribution.

Fortunately, you don’t need to actually repeatedly sample a population to know the shape of the sampling distribution. The parameters of the sampling distribution of the mean are determined by the parameters of the population:

  • The mean of the sampling distribution is the mean of the population.

\begin{equation*}\mu_{\bar{x}}=\mu\end{equation*}

  • The standard deviation of the sampling distribution is the standard deviation of the population divided by the square root of the sample size.

\begin{equation*}\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\end{equation*}

We can describe the sampling distribution of the mean using this notation, where the second parameter of N is the standard deviation:

\begin{equation*}\bar{X}\sim N (\mu,\dfrac{\sigma}{\sqrt{n}})\end{equation*}

  • X̄ is the sample mean, whose distribution is the sampling distribution of the mean
  • ~ means “follows the distribution”
  • N is the normal distribution
  • μ is the mean of the population
  • σ is the standard deviation of the population
  • n is the sample size

The sample size ( n ) is the number of observations drawn from the population for each sample. The sample size is the same for all samples.

The sample size affects the sampling distribution of the mean in two ways.

1. Sample size and normality

The larger the sample size, the more closely the sampling distribution will follow a normal distribution.

When the sample size is small, the sampling distribution of the mean is sometimes non-normal. That’s because the central limit theorem only holds true when the sample size is “sufficiently large.”

By convention, we consider a sample size of 30 to be “sufficiently large.”

  • When n < 30, the central limit theorem may not give a good approximation. The sampling distribution will tend to resemble the shape of the population, so it will only be reliably normal if the population is normal.
  • When n ≥ 30 , the central limit theorem applies. The sampling distribution will approximately follow a normal distribution.

2. Sample size and standard deviations

The sample size affects the standard deviation of the sampling distribution. Standard deviation is a measure of the variability or spread of the distribution (i.e., how wide or narrow it is).

  • When n is low , the standard deviation is high. There’s a lot of spread in the samples’ means because they aren’t precise estimates of the population’s mean.
  • When n is high , the standard deviation is low. There’s not much spread in the samples’ means because they’re precise estimates of the population’s mean.

The central limit theorem states that the sampling distribution of the mean will approximately follow a normal distribution under the following conditions:

  • The sample size is sufficiently large. This condition is usually met if the sample size is n ≥ 30.
  • The samples are independent and identically distributed (i.i.d.) random variables. This condition is usually met if the sampling is random.
  • The population’s distribution has finite variance. The central limit theorem doesn’t apply to distributions with infinite variance, such as the Cauchy distribution. Most distributions have finite variance.
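To see why the finite-variance condition matters, the sketch below draws sample means from a standard Cauchy population, generated as \( \tan(\pi(U - 1/2)) \) with \( U \) uniform on [0, 1). For a finite-variance population the spread of the sample means shrinks like \( 1/\sqrt{n} \). For the Cauchy distribution, the mean of n observations is again standard Cauchy, so the interquartile range of the sample means stays near 2 however large n is. The sample sizes, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics


def cauchy(rng):
    """One draw from the standard Cauchy distribution (inverse-CDF method)."""
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_sample_means(n, reps, rng):
    """Interquartile range of `reps` sample means, each from n Cauchy draws."""
    means = [statistics.fmean(cauchy(rng) for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


rng = random.Random(0)
for n in (10, 200):
    # The IQR does not shrink as n grows, unlike the finite-variance case.
    print(n, round(iqr_of_sample_means(n, 2000, rng), 2))
```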

The central limit theorem is one of the most fundamental statistical theorems. In fact, the “central” in “central limit theorem” refers to the importance of the theorem.

Applying the central limit theorem to real distributions may help you to better understand how it works.

Continuous distribution

Suppose that you’re interested in the age at which people retire in the United States. The population is all retired Americans, and the distribution of the population might look something like this:

[Figure: continuous distribution of age at retirement]

Age at retirement follows a left-skewed distribution. Most people retire within about five years of the mean retirement age of 65 years. However, there’s a “long tail” of people who retire much younger, such as at 50 or even 40 years old. The population has a standard deviation of 6 years.

Imagine that you take a small sample of the population. You randomly select five retirees and ask them at what age they retired.

The mean of the sample is an estimate of the population mean. It might not be a very precise estimate, since the sample size is only 5.

Suppose that you repeat this procedure 10 times, taking samples of five retirees, and calculating the mean of each sample. This is a sampling distribution of the mean .

If you repeat the procedure many more times, a histogram of the sample means will look something like this:

[Figure: sampling distribution of the mean for samples of five retirees]

Although this sampling distribution is more normally distributed than the population, it still has a bit of a left skew.

Notice also that the spread of the sampling distribution is less than the spread of the population.

The central limit theorem says that the sampling distribution of the mean will approximately follow a normal distribution when the sample size is sufficiently large. This sampling distribution of the mean isn’t normally distributed because its sample size isn’t sufficiently large.

Now, imagine that you take a large sample of the population. You randomly select 50 retirees and ask them at what age they retired.

The mean of the sample is an estimate of the population mean. It’s a more precise estimate, because the sample size is larger.

Again, you can repeat this procedure many more times, taking samples of fifty retirees, and calculating the mean of each sample:

[Figure: sampling distribution of the mean for samples of fifty retirees]

In the histogram, you can see that this sampling distribution is normally distributed, as predicted by the central limit theorem.

The standard deviation of this sampling distribution is 0.85 years, which is less than the spread of the small sample sampling distribution, and much less than the spread of the population. If you were to increase the sample size further, the spread would decrease even more.
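The spreads in this example follow from \( \sigma_{\bar x} = \sigma/\sqrt{n} \) with the population standard deviation of 6 years. A short check of the two sample sizes used above:

```python
import math

sigma = 6  # population standard deviation of retirement age, in years

# Standard deviation of the sampling distribution of the mean (standard error).
standard_errors = {n: sigma / math.sqrt(n) for n in (5, 50)}

for n, se in standard_errors.items():
    print(f"n = {n:>2}: sigma / sqrt(n) = {se:.2f} years")
```

For n = 50 this gives the 0.85 years quoted above; for n = 5 the spread is more than three times larger.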

We can use the central limit theorem formula to describe the sampling distribution:

\begin{equation*}\bar{X}\sim N\left(65,\dfrac{6}{\sqrt{50}}\right)\approx N(65,\,0.85)\end{equation*}

Discrete distribution

Approximately 10% of people are left-handed. If we assign a value of 1 to left-handedness and a value of 0 to right-handedness, the probability distribution of left-handedness for the population of all humans looks like this:

[Figure: discrete distribution of left-handedness (1) and right-handedness (0)]

The population mean is the proportion of people who are left-handed (0.1). The population standard deviation is 0.3.

Imagine that you take a random sample of five people and ask them whether they’re left-handed.

Imagine you repeat this process 10 times, randomly sampling five people and calculating the mean of the sample. This is a sampling distribution of the mean .

If you repeat this process many more times, the distribution will look something like this:

[Figure: sampling distribution of the mean for samples of five people]

The sampling distribution isn’t normally distributed because the sample size isn’t sufficiently large for the central limit theorem to apply.

As the sample size increases, the sampling distribution looks increasingly similar to a normal distribution, and the spread decreases:

[Figure: sampling distributions of the mean for increasing sample sizes]

The sampling distribution of the mean for samples with n = 30 approaches normality. When the sample size is increased further to n = 100, the sampling distribution closely follows a normal distribution.

We can use the central limit theorem formula to describe the sampling distribution for n = 100.

\begin{equation*}\bar{X}\sim N\left(0.1,\dfrac{0.3}{\sqrt{100}}\right) = N(0.1,\,0.03)\end{equation*}
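For this 0/1 population the quantities can be computed directly: the standard deviation is \( \sqrt{p(1-p)} = \sqrt{0.1 \times 0.9} = 0.3 \), and the number of left-handed people in a sample of 100 is binomial. The sketch compares the exact binomial probability that at most 15% of the sample is left-handed with the normal approximation above; the 15% threshold is an arbitrary illustrative choice.

```python
import math
from statistics import NormalDist

p, n = 0.1, 100
sigma = math.sqrt(p * (1 - p))                    # population standard deviation, 0.3
xbar_dist = NormalDist(p, sigma / math.sqrt(n))   # CLT approximation N(0.1, 0.03)

# Exact: X̄ = (number of left-handers) / n, and that count is Binomial(n, p).
# X̄ <= 0.15 means at most 15 left-handers.
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, 16))

approx = xbar_dist.cdf(0.15)
print(f"exact P(X̄ <= 0.15) = {exact:.4f}, normal approximation = {approx:.4f}")
```

The two values are close but not identical, partly because the binomial count is discrete while the normal curve is continuous.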

In a normal distribution, data are symmetrically distributed with no skew. Most values cluster around a central region, with values tapering off as they go further away from the center.

The measures of central tendency (mean, mode, and median) are exactly the same in a normal distribution.

The three types of skewness are:

  • Right skew (also called positive skew). A right-skewed distribution is longer on the right side of its peak than on its left.
  • Left skew (also called negative skew). A left-skewed distribution is longer on the left side of its peak than on its right.
  • Zero skew. It is symmetrical and its left and right sides are mirror images.

Samples are used to make inferences about populations. Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.

Cite this Scribbr article

Turney, S. (2023, June 22). Central Limit Theorem | Formula, Definition & Examples. Scribbr. Retrieved February 17, 2024, from https://www.scribbr.com/statistics/central-limit-theorem/

Central Limit Theorem

The central limit theorem, a result in statistics, states that when samples are drawn from a population with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size grows, and the mean of the sample means equals the mean of the whole population.

In other words, the central limit theorem states that for any population with mean μ and standard deviation σ, the distribution of the sample mean for sample size n has mean μ and standard deviation σ/√n, and is approximately normal when n is large.

As the sample size gets bigger, the sample mean tends to get closer to the actual population mean. If the sample size is small, the distribution of the sample mean may or may not be normal, depending on the population, but as the sample size gets bigger, it can be approximated by a normal distribution. This simplifies the analysis of quantities such as stock indexes, among many others.

The CLT can be applied to almost all types of probability distributions, but there are some exceptions: the population must have a finite variance, and this form of the theorem applies to independent, identically distributed variables. It can also be used to answer the question of how big a sample you need. Remember that as the sample size grows, the standard deviation of the sample average falls, because it is the population standard deviation divided by the square root of the sample size. This theorem is an important topic in statistics. In many real-world applications, a random variable of interest is the sum of a large number of independent random variables; in such situations, we can use the CLT to justify using the normal distribution.

In this article, students can learn the central limit theorem formula, definition and examples. 

Central Limit Theorem Statement

The central limit theorem states that whenever a random sample of size n is taken from any distribution with mean μ and finite variance σ², the sample mean will be approximately normally distributed with mean μ and variance σ²/n. The larger the sample size, the better the normal approximation.

Assumptions of the Central Limit Theorem

  • The sample should be drawn randomly following the condition of randomisation.
  • The samples drawn should be independent of each other. They should not influence the other samples.
  • When the sampling is done without replacement, the sample size shouldn’t exceed 10% of the total population.
  • The sample size should be sufficiently large.

The formula for the central limit theorem is given below:

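Consider \( x_1, x_2, x_3, \ldots, x_n \) independent and identically distributed with mean \( \mu \) and finite variance \( \sigma^2 \), and define the random variable \( Z_n \) as

\[ Z_n = \dfrac{\bar x - \mu}{\sigma / \sqrt{n}}, \qquad \bar x = \dfrac{1}{n} \sum_{i=1}^{n} x_i . \]

Then, the distribution function of \( Z_n \) converges to the standard normal distribution function as \( n \) increases without any bound.

To see why, define a random variable \( U_i \) by

\[ U_i = \dfrac{x_i - \mu}{\sigma}, \qquad E(U_i) = 0, \quad V(U_i) = 1, \]

so that \( Z_n = \dfrac{1}{\sqrt{n}} \sum_{i=1}^{n} U_i \).

Since the \( x_i \) are independent random variables, the \( U_i \) are also independent and identically distributed. Thus, the moment-generating function of \( Z_n \) can be written as

\[ M_{Z_n}(t) = \left[ M_U\!\left( \dfrac{t}{\sqrt{n}} \right) \right]^n , \]

where \( M_U \) is the common moment-generating function of the \( U_i \). (This argument assumes that \( M_U \) exists in a neighbourhood of \( t = 0 \); the general proof uses characteristic functions instead.)

As per Taylor series expansion, using \( E(U_i) = 0 \) and \( E(U_i^2) = 1 \):

\[ M_U\!\left( \dfrac{t}{\sqrt{n}} \right) = 1 + \dfrac{t^2}{2n} + o\!\left( \dfrac{1}{n} \right). \]

Taking logarithms, \( \ln M_{Z_n}(t) = n \ln M_U(t/\sqrt{n}) \). Multiply each term of the expansion by \( n \): as \( n \to \infty \), all terms but the first go to zero, so

\[ \lim_{n \to \infty} M_{Z_n}(t) = e^{t^2/2}, \]

which is the moment-generating function for a standard normal random variable.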
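The limit \( [M_U(t/\sqrt{n})]^n \to e^{t^2/2} \), where \( M_U \) is the moment-generating function of each standardized \( U_i \), can be checked numerically for a concrete choice of \( U_i \). The sketch assumes \( U_i \) takes the values \( +1 \) and \( -1 \) with probability \( 1/2 \) each (mean 0, variance 1), whose moment-generating function is \( \cosh t \). The helper name is hypothetical.

```python
import math


def mgf_of_standardized_mean(t, n):
    """[M_U(t / sqrt(n))]**n for U = +1 or -1 with equal probability (M_U = cosh)."""
    return math.cosh(t / math.sqrt(n)) ** n


t = 1.0
target = math.exp(t * t / 2)  # MGF of the standard normal distribution at t
for n in (1, 10, 100, 10_000):
    # The values approach the standard normal MGF as n grows.
    print(n, mgf_of_standardized_mean(t, n), target)
```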

The steps used to solve central limit theorem problems involving ‘>’, ‘<’ or “between” are as follows:

1) Identify from the problem the mean, population size, standard deviation and sample size, together with the number associated with “greater than” or “less than”, or the two numbers that bound a “between” range.

2) Draw a graph with the mean at its centre.

3) Use the formula \( z = \dfrac{\bar x - \mu}{\sigma/\sqrt{n}} \) to find the z-score.

4) Look up the z-score from the previous step in the z-table, which gives the area between 0 and z.

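5) Case 1: Central limit theorem involving “>”.

Subtract the table area from 0.5 (if the z-score is negative, add the table area to 0.5 instead).

Case 2: Central limit theorem involving “<”.

Add 0.5 to the table area (if the z-score is negative, subtract the table area from 0.5 instead).

Case 3: Central limit theorem involving “between”.

Execute step 3 for both values of x̄ and look up both table areas. Add the two areas if the z-scores have opposite signs; subtract the smaller area from the larger if they have the same sign.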

6) The z-value is found along with the x-bar.

The last step is common to all three cases, that is, to convert the decimal obtained into a percentage.
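The steps above can be sketched in Python. To mirror the table-based procedure, the hypothetical helper `table_area` returns the area between 0 and |z| that a typical z-table lists, and each case applies the add/subtract rule, including the sign handling for negative z-scores. None of these function names come from a library; `statistics.NormalDist` stands in for the printed table.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_score(x_bar, mu, sigma, n):
    """Step 3: z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / math.sqrt(n))


def table_area(z):
    """Step 4: area between 0 and |z|, as listed in a typical z-table."""
    return _STANDARD_NORMAL.cdf(abs(z)) - 0.5


def prob_greater(x_bar, mu, sigma, n):
    """Case 1 (">"): 0.5 minus the table area for z >= 0, 0.5 plus it for z < 0."""
    z = z_score(x_bar, mu, sigma, n)
    return 0.5 - table_area(z) if z >= 0 else 0.5 + table_area(z)


def prob_less(x_bar, mu, sigma, n):
    """Case 2 ("<"): 0.5 plus the table area for z >= 0, 0.5 minus it for z < 0."""
    z = z_score(x_bar, mu, sigma, n)
    return 0.5 + table_area(z) if z >= 0 else 0.5 - table_area(z)


def prob_between(low, high, mu, sigma, n):
    """Case 3 ("between"): add the areas for opposite signs, subtract for the same sign."""
    z1, z2 = z_score(low, mu, sigma, n), z_score(high, mu, sigma, n)
    if z1 * z2 < 0:
        return table_area(z1) + table_area(z2)
    return abs(table_area(z2) - table_area(z1))


# Final step: express the decimal as a percentage
# (values from Example 1 earlier on this page: mu = 20, sigma = 4, n = 64).
print(f"{prob_less(19, 20, 4, 64):.2%}")
```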

Examples on Central Limit Theorem

Example 1: Twenty students are selected at random from a clinical psychology class. If the average GPA scored by the entire batch is 4.91 and the standard deviation is 0.72, find the probability that their mean GPA is more than 5.

Population mean = μ = 4.91

Population standard deviation= σ = 0.72

Sample size = n = 20 (which is less than 30)

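Since the sample size is smaller than 30, this solution uses the t-score (with n - 1 degrees of freedom) instead of the z-score. Strictly speaking, when the population standard deviation is known and the GPAs are approximately normally distributed, the z-score could be used instead; it gives a similar answer, P(Z > 0.559) ≈ 0.288.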

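Substituting the values, the standard deviation of the sample mean is

\( \sigma_{\bar x} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{0.72}{\sqrt{20}} \approx 0.161 \)

For this problem, the raw score is x̄ = 5. Now, find the t-score:

\( t = \dfrac{\bar x - \mu}{\sigma_{\bar x}} = \dfrac{5 - 4.91}{0.161} \approx 0.559 \)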

Find the probability for the t-score using the t-table. The degrees of freedom here are:

df = 20 - 1 = 19

P(t ≤ 0.559) = 0.7087

P(t > 0.559) = 1 – 0.7087 = 0.2913

Thus, the probability that the mean GPA is more than 5 is about 29.13%.
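Without a printed t-table, the t-probability can be approximated numerically. This sketch is an illustration, not the article's method: `t_pdf` and `t_cdf` are hypothetical helpers that integrate the Student t density with Simpson's rule (valid here only for non-negative t), and `statistics.NormalDist` gives the z-score version for comparison. With 19 degrees of freedom it should reproduce the table values P(t ≤ 0.559) ≈ 0.7087 and P(t > 0.559) ≈ 0.2913 to about three decimal places.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus the integral of the density from 0 to x,
    using Simpson's rule with an even number of steps."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


mu, sigma, n, x_bar = 4.91, 0.72, 20, 5
se = sigma / math.sqrt(n)                  # about 0.161
t_score = (x_bar - mu) / se                # about 0.559

p_t = 1 - t_cdf(t_score, df=n - 1)         # t-distribution, df = 19
p_z = 1 - NormalDist().cdf(t_score)        # z-score, treating sigma as known
print(f"t = {t_score:.3f}, P(T > t) = {p_t:.4f}, P(Z > z) = {p_z:.4f}")
```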

Example 2: The average weight of a water bottle is 30 kg, with a standard deviation of 1.5 kg. If a sample of 45 water bottles is selected at random from a consignment and their weights are measured, find the probability that the mean weight of the sample is less than 28 kg.

Population mean: μ = 30 kg

Population standard deviation: σ = 1.5 kg

Sample size: n = 45 (which is greater than 30)

Using the z-score, we first need the standard deviation of the sample mean (the standard error):

\(\begin{array}{l}\sigma_{\bar x}=\frac{\sigma}{\sqrt{n}}=\frac{1.5}{\sqrt{45}}\end{array} \) ≈ 0.2236 kg

Find the z-score for the sample mean x̄ = 28 kg:

z = (28 - 30)/0.2236 ≈ -8.944

Using the z-score table OR normal CDF function on a statistical calculator,

P(z < -8.944) ≈ 1.9 × 10⁻¹⁹

Thus, the probability that the mean weight of the sample is less than 28 kg is practically 0%.
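A direct computation is a useful check here, because the standard error is \( \sigma/\sqrt{n} = 1.5/\sqrt{45} \approx 0.2236 \) kg, so \( z = (28 - 30)/0.2236 \approx -8.94 \) lies far in the left tail. This sketch writes the normal CDF as \( \Phi(z) = \tfrac{1}{2}\operatorname{erfc}(-z/\sqrt{2}) \) using `math.erfc`, which stays accurate for such extreme z-scores.

```python
import math

mu, sigma, n, x_bar = 30, 1.5, 45, 28     # kilograms
se = sigma / math.sqrt(n)                 # standard error, about 0.2236 kg
z = (x_bar - mu) / se                     # about -8.944

# Left-tail probability Phi(z), written with erfc to stay accurate far in the tail.
p_less = 0.5 * math.erfc(-z / math.sqrt(2))
print(f"se = {se:.4f}, z = {z:.3f}, P(X̄ < 28) ≈ {p_less:.2e}")
```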

Example 3: The record of weights of the female population follows a normal distribution. Its mean and standard deviation are 65 kg and 14 kg, respectively. If a researcher considers the records of 50 females, what would be the standard deviation of the sample mean (the standard error)?

Mean of the population μ = 65 kg

The standard deviation of the population = 14 kg

Sample size n = 50

The standard deviation of the sample mean is given by \(\begin{array}{l}\sigma _{\bar{x}}= \frac{\sigma }{\sqrt{n}}\end{array} \)

= 14/√50 = 14/7.071 ≈ 1.98 kg

Applications of Central Limit Theorem

1] According to the central limit theorem, the sampling distribution of the sample mean can be assumed to be approximately normal even when the population distribution is unknown or not normal, provided the sample is large enough. This helps in data analysis.

2] The standard deviation of the sample mean decreases as the sample size increases, which helps in estimating the mean of the population more accurately.

3] The sample mean is used to create a range of values which likely includes the population mean.

4] The concept of the central limit theorem is used in election polls to estimate the percentage of people supporting a particular candidate as confidence intervals.

5] CLT is used in calculating the mean family income in a particular country.

6] The sum of the results of rolling many identical, unbiased dice is approximately normally distributed.

7] The probability distribution of the net displacement after many steps of a random walk approaches a normal distribution.

8] Flipping many coins results in an approximately normal distribution for the total number of heads (or, equivalently, the total number of tails).

9] By comparing a sample mean with the sampling distribution predicted by the CLT, one can judge whether the sample plausibly comes from a particular population.
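The dice application in item 6] can be checked exactly, because the distribution of the total of a few dice can be enumerated. The sketch below builds the exact distribution of the total of 10 fair dice by repeated convolution and compares P(total ≤ 35) with a normal approximation that uses mean \( 3.5n \) and variance \( 35n/12 \). It applies a continuity correction of 0.5, a standard adjustment for discrete totals that the article does not mention; the number of dice and the threshold are illustrative choices.

```python
import math
from statistics import NormalDist


def dice_sum_distribution(num_dice):
    """Exact probability of each total for num_dice fair six-sided dice,
    built by convolving with one die at a time."""
    dist = {0: 1.0}
    for _ in range(num_dice):
        new = {}
        for total, prob in dist.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0.0) + prob / 6
        dist = new
    return dist


num_dice = 10
dist = dice_sum_distribution(num_dice)
mean = 3.5 * num_dice                    # each die has mean 3.5
sd = math.sqrt(num_dice * 35 / 12)       # each die has variance 35/12

exact = sum(prob for total, prob in dist.items() if total <= 35)
# Normal approximation, with a continuity correction for the discrete total.
approx = NormalDist(mean, sd).cdf(35.5)
print(f"exact P(total <= 35) = {exact:.4f}, normal approximation = {approx:.4f}")
```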

Frequently Asked Questions

How do you determine the standard error of the mean?

The central limit theorem states that, for any population with mean μ and standard deviation σ, the distribution of the sample mean for sample size n has mean μ and standard deviation σ/√n. To determine the standard error of the mean, divide the population standard deviation by the square root of the sample size.

What are the properties of the central limit theorem?

We can summarise the properties of the central limit theorem for sample means with the following statements: 1. Sampling is from any population distribution with mean μ and finite standard deviation σ. 2. Provided that n is large (n ≥ 30, as a rule of thumb), the sampling distribution of the sample mean will be approximately a normal distribution with mean μ and standard deviation σ/√n. 3. If the population distribution is normal, the sampling distribution of the sample means will be an exact normal distribution for any sample size.

Give an example where the central limit theorem is used in real life.

Biologists use the central limit theorem when they use data from a sample of organisms to make conclusions about the overall population of organisms.

Give the formula for the central limit theorem.

The central limit theorem for sample means uses the standardized sample mean Z = (x̄ - μ)/(σ/√n), which is approximately standard normal for large n.

Central Limit Theorem


The Central Limit Theorem in statistics states that, for a population with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size increases, irrespective of the shape of the population distribution. How large the sample must be depends on the population, but a common rule of thumb is that a sample size greater than 30 is large enough for the approximation to be useful.

In this article on the Central Limit Theorem, we will learn about its definition, an example, the Central Limit Theorem formula, its proof, and its applications.

What is Central Limit Theorem?

The Central Limit Theorem explains that the sampling distribution of the sample mean resembles the normal distribution whether or not the variables themselves are normally distributed. The Central Limit Theorem is often abbreviated as CLT.

Central Limit Theorem Definition

The Central Limit Theorem states that when large samples (usually of size greater than thirty) are taken, the distribution of the sample arithmetic mean approaches the normal distribution, whether or not the random variables were originally normally distributed.

Central Limit Theorem Explanation

Suppose we repeatedly draw samples of observations, where each observation is produced randomly and independently of the other observations. Calculating the average of each sample gives a collection of sample averages. According to the Central Limit Theorem, if the sample size is large enough, the probability distribution of these sample averages approximates a normal distribution.
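The explanation above can be checked with a small simulation. This sketch assumes Python 3.8+ (for `statistics.fmean`) and a skewed exponential population with mean 2, chosen purely for illustration; the helper `sample_means` is hypothetical, not a library function. It draws 5,000 samples of size 50 and checks that their means centre on μ, spread like σ/√n, and put roughly 68% of their values within one standard error, as a normal distribution would.

```python
import random
import statistics

def sample_means(draw, n, num_samples, seed=0):
    """Return the means of num_samples independent samples of size n.

    draw(rng) must return one observation from the population; this
    helper and its parameters are illustrative, not a library API.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(num_samples)]

# Assumed population: exponential with mean 2, so sigma = 2 (strongly skewed).
mu, sigma, n = 2.0, 2.0, 50
means = sample_means(lambda rng: rng.expovariate(1 / mu), n, num_samples=5000)

se = sigma / n ** 0.5
within_one_se = sum(abs(m - mu) < se for m in means) / len(means)

print(statistics.fmean(means))   # should be close to mu = 2
print(statistics.stdev(means))   # should be close to sigma / sqrt(n), about 0.283
print(within_one_se)             # should be close to 0.68, as for a normal curve
```

Even though individual exponential values are far from normal, the sample means behave like a normal variable with the predicted centre and spread.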

Assumptions of Central Limit Theorem

Central Limit Theorem is valid for the following conditions:

  • The sample must be drawn from the population at random.
  • The sample values must be independent of each other.
  • When sampling is done without replacement, the sample size should not exceed ten percent of the total population.
  • The sample size should be adequately large.
  • The CLT only holds for populations with finite variance.

Central Limit Theorem Formula

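Let X be a random variable with mean μ and standard deviation σ. According to the Central Limit Theorem, for samples of size n the sample mean \( \overline{X} \) is approximately normally distributed,

\[ \overline{X} \sim N\left(\mu, \dfrac{\sigma}{\sqrt n}\right), \]

and the z-score of \( \overline{X} \) is

\[ Z = \dfrac{\overline{X} - \mu}{\frac{\sigma}{\sqrt n}} \]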
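The two formulas above translate directly into code. This is a minimal sketch, assuming Python 3.8+ so that `statistics.NormalDist` is available; the function names are illustrative, and the numbers (μ = 20, σ = 4, n = 64, observed sample mean 19) are just an example.

```python
import math
from statistics import NormalDist

def sampling_distribution_of_mean(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean for samples of size n."""
    return mu, sigma / math.sqrt(n)

def z_score_of_mean(x_bar, mu, sigma, n):
    """Standardize an observed sample mean: Z = (x_bar - mu) / (sigma / sqrt(n))."""
    _, se = sampling_distribution_of_mean(mu, sigma, n)
    return (x_bar - mu) / se

z = z_score_of_mean(19, 20, 4, 64)
print(z)                              # -2.0
print(round(NormalDist().cdf(z), 4))  # 0.0228, the area to the left of Z
```

Here `NormalDist().cdf` (the standard normal) plays the role of the Z table.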

Central Limit Theorem Proof

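Let the independent random variables \( X_1, X_2, X_3, \dots, X_n \) be identically distributed with mean zero (\( \mu = 0 \)) and variance one (\( \sigma^2 = 1 \)). Any population with finite variance can be reduced to this case by standardizing. Define

\[ Z = \dfrac{\overline X - \mu}{\frac{\sigma}{\sqrt n}} = \dfrac{X_1 + X_2 + \dots + X_n}{\sqrt n} \]

According to the Central Limit Theorem, Z approaches the standard normal distribution as n increases.

Let \( M(t) = E\left[e^{tX_i}\right] \) be the moment generating function of \( X_i \), assumed to exist for t near 0. Then

⇒ \( M(0) = 1 \)

⇒ \( M'(0) = E(X_i) = \mu = 0 \)

⇒ \( M''(0) = E(X_i^2) = 1 \)

The moment generating function of \( X_i/\sqrt n \) is \( E\left[e^{tX_i/\sqrt n}\right] = M(t/\sqrt n) \).

Since \( X_1, X_2, X_3, \dots, X_n \) are independent, the moment generating function of \( (X_1 + X_2 + X_3 + \dots + X_n)/\sqrt n \) is \( \left[M(t/\sqrt n)\right]^n \).

Now define the function

\[ f(t) = \log M(t) \]

⇒ \( f(0) = \log M(0) = 0 \)

⇒ \( f'(0) = \dfrac{M'(0)}{M(0)} = \mu = 0 \)

⇒ \( f''(0) = \dfrac{M(0)M''(0) - M'(0)^2}{M(0)^2} = 1 \)

Then

\[ \left[M(t/\sqrt n)\right]^n = \left[e^{f(t/\sqrt n)}\right]^n = e^{n f(t/\sqrt n)} \]

Writing \( h = 1/\sqrt n \) and applying L'Hôpital's rule twice (the ratio has the form 0/0 both times because \( f(0) = f'(0) = 0 \)):

\[ \lim_{n \to \infty} n f(t/\sqrt n) = \lim_{h \to 0} \dfrac{f(th)}{h^2} = \lim_{h \to 0} \dfrac{t f'(th)}{2h} = \lim_{h \to 0} \dfrac{t^2 f''(th)}{2} = \dfrac{t^2}{2} \]

⇒ \( \left[M(t/\sqrt n)\right]^n = e^{n f(t/\sqrt n)} \to e^{t^2/2} \) as \( n \to \infty \)

Since \( e^{t^2/2} \) is the moment generating function of the standard normal distribution, the distribution of Z converges to the standard normal distribution, which proves the Central Limit Theorem.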
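The key limit in the proof, \( n f(t/\sqrt n) \to t^2/2 \), can be checked numerically. The sketch below assumes a concrete distribution that is not part of the proof: X = +1 or -1 with probability 1/2 each, which has mean 0, variance 1 and moment generating function M(t) = cosh t. The function names are hypothetical.

```python
import math

def log_mgf(t):
    """f(t) = log M(t) for X = +1 or -1 with probability 1/2 each (M(t) = cosh t)."""
    return math.log(math.cosh(t))

def scaled_log_mgf(t, n):
    """n * f(t / sqrt(n)); its exponential is the MGF of (X1 + ... + Xn) / sqrt(n)."""
    return n * log_mgf(t / math.sqrt(n))

t = 1.0
for n in (1, 10, 100, 10_000, 1_000_000):
    # The values increase towards t**2 / 2 = 0.5 as n grows.
    print(n, scaled_log_mgf(t, n))
```

For this distribution, \( n \log\cosh(t/\sqrt n) = t^2/2 - t^4/(12n) + \dots \), so the gap to \( t^2/2 \) shrinks roughly like 1/n.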

Steps to Solve Problems on Central Limit Theorem

Problems on the Central Limit Theorem that involve >, <, or 'between' can be solved with the following steps:

Step 1: Identify whether the problem asks about >, <, or 'between' (a range of two numbers), and note the sample size, population size, mean, and variance given in the problem.

Step 2: Draw a graph of the normal curve with the mean at the centre.

Step 3: Find the Z-score using the formula \( Z = \dfrac{\bar x - \mu}{\sigma/\sqrt n} \).

Step 4: Look up the Z-score obtained in the previous step in the Z table.

Step 5: If the table gives the area between the mean and Z, then, for a positive Z: if the problem involves '>', subtract the table value from 0.5; if the problem involves '<', add 0.5 to the table value; if the problem involves 'between', perform only steps 3 and 4, for both numbers, and combine the two areas.

Step 6: The Z score value is found along

Step 7: Convert the decimal value obtained in all three cases to a percentage.
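The steps above can be sketched in Python. This assumes Python 3.8+ for `statistics.NormalDist`, whose `cdf` gives the whole area to the left of a value, so the table look-up and the 0.5 adjustment in Steps 4 and 5 collapse into one subtraction. The function `clt_probability` and the example values (μ = 20, σ = 4, n = 64) are illustrative.

```python
import math
from statistics import NormalDist

def clt_probability(mu, sigma, n, lower=None, upper=None):
    """Approximate P(lower < X̄ < upper) for the mean of n observations.

    Give only `upper` for a '<' question, only `lower` for a '>' question,
    and both for a 'between' question.
    """
    # Step 3: the sample mean is approximately N(mu, sigma / sqrt(n)).
    sampling = NormalDist(mu, sigma / math.sqrt(n))
    # Steps 4-5: cdf() returns the whole area to the left of a value,
    # so no separate 0.5 adjustment is needed.
    left = sampling.cdf(lower) if lower is not None else 0.0
    right = sampling.cdf(upper) if upper is not None else 1.0
    return right - left

print(round(clt_probability(20, 4, 64, upper=19), 4))            # P(X̄ < 19)
print(round(clt_probability(20, 4, 64, lower=19), 4))            # P(X̄ > 19)
print(round(clt_probability(20, 4, 64, lower=19, upper=21), 4))  # P(19 < X̄ < 21)
```

Passing `None` for a missing bound treats that side as unbounded, which covers the '<', '>' and 'between' cases with one function.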

Central Limit Theorem Application

The Central Limit Theorem is generally used to predict the characteristics of a population from a set of samples. It can be applied in many fields. Some of its applications are listed below:

  • Economists and data scientists use the Central Limit Theorem to draw conclusions about a population when building statistical models.
  • Biologists use the Central Limit Theorem to make accurate predictions about the characteristics of a population from a sample.
  • Manufacturers use the Central Limit Theorem to estimate the overall number of defective items produced by inspecting randomly selected products.
  • Surveys use the Central Limit Theorem to predict the characteristics of a population, or its average response, by analyzing a sample of the responses obtained.
  • In machine learning, the CLT can be used to draw conclusions about the performance of a model.

Solved Examples on Central Limit Theorem

Example 1. The weights of the male population follow a normal distribution with a mean of 70 kg and a standard deviation of 15 kg. If a researcher looks at the records of a sample of 50 men, what are the mean and standard deviation of the sample mean?

Given: μ = 70 kg, σ = 15 kg, n = 50. As per the Central Limit Theorem, the mean of the sample mean equals the population mean. Hence, \( \mu_{\bar X} = \mu = 70 \) kg. Now, \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{15}{\sqrt{50}} \approx 2.12 \) kg.

Example 2. A distribution has a mean of 69 and a standard deviation of 420. Find the mean and standard deviation if a sample of 80 is drawn from the distribution.

Given: μ = 69, σ = 420, n = 80. As per the Central Limit Theorem, the mean of the sample mean equals the population mean. Hence, \( \mu_{\bar X} = \mu = 69 \). Now, \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{420}{\sqrt{80}} \approx 46.96 \).

Example 3. The mean age of people in a colony is 34 years. Suppose the standard deviation is 15 years and the sample size is 50. Find the mean and standard deviation of the sample mean.

Given: μ = 34, σ = 15, n = 50. As per the Central Limit Theorem, the mean of the sample mean equals the population mean. Hence, \( \mu_{\bar X} = \mu = 34 \) years. Now, \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{15}{\sqrt{50}} \approx 2.12 \) years.

Example 4. The mean age of cigarette smokers is 35 years. Suppose the standard deviation is 10 years and the sample size is 39. Find the mean and standard deviation of the sample mean.

Given: μ = 35, σ = 10, n = 39. As per the Central Limit Theorem, the mean of the sample mean equals the population mean. Hence, \( \mu_{\bar X} = \mu = 35 \) years. Now, \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{10}{\sqrt{39}} \approx 1.601 \) years.

Example 5. The mean time taken to read a newspaper is 8.2 minutes. Suppose the standard deviation is one minute and the sample size is 70. Find the mean and standard deviation of the sample mean.

Given: μ = 8.2, σ = 1, n = 70. As per the Central Limit Theorem, the mean of the sample mean equals the population mean. Hence, \( \mu_{\bar X} = \mu = 8.2 \) minutes. Now, \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{1}{\sqrt{70}} \approx 0.12 \) minutes.

Example 6. A distribution has a mean of 12 and a standard deviation of 3. Find the mean and standard deviation if a sample of 36 is drawn from the distribution.

Given: μ = 12, σ = 3, n = 36. As per the Central Limit Theorem, the mean of the sample mean equals the population mean. Hence, \( \mu_{\bar X} = \mu = 12 \). Now, \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{3}{\sqrt{36}} = 0.5 \).

Example 7. A distribution has a mean of 4 and a standard deviation of 5. Find the mean and standard deviation if a sample of 25 is drawn from the distribution.

Given: μ = 4, σ = 5, n = 25. As per the Central Limit Theorem, the mean of the sample mean equals the population mean. Hence, \( \mu_{\bar X} = \mu = 4 \). Now, \( \sigma_{\bar X} = \dfrac{\sigma}{\sqrt n} = \dfrac{5}{\sqrt{25}} = 1 \).
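The seven solved examples above all apply the same two formulas, \( \mu_{\bar X} = \mu \) and \( \sigma_{\bar X} = \sigma/\sqrt n \). A short loop, sketched below in Python, reproduces the standard deviations; the dictionary layout is just one convenient choice.

```python
import math

# (mu, sigma, n) for the seven solved examples above.
examples = {
    1: (70, 15, 50),
    2: (69, 420, 80),
    3: (34, 15, 50),
    4: (35, 10, 39),
    5: (8.2, 1, 70),
    6: (12, 3, 36),
    7: (4, 5, 25),
}

for k, (mu, sigma, n) in examples.items():
    se = sigma / math.sqrt(n)  # standard deviation of the sample mean
    print(f"Example {k}: mean = {mu}, standard deviation = {se:.3f}")
```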

FAQs on Central Limit Theorem

1. What is the Central Limit Theorem in statistics?

The Central Limit Theorem in statistics states that whenever we take a large sample from a population, the distribution of the sample mean approximates the normal distribution.

2. When does Central Limit Theorem apply?

The Central Limit Theorem applies when the sample size is large, usually greater than 30.

3. Why is Central Limit Theorem important?

The Central Limit Theorem is important because it helps us make accurate predictions about a population just by analyzing a sample.

4. How to solve Central Limit Theorem?

Problems on the Central Limit Theorem can be solved by finding the Z-score, which is calculated using the formula \( Z = \dfrac{\bar x - \mu}{\sigma/\sqrt n} \). The detailed process is discussed under the heading “Steps to Solve Problems on Central Limit Theorem”.

5. What is Moment Generating Function?

A moment generating function encodes the moments of a random variable in a single function. It is the expectation of a function of the random variable, \( M(t) = E\left[e^{tX}\right] \). It acts as an alternative to the probability distribution function and the cumulative distribution function of a random variable.


7.3 Using the Central Limit Theorem

It is important for you to understand when to use the central limit theorem (clt). If you are being asked to find the probability of a mean, use the clt for the mean. If you are being asked to find the probability of a sum or total, use the clt for sums. This also applies to percentiles for means and sums.

If you are being asked to find the probability of an individual value, do not use the clt. Use the distribution of its random variable.

Examples of the Central Limit Theorem

Law of large numbers.

The law of large numbers says that if you take samples of larger and larger size from any population, then the mean \( \bar x \) of the sample tends to get closer and closer to μ. From the central limit theorem, we know that as n gets larger and larger, the sample means follow a normal distribution. The larger n gets, the smaller the standard deviation gets. (Remember that the standard deviation for \( \bar X \) is \( \dfrac{\sigma}{\sqrt n} \).) This means that the sample mean \( \bar x \) is very likely to be close to the population mean μ. We can say that μ is the value that the sample means approach as n gets larger. The central limit theorem illustrates the law of large numbers.
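The law of large numbers can be seen in a short simulation. This sketch assumes a population that is uniform on [1, 5] (mean 3), the same shape as the stress scores in Example 7.8, and a fixed random seed; it prints how far the sample mean lands from μ as n grows.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is reproducible
mu = 3.0                 # mean of the assumed population, uniform on [1, 5]

for n in (10, 100, 1_000, 10_000, 100_000):
    x_bar = statistics.fmean(rng.uniform(1, 5) for _ in range(n))
    print(f"n = {n:>7,}: sample mean = {x_bar:.4f}, distance from mu = {abs(x_bar - mu):.4f}")
```

The distance is random for any single run, but its typical size shrinks like \( \sigma/\sqrt n \), which is the link to the central limit theorem described above.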

Central Limit Theorem for the Mean and Sum Examples

Example 7.8.

A study involving stress is conducted among the students on a college campus. The stress scores follow a uniform distribution with the lowest stress score equal to one and the highest equal to five. Using a sample of 75 students, find:

  • The probability that the mean stress score for the 75 students is less than two.
  • The 90th percentile for the mean stress score for the 75 students.
  • The probability that the total of the 75 stress scores is less than 200.
  • The 90th percentile for the total stress score for the 75 students.

Let X = one stress score.

Problems a and b ask you to find a probability or a percentile for a mean. Problems c and d ask you to find a probability or a percentile for a total or sum. The sample size, n, is equal to 75.

Since the individual stress scores follow a uniform distribution, X ~ U (1, 5) where a = 1 and b = 5 (See Continuous Random Variables for an explanation on the uniform distribution).

\( \mu_X = \dfrac{a + b}{2} = \dfrac{1 + 5}{2} = 3 \)

\( \sigma_X = \sqrt{\dfrac{(b - a)^2}{12}} = \sqrt{\dfrac{(5 - 1)^2}{12}} = 1.15 \)

For problems a and b, let \( \bar X \) = the mean stress score for the 75 students. Then,

\( \bar X \sim N\left(3, \dfrac{1.15}{\sqrt{75}}\right) \)

a. Find \( P(\bar x < 2) \). Draw the graph.

a. \( P(\bar x < 2) = 0 \)

The probability that the mean stress score is less than two is about zero.

normalcdf(1, 2, 3, 1.15/√75) = 0

The smallest stress score is one.

b. Find the 90th percentile for the mean of 75 stress scores. Draw a graph.

b. Let k = the 90th percentile.

Find k, where \( P(\bar x < k) = 0.90 \).

The 90th percentile for the mean of 75 scores is about 3.2. This tells us that 90% of all the means of 75 stress scores are at most 3.2, and that 10% are at least 3.2.

invNorm(0.90, 3, 1.15/√75) = 3.2

For problems c and d, let ΣX = the sum of the 75 stress scores. Then, \( \Sigma X \sim N\left[(75)(3), (\sqrt{75})(1.15)\right] \)

c. Find P ( Σx < 200). Draw the graph.

c. The mean of the sum of 75 stress scores is (75)(3) = 225

The standard deviation of the sum of 75 stress scores is \( (\sqrt{75})(1.15) = 9.96 \)

P(Σx < 200) ≈ 0.0060

The probability that the total of 75 scores is less than 200 is very small, about 0.006.

normalcdf(75, 200, (75)(3), (√75)(1.15)) ≈ 0.0060

The smallest total of 75 stress scores is 75, because the smallest single score is one.

d. Find the 90th percentile for the total of 75 stress scores. Draw a graph.

d. Let k = the 90th percentile.

Find k where P(Σx < k) = 0.90.

The 90th percentile for the sum of 75 scores is about 237.8. This tells us that 90% of all the sums of 75 scores are no more than 237.8 and 10% are no less than 237.8.

invNorm(0.90, (75)(3), (√75)(1.15)) = 237.8
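The four answers to Example 7.8 can be reproduced with Python's `statistics.NormalDist` (Python 3.8+), whose `cdf` and `inv_cdf` methods play the roles of normalcdf and invNorm. This is a sketch, not the book's method; it rounds σ to 1.15 as the text does, and it uses an unbounded lower tail where the text uses 75, which makes no visible difference here.

```python
import math
from statistics import NormalDist

a, b, n = 1, 5, 75
mu = (a + b) / 2                                # 3.0
sigma = round(math.sqrt((b - a) ** 2 / 12), 2)  # 1.15, rounded as in the text

mean_dist = NormalDist(mu, sigma / math.sqrt(n))     # X̄ ~ N(3, 1.15/√75)
sum_dist = NormalDist(n * mu, math.sqrt(n) * sigma)  # ΣX ~ N(225, √75 · 1.15)

print(mean_dist.cdf(2))                   # a. P(X̄ < 2), essentially zero
print(round(mean_dist.inv_cdf(0.90), 1))  # b. 90th percentile of X̄
print(round(sum_dist.cdf(200), 4))        # c. P(ΣX < 200)
print(round(sum_dist.inv_cdf(0.90), 1))   # d. 90th percentile of ΣX
```

The only difference between the mean and the sum is the choice of centre and spread: \( \sigma/\sqrt n \) for the mean, \( \sqrt n\,\sigma \) for the sum.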

Use the information in Example 7.8 , but use a sample size of 55 to answer the following questions.

  • Find \( P(\bar x < 7) \).
  • Find P(Σx > 170).
  • Find the 80th percentile for the mean of 55 scores.
  • Find the 85th percentile for the sum of 55 scores.

Example 7.9

Suppose that a market research analyst for a cell phone company conducts a study of their customers who exceed the time allowance included on their basic cell phone contract; the analyst finds that for those people who exceed the time included in their basic contract, the excess time used follows an exponential distribution with a mean of 22 minutes.

Consider a random sample of 80 customers who exceed the time allowance included in their basic cell phone contract.

Let X = the excess time used by one INDIVIDUAL cell phone customer who exceeds his contracted time allowance.

\( X \sim Exp\left(\dfrac{1}{22}\right) \). From previous chapters, we know that μ = 22 and σ = 22.

Let \( \bar X \) = the mean excess time used by a sample of n = 80 customers who exceed their contracted time allowance.

\( \bar X \sim N\left(22, \dfrac{22}{\sqrt{80}}\right) \) by the central limit theorem for sample means

Using the clt to find probability

  • Find the probability that the mean excess time used by the 80 customers in the sample is longer than 20 minutes. This is asking us to find \( P(\bar x > 20) \). Draw the graph.
  • Suppose that one customer who exceeds the time limit for his cell phone contract is randomly selected. Find the probability that this individual customer's excess time is longer than 20 minutes. This is asking us to find P(x > 20).
  • Explain why the probabilities in parts a and b are different.

Find: \( P(\bar x > 20) \)

\( P(\bar x > 20) = 0.7919 \) using normalcdf(20, 1E99, 22, 22/√80)

The probability is 0.7919 that the mean excess time used is more than 20 minutes, for a sample of 80 customers who exceed their contracted time allowance.

1E99 = 10^99 and –1E99 = –10^99. Press the EE key for E. Or just use 10^99 instead of 1E99.

Find P(x > 20). Remember to use the exponential distribution for an individual: \( X \sim Exp\left(\dfrac{1}{22}\right) \). \( P(x > 20) = e^{-\left(\frac{1}{22}\right)(20)} \) or \( e^{-0.04545(20)} = 0.4029 \)

  • \( P(x > 20) = 0.4029 \) but \( P(\bar x > 20) = 0.7919 \)
  • The probabilities are not equal because we use different distributions to calculate the probability for individuals and for means.
  • When asked to find the probability of an individual value, use the stated distribution of its random variable; do not use the clt. Use the clt with the normal distribution when you are being asked to find the probability for a mean.
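The contrast between parts a and b can be sketched in Python (3.8+ for `statistics.NormalDist`): the individual probability uses the exponential survival function \( e^{-x/\mu} \), while the probability for the mean uses the normal approximation from the central limit theorem.

```python
import math
from statistics import NormalDist

mu = sigma = 22      # exponential: mean and standard deviation are both 22 minutes
n = 80

# Individual customer: use the exponential distribution, P(X > x) = exp(-x / mu).
p_individual = math.exp(-20 / mu)

# Mean of 80 customers: use the CLT approximation N(22, 22 / sqrt(80)).
p_mean = 1 - NormalDist(mu, sigma / math.sqrt(n)).cdf(20)

print("P(x > 20)  =", round(p_individual, 4))
print("P(x̄ > 20) =", round(p_mean, 4))
```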

Using the clt to find percentiles

Let k = the 95th percentile. Find k where \( P(\bar x < k) = 0.95 \)

k = 26.0 using invNorm(0.95, 22, 22/√80) = 26.0

The 95th percentile for the sample mean excess time used is about 26.0 minutes for random samples of 80 customers who exceed their contractual allowed time.

Ninety-five percent of such samples would have means under 26 minutes; only five percent of such samples would have means above 26 minutes.
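The percentile calculation is the inverse of the probability calculation. In this sketch, `NormalDist.inv_cdf` (Python 3.8+) stands in for invNorm, and the last line checks that the cumulative probability at k comes back to 0.95.

```python
import math
from statistics import NormalDist

# Sampling distribution of the mean excess time for samples of 80 customers.
mean_dist = NormalDist(22, 22 / math.sqrt(80))

k = mean_dist.inv_cdf(0.95)        # counterpart of invNorm(0.95, 22, 22/√80)
print(round(k, 1))                 # 95th percentile, in minutes
print(round(mean_dist.cdf(k), 4))  # cumulative probability at k
```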

Use the information in Example 7.9 , but change the sample size to 144.

  • Find \( P(20 < \bar x < 30) \).
  • Find P(Σx is at least 3,000).
  • Find the 75th percentile for the sample mean excess time of 144 customers.
  • Find the 85th percentile for the sum of 144 excess times used by customers.

Example 7.10

In the United States, a robbery occurs every two minutes, on average, according to a number of studies. Suppose the standard deviation is 0.5 minutes and the sample size is 100.

  • Find the median, the first quartile, and the third quartile for the sample mean time of robberies in the United States.
  • Find the median, the first quartile, and the third quartile for the sum of sample times of robberies in the United States.
  • Find the probability that a robbery occurs on the average between 1.75 and 1.85 minutes.
  • Find the value that is two standard deviations above the sample mean.
  • Find the IQR for the sum of the sample times.

Solution

Here \( \sigma_{\bar x} = \dfrac{0.5}{\sqrt{100}} = 0.05 \) and \( \sigma_{\Sigma x} = (\sqrt{100})(0.5) = 5 \).

  • 50th percentile = \( \mu_{\bar x} = \mu = 2 \)
  • 25th percentile = invNorm(0.25, 2, 0.05) = 1.97
  • 75th percentile = invNorm(0.75, 2, 0.05) = 2.03
  • 50th percentile = \( \mu_{\Sigma x} = n(\mu_x) = 100(2) = 200 \)
  • 25th percentile = invNorm(0.25, 200, 5) = 196.63
  • 75th percentile = invNorm(0.75, 200, 5) = 203.37
  • \( P(1.75 < \bar x < 1.85) \) = normalcdf(1.75, 1.85, 2, 0.05) = 0.0013
  • Using the z-score equation, \( z = \dfrac{\bar x - \mu_{\bar x}}{\sigma_{\bar x}} \), with z = 2 and solving for \( \bar x \), we have \( \bar x = 2(0.05) + 2 = 2.1 \)
  • The IQR is 75th percentile – 25th percentile = 203.37 – 196.63 = 6.74
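All five parts of Example 7.10 come from two normal distributions, one for the sample mean, N(2, 0.05), and one for the sum, N(200, 5). A minimal sketch using `statistics.NormalDist` (Python 3.8+):

```python
import math
from statistics import NormalDist

mu, sigma, n = 2, 0.5, 100
mean_dist = NormalDist(mu, sigma / math.sqrt(n))     # N(2, 0.05)
sum_dist = NormalDist(n * mu, math.sqrt(n) * sigma)  # N(200, 5)

mean_quartiles = [mean_dist.inv_cdf(p) for p in (0.25, 0.50, 0.75)]
sum_quartiles = [sum_dist.inv_cdf(p) for p in (0.25, 0.50, 0.75)]
p_between = mean_dist.cdf(1.85) - mean_dist.cdf(1.75)
two_sd_above = mu + 2 * mean_dist.stdev
iqr_sum = sum_quartiles[2] - sum_quartiles[0]

print([round(q, 2) for q in mean_quartiles])  # Q1, median, Q3 of the sample mean
print([round(q, 2) for q in sum_quartiles])   # Q1, median, Q3 of the sum
print(round(p_between, 4), round(two_sd_above, 2), round(iqr_sum, 2))
```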

Try It 7.10

Based on data from the National Health Survey, females between the ages of 18 and 24 have an average systolic blood pressure (in mm Hg) of 114.8 with a standard deviation of 13.1. Systolic blood pressure for females between the ages of 18 and 24 follows a normal distribution.

  • If one female from this population is randomly selected, find the probability that their systolic blood pressure is greater than 120.
  • If 40 females from this population are randomly selected, find the probability that their mean systolic blood pressure is greater than 120.
  • If the sample were four females between the ages of 18 to 24 and we did not know the original distribution, could the central limit theorem be used?

Example 7.11

A study was done regarding attendance at Broadway shows in New York City. The age range of the attendees was 14 to 61. The mean age was 30.9 years with a standard deviation of nine years.

  • In a sample of 25 attendees, what is the probability that the mean age is less than 35?
  • Is it likely that the mean age of the sample group could be more than 50 years? Interpret the results.
  • In a sample of 49 attendees, what is the probability that the sum of the ages is no less than 1,600?
  • Is it likely that the sum of the ages of the 49 attendees is at most 1,595? Interpret the results.
  • Find the 95th percentile for the sample mean age of 65 attendees. Interpret the results.
  • Find the 90th percentile for the sum of the ages of 65 attendees. Interpret the results.

Solution

For n = 25, \( \sigma_{\bar x} = \dfrac{9}{\sqrt{25}} = 1.8 \). For n = 49, the sum has mean (49)(30.9) = 1514.10 and standard deviation \( (\sqrt{49})(9) = 63 \). For n = 65, \( \sigma_{\bar x} = \dfrac{9}{\sqrt{65}} \approx 1.1 \), and the sum has mean (65)(30.9) = 2008.5 and standard deviation \( (\sqrt{65})(9) \approx 72.56 \).

  • \( P(\bar x < 35) \) = normalcdf(-E99, 35, 30.9, 1.8) = 0.9886
  • \( P(\bar x > 50) \) = normalcdf(50, E99, 30.9, 1.8) ≈ 0. For this sample group, it is almost impossible for the group’s average age to be more than 50. However, it is still possible for an individual in this group to have an age greater than 50.
  • P(Σx ≥ 1,600) = normalcdf(1600, E99, 1514.10, 63) = 0.0864
  • P(Σx ≤ 1,595) = normalcdf(-E99, 1595, 1514.10, 63) = 0.9005. This means that there is a 90% chance that the sum of the ages for the sample group n = 49 is at most 1595.
  • The 95th percentile = invNorm(0.95, 30.9, 1.1) = 32.7. This indicates that 95% of all samples of 65 attendees have a mean age below 32.7 years.
  • The 90th percentile = invNorm(0.90, 2008.5, 72.56) = 2101.5. This indicates that 90% of all samples of 65 attendees have a sum of ages below 2,101.5 years.
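Example 7.11 uses four different sampling distributions. The hypothetical helpers `mean_dist` and `sum_dist` below build them from μ = 30.9 and σ = 9 for any n; this sketch assumes Python 3.8+ for `statistics.NormalDist` and uses unrounded standard deviations, so the last digit can differ slightly from the calculator answers.

```python
import math
from statistics import NormalDist

mu, sigma = 30.9, 9   # population mean and standard deviation of attendee ages

def mean_dist(n):
    """Approximate distribution of the sample mean age for samples of size n."""
    return NormalDist(mu, sigma / math.sqrt(n))

def sum_dist(n):
    """Approximate distribution of the sum of ages for samples of size n."""
    return NormalDist(n * mu, math.sqrt(n) * sigma)

print(round(mean_dist(25).cdf(35), 4))        # P(X̄ < 35), n = 25
print(1 - mean_dist(25).cdf(50))              # P(X̄ > 50), n = 25, essentially 0
print(round(1 - sum_dist(49).cdf(1600), 4))   # P(ΣX ≥ 1600), n = 49
print(round(sum_dist(49).cdf(1595), 4))       # P(ΣX ≤ 1595), n = 49
print(round(mean_dist(65).inv_cdf(0.95), 1))  # 95th percentile of X̄, n = 65
print(round(sum_dist(65).inv_cdf(0.90), 1))   # 90th percentile of ΣX, n = 65
```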

Try It 7.11

According to Boeing data, the 757 airliner carries 200 passengers and has doors with a height of 72 inches. Assume for a certain population of men we have a mean height of 69.0 inches and a standard deviation of 2.8 inches.

  • What doorway height would allow 95% of men to enter the aircraft without bending?
  • Assume that half of the 200 passengers are men. What mean doorway height satisfies the condition that there is a 0.95 probability that this height is greater than the mean height of 100 men?
  • For engineers designing the 757, which result is more relevant: the height from part a or part b? Why?

HISTORICAL NOTE

Normal Approximation to the Binomial

Historically, being able to compute binomial probabilities was one of the most important applications of the central limit theorem. Binomial probabilities with a small value for n (say, 20) were displayed in a table in a book. To calculate the probabilities with large values of n , you had to use the binomial formula, which could be very complicated. Using the normal approximation to the binomial distribution simplified the process. To compute the normal approximation to the binomial distribution, take a simple random sample from a population. You must meet the conditions for a binomial distribution :

  • there are a certain number n of independent trials
  • the outcomes of any trial are success or failure
  • each trial has the same probability of a success p

Recall that if X is the binomial random variable, then X ~ B(n, p). The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (np > 5 and nq > 5; the approximation is better if they are both greater than or equal to 10). Then the binomial can be approximated by the normal distribution with mean μ = np and standard deviation \( \sigma = \sqrt{npq} \). Remember that q = 1 – p. In order to get the best approximation, add 0.5 to x or subtract 0.5 from x (use x + 0.5 or x – 0.5). The number 0.5 is called the continuity correction factor and is used in the following example.

Example 7.12

Suppose in a local Kindergarten through 12th grade (K-12) school district, 53 percent of the population favor a charter school for grades K through 5. A simple random sample of 300 is surveyed.

  • Find the probability that at least 150 favor a charter school.
  • Find the probability that at most 160 favor a charter school.
  • Find the probability that more than 155 favor a charter school.
  • Find the probability that fewer than 147 favor a charter school.
  • Find the probability that exactly 175 favor a charter school.

Let X = the number that favor a charter school for grades K through 5. X ~ B(n, p) where n = 300 and p = 0.53. Since np = 159 > 5 and nq = 141 > 5, use the normal approximation to the binomial. The formulas for the mean and standard deviation are μ = np and \( \sigma = \sqrt{npq} \). The mean is 159 and the standard deviation is 8.6447. The random variable for the normal distribution is Y. Y ~ N(159, 8.6447). See The Normal Distribution for help with calculator instructions.

For part a, you include 150 so P ( X ≥ 150) has normal approximation P ( Y ≥ 149.5) = 0.8641.

normalcdf (149.5,10^99,159,8.6447) = 0.8641.

For part b, you include 160 so P(X ≤ 160) has normal approximation P(Y ≤ 160.5) = 0.5689.

normalcdf (0,160.5,159,8.6447) = 0.5689

For part c, you exclude 155 so P(X > 155) has normal approximation P(Y > 155.5) = 0.6572.

normalcdf (155.5,10^99,159,8.6447) = 0.6572.

For part d, you exclude 147 so P ( X < 147) has normal approximation P ( Y < 146.5) = 0.0741.

normalcdf (0,146.5,159,8.6447) = 0.0741

For part e, P ( X = 175) has normal approximation P (174.5 < Y < 175.5) = 0.0083.

normalcdf (174.5,175.5,159,8.6447) = 0.0083
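The five continuity-corrected calculations above can be reproduced in a few lines of Python. This is a minimal sketch that assumes Python 3.8+ and uses the standard library's `statistics.NormalDist` in place of the TI `normalcdf` command; the helper names `approx_at_least`, `approx_at_most`, and `approx_exactly` are hypothetical, chosen for illustration. The sketch uses an unbounded left tail where the TI commands above start at 0, which makes no difference at four decimal places here.

```python
from math import sqrt
from statistics import NormalDist

# X ~ B(300, 0.53) is approximated by Y ~ N(np, sqrt(npq))
n, p = 300, 0.53
q = 1 - p
mu = n * p                 # 159
sigma = sqrt(n * p * q)    # about 8.6447
Y = NormalDist(mu, sigma)

def approx_at_least(k):
    """P(X >= k) ~ P(Y >= k - 0.5): the value k is included."""
    return 1 - Y.cdf(k - 0.5)

def approx_at_most(k):
    """P(X <= k) ~ P(Y <= k + 0.5): the value k is included."""
    return Y.cdf(k + 0.5)

def approx_exactly(k):
    """P(X = k) ~ P(k - 0.5 < Y < k + 0.5)."""
    return Y.cdf(k + 0.5) - Y.cdf(k - 0.5)

results = {
    "a": approx_at_least(150),  # at least 150
    "b": approx_at_most(160),   # at most 160
    "c": approx_at_least(156),  # more than 155 means at least 156
    "d": approx_at_most(146),   # fewer than 147 means at most 146
    "e": approx_exactly(175),   # exactly 175
}
for part, prob in results.items():
    print(part, round(prob, 4))
```

Rewriting "more than 155" as "at least 156" and "fewer than 147" as "at most 146" lets two helpers cover all the inequality cases with the same ±0.5 rule.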

Because of calculators and computer software that let you calculate binomial probabilities for large values of n easily, it is not necessary to use the normal approximation to the binomial distribution, provided that you have access to these technology tools. Most school labs have Microsoft Excel, an example of computer software that calculates binomial probabilities. Many students have access to the TI-83 or 84 series calculators, which easily calculate probabilities for the binomial distribution. If you type in "binomial probability distribution calculation" in an Internet browser, you can find at least one online calculator for the binomial.

For Example 7.12 , the probabilities are calculated using the following binomial distribution: ( n = 300 and p = 0.53). Compare the binomial and normal distribution answers. See Discrete Random Variables for help with calculator instructions for the binomial.

P ( X ≥ 150) : 1 - binomialcdf (300,0.53,149) = 0.8641

P ( X ≤ 160) : binomialcdf (300,0.53,160) = 0.5684

P ( X > 155) : 1 - binomialcdf (300,0.53,155) = 0.6576

P ( X < 147) : binomialcdf (300,0.53,146) = 0.0742

P ( X = 175) :(You use the binomial pdf.) binomialpdf (300,0.53,175) = 0.0083
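If no calculator is at hand, the exact binomial values can also be summed directly. The sketch below is a stand-in for the TI `binomialcdf`/`binomialpdf` commands used above, assuming Python 3.8+ for `math.comb`; `binom_pmf` and `binom_cdf` are hypothetical helper names.

```python
from math import comb

def binom_pmf(n, p, k):
    """Exact P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    """Exact P(X <= k), summing the pmf from 0 to k."""
    return sum(binom_pmf(n, p, j) for j in range(k + 1))

n, p = 300, 0.53
exact = {
    "a": 1 - binom_cdf(n, p, 149),  # P(X >= 150)
    "b": binom_cdf(n, p, 160),      # P(X <= 160)
    "c": 1 - binom_cdf(n, p, 155),  # P(X > 155)
    "d": binom_cdf(n, p, 146),      # P(X < 147)
    "e": binom_pmf(n, p, 175),      # P(X = 175)
}
for part, prob in exact.items():
    print(part, round(prob, 4))
```

Direct summation is fine for n = 300; for much larger n, a log-space computation (for example with `math.lgamma`) avoids overflow in the intermediate products.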

Try It 7.12

In a city, 46 percent of the population favor the incumbent, Dawn Morgan, for mayor. A simple random sample of 500 is taken. Using the continuity correction factor, find the probability that at least 250 favor Dawn Morgan for mayor.

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/introductory-statistics-2e/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Introductory Statistics 2e
  • Publication date: Dec 13, 2023
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/introductory-statistics-2e/pages/1-introduction
  • Section URL: https://openstax.org/books/introductory-statistics-2e/pages/7-3-using-the-central-limit-theorem

© Dec 6, 2023 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

Module 7: The Central Limit Theorem

Using the Central Limit Theorem: Learning Outcomes

  • Apply and interpret the central limit theorem for means
  • Classify continuous word problems by their distributions

It is important for you to understand when to use the central limit theorem. If you are being asked to find the probability of the mean, use the central limit theorem for the mean. If you are being asked to find the probability of a sum or total, use the central limit theorem for sums. This also applies to percentiles for means and sums.

Note:  If you are being asked to find the probability of an individual value, do not use the central limit theorem. Use the distribution of its random variable.

Law of Large Numbers

The law of large numbers says that if you take samples of larger and larger size from any population, then the sample mean [latex]\displaystyle\overline{{x}}[/latex] tends to get closer and closer to the population mean μ. We can say that μ is the value that the sample means approach as n gets larger. The central limit theorem illustrates the law of large numbers.
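A quick simulation makes this concrete. The sketch below assumes, for illustration only, a population that is uniform on [1, 5] (the same population as the stress-score example that follows, so μ = 3), and uses Python's `random` module with a fixed seed; the sample sizes are arbitrary choices.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible
mu = 3.0                 # population mean of U(1, 5)

for n in (10, 100, 1_000, 100_000):
    sample = [rng.uniform(1, 5) for _ in range(n)]
    xbar = sum(sample) / n
    print(f"n = {n:>7}: sample mean = {xbar:.4f}, error = {abs(xbar - mu):.4f}")
```

Any single run can fluctuate, so the errors need not shrink at every step; the law of large numbers describes the tendency as n grows.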

Central Limit Theorem for the Mean and Sum Examples

A study involving stress is conducted among the students on a college campus.  The stress scores follow a uniform distribution with the lowest stress score equal to one and the highest equal to five. Using a sample of 75 students, find:

  • The probability that the mean stress score for the 75 students is less than two.
  • The 90th percentile for the mean stress score for the 75 students.
  • The probability that the total of the 75 stress scores is less than 200.
  • The 90th percentile for the total stress score for the 75 students.

Let  X = one stress score.

Problems 1. and 2. ask you to find a probability or a percentile for a  mean . Problems 3. and 4. ask you to find a probability or a percentile for a total or sum . The sample size, n , is equal to 75.

Since the individual stress scores follow a uniform distribution, X ~ U (1, 5) where a = 1 and b = 5.

[latex]\displaystyle{\mu}_{X}=\frac{{a+b}}{{2}}=\frac{{1+5}}{{2}}={3}[/latex]

[latex]\displaystyle{\sigma}_{X}=\sqrt{\frac{(b-a)^{2}}{12}}=\sqrt{\frac{(5-1)^{2}}{12}}\approx{1.15}[/latex]

For problems 1. and 2., let [latex]\displaystyle\overline{X}[/latex] = the mean stress score for the 75 students. Then,

[latex]\displaystyle\overline{X}\sim{N}({3},\frac{{1.15}}{{\sqrt{75}}})\text{ where } {n}={75}[/latex].

  • Find P([latex]\displaystyle\overline{x}{<}{2}[/latex]). Draw the graph.
  • Find the 90th percentile for the mean of 75 stress scores. Draw a graph.
  • Find P ( Σx < 200). Draw the graph.
  • Find the 90th percentile for the total of 75 stress scores. Draw a graph.

This is a normal distribution curve over a horizontal axis. The peak of the curve coincides with the point 3 on the horizontal axis. A point, 2, is marked at the left edge of the curve.

Excel: NORM.DIST(2, 3, [latex]\displaystyle{\frac{1.15}{\sqrt{75}}}[/latex], 1) = 2.524E-14  [latex] \approx [/latex] 0

TI 83/84: normalcdf [latex]\displaystyle{({1},{2},{3},\frac{{1.15}}{\sqrt{{75}}})}\approx{0}[/latex]. Remember that the smallest stress score is one, so that is the first input in the normalcdf function.

This is a normal distribution curve. The peak of the curve coincides with the point 3 on the horizontal axis. A point, k, is labeled to the right of 3. A vertical line extends from k to the curve. The area under the curve to the left of k is shaded. The shaded area shows that P(x-bar < k) = 0.90.

The 90th percentile for the mean of 75 scores is about 3.2. This tells us that 90% of all the means of 75 stress scores are at most 3.2, and that 10% are at least 3.2.

Excel: NORM.INV(0.90, 3, [latex]\displaystyle{\frac{1.15}{\sqrt{75}}}[/latex]) [latex]\approx[/latex] 3.2

TI 83/84: invNorm(0.90, 3, [latex]\displaystyle{\frac{1.15}{\sqrt{75}}}[/latex]) [latex]\approx[/latex] 3.2
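The two results for the mean can be checked with Python's standard-library `statistics.NormalDist` (Python 3.8+), which plays the role of `NORM.DIST`/`normalcdf` and `NORM.INV`/`invNorm` in this sketch. It also evaluates the uniform standard deviation formula exactly before using the rounded 1.15 from the text.

```python
from math import sqrt
from statistics import NormalDist

a, b = 1, 5
sigma_exact = sqrt((b - a) ** 2 / 12)   # about 1.1547 for U(1, 5)
sigma_x = 1.15                          # rounded value used in the text
n = 75

# Sampling distribution of the mean: N(3, 1.15 / sqrt(75))
xbar = NormalDist(3, sigma_x / sqrt(n))

p_below_2 = xbar.cdf(2)         # P(xbar < 2), essentially 0
k90 = xbar.inv_cdf(0.90)        # 90th percentile of the sample mean
print(sigma_exact, p_below_2, round(k90, 2))
```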

For problems 3. and 4., let ΣX = the sum of the 75 stress scores. Then, [latex]\displaystyle\sum{X}{\sim}{N}{[{({75})}{({3})},{(\sqrt{{75}})}{({1.15})}]}[/latex]

This is a normal distribution curve over a horizontal axis. The peak of the curve coincides with the point 225 on the horizontal axis. A point, 200, is marked at the left edge of the curve.

Excel: NORM.DIST(200, 75*3, 9.96, 1) = 0.006036

TI 83/84: normalcdf (75,200,(75)(3),[latex]\displaystyle{(\sqrt{{75}})}[/latex](1.15)) [latex]\approx[/latex] 0.0060

Remember, the smallest total of 75 stress scores is 75, because the smallest single score is one.

The probability that the total of 75 scores is less than 200 is about 0.006, which is very close to zero.

This is a normal distribution curve. The peak of the curve coincides with the point 225 on the horizontal axis. A point, k, is labeled to the right of 225. A vertical line extends from k to the curve. The area under the curve to the left of k is shaded. The shaded area shows that P(sum of x < k) = 0.90.

Excel: NORM.INV(0.90,75*3, 9.96) = 237.8

TI 83/84: invNorm (0.90,(75)(3),[latex]\displaystyle{(\sqrt{{75}})}[/latex](1.15)) = 237.8

The 90th percentile for the sum of 75 scores is about 237.8. This tells us that 90% of all the sums of 75 scores are no more than 237.8 and 10% are no less than 237.8.
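The sum calculations follow the same pattern with mean nμ and standard deviation √n·σ. A minimal sketch with Python's `statistics.NormalDist`, assuming the rounded σ = 1.15 from the text:

```python
from math import sqrt
from statistics import NormalDist

n, mu_x, sigma_x = 75, 3, 1.15

# Sum of n scores ~ N(n * mu, sqrt(n) * sigma)
total = NormalDist(n * mu_x, sqrt(n) * sigma_x)

p_total_below_200 = total.cdf(200)   # P(sum < 200)
k90_total = total.inv_cdf(0.90)      # 90th percentile of the sum
print(round(p_total_below_200, 4), round(k90_total, 1))
```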

Use the information in “Central Limit Theorem for the Mean and Sum Examples”, but use a sample size of 55 to answer the following questions.

  • Find [latex]\displaystyle{P}{(\overline{{x}}{<}{2.7})}[/latex].
  • Find [latex]\displaystyle{P}{(\sum{x}{>}{170})}[/latex].
  • Find the 80th percentile for the mean of 55 scores.
  • Find the 85th percentile for the sum of 55 scores.

Answers:

  • [latex]0.0265[/latex]
  • [latex]0.2789[/latex]
  • [latex]3.13[/latex]
  • [latex]173.84[/latex]
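The four answers can be reproduced by building one normal distribution for the mean and one for the sum. This is a sketch with Python's `statistics.NormalDist`, assuming the same population values (μ = 3, σ ≈ 1.15) as the stress-score example; the names `mean_dist` and `sum_dist` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, mu_x, sigma_x = 55, 3, 1.15
mean_dist = NormalDist(mu_x, sigma_x / sqrt(n))      # sample mean
sum_dist = NormalDist(n * mu_x, sqrt(n) * sigma_x)   # sample sum

answers = [
    mean_dist.cdf(2.7),          # P(xbar < 2.7)
    1 - sum_dist.cdf(170),       # P(sum > 170)
    mean_dist.inv_cdf(0.80),     # 80th percentile of the mean
    sum_dist.inv_cdf(0.85),      # 85th percentile of the sum
]
print([round(a, 4) for a in answers])
```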

Suppose that a market research analyst for a cell phone company conducts a study of their customers who exceed the time allowance included on their basic cell phone contract; the analyst finds that for those people who exceed the time included in their basic contract, the excess time used follows an exponential distribution with a mean of 22 minutes.

Consider a random sample of 80 customers who exceed the time allowance included in their basic cell phone contract.

Let  X = the excess time used by one INDIVIDUAL cell phone customer who exceeds his contracted time allowance.

[latex]\displaystyle{X}{\sim}{E}{x}{p}{(\frac{{1}}{{22}})}[/latex]. We did not cover the exponential distribution, but we can solve this problem using the central limit theorem (CLT). For an exponential distribution the mean and the standard deviation are equal, so for the distribution of individual values, μ = 22 and σ = 22.

Let [latex]\displaystyle\overline{{X}}[/latex] = the mean excess time used by a sample of n = 80 customers who exceed their contracted time allowance.

[latex]\displaystyle\overline{{X}}{\sim}{N}{({22},\frac{{22}}{{\sqrt{{80}}}})}[/latex] by the central limit theorem for sample means

  • Using the CLT to find probability. Find the probability that the mean excess time used by the 80 customers in the sample is longer than 20 minutes. This is asking us to find [latex]\displaystyle{P}{(\overline{{x}}{>}{20})}[/latex]. Draw the graph.
  • Using the CLT to find percentiles. Find the 95th percentile for the sample mean excess time for samples of 80 customers who exceed their basic contract time allowances. Draw a graph.

1. Find: [latex]\displaystyle{P}{(\overline{{x}}{>}{20})}[/latex]

Excel: 1-NORM.DIST(20, 22, [latex]\displaystyle{\frac{22}{\sqrt{80}}}[/latex], 1) = 0.7919

TI 83/84: normalcdf [latex]\displaystyle{({20},{1}\text{E99},{22},\frac{{22}}{\sqrt{{80}}})}[/latex] = 0.7919

The probability that the mean excess time used is more than 20 minutes, for a sample of 80 customers who exceed their contracted time allowance is 79.19%.

This is a normal distribution curve. The peak of the curve coincides with the point 22 on the horizontal axis. A point, 20, is labeled to the left of 22. A vertical line extends from 20 to the curve. The area under the curve to the right of 20 is shaded. The shaded area shows that P(x-bar > 20) = 0.7919.

Remember, 1E99 = 10^99 and –1E99 = –10^99. Press the EE key for E. Or just use 10^99 instead of 1E99.

2.  Let k = the 95th percentile. Find k where [latex]\displaystyle{P}{(\overline{{x}}{<}{k})}={0.95}[/latex]

Excel: NORM.INV(0.95, 22, [latex]\displaystyle{\frac{22}{\sqrt{80}}}[/latex])=26.0

This is a normal distribution curve. The peak of the curve coincides with the point 22 on the horizontal axis. A point, k, is labeled to the right of 22. A vertical line extends from k to the curve. The area under the curve to the left of k is shaded. The shaded area shows that P(x-bar < k) = 0.95.

The 95th percentile for the sample mean excess time used is about 26.0 minutes for random samples of 80 customers who exceed their contractual allowed time. Ninety-five percent of such samples would have means under 26 minutes; only five percent of such samples would have means above 26 minutes.
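Because the exponential population is strongly skewed, it is worth checking the CLT answer against a simulation. The sketch below computes both results with Python's `statistics.NormalDist`, then draws 10,000 samples of 80 values from an exponential distribution with mean 22 using `random.expovariate` (which takes the rate, 1/22). The seed and number of repetitions are arbitrary choices; the simulated fraction should be close to, but need not equal, the CLT value, since n = 80 is finite.

```python
import random
from math import sqrt
from statistics import NormalDist

mu = sigma = 22          # exponential: mean equals standard deviation
n = 80
xbar = NormalDist(mu, sigma / sqrt(n))

p_over_20 = 1 - xbar.cdf(20)   # CLT approximation of P(xbar > 20)
k95 = xbar.inv_cdf(0.95)       # 95th percentile of the sample mean

# Simulation check: many samples of size 80, count how many means exceed 20
rng = random.Random(1)
reps = 10_000
hits = sum(
    sum(rng.expovariate(1 / 22) for _ in range(n)) / n > 20
    for _ in range(reps)
)
print(round(p_over_20, 4), round(k95, 1), hits / reps)
```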

Use the information in the previous example, but change the sample size to 144.

  • Find [latex]\displaystyle{P}{({20}{<}\overline{{x}}{<}{30})}[/latex].
  • Find the 75th percentile for the sample mean excess time of [latex]144[/latex] customers.

Answers:

  • [latex]0.8623[/latex]
  • [latex]23.2[/latex]

In the United States, someone is sexually assaulted every two minutes, on average, according to a number of studies. Suppose the standard deviation is 0.5 minutes and the sample size is 100.

  • Find the median, the first quartile, and the third quartile for the sample mean time between sexual assaults in the United States.
  • Find the median, the first quartile, and the third quartile for the sum of the sample times between sexual assaults in the United States.
  • Find the probability that the sample mean time between sexual assaults is between 1.75 and 1.85 minutes.
  • Find the value that is two standard deviations above the expected value of the sample mean.
  • Find the IQR for the sum of the sample times.

1. We have [latex]\displaystyle{\mu}_{\overline{x}}={\mu}={2}\text{ and }{\sigma}_{\overline{x}}=\frac{{\sigma}}{{\sqrt{n}}}=\frac{{0.5}}{{\sqrt{100}}}=0.05[/latex].

(1) Median = 50th percentile = [latex]{\mu}_{\overline{x}}={\mu}={2}[/latex]

(2) First Quartile = 25th percentile = NORM.INV(0.25, 2, 0.05) = 1.97 (or on TI 83/84:  invNorm (0.25,2,0.05) = 1.97)

(3) Third Quartile = 75th percentile = NORM.INV(0.75, 2, 0.05) = 2.03 (or on TI 83/84:  invNorm (0.75,2,0.05) = 2.03)

2. We have [latex]\displaystyle{\mu}_{\sum{x}}=n{\mu}_{x}={100}({2})=200\text{ and }{\sigma}_{\sum{x}}=\sqrt{n}\,{\sigma}_{x}=\sqrt{100}(0.5)=5[/latex].

(1) Median = 50th percentile = [latex]\displaystyle{\mu}_{\sum{x}}=n{\mu}_{x}[/latex] = 100(2) = 200

(2) First Quartile = 25th percentile = NORM.INV(0.25, 200, 5) = 196.63 (or on TI 83/84: invNorm (0.25, 200, 5) = 196.63)

(3) Third Quartile = 75th percentile = NORM.INV(0.75, 200, 5) = 203.37 (or on TI 83/84: invNorm (0.75, 200, 5) = 203.37)

3. [latex]\displaystyle{P}{(1.75{<}{\overline{x}}{<}{1.85})}[/latex]= NORM.DIST(1.85, 2, 0.05, 1) – NORM.DIST(1.75, 2, 0.05, 1) = 0.0013 (or on TI 83/84: normalcdf (1.75,1.85,2,0.05) = 0.0013)

4. Using the z-score equation, [latex]\displaystyle{z}=\frac{{\overline{x}-{\mu}_{\overline{x}}}}{{{\sigma}_{\overline{x}}}}[/latex], with z = 2, and solving for [latex]\overline{x}[/latex], we have [latex]\overline{x}[/latex] = 2(0.05) + 2 = 2.1.

5. The IQR is 75th percentile – 25th percentile = 203.37 – 196.63 = 6.74
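All five parts can be computed from two normal distributions: one for the sample mean (standard deviation 0.5/√100 = 0.05) and one for the sum (standard deviation √100 × 0.5 = 5). A sketch with Python's `statistics.NormalDist`, where `mean_dist` and `sum_dist` are illustrative names:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2, 0.5, 100
mean_dist = NormalDist(mu, sigma / sqrt(n))       # sd of the mean: 0.05
sum_dist = NormalDist(n * mu, sqrt(n) * sigma)    # mean 200, sd 5

mean_quartiles = [mean_dist.inv_cdf(q) for q in (0.25, 0.50, 0.75)]
sum_quartiles = [sum_dist.inv_cdf(q) for q in (0.25, 0.50, 0.75)]
p_between = mean_dist.cdf(1.85) - mean_dist.cdf(1.75)
two_sd_above = mu + 2 * mean_dist.stdev           # z = 2
iqr_sum = sum_quartiles[2] - sum_quartiles[0]

print([round(v, 2) for v in mean_quartiles])
print([round(v, 2) for v in sum_quartiles])
print(round(p_between, 4), two_sd_above, round(iqr_sum, 2))
```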

Based on data from the National Health Survey, women between the ages of [latex]18[/latex] and [latex]24[/latex] have an average systolic blood pressure (in mm Hg) of [latex]114.8[/latex] with a standard deviation of [latex]13.1[/latex]. Systolic blood pressure for women between the ages of [latex]18[/latex] and [latex]24[/latex] follows a normal distribution.

  • If one woman from this population is randomly selected, find the probability that her systolic blood pressure is greater than [latex]120[/latex].
  • If [latex]40[/latex] women from this population are randomly selected, find the probability that their mean systolic blood pressure is greater than [latex]120[/latex].
  • If the sample were four women between the ages of [latex]18[/latex] to [latex]24[/latex] and we did not know the original distribution, could the central limit theorem be used?

1. P ([latex]x[/latex] > 120) = 1 – NORM.DIST(120, 114.8, 13.1, 1) = 0.3457 (or on TI 83/84: normalcdf [latex](120, 1E99, 114.8, 13.1) \approx{0.3457}[/latex]).

There is about a 35% chance that the randomly selected woman will have a systolic blood pressure greater than [latex]120[/latex].

2. P ( [latex]\overline{x}[/latex]>120) = 1-NORM.DIST(120, 114.8, 13.1/sqrt(40), 1) [latex]\approx[/latex] 0.006 (or on TI 83/84: normalcdf [latex](120, 1E99, 114.8, \dfrac{13.1}{\sqrt{40}})\approx{0.006}[/latex]). There is only a 0.6% chance that the average systolic blood pressure for the randomly selected group is greater than [latex]120[/latex].

3. The central limit theorem could not be used if the sample size were four and we did not know the original distribution was normal. The sample size would be too small.
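The contrast between one woman and the mean of 40 women comes down to the standard deviation used: σ for an individual value, σ/√n for the sample mean. Because the population here is stated to be normal, the sample mean is exactly normal for any n. A sketch with Python's `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 114.8, 13.1
one_woman = NormalDist(mu, sigma)                 # individual values
mean_of_40 = NormalDist(mu, sigma / sqrt(40))     # sample mean, n = 40

p_one = 1 - one_woman.cdf(120)    # P(x > 120)
p_mean = 1 - mean_of_40.cdf(120)  # P(xbar > 120)
print(round(p_one, 4), round(p_mean, 4))
```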

A study was done about violence against prostitutes and the symptoms of post-traumatic stress that they developed. The age range of the prostitutes was 14 to 61. The mean age was 30.9 years with a standard deviation of nine years.

  • In a sample of 25 prostitutes, what is the probability that the mean age of the prostitutes is less than 35?
  • Is it likely that the mean age of the sample group could be more than 50 years? Interpret the results.
  • In a sample of 49 prostitutes, what is the probability that the sum of the ages is no less than 1,600?
  • Is it likely that the sum of the ages of the 49 prostitutes is at most 1,595? Interpret the results.
  • Find the 95th percentile for the sample mean age of 65 prostitutes. Interpret the results.
  • Find the 90th percentile for the sum of the ages of 65 prostitutes. Interpret the results.

1. P ([latex]\overline{x}[/latex] < 35) = NORM.DIST(35, 30.9, 9/sqrt(25), 1) = 0.9886 (or on TI 83/84: normalcdf (-1E99, 35, 30.9, 1.8) = 0.9886)

2. P ([latex]\overline{x}[/latex] > 50) = 1 – NORM.DIST(50, 30.9, 9/sqrt(25), 1) [latex]\approx[/latex] 0 (or on TI 83/84: normalcdf (50, 1E99, 30.9, 1.8) ≈ 0).

For this sample group, it is almost impossible for the group’s average age to be more than 50. However, it is still possible for an individual in this group to have an age greater than 50.

3. The sum has mean 49(30.9) = 1514.1 and standard deviation [latex]\sqrt{49}[/latex](9) = 63, so P (Σx ≥ 1,600) = 1 – NORM.DIST(1600, 30.9*49, 9*sqrt(49), 1) = 0.0864 (or on TI 83/84: normalcdf (1600, 1E99, 1514.10, 63) = 0.0864).

4. P (Σx ≤ 1,595) = NORM.DIST(1595, 1514.1, 63, 1) = 0.9005 (or on TI 83/84: normalcdf (-1E99, 1595, 1514.10, 63) = 0.9005). This means that there is about a 90% chance that the sum of the ages for a sample of n = 49 is at most 1,595, so this outcome is likely.

5. The 95th percentile = invNorm (0.95, 30.9, 1.1) = 32.7, where 1.1 ≈ 9/[latex]\sqrt{65}[/latex] is the standard deviation of the sample mean. This indicates that 95% of all samples of 65 prostitutes have a mean age below about 32.7 years.

6. The 90th percentile = invNorm (0.90, 2008.5, 72.56) = 2101.5, where 2008.5 = 65(30.9) and 72.56 ≈ [latex]\sqrt{65}[/latex](9). This indicates that 90% of all samples of 65 prostitutes have a sum of ages less than about 2,101.5 years.
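The six parts use just two families of distributions, one for the mean and one for the sum, each depending on n. A sketch with Python's `statistics.NormalDist`; `mean_dist` and `sum_dist` are hypothetical helper names introduced here.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 30.9, 9

def mean_dist(n):
    """Sampling distribution of the mean of n ages (CLT)."""
    return NormalDist(mu, sigma / sqrt(n))

def sum_dist(n):
    """Sampling distribution of the sum of n ages (CLT)."""
    return NormalDist(n * mu, sqrt(n) * sigma)

p1 = mean_dist(25).cdf(35)            # P(xbar < 35), n = 25
p2 = 1 - mean_dist(25).cdf(50)        # P(xbar > 50), n = 25
p3 = 1 - sum_dist(49).cdf(1600)       # P(sum >= 1600), n = 49
p4 = sum_dist(49).cdf(1595)           # P(sum <= 1595), n = 49
k5 = mean_dist(65).inv_cdf(0.95)      # 95th percentile of the mean, n = 65
k6 = sum_dist(65).inv_cdf(0.90)       # 90th percentile of the sum, n = 65
print(round(p1, 4), p2, round(p3, 4), round(p4, 4), round(k5, 1), round(k6, 1))
```

Using the unrounded standard deviation 9/√65 instead of 1.1 changes the 95th percentile only in the second decimal place.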

According to Boeing data, the 757 airliner carries 200 passengers and has doors with a mean height of 72 inches. Assume for a certain population of men we have a mean of 69.0 inches and a standard deviation of 2.8 inches.

  • What mean doorway height would allow 95% of men to enter the aircraft without bending?
  • Assume that half of the 200 passengers are men. What mean doorway height satisfies the condition that there is a 0.95 probability that this height is greater than the mean height of 100 men?
  • For engineers designing the 757, which result is more relevant: the height from part 1 or part 2? Why?

Answers:

  • For individual men, [latex]\displaystyle{\mu}_{x}={\mu}={69}[/latex] and [latex]\displaystyle{\sigma}_{x}={2.8}[/latex]. The height of the doorway is NORM.INV(0.95, 69, 2.8) = 73.61 inches (or on TI 83/84: invNorm (0.95, 69, 2.8) = 73.61).
  • For the mean of 100 men, [latex]\displaystyle{\mu}_{\overline{x}}={\mu}={69}[/latex] and [latex]\displaystyle{\sigma}_{\overline{x}}[/latex] = 2.8/sqrt(100) = 0.28. So the height is NORM.INV(0.95, 69, 0.28) = 69.46 inches (or on TI 83/84: invNorm (0.95, 69, 0.28) = 69.46).
  • Each passenger walks through the doorway individually, so the design must accommodate the variability among individual men, not the much smaller variability of a sample mean. Therefore, we need to use the result based on part 1.
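The difference between the two doorway heights comes entirely from the standard deviation: 2.8 for one man versus 2.8/√100 = 0.28 for the mean of 100 men. A sketch with Python's `statistics.NormalDist`; running it should print about 73.61 and 69.46.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 69.0, 2.8
individual = NormalDist(mu, sigma)                # one man's height
mean_of_100 = NormalDist(mu, sigma / sqrt(100))   # mean height of 100 men

door_individual = individual.inv_cdf(0.95)   # part 1
door_mean = mean_of_100.inv_cdf(0.95)        # part 2
print(round(door_individual, 2), round(door_mean, 2))
```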

Historical Note: Normal Approximation to the Binomial

Historically, being able to compute binomial probabilities was one of the most important applications of the central limit theorem. Binomial probabilities with a small value for n (say, 20) were displayed in a table in a book. To calculate the probabilities with large values of n, you had to use the binomial formula, which could be very complicated. Using the normal approximation to the binomial distribution simplified the process. To compute the normal approximation to the binomial distribution, take a simple random sample from a population. You must meet the conditions for a binomial distribution:

  • there are a certain number n of independent trials
  • the outcomes of any trial are success or failure
  • each trial has the same probability of a success p

Recall that if X is the binomial random variable, then X ~ B(n, p). The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (np > 5 and nq > 5; the approximation is better if they are both greater than or equal to 10). Then the binomial can be approximated by the normal distribution with mean μ = np and standard deviation σ = √(npq). Remember that q = 1 – p. In order to get the best approximation, add 0.5 to x or subtract 0.5 from x (use x + 0.5 or x – 0.5). The number 0.5 is called the continuity correction factor and is used in the following example.

Suppose in a local Kindergarten through 12th grade (K – 12) school district, 53 percent of the population favor a charter school for grades K through 5. A simple random sample of 300 is surveyed.

  • Find the probability that at least 150 favor a charter school.
  • Find the probability that at most 160 favor a charter school.
  • Find the probability that more than 155 favor a charter school.
  • Find the probability that fewer than 147 favor a charter school.
  • Find the probability that exactly 175 favor a charter school.

Let X = the number that favor a charter school for grades K through 5. X ~ B(n, p) where n = 300 and p = 0.53. Since np = 159 > 5 and nq = 141 > 5, use the normal approximation to the binomial. The formulas for the mean and standard deviation are μ = np and σ = √(npq). The mean is 159 and the standard deviation is 8.6447. The random variable for the normal distribution is Y. Y ~ N(159, 8.6447).

For part a, you include 150 so P ( X ≥ 150) has normal approximation P ( Y ≥ 149.5) = 0.8641.

normalcdf (149.5,10^99,159,8.6447) = 0.8641.

For part b, you include 160 so P ( X ≤ 160) has normal approximation P ( Y ≤ 160.5) = 0.5689.

normalcdf (0,160.5,159,8.6447) = 0.5689

For part c, you exclude 155 so P ( X > 155) has normal approximation P ( Y > 155.5) = 0.6572.

normalcdf (155.5,10^99,159,8.6447) = 0.6572.

For part d, you exclude 147 so P ( X < 147) has normal approximation P ( Y < 146.5) = 0.0741.

normalcdf (0,146.5,159,8.6447) = 0.0741

For part e, P ( X = 175) has normal approximation P (174.5 < Y < 175.5) = 0.0083.

normalcdf (174.5,175.5,159,8.6447) = 0.0083

Because of calculators and computer software that let you calculate binomial probabilities for large values of n easily, it is not necessary to use the normal approximation to the binomial distribution, provided that you have access to technology tools such as Microsoft Excel or TI 83/84.

For the example above, the exact probabilities are calculated using the following binomial distribution on a TI 83/84: ( n = 300 and p = 0.53). Compare the binomial and normal distribution answers.

P ( X ≥ 150) : 1 - binomialcdf (300,0.53,149) = 0.8641

P ( X ≤ 160) : binomialcdf (300,0.53,160) = 0.5684

P ( X > 155) : 1 - binomialcdf (300,0.53,155) = 0.6576

P ( X < 147) : binomialcdf (300,0.53,146) = 0.0742

P ( X = 175) :(You use the binomial pdf.) binomialpdf (300,0.53,175) = 0.0083

In a city, 46 percent of the population favor the incumbent, Dawn Morgan, for mayor. A simple random sample of 500 is taken. Using the continuity correction factor, find the probability that at least 250 favor Dawn Morgan for mayor.

Data from the Wall Street Journal.

“National Health and Nutrition Examination Survey.” Centers for Disease Control and Prevention. Available online at http://www.cdc.gov/nchs/nhanes.htm (accessed May 17, 2013).

Concept Review

The central limit theorem can be used to illustrate the law of large numbers. The law of large numbers states that the larger the sample size you take from a population, the closer the sample mean tends to get to μ.

  • Using the Central Limit Theorem. Provided by : OpenStax. Located at : . License : CC BY: Attribution
  • Introductory Statistics. Authored by: Barbara Illowsky, Susan Dean. Provided by: OpenStax. Located at: http://cnx.org/contents/[email protected]. License: CC BY: Attribution. License Terms: Download for free at http://cnx.org/contents/[email protected]


Statistics LibreTexts

9.1: Central Limit Theorem for Bernoulli Trials


  • Charles M. Grinstead & J. Laurie Snell
  • Swarthmore College and Dartmouth College via American Mathematical Society

The second fundamental theorem of probability is the Central Limit Theorem. This theorem says that if \(S_n\) is the sum of \(n\) mutually independent random variables, then the distribution function of \(S_n\) is well-approximated by a certain type of continuous function known as a normal density function, which is given by the formula

\[f_{\mu,\sigma}(x) = \frac{1}{\sqrt {2\pi}\sigma}e^{-(x-\mu)^2/(2\sigma^2)}\ ,\]

as we have seen in Chapter 5. In this section, we will deal only with the case that \(\mu = 0\) and \(\sigma = 1\). We will call this particular normal density function the standard normal density, and we will denote it by \(\phi(x)\):

\[\phi(x) = \frac {1}{\sqrt{2\pi}}e^{-x^2/2}\ .\] A graph of this function is given in Figure \(\PageIndex{1}\). It can be shown that the area under any normal density equals 1.

[Figure \(\PageIndex{1}\): Graph of the standard normal density \(\phi(x)\).]

The Central Limit Theorem tells us, quite generally, what happens when we have the sum of a large number of independent random variables each of which contributes a small amount to the total. In this section we shall discuss this theorem as it applies to the Bernoulli trials and in Section 9.2 we shall consider more general processes. We will discuss the theorem in the case that the individual random variables are identically distributed, but the theorem is true, under certain conditions, even if the individual random variables have different distributions.

Bernoulli Trials

Consider a Bernoulli trials process with probability \(p\) for success on each trial. Let \(X_i = 1\) or 0 according as the \(i\)th outcome is a success or failure, and let \(S_n = X_1 + X_2 +\cdots+ X_n\). Then \(S_n\) is the number of successes in \(n\) trials. We know that \(S_n\) has as its distribution the binomial probabilities \(b(n,p,j)\). In Section 3.2, we plotted these distributions for \(p = .3\) and \(p = .5\) for various values of \(n\) (see Figure 3.8).

We note that the maximum values of the distributions appeared near the expected value \(np\), which caused their spike graphs to drift off to the right as \(n\) increased. Moreover, these maximum values approached 0 as \(n\) increased, which caused the spike graphs to flatten out.

Standardized Sums

We can prevent the drifting of these spike graphs by subtracting the expected number of successes \(np\) from \(S_n\), obtaining the new random variable \(S_n - np\). Now the maximum values of the distributions will always be near 0.

To prevent the spreading of these spike graphs, we can normalize \(S_n - np\) to have variance 1 by dividing by its standard deviation \(\sqrt{npq}\) (see Exercise 6.2.13 and Exercise 6.2.16).

Definition: Standardized Sum

The standardized sum of \(S_n\) is given by

\[S_n^* = \frac {S_n - np}{\sqrt{npq}}\ .\]

\(S_n^*\) always has expected value 0 and variance 1.

Suppose we plot a spike graph with the spikes placed at the possible values of \(S_n^*\): \(x_0\), \(x_1\), …, \(x_n\), where

\[x_j = \frac {j - np}{\sqrt{npq}}\ \]

We make the height of the spike at \(x_j\) equal to the distribution value \(b(n, p, j)\). An example of this standardized spike graph, with \(n = 270\) and \(p = .3\), is shown in Figure \(\PageIndex{2}\). This graph is beautifully bell-shaped. We would like to fit a normal density to this spike graph. The obvious choice to try is the standard normal density, since it is centered at 0, just as the standardized spike graph is. In this figure, we have drawn this standard normal density. The reader will note that a horrible thing has occurred: Even though the shapes of the two graphs are the same, the heights are quite different.

[Figure \(\PageIndex{2}\): Standardized spike graph of \(S_n^*\) for \(n = 270\) and \(p = .3\), drawn with the standard normal density.]

If we want the two graphs to fit each other, we must modify one of them; we choose to modify the spike graph. Since the shapes of the two graphs look fairly close, we will attempt to modify the spike graph without changing its shape. The reason for the differing heights is that the sum of the heights of the spikes equals 1, while the area under the standard normal density equals 1. If we were to draw a continuous curve through the top of the spikes, and find the area under this curve, we see that we would obtain, approximately, the sum of the heights of the spikes multiplied by the distance between consecutive spikes, which we will call \(\epsilon\). Since the sum of the heights of the spikes equals one, the area under this curve would be approximately \(\epsilon\). Thus, to change the spike graph so that the area under this curve has value 1, we need only multiply the heights of the spikes by \(1/\epsilon\). It is easy to see from Equation \(\PageIndex{1}\) that

\[\epsilon = \frac {1}{\sqrt {npq}}\ .\]

In Figure \(\PageIndex{3}\) we show the standardized sum \(S^*_n\) for \(n = 270\) and \(p = .3\), after correcting the heights, together with the standard normal density. (This figure was produced with the program CLTBernoulliPlot.) The reader will note that the standard normal fits the height-corrected spike graph extremely well. In fact, one version of the Central Limit Theorem (see Theorem \(\PageIndex{1}\)) says that as \(n\) increases, the standard normal density will do an increasingly better job of approximating the height-corrected spike graphs corresponding to a Bernoulli trials process with \(n\) summands.

[Figure \(\PageIndex{3}\): Height-corrected spike graph of \(S_n^*\) for \(n = 270\) and \(p = .3\), drawn with the standard normal density.]

Let us fix a value \(x\) on the \(x\)-axis and let \(n\) be a fixed positive integer. Then, using Equation \(\PageIndex{1}\), the point \(x_j\) that is closest to \(x\) has a subscript \(j\) given by the formula \[j = \langle np + x \sqrt{npq} \rangle\ ,\] where \(\langle a \rangle\) means the integer nearest to \(a\). Thus the height of the spike above \(x_j\), after the height correction, will be \[\sqrt{npq}\,b(n,p,j) = \sqrt{npq}\,b(n,p,\langle np + x \sqrt{npq} \rangle)\ .\] For large \(n\), we have seen that the height of the spike is very close to the height of the normal density at \(x\). This suggests the following theorem.

Theorem \(\PageIndex{1}\)

(Central Limit Theorem for Binomial Distributions) For the binomial distribution \(b(n,p,j)\) we have \[\lim_{n \to \infty} \sqrt{npq}\,b(n,p,\langle np + x\sqrt{npq} \rangle) = \phi(x)\ ,\] where \(\phi(x)\) is the standard normal density.
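Theorem \(\PageIndex{1}\) can be checked numerically. The Python sketch below (standard library only) computes \(b(n,p,j)\) in log space with `math.lgamma` so that large \(n\) does not overflow, and reports the largest gap between the height-corrected spike \(\sqrt{npq}\,b(n,p,j)\) and \(\phi(x_j)\) over all \(j\), for \(p = .3\) and a few values of \(n\). The names `binom_pmf`, `phi`, and `max_height_error` are hypothetical helpers; the theorem itself only asserts that the gap tends to 0.

```python
from math import exp, lgamma, log, pi, sqrt

def binom_pmf(n, p, j):
    """b(n, p, j), computed in log space to avoid overflow for large n."""
    q = 1 - p
    log_b = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
             + j * log(p) + (n - j) * log(q))
    return exp(log_b)

def phi(x):
    """Standard normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def max_height_error(n, p):
    """Largest gap between sqrt(npq) * b(n, p, j) and phi(x_j) over all j."""
    s = sqrt(n * p * (1 - p))
    return max(abs(s * binom_pmf(n, p, j) - phi((j - n * p) / s))
               for j in range(n + 1))

for n in (30, 270, 2700):
    print(n, max_height_error(n, 0.3))
```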

The proof of this theorem can be carried out using Stirling’s approximation from Section 3.1. We indicate this method of proof by considering the case \(x = 0\). In this case, the theorem states that \[\lim_{n \to \infty} \sqrt{npq}\,b(n,p,\langle np \rangle) = \frac 1{\sqrt{2\pi}} = .3989\ldots\ .\] In order to simplify the calculation, we assume that \(np\) is an integer, so that \(\langle np \rangle = np\). Then \[\sqrt{npq}\,b(n,p,np) = \sqrt{npq}\,p^{np}q^{nq} \frac {n!}{(np)!\,(nq)!}\ .\] Recall that Stirling’s formula (see Theorem 3.3) states that \[n! \sim \sqrt{2\pi n}\,n^n e^{-n} \qquad \mbox {as \,\,\,} n \to \infty\ .\] Using this, we have \[\sqrt{npq}\,b(n,p,np) \sim \frac {\sqrt{npq}\,p^{np}q^{nq} \sqrt{2\pi n}\,n^n e^{-n}}{\sqrt{2\pi np} \sqrt{2\pi nq}\,(np)^{np} (nq)^{nq} e^{-np} e^{-nq}}\ ,\] which simplifies to \(1/\sqrt{2\pi}\).
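The limit at \(x = 0\) can also be watched numerically. The sketch below does not reproduce the Stirling argument; it evaluates \(\sqrt{npq}\,b(n,p,np)\) exactly (via `math.lgamma`) for \(p = .3\) and values of \(n\) where \(np\) is an integer, and compares it with \(1/\sqrt{2\pi}\). The helper name `height_at_mean` is hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt

def height_at_mean(n, p):
    """sqrt(npq) * b(n, p, np), assuming np is an integer."""
    q = 1 - p
    k = round(n * p)
    log_b = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
             + k * log(p) + (n - k) * log(q))
    return sqrt(n * p * q) * exp(log_b)

target = 1 / sqrt(2 * pi)   # about .3989
for n in (10, 100, 1000, 10000):
    print(n, height_at_mean(n, 0.3), target)
```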

Approximating Binomial Distributions

We can use Theorem \(\PageIndex{1}\) to find approximations for the values of binomial distribution functions. If we wish to find an approximation for \(b(n, p, j)\), we set \[j = np + x\sqrt{npq}\] and solve for \(x\), obtaining \[x = {\frac{j-np}{\sqrt{npq}}}\ .\]

Theorem \(\PageIndex{1}\) then says that

\[\sqrt{npq}\,b(n,p,j)\]

is approximately equal to \(\phi(x)\), so \[\begin{align} b(n,p,j) &\approx \frac{\phi(x)}{\sqrt{npq}} \\ &= \frac{1}{\sqrt{npq}}\, \phi\biggl(\frac{j-np}{\sqrt{npq}}\biggr) \end{align}\]

Example \(\PageIndex{1}\)

Let us estimate the probability of exactly 55 heads in 100 tosses of a coin. For this case \(np = 100 \cdot 1/2 = 50\) and \(\sqrt{npq} = \sqrt{100 \cdot 1/2 \cdot 1/2} = 5\). Thus \(x_{55} = (55 - 50)/5 = 1\) and

\[\begin{aligned} P(S_{100} = 55) \sim \frac{\phi(1)}{5} &=& \frac{1}{5} \left( \frac{1}{\sqrt{2\pi}}e^{-1/2} \right) \\ &=& .0484 \end{aligned}\]

To four decimal places, the actual value is .0485, and so the approximation is very good.

The program CLTBernoulliLocal illustrates this approximation for any choice of \(n\), \(p\), and \(j\). We have run this program for two examples. The first is the probability of exactly 50 heads in 100 tosses of a coin; the estimate is .0798, while the actual value, to four decimal places, is .0796. The second example is the probability of exactly eight sixes in 36 rolls of a die; here the estimate is .1196, while the actual value, to four decimal places, is .1093.
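The local approximation \(b(n,p,j) \approx \phi(x)/\sqrt{npq}\) is easy to reproduce. The following Python sketch is a stand-in written for illustration; it is not the program CLTBernoulliLocal, and the names `local_normal_estimate` and `binomial_pmf` are hypothetical. It prints the estimate and the exact value for the three examples discussed above.

```python
import math


def local_normal_estimate(n: int, p: float, j: int) -> float:
    """Approximate b(n, p, j) by phi(x) / sqrt(npq), where x = (j - np) / sqrt(npq)."""
    sd = math.sqrt(n * p * (1 - p))
    x = (j - n * p) / sd
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # standard normal density at x
    return phi / sd


def binomial_pmf(n: int, p: float, j: int) -> float:
    """Exact binomial probability b(n, p, j)."""
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)


if __name__ == "__main__":
    cases = [(100, 1 / 2, 55), (100, 1 / 2, 50), (36, 1 / 6, 8)]
    for n, p, j in cases:
        print(f"n={n}, p={p:.4f}, j={j}: "
              f"estimate {local_normal_estimate(n, p, j):.4f}, "
              f"exact {binomial_pmf(n, p, j):.4f}")
```

The die example shows that the local approximation can be off by roughly 10 percent when \(p\) is far from 1/2 and \(n\) is modest.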

The individual binomial probabilities tend to 0 as \(n\) tends to infinity. In most applications we are not interested in the probability that a specific outcome occurs, but rather in the probability that the outcome lies in a given interval, say the interval \([a, b]\). In order to find this probability, we add the heights of the spike graphs for values of \(j\) between \(a\) and \(b\). This is the same as asking for the probability that the standardized sum \(S_n^*\) lies between \(a^*\) and \(b^*\), where \(a^*\) and \(b^*\) are the standardized values of \(a\) and \(b\). But as \(n\) tends to infinity the sum of these areas can be expected to approach the area under the standard normal density between \(a^*\) and \(b^*\). The Central Limit Theorem states that this does indeed happen.

Theorem \(\PageIndex{2}\)

(Central Limit Theorem for Bernoulli Trials) Let \(S_n\) be the number of successes in \(n\) Bernoulli trials with probability \(p\) for success, and let \(a\) and \(b\) be two fixed real numbers. Then \[\lim_{n \rightarrow \infty} P\biggl(a \le \frac{S_n - np}{\sqrt{npq}} \le b\biggr) = \int_a^b \phi(x)\,dx\ .\]

This theorem can be proved by adding together the approximations to \(b(n,p,k)\) given in Theorem \(\PageIndex{1}\).

We know from calculus that the integral on the right side of this equation is equal to the area under the graph of the standard normal density \(\phi(x)\) between \(a\) and \(b\). We denote this area by \(\NA(a, b)\). Unfortunately, the function \(e^{-x^2/2}\) has no elementary antiderivative, so we must either use a table of values or else a numerical integration program. (See Table \(\PageIndex{1}\) for values of \(\NA(0, z)\). A more extensive table is given in Appendix A.)

It is clear from the symmetry of the standard normal density that areas such as that between \(-2\) and 3 can be found from this table by adding the area from 0 to 2 (same as that from \(-2\) to 0) to the area from 0 to 3.
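In place of a table, \(\NA(a, b)\) can be computed from the error function, since \(\int_a^b \phi(x)\,dx = \tfrac12\bigl[\operatorname{erf}(b/\sqrt2) - \operatorname{erf}(a/\sqrt2)\bigr]\). A minimal Python sketch using the standard library's `math.erf` (the helper names `normal_cdf` and `normal_area` are ours), which also checks the symmetry rule just described:

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_area(a: float, b: float) -> float:
    """NA(a, b): area under the standard normal density between a and b."""
    return normal_cdf(b) - normal_cdf(a)


if __name__ == "__main__":
    for z in (1.0, 2.0, 2.1, 3.0):
        print(f"NA(0, {z}) = {normal_area(0.0, z):.4f}")
    # Symmetry: NA(-2, 3) equals NA(0, 2) + NA(0, 3).
    print(normal_area(-2.0, 3.0), normal_area(0.0, 2.0) + normal_area(0.0, 3.0))
```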

Approximation of Binomial Probabilities

Suppose that \(S_n\) is binomially distributed with parameters \(n\) and \(p\). Theorem \(\PageIndex{2}\) shows how to estimate a probability of the form \[P(i \le S_n \le j)\ , \label{eq 9.2}\] where \(i\) and \(j\) are integers between 0 and \(n\). As we have seen, the binomial distribution can be represented as a spike graph, with spikes at the integers between 0 and \(n\), and with the height of the \(k\)th spike given by \(b(n, p, k)\). For moderate-sized values of \(n\), if we standardize this spike graph and rescale the heights of its spikes in the manner described above, the sum of the heights of the spikes is approximated by the area under the standard normal density between \(i^*\) and \(j^*\).

Table \(\PageIndex{1}\): Values of \(\NA(0, z)\), the normal area from 0 to \(z\).


It turns out that a slightly more accurate approximation is afforded by the area under the standard normal density between the standardized values corresponding to \((i - 1/2)\) and \((j + 1/2)\); these values are

\[i^* = \frac{i - 1/2 - np}{\sqrt {npq}}\] and \[j^* = \frac{j + 1/2 - np}{\sqrt {npq}}\ .\] Thus, \[P(i \le S_n \le j) \approx \NA\Biggl({\frac{i - \frac{1}{2} - np}{\sqrt {npq}}} , {\frac{j + {\frac{1}{2}} - np}{\sqrt {npq}}}\Biggr)\ .\]

It should be stressed that the approximations obtained by using the Central Limit Theorem are only approximations, and sometimes they are not very close to the actual values (see Exercise 9.2.111).

We now illustrate this idea with some examples.

Example \(\PageIndex{2}\)

A coin is tossed 100 times. Estimate the probability that the number of heads lies between 40 and 60 (the word “between" in mathematics means inclusive of the endpoints). The expected number of heads is \(100 \cdot 1/2 = 50\), and the standard deviation for the number of heads is \(\sqrt{100 \cdot 1/2 \cdot 1/2} = 5\). Thus, since \(n = 100\) is reasonably large, we have \[\begin{aligned} P(40 \le S_n \le 60) &\approx& P\left( \frac {39.5 - 50}5 \le S_n^* \le \frac {60.5 - 50}5 \right) \\ &=& P(-2.1 \le S_n^* \le 2.1) \\ &\approx& \NA(-2.1,2.1) \\ &=& 2\NA(0,2.1) \\ &\approx& .9642\ . \end{aligned}\] The actual value is .96480, to five decimal places.

Note that in this case we are asking for the probability that the outcome will not deviate by more than two standard deviations from the expected value. Had we asked for the probability that the number of successes is between 35 and 65, this would have represented three standard deviations from the mean, and, using our 1/2 correction, our estimate would be the area under the standard normal curve between \(-3.1\) and 3.1, or \(2\NA(0,3.1) = .9980\). The actual answer in this case, to five places, is .99821.

It is important to work a few problems by hand to understand the conversion from a given inequality to an inequality relating to the standardized variable. After this, one can then use a computer program that carries out this conversion, including the 1/2 correction. The program CLTBernoulliGlobal is such a program for estimating probabilities of the form \(P(a \leq S_n \leq b)\).
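The conversion, including the 1/2 correction, is short enough to sketch. The Python function below is a hypothetical stand-in for CLTBernoulliGlobal (the names are ours, not the program's); it reproduces the two estimates from Example 2 and compares them with exact binomial sums.

```python
import math


def normal_area(a: float, b: float) -> float:
    """Area under the standard normal density between a and b."""
    return 0.5 * (math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0)))


def clt_bernoulli_global(n: int, p: float, i: int, j: int) -> float:
    """Estimate P(i <= S_n <= j) using the 1/2 correction at both ends."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    lower = (i - 0.5 - mean) / sd
    upper = (j + 0.5 - mean) / sd
    return normal_area(lower, upper)


def exact_binomial_range(n: int, p: float, i: int, j: int) -> float:
    """Exact P(i <= S_n <= j), summing binomial probabilities."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(i, j + 1))


if __name__ == "__main__":
    for i, j in [(40, 60), (35, 65)]:
        approx = clt_bernoulli_global(100, 0.5, i, j)
        exact = exact_binomial_range(100, 0.5, i, j)
        print(f"P({i} <= S_100 <= {j}): estimate {approx:.4f}, exact {exact:.5f}")
```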

Example \(\PageIndex{3}\)

Dartmouth College would like to have 1050 freshmen. This college cannot accommodate more than 1060. Assume that each applicant accepts with probability .6 and that the acceptances can be modeled by Bernoulli trials. If the college accepts 1700, what is the probability that it will have too many acceptances?

If it accepts 1700 students, the expected number of students who matriculate is \(.6 \cdot 1700 = 1020\). The standard deviation for the number who accept is \(\sqrt{1700 \cdot .6 \cdot .4} \approx 20\). Thus we want to estimate the probability \[\begin{aligned} P(S_{1700} > 1060) &=& P(S_{1700} \ge 1061) \\ &\approx& P\left( S_{1700}^* \ge \frac {1060.5 - 1020}{20} \right) \\ &=& P(S_{1700}^* \ge 2.025)\ .\end{aligned}\]

From Table \(\PageIndex{1}\), if we interpolate, we would estimate this probability to be \(.5 - .4784 = .0216\). Thus, the college is fairly safe using this admission policy.
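A quick numerical check of Example 3, as a Python sketch under the same Bernoulli-trials assumption (the helper names are hypothetical). It evaluates the normal tail with the rounded standard deviation 20 used above, with the unrounded value \(\sqrt{408} \approx 20.2\), and, for comparison, the exact binomial tail summed in log space so that the huge binomial coefficients never overflow.

```python
import math


def normal_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def binomial_upper_tail(n: int, p: float, k_min: int) -> float:
    """Exact P(S_n >= k_min), with each pmf term computed in log space."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for k in range(k_min, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return total


n, p = 1700, 0.6
mean = n * p                      # 1020 expected matriculants
sd = math.sqrt(n * p * (1 - p))   # about 20.2; the text rounds this to 20
z_text = (1060.5 - mean) / 20     # 2.025, as in the text
z_exact_sd = (1060.5 - mean) / sd

print(f"P(S >= 1061) with sd = 20:     {normal_upper_tail(z_text):.4f}")
print(f"P(S >= 1061) with sd = {sd:.2f}: {normal_upper_tail(z_exact_sd):.4f}")
print(f"exact binomial tail:           {binomial_upper_tail(n, p, 1061):.4f}")
```

All three numbers come out close to .02, so the conclusion that the policy is fairly safe does not depend on rounding the standard deviation.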

Applications to Statistics

There are many important questions in the field of statistics that can be answered using the Central Limit Theorem for independent trials processes. The following example is one that is encountered quite frequently in the news. Another example of an application of the Central Limit Theorem to statistics is given in Section 1.2.

Example \(\PageIndex{4}\)

One frequently reads that a poll has been taken to estimate the proportion of people in a certain population who favor one candidate over another in a race with two candidates. (This model also applies to ballot propositions with two alternatives, and to races with more than two candidates if we record only whether each person favors candidate \(A\).) Clearly, it is not possible for pollsters to ask everyone for their preference. What is done instead is to pick a subset of the population, called a sample, and ask everyone in the sample for their preference. Let \(p\) be the actual proportion of people in the population who are in favor of candidate \(A\) and let \(q = 1-p\). If we choose a sample of size \(n\) from the population, the preferences of the people in the sample can be represented by random variables \(X_1,\ X_2,\ \ldots,\ X_n\), where \(X_i = 1\) if person \(i\) is in favor of candidate \(A\), and \(X_i = 0\) if person \(i\) is in favor of candidate \(B\). Let \(S_n = X_1 + X_2 + \cdots + X_n\). If each subset of size \(n\) is chosen with the same probability, then \(S_n\) is hypergeometrically distributed. If \(n\) is small relative to the size of the population (which is typically true in practice), then \(S_n\) is approximately binomially distributed, with parameters \(n\) and \(p\).

The pollster wants to estimate the value \(p\). An estimate for \(p\) is provided by the value \(\bar p = S_n/n\), which is the proportion of people in the sample who favor candidate \(A\). The Central Limit Theorem says that the random variable \(\bar p\) is approximately normally distributed. (In fact, our version of the Central Limit Theorem says that the distribution function of the random variable

\[S_n^* = \frac{S_n - np}{\sqrt{npq}}\]

is approximated by the standard normal density.) But we have

\[\bar p = \frac{S_n - np}{\sqrt {npq}}\sqrt{\frac{pq}{n}}+p\ ,\]

i.e., \(\bar p\) is just a linear function of \(S_n^*\). Since the distribution of \(S_n^*\) is approximated by the standard normal density, the distribution of the random variable \(\bar p\) must also be bell-shaped. We also know how to write the mean and standard deviation of \(\bar p\) in terms of \(p\) and \(n\). The mean of \(\bar p\) is just \(p\), and the standard deviation is

\[\sqrt{\frac{pq}{n}}\ .\]

Thus, it is easy to write down the standardized version of \(\bar p\); it is

\[\bar p^* = \frac{\bar p - p}{\sqrt{pq/n}}\ .\]

Since the distribution of the standardized version of \(\bar p\) is approximated by the standard normal density, we know, for example, that 95% of its values will lie within two standard deviations of its mean, and the same is true of \(\bar p\). So we have

\[P\left(p - 2\sqrt{\frac{pq}{n}} < \bar p < p + 2\sqrt{\frac{pq}{n}}\right) \approx .954\ .\]

Now the pollster does not know \(p\) or \(q\), but he can use \(\bar p\) and \(\bar q = 1 - \bar p\) in their place without too much danger. With this idea in mind, the above statement is equivalent to the statement

\[P\left(\bar p - 2\sqrt{\frac{\bar p \bar q}{n}} < p < \bar p + 2\sqrt{\frac{\bar p \bar q}{n}}\right) \approx .954\ .\]

The resulting interval

\[\left( \bar p - \frac {2\sqrt{\bar p \bar q}}{\sqrt n},\ \bar p + \frac {2\sqrt{\bar p \bar q}}{\sqrt n} \right)\]

is called the 95 percent confidence interval for the unknown value of \(p\). The name is suggested by the fact that if we use this method to estimate \(p\) in a large number of samples we should expect that in about 95 percent of the samples the true value of \(p\) is contained in the confidence interval obtained from the sample. In Exercise \(\PageIndex{11}\) you are asked to write a program to illustrate that this does indeed happen.
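The interval is simple to compute from a sample. A short Python sketch, using the multiplier 2 exactly as in the text (rather than the more precise 1.96); the function name and the poll counts are made up for illustration:

```python
import math


def confidence_interval_95(successes: int, n: int) -> tuple[float, float]:
    """Approximate 95% interval p_bar +/- 2*sqrt(p_bar*q_bar/n), as in the text."""
    p_bar = successes / n
    half_width = 2 * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - half_width, p_bar + half_width


# Hypothetical poll: 648 of 1200 respondents favor candidate A (p_bar = .54).
low, high = confidence_interval_95(648, 1200)
print(f"({low:.4f}, {high:.4f})")
```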

The pollster has control over the value of \(n\). Thus, if he wants to create a 95% confidence interval with length 6%, then he should choose a value of \(n\) so that

\[\frac {2\sqrt{\bar p \bar q}}{\sqrt n} \le .03\ .\]

Using the fact that \(\bar p \bar q \le 1/4\), no matter what the value of \(\bar p\) is, it is easy to show that if he chooses a value of \(n\) so that

\[\frac{1}{\sqrt n} \le .03\ ,\]

he will be safe. This is equivalent to choosing

\[n \ge \frac{1}{(.03)^2} = 1111.1\ldots\ ,\] that is, \(n \ge 1112\).


So if the pollster chooses \(n\) to be 1200, say, and calculates \(\bar p\) using his sample of size 1200, then 19 times out of 20 (i.e., 95% of the time), his confidence interval, which has length at most 6%, will contain the true value of \(p\). This type of confidence interval is typically reported in the news as follows: this survey has a 3% margin of error. In fact, most of the surveys that one sees reported in the paper will have sample sizes around 1000. A somewhat surprising fact is that the size of the population has apparently no effect on the sample size needed to obtain a 95% confidence interval for \(p\) with a given margin of error. To see this, note that the value of \(n\) that was needed depended only on the number .03, which is the margin of error. In other words, whether the population is of size 100,000 or 100,000,000, the pollster needs only to choose a sample of size 1200 or so to get the same accuracy of estimate of \(p\). (We did use the fact that the sample size was small relative to the population size in the statement that \(S_n\) is approximately binomially distributed.)
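The sample-size rule can be written as a one-line function. This Python sketch (the function name is ours) returns the smallest integer \(n\) with \(1/\sqrt n\) at most the desired margin; since \(1/(.03)^2 = 1111.1\ldots\), the smallest such integer for a 3% margin is 1112. Note that the population size never enters the calculation.

```python
import math


def conservative_sample_size(margin: float) -> int:
    """Smallest n with 1/sqrt(n) <= margin.

    Because p_bar * q_bar <= 1/4, this n guarantees
    2*sqrt(p_bar*q_bar/n) <= margin whatever p_bar turns out to be.
    """
    # Round away floating-point noise in 1/margin**2 before taking the ceiling.
    return math.ceil(round(1 / margin**2, 9))


n = conservative_sample_size(0.03)
print(n, 1 / math.sqrt(n), 1 / math.sqrt(n - 1))
```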

In Figure \(\PageIndex{1}\), we show the results of simulating the polling process. The population is of size 100,000, and for the population, \(p = .54\). The sample size was chosen to be 1200. The spike graph shows the distribution of \(\bar p\) for 10,000 randomly chosen samples. For this simulation, the program kept track of the number of samples for which \(\bar p\) was within 3% of .54. This number was 9648, which is close to 95% of the number of samples used.
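The polling experiment can be imitated in a few lines. This Python sketch makes assumptions the text does not spell out: the population is a list of 100,000 zeros and ones containing exactly 54,000 ones, each sample is drawn without replacement with `random.Random.sample`, and "within 3%" is read as \(|\bar p - .54| \le .03\). The function and parameter names are ours. Exact counts vary with the random seed, but the fraction of samples within 3% should come out close to the proportion reported above.

```python
import random
from typing import Optional


def simulate_polls(pop_size: int = 100_000, p: float = 0.54, n: int = 1200,
                   num_samples: int = 10_000, margin: float = 0.03,
                   seed: Optional[int] = None) -> int:
    """Count samples whose p_bar lies within `margin` of the population proportion."""
    rng = random.Random(seed)
    favorable = round(p * pop_size)
    population = [1] * favorable + [0] * (pop_size - favorable)
    true_p = favorable / pop_size
    hits = 0
    for _ in range(num_samples):
        # Draw n distinct people (sampling without replacement).
        p_bar = sum(rng.sample(population, n)) / n
        if abs(p_bar - true_p) <= margin:
            hits += 1
    return hits


if __name__ == "__main__":
    # 2,000 samples keeps the pure-Python run short; use 10_000 to match the text.
    hits = simulate_polls(num_samples=2_000, seed=2024)
    print(f"{hits} of 2000 samples had p_bar within .03 of .54")
```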

Another way to see what the idea of confidence intervals means is shown in Figure \(\PageIndex{6}\). In this figure, we show 100 confidence intervals, obtained by computing \(\bar p\) for 100 different samples of size 1200 from the same population as before. The reader can see that most of these confidence intervals (96, to be exact) contain the true value of \(p\).

Figure \(\PageIndex{6}\): Confidence interval simulation.
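The interval experiment can be sketched the same way. Under the same assumptions as a simulated population (100,000 people, 54,000 of them in favor, samples of 1200 drawn without replacement), the Python function below builds \(\bar p \pm 2\sqrt{\bar p \bar q/n}\) for each of 100 samples and counts how many intervals contain \(p\). The function name is hypothetical, and the count will differ from run to run, typically landing in the mid-90s.

```python
import math
import random
from typing import Optional


def count_covering_intervals(num_intervals: int = 100, pop_size: int = 100_000,
                             p: float = 0.54, n: int = 1200,
                             seed: Optional[int] = None) -> int:
    """Count how many intervals p_bar +/- 2*sqrt(p_bar*q_bar/n) contain the true p."""
    rng = random.Random(seed)
    favorable = round(p * pop_size)
    population = [1] * favorable + [0] * (pop_size - favorable)
    true_p = favorable / pop_size
    covered = 0
    for _ in range(num_intervals):
        p_bar = sum(rng.sample(population, n)) / n
        half_width = 2 * math.sqrt(p_bar * (1 - p_bar) / n)
        if p_bar - half_width < true_p < p_bar + half_width:
            covered += 1
    return covered


if __name__ == "__main__":
    print(count_covering_intervals(seed=11), "of 100 intervals contain p = .54")
```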

The Gallup Poll has used these polling techniques in every Presidential election since 1936 (and in innumerable other elections as well). Table \(\PageIndex{1}\) shows the results of their efforts. The reader will note that most of the approximations to \(p\) are within 3% of the actual value of \(p\). The sample sizes for these polls were typically around 1500. (In the table, both the predicted and actual percentages for the winning candidate refer to the percentage of the vote among the “major" political parties. In most elections, there were two major parties, but in several elections, there were three.)

This technique also plays an important role in the evaluation of the effectiveness of drugs in the medical profession. For example, it is sometimes desired to know what proportion of patients will be helped by a new drug. This proportion can be estimated by giving the drug to a subset of the patients, and determining the proportion of this sample who are helped by the drug.

Historical Remarks

The Central Limit Theorem for Bernoulli trials was first proved by Abraham de Moivre and appeared in his book The Doctrine of Chances, first published in 1718.

De Moivre spent his years from age 18 to 21 in prison in France because of his Protestant background. When he was released he left France for England, where he worked as a tutor to the sons of noblemen. Newton had presented a copy of his Principia to the Earl of Devonshire. The story goes that, while de Moivre was tutoring at the Earl’s house, he came upon Newton’s work and found that it was beyond him. It is said that he then bought a copy of his own and tore it into separate pages, learning it page by page as he walked around London to his tutoring jobs. De Moivre frequented the coffeehouses in London, where he started his probability work by calculating odds for gamblers. He also met Newton at such a coffeehouse and they became fast friends. De Moivre dedicated his book to Newton.

The Doctrine of Chances provides the techniques for solving a wide variety of gambling problems. In the midst of these gambling problems de Moivre rather modestly introduces his proof of the Central Limit Theorem, writing

A Method of approximating the Sum of the Terms of the Binomial \((a + b)^n\) expanded into a Series, from whence are deduced some practical Rules to estimate the Degree of Assent which is to be given to Experiments.

De Moivre’s proof used the approximation to factorials that we now call Stirling’s formula. De Moivre states that he had obtained this formula before Stirling but without determining the exact value of the constant \(\sqrt{2\pi}\). While he says it is not really necessary to know this exact value, he concedes that knowing it “has spread a singular Elegancy on the Solution."

The complete proof and an interesting discussion of the life of de Moivre can be found in the book by F. N. David.

Exercise \(\PageIndex{1}\):  

Let \(S_{100}\) be the number of heads that turn up in 100 tosses of a fair coin. Use the Central Limit Theorem to estimate

  • \(P(S_{100} \leq 45)\).
  • \(P(45 < S_{100} < 55)\).
  • \(P(S_{100} > 63)\).
  • \(P(S_{100} < 57)\).

Exercise \(\PageIndex{2}\):  

Let \(S_{200}\) be the number of heads that turn up in 200 tosses of a fair coin. Estimate

  • \(P(S_{200} = 100)\).
  • \(P(S_{200} = 90)\).
  • \(P(S_{200} = 80)\).

Exercise \(\PageIndex{3}\):  

A true-false examination has 48 questions. June has probability 3/4 of answering a question correctly. April just guesses on each question. A passing score is 30 or more correct answers. Compare the probability that June passes the exam with the probability that April passes it.

Exercise \(\PageIndex{4}\):  

Let \(S\) be the number of heads in 1,000,000 tosses of a fair coin. Use (a) Chebyshev’s inequality, and (b) the Central Limit Theorem, to estimate the probability that \(S\) lies between 499,500 and 500,500. Use the same two methods to estimate the probability that \(S\) lies between 499,000 and 501,000, and the probability that \(S\) lies between 498,500 and 501,500.

Exercise \(\PageIndex{5}\):  

A rookie is brought to a baseball club on the assumption that he will have a .300 batting average. (Batting average is the ratio of the number of hits to the number of times at bat.) In the first year, he comes to bat 300 times and his batting average is .267. Assume that his at bats can be considered Bernoulli trials with probability .3 for success. Could such a low average be considered just bad luck or should he be sent back to the minor leagues? Comment on the assumption of Bernoulli trials in this situation.

Exercise \(\PageIndex{6}\):  

Once upon a time, there were two railway trains competing for the passenger traffic of 1000 people leaving from Chicago at the same hour and going to Los Angeles. Assume that passengers are equally likely to choose each train. How many seats must a train have to assure a probability of .99 or better of having a seat for each passenger?

Exercise \(\PageIndex{7}\):

Dartmouth admits 1750 students. What is the probability of too many acceptances?

Exercise \(\PageIndex{8}\):  

A club serves dinner to members only. They are seated at 12-seat tables. The manager observes over a long period of time that 95 percent of the time there are between six and nine full tables of members, and the remainder of the time the numbers are equally likely to fall above or below this range. Assume that each member decides to come with a given probability \(p\), and that the decisions are independent. How many members are there? What is \(p\)?

Exercise \(\PageIndex{9}\):  

Let \(S_n\) be the number of successes in \(n\) Bernoulli trials with probability .8 for success on each trial. Let \(A_n = S_n/n\) be the average number of successes. In each case give the value for the limit, and give a reason for your answer.

  • \(\lim_{n \to \infty} P(A_n = .8)\).
  • \(\lim_{n \to \infty} P(.7n < S_n < .9n)\).
  • \(\lim_{n \to \infty} P(S_n < .8n + .8\sqrt n)\).
  • \(\lim_{n \to \infty} P(.79 < A_n < .81)\).

Exercise \(\PageIndex{10}\):  

Find the probability that among 10,000 random digits the digit 3 appears not more than 931 times.

Exercise \(\PageIndex{11}\):  

Write a computer program to simulate 10,000 Bernoulli trials with probability .3 for success on each trial. Have the program compute the 95 percent confidence interval for the probability of success based on the proportion of successes. Repeat the experiment 100 times and see how many times the true value of .3 is included within the confidence limits.

Exercise \(\PageIndex{12}\):  

A balanced coin is flipped 400 times. Determine the number \(x\) such that the probability that the number of heads is between \(200 - x\) and \(200 + x\) is approximately .80.

Exercise \(\PageIndex{13}\):  

A noodle machine in Spumoni’s spaghetti factory makes about 5 percent defective noodles even when properly adjusted. The noodles are then packed in crates containing 1900 noodles each. A crate is examined and found to contain 115 defective noodles. What is the approximate probability of finding at least this many defective noodles if the machine is properly adjusted?

Exercise \(\PageIndex{14}\):  

A restaurant feeds 400 customers per day. On the average 20 percent of the customers order apple pie.

  • Give a range (called a 95 percent confidence interval) for the number of pieces of apple pie ordered on a given day such that you can be 95 percent sure that the actual number will fall in this range.
  • How many customers must the restaurant have, on the average, to be at least 95 percent sure that the number of customers ordering pie on that day falls in the 19 to 21 percent range?

Exercise \(\PageIndex{15}\):  

Recall that if \(X\) is a random variable, the cumulative distribution function of \(X\) is the function \(F(x)\) defined by \[F(x) = P(X \leq x)\ .\]

  • (a) Let \(S_n\) be the number of successes in \(n\) Bernoulli trials with probability \(p\) for success. Write a program to plot the cumulative distribution for \(S_n\).
  • (b) Modify your program in (a) to plot the cumulative distribution \(F_n^*(x)\) of the standardized random variable \[S_n^* = \frac {S_n - np}{\sqrt{npq}}\ .\]
  • (c) Define the normal distribution function \(N(x)\) to be the area under the normal curve up to the value \(x\). Modify your program in (b) to plot the normal distribution as well, and compare it with the cumulative distribution of \(S_n^*\). Do this for \(n = 10, 50\), and \(100\).

Exercise \(\PageIndex{16}\):  

In Example 3.12, we were interested in testing the hypothesis that a new form of aspirin is effective 80 percent of the time rather than the 60 percent of the time as reported for standard aspirin. The new aspirin is given to \(n\) people. If it is effective in \(m\) or more cases, we accept the claim that the new drug is effective 80 percent of the time and if not we reject the claim. Using the Central Limit Theorem, show that you can choose the number of trials \(n\) and the critical value \(m\) so that the probability that we reject the hypothesis when it is true is less than .01 and the probability that we accept it when it is false is also less than .01. Find the smallest value of \(n\) that will suffice for this.

Exercise \(\PageIndex{17}\):  

In an opinion poll it is assumed that an unknown proportion \(p\) of the people are in favor of a proposed new law and a proportion \(1-p\) are against it. A sample of \(n\) people is taken to obtain their opinion. The proportion \({\bar p}\) in favor in the sample is taken as an estimate of \(p\). Using the Central Limit Theorem, determine how large a sample will ensure that the estimate will, with probability .95, be correct to within .01.

Exercise \(\PageIndex{18}\):  

A description of a poll in a certain newspaper says that one can be 95% confident that error due to sampling will be no more than plus or minus 3 percentage points. A poll in the New York Times taken in Iowa says that “according to statistical theory, in 19 out of 20 cases the results based on such samples will differ by no more than 3 percentage points in either direction from what would have been obtained by interviewing all adult Iowans." These are both attempts to explain the concept of confidence intervals. Do both statements say the same thing? If not, which do you think is the more accurate description?


Mathematics LibreTexts

6: The Central Limit Theorems


In this chapter, you will study means and the central limit theorem, which is one of the most powerful and useful ideas in all of statistics. There are two alternative forms of the theorem, and both alternatives are concerned with drawing finite samples of size \(n\) from a population with a known mean, \(\mu\), and a known standard deviation, \(\sigma\). The first alternative says that if we collect samples of size \(n\) with a "large enough \(n\)," calculate each sample's mean, and create a histogram of those means, then the resulting histogram will tend to have an approximate normal bell shape. The second alternative says that if we again collect samples of size \(n\) that are "large enough," calculate the sum of each sample and create a histogram, then the resulting histogram will again tend to have a normal bell shape.
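A small simulation makes both forms concrete. The Python sketch below assumes a Uniform(1, 5) population (mean 3, standard deviation \(4/\sqrt{12}\)), chosen only because it is clearly not normal; the function name and parameters are illustrative. It checks that the sample means center on \(\mu\) with spread about \(\sigma/\sqrt n\), and that the sums center on \(n\mu\) with spread about \(\sigma\sqrt n\). Plotting a histogram of either list would show the bell shape.

```python
import math
import random
import statistics
from typing import Optional


def sample_means_and_sums(n: int = 30, num_samples: int = 5000,
                          a: float = 1.0, b: float = 5.0,
                          seed: Optional[int] = None):
    """Draw samples of size n from Uniform(a, b); return (sample means, sample sums)."""
    rng = random.Random(seed)
    means, sums = [], []
    for _ in range(num_samples):
        total = sum(rng.uniform(a, b) for _ in range(n))
        sums.append(total)
        means.append(total / n)
    return means, sums


if __name__ == "__main__":
    n, a, b = 30, 1.0, 5.0
    mu = (a + b) / 2                  # population mean of Uniform(a, b)
    sigma = (b - a) / math.sqrt(12)   # population standard deviation of Uniform(a, b)
    means, sums = sample_means_and_sums(n=n, a=a, b=b, seed=1)
    print("means: mean", round(statistics.fmean(means), 3), "vs", mu,
          "; sd", round(statistics.stdev(means), 3), "vs", round(sigma / math.sqrt(n), 3))
    print("sums:  mean", round(statistics.fmean(sums), 2), "vs", n * mu,
          "; sd", round(statistics.stdev(sums), 2), "vs", round(sigma * math.sqrt(n), 2))
```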

  • 6.1: Prelude to the Central Limit Theorem The central limit theorem states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed.
  • 6.2E: The Central Limit Theorem for Sample Means (Exercises)
  • 6.3: The Central Limit Theorem for Sample Proportions The central limit theorem tells us that for a population with any distribution, the distribution of the sums for the sample means approaches a normal distribution as the sample size increases. In other words, if the sample size is large enough, the distribution of the sums can be approximated by a normal distribution even if the original population is not normally distributed.
  • 6.4E: Using the Central Limit Theorem (Exercises)
  • 6.5: Central Limit Theorem - Pocket Change (Worksheet) A statistics Worksheet: The student will demonstrate and compare properties of the central limit theorem.
  • 6.6: Central Limit Theorem - Cookie Recipes (Worksheet) A statistics Worksheet: The student will demonstrate and compare properties of the central limit theorem.
  • 6.E: The Central Limit Theorem (Exercises) These are homework exercises to accompany the Textmap created for "Introductory Statistics" by OpenStax. Complementary General Chemistry question banks can be found for other Textmaps and can be accessed here. In addition to these publicly available questions, access to private problems bank for use in exams and homework is available to faculty only on an individual basis; please contact Delmar Larsen for an account with access permission.



COMMENTS

  1. II. Enumerate the steps in solving problem using Central Limit Theorem

    Answer: A Central Limit Theorem word problem will most likely contain the phrase "assume the variable is normally distributed", or one like it. Step-by-step explanation: General steps. Step 1: Identify the parts of the problem. Your question should state the mean (average or μ), the standard deviation (σ), the population size, and the sample size (n).

  2. 7.3: Using the Central Limit Theorem

    OpenStax. It is important for you to understand when to use the central limit theorem (CLT). If you are being asked to find the probability of the mean, use the CLT for the mean. If you are being asked to find the probability of a sum or total, use the CLT for sums. This also applies to percentiles for means and sums.

  3. Central Limit Theorem with Examples and Solutions

    1) If the population has a normal distribution, the central limit theorem holds even for smaller sample size \(n\). 2) The central limit theorem also holds for populations with binomial distributions as long as \(np \ge 5\) and \(n(1-p) \ge 5\). Sampling Distributions: We present an example with small samples (\(n = 2\)

  4. Central Limit Theorem

    Published on July 6, 2022 by Shaun Turney . Revised on June 22, 2023. The central limit theorem states that if you take sufficiently large samples from a population, the samples' means will be normally distributed, even if the population isn't normally distributed. Example: Central limit theorem

  5. 7.3 Using the Central Limit Theorem

    \(\sigma_X = \sqrt{\dfrac{(b-a)^2}{12}} = \sqrt{\dfrac{(5-1)^2}{12}} \approx 1.15\)

  6. 7.E: The Central Limit Theorem (Exercises)

    The central limit theorem can be used to illustrate the law of large numbers. The law of large numbers states that the larger the sample size you take from a population, the closer the sample mean \(\bar{x}\) gets to \(\mu\). Use the following information to answer the next ten exercises: A manufacturer produces 25-pound lifting weights. The ...

  7. 12.1: The Central Limit Theorem

    The Central Limit Theorem tells us that: 1) the new random variable \(\dfrac{X_1 + X_2 + \cdots + X_n}{n} = \bar X_n\) will approximately be \(N(\mu, \sigma^2/n)\); 2) the new random variable \(X_1 + X_2 + \cdots + X_n\) will be approximately \(N(n\mu, n\sigma^2)\). Additionally, notice how general the Central Limit Theorem is! We are saying the distribution of \(X_1, X_2, X_3, \ldots, X_n\) can be ...

  8. 7.4: Using the Central Limit Theorem

    The central limit theorem illustrates the law of large numbers. Example 7.4.1. A study involving stress is conducted among the students on a college campus. The stress scores follow a uniform distribution with the lowest stress score equal to one and the highest equal to five. Using a sample of 75 students, find:

  9. 5.3: The Central Limit Theorem for Sums

    The central limit theorem for sums says that if you keep drawing larger and larger samples and taking their sums, the sums form their own normal distribution (the sampling distribution), which approaches a normal distribution as the sample size increases. The normal distribution has a mean equal to the original mean multiplied by the sample ...

  10. Using the Central Limit Theorem

    The central limit theorem illustrates the law of large numbers. Central Limit Theorem for the Mean and Sum Examples. A study involving stress is conducted among the students on a college campus. The stress scores follow a uniform distribution with the lowest stress score equal to one and the highest equal to five. Using a sample of 75 students ...

  11. 7.2: The Central Limit Theorem for Sums

    The central limit theorem for sums says that if you keep drawing larger and larger samples and taking their sums, the sums form their own normal distribution (the sampling distribution), which approaches a normal distribution as the sample size increases. The normal distribution has a mean equal to the original mean multiplied by the sample ...

  12. 7: The Central Limit Theorem

    7.2: The Central Limit Theorem for Sums. The central limit theorem tells us that for a population with any distribution, the distribution of the sums for the sample means approaches a normal distribution as the sample size increases. In other words, if the sample size is large enough, the distribution of the sums can be approximated by a normal ...

  13. Formulas

    The Central Limit Theorem (CLT) states that the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger. ... Steps. The steps used to solve central limit theorem problems involving '>', '<', or "between" are as follows: 1) The information about the mean, population ...
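The '<', '>' and "between" cases can be folded into one function. This is a minimal sketch that assumes \( \mu \) and \( \sigma \) are known and that n is large enough for the normal approximation to hold. The function name and its arguments are illustrative, and `statistics.NormalDist` replaces the normal table. The check uses the values from Example 1 earlier on this page (\( \mu = 20 \), \( \sigma = 4 \), \( n = 64 \)).

```python
import math
from statistics import NormalDist

def clt_probability(mu, sigma, n, kind, a, b=None):
    """Approximate probability for the sample mean via the CLT.

    kind '<'       -> P(Xbar < a)
    kind '>'       -> P(Xbar > a)
    kind 'between' -> P(a < Xbar < b)
    """
    se = sigma / math.sqrt(n)   # standard deviation of the sample mean
    std = NormalDist()          # standard normal Z
    za = (a - mu) / se          # z-score of the cutoff a
    if kind == "<":
        return std.cdf(za)
    if kind == ">":
        return 1 - std.cdf(za)
    if kind == "between":
        if b is None:
            raise ValueError("'between' needs an upper cutoff b")
        zb = (b - mu) / se      # z-score of the cutoff b
        return std.cdf(zb) - std.cdf(za)
    raise ValueError("kind must be '<', '>' or 'between'")

print(round(clt_probability(20, 4, 64, "<", 19), 4))            # 0.0228
print(round(clt_probability(20, 4, 64, "between", 19, 21), 4))  # 0.9545
```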

  14. Central Limit Theorem: Definition, Formula, Derivation & Examples

    The Central Limit Theorem in statistics states that as the sample size increases, provided the population variance is finite, the distribution of the sample mean approaches a normal distribution irrespective of the shape of the population distribution.
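The phrase "irrespective of the shape" can be tested by starting from a strongly skewed population. The sketch below uses an exponential distribution with mean 1, an illustrative choice, and measures the skewness of simulated sample means. For the mean of n independent exponential draws the skewness is \( 2/\sqrt n \). It should therefore shrink toward 0, the skewness of a normal distribution, as n grows.

```python
import math
import random
import statistics

def skewness(data):
    """Sample skewness: the average cubed z-score."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((x - m) / s) ** 3 for x in data)

def skewness_of_means(ns, reps=10_000, seed=42):
    """For each sample size n, simulate `reps` sample means from Exp(1) and return their skewness."""
    rng = random.Random(seed)
    result = {}
    for n in ns:
        means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
        result[n] = skewness(means)
    return result

skews = skewness_of_means((1, 5, 50))
for n, sk in skews.items():
    print(n, round(sk, 2), "theory:", round(2 / math.sqrt(n), 2))
```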

  15. Central Limit Theorem: a real-life application

    The Central Limit Theorem (CLT) is one of the most popular theorems in statistics and it's very useful in real-world problems. In this article we'll see why the Central Limit Theorem is so useful and how to apply it. In a lot of situations where you use statistics, the ultimate goal is to identify the characteristics of a population.

  16. 7: The Central Limit Theorem

    The central limit theorem can be used to illustrate the law of large numbers. The law of large numbers states that the larger the sample size you take from a population, the closer the sample mean \( \bar x \) gets to \( \mu \). The central limit theorem illustrates the law of large numbers.

  17. 7.3 Using the Central Limit Theorem

    Examples of the Central Limit Theorem. Law of Large Numbers. The law of large numbers says that if you take samples of larger and larger size from any population, then the mean \( \bar x \) of the sample tends to get closer and closer to \( \mu \). From the central limit theorem, we know that as \( n \) gets larger and larger, the sample means follow a normal distribution. The larger \( n \) gets, the smaller the ...
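One direct way to watch the law of large numbers at work is to track the running mean of a sequence of draws. The sketch assumes fair die rolls (\( \mu = 3.5 \)) purely as an illustration. The running mean wanders at first and then settles near 3.5 as n grows.

```python
import random

def running_means(num_draws, seed=1):
    """Running sample mean after each of `num_draws` fair die rolls."""
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, num_draws + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means

means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(means[n - 1], 4))
```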

  18. Numerate the steps in solving a problem using the central limit theorem

    Answer: Compute 1. the population mean, 2. the population variance, 3. the population standard deviation.
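Each of the three steps in the answer corresponds to a function in Python's `statistics` module. Note that the population versions (`pvariance`, `pstdev`) divide by N rather than N - 1. The population values below are hypothetical. The last line computes the standard deviation of the sample mean, \( \sigma/\sqrt n \), for an assumed n = 2; this is the quantity the CLT uses next.

```python
import math
import statistics

population = [2, 4, 6, 8]                # hypothetical population values

mu = statistics.mean(population)         # 1. population mean
var = statistics.pvariance(population)   # 2. population variance (divides by N)
sigma = statistics.pstdev(population)    # 3. population standard deviation
print(mu, var, sigma)                    # mean 5, variance 5, sd about 2.236

n = 2                                    # assumed sample size
print(sigma / math.sqrt(n))              # standard deviation of the sample mean
```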

  19. 7.1 The Central Limit Theorem for Sample Means

    The central limit theorem for sample means says that if you keep drawing larger and larger samples and calculating their means, the sample means form their own normal distribution. The normal distribution has the same mean as the original distribution and a variance that equals the original variance divided by the sample size: \( \mu_{\bar x} = \mu \).
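For a small population, both facts can be checked exactly by listing every possible sample, much like listing all samples of 2 integers drawn with replacement. The sketch below uses a hypothetical population {1, 2, 3, 4} and exact fractions. When sampling with replacement, all \( N^n \) ordered samples are equally likely. Averaging over the full list therefore gives the exact mean and variance of \( \bar x \), which should equal \( \mu \) and \( \sigma^2/n \).

```python
from fractions import Fraction
from itertools import product

def sample_mean_moments(population, n):
    """Exact mean and variance of the sample mean over all ordered samples of size n drawn with replacement."""
    N = len(population)
    mu = Fraction(sum(population), N)
    var = sum((x - mu) ** 2 for x in population) / N   # population variance sigma^2
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    mean_of_means = sum(means) / len(means)
    var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
    return mu, var, mean_of_means, var_of_means

mu, var, m_xbar, v_xbar = sample_mean_moments([1, 2, 3, 4], 2)
print(mu, var, m_xbar, v_xbar)  # 5/2 5/4 5/2 5/8
```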

  21. Using Central Limit Theorem

    Can anyone help me with it: Using the central limit theorem for suitable Poisson random variables, prove that \( \lim_{n \to \infty} e^{-n} \sum_{k=0}^{n} \dfrac{n^k}{k!} = \dfrac{1}{2} \). Thanks!
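The intended argument runs as follows. Let \( X_1, \dots, X_n \) be independent Poisson(1) variables, so that \( S_n = X_1 + \dots + X_n \) is Poisson(n) with mean n and variance n. The expression is then exactly \( P(S_n \le n) = P\left(\frac{S_n - n}{\sqrt n} \le 0\right) \), which by the central limit theorem tends to \( \Phi(0) = 1/2 \). The sketch below checks the limit numerically. It computes the terms in log space with `math.lgamma`, because \( n^k \) and \( k! \) overflow floats when n is large.

```python
import math

def poisson_cdf_at_mean(n):
    """e^{-n} * sum_{k=0}^{n} n^k / k!, i.e. P(S_n <= n) for S_n ~ Poisson(n); n a positive integer."""
    log_n = math.log(n)
    # Each term exp(k*log(n) - log(k!) - n) is a Poisson(n) probability;
    # terms far from k = n underflow harmlessly to 0.
    return math.fsum(math.exp(k * log_n - math.lgamma(k + 1) - n) for k in range(n + 1))

for n in (1, 10, 100, 1_000, 10_000):
    print(n, round(poisson_cdf_at_mean(n), 4))
```

The values decrease slowly toward 0.5, consistent with the \( 1/\sqrt n \) rate of the normal approximation.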

  22. 9.1: Central Limit Theorem for Bernoulli Trials

    The Central Limit Theorem states that this does indeed happen. Theorem 9.1.2 (Central Limit Theorem for Bernoulli Trials). Let \( S_n \) be the number of successes in \( n \) Bernoulli trials with probability \( p \) for success (and \( q = 1 - p \)), and let \( a \) and \( b \) be two fixed real numbers. Then \( \lim_{n \to \infty} P\left(a \le \dfrac{S_n - np}{\sqrt{npq}} \le b\right) = \int_a^b \phi(x)\,dx \). Proof.
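The theorem can be checked numerically. The sketch computes the exact binomial probability that the standardized count falls in \( [a, b] \) and compares it with \( \int_a^b \phi(x)\,dx = \Phi(b) - \Phi(a) \). The choices p = 0.3, [a, b] = [-1, 1] and the values of n are illustrative. Binomial probabilities are computed in log space with `math.lgamma` to avoid overflow.

```python
import math
from statistics import NormalDist

def binom_pmf(n, k, p):
    """P(S_n = k) for S_n ~ Binomial(n, p), computed in log space (requires 0 < p < 1)."""
    log_prob = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_prob)

def standardized_prob(n, p, a, b):
    """Exact P(a <= (S_n - np) / sqrt(npq) <= b), with q = 1 - p."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    return math.fsum(binom_pmf(n, k, p) for k in range(n + 1) if a <= (k - mean) / sd <= b)

std = NormalDist()
a, b, p = -1.0, 1.0, 0.3
limit = std.cdf(b) - std.cdf(a)   # integral of phi from a to b
for n in (10, 100, 2_000):
    print(n, round(standardized_prob(n, p, a, b), 4), round(limit, 4))
```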

  23. 6: The Central Limit Theorems

    The central limit theorem can be used to illustrate the law of large numbers. The law of large numbers states that the larger the sample size you take from a population, the closer the sample mean \( \bar x \) gets to \( \mu \). The central limit theorem illustrates the law of large numbers.