Statistics
Random variables
A numerical description of the outcome of a random experiment is called a random variable. It can be either discrete or continuous.
Random variables are denoted with capital letters, e.g. $X$.
The possible outcomes are denoted with lowercase letters, e.g. $x$.
The probability that the outcome of $X$ is $x$ is denoted by $P(X = x)$.
$E[X]$ gives the expected value, which represents the mean value (outcome) of the random variable.
$\mathrm{Var}(X)$ gives the variance, which is a measure of the variability of the random variable's outcomes.
$E[X \pm Y] = E[X] \pm E[Y]$, which means any addition/subtraction inside $E[\,\cdot\,]$ can be expanded (linearity of expectation).
If $X$ and $Y$ are independent: $\mathrm{Var}(X \pm Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
For constants $a$ and $b$: $E[aX + b] = aE[X] + b$ and $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.
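As a quick check of these properties, here is a minimal Python sketch (with an assumed three-outcome discrete random variable) that computes $E[X]$ and $\mathrm{Var}(X)$ directly and verifies the linear-transformation rules:

```python
# A minimal sketch (assumed example): discrete X with outcomes 1, 2, 3
# and probabilities 0.2, 0.5, 0.3.
outcomes = [1, 2, 3]
probs = [0.2, 0.5, 0.3]

# E[X] = sum over x of x * P(X = x)
E_X = sum(x * p for x, p in zip(outcomes, probs))

# Var(X) = E[X^2] - (E[X])^2
E_X2 = sum(x**2 * p for x, p in zip(outcomes, probs))
Var_X = E_X2 - E_X**2

# Linear transformation Y = aX + b: E[Y] = a*E[X] + b, Var(Y) = a^2 * Var(X)
a, b = 2, 5
E_Y = sum((a * x + b) * p for x, p in zip(outcomes, probs))
Var_Y = sum(((a * x + b) - E_Y) ** 2 * p for x, p in zip(outcomes, probs))

print(E_X, Var_X)           # 2.1, 0.49
print(E_Y, a * E_X + b)     # both 9.2
print(Var_Y, a**2 * Var_X)  # both 1.96
```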
Probability distributions
A continuous random variable has probabilities as the area under its curve. Hence, the probability of any single outcome is $P(X = x) = 0$. A discrete random variable has a specific probability assigned to each outcome.
Probability distribution functions
A p.d.f $f(x)$ is a continuous function whose area under the curve gives probabilities (the value $f(x)$ itself is a density, not a probability). The following is an example of a p.d.f:

$$f(x) = 2x \quad \text{for } 0 \le x \le 1, \qquad f(x) = 0 \text{ otherwise}$$
Note that $\int_{-\infty}^{\infty} f(x)\,dx = 1$, as the total probability over all possible outcomes is 1. This condition must be true for $f(x)$ to be a valid p.d.f. Hence for the example: $\int_{0}^{1} 2x\,dx = \left[x^2\right]_0^1 = 1$.
Cumulative distribution function
To obtain a c.d.f $F(x) = P(X \le x)$, all we have to do is integrate the p.d.f:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$
Using our example: $F(x) = \int_{0}^{x} 2t\,dt = x^2$ for $0 \le x \le 1$ (with $F(x) = 0$ for $x < 0$ and $F(x) = 1$ for $x > 1$).
And to convert a c.d.f back to its p.d.f, all we have to do is differentiate $F(x)$:

$$f(x) = \frac{d}{dx} F(x)$$
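The integration and differentiation steps can be verified symbolically. The sketch below uses sympy and the assumed example density $f(x) = 2x$ on $[0, 1]$:

```python
# A minimal sketch verifying the example p.d.f f(x) = 2x on [0, 1]
# (an assumed illustrative density; any valid p.d.f works the same way).
import sympy as sp

x, t = sp.symbols("x t")
f = 2 * t                           # p.d.f as a function of t

# Total probability must equal 1 for a valid p.d.f
total = sp.integrate(f, (t, 0, 1))  # -> 1

# c.d.f: F(x) = integral of f(t) from the lower bound up to x
F = sp.integrate(f, (t, 0, x))      # -> x**2

# Differentiating the c.d.f recovers the p.d.f
f_back = sp.diff(F, x)              # -> 2*x

print(total, F, f_back)
```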
Normal distribution
The Normal distribution $X \sim N(\mu, \sigma^2)$ is given by the following formula (which you don't have to memorize):

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
To calculate $P(X \le x)$, we have to standardize our $X$ by $Z = \frac{X - \mu}{\sigma}$, so that $P(X \le x) = P\!\left(Z \le \frac{x - \mu}{\sigma}\right)$. Here $Z \sim N(0, 1)$ is the standard normal variable.
For a given $z$, to find $\Phi(z) = P(Z \le z)$, locate the row (leftmost column) and the column header in the z-table such that their sum is $z$. The corresponding intersecting cell is $\Phi(z)$.
$\Phi^{-1}(p)$ gives the value $z$ in the z-table for which $P(Z \le z) = p$ (a reverse lookup: find $p$ in the body of the table and read off $z$).
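As an illustration of standardization (with assumed values $\mu = 50$, $\sigma = 10$, $x = 65$), the following sketch computes the same probability via the z-score and via the distribution directly, using scipy:

```python
# A minimal sketch (assumed numbers): X ~ N(mu=50, sigma=10), find P(X <= 65).
from scipy.stats import norm

mu, sigma = 50, 10
x = 65

z = (x - mu) / sigma                         # standardize: z = (x - mu) / sigma
p_via_z = norm.cdf(z)                        # standard normal lookup, Phi(z)
p_direct = norm.cdf(x, loc=mu, scale=sigma)  # same probability without standardizing

print(z, p_via_z, p_direct)                  # 1.5, ~0.9332, ~0.9332
```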
Sampling
For a given population of size $N$, $\mu$ is the true mean and $\sigma^2$ is the true variance.
A sample is a subset of the population of size $n$. The sample mean is $\bar{x}$ and the sample variance is $s^2$. They can be calculated as:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
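In practice these are one-liners with numpy. The sketch below (with made-up data) passes `ddof=1` so the variance is divided by $n - 1$:

```python
# A minimal sketch (assumed data): sample mean and sample variance with numpy.
import numpy as np

sample = np.array([4.1, 5.3, 6.0, 5.5, 4.8])  # n = 5 observations

x_bar = sample.mean()      # sample mean
s2 = sample.var(ddof=1)    # sample variance, dividing by n - 1
s = sample.std(ddof=1)     # sample standard deviation

print(x_bar, s2, s)
```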
The statistics computed from each random sample of the population are themselves random.
The sampling distribution is the distribution of a statistic across all possible random samples.
The random variable $\bar{X}$ is the mean of a random sample of $n$ observations if:
- The observations are independent.
- The observations are identically distributed.
- $\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$.
If $X$ is normally distributed: $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$, i.e. $E[\bar{X}] = \mu$ and $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$.
The standard error $\sigma / \sqrt{n}$ captures how far the sample statistic is expected to be from the true population value. It decreases as $n$ increases.
The central limit theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution.
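The central limit theorem and the standard error can be seen in a small simulation. The sketch below draws repeated samples from an assumed skewed (exponential) population and compares the spread of the sample means with $\sigma/\sqrt{n}$:

```python
# A minimal sketch: sampling distribution of the mean from a skewed population.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 2.0   # exponential(scale=2) has mean 2 and standard deviation 2
n = 40

# 5,000 random samples of n observations each from the skewed population
samples = rng.exponential(scale=2.0, size=(5_000, n))
sample_means = samples.mean(axis=1)

print(sample_means.mean(), mu)                 # close to the true mean
print(sample_means.std(), sigma / np.sqrt(n))  # close to the standard error sigma/sqrt(n)
# A histogram of sample_means looks approximately normal despite the skewed population.
```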
Hypothesis testing
Test statistic: a result calculated from the sample.
Null hypothesis $H_0$: the hypothesis assumed to be correct.
Alternative hypothesis $H_1$: what we claim about the parameter if the assumption is shown to be wrong.
- Define the null and alternative hypotheses $H_0$ and $H_1$:
  - Upper tail test: $H_1: \mu > \mu_0$
  - Lower tail test: $H_1: \mu < \mu_0$
  - Two-tailed test: $H_1: \mu \neq \mu_0$
  - $H_0$ is the opposite of $H_1$ (typically $H_0: \mu = \mu_0$)
- Identify the variables $\bar{x}$, $\mu_0$, $n$, and $\sigma$ OR $s$
- Calculate the test statistic:
  - Given the population standard deviation $\sigma$: z-score $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
  - Given the sample standard deviation $s$: t-score $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$ (with $n - 1$ degrees of freedom)
- Calculate the p-value (use the corresponding z- or t-table):
  - Upper tail: $P(Z \ge z)$ or $P(T \ge t)$
  - Lower tail: $P(Z \le z)$ or $P(T \le t)$
  - Two-tailed: $2P(Z \ge |z|)$ or $2P(T \ge |t|)$
- Compare the p-value with the significance level $\alpha$:
  - If $p \le \alpha$, reject $H_0$ (evidence against $H_0$)
  - If $p > \alpha$, do not reject $H_0$ (no evidence against $H_0$)
- State the conclusion
Note that the p-value does not give the chance that $H_0$ is true or false, but rather the chance of observing sample data at least as extreme as ours, assuming $H_0$ is true.
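Putting the steps together, here is a minimal sketch of an upper-tail z-test with assumed numbers ($H_0: \mu = 100$ vs $H_1: \mu > 100$, known $\sigma = 15$, $n = 36$, $\bar{x} = 104.5$, $\alpha = 0.05$):

```python
# A minimal sketch (assumed numbers) following the steps above.
from scipy.stats import norm

mu_0, sigma, n, x_bar = 100, 15, 36, 104.5
alpha = 0.05

z = (x_bar - mu_0) / (sigma / n**0.5)  # test statistic
p_value = norm.sf(z)                   # upper tail: P(Z >= z) = 1 - Phi(z)

print(z, p_value)                      # z = 1.8, p ~ 0.0359
if p_value <= alpha:
    print("Reject H0: evidence against H0")
else:
    print("Do not reject H0: no evidence against H0")
```

When only the sample standard deviation is available, `scipy.stats.ttest_1samp` performs the corresponding t-test directly on the raw data.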