Statistics Cheat Sheet

1. Descriptive Statistics

1.1 Measures of Central Tendency

Measures of central tendency summarize the center of a data set. The mean is the arithmetic average, the median is the middle value of the sorted data, and the mode is the most frequent value.

Example:

Find the mean, median, and mode of the data set: 2, 3, 5, 5, 7, 8, 9.
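
One way to work this example, as a minimal Python sketch that assumes only the standard library `statistics` module:

```python
import statistics

data = [2, 3, 5, 5, 7, 8, 9]

mean = statistics.mean(data)      # sum / count = 39 / 7, about 5.57
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

For this data set the median and the mode are both 5, while the mean is slightly higher because of the larger values 7, 8, and 9.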

1.2 Measures of Dispersion

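Measures of dispersion describe the spread of the data. The range is the maximum minus the minimum, the variance is the average squared deviation from the mean (dividing by \( n \) for a population or by \( n - 1 \) for a sample), and the standard deviation is the square root of the variance.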

Example:

Calculate the range, variance, and standard deviation of the data set: 2, 4, 4, 4, 5, 5, 7, 9.
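
The example does not say whether the data are a population or a sample, so the sketch below computes both versions. It assumes only the standard library `statistics` module.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)           # 9 - 2 = 7

# Population versions divide the sum of squared deviations (32) by n = 8.
pop_variance = statistics.pvariance(data)    # 32 / 8 = 4
pop_std = statistics.pstdev(data)            # sqrt(4) = 2

# Sample versions divide by n - 1 = 7 instead.
sample_variance = statistics.variance(data)  # 32 / 7, about 4.57
sample_std = statistics.stdev(data)          # about 2.14

print(data_range, pop_variance, pop_std)
print(round(sample_variance, 2), round(sample_std, 2))
```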

2. Probability Theory

2.1 Probability Basics

Probability measures the likelihood of an event occurring. When all outcomes are equally likely, the probability of an event \( A \) is:

\[ P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}} \]

Example:

What is the probability of rolling a sum of 7 with two six-sided dice?
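
Because the 36 outcomes of two fair dice are equally likely, the answer can be found by counting. A short Python sketch of that enumeration:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Keep the outcomes whose faces add up to 7: (1,6), (2,5), ..., (6,1).
favorable = [roll for roll in outcomes if sum(roll) == 7]

p_seven = Fraction(len(favorable), len(outcomes))
print(p_seven)  # 1/6
```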

2.2 Conditional Probability

Conditional probability is the probability of an event occurring given that another event has already occurred, denoted as \( P(A \mid B) \):

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]

Example:

If the probability of event A is 0.4 and the probability of event B is 0.5, with \( P(A \cap B) = 0.2 \), what is \( P(A \mid B) \)?
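
Substituting the given values into the definition:

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.5} = 0.4 \]

Here \( P(A \mid B) = P(A) = 0.4 \), so knowing that \( B \) occurred does not change the probability of \( A \); the two events are independent.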

3. Probability Distributions

3.1 Binomial Distribution

The binomial distribution models the number of successes in a fixed number of independent trials, with a success probability \( p \) in each trial. The probability of \( k \) successes in \( n \) trials is:

\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \]

Example:

What is the probability of getting exactly 3 heads in 5 flips of a fair coin?
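
A minimal Python sketch of the formula above, applied to this example. The function name `binomial_pmf` is chosen here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 3 heads in 5 flips of a fair coin: C(5, 3) * 0.5^5 = 10 / 32.
p_three_heads = binomial_pmf(3, 5, 0.5)
print(p_three_heads)  # 0.3125
```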

3.2 Normal Distribution

The normal distribution is a continuous probability distribution that is symmetric around the mean. The probability density function is:

\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]

Example:

If a dataset has a mean \( \mu = 100 \) and standard deviation \( \sigma = 15 \), what is the probability that a value is between 85 and 115?
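
The interval from 85 to 115 is exactly one standard deviation on either side of the mean. Assuming the data are normally distributed, the probability can be sketched with the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)

# P(85 < X < 115) is the difference of the cumulative probabilities.
p_between = dist.cdf(115) - dist.cdf(85)
print(round(p_between, 4))  # 0.6827
```

This matches the familiar rule of thumb that about 68% of a normal distribution lies within one standard deviation of the mean.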

3.3 Poisson Distribution

The Poisson distribution models the number of events occurring within a fixed interval of time or space, with a given mean rate \( \lambda \). The probability of observing \( k \) events is:

\[ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \]

Example:

If a website receives an average of 4 hits per minute, what is the probability of receiving exactly 6 hits in a minute?
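
A minimal Python sketch of the formula above with \( \lambda = 4 \) and \( k = 6 \). The function name `poisson_pmf` is chosen here for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean rate is lam."""
    return lam**k * exp(-lam) / factorial(k)

p_six_hits = poisson_pmf(6, 4)
print(round(p_six_hits, 4))  # 0.1042
```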

4. Confidence Intervals

4.1 Confidence Interval for the Mean

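A confidence interval for the mean \( \mu \) is an interval estimate: a range of values computed from the sample by a procedure that captures the true mean in a stated proportion of repeated samples. For a sample of size \( n \) with mean \( \bar{x} \) and standard deviation \( s \), the \( 100(1-\alpha)\% \) confidence interval (with \( \alpha = 0.05 \) for a 95% interval) is: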

\[ \bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} \]

Example:

Compute a 95% confidence interval for the mean if \( \bar{x} = 100 \), \( s = 15 \), and \( n = 25 \).
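
Worked through with the formula above, taking the critical value \( t_{0.025,\,24} \approx 2.064 \) from a standard t-table (24 degrees of freedom):

\[ 100 \pm 2.064 \cdot \frac{15}{\sqrt{25}} = 100 \pm 2.064 \cdot 3 \approx 100 \pm 6.19 \]

The 95% confidence interval is therefore approximately \( (93.81,\ 106.19) \).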

5. Hypothesis Testing

5.1 Null and Alternative Hypotheses

The null hypothesis \( H_0 \) is a statement of no effect or no difference, while the alternative hypothesis \( H_1 \) is a statement indicating the presence of an effect or difference. Hypothesis testing determines whether to reject \( H_0 \) in favor of \( H_1 \).

Example:

Test whether a new drug is more effective than the standard treatment. \( H_0: \mu_{\text{new}} \leq \mu_{\text{standard}} \), \( H_1: \mu_{\text{new}} > \mu_{\text{standard}} \).

5.2 p-Value and Significance Level

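The p-value is the probability, computed assuming \( H_0 \) is true, of obtaining a result at least as extreme as the one observed. Smaller p-values indicate stronger evidence against the null hypothesis. If the p-value is less than the chosen significance level \( \alpha \), reject \( H_0 \); otherwise, fail to reject it. Common significance levels are 0.05 and 0.01.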

Example:

If the p-value is 0.03 and the significance level is 0.05, should you reject the null hypothesis?

6. Correlation and Regression

6.1 Correlation Coefficient

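The correlation coefficient \( r \) measures the strength and direction of the linear relationship between two variables. It ranges from -1 to 1: values near 1 indicate a strong positive linear relationship, values near -1 indicate a strong negative one, and values near 0 indicate little or no linear relationship.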

Example:

Compute the correlation coefficient for the data pairs (1, 2), (2, 3), (3, 4), (4, 5), (5, 6).
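
A minimal Python sketch that computes the Pearson correlation coefficient from its definition (the covariance term divided by the product of the spread terms). The function name `pearson_r` is chosen here for illustration.

```python
from math import sqrt

def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

pairs = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
r = pearson_r(pairs)
print(r)  # 1.0
```

Every point lies exactly on the line \( y = x + 1 \), so the correlation is a perfect \( r = 1 \).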

6.2 Simple Linear Regression

Simple linear regression models the relationship between a dependent variable \( y \) and an independent variable \( x \) using the equation:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

where \( \beta_0 \) is the y-intercept, \( \beta_1 \) is the slope, and \( \epsilon \) is the error term.

Example:

Find the linear regression equation for the data: (1, 2), (2, 3), (3, 5), (4, 4), (5, 6).
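
A minimal sketch of the ordinary least squares fit, assuming the usual estimates \( \hat{\beta}_1 = S_{xy} / S_{xx} \) and \( \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \):

```python
pairs = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]

xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
mean_x = sum(xs) / len(xs)  # 3.0
mean_y = sum(ys) / len(ys)  # 4.0

# Sums of cross-deviations and squared x-deviations.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)  # 9.0
sxx = sum((x - mean_x) ** 2 for x in xs)                  # 10.0

slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"y = {intercept:.2f} + {slope:.2f}x")  # y = 1.30 + 0.90x
```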

7. Analysis of Variance (ANOVA)

7.1 One-Way ANOVA

One-way ANOVA tests whether there are statistically significant differences between the means of three or more independent groups. The null hypothesis states that all group means are equal.

Example:

Test whether three different teaching methods lead to different student performance scores.
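
The example gives no data, so the sketch below uses made-up scores purely for illustration. It computes the F statistic as the ratio of the between-group mean square to the within-group mean square.

```python
# Hypothetical scores for three teaching methods (invented for illustration).
groups = {
    "method_a": [80, 85, 90],
    "method_b": [70, 75, 80],
    "method_c": [60, 65, 70],
}

all_scores = [s for scores in groups.values() for s in scores]
grand_mean = sum(all_scores) / len(all_scores)

def mean(values):
    return sum(values) / len(values)

# Variation of the group means around the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Variation of the scores around their own group mean.
ss_within = sum((s - mean(g)) ** 2 for g in groups.values() for s in g)

df_between = len(groups) - 1               # k - 1
df_within = len(all_scores) - len(groups)  # N - k

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f_stat)  # 12.0
```

The p-value would then come from the F distribution with \( (k - 1,\ N - k) \) degrees of freedom, here \( (2, 6) \); a large F statistic is evidence against the hypothesis that all group means are equal.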

8. Chi-Square Tests

8.1 Chi-Square Test for Independence

The Chi-square test for independence assesses whether two categorical variables are independent of each other. The test statistic is:

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

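where \( O \) is the observed frequency in a cell, \( E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} \) is the frequency expected in that cell under independence, and the sum runs over all cells of the contingency table.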

Example:

Test whether there is an association between gender (male, female) and preference for a product (like, dislike).
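
The example gives no counts, so the sketch below uses a made-up 2×2 table purely for illustration. Expected counts are computed as row total × column total / grand total.

```python
# Hypothetical observed counts (invented for illustration).
# Rows: male, female. Columns: like, dislike.
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / total  # expected under independence
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(chi2, df)  # 4.0 1
```

With 1 degree of freedom the critical value at \( \alpha = 0.05 \) is about 3.841, so for these invented counts the statistic of 4.0 would lead to rejecting independence.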

9. Non-Parametric Tests

9.1 Wilcoxon Signed-Rank Test

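The Wilcoxon signed-rank test is a non-parametric test used to compare two related samples or repeated measurements on a single sample. It ranks the absolute paired differences and assesses whether those differences are symmetrically distributed about zero, which is commonly read as a test of whether the median difference is zero.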

Example:

Compare the median scores of two treatments applied to the same subjects.
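
The example gives no data, so the sketch below uses made-up paired scores purely for illustration. It computes only the signed-rank sums and assumes there are no zero differences and no ties among the absolute differences; real data would need tie handling and a table or library routine for the p-value.

```python
# Hypothetical paired scores for the same six subjects (invented for illustration).
treatment_1 = [10, 12, 9, 14, 11, 13]
treatment_2 = [12, 15, 8, 18, 16, 19]

diffs = [b - a for a, b in zip(treatment_1, treatment_2)]  # [2, 3, -1, 4, 5, 6]

# Rank the differences by absolute value, smallest first (rank 1).
# Assumes no zero differences and no tied absolute values.
ordered = sorted(diffs, key=abs)

w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
w_stat = min(w_plus, w_minus)

print(w_plus, w_minus, w_stat)  # 20 1 1
```

A small value of the test statistic relative to its null distribution indicates that the differences lean consistently in one direction.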

10. Time Series Analysis

10.1 Moving Averages

Moving averages smooth out short-term fluctuations and highlight longer-term trends in time series data. The simple moving average (SMA) at time \( t \) over a window of \( n \) periods is the mean of the \( n \) most recent observations:

\[ SMA_t = \frac{1}{n} \sum_{i=0}^{n-1} X_{t-i} \]

Example:

Calculate the 3-period moving average for the following time series: 10, 20, 30, 40, 50.
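
A minimal Python sketch of the SMA formula above. The function name `simple_moving_average` is chosen here for illustration; it returns one value for each position that has a full window of observations.

```python
def simple_moving_average(series, n):
    """Return the n-period simple moving averages of a series."""
    return [
        sum(series[i - n + 1 : i + 1]) / n
        for i in range(n - 1, len(series))
    ]

sma = simple_moving_average([10, 20, 30, 40, 50], 3)
print(sma)  # [20.0, 30.0, 40.0]
```

The first two observations have no 3-period average because fewer than three values are available at those points.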

11. Bayesian Statistics

11.1 Bayes' Theorem

Bayes' Theorem relates the conditional probability of event \( A \) given \( B \) to the conditional probability of \( B \) given \( A \), and the individual probabilities of \( A \) and \( B \):

\[ P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)} \]

Example:

If the probability of having a disease is 1%, the probability of testing positive given the disease is 90%, and the probability of testing positive without the disease is 5%, what is the probability of having the disease given a positive test result?
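
Let \( D \) denote having the disease and \( + \) a positive test. The denominator \( P(+) \) is not given directly, so it is expanded with the law of total probability:

\[ P(D \mid +) = \frac{P(+ \mid D) P(D)}{P(+ \mid D) P(D) + P(+ \mid \neg D) P(\neg D)} = \frac{0.90 \times 0.01}{0.90 \times 0.01 + 0.05 \times 0.99} = \frac{0.009}{0.0585} \approx 0.154 \]

Even after a positive result, the probability of having the disease is only about 15%, because the disease is rare and false positives outnumber true positives.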

12. Sampling Methods

12.1 Simple Random Sampling

Simple random sampling selects a sample so that every individual in the population, and every possible sample of a given size, has an equal chance of being selected. It is the most basic sampling technique.

Example:

Select 5 students randomly from a class of 30.
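
A minimal Python sketch of sampling without replacement, assuming the students are identified by the hypothetical IDs 1 through 30:

```python
import random

students = list(range(1, 31))  # hypothetical student IDs 1..30

# A fixed seed is used only to make the illustration reproducible.
rng = random.Random(42)

# sample() draws without replacement, so no student is picked twice.
selected = rng.sample(students, k=5)
print(selected)
```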

13. Central Limit Theorem

13.1 Central Limit Theorem

The Central Limit Theorem states that the distribution of the mean of \( n \) independent, identically distributed observations approaches a normal distribution as \( n \) grows, regardless of the shape of the population's distribution, provided the population has finite variance.

Example:

Explain how the Central Limit Theorem applies to the average heights of a large sample of students.
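
The theorem can be illustrated with a small simulation. The sketch below does not use height data; as an assumption for illustration, it draws from a strongly skewed exponential population (mean 1, standard deviation 1) and checks that the sample means cluster around the population mean with spread close to \( \sigma / \sqrt{n} \).

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible
sample_size = 50        # n observations per sample
n_samples = 2000        # number of repeated samples

# Mean of each sample drawn from a skewed exponential population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / sample_size ** 0.5  # sigma / sqrt(n), about 0.141

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

The same reasoning applies to heights: whatever the shape of the distribution of individual heights, the average height of a large random sample is approximately normally distributed around the population mean.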

14. Experimental Design

14.1 Randomized Controlled Trials (RCT)

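Randomized controlled trials (RCTs) are experiments in which participants are randomly assigned to treatment groups, typically an intervention group and a control group, so that differences in outcomes can be attributed to the intervention rather than to pre-existing differences between the groups.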

Example:

Design an RCT to test the effectiveness of a new educational method on student performance.
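
The core of such a design is the random assignment step. A minimal sketch, assuming 20 hypothetical participants split evenly between the new method and a control group:

```python
import random

# Hypothetical participant identifiers (invented for illustration).
participants = [f"student_{i:02d}" for i in range(1, 21)]

# A fixed seed is used only to make the illustration reproducible.
rng = random.Random(7)

shuffled = participants[:]
rng.shuffle(shuffled)

half = len(shuffled) // 2
treatment_group = shuffled[:half]  # taught with the new method
control_group = shuffled[half:]    # taught with the standard method

print(sorted(treatment_group))
print(sorted(control_group))
```

After the intervention, performance scores in the two groups would be compared, for example with a two-sample test.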

15. Quality Control

15.1 Control Charts

Control charts are used in quality control to monitor whether a process is in statistical control. Common types include X-bar charts (for the mean) and R-charts (for range).

Example:

Create an X-bar chart to monitor the diameter of manufactured bolts.
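
The example gives no measurements, so the sketch below uses made-up diameters purely for illustration and computes only the chart's center line and control limits (no plotting). It assumes subgroups of size 5 and the standard tabulated constant \( A_2 = 0.577 \) for that subgroup size, with limits \( \bar{\bar{x}} \pm A_2 \bar{R} \).

```python
# Hypothetical bolt diameters in mm, four subgroups of five bolts each
# (invented for illustration).
subgroups = [
    [10.01, 9.98, 10.02, 10.00, 9.99],
    [10.03, 10.00, 9.97, 10.01, 9.99],
    [9.98, 10.02, 10.00, 10.01, 9.99],
    [10.00, 9.99, 10.02, 9.98, 10.01],
]

A2 = 0.577  # tabulated X-bar chart constant for subgroup size n = 5

subgroup_means = [sum(g) / len(g) for g in subgroups]
subgroup_ranges = [max(g) - min(g) for g in subgroups]

center_line = sum(subgroup_means) / len(subgroup_means)  # grand mean
mean_range = sum(subgroup_ranges) / len(subgroup_ranges)  # average range

ucl = center_line + A2 * mean_range  # upper control limit
lcl = center_line - A2 * mean_range  # lower control limit

print(f"LCL={lcl:.3f}, CL={center_line:.3f}, UCL={ucl:.3f}")

# Subgroup means outside the limits would signal a process out of control.
out_of_control = [m for m in subgroup_means if not lcl <= m <= ucl]
```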