If you’re looking for chapter notes on variability, standardization, and sampling distributions, you’re in the right place. These notes are not only for my fellow Bertelsmann Udacity Data Science Scholarship students, who may be just beginning or simply want to go through the material once more, but for anyone who has expressed an interest in this subject.

Variability

Variability measures how much your scores differ from each other. In other words, it refers to how spread out a group of data is.

Measures of Variability: Range

Range is the simplest measure of variability. You take the smallest number and subtract it from the largest number to calculate the range. This shows the spread of our data. The range is sensitive to outliers, or values that are significantly higher or lower than the rest of the data set, and should not be used when outliers are present.
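
As a quick illustration, here is a minimal Python sketch of the range. The scores and the helper name `value_range` are made up for this example; the point is only to show how a single outlier inflates the result.

```python
# Hypothetical scores, invented for illustration only.
scores = [4, 7, 9, 10, 12]
with_outlier = scores + [95]  # same data plus one extreme value


def value_range(data):
    """Return the largest value minus the smallest value."""
    return max(data) - min(data)


print(value_range(scores))        # 8
print(value_range(with_outlier))  # 91
```

One extra value moves the range from 8 to 91, even though the rest of the scores did not change.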

Measures of Variability: IQR

When a data set contains outliers, we can use the interquartile range (IQR) instead. The IQR, or the "middle fifty", is the range of the middle fifty percent of the data: the third quartile minus the first quartile (Q3 - Q1).
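
A small sketch of the IQR in Python, using the standard library's `statistics.quantiles`. The scores and the helper name `iqr` are invented for illustration. Textbooks and tools use slightly different quartile conventions, so the "inclusive" method chosen here is an assumption and other methods can give slightly different numbers.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [4, 7, 9, 10, 12]
with_outlier = scores + [95]  # same data plus one extreme value


def iqr(data):
    """Return Q3 - Q1 using the 'inclusive' quartile method."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


print(iqr(scores))        # 3.0
print(iqr(with_outlier))  # 4.0
```

Adding the outlier only nudges the IQR from 3.0 to 4.0, which is why it is a more robust measure of spread than the range for data like this.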

Measures of Variability: Variance

Variance is the average of the squared deviations from the mean.
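
The definition can be followed step by step in a few lines of Python. This is only a sketch with made-up scores. It divides by the number of scores, which matches the "average" in the definition above (the population variance); a sample variance would divide by n - 1 instead.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean of the scores.
mean = sum(scores) / len(scores)

# Step 2: the squared deviation of each score from the mean.
squared_deviations = [(x - mean) ** 2 for x in scores]

# Step 3: the average of those squared deviations.
variance = sum(squared_deviations) / len(scores)

print(mean)      # 5.0
print(variance)  # 4.0

# The standard library computes the same population variance.
print(variance == statistics.pvariance(scores))  # True
```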

Measure of Variability: Standard Deviation

The standard deviation is the square root of the variance. Like the variance, it measures how close the scores in the data set are to the mean, but it is expressed in the same units as the original scores.
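
To make the relationship concrete, here is a short Python sketch with made-up scores. It takes the square root of the population variance and compares the result with the standard library's `statistics.pstdev`.

```python
import math
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: the average squared deviation from the mean.
variance = statistics.pvariance(scores)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(std_dev)  # 2.0
print(math.isclose(std_dev, statistics.pstdev(scores)))  # True
```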

Standardizing

Standardization converts individual scores to standard scores (z-scores), which allows us to determine where a score falls in relation to the other scores.
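
The usual way to standardize a score x is z = (x - mean) / standard deviation. The sketch below applies that formula to made-up scores; the helper name `z_score` is hypothetical, and the population standard deviation is assumed.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.fmean(scores)       # mean of the scores
sigma = statistics.pstdev(scores)   # population standard deviation


def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev


z_scores = [z_score(x, mu, sigma) for x in scores]

print(z_score(9, mu, sigma))  # 2.0
```

A score of 9 sits two standard deviations above the mean. After standardizing, the whole set of z-scores has a mean of 0 and a standard deviation of 1, which is what makes scores from different scales comparable.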

Sampling Distribution

A sampling distribution is the frequency distribution of a statistic over many random samples from a single population.
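
One way to see a sampling distribution is to simulate it. The sketch below is an illustration under assumed settings, not a prescribed method: the population is the six faces of a fair die, the statistic is the sample mean, and the sample size (5), number of samples (2000), and random seed are arbitrary choices.

```python
import random
import statistics
from collections import Counter

rng = random.Random(42)  # fixed seed so the run is repeatable

# Assumed population for illustration: the faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
sample_size = 5
num_samples = 2000

# Draw many random samples and record the statistic (the mean) of each.
sample_means = []
for _ in range(num_samples):
    sample = rng.choices(population, k=sample_size)  # with replacement
    sample_means.append(sum(sample) / sample_size)

# The frequency distribution of those means is the sampling distribution.
frequencies = Counter(sample_means)
for value in sorted(frequencies):
    print(f"{value:.1f}: {frequencies[value]}")

print(statistics.fmean(sample_means))  # close to the population mean of 3.5
```

The printed table shows how often each possible sample mean occurred; most of the counts cluster around 3.5, the mean of the population.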

Central Limit Theorem

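The Central Limit Theorem states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution (provided the population has a finite variance).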
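
A rough simulation can make this visible. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1) and tracks two things as the sample size grows: the skewness of the sample means, which should shrink toward 0 as the distribution becomes more bell-shaped, and their spread, which should be close to the standard error, sigma / sqrt(n). The helper names, sample sizes, and seed are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_means(sample_size, num_samples=3000):
    """Means of repeated random samples from an exponential population."""
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means


def skewness(data):
    """Average cubed z-score: near 0 for a symmetric distribution."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean([((x - m) / s) ** 3 for x in data])


skew_by_size = {}
spread_by_size = {}
for n in (1, 5, 50):
    means = sample_means(n)
    skew_by_size[n] = skewness(means)
    spread_by_size[n] = statistics.pstdev(means)
    print(n, round(skew_by_size[n], 2), round(spread_by_size[n], 3))
```

With a sample size of 1 the "means" are just the skewed population itself. By a sample size of 50 the skewness is much closer to 0 and the spread is near 1 / sqrt(50), roughly 0.14.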