Sociological Statistics

Published on Jan 15, 2019


The Central Limit Theorem

Chapter 8

generalize from a sample to a population

Taking probability and sampling to the next level

the real magic

quantifying the likelihood that we are right

Central Limit Theorem's superpowers

(a list of 4 exciting things)

1) if we have info on the population, we can infer about any given sample

2) if we have info about a proper sample, we can infer about the population

3) if we have data on a population and a sample, we can determine how weird that sample is

4) if we have info on two samples, we can determine whether they likely came from the same population

info = averages & variation

variation between groups vs. within groups

race - does it genetically exist?

CLT: degree of confidence

it's not magic, it's probability

How? If we draw repeated, reasonably large samples from any population, the sample means will be roughly normally distributed around the population's actual mean.

whoa Nelly...

  • A population has a mean
  • Multiple samples each have a mean
  • Most of the sample means will be near the population mean, but not all
  • Sample means will be roughly normally distributed, so about 68% will fall within 1 standard error (the SD of the sample means) of the population mean
  • This is all true even if the population is not normally distributed
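
The bullets above can be checked with a small simulation. This is a minimal sketch, not a proof: the "income" population is made up (an exponential distribution with a mean of about 50,000, chosen only because it is clearly skewed), and the sample size of 100 and the 2,000 repeated samples are arbitrary assumptions.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical skewed population: 100,000 "incomes" drawn from an exponential
# distribution with a mean of about 50,000 (clearly not a normal curve).
population = [random.expovariate(1 / 50_000) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

sample_size = 100     # assumed size of each sample
num_samples = 2_000   # assumed number of repeated samples

# Draw many samples and record each sample's mean.
sample_means = [
    statistics.fmean(random.sample(population, sample_size))
    for _ in range(num_samples)
]

# Standard error: population SD divided by the square root of the sample size.
standard_error = pop_sd / sample_size ** 0.5

# Share of sample means that land within 1 standard error of the population mean.
within_one_se = sum(
    abs(m - pop_mean) <= standard_error for m in sample_means
) / num_samples

print(f"population mean:               {pop_mean:,.0f}")
print(f"mean of the sample means:      {statistics.fmean(sample_means):,.0f}")
print(f"share within 1 standard error: {within_one_se:.1%}")
```

Even though the population is heavily skewed, the share printed on the last line should land close to 68%.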

household income distribution example

in a representative sample

  • our best guess of the mean of any sample is the population mean
  • the proportions in the sample should roughly mirror the population

more samples?

the distribution of the sample means will look more and more like a normal curve

larger sample sizes?

the distribution of sample means bunches more tightly around the population mean (each mean is less affected by outliers)
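
To see the bunching, here is a rough sketch that repeats the sampling at three sample sizes and compares how spread out the sample means are. The population, the sizes (30, 100, 400), and the helper name `spread_of_sample_means` are all hypothetical choices for illustration.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical skewed population of 50,000 "incomes" (mean about 50,000).
population = [random.expovariate(1 / 50_000) for _ in range(50_000)]


def spread_of_sample_means(sample_size, num_samples=1_000):
    """Return the standard deviation of the means of repeated samples."""
    means = [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)


# Compare the spread of sample means at three assumed sample sizes.
spreads = {n: spread_of_sample_means(n) for n in (30, 100, 400)}
for n, spread in spreads.items():
    print(f"sample size {n:>3}: spread of sample means = {spread:,.0f}")
```

The spread should shrink as the sample size grows, which is the "tighter bunching" described above.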

Caveat: as a rule of thumb, the Central Limit Theorem needs sample sizes of about 30 or more to work well

about 68% within 1 standard deviation

about 95% within 2, 99.7% within 3

standard error:

standard deviation of the sample means

standard deviation = dispersion within 1 group, roughly the average distance from each case to that group's mean

standard error = dispersion of the sample means (multiple samples), roughly the average distance of each sample's mean from the true mean
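
A tiny worked example may help keep the two apart. The numbers below are invented (hours of TV per week in four very small samples) and are far too small for real inference; they only show which list each statistic is computed on.

```python
import statistics

# Hypothetical data: hours of TV per week reported in four small samples.
samples = [
    [2, 4, 4, 6, 9],
    [1, 3, 5, 7, 9],
    [3, 4, 5, 6, 7],
    [2, 2, 6, 8, 12],
]

# Standard deviation: spread of the individual cases within ONE sample.
sd_first_sample = statistics.stdev(samples[0])

# Standard error (estimated empirically here): spread of the sample MEANS.
sample_means = [statistics.fmean(s) for s in samples]
se_of_means = statistics.stdev(sample_means)

print(f"sample means: {sample_means}")
print(f"SD of cases in the first sample: {sd_first_sample:.2f}")
print(f"SD of the sample means (standard error): {se_of_means:.2f}")
```

The cases inside one sample vary a lot more than the sample means do, which is why the standard error comes out smaller than the standard deviation.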

big standard error?

the sample means are widely spread out

put another way, sample means will cluster around the population mean less tightly if there's lots of variation in the population

Standard error formula

for understanding, not calculating

standard deviation in the numerator

square root of the sample size in the denominator
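
Written out, the usual textbook form divides the standard deviation by the square root of the sample size. The slide shows no symbols, so SE, s (sample standard deviation), and n (sample size) are labels assumed for this sketch, and the numbers are made up.

```latex
SE = \frac{s}{\sqrt{n}}, \qquad \text{e.g. } s = 20,\ n = 100 \;\Rightarrow\; SE = \frac{20}{\sqrt{100}} = 2
```

Quadrupling the sample size to 400 only halves the standard error (20 / 20 = 1), so bigger samples help, but with diminishing returns.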

we often don't know the standard deviation in the population, so we estimate it with the sample's standard deviation


about 68% of sample means will be within 1 standard error of the true mean, as long as we have big enough samples

put it all together

  • means of large samples will be roughly normally distributed around the population mean, even if the population isn't normal
  • most will be close to the pop. mean
  • probability says about 68% fall within 1 st. error, about 95% within 2 st. errors
  • if a result is unlikely to be due to chance alone, there's probably some other factor in play
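
Putting the pieces together, one way to ask "how weird is this sample?" is to count how many standard errors its mean sits from the population mean. This is a minimal sketch: the function name `standard_errors_from_mean`, the income figures, and the "more than 2 standard errors" cutoff are assumptions for illustration, not a full hypothesis test.

```python
import math


def standard_errors_from_mean(sample_mean, pop_mean, pop_sd, sample_size):
    """How many standard errors a sample mean sits from the population mean."""
    standard_error = pop_sd / math.sqrt(sample_size)
    return (sample_mean - pop_mean) / standard_error


# Hypothetical numbers: population mean income 50,000 with SD 20,000,
# and one sample of 100 households whose mean income is 56,000.
z = standard_errors_from_mean(
    sample_mean=56_000, pop_mean=50_000, pop_sd=20_000, sample_size=100
)

# Assumed cutoff: about 95% of sample means fall within 2 standard errors,
# so anything farther out is treated as unusual.
unusual = abs(z) > 2

print(f"standard errors from the population mean: {z:.1f}")  # prints 3.0
print(f"unusual if chance alone were at work? {unusual}")    # prints True
```

Here the standard error is 20,000 / 10 = 2,000, so a sample mean 6,000 above the population mean is 3 standard errors out, which chance alone rarely produces.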

Leeda Copley
