I finally completed the online Statistics course on Udacity. Overall, it is a good introduction to basic descriptive statistics. If you are new to the subject or want a refresher on the concepts, this is a course to consider.
The course staff is creating summary pages for each lesson, which is really awesome (you probably have to log in to see them). I just want to make my own notes out of them.
Statistics involves the collection, organization, analysis, interpretation, and presentation of data.
- Design or find a problem which needs to be solved or analyzed using statistics.
- Collect data relevant to the problem.
- Apply statistical techniques to reach conclusions (or results).
Population: we always need to define the population before any study. If it relates to human beings, then the largest possible population is the entire human race.
Sample: we are usually not able to study the whole population, so we take a subset and use it to infer things about the population. (Someone might say, hey, our machines can handle big data now. But I’m still not convinced that we don’t need to take samples anymore.)
To make this inference from sample to population, we need a representative sample of adequate size as well as sound research. However, getting a good sample is not easy (another post maybe).
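Sampling error is a vital concept: it is the difference between a result computed from the sample and the true value for the whole population, which arises because we only observe a subset. The smaller it is, the better the results obtained from the sample apply to the whole population.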
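To make the idea of sampling error concrete for myself, here is a minimal Python sketch. Everything in it is made up for illustration: the population is 10,000 simulated heights, and the helper names `sampling_error` and `typical_error` are my own, not something from the course. It draws simple random samples and compares the sample mean with the population mean.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 simulated heights in cm (made-up data).
population = [rng.gauss(170, 10) for _ in range(10_000)]
population_mean = statistics.mean(population)


def sampling_error(sample_size):
    """Sample mean minus population mean for one simple random sample."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample) - population_mean


def typical_error(sample_size, trials=200):
    """Average absolute sampling error over many repeated samples."""
    errors = [abs(sampling_error(sample_size)) for _ in range(trials)]
    return statistics.mean(errors)


small = typical_error(10)
large = typical_error(1000)
print(f"typical error with n=10:   {small:.3f} cm")
print(f"typical error with n=1000: {large:.3f} cm")
```

In this simulation the larger samples land much closer to the population mean on average. Note that this only covers random error from sample size; a bigger sample does not fix a sample that is not representative.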
Correlation does not imply causation! Always remember that! Suppose two variables, A (the independent variable) and B (the dependent variable), are correlated. This does not mean A causes B, since other factors may be causing B.
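A small simulation helps me remember this. The sketch below is my own illustration with made-up data, not an example from the course: a hidden factor C drives both A and B, so A and B end up correlated even though A has no effect on B at all. When A is instead set at random, independently of C, the correlation disappears. The variable names and the `pearson` helper are assumptions for this example.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


n = 5000
# Hidden factor C (think: daily temperature) that we never record.
c = [rng.gauss(0, 1) for _ in range(n)]

# A and B both depend on C plus their own noise; A never enters B's formula.
a = [ci + rng.gauss(0, 0.5) for ci in c]
b = [ci + rng.gauss(0, 0.5) for ci in c]
observed = pearson(a, b)

# Now set A at random, ignoring C. B is generated exactly as before.
a_random = [rng.gauss(0, 1) for _ in range(n)]
randomized = pearson(a_random, b)

print(f"correlation of A and B as observed:   {observed:.2f}")
print(f"correlation when A is set at random:  {randomized:.2f}")
```

The strong observed correlation comes entirely from C. This is one reason randomized experiments are preferred when we want to claim that A causes B.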