Thursday, February 24, 2011

Statistics 2: Inferential statistics concepts

INFERENTIAL STATISTICS

Inferential statistics is used to infer information about a population from the observation of a sample.

Population -> Sampling frame -> Sampling pool (probability/random sampling or non-probability sampling) -> Sample

Estimation

Estimation of numerical data

The principle is that if we could draw all possible samples from a population, the distribution of the sample means would follow a normal curve centered on the population mean. However, we only observe one sample, and we don't know where exactly that sample sits in the distribution of samples; in other words, we don't know whether we have a typical, representative sample or not. We therefore estimate the standard deviation of the population from the sample, dividing the sum of squared deviations by (n-1) rather than n before taking the square root. That is also why the standard deviation is usually calculated with (n-1) and not n in the denominator. We then get the standard deviation of the sampling distribution of means (the standard error) by dividing the estimated population sd by the square root of the sample size. Using the properties of the normal distribution, we estimate a range for the population mean from the sample mean at a certain confidence level, say (sample mean - 2 standard errors; sample mean + 2 standard errors) at about the 95% confidence level.
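The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `mean_confidence_interval` and the scores are made up, and the multiplier 1.96 is the large-sample normal value, so for a sample this small a t-based multiplier would be more appropriate.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the population mean.

    z=1.96 is the normal multiplier for roughly 95% confidence
    (an assumption that suits large samples only).
    """
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample sd, uses (n - 1) in the denominator
    se = sd / math.sqrt(n)          # standard error of the mean
    return mean - z * se, mean + z * se

# Made-up scores, for illustration only
scores = [72, 85, 90, 66, 78, 81, 94, 70, 88, 76]
low, high = mean_confidence_interval(scores)
print(f"sample mean = {statistics.mean(scores):.1f}")
print(f"approx. 95% interval = ({low:.1f}, {high:.1f})")
```

The interval is symmetric around the sample mean, and it narrows as the sample size grows because the standard error shrinks with the square root of n.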

Student's t shows how the confidence interval has to be adjusted when the sample size varies. Basically, when the sample size is larger than about 30, the t distribution is close to the normal distribution.

Estimation of categorical data

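The principle is the same, only here we deal with category percentages. The standard deviation of the sampling distribution of a percentage is calculated as:

$\sigma_p = \sqrt{p(100 - p)/n}$

where p is the category percentage (estimated from the sample in practice) and n is the sample size.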

Another difference is that the confidence level of the estimate is influenced not only by the sample size but also by the category percentage in the population. As there is no Student's t table for categorical data, we have another benchmark: the smaller category percentage in the sample * sample size > 1000; if not, increase the sample size.
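As a rough sketch of the percentage case, assuming the usual standard error formula sqrt(p(100 - p)/n) with p expressed in percent: the function names, the 1.96 multiplier and the example figures below are my own illustrative choices, not something taken from the book.

```python
import math

def percentage_confidence_interval(pct, n, z=1.96):
    """Approximate confidence interval for a category percentage.

    pct is in percent (0-100); z=1.96 assumes roughly 95% confidence.
    """
    se = math.sqrt(pct * (100 - pct) / n)   # standard error of the percentage
    return pct - z * se, pct + z * se

def sample_large_enough(pct, n, threshold=1000):
    """Benchmark from the notes: smaller category percentage * n > 1000."""
    smaller = min(pct, 100 - pct)
    return smaller * n > threshold

# Hypothetical survey: 40% of 600 respondents chose a category
low, high = percentage_confidence_interval(40, 600)
print(f"approx. 95% interval: ({low:.2f}%, {high:.2f}%)")
print("sample large enough:", sample_large_enough(40, 600))
print("5% category, n=100 large enough:", sample_large_enough(5, 100))
```

With a 5% category and only 100 cases the benchmark fails, which is the situation where the notes say to increase the sample size.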

Hypothesis testing

A good hypothesis should:

1) reflect the theoretical background and available references

2) be a short and clear affirmative sentence

3) state a relationship between variables

4) be testable

Basic concepts of inferential statistics:
1. Null hypothesis vs. research hypothesis (nondirectional and two-tailed test; directional and one-tailed test)

The null hypothesis states that, in the whole population, there is no difference between the two groups; in other words, the independent variable does not cause a different distribution of the dependent variable. The observed difference is due to chance.

2. Normal Curve

Mean = median; 34.13% of the data lies between the mean and +1 standard deviation (and the same between the mean and -1 sd); 13.59% between +1 and +2 sd; 2.15% between +2 and +3 sd; 0.13% beyond +3 sd. (Cumulative probability: below +1 sd 84%; between +1 and +2 sd 14%; between +2 and +3 sd 2.15%)
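These areas can be checked with the standard library's `statistics.NormalDist`. A small sketch; the helper name `area_between` is my own.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standard normal curve

def area_between(z_low, z_high):
    """Share of the data between two z values (in sd units from the mean)."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

for z_low, z_high in [(0, 1), (1, 2), (2, 3)]:
    share = 100 * area_between(z_low, z_high)
    print(f"between +{z_low} and +{z_high} sd: {share:.2f}%")

print(f"beyond +3 sd: {100 * (1 - std_normal.cdf(3)):.2f}%")
print(f"below +1 sd: {100 * std_normal.cdf(1):.1f}%")
```

The band between +2 and +3 sd comes out slightly below the 2.15% given in the table; textbook tables tend to round it up so that the four bands on one side add up to exactly 50%.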

3. Standard score (Z)

Measures the distance between a data point x and the mean, in units of sd: z = (x - mean) / sd. From it we can infer the probability of observing a value like x.
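A minimal sketch of the standard score, with made-up numbers (a score of 130 on a scale with mean 100 and sd 15); `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Distance of x from the mean, expressed in standard deviations."""
    return (x - mean) / sd

# Made-up example: score of 130, mean 100, sd 15
z = z_score(130, mean=100, sd=15)

# Probability of a value at least this high under a normal curve
p_above = 1 - NormalDist().cdf(z)
print(f"z = {z:.2f}, P(value >= x) = {p_above:.4f}")
```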

4. Significance rate: 5%
The Z score helps decide whether an observed result can be attributed to pure chance, i.e. to the ordinary spread of the probability distribution, or not. If a result would hardly ever happen under the null hypothesis, meaning with a probability below 5% (|Z| > 1.65 for a one-tailed test, |Z| > 1.96 for a two-tailed test), then the null hypothesis is rejected.
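The decision rule can be sketched as follows. The function `z_test` is a hypothetical helper, and for the one-tailed case it simply assumes the effect lies in the predicted direction.

```python
from statistics import NormalDist

def z_test(z, alpha=0.05, two_tailed=True):
    """Return (p value, reject null?) for an observed z score.

    One-tailed: assumes the observed effect is in the predicted direction.
    """
    tail = 1 - NormalDist().cdf(abs(z))   # area beyond |z| in one tail
    p = 2 * tail if two_tailed else tail
    return p, p < alpha

# Critical values behind the 5% rule
print(f"one-tailed cutoff: {NormalDist().inv_cdf(0.95):.3f}")
print(f"two-tailed cutoff: {NormalDist().inv_cdf(0.975):.3f}")

# The same z can be significant one-tailed but not two-tailed
p_one, reject_one = z_test(1.80, two_tailed=False)
p_two, reject_two = z_test(1.80, two_tailed=True)
print(f"one-tailed: p = {p_one:.4f}, reject = {reject_one}")
print(f"two-tailed: p = {p_two:.4f}, reject = {reject_two}")
```

A z of 1.80 falls between the two cutoffs, which is why the choice between a directional and a nondirectional hypothesis has to be made before looking at the data.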

5. Significance level (p)

6. Degree of freedom (df)


Type I error (α): the null hypothesis is true but you decide it is wrong. Type II error (β): the reverse, the null hypothesis is false but you fail to reject it.
The null hypothesis is about the whole population and cannot be verified directly.

Type II error is related to the sample size: with everything else equal, a larger sample makes it less likely.
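To see the link with sample size, here is a rough sketch for the simplest case I could think of: a one-tailed, one-sample z test with a known sd. The function name, the effect size of 0.3 sd and the sample sizes are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def type_ii_error(effect_size, n, alpha=0.05):
    """β for a one-tailed one-sample z test with known sd.

    effect_size = (true mean - null mean) / sd, assumed >= 0.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)   # cutoff for rejecting the null
    # Probability that the test statistic stays below the cutoff
    return NormalDist().cdf(z_crit - effect_size * math.sqrt(n))

for n in (10, 30, 100):
    beta = type_ii_error(effect_size=0.3, n=n)
    print(f"n = {n:3d}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

With α held at 5%, β drops as n increases, so a real difference becomes harder to miss.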

I think the most valuable part of today's chapter is about the design of inferential statistics. Statistical significance by itself is meaningless. The most important task is the analytical work that generates the hypotheses about the variables. On the practical side, constructing the sample is also more important than the verification itself.

How to control the variables and how to choose the appropriate instruments is what I'll continue to learn. But I don't think it will be more difficult or more complicated than the original academic analysis, especially with the help of computer software.

P.108
