Zhijin Wu PhD
  
Formula for midterm 2 (44.5 KB)
(It has some extra formulas you won't need yet)
Lecture 1 (73.1 KB)
Introduction to statistics
Lecture 2 (0.1 MB)
exploratory analysis and summary statistics. (SMMR Chapter 2)
lecture 3 (0.2 MB)
Introduction to probability (SMMR 3.1-3.3 // MSDA Chap. 1). A review of permutations and combinations may be helpful before class: http://www.youtube.com/watch?v=buyevw1flZk
Lecture 4 (0.2 MB)
discrete random variables (SMMR 3.4,3.6,3.7//MSDA 2.1)
Lecture 5 (0.6 MB)
Oct 4 (SMMR 3.8 //MSDA2.2.1, 2.2.3)
Lecture 6 (0.1 MB)
mean and variance of a random variable (MSDA 4.1-4.3)
Lecture 6b (0.1 MB)
additional normal distribution
lecture 7 (85.5 KB)
likelihood (MSSD 9.1,9.2)
Lecture 8 (2.7 MB)
Law of large numbers and Central Limit Theorem (MSDA Chapter 5 & 7)
Lecture 9 (0.3 MB)
Confidence Intervals (SMMR Chapter 4; MSDA Chapter 7)
Lecture10 (0.2 MB)
SMMR Chapter 4
lecture13a (0.1 MB)
extra lecture on ANOVA
lecture13b (0.1 MB)
correlation and regression



More papers with examples
You should have access to these papers as long as you are on the Brown network.
http://www.nejm.org/doi/full/10.1056/nejmoa0909859

See Table 1 for comparison of baseline characteristics.
More examples of Hypothesis Testing
Example from the internet

Brian Lopes at UNC provides these examples:
1. An insurance company is reviewing its current policy rates. When originally setting the rates they believed that the average claim amount was $1,800. They are concerned that the true mean is actually higher than this, because they could potentially lose a lot of money. They randomly select 40 claims, and calculate a sample mean of $1,950. Assuming that the standard deviation of claims is $500, and set significance level 0.05, test to see if the insurance company should be concerned.

Hint: They are concerned that the true mean is actually higher than this (READ: one-sided alternative), because they could potentially lose a lot of money. They randomly select 40 claims (READ: n=40), and calculate a sample mean of $1,950 (READ: X.bar). Assuming that the standard deviation of claims is $500 (READ: known variance), and set significance level 0.05 (READ: alpha), test to see if the insurance company should be concerned. Note: his solution used 1.96 as the cutoff; for a one-sided test at level 0.05 it should have been 1.645.
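The hint for Q1 can be checked numerically. The following is a minimal sketch of an upper-tailed z-test using only the Python standard library; the function name one_sided_z_test is made up for illustration and is not part of the course material.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(xbar, mu0, sigma, n, alpha):
    """Upper-tailed z-test for a mean when the standard deviation is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))        # standardized sample mean
    critical = NormalDist().inv_cdf(1 - alpha)  # upper-tail cutoff
    p_value = 1 - NormalDist().cdf(z)           # P(Z >= z) under H0
    return z, critical, p_value


# Numbers from Q1: H0 mean 1800, sample mean 1950, sigma 500, n 40, alpha 0.05
z, critical, p_value = one_sided_z_test(xbar=1950, mu0=1800, sigma=500, n=40, alpha=0.05)
print(f"z = {z:.3f}, critical value = {critical:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z > critical else "do not reject H0")
```

Comparing z with the one-sided cutoff and comparing the p-value with alpha are two ways of stating the same decision.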

2. Trying to encourage people to stop driving to campus, the university claims that on average it takes people 30 minutes to find a parking space on campus. Brian does not think it takes so long to find a spot. In fact, he took a sample of five and calculated that the average time spent finding a parking space was 20 minutes. Assuming that the time it takes to find a parking spot is normal, that Brian had an independent sample of five, and that the standard deviation is 6 minutes, perform a hypothesis test with level 0.10 to see if his claim is correct.

Hint: The university claims that on average it takes people 30 minutes (READ: null hypothesis) to find a parking space on campus. Brian does not think it takes so long (READ: one-sided alternative) to find a spot. In fact, he took a sample of five (READ: n=5) and calculated that the average time spent finding a parking space was 20 minutes (READ: X.bar). Assuming that the time it takes to find a parking spot is normal, that Brian had an independent sample of five, and that the standard deviation is 6 minutes (READ: known variance), perform a hypothesis test with level 0.10 (READ: alpha) to see if his claim is correct.
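For Q2 the alternative points downward, so the rejection region is in the lower tail. A small sketch of that calculation, again with only the standard library (the variable names are chosen here for illustration):

```python
from math import sqrt
from statistics import NormalDist

# Numbers from Q2: H0 mean 30 min, sample mean 20 min, sigma 6 min, n 5, alpha 0.10
mu0, xbar, sigma, n, alpha = 30, 20, 6, 5, 0.10

z = (xbar - mu0) / (sigma / sqrt(n))    # negative when the sample mean is below mu0
critical = NormalDist().inv_cdf(alpha)  # lower-tail cutoff for a one-sided test
p_value = NormalDist().cdf(z)           # P(Z <= z) under H0

print(f"z = {z:.3f}, critical value = {critical:.3f}, p-value = {p_value:.6f}")
print("reject H0" if z < critical else "do not reject H0")
```

Because the population is assumed normal with known standard deviation, the z-test is appropriate even though n is only 5.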




3. A sample of 40 sales receipts from a grocery store has sample average $137 and sample standard deviation $30.2. Use these values to test whether the mean sales at the grocery store differ from $150.

Hint: A sample of 40 sales receipts from a grocery store has sample average $137 and sample standard deviation $30.2 (READ: the SD is estimated from the sample, so you should use Student's t-test rather than the Z-test that the solution file gives). Use these values to test whether the mean sales at the grocery store differ from $150 (READ: two-sided alternative).
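A sketch of the t statistic for Q3 follows. The Python standard library has no t distribution, so the critical value is typed in from a t table; both alpha = 0.05 (the problem does not state a level) and the tabulated value 2.023 for 39 degrees of freedom are assumptions of this example.

```python
from math import sqrt

# Numbers from Q3: H0 mean 150, sample mean 137, sample SD 30.2, n 40
mu0, xbar, s, n = 150, 137, 30.2, 40

t_stat = (xbar - mu0) / (s / sqrt(n))  # SD is estimated, so this is a t statistic
df = n - 1                             # degrees of freedom

# ASSUMPTION: alpha = 0.05 two-sided; 2.023 is the table value of t(0.025, df=39)
t_critical = 2.023

reject = abs(t_stat) > t_critical      # two-sided test uses |t|
print(f"t = {t_stat:.3f} on {df} df, critical value = {t_critical}")
print("reject H0" if reject else "do not reject H0")
```

With n = 40 the t and z cutoffs are close (2.023 versus 1.96), which is why the two approaches reach the same decision here even though the t-test is the appropriate one.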




For solutions, see Brian Lopes' webpage HERE. Note that for Q1 the critical value in his solution is mistaken, and for Q3 I recommend a different answer; see the hints for Q1 and Q3.

4. A study investigates if nicotine patch helps smoking cessation. The study randomly assigned smokers to use a nicotine patch versus a placebo patch. For 50 smokers in the nicotine patch group and 55 smokers in the placebo patch group, 23 and 11 stopped smoking, respectively. Test whether the nicotine patch works using a two-sided test. (Treat 50 as a large sample).

Hint: This is a comparison of two proportions in two random samples. p1.hat=23/50, p2.hat=11/55, n1=50, n2=55. You may choose the most conventional alpha=0.05, or simply compute the p-value first.
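The comparison in Q4 can be sketched as a large-sample two-proportion z-test. Using the pooled proportion for the standard error under H0 is one common convention and is an assumption here; the formula sheet for the course may state it differently.

```python
from math import sqrt
from statistics import NormalDist

# Numbers from Q4: 23 of 50 quit with the nicotine patch, 11 of 55 with placebo
x1, n1 = 23, 50
x2, n2 = 11, 55

p1_hat = x1 / n1
p2_hat = x2 / n2

# ASSUMPTION: pooled estimate of the common proportion under H0: p1 = p2
p_pooled = (x1 + x2) / (n1 + n2)
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z = (p1_hat - p2_hat) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(f"p1.hat = {p1_hat:.2f}, p2.hat = {p2_hat:.2f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

Computing the p-value first, as the hint suggests, lets you read off the decision for any alpha afterwards.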
More Confidence Interval Examples
Below are some additional examples on confidence intervals. You can use these as exercises. To check your answer, highlight the blank space after each question to view the solution.

Q: 
Suppose a student measuring the boiling temperature of a certain liquid observes the readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different samples of the liquid. He calculates the sample mean to be 101.82. If he knows that the measurements follow a normal distribution and the standard deviation for this procedure is 1.2 degrees, what is the confidence interval for the population mean at a 95% and 90% confidence level?

A: Notice that "he knows the standard deviation", so we use a z critical value. The standard error of the sample mean is 1.2/sqrt(6) = 0.49. The critical value for a 95% confidence interval is 1.96, since (1-0.95)/2 = 0.025 is left in each tail. A 95% confidence interval for the unknown mean is (101.82 - 1.96*0.49, 101.82 + 1.96*0.49) = (101.82 - 0.96, 101.82 + 0.96) = (100.86, 102.78).

For confidence level 90%, (1-0.90)/2 = 0.05. The critical value z* for this level is 1.645, so the 90% confidence interval is (101.82 - 1.645*0.49, 101.82 + 1.645*0.49) = (101.82 - 0.81, 101.82 + 0.81) = (101.01, 102.63).
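The two intervals above can be reproduced with a short sketch. The helper name z_interval is hypothetical; it computes the sample mean from the six readings instead of using the rounded value 101.82, so the last digit may differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist, mean

readings = [102.5, 101.7, 103.1, 100.9, 100.5, 102.2]
sigma = 1.2  # known standard deviation of the measurement procedure


def z_interval(data, sigma, level):
    """Confidence interval for a normal mean when sigma is known."""
    xbar = mean(data)
    se = sigma / sqrt(len(data))                        # standard error of the mean
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # e.g. about 1.96 for 95%
    return xbar - z_star * se, xbar + z_star * se


lo95, hi95 = z_interval(readings, sigma, 0.95)
lo90, hi90 = z_interval(readings, sigma, 0.90)
print(f"95% CI: ({lo95:.2f}, {hi95:.2f})")
print(f"90% CI: ({lo90:.2f}, {hi90:.2f})")
```

Lowering the confidence level shrinks the critical value, so the 90% interval sits inside the 95% interval.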



Q:
A dataset on "Normal Body Temperature, Gender, and Heart Rate" contains 130 observations of body temperature, along with the gender of each individual and his or her heart rate. A summary of descriptive statistics is provided below:
Descriptive Statistics

Variable    N     Mean   Median  Tr Mean  StDev  SE Mean
TEMP      130   98.249   98.300   98.253  0.733    0.064

Variable     Min      Max      Q1      Q3
TEMP      96.300  100.800  97.800  98.700

Find a 95% confidence interval for the mean body temperature.

A: To find a 95% confidence interval for the mean based on the sample mean 98.249 and sample standard deviation 0.733, first find the 0.025 critical value t* for 129 degrees of freedom. This value is approximately 1.984, the critical value for 100 degrees of freedom, the nearest tabulated row below 129 (found in Table E in Moore and McCabe). The estimated standard deviation for the sample mean is 0.733/sqrt(130) = 0.064, the value provided in the SE MEAN column of the MINITAB descriptive statistics. A 95% confidence interval, then, is approximately (98.249 - 1.984*0.064, 98.249 + 1.984*0.064) = (98.249 - 0.127, 98.249 + 0.127) = (98.122, 98.376).

 

 

Q: The following data (n = 10) were drawn from a normal population:
-4.26549, -4.50909, 1.26475, 1.42241, 2.73875, 11.954, 3.61592, -9.68883, -2.96558, -3.48133
The sample mean is x.bar = −0.39145 and the sample variance is 35.431. Construct a 95% confidence interval for the population mean.

A: We begin with a normal distribution, so we don't need the CLT to know that the sample mean also follows a normal distribution. But we do not know the variance and have to estimate it from the data. Thus we use the t-distribution with n-1 = 9 degrees of freedom.

t (0.025,df=9)= 2.2622

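CI: sample mean ± t*sqrt(sample variance/n) = −0.39145 ± 2.2622*sqrt(35.431/10) = −0.39145 ± 4.26, which gives (−4.65, 3.87)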
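As a check on this answer, here is a minimal sketch that recomputes the sample mean, sample variance, and interval from the ten data points. The critical value 2.2622 is taken from the answer above and not computed, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

data = [-4.26549, -4.50909, 1.26475, 1.42241, 2.73875,
        11.954, 3.61592, -9.68883, -2.96558, -3.48133]

n = len(data)
xbar = mean(data)    # sample mean
s2 = variance(data)  # sample variance (divides by n - 1)

t_star = 2.2622      # t(0.025, df=9), as quoted in the answer above
half_width = t_star * sqrt(s2 / n)

lower, upper = xbar - half_width, xbar + half_width
print(f"mean = {xbar:.5f}, variance = {s2:.3f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Note that statistics.variance uses the n - 1 denominator, which is the estimator that goes with n - 1 degrees of freedom.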

Q: The same experiment was repeated, but this time with n = 100. Now the sample mean is x.bar = 2.1373 and the sample variance is 31.855. What is the 95% confidence interval now?

A: When the sample size changes, the degrees of freedom change too.

t (0.025,df=99) = 1.9842

CI: 2.1373 ± 1.9842*sqrt(31.855/100) = 2.1373 ± 1.12, which gives (1.02, 3.26)




Q: A candidate in a two-person election commissions a poll to determine who is ahead. The pollster randomly chooses 500 registered voters and determines that 260 out of the 500 favor the candidate. Estimate the support rate for this candidate among all registered voters and construct a 95% confidence interval for this rate.

A: Use the formula for estimating proportions. p.hat = 260/500 = 0.52, n = 500. The estimated standard error is sqrt(0.52*0.48/500) = 0.0223.

Lower limit: 0.52 - (1.96)(0.0223) - 0.001 = 0.475
Upper limit: 0.52 + (1.96)(0.0223) + 0.001 = 0.565

CI: (0.475, 0.565)

NOTE: The solution was taken from a book which added a little continuity correction .5/N to expand the CI a little bit, because the normal distribution is an approximation. This is not necessary for your exercise/hw/exam.

  0.52 +/- (1.96)(0.0223) is perfectly acceptable.
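Both versions of the poll interval can be sketched as follows. The function name proportion_interval and its continuity_correction flag are hypothetical names chosen for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, level=0.95, continuity_correction=False):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)                  # estimated standard error
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * se
    if continuity_correction:
        margin += 0.5 / n                               # widens the interval slightly
    return p_hat - margin, p_hat + margin


plain = proportion_interval(260, 500)
corrected = proportion_interval(260, 500, continuity_correction=True)
print(f"without correction: ({plain[0]:.3f}, {plain[1]:.3f})")
print(f"with correction:    ({corrected[0]:.3f}, {corrected[1]:.3f})")
```

With n = 500 the correction is only 0.001 on each side, which is why the uncorrected interval is fine for exercises.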



More examples on probability
9.27
See more examples of probability problems, especially conditional probability below. For each example, treat it as an exercise and try answering it yourself first. Use a diagram or contingency table if necessary.

From Stony Brook University
From Yale
Some exercises in multiple-choice form. The detailed answers are in the drop-down window.