layout: true

---

class: title-slide

<br>
<br>

.right-panel[

# Confidence intervals

## Dr. Uma Ravat

University of California at Santa Barbara

<br>
<br>

Copyright © <a href="https://www.pstat.ucsb.edu/people/uma-ravat">Dr. Uma Ravat</a> <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC BY-NC-SA 4.0</a>

]

---

class: middle

# Now, let's flip things a bit

- I do not show you the candy machine or give you any details about it.
- I show you one random sample of 25 candies.

.center[
<img src="./img/pic-sample-only.png" >
]

- What is your best guess (estimate) for the proportion of orange candy in the candy machine from which I got my sample?

???

- You can type your answer in the chat, but wait until I tell you to hit Enter.
- Put your guesses in the chat.
- Since you have one sample proportion (= 0.4), you probably used it as a point estimate of the population proportion I am asking you to guess.

---

<img src="./img/pic-candymachine-sample.png" width="65%" style="display: block; margin: auto;" />

???

How many of you were correct?

---

class: center middle

## It is quite rare to hit the exact population parameter if you simply use the sample statistic as a point estimate of the parameter you are trying to estimate.

---

## This is what happens in reality:

- You only have one sample: the data that you have observed/collected.
- You do not know the population distribution or its parameters.
- You need to make _inferences_ about the population parameters based on the one sample you have.

.pull-left[
<img src="./img/pic-sample-only.png" width="65%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="./img/pic-candymachine-sample.png" width="75%" style="display: block; margin: auto;" />
]

---

class: middle

## Ideas to improve our guesstimate so that it is more likely to include the true population parameter

- Why will we rarely guess the true population parameter correctly if we use the `point estimate` as our guess?
- **Sampling variability**: different samples give different point estimates, so any single estimate is unlikely to equal the parameter exactly.

- If we incorporate sampling variability into our guess by reporting a range of plausible values around the point estimate, we have a good shot that the range captures the true proportion of orange candy in the candy machine.

---

class: middle

### How should we incorporate sampling variability (SE) into our estimate of the true population proportion `\(p\)`?

- By the CLT, under certain conditions, `\(\hat{p} \sim \text{approximately } N\left(\text{mean} = p, \text{SE} = \sqrt{\frac{p(1-p)}{n}}\right)\)`

- We do not know the true `\(p\)`, so we estimate the SE of the sampling distribution of `\(\hat{p}\)` by plugging in `\(\hat{p}\)`: `\(\text{SE} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)`. This works because, for large `\(n\)`, `\(\hat{p}\)` tends to be close to `\(p\)`.

---

class: middle

# Confidence intervals

- are interval estimates (rather than point estimates) of a population parameter.
- give a plausible range of values for the population parameter by extending a multiple of the SE on either side of the point estimate.
- are calculated at a chosen confidence level, commonly 80%, 90%, 95%, or 99%.

---

### 95% Confidence Interval for one proportion

`$$\hat{p} \pm 1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$`

- We estimated the SE of `\(\hat{p}\)` to be `\(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)`.
- We assume `\(\hat{p}\)` is approximately normally distributed.
- For a normal distribution, about 95% of the values fall within 1.96 standard deviations of the mean.
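
---

class: middle

A quick simulation can make the CLT claims on the previous slides concrete. This is an illustrative sketch, not part of the activities. It assumes a candy machine with `\(p = 0.57\)`, draws 10,000 samples of 25 candies with `rbinom()`, and compares the spread of the simulated `\(\hat{p}\)` values with `\(\sqrt{\frac{p(1-p)}{n}}\)`. The seed and the number of repetitions are arbitrary choices.

```r
set.seed(2021)  # arbitrary seed so the simulation is reproducible

p <- 0.57       # true proportion of orange candy (known here only because we simulate)
n <- 25         # sample size
reps <- 10000   # number of simulated samples

# Each rbinom() draw is the number of orange candies in one sample of size n
phats <- rbinom(reps, size = n, prob = p) / n

se_theory <- sqrt(p * (1 - p) / n)   # SE from the CLT formula
se_sim <- sd(phats)                  # spread of the simulated sampling distribution

# Share of simulated sample proportions within 1.96 SE of p
within <- mean(abs(phats - p) <= 1.96 * se_theory)

c(se_theory = se_theory, se_sim = se_sim, within_1.96_SE = within)
```

`se_sim` should land close to `se_theory` (about 0.099). Roughly 95% of the simulated `\(\hat{p}\)` values should fall within 1.96 SE of `\(p\)`. With only 25 candies per sample, `\(\hat{p}\)` is discrete, so expect a share near 0.95 but not exactly 0.95.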
---

class: middle

.pull-left[
<img src="07_b_ci_prop_files/figure-html/unnamed-chunk-4-1.png" width="400px" />
]

.pull-right[

```r
qnorm(0.975, mean = 0, sd = 1)
```

```
## [1] 1.959964
```

```r
qnorm(0.025, mean = 0, sd = 1)
```

```
## [1] -1.959964
```

]

---

class: inverse middle center

# Example

Calculating a confidence interval

---

.pull-left[

For our candy sample, `\(\hat{p} = 0.4\)`.

<img src="./img/pic-sample-only.png" width="45%" style="display: block; margin: auto;" />

```r
phat <- 0.4   # sample proportion
n <- 25       # sample size
SE <- round(sqrt((phat * (1 - phat)) / n), 4)   # estimated standard error
SE
```

```
## [1] 0.098
```

```r
moe <- round(1.96 * SE, 4)   # margin of error
moe
```

```
## [1] 0.1921
```

]

--

.pull-right[

So for this sample, the 95% confidence interval is calculated as follows:

`$$\hat{p} \pm 1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$`

= 0.4 `\(\pm\)` 1.96 `\(\times\)` 0.098 <br>
= 0.4 `\(\pm\)` 0.1921 <br>
= (0.4 `\(-\)` 0.1921, 0.4 `\(+\)` 0.1921) <br>
= (0.2079, 0.5921) <br>

]

---

class: middle

For our candy sample, `\(\hat{p} = 0.4\)`.

95% Confidence Interval = (0.2079, 0.5921)

## Interpreting this confidence interval

We are 95% confident that the true proportion of orange candy in the candy machine is between 0.2079 and 0.5921.

#### Got it right this time! Our interval estimate includes `\(p = 0.57\)`!

--

- .important[We only know we are right because I showed you the contents of the candy machine. Otherwise, there is no way to know whether we got it right or wrong.]

---

class: inverse middle

# Activity 1: Calculating Confidence Intervals

1. Go [here](http://www.rossmanchance.com/applets/2021/oneprop/OneProp.htm?candy=1).
2. Draw one sample of 25 candies from the candy machine with `\(p = 0.57\)`.
3. Build a 95% confidence interval for this sample.
4. Does the interval you constructed include the true `\(p = 0.57\)`?
5. Put your answer in the chat.
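
---

class: middle

If you would rather try Activity 1 in `R` than in the applet, here is a minimal sketch. It assumes the candy machine can be modeled as independent draws with `\(p = 0.57\)`, so one sample is a single `rbinom()` draw. The seed is arbitrary, so your sample, and whether your interval captures 0.57, may differ. The last few lines repeat the procedure 1,000 times to estimate how often such intervals capture `\(p\)`.

```r
set.seed(25)    # arbitrary seed; change it to draw a different sample

p <- 0.57       # true proportion (known only because this is a simulation)
n <- 25         # sample size

x <- rbinom(1, size = n, prob = p)   # number of orange candies in the sample
phat <- x / n                        # sample proportion

SE <- sqrt(phat * (1 - phat) / n)    # estimated standard error
moe <- 1.96 * SE                     # margin of error for 95% confidence
ci <- c(lower = phat - moe, upper = phat + moe)
ci

captured_one <- ci[["lower"]] <= p && p <= ci[["upper"]]   # TRUE if this interval captures p
captured_one

# Repeat the whole procedure 1000 times and record whether each interval captures p
captured <- replicate(1000, {
  phat_i <- rbinom(1, size = n, prob = p) / n
  moe_i <- 1.96 * sqrt(phat_i * (1 - phat_i) / n)
  (phat_i - moe_i <= p) & (p <= phat_i + moe_i)
})
mean(captured)   # proportion of intervals that captured p
```

`mean(captured)` should come out close to 0.95. For small samples like `\(n = 25\)`, this simple (Wald) interval often captures `\(p\)` a little less than 95% of the time.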
---

class: middle

.pull-left[

For another candy sample, `\(\hat{p} = 0.36\)`.

<img src="./img/pic-sample-only2.png" width="45%" style="display: block; margin: auto;" />

```r
phat <- 0.36   # sample proportion
n <- 25        # sample size
SE <- round(sqrt((phat * (1 - phat)) / n), 4)   # estimated standard error
SE
```

```
## [1] 0.096
```

```r
moe <- round(1.96 * SE, 4)   # margin of error
moe
```

```
## [1] 0.1882
```

]

--

.pull-right[

So for this sample, the 95% confidence interval is calculated as follows:

`$$\hat{p} \pm 1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$`

= 0.36 `\(\pm\)` 1.96 `\(\times\)` 0.096 <br>
= 0.36 `\(\pm\)` 0.1882 <br>
= (0.36 `\(-\)` 0.1882, 0.36 `\(+\)` 0.1882) <br>
= (0.1718, 0.5482) <br>

]

---

class: middle

For another candy sample, `\(\hat{p} = 0.36\)`.

95% Confidence Interval = (0.1718, 0.5482)

### Interpreting this confidence interval

We are 95% confident that the true proportion of orange candy in the candy machine is between 0.1718 and 0.5482.

#### Got it **wrong** this time! Our interval estimate **does not** include `\(p = 0.57\)` :(

- .important[We only know we are wrong because I showed you the contents of the candy machine. Otherwise, there is no way to know whether we got it right or wrong.]

---

class: inverse middle

# Activity 2: Simulating Confidence Intervals

We will draw candy samples of size 25 from the candy machine with `\(p = 0.57\)`, build a confidence interval for each sample, and study the intervals.

1. Go [here](http://www.rossmanchance.com/applets/2021/confsim/ConfSim.html).
2. In **Describe process**, set Statistic = Proportions, Distribution = Binomial, Method = Wald (since we are using the z-interval), `\(p\)` (the population proportion) = 0.57, sample size = 25, and number of intervals = 1.
3. Set the confidence level to 95%.
4. Click the **Sample** button a few times.

---

class: middle

# Why do we say 95% confidence?

Suppose we take many samples and build a confidence interval for each sample using `point estimate ± 1.96×SE`.
Then

- about **95%** of all these intervals would **contain the true population proportion** `\(p\)` (which is 0.57 in our example), and
- about 5% of all these intervals would not contain the true population proportion `\(p\)`.

---

class: middle

In other words, suppose we guess the **unknown population parameter** with the interval `point estimate ± 1.96×SE` instead of just the `point estimate`. Then, in the long run, the intervals we report will include the unknown population parameter about 95% of the time.

That is **awesome**, considering that guessing with just the `point estimate` would usually be wrong! :-)

---

class: inverse middle center

# More about confidence levels

---

class: middle

# Confidence levels

Commonly used confidence levels in practice are 90%, 95%, 98%, and 99%.

A confidence interval can be calculated using the formula

$$ \text{point estimate} \pm z^\star \times \text{SE} $$

- `\(z^\star\)` is called the critical value.
- `\(z^\star \times \text{SE}\)` is called the _margin of error_.
- Width of the CI = 2 `\(\times\)` margin of error.

---

class: middle

# Width of a confidence interval

If we want to be more certain that we capture the population parameter, i.e. increase our confidence level, should we use a wider interval or a narrower interval?

--

_A wider interval._

---

class: middle

### Can you see any drawbacks to using a wider interval?
<img src="./img/garfield.png" width="900" height="250">

--

_If the interval is too wide, it may not be very informative._

.footnote[Source: OpenIntro.org]

---

class: middle

# Changing confidence levels

$$ \text{point estimate} \pm z^\star \times \text{SE} $$

In the above formula, for a given sample,

- the point estimate is the sample statistic ( `\(\hat{p}\)` ), and
- SE is the standard error, which is approximately `\(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)`.

Neither of these depends on the confidence level. Thus, as the confidence level changes, **only** `\(z^\star\)` in the above formula needs to be adjusted.

---

class: middle

### For a 95% confidence interval, why is `\(z^\star = 1.96\)`?

The middle 95% of the standard normal distribution leaves 2.5% in each tail, so `\(z^\star\)` is its 97.5th percentile.

.pull-left[
<img src="07_b_ci_prop_files/figure-html/unnamed-chunk-11-1.png" width="380px" />
]

.pull-right[

```r
qnorm(0.025, mean = 0, sd = 1)
```

```
## [1] -1.959964
```

```r
qnorm(0.975, mean = 0, sd = 1)  # critical value
```

```
## [1] 1.959964
```

]

---

class: middle

# Exercise:

### What is `\(z^\star\)` for a 98% confidence interval?

--

.pull-left[
<img src="07_b_ci_prop_files/figure-html/unnamed-chunk-14-1.png" width="400px" />
]

--

.pull-right[

```r
qnorm(0.01, mean = 0, sd = 1)
```

```
## [1] -2.326348
```

```r
qnorm(0.99, mean = 0, sd = 1)  # critical value
```

```
## [1] 2.326348
```

]

---

class: middle

# Width of the confidence interval

CI = `\(\text{point estimate} \pm \text{critical value} \times \text{standard error}\)`

- A higher confidence level means a larger critical value.
- A larger critical value means a larger margin of error.
- A larger margin of error means a wider CI.

Thus, for the same sample, a 99% CI is wider than a 95% CI.

---

class: middle

## Constructing Confidence Intervals

point estimate `\(\pm\)` critical value `\(\times\)` standard error

1. Calculate the point estimate.
2. Calculate the critical value (use `R`, e.g. `qnorm()`).
3. Calculate the standard error.
4. Construct the confidence interval.
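
---

class: middle

The four steps on the previous slide can be wrapped in a small function. This is a sketch, not a function from any package. The name `prop_ci()` and its arguments are hypothetical, and the function assumes the CLT conditions hold so that a z critical value from `qnorm()` is appropriate.

```r
# Hypothetical helper: z-based (Wald) confidence interval for one proportion
prop_ci <- function(x, n, conf_level = 0.95) {
  phat <- x / n                               # 1. point estimate
  z_star <- qnorm(1 - (1 - conf_level) / 2)   # 2. critical value (upper tail area = half of 1 - conf_level)
  se <- sqrt(phat * (1 - phat) / n)           # 3. standard error
  moe <- z_star * se                          # margin of error
  c(lower = phat - moe, upper = phat + moe)   # 4. the interval
}

# Our first candy sample: 10 orange candies out of 25 (phat = 0.4)
prop_ci(10, 25, conf_level = 0.95)
prop_ci(10, 25, conf_level = 0.98)
```

At 95%, the first sample gives about (0.2080, 0.5920). The earlier slide reported (0.2079, 0.5921) only because it rounded SE to 0.098 before multiplying. The 98% interval, about (0.1721, 0.6279), is wider because `\(z^\star\)` grows from 1.96 to 2.33.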
---

class: middle

## Confidence Intervals for other population parameters

Confidence Interval = point estimate `\(\pm\)` critical value `\(\times\)` SE of the estimate

| | Parameter of interest | Point estimate | Critical value | Standard error of the estimate |
|---|---|---|---|---|
| Proportion | `\(p\)` | `\(\hat{p}\)` | z | `\(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)` |
| Mean | `\(\mu\)` | `\(\bar{x}\)` | z for large n, or t for small n | `\(\sqrt{\frac{s^2}{n}}\)` |

`\(\hat{p}\)` is the sample proportion, `\(\bar{x}\)` is the sample mean, and `\(s^2\)` is the sample variance.

The conditions for the CLT should hold before we use z as the critical value.

---

class: middle

# Conditions for CLT to hold

- Random sample (each observation is drawn independently).
- For proportions: `\(np\)` and `\(n(1-p)\)` are large (10 or more). Since `\(p\)` is unknown when building a CI, check this with `\(\hat{p}\)`.
- For means: the sample size `\(n\)` should be large (30 or more).
  - If the sample size `\(n\)` is small, use the t distribution with `\(n-1\)` degrees of freedom to find the critical value.

---

class: middle

# Finding the t critical value for a 95% CI when `\(n = 15\)`

![](07_b_ci_prop_files/figure-html/unnamed-chunk-17-1.png)

The critical value is:

```r
qt(0.975, df = 14)  # df = n - 1
```

```
## [1] 2.144787
```

---

class: middle

Confidence intervals are ...

- interval estimates for population parameters, constructed from a sample.
- only about population parameters, not individual observations.
- Confidence is in the process used to generate the confidence intervals.
- Confidence is not about whether a particular interval contains the population parameter.

.important[Once an interval has been calculated, there is no probability associated with it: either the population parameter is in the interval or it is not.]

---

# Acknowledgement

Thanks to Dr. Mine Dogucu for suggestions for improving this material.

This content has been developed and shaped by referring to several materials, including

1. [Dr. Allan Rossman's Ask Good Questions blog](https://askgoodquestions.blog/)
2. [OpenIntro.org resources](https://www.openintro.org/book/os/)
3. [Dr. Mine Dogucu materials](https://mdogucu.ics.uci.edu/l)