What would be the most appropriate probability distribution for each of the following random variables:
Consider the following plots. Write down these probabilities:
Consider the following plots. Write down these probabilities:
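You can use rbinom to sample from a binomial distribution. Its arguments are n (the number of samples), size (the number of trials), and prob (the success probability).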
Suppose the probability of having a specific disease is 0.2 and we want to know the probability of observing 3 out of 10 people affected by the disease: \(P(Y=3)\). We can use dbinom, which returns the probability of a specific value.
dbinom(x=3, size = 10, prob = 0.2)
## [1] 0.2013266
Find the probability of observing 3 or fewer affected people.
Note that Binomial(1, 0.2) is the same as Bernoulli(0.2):
rbinom(n=10, size = 1, prob = 0.2)
##  [1] 1 0 0 1 0 0 0 1 0 0
rbinom(n=10, size = 10, prob = 0.2)
##  [1] 4 2 2 0 3 4 3 3 2 4
We can plot the probability mass function (pmf).
x <- 0:10
pmf <- dbinom(x, size=10, prob=0.2)
plot(x, pmf, type="h", xlab="Number of Successes", ylab="Probability Mass", main="Binomial(10, 0.2)")
points(x,pmf, pch=16)
abline(h=0, col="gray")
Or, we can use ggplot (this requires the ggplot2 package):
library(ggplot2)
df <- data.frame(x = x, y = pmf)
ggplot(data = df, aes(x = x, y = y, xend = x, yend = 0)) +
  geom_point() + geom_segment() +
  xlab("Number of Successes") + ylab("Probability Mass") +
  scale_x_continuous(breaks = x)
Now generate 1000 samples from the Binomial(10, 0.2) distribution and plot the distribution of the resulting data.
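One way to approach this exercise is shown below. It is a minimal sketch that uses base R graphics only. The seed value is an arbitrary assumption added so the run is reproducible. The sketch draws the 1000 samples with rbinom, tabulates the proportion of each count from 0 to 10, and sets those proportions beside the theoretical pmf from dbinom.

```r
# Seed is arbitrary; it only makes the random draws reproducible
set.seed(1)
samples <- rbinom(n = 1000, size = 10, prob = 0.2)

# Proportion of each count 0..10; factor() keeps counts that never occurred
freq <- table(factor(samples, levels = 0:10)) / length(samples)

# Bar chart of the sample proportions
barplot(freq, xlab = "Number of Successes", ylab = "Proportion",
        main = "1000 samples from Binomial(10, 0.2)")

# Compare sample proportions with the theoretical pmf
comparison <- data.frame(x = 0:10,
                         empirical = as.numeric(freq),
                         theoretical = dbinom(0:10, size = 10, prob = 0.2))
print(round(comparison, 3))
```

With 1000 samples the empirical proportions should be close to the pmf. The agreement improves as n grows.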
Again suppose that we are interested in the probability of observing 3 or fewer affected people in a group of 10. We could of course sum the values of the pmf: \(P(Y \leq 3) = P(Y=0) + P(Y=1) + P(Y=2) + P(Y=3)\). However, it is easier to use pbinom, the cumulative distribution function of the binomial distribution, to obtain the lower tail probability:
pbinom(3, size=10, prob=0.2, lower.tail=TRUE)
## [1] 0.8791261
By changing the lower.tail option to FALSE, we can find the upper tail probability \(P(Y>3)\).
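To connect the two approaches, the sketch below computes \(P(Y \le 3)\) twice: once by summing dbinom over 0 to 3 and once with pbinom. It also obtains \(P(Y > 3)\) with lower.tail = FALSE. The two tails should add up to 1.

```r
# P(Y <= 3) by summing the pmf over 0, 1, 2, 3
p_sum <- sum(dbinom(0:3, size = 10, prob = 0.2))

# The same probability from the cumulative distribution function
p_cdf <- pbinom(3, size = 10, prob = 0.2, lower.tail = TRUE)

# Upper tail P(Y > 3); this excludes Y = 3 itself
p_upper <- pbinom(3, size = 10, prob = 0.2, lower.tail = FALSE)

c(sum = p_sum, cdf = p_cdf, upper = p_upper, total = p_cdf + p_upper)
```

Because \(Y\) is discrete, \(P(Y > 3)\) and \(P(Y \ge 3)\) are different. The latter is pbinom(2, size = 10, prob = 0.2, lower.tail = FALSE).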
Suppose BMI in a specific population has a normal distribution with mean of 25 and variance of 16: \(X \sim N(25, 16)\). Then we can simulate 5 values from this distribution using the rnorm function.
rnorm(n=5, mean=25, sd=4)
## [1] 21.97157 25.66485 22.40555 20.73714 22.70862
These numbers can be regarded as BMI values for 5 randomly selected people from this population. In the rnorm function, the first parameter is the number of samples, the second parameter is the mean, and the third parameter is the standard deviation (not the variance). Here the standard deviation is \(\sqrt{16} = 4\).
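A quick way to confirm that sd expects the standard deviation is shown below. It is a sketch in which the sample size of 100,000 and the seed are arbitrary choices. It draws a large sample with sd = sqrt(16) and checks that the sample variance is close to 16. If the variance were passed by mistake (sd = 16), the spread would be four times too large.

```r
# Variance 16 corresponds to sd = sqrt(16) = 4
set.seed(2)  # arbitrary seed for reproducibility
bmi <- rnorm(n = 1e5, mean = 25, sd = sqrt(16))

# Sample mean should be near 25 and sample variance near 16
c(mean = mean(bmi), variance = var(bmi))
```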
You can also plot the probability density function (pdf):
x <- seq(from=10, to=40, length=100)
fx<- dnorm(x, mean=25, sd=4)
plot(x, fx, type="l", xlab="BMI", ylab="Density", main="N(25, 16)")
abline(h=0, col="gray")
Or, we can use ggplot:
df <- data.frame(x=x, y=fx)
ggplot(data = df, aes(x =x)) + 
  geom_function(fun = dnorm, args = list(mean = 25, sd = 4))+
  xlab("BMI") + ylab("Density")
Now generate 1000 samples from \(N(25, 16)\) and plot the distribution of the resulting data.
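For the continuous case, a histogram on the density scale can be overlaid with the theoretical pdf. The sketch below does this with base R graphics. The seed and the choice of 30 breaks are arbitrary assumptions.

```r
set.seed(3)  # arbitrary seed for reproducibility
bmi_samples <- rnorm(n = 1000, mean = 25, sd = 4)

# freq = FALSE puts the histogram on the density scale, so the curve is comparable
hist(bmi_samples, breaks = 30, freq = FALSE, xlab = "BMI",
     main = "1000 samples from N(25, 16)")

# Overlay the N(25, 16) density (sd = 4)
curve(dnorm(x, mean = 25, sd = 4), add = TRUE, col = "red", lwd = 2)

summary(bmi_samples)
```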
Remember that for continuous variables the probability of a specific value is always zero. Instead, for continuous variables, we are interested in the probability of observing a value in a given interval. For instance, the probability of observing a BMI less than or equal to 18.5 is the area under the density curve to the left of 18.5. In R, we find this probability with the cumulative distribution function pnorm:
pnorm(18.5, mean=25, sd=4, lower.tail=TRUE)
## [1] 0.05208128
Once again, we can find the upper tail probability \(P(X > 18.5)\) by setting the option lower.tail=FALSE.
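The sketch below computes both tails at 18.5 and confirms that they sum to 1. It also shows that standardizing to \(z = (18.5 - 25)/4\) and calling pnorm with its default N(0, 1) gives the same lower tail probability.

```r
# Lower and upper tail probabilities at 18.5 for N(25, 16)
lower <- pnorm(18.5, mean = 25, sd = 4, lower.tail = TRUE)   # P(X <= 18.5)
upper <- pnorm(18.5, mean = 25, sd = 4, lower.tail = FALSE)  # P(X > 18.5)

# Same lower tail via the standard normal: z = (x - mean) / sd
z <- (18.5 - 25) / 4
lower_std <- pnorm(z)

c(lower = lower, upper = upper, total = lower + upper, lower_std = lower_std)
```

For a continuous variable, \(P(X \le 18.5)\) and \(P(X < 18.5)\) are equal, so the choice between \(\le\) and \(<\) does not matter here.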
The qnorm function returns quantiles of a normal distribution; it is the inverse of pnorm. For example, the 0.05 quantile of the above distribution (the BMI value below which 5% of the population falls) is
qnorm(0.05, mean=25, sd=4, lower.tail=TRUE)
## [1] 18.42059
Now find \(P(25 < X \le 30)\).
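An interval probability is a difference of two cumulative probabilities: \(P(25 < X \le 30) = F(30) - F(25)\), where \(F\) is the cdf. The sketch below computes this with pnorm. It also checks that qnorm and pnorm invert each other.

```r
# P(25 < X <= 30) = F(30) - F(25) for X ~ N(25, 16)
p_interval <- pnorm(30, mean = 25, sd = 4) - pnorm(25, mean = 25, sd = 4)

# qnorm inverts pnorm: the 0.05 quantile maps back to probability 0.05
q05 <- qnorm(0.05, mean = 25, sd = 4)
check <- pnorm(q05, mean = 25, sd = 4)

c(interval = p_interval, q05 = q05, check = check)
```

Since 25 is the mean, pnorm(25, mean = 25, sd = 4) is exactly 0.5. The answer is therefore pnorm(30, mean = 25, sd = 4) minus one half.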
Consider the Binomial(20, 0.3) distribution. Do the following tasks:
Consider the \(N(3, 2.1)\) distribution. Do the following tasks:
For the probability distributions Binomial(100, 0.3) and \(N(30, 21)\), find the lower tail probability of 35 and the upper tail probability of 27. Compare the results based on the two distributions.
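The two distributions are comparable because \(N(30, 21)\) has the same mean (\(100 \times 0.3 = 30\)) and variance (\(100 \times 0.3 \times 0.7 = 21\)) as Binomial(100, 0.3). The sketch below computes both tail probabilities under each distribution. Any gap between the rows reflects the normal approximation to a discrete distribution. A common refinement, not used here, is the continuity correction, which would evaluate the normal cdf at 35.5 and 27.5 instead.

```r
# Lower tail probability of 35: P(Y <= 35) and P(X <= 35)
binom_lower <- pbinom(35, size = 100, prob = 0.3)
norm_lower  <- pnorm(35, mean = 30, sd = sqrt(21))

# Upper tail probability of 27: P(Y > 27) and P(X > 27)
binom_upper <- pbinom(27, size = 100, prob = 0.3, lower.tail = FALSE)
norm_upper  <- pnorm(27, mean = 30, sd = sqrt(21), lower.tail = FALSE)

# Rows: distribution; columns: the two requested tail probabilities
round(rbind(binomial = c(lower35 = binom_lower, upper27 = binom_upper),
            normal   = c(lower35 = norm_lower,  upper27 = norm_upper)), 4)
```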
Suppose \(X\) has a \(t\)-distribution with 6 degrees of freedom.
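R's functions for the \(t\)-distribution follow the same d/p/q/r pattern as the binomial and normal ones: dt, pt, qt and rt, with the degrees of freedom passed as df. The sketch below shows how to call them. The cut-off 2 and the probability 0.975 are illustrative values chosen here, not taken from the tasks. It also compares tail areas and densities with \(N(0, 1)\) to show the heavier tails of \(t_6\).

```r
# t distribution with 6 degrees of freedom (cut-off 2 and 0.975 are illustrative)
pt(2, df = 6)                      # lower tail P(X <= 2)
pt(2, df = 6, lower.tail = FALSE)  # upper tail P(X > 2)
qt(0.975, df = 6)                  # 0.975 quantile

# Heavier tails than N(0, 1): compare P(X > 2) under both
c(t6 = pt(2, df = 6, lower.tail = FALSE), normal = pnorm(2, lower.tail = FALSE))

# Overlay the two densities (dashed line: standard normal)
t_grid <- seq(-5, 5, length = 200)
plot(t_grid, dnorm(t_grid), type = "l", lty = 2, xlab = "x", ylab = "Density",
     main = "t(6) vs N(0, 1)")
lines(t_grid, dt(t_grid, df = 6))
```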
The National Heart, Lung, and Blood Institute defines the following categories based on systolic blood pressure (\(SBP\)):
- Normal: \(SBP \le 120\)
- Prehypertension: \(120 < SBP \le 140\)
- High blood pressure: \(SBP > 140\)
If \(SBP\) in the US has a normal distribution such that \(SBP \sim N(125, 15^{2})\),
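As one illustration of working with these categories, the sketch below computes the probability of each one under \(SBP \sim N(125, 15^2)\). It assumes the prehypertension range is \(120 < SBP \le 140\), which fills the gap between the other two categories. The middle category is a difference of two cdf values.

```r
mu <- 125
sigma <- 15

p_normal <- pnorm(120, mean = mu, sd = sigma)                      # SBP <= 120
p_prehyp <- pnorm(140, mean = mu, sd = sigma) - p_normal           # 120 < SBP <= 140 (assumed range)
p_high   <- pnorm(140, mean = mu, sd = sigma, lower.tail = FALSE)  # SBP > 140

round(c(normal = p_normal, prehypertension = p_prehyp, high = p_high), 4)
```

The three categories cover the whole real line without overlap, so the probabilities add up to 1.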
Assume that BMI in the US has a \(N(27, 6^2)\) distribution. Following the recommendation of the National Heart, Lung, and Blood Institute, we define the following BMI categories:
Underweight: \(BMI \le 18.5\)
Normal weight: \(18.5 < BMI \le 25\)
Overweight: \(25 < BMI \le 30\)
Obesity: \(BMI > 30\)
Use R to find the probability of each group.
Find the intervals, centered at the mean, that include 68%, 95%, and 99.7% of the population.
What is the probability of being either underweight OR obese (i.e., the union of the two intervals)?
What are the lower and upper tail probabilities for BMI equal to 29.2?
For the above question, we denote BMI as \(X\). Find the value \(x\) such that \(P(X \le x) = 0.2\). Next, find the value \(x\) such that \(P(X > x) = 0.2\).
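The tasks above all reduce to pnorm and qnorm calls on \(N(27, 6^2)\). The sketch below is one possible layout. It assumes the category boundaries listed above and uses the \(\mu \pm k\sigma\) rule for the 68/95/99.7% intervals. The exact central 95% interval would instead be qnorm(c(0.025, 0.975), 27, 6).

```r
mu <- 27
sigma <- 6

# Category probabilities (boundaries as defined above)
p_under  <- pnorm(18.5, mu, sigma)                         # BMI <= 18.5
p_normal <- pnorm(25, mu, sigma) - p_under                 # 18.5 < BMI <= 25
p_over   <- pnorm(30, mu, sigma) - pnorm(25, mu, sigma)    # 25 < BMI <= 30
p_obese  <- pnorm(30, mu, sigma, lower.tail = FALSE)       # BMI > 30

# Underweight and obese are disjoint, so P(union) is the sum
p_under_or_obese <- p_under + p_obese

# Intervals mu +/- k * sigma for k = 1, 2, 3 (about 68, 95, 99.7%)
k <- 1:3
intervals <- cbind(lower = mu - k * sigma, upper = mu + k * sigma,
                   coverage = pnorm(k) - pnorm(-k))

# Lower and upper tail probabilities at 29.2
tails_29.2 <- c(lower = pnorm(29.2, mu, sigma),
                upper = pnorm(29.2, mu, sigma, lower.tail = FALSE))

# x with P(X <= x) = 0.2, and x with P(X > x) = 0.2
x_lower <- qnorm(0.2, mu, sigma)
x_upper <- qnorm(0.2, mu, sigma, lower.tail = FALSE)

print(round(c(under = p_under, normal = p_normal, over = p_over,
              obese = p_obese, under_or_obese = p_under_or_obese), 4))
print(intervals)
print(tails_29.2)
print(c(x_lower = x_lower, x_upper = x_upper))
```

By the symmetry of the normal distribution, x_lower and x_upper lie the same distance below and above the mean 27.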
If the height (in inches) of newborn babies has the \(N(18, 1)\) distribution, what is the probability that the height of a newborn baby is between 17 and 20 inches? What is the distribution of height in centimeters (1 inch = 2.54 cm)? Using this distribution, what is the probability that the height of a newborn baby is between 43.18 cm (17 inches) and 50.80 cm (20 inches)?
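The key fact here is that if \(X \sim N(\mu, \sigma^2)\), then \(aX \sim N(a\mu, a^2\sigma^2)\). Height in centimeters is therefore \(2.54X \sim N(45.72, 2.54^2)\). The sketch below computes the probability on both scales. Because the cm limits are the inch limits multiplied by 2.54, the two answers agree.

```r
# Height in inches: X ~ N(18, 1)
p_inches <- pnorm(20, mean = 18, sd = 1) - pnorm(17, mean = 18, sd = 1)

# Height in cm: 2.54 * X ~ N(2.54 * 18, 2.54^2 * 1), so mean 45.72 and sd 2.54
mean_cm <- 2.54 * 18
sd_cm   <- 2.54 * 1
p_cm <- pnorm(50.80, mean = mean_cm, sd = sd_cm) -
        pnorm(43.18, mean = mean_cm, sd = sd_cm)

c(inches = p_inches, cm = p_cm)
```

Note that the standard deviation scales by 2.54 while the variance scales by \(2.54^2\). The sd argument of pnorm takes the former.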
Suppose the distribution of systolic blood pressure, \(X\), among people suffering from hypertension is \(N(153, 4^{2})\). Further, suppose that researchers have found a new treatment that lowers systolic blood pressure by 4 points on average. The effect of the drug, \(Y\), varies randomly among patients and does not depend on their current blood pressure level. If the variance of \(Y\) is 1, what are the mean (expectation) and variance of systolic blood pressure if every person in the population starts using the drug? What is the distribution of systolic blood pressure in this case if we assume \(Y\) has a normal distribution?
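The sketch below treats \(Y\) as the size of the drop, so \(E[Y] = 4\) and blood pressure after treatment is \(X - Y\). This sign convention is an assumption. Since \(Y\) does not depend on \(X\), the two are independent, and so \(E[X - Y] = 153 - 4 = 149\) and \(\mathrm{Var}(X - Y) = 4^2 + 1 = 17\). A difference of independent normals is normal, giving \(N(149, 17)\). The simulation merely checks this numerically; the seed and sample size are arbitrary.

```r
# Closed form for X - Y with X, Y independent
mean_new <- 153 - 4   # E[X - Y] = E[X] - E[Y]
var_new  <- 4^2 + 1   # Var(X - Y) = Var(X) + Var(Y)

# Simulation check, assuming Y ~ N(4, 1) (the drop) independent of X ~ N(153, 4^2)
set.seed(4)  # arbitrary seed
n <- 1e5
sbp_before <- rnorm(n, mean = 153, sd = 4)
drug_drop  <- rnorm(n, mean = 4, sd = 1)
sbp_after  <- sbp_before - drug_drop

c(mean = mean(sbp_after), variance = var(sbp_after))
```

The variances add even though the drug lowers the mean. Subtracting an independent random quantity adds its variability rather than removing it.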