```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r message=FALSE}
library(tidyverse)
```

## Data

The dataset, named `arthritis`, is provided on the [Teaching Statistics in Health Sciences Resource Portal](https://www.causeweb.org/tshs/arthritis-treatment/). The following information about the dataset is provided.

> Rheumatoid arthritis (RA) patients in two age ranges who were receiving care at a clinic in Philadelphia are included. Variables include age and sex, several indicators of disease activity and whether or not patients were administered selected common treatments for RA.

```{r message=FALSE}
arthritis <- read_csv("https://raw.githubusercontent.com/cosmos-uci-dshs/data/main/RheumArth_Tx_AgeComparisons.csv") %>%
  # convert column names to snake_case
  janitor::clean_names() %>%
  # recode numeric codes as labeled factors
  mutate(sex = case_when(sex == 0 ~ "female",
                         sex == 1 ~ "male")) %>%
  mutate(sex = as.factor(sex)) %>%
  mutate(age_gp = case_when(age_gp == 1 ~ "control",
                            age_gp == 2 ~ "elderly")) %>%
  mutate(age_gp = as.factor(age_gp)) %>%
  mutate(cdai_yn = case_when(cdai_yn == 1 ~ "no",
                             cdai_yn == 2 ~ "yes")) %>%
  mutate(cdai_yn = as.factor(cdai_yn))
```

A data dictionary for this dataset is provided [online](https://www.causeweb.org/tshs/datasets/RheumArth_Tx_AgeComparisons_Data%20Dictionary.pdf). Note that the variable names have been changed to adhere to the tidyverse style guide, and a little data cleaning has been done.

Throughout the lab, make sure to answer each question with both code and text. We don't want you to just run code; you should also be able to interpret what you see.

Remember that you are working in your groups. Work question by question, wait for each other, don't rush each other, and discuss everything with each other. Enjoy the process. This is not a marathon!

Write the names of your group members here with your own name first:

---

## Question 1

Familiarize yourself with the dataset. How many variables are there? What variables catch your eye? How many patients are there?

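
As a starting point for getting to know a data frame, the sketch below shows a few functions that report its variables and size. It runs on a small made-up data frame: `toy` and its columns `id`, `age`, and `group` are hypothetical and are not values from the arthritis data. The same calls should work on `arthritis`.

```r
library(tidyverse)

# Made-up toy data for illustration only; NOT values from the arthritis dataset.
toy <- tibble(
  id    = 1:5,
  age   = c(34, 51, 67, 72, 45),
  group = factor(c("a", "b", "b", "a", "b"))
)

# Overview: each variable's name, type, and first few values
glimpse(toy)

# Number of variables (columns) and number of observations (rows)
n_variables <- ncol(toy)
n_rows <- nrow(toy)
```

Here each row of `toy` stands for one observation, so `nrow()` counts observations. Whether one row corresponds to one patient in `arthritis` is something to confirm against the data dictionary.
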
## Question 2

Is the distribution of ages of patients right-skewed, left-skewed, or symmetric? What does this imply: are there more older patients with RA? Answer this question visually and numerically. Calculate the mean and median and report which one is greater.

## Question 3

Is RA more commonly seen in men or women? Answer this question numerically and with a visual. We will learn more rigorous ways to test this later, but for now we can eyeball it based on numbers and visuals.

## Question 4

What is the relationship between age and sex? Hint: think about what kind of variables age and sex are (numeric vs. categorical). Review what kind of plot we have used for such variables.

## Question 5

Read about how the [Clinical Disease Activity Index (CDAI)](https://www.rheumatology.org/Portals/0/Files/CDAI%20Form.pdf) is calculated. Is it better for patients to have a high or low CDAI?

## Question 6

Does the "elderly" group have higher or lower CDAI overall when compared with the "control" group? Answer with a visual and comment on what you see.

## Question 7

What is the relationship between `age` and `cdai`? Is there any strong and obvious relationship? Answer with a visual and comment on it.
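
The questions above rely on a handful of standard summaries and plots. The sketch below shows one possible form of each on a small made-up data frame: `toy` and its columns `x_num`, `y_num`, and `grp` are hypothetical and are not values from the arthritis data. The plot types are common choices for each combination of variable types, not necessarily the ones used in class.

```r
library(tidyverse)

# Made-up toy data for illustration only; NOT values from the arthritis dataset.
toy <- tibble(
  x_num = c(2, 3, 3, 4, 5, 9, 14),
  y_num = c(1.0, 2.5, 2.0, 3.5, 4.0, 7.5, 12.0),
  grp   = factor(c("a", "a", "b", "b", "a", "b", "b"))
)

# One numeric variable: compare the mean and median, then look at the shape.
# A mean noticeably above the median often goes with a right-skewed distribution.
center <- toy %>%
  summarize(mean_x = mean(x_num), median_x = median(x_num))
p_hist <- ggplot(toy, aes(x = x_num)) +
  geom_histogram(bins = 5)

# One categorical variable: counts and proportions per level, plus a bar plot.
grp_counts <- toy %>%
  count(grp) %>%
  mutate(prop = n / sum(n))
p_bar <- ggplot(toy, aes(x = grp)) +
  geom_bar()

# Numeric vs. categorical: side-by-side boxplots.
p_box <- ggplot(toy, aes(x = grp, y = x_num)) +
  geom_boxplot()

# Numeric vs. numeric: scatterplot.
p_scatter <- ggplot(toy, aes(x = x_num, y = y_num)) +
  geom_point()
```

The plots are stored as objects here; typing a name such as `p_hist` on its own line in a chunk displays that plot. To adapt the sketch, swap `toy` for `arthritis` and the hypothetical columns for the variables each question names.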