Logistics
Create an .Rmd file from scratch to answer these questions.
Read the article California Takes New Steps To Stop Black Women From Dying In Childbirth.
The article states: “The risk of pregnancy-related deaths is three to four times higher for Black women than for white women, according to the U.S. Centers for Disease Control and Prevention.” This statement by the CDC is based on data. Without accessing the data, we will try to imagine what it looks like. In your groups, discuss:
What would a row of such data represent?
What are the minimum required variables and the types of these variables that the researchers must have had in their dataset?
Are there any other variables that you think that the researchers might have collected? If so, what would their types be?
Read the article Heat Waves May Raise Risk of Premature Birth. Similarly, in your groups, discuss the same three questions for the data behind this article.
Recall the data on rheumatoid arthritis patients. The data come from the Teaching Statistics in Health Sciences Resource Portal, which provides the following information about the dataset.
Rheumatoid arthritis (RA) patients in two age ranges who were receiving care at a clinic in Philadelphia are included. Variables include age and sex, several indicators of disease activity and whether or not patients were administered selected common treatments for RA.
Previously I provided the data for you. This time, to practice downloading data and importing it into R, you will do everything yourselves, from downloading to exploratory analysis. You will make good use of the data dictionary again. You will also revisit some of the questions you have worked on before. However, you now know more R, so your new answers should go beyond your previous ones. For instance, your visualizations should now have labels at the very least. If you have time, go beyond the lecture notes and explore more features.
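As a rough sketch of the import step described above, the base R snippet below reads a CSV file and takes a first look at it. The file here is a tiny stand-in written to a temporary path, and the column names (ID, Age, Sex, CDAI) and values are invented for illustration. Replace csv_path with the location of the file you actually downloaded.

```r
# Stand-in for the downloaded file: a tiny CSV written to a temporary path.
# Column names and values are made up; they are NOT the real RA data.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("ID,Age,Sex,CDAI",
    "1,45,0,12.5",
    "2,78,1,",
    "3,61,0,3.0"),
  csv_path
)

# Import, treating empty cells and the string "NA" as missing values.
ra_raw <- read.csv(csv_path, na.strings = c("", "NA"))

# First look: structure of the data frame and missing values per column.
str(ra_raw)
colSums(is.na(ra_raw))
```

Checking the structure and the number of missing values right after the import makes it easier to spot columns that were read with an unexpected type.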
In the Codes column of the data dictionary you will see explanations for each value of the variables.
1. Using these, at the very least create variables with the corresponding levels of the CDAI and DAS_28 variables. In other words, you should have a cdai_level variable that shows “Remission”, “Low Activity”, etc.
2. Make sure these variables have the appropriate variable types.
3. Do the appropriate “cleaning” for the Steroids_GT_5 variable as well.
4. Due to time limitations we may not be able to clean each and every variable, but if you end up with extra time, please return here.
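One possible way to approach the recoding steps above is sketched below in base R on toy data. The cut points are the commonly cited CDAI thresholds, the labels “Moderate Activity” and “High Activity” are guesses at the remaining level names, and the 0/1 coding of Steroids_GT_5 is an assumption. Confirm all of these against the Codes column of the data dictionary before using them.

```r
# Toy data, not the real RA dataset.
ra_toy <- data.frame(
  CDAI = c(1.5, 8, 15, 30, NA),
  Steroids_GT_5 = c(0, 1, 1, 0, NA)
)

# Bin the numeric score into ordered activity levels.
# ASSUMPTION: break points and labels must be checked against the data dictionary.
ra_toy$cdai_level <- cut(
  ra_toy$CDAI,
  breaks = c(-Inf, 2.8, 10, 22, Inf),
  labels = c("Remission", "Low Activity", "Moderate Activity", "High Activity"),
  right = TRUE,            # intervals are closed on the right, e.g. (2.8, 10]
  ordered_result = TRUE    # store as an ordered factor
)

# Turn a numeric yes/no code into a labelled factor.
# ASSUMPTION: 0 means "No" and 1 means "Yes".
ra_toy$steroids_gt_5 <- factor(
  ra_toy$Steroids_GT_5,
  levels = c(0, 1),
  labels = c("No", "Yes")
)

str(ra_toy)
table(ra_toy$cdai_level, useNA = "ifany")
```

An ordered factor keeps the levels in their clinical order in tables and plots instead of sorting them alphabetically. The same pattern would apply to DAS_28 with its own cut points.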
Does the “elderly” group have a higher or lower CDAI overall than the “control” group? Answer with a visual and a numeric summary, and comment on what you see.
Do women get diagnosed with RA at a later age than men? Answer with a visual and a numeric summary, and comment on what you see.
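Both comparison questions above follow the same pattern: a numeric summary per group plus a labelled plot. A minimal base R sketch on toy data is shown here. The column names age_group and CDAI and all values are invented, so adapt them to the names in your imported data.

```r
# Toy data, not the real RA dataset.
ra_toy <- data.frame(
  age_group = factor(c("control", "control", "control",
                       "elderly", "elderly", "elderly")),
  CDAI = c(4, 10, 16, 6, 12, NA)
)

# Numeric answer: median CDAI per group, ignoring missing values.
cdai_median <- tapply(ra_toy$CDAI, ra_toy$age_group, median, na.rm = TRUE)
cdai_median

# Visual answer: side-by-side boxplots with axis labels and a title.
boxplot(
  CDAI ~ age_group,
  data = ra_toy,
  xlab = "Age group",
  ylab = "CDAI score",
  main = "CDAI by age group (toy data)"
)
```

The median is used here because activity scores are often skewed. Reporting the mean or the quartiles alongside it would be a reasonable extension.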
How many patients are in remission?
Are there any patients who have been diagnosed with RA for longer than 40 years?
Are there any men who have been diagnosed with RA for longer than 40 years?
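The three counting questions above can be answered with logical conditions. The sketch below uses toy data with hypothetical column names (sex, years_with_ra, cdai_level). The real dataset may store disease duration and sex differently, so check the data dictionary first.

```r
# Toy data, not the real RA dataset.
ra_toy <- data.frame(
  sex = c("Female", "Male", "Female", "Male", "Female"),
  years_with_ra = c(12, 45, 41, 8, NA),
  cdai_level = c("Remission", "Low Activity", "Remission",
                 "High Activity", "Moderate Activity")
)

# Summing a logical vector counts the TRUE values.
n_remission <- sum(ra_toy$cdai_level == "Remission", na.rm = TRUE)

# subset() keeps rows where the condition is TRUE and drops rows where it is NA.
long_ra <- subset(ra_toy, years_with_ra > 40)
long_ra_men <- subset(ra_toy, years_with_ra > 40 & sex == "Male")

n_remission
nrow(long_ra)
nrow(long_ra_men)
```

Rows with a missing duration are silently excluded from both subsets, so it is worth reporting how many values are missing alongside the counts.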
In your groups, come up with at least one question related to RA. This may require you to Google the disease, which is part of the data science process. You should, at the very least, look up what DAS28 measures and how steroids are known to affect RA.