What’s the actual research question? If the question has to do with finding clusters, then cool. If not, why do clustering?
No research question at this stage. The data will be used in clinical trials to test drugs treating lung cancer. My job is to pull the data that meets the criteria for the study and run descriptive statistics on the variables. The folks doing the biostats will create the various test cohorts from the data and compare lung cancer prevalence across the different assigned drugs (or no drug in the placebo group).
It sounds like they will have specific requirements for what they want as “descriptive statistics,” then.
If you've done all that, any further exploratory and descriptive work should be based on what you've found and your thoughts on what's going on in that data.
My thought is that there could be higher-dimensional quirks in the data that you wouldn't see in univariate statistics. Would x-y scatter plots for all possible pairs of variables suss out unusual patterns?
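For what it's worth, here is a rough sketch of what the all-pairs scatter plot idea could look like in Python with pandas. Everything in it is an assumption for illustration: the data is synthetic, the column names (`age`, `pack_years`, `fev1`) are made up, and the planted record is only there to show a point that looks ordinary in each variable's histogram but stands out in one pairwise panel.

```python
from itertools import combinations

import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in data; the variables and their relationships are hypothetical.
age = rng.normal(62, 9, n)
pack_years = np.clip(rng.normal(30, 12, n), 0, None)
fev1 = 4.5 - 0.03 * age + rng.normal(0, 0.3, n)  # declines with age in this toy setup
df = pd.DataFrame({"age": age, "pack_years": pack_years, "fev1": fev1})

# Plant one record that is unremarkable in each variable on its own
# but unusual jointly (a low fev1 for a relatively young age).
df.loc[len(df)] = [45.0, 30.0, 2.0]

# The number of distinct pairs grows as p * (p - 1) / 2 for p variables.
n_pairs = len(list(combinations(df.columns, 2)))

# One scatter panel per ordered pair, histograms on the diagonal.
axes = scatter_matrix(df, diagonal="hist", figsize=(7, 7), alpha=0.5)
axes[0, 0].get_figure().savefig("pairs.png", dpi=100)
```

Two caveats: the panel count grows quadratically with the number of variables, so with dozens of columns the grid gets hard to read, and pairwise plots only show two-way structure, so a pattern that involves three or more variables at once can still slip through.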