User:Chrisftaylor/Project documents/T1KDSs/Questions

List of questions and question sources:
 * Heather on the ASIS&T award.
 * How often is data from repositories used in the published literature?
 * What is the distribution of use across datasets and time?
 * Who reuses data?
 * Are investigators who reuse repository datasets similar to investigators who deposit data?
 * What is data reused for?
 * How similar are studies that reuse data to studies that deposit data?
 * Research questions from Ben Morris.
 * How often is data from repositories used in the published literature? What is the distribution of use across datasets and time?
 * rates of reuse by repository (histograms, ANOVA); PCA of histogram bin values to see which repositories have the most similar citation frequency distributions
 * cumulative citations over time
 * needed: citation dates for each dataset (from Web of Science or Scopus)
 * regression: IV=time, DV=cumulative citations
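A minimal sketch of the cumulative-citation regression noted above, assuming the citation dates for one dataset are already in hand as a list of years. The function names and the example years are hypothetical and illustrative only, not real data. Cumulative counts are monotone and autocorrelated, so a straight-line fit is only a rough summary of the growth rate.

```python
from collections import Counter
from itertools import accumulate


def cumulative_citations(citation_years):
    """Return (years, cumulative counts) for one dataset's citation years."""
    counts = Counter(citation_years)
    # Include years with zero citations so the time axis has no gaps.
    years = list(range(min(counts), max(counts) + 1))
    return years, list(accumulate(counts.get(y, 0) for y in years))


def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical citation years for a single dataset (assumption, not real data).
example_years = [2005, 2006, 2006, 2008, 2008, 2008]
years, cumulative = cumulative_citations(example_years)
slope, intercept = fit_line(years, cumulative)  # IV = time, DV = cumulative citations
```

The slope can be read as citations gained per year; comparing slopes across datasets or repositories would be one way to look at the distribution of use over time.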
 * AUTHORS: Who reuses data? Are investigators who reuse repository datasets similar to investigators who deposit data?
 * things to compare:
 * author department (cat)
 * author country (cat)
 * author institution (cat)
 * chi-squared test: IV = data vs. reuse, DV = (dept, country, institution)
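The chi-squared comparison above could be sketched as follows, assuming author counts have been tabulated with one row per group (depositing authors, reusing authors) and one column per category (here, country). The table values are hypothetical. Only the test statistic and degrees of freedom are computed; a p-value would need a chi-squared distribution from a statistics library.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom.

    `table` is a contingency table given as a list of rows of counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and category were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts (assumption): rows = depositing authors, reusing authors;
# columns = three author countries.
example_table = [[30, 10, 10], [20, 20, 10]]
stat, dof = chi_squared(example_table)
```

The same function would apply to department or institution by swapping the columns, though categories with very small expected counts would probably need to be pooled first.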
 * STUDIES: What is data reused for? How similar are studies that reuse data to studies that deposit data?
 * things to compare:
 * keywords
 * number of authors
 * author institution
 * author country
 * will require keyword data
 * how similar (in multivariate space, by keywords) are citing papers to the papers they cite?
 * figure 2 from proposal, topic cooccurrence network
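One possible way to measure how similar a citing paper is to the paper it cites, by keywords: treat each paper's keyword list as a count vector and take the cosine similarity. This is a sketch under the assumption that keyword data is available per paper; the function name and the example keywords are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(keywords_a, keywords_b):
    """Cosine similarity between two keyword lists treated as count vectors."""
    a, b = Counter(keywords_a), Counter(keywords_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    # Papers with no keywords are treated as having zero similarity.
    return dot / norm if norm else 0.0


# Hypothetical keyword lists (assumption, not real data).
depositing_paper = ["phylogenetics", "birds", "migration"]
reusing_paper = ["birds", "migration", "climate", "meta-analysis"]
similarity = cosine_similarity(depositing_paper, reusing_paper)
```

A value near 1 would mean the reusing study shares most of its keywords with the depositing study; a value near 0 would suggest the data was reused for a different topic.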
 * Questions from CFT...
 * Why are there left-skewed and log-normal classes (with a couple looking transitional) in this figure?
 * What are the age and size and user/usage profiles for the ten databases?
 * Journal Archives = JISC..? Searched for data collection articles only?
 * Why are ICPSR's DB and IQSS' DataVerse combined? Any 'funny business'?