Students Gain Big Data Science (BDs) Skills to Explore Epidemiologic and Neuroimaging Data for Disease Prevention

Roughly 90 percent of the data available today was generated in the last few years alone! Technological, medical, diagnostic and other scientific advances have produced enormous amounts, varieties and sources of complex big data with vast potential for creating new knowledge, particularly in disease prevention and control. However, the newly emerging field of big data science (BDs) faces inherent challenges in putting these data to use and extracting value from them.

Cal State Fullerton is working collaboratively to address these issues. Its Big Data Discovery and Diversity through Research Education Advancement and Partnerships (BD3-REAP) program has partnered with four campus colleges, external institutions and key faculty, including Dr. Archana McEligot, epidemiologist and professor of public health; Dr. Sam Behseta, mathematics professor; Dr. Math Cuajungco, biological sciences professor; and other faculty. Together they provide comprehensive didactic and research opportunities in BDs for CSUF undergraduates, improving student research exposure, training and attitudes toward BDs. BD3 scholars have gained computational skills, including Python, R and MATLAB, as well as an understanding of modern statistical techniques such as principal component analysis and ridge and lasso regression. Importantly, BD3 scholars applied these BDs skills to large epidemiologic and neuroimaging datasets to address vital biomedical questions.
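
For readers unfamiliar with the ridge and lasso regression mentioned above, the following is a minimal Python sketch of how the two penalties differ. It is an illustration only, not code from the BD3-REAP program: the function names, the toy data and the penalty values are all assumptions, and it handles the simplest case of a single centered predictor with no intercept. In that case ridge shrinks the coefficient toward zero, while lasso can set it exactly to zero.

```python
def _sums(x, y):
    """Return (sum of x*y, sum of x*x) for two equal-length sequences."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy, sxx


def ridge_coef(x, y, lam):
    """Minimize 0.5*sum((y - b*x)**2) + 0.5*lam*b**2 over b (one predictor)."""
    sxy, sxx = _sums(x, y)
    return sxy / (sxx + lam)


def lasso_coef(x, y, lam):
    """Minimize 0.5*sum((y - b*x)**2) + lam*abs(b) over b (one predictor).

    The solution is the soft-thresholded least-squares numerator.
    """
    sxy, sxx = _sums(x, y)
    shrunk = max(abs(sxy) - lam, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx


# Hypothetical toy data (not from the article): a centered predictor.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-1.0, -0.5, 0.1, 0.4, 1.0]

ols = ridge_coef(x, y, 0.0)            # no penalty: ordinary least squares
ridge_strong = ridge_coef(x, y, 10.0)  # shrunk toward zero, still nonzero
lasso_mild = lasso_coef(x, y, 1.0)     # shrunk by a fixed amount
lasso_strong = lasso_coef(x, y, 10.0)  # penalty large enough to zero it out
```

With many predictors, this zeroing behavior is what lets lasso act as a variable selector, whereas ridge keeps every predictor but with smaller coefficients.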

Of the 18 BD3 scholars, Alysia Bright, Stephen Gonzalez, Gwen Lind, Mimi Ngo, Cydney Parker, Galilea Patricio and Shaina St. Cruz explored the large, publicly available National Health and Nutrition Examination Survey (NHANES) data, investigating research questions such as the role of folate in depression in diverse populations and the links of physical activity and sedentary behavior to blood pressure and obesity. BD3 scholars also gained an appreciation for handling complex, large datasets, including understanding population sampling surveys, weighting, cleaning, merging and identifying appropriate variables. St. Cruz and Gonzalez examined dietary folate intake in non-Hispanic whites, Asians and Hispanics and found that dietary folate intake is inversely associated with depression, particularly in the Hispanic population, suggesting that increased folate consumption may benefit Hispanics.
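
The merging and weighting steps mentioned above can be sketched in a few lines of Python. This is a hedged illustration, not the scholars' actual workflow: the participant IDs, weights, outcome values and names such as `weighted_mean` are hypothetical and are not real NHANES variables or data. The idea is that each survey participant carries a sampling weight, so estimates should weight each record by how many people in the population it represents.

```python
def weighted_mean(values, weights):
    """Return the weighted mean of values; weights must sum to a positive number."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical demographics file: participant id -> sampling weight.
sample_weights = {1: 1500.0, 2: 500.0, 3: 2000.0}

# Hypothetical examination file: participant id -> outcome (1 = present).
outcomes = {1: 1, 2: 0, 3: 1, 4: 0}

# Merge the two files on participant id, keeping only ids found in both.
shared_ids = sorted(sample_weights.keys() & outcomes.keys())
merged = [(outcomes[i], sample_weights[i]) for i in shared_ids]

values = [outcome for outcome, _ in merged]
weights = [weight for _, weight in merged]

unweighted = sum(values) / len(values)     # treats every record equally
weighted = weighted_mean(values, weights)  # population-representative estimate
```

Here participant 4 drops out of the merge because no weight is available, and the weighted estimate differs from the unweighted one because the records represent different numbers of people. A full survey analysis would also account for the survey's strata and clusters when estimating variances, which this sketch omits.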

Also, in partnership with USC, students used MATLAB and wrote scripts for functional magnetic resonance imaging (fMRI) datasets, identified potential biomarkers for post-traumatic epilepsy, explored Alzheimer's risk factors among Mexican Americans, conducted brain imaging studies with biomarkers in neurodegenerative disorders in Mexican Americans and predicted brain age by combining brain MRI data with deep-learning neural network algorithms.

Of the initial 12 scholars trained (the first two cohorts), all co-authored a peer-reviewed manuscript and/or presented at national or regional meetings. Further, of the six BD3 scholars who applied to graduate school, all were accepted into graduate programs at universities such as University of California, Los Angeles; USC; Dartmouth; Emory; and University of Chicago. In evaluations, two BD3 scholars indicated that the program changed their lives. 
