900.1  Research Electives (non-clinical elective)

Course Director: Shannon Fisher, M.D., Ph.D.

Course Coordinator: Ana Gregory              email:

Period to be offered: to be arranged

M4 Research Elective Information.


900.2 Research Elective: Assessing Utility of Patient Data Sets for Addressing Clinical Research Questions

Department of Internal Medicine (non-clinical elective)

Course Directors:

  • Daniel Chen, MD, Department of Internal Medicine
  • Frank Meng, PhD, Department of Internal Medicine

NUMBER OF STUDENTS: Maximum of 2 students per block


AVAILABLE BLOCKS: July through April



Modern data science and analysis capabilities applied to clinical data sets have played a significant role in enabling many recent advances in medical research.  However, each clinical data set has inherent traits that dictate its utility for answering particular clinical questions.  Characteristics such as the types of data elements collected, frequency of collection, structured vs. unstructured format, consistency, completeness, and accuracy all play a major role in determining how effective the data are for specific research studies.  Quantifying and measuring data set attributes requires a deep understanding of the underlying clinical workflows that generate the data as well as a working knowledge of modern data modeling, capture, and curation practices.

This elective will introduce students to techniques for identifying and analyzing the characteristics of patient data sets and provide hands-on opportunities for students to directly assess a data set’s capacity for addressing various classes of clinical questions.  Students will use patient data extracted directly from the VA healthcare system’s electronic health record (EHR) and secured in a pre-approved repository, as well as other well-known open source data sets.  The VA EHR data include demographics, medications, laboratory values, hospitalizations, surgical procedures, progress notes, and radiology reports.  Examples of open source data include The Cancer Genome Atlas (TCGA) clinical and molecular data and the MIMIC-III critical care database.  Questions previously asked of the clinical data include determining the number of patients with lung cancer identified before and after the national low-dose computed tomography (LDCT) screening recommendation.

Students will receive didactic instruction on the VA’s clinical workflows and on several introductory data science topics, including database systems, relational data modeling, the Structured Query Language (SQL), and standard analysis techniques.  Laboratory sessions provide students with practical mentorship to ensure projects make timely progress.  Students will become familiar with the workflows, processes, and practices involved in data generation that dictate the usefulness of the data.  No previous data science knowledge is required, but fundamental computer skills are necessary.
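As an illustration of the kind of SQL exercise involved, the sketch below counts lung cancer patients first diagnosed before and after a cutoff date, and also reports how many have no recorded diagnosis date (a simple completeness check). It is a minimal sketch under stated assumptions, not the elective's actual material: the `diagnosis` table, its columns, the synthetic rows, and the cutoff date are all hypothetical, and it uses an in-memory SQLite database rather than the VA repository.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a few synthetic rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE diagnosis (patient_id INTEGER, icd_code TEXT, diagnosis_date TEXT)"
    )
    rows = [
        (1, "C34.90", "2012-05-14"),  # lung cancer, before the cutoff
        (2, "C34.90", "2015-03-02"),  # lung cancer, after the cutoff
        (3, "C34.11", "2016-11-20"),  # lung cancer, after the cutoff
        (4, "I10", "2015-07-01"),     # unrelated diagnosis, excluded by the filter
        (5, "C34.90", None),          # lung cancer with a missing date
    ]
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?, ?)", rows)
    return conn


def count_before_after(conn, cutoff):
    """Count patients by earliest lung cancer diagnosis date relative to cutoff."""
    query = """
        SELECT
            SUM(CASE WHEN diagnosis_date < ? THEN 1 ELSE 0 END),
            SUM(CASE WHEN diagnosis_date >= ? THEN 1 ELSE 0 END),
            SUM(CASE WHEN diagnosis_date IS NULL THEN 1 ELSE 0 END)
        FROM (
            -- one row per patient: earliest lung cancer diagnosis (ICD-10 C34.x)
            SELECT patient_id, MIN(diagnosis_date) AS diagnosis_date
            FROM diagnosis
            WHERE icd_code LIKE 'C34%'
            GROUP BY patient_id
        )
    """
    before, after, missing = conn.execute(query, (cutoff, cutoff)).fetchone()
    return {"before": before, "after": after, "missing_date": missing}


if __name__ == "__main__":
    demo = build_demo_db()
    # The cutoff below is an arbitrary placeholder, not the actual recommendation date.
    print(count_before_after(demo, "2014-01-01"))
```

Dates are stored as ISO 8601 strings so that plain string comparison orders them correctly. Reporting the missing-date count alongside the result shows how incomplete data limit what a data set can answer.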

As a prerequisite, students will be required to work with the research mentors to obtain access to clinical data sets approximately 8 weeks before the elective begins. At the beginning of the elective, students will meet with the research mentors to discuss research aims and formulate a preliminary timeline for the elective’s major milestones. The timeline will generally consist of: finalization of the research questions to be addressed and initial access to and familiarization with patient data sets (first week); comprehensive examination of data set characteristics and data analysis toward fulfilling the research aims (second and third weeks); and finalization of project results and generation of documentation (fourth week). The main deliverable of this elective is a final presentation at a research-in-progress meeting of the VA Boston Informatics Group. If sufficient progress has been made, students may also develop an abstract or poster presentation targeted at informatics and/or clinically related conferences.

First-week hours will be met and tracked through student participation and engagement in all five instruction and laboratory sessions. Second- and third-week hours will be met by students working on their research projects and tracked by requiring students to send daily progress updates. In the fourth week, students will spend their time developing and delivering their final presentation and write-up, and hours will be tracked based on the quality of these artifacts.

This elective will be held at the VA Boston Cooperative Studies Coordinating Center (CSPCC) offices at 5 Post Office Square in downtown Boston. The proposed research mentors will be Drs. Daniel Chen and Frank Meng.