Data Science Core Opens for Business

Two years ago, faculty at BUSM, particularly the genomics research community, asked for help with data analysis. Dean Karen Antman, MD, held a data science workshop in February 2020, and the result was a decision to create a service core staffed by faculty scientists to provide the level of analysis and bioinformatics (the use of computational and statistical methods to analyze large biological data sets) that scientists needed to further their research.

The COVID-19 pandemic put that effort on hold until this past September, when the search for data science faculty was reopened.

The result: the Applied Data Science Core (ADSC) opened for business with the launch of its website and its first hire, Assistant Professor of Computational Biomedicine Chao Zhang, PhD. Associate Professor of Computational Biomedicine Ignaty Leshchiner, PhD, will join Zhang this May.

“The goal is to have on campus the analytical capabilities needed to support and advance the biomedical research of the BUSM faculty,” said Andrew W. Taylor, PhD, BUSM associate dean for research. “Bioinformatics and biostatistical programmers know how to handle large data sets.”

Taylor said that a secondary goal was for the ADSC faculty to conduct their own research into their specialties.

“Applying data sciences is foundational for advancing our field, and so, we can’t wait to collaborate with the new core,” said Darrell Kotton, MD, the David C. Seldin Professor of Medicine and director of the BU/BMC Center for Regenerative Medicine (CReM).

The new service complements BU President Robert Brown’s vision of the University becoming a leader in the data science field, realized in the ongoing construction of a new Center for Computing & Data Sciences on Commonwealth Avenue.

“There’s a lot of excitement at BU for data science,” said Nelson Lau, PhD, associate professor of biochemistry and director of BU’s Genome Sciences Institute (GSI).

“In the biomedicine field we are awash in genomic and sequencing data. Even labs that never managed data before find they need to in order to stay current and competitive,” said Lau.

“Anyone doing animal or disease research requires genetic sequencing that can involve gigabytes, even terabytes, of data to map diseases back to the genomes and compare healthy to diseased to get to even a fundamental understanding of what you are working with,” Lau said.

He is already working with Zhang on a project in which they analyze RNA data sequenced from mosquitos collected at field stations in Connecticut to see what other viruses, besides the commonly known ones like West Nile and Eastern Equine Encephalitis, might be present.

While Lau is trained in bioinformatics and his lab already has two bioinformatics specialists, he said he can’t help everyone who needs this specialized data analysis.

“Many PIs (principal investigators) like myself have to take care of our own research first,” said Lau, who worked with Taylor and Antman with a goal of democratizing access.

“Even with centers as successful as the CReM and GSI where we have our own bioinformatics groups, it would be great to see more labs at BUSM have access to this type of analysis too,” said Lau.

Before this new service, BUSM researchers without direct access to data analysis specialists either found another researcher to help them or contracted with an outside company.

“Instead of sending (their data) to an outside company, we have the opportunity to interact with them to get a better result,” said Zhang, who has a doctorate in computer science and a master’s degree in statistics.

“Typically, for-profit companies just do basic analysis for everyone. We are researchers and we will try to help them interpret and understand the data, the deeper background of the whole story,” said Zhang. “The ultimate goal is to use technology on large scale data to solve those complicated questions.”

His specialty is computational biomedicine, the application of computer methodology to help in the diagnosis and treatment of disease. Zhang said one of the applications could be genomics analysis of a tumor sample to find a better treatment specific to that patient.

“A lot of people do data science projects that are data-heavy,” said Vasan Ramachandran, MD, professor of medicine and epidemiology at BUSM and SPH, and principal investigator and director of the Framingham Heart Study (FHS). He compared the new core to a community pool where a lot of people could swim without the expense of building individual pools.

“Having a centralized data core would cater to a larger group of researchers where they could find one-stop expertise. The ADSC would be able to work together with researchers to really meet what has heretofore been an unmet need,” said Ramachandran, who believes there may be as many unfunded research projects underway at BUSM as funded ones.

Being able to do data analysis in-house may also help in getting grants, he said. “They assume you have the infrastructure.”

“I think it will be helpful in applying for grants,” said Lau. “There’s just a higher chance of (the project) succeeding. Instead of begging for help from another institution, you can just go down the hall and talk to someone on the same team. It’s a more personalized experience.”

While it is not yet in the ADSC toolkit, Ramachandran looks forward to computerized data analysis that can combine data from heterogeneous sources. For example, Ramachandran’s RURAL Cohort Study is evaluating the underlying risks in rural areas.

“Why are people in rural areas challenged in terms of their health?” he asked. The answer may lie in a combination of reasons that involve varied data sources, such as genetics, diet, and medical and health records, which can be stored in entirely different formats.

“The underlying architecture of the data is different, befitting the multi-dimensional nature of the disease,” Ramachandran said. The information is siloed, but understanding risk and building health takes a multi-faceted approach, and there isn’t one expert who can coordinate the data, he said.

“The ADSC Core hopes to provide a resource with complementary expertise in different domains,” said Ramachandran.

Taylor, who oversees the running of BUSM cores, said the ADSC will be initially funded by the Dean’s Office.

“As time goes by it should be supported by subcontracts and multiple principal investigator grants,” said Taylor, with the possibility of additional funding coming through data science instruction and student training.