Computational Resources

Biomedical Genetics researchers have access to the Boston University Shared Computing Cluster (SCC) for their compute- and data-intensive research. The SCC is located in Holyoke, MA, at the LEED Platinum certified Massachusetts Green High Performance Computing Center (MGHPCC), where energy is plentiful, clean, and inexpensive. Two pairs of 10 Gigabit Ethernet network connections between the MGHPCC and the BU campus provide extremely fast data transfer between the two locations. The SCC is a heterogeneous Linux cluster composed of both shared and buy-in components. The system currently includes over 2600 shared processors, over 5100 buy-in processors, a combined 244 GPUs, and over two petabytes of storage for research data (approximately 75% of which is buy-in storage). The SCC is suitable for the high-performance, compute- and storage-intensive analyses required for bioinformatics and genomics research. A detailed summary of the SCC resources can be found here. Research Computing installs and maintains an extensive set of bioinformatics, genomic, and statistical modules that support a wide range of processing and analyses.

In addition, for scalable, data-intensive processing of large sequencing and variant files, a Shared Hadoop Cluster that runs Apache Spark is also available. Apache Spark is a fast, general-purpose cluster computing system with high-level APIs in Java, Scala, Python, and R. Several higher-level tools are available to scale analyses: Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and SparkR for statistical analyses. GATK 4, which has been redesigned for Spark, will soon be available for variant discovery analysis in high-throughput sequencing (HTS) data, taking advantage of the Spark distributed computing framework to speed up robust pipelines for genomic research.