To meet the computational needs of Genetics Program research, three Linux clusters are available to researchers. These clusters have been customized with an extensive library of open source and licensed software used in a wide range of statistical genetics and bioinformatics analyses: R, SAS, Bioconductor, MATLAB, GSEA, Merlin, PLINK, Perl and MACH 1.0.
The attached database server runs MySQL 5.1 and stores a number of commonly used reference data sets shared by the various research groups, as well as study-specific data and results. Both clusters are placed behind a secure firewall, and access is granted only from the BU Medical Campus network.
LinGA-II, purchased in 2007 (funded by a Boston University Department of Medicine Infrastructure grant), is a high performance computing Beowulf-class Linux cluster. It comprises compute nodes, management nodes and storage nodes. There are 56 server blade compute nodes, each with two dual-core 2.6 GHz AMD Opteron CPUs and 12 GB of RAM. This brings the total raw computing capacity to 224 cores and 560 GB of RAM. The server blades are distributed over four chassis and communicate via dual-port Gigabit Ethernet connections. Each blade chassis has two 10 Gigabit uplink Ethernet switch modules. The cluster has two management nodes and a backup/proxy server, each with two dual-core 2.6 GHz AMD Opteron CPUs and 8 GB of RAM. 40 TB of redundant disk storage is available to this cluster via a high performance GPFS (General Parallel File System) installation consisting of two storage servers and Fibre Channel RAID array(s). In addition to these traditional HPC cluster components, a database server with high performance dual-controller SAS RAID storage has been added to the cluster. This ensures that all types of data, whether flat files or indexed database tables, reside as close as possible to the compute nodes. Furthermore, all blade chassis, management nodes, storage nodes and the database server are connected via a 10 Gigabit Ethernet switch. This cluster runs the Rocks 4.3 Linux cluster distribution from the San Diego Supercomputer Center.
LinGA-I, purchased in 2004 (funded by NIH NCRR Shared Instrumentation grant 1S10RR163736-01A1), is an IBM compute cluster configured with a head node and 134 compute nodes. Each compute node contains two 2.8 GHz Intel Xeon processors; 110 of the compute nodes have 1 GB of memory and 24 have 2 GB of memory. Each compute node has a 40 GB hard drive. This cluster has access to the high performance GPFS storage on LinGA-II via the CNFS (Clustered NFS) protocol and to the database server. The head node controls access to the compute nodes from the Boston University Medical Center (BUMC) intranet. LinGA-I currently runs the Rocks 5.0 Linux cluster distribution from the San Diego Supercomputer Center.
Buwulf, a 24-node Linux Beowulf cluster, is the primary platform used for statistical analysis. Each compute node contains dual Pentium III 1.2 GHz processors with 512 MB of memory. The IBM-configured system contains 48 processors and permits parallel computation for biostatistical analyses. For storage, a RAID-5 system is used to prevent loss of data in case of disk failure. The system is connected to a free-standing UPS that provides backup power for up to 20 minutes. This interval provides enough time for the server room's backup generator to turn on.
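The aggregate capacity of each cluster follows from the per-node figures quoted above. The short Python sketch below works through that arithmetic. The names `NodeGroup`, `CLUSTERS` and `totals` are illustrative only, and treating each LinGA-I and Buwulf processor as a single core is an assumption, since the text gives processor counts rather than core counts for those systems. Note that 56 nodes with 12 GB each come to 672 GB, which differs from the 560 GB total quoted for LinGA-II; the sketch reports the computed value and does not attempt to decide which figure is correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """A set of identical compute nodes."""
    count: int          # number of nodes in the group
    cpus_per_node: int  # physical processors per node
    cores_per_cpu: int  # cores per processor
    ram_gb: float       # memory per node, in GB


# Per-node figures as quoted in the text.
# cores_per_cpu=1 for LinGA-I and Buwulf is an assumption:
# the text states processor counts only for those clusters.
CLUSTERS = {
    "LinGA-II": [NodeGroup(56, 2, 2, 12)],
    "LinGA-I": [NodeGroup(110, 2, 1, 1), NodeGroup(24, 2, 1, 2)],
    "Buwulf": [NodeGroup(24, 2, 1, 0.5)],
}


def totals(groups):
    """Return (total cores, total RAM in GB) for a list of node groups."""
    cores = sum(g.count * g.cpus_per_node * g.cores_per_cpu for g in groups)
    ram = sum(g.count * g.ram_gb for g in groups)
    return cores, ram


if __name__ == "__main__":
    for name, groups in CLUSTERS.items():
        cores, ram = totals(groups)
        print(f"{name}: {cores} cores, {ram:g} GB RAM")
```

Under these assumptions the core counts agree with the 224 cores stated for LinGA-II and the 48 processors stated for Buwulf.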
Boston University Medical Campus Information Technology provides an automated, centralized data backup service (via CommVault) to protect server file storage systems. Backups are completed each night by the Office of Information Technology with the CommVault backup system, and data are archived off-site weekly. All backups are staged from the backup client machine to the primary backup destination on a high-capacity disk array located on-site in a secured server room. These data are then duplicated to tape media (Ultrium LTO-4). All backup data are encrypted over the network and on all target media using a keyed symmetric block cipher. Secondary tape copies that contain full backup data are sent to a professional off-site tape vault.
The web server hardware is an IBM X345 with dual Pentium Xeon 2.4 GHz processors and 2 GB of memory, running Red Hat Linux Advanced Server with the Apache and Tomcat web software.
The database server is an IBM X345 with dual Pentium Xeon 2.4 GHz processors and 4 GB of memory. This system currently runs Oracle 9i Release 2. The OS is Red Hat Advanced Server, which is optimized for the Oracle Database software.
|Web Server||IBM X345 dual Pentium IV Xeon 2.4 GHz with 2 GB of memory||30 GB (RAID-1)|
|Database Server||IBM X345 dual Pentium IV Xeon 2.4 GHz with 4 GB of memory (Linux Advanced Server)||180 GB (RAID-10)|
|Linux Cluster||Linux Cluster consisting of 24 dual Pentium III compute nodes, 1 management node and 1 storage node||200 GB (RAID-5)|
|LIMS (Molecular Genetics Core Facility)||SUN Enterprise 450 (4 GB Memory) Solaris 4-Processor System||500 GB|
|Genetics Domain||PDC and Backup Domain Controller|
|Network Storage||NetApp FAS250||600 GB|
|DLT Robotic Tape Backup||9-tape robotic tape system||Backs up as much as 720 GB of storage (compressed)|
|Network of Windows Workstations||40 workstations (mixture of Pentium IV and Pentium III systems running Windows 2000 and XP)||4 to 80 GB per workstation|
|Software Category||Software Package|
|Backup Software||Veritas NetBackup Business Server|
|Database Server||Oracle 9i, Sybase 10, SQL Server 2000|
|Linux Cluster||PBS (Portable Batch System), xCAT|
|LIMS (Laboratory Information Management System)||Applied Biosystems SQL/LIMS, WebLIMS|
|Statistical Software||SAS, SPSS, S-Plus, Stata, nQuery, CART|
|Genetic Linkage Analysis Software||Fastlink, Genehunter, Genehunter Plus, Solar, SAGE, Vitesse, Sib-Pair, GAS, Merlin, PedCheck, Simwalk|
|Web Server||Apache, Tomcat|
|OCR Scanning Software||Teleforms|