John Farrell, Ph.D. – IT Manager
John Farrell has led the research computing efforts for the Biomedical Genetics research group since 2000. He has an extensive background in bioinformatics, data management, software development, and systems analysis. During this period he pioneered the use of HPCC and Hadoop clusters for genomic analysis and deployed one of the first HPCCs for genomic analysis, which was ranked in the Top 500 for two consecutive years.
He has led the development of biomedical databases and bioinformatics pipelines for GWAS and NGS for multiple projects. This includes expertise in the development of pipelines for the alignment, genotyping, annotation, and analysis of GWAS, whole-genome, and exome sequencing projects in Alzheimer’s disease, addiction, age-related macular degeneration, idiopathic membranous nephropathy, sickle cell disease, auto-immune diseases, and adverse drug reactions.
His recent interests include the development of a novel pipeline to accurately detect and genotype precise structural variants in 5,000 whole-genome sequencing samples and 10,000 whole-exome sequencing samples for the NIH Alzheimer Disease Sequencing Project (ADSP). His work includes developing highly scalable solutions using Spark and Hadoop for the processing of large volumes of genomic data.
Working with researchers at the BU Center of Excellence in Sickle Cell Disease, he employed a novel approach using both 1000 Genomes data and ENCODE Project data to discover the first functional variant (a 3-bp deletion) related to fetal hemoglobin expression and other blood-related phenotypes. To support the discovery of the molecular basis of autoimmune disease, major infectious diseases, and adverse drug reactions, he developed and validated two bioinformatics methods to predict HLA genotypes from next-generation sequencing data and genome scan data.
For the substance abuse research collaboration between BU and other research groups, he led the development of a 200-page computerized multi-language survey for a multi-center genetics of addiction study using the Python/Django framework. He also developed a web-based data collection system for Alzheimer’s disease research projects. His expertise includes bioinformatics tools such as WDL, BWA, and GATK, and multiple structural variant detection tools such as Strelka, Manta, Scalpel, Lumpy, Delly, and GenomeSTRiP.
His recent interests include the design, testing, and deployment of bioinformatics software tools in Python, Spark, and Hail on the BU Shared Compute Hadoop Cluster for analyzing Big Data such as the UK Biobank data.
-Ph.D. Bioinformatics, Boston University, Boston, MA
-B.A. Biology, College of the Holy Cross, Worcester, MA
Service and Professional Memberships
-Boston University High-Performance Compute Cluster Working Group
-Alzheimer Disease Sequencing Project: Member of QC, Structural Variant, and Case/Control Analysis Work Groups
-Genome in a Bottle Consortium Structural Variant Workgroup
-American Society of Human Genetics
Areas of Research Interest and Selected Publications
A complete list of his bibliography can be found here.
Data Collection and Management
I have led the successful design, development, and deployment of several large-scale multi-center data collection systems to support multiple research projects throughout the world. These include Google Cloud-based data collection systems for two national multi-center pediatric cancer treatment studies in Saudi Arabia. For our addiction studies, I led the development of a 200-page computerized questionnaire (3,000 data items), which has been used to collect over 16,000 interviews during the last 19 years.
For the National Birth Defects Prevention Study, I led the team that implemented the data collection system and tools, which 8 CDC Centers of Excellence have used to collect and analyze over 30,000 interviews over 20 years. For a BU-McNeil clinical trial of the safety of children’s ibuprofen, I led the development of the data collection system (80,000 subjects). For the multi-center Alzheimer Disease project MIRAGE, I developed a web-based data collection system that collected demographic, clinical history, memory testing, and sample tracking data.
High-Performance Compute and Hadoop Clusters
Working with IBM in 2001, I led our deployment of one of the first Linux high-performance compute clusters (HPCC) used for genetic research. The HPCC was ranked in the Top 500 in the world for two consecutive years.
To provide a campus-wide resource for analyzing Big Data, Boston University funded and deployed my proposal for the BU Shared Hadoop Cluster, which was based on a Biomedical Genetics pilot cluster. With this compute resource, I have led the development of Biomedical Genetics QC and analysis pipelines using Spark and Hail that scale to UK Biobank-sized datasets.
1. Genetics of Immune-Related Diseases
The HLA region is associated with hundreds of diseases, including diseases of the immune system and neurological diseases. To advance research in this area, I developed tools to impute and call HLA genotypes from GWAS data and next-generation sequencing data. As a result of this work, we were able to pinpoint the HLA allele associated with adverse reactions to carbamazepine. We are now using these tools to uncover the role that HLA may have in Alzheimer Disease within the ADSP and ADGC datasets.
a. McCormack M, Alfirevic A, Bourgeois S, Farrell JJ, Kasperavičiūtė D, Carrington M, Sills GJ, Marson T, Jia X, de Bakker PI, Chinthapalli K, Molokhia M, Johnson MR, O’Connor GD, Chaila E, Alhusaini S, Shianna KV, Radtke RA, Heinzen EL, Walley N, Pandolfo M, Pichler W, Park BK, Depondt C, Sisodiya SM, Goldstein DB, Deloukas P, Delanty N, Cavalleri GL, Pirmohamed M. (2011). HLA-A*3101 and carbamazepine-induced hypersensitivity reactions in Europeans. N Engl J Med 364(12): 1134-1143. PMC3113609.
2. Genetics of Alzheimer’s Disease (AD)
I have led the development of the web data collection systems, data management, and bioinformatics pipelines for Alzheimer research in the Biomedical Genetics Program. My work includes structural variant pipeline development for the ADSP. I am a co-author on multiple papers in this field.
a. Bis et al (2018) Whole exome sequencing study identifies novel rare and common Alzheimer’s-Associated variants involved in immune response and transcriptional regulation. Mol Psychiatry.
b. Xiaoling Zhang et al, A rare missense variant of CASP7 is associated with familial late-onset Alzheimer’s disease. Alzheimer’s and Dementia, November 2018
c. Logue MW et al (2018) Targeted Sequencing of Alzheimer Disease Genes in African Americans Implicates Novel Risk Variants. Front Neurosci. 2018 Aug 27;12:592, PMC6119822.
d. Logue M.W, et al. (2014). “Two rare AKAP9 variants are associated with Alzheimer’s disease in African Americans”. Alzheimers Dement. 10(6):609-618. PMC4253055.
3. Genetics of Thalassemia
I have been contributing to research on the genetics of thalassemia for over a dozen years. A focus of research in this area is to find variants that influence fetal hemoglobin, higher levels of which are known to ameliorate symptoms of thalassemia. I developed a novel approach integrating data from the 1000 Genomes Project, the ENCODE Project, GWAS association results, and transcription factor binding site databases to pinpoint an intergenic functional variant influencing fetal hemoglobin. I am a co-author on multiple papers in this field.
a. Farrell, J. J., et al. (2011). “A 3-bp deletion in the HBS1L-MYB intergenic region on chromosome 6q23 is associated with HbF expression.” Blood 117(18): 4935-4945. PMC3100700.
b. Morrison et al. (2018). “A long noncoding RNA from the HBS1L-MYB intergenic region on chr6q23 regulates human fetal hemoglobin expression.” Blood Cells, Molecules, and Diseases 69:1-9. PMC5783741.
c. Jiang, Z., Luo, H.-y., Huang, S., Farrell, J. J., Davis, L., Théberge, R., Benson, K. A., Riolueang, S., Viprakasit, V., Al-Allawi, N. A.S., Ünal, S., Gümrük, F., Akar, N., Başak, A. N., Osorio, L., Badens, C., Pissard, S., Joly, P., Campbell, A. D., Gallagher, P. G., Steinberg, M. H., Forget, B.G. and Chui, D. H.K. (2016), The genetic basis of asymptomatic codon 8 frame-shift (HBB:c25_26delAA) β0-thalassaemia homozygotes. British Journal of Haematology, 172: 958–965.
d. Gibney, G. T., et al. (2008). “Variation and heritability of Hb F and F-cells among beta-thalassemia heterozygotes in Hong Kong.” Am J Hematol 83(6): 458-464.
4. Genetics of Sickle Cell Disease
Using GWAS and WGS, our research group has completed extensive research in African American and Saudi Arabian populations with sickle cell disease to help uncover the functional variants that influence fetal hemoglobin expression and ameliorate the symptoms of sickle cell disease. Among my co-authored papers are the following:
a. Shaikho et al, Variants of ZBTB7A (LRF) and its β-globin gene cluster binding motifs in sickle cell anemia. Blood Cells Mol Dis. 2016 Jul;59:49-51
b. Elmutaz M. Shaikho et al (2017) A phased SNP-based classification of sickle cell anemia HBB haplotypes, BMC Genomics 2017 18:608, PMC5553663.
c. P. Sebastiani, J.J. Farrell, A. Alsultan, S. Wang, H.L. Edward, H. Shappell, H. Bae, J.N. Milton, C.T. Baldwin, A.M. Al-Rubaish, Z. Naserullah, F. Al-Muhanna, A. Alsuliman, P.K. Patra, L.A. Farrer, D. Ngo, V. Vathipadiekal, D.H.K. Chui, A.K. Al-Ali, M.H. Steinberg, “BCL11A enhancer haplotypes and fetal hemoglobin in sickle cell anemia”, Blood Cells, Molecules, and Diseases Volume 54, Issue 3, March 2015, Pages 224–230
d. Alsultan et al, Fetal hemoglobin in sickle cell anemia: Saudi patients from the Southwestern province have similar HBB haplotypes but higher HbF levels than African Americans, American Journal of Hematology, 21 March 2011
5. Structural Variants
a. Zook, J.M., Hansen, N.F., Olson, N.D. et al. A robust benchmark for detection of germline large deletions and insertions. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0538-8
Other Selected Publications
-Mark McCormack, B.A., Ana Alfirevic, M.D., Ph.D., Stephane Bourgeois, Ph.D., John J. Farrell, M.S., Dalia Kasperavičiūtė, Ph.D., Mary Carrington, Ph.D., Graeme J. Sills, Ph.D., Tony Marson, M.B., Ch.B., M.D., Xiaoming Jia, M.Eng., Paul I.W. de Bakker, Ph.D., Krishna Chinthapalli, M.B., B.S., Mariam Molokhia, M.B., Ch.B., Ph.D., Michael R. Johnson, D.Phil., Gerard D. O’Connor, M.R.C.P.I., Elijah Chaila, M.R.C.P.I., Saud Alhusaini, M.B., Kevin V. Shianna, Ph.D., Rodney A. Radtke, M.D., Erin L. Heinzen, Ph.D., Nicole Walley, B.S., Massimo Pandolfo, M.D., Ph.D., Werner Pichler, M.D., B. Kevin Park, Ph.D., Chantal Depondt, M.D., Ph.D., Sanjay M. Sisodiya, M.D., Ph.D., David B. Goldstein, Ph.D., Panos Deloukas, Ph.D., Norman Delanty, B.M., Gianpiero L. Cavalleri, Ph.D., and Munir Pirmohamed, Ph.D., F.R.C.P. HLA-A*3101 and Carbamazepine-Induced Hypersensitivity Reactions in Europeans, N Engl J Med 2011; 364:1134-1143, March 24, 2011
-John J. Farrell, Richard M. Sherva, Zhi-yi Chen, Hong-yuan Luo, Benjamin F. Chu, Shau Yin Ha, Chi Kong Li, Anselm C.W. Lee, Rever C.H. Li, Chi Keung Li, Hui Leung Yuen, Jason C.C. So, Edmond S.K. Ma, Li Chong Chan, Vivian Chan, Paola Sebastiani, Lindsay A. Farrer, Clinton T. Baldwin, Martin H. Steinberg, and David H.K. Chui. A 3-bp deletion in the HBS1L-MYB intergenic region on chromosome 6q23 is associated with HbF expression, Blood First Edition Paper, prepublished online March 8, 2011; DOI 10.1182/blood-2010-11-317081
-Gibney GT, Panhuysen CI, So JC, Ma ES, Ha SY, Li CK, Lee AC, Li CK, Yuen HL, Lau YL, Johnson DM, Farrell JJ, Bisbee AB, Farrer LA, Steinberg MH, Chan LC, Chui DH. Variation and heritability of Hb F and F-cells among beta-thalassemia heterozygotes in Hong Kong. Am J Hematol. 2008 Jun;83(6):458-64.
-Ma Q, Wyszynski DF, Farrell JJ, Kutlar A, Farrer LA, Baldwin CT, Steinberg MH. Fetal hemoglobin in sickle cell anemia: genetic determinants of response to hydroxyurea. Pharmacogenomics J. 2007 Dec;7(6):386-94. Epub 2007 Feb 13.
-Adewoye AH, Nolan VG, Ma Q, Baldwin C, Wyszynski DF, Farrell JJ, Farrer LA, Steinberg MH. Association of polymorphisms of IGF1R and genes in the transforming growth factor-beta /bone morphogenetic protein pathway with bacteremia in sickle cell anemia. Clin Infect Dis. 2006 Sep 1;43(5):593-8. Epub 2006 Jul 14.
-Nolan VG, Adewoye A, Baldwin C, Wang L, Ma Q, Wyszynski DF, Farrell JJ, Sebastiani P, Farrer LA, Steinberg MH. Sickle cell leg ulcers: associations with haemolysis and SNPs in Klotho, TEK and genes of the TGF-beta/BMP pathway. Br J Haematol. 2006 Jun;133(5):570-8.
-Nolan VG, Baldwin C, Ma Q, Wyszynski DF, Amirault Y, Farrell JJ, Bisbee A, Embury SH, Farrer LA, Steinberg MH. Association of single nucleotide polymorphisms in klotho with priapism in sickle cell anaemia. Br J Haematol. 2005 Jan;128(2):266-72.
-Yip A, Ma Q, Wilcox M, Panhuysen CI, Farrell J, Farrer LA, Wyszynski DF. Search for genetic factors predisposing to atherogenic dyslipidemia. BMC Genetics 2003, 4(Suppl 1):S100 (1 December 2003)
-Wilcox MA, Wyszynski DF, Panhuysen CI, Ma Q, Yip A, Farrell J, Farrer LA. Empirically derived phenotypic subgroups – qualitative and quantitative trait analyses. BMC Genetics 2003, 4(Suppl 1):S15 (1 December 2003)
-Web-Based Medication Reference Database for Researchers, J. J. Farrell, AMIA 1996 Annual Symposium.
-The Drug Etiology of Agranulocytosis and Aplastic Anemia, Kaufman, Kelly, Levy, Shapiro, and Study Group, 1991.
-A Data Collection And Coding System On A Portable Computer For Epidemiological Studies, J. J. Farrell, 11th Symposium on Computer Applications in Medical Care, 1987.
-Identification Of Antibiotic-Inactivating Enzymes By Stepwise Discriminant Analysis of Susceptibility Test Results, R. L. Kent, T. F. O’Brien, A. A. Medeiros, J. J. Farrell, and M.A. Guzman, Current Chemotherapy and Infectious Disease, Proceedings of the 11th ICC and 19th ICAAC, American Society for Microbiology, 1980.