Solaiappan Manimaran (Mani)
B.S., Mathematics, Indian Institute of Technology, Kharagpur, India, 1995
M.E., Computer Science & Engineering, Indian Institute of Science, Bangalore, India, 1999
M.B.A., Finance & Marketing, University of Connecticut, Connecticut, 2012
M.A., Biostatistics, Boston University, Boston, Massachusetts, 2014
Email: mani2012 at bu.edu
Machine learning and statistical analysis of next-generation sequencing data, with applications to cancer biomarkers and to microbiome analysis of pathogen proportions in mixed clinical samples, toward improved diagnostics and personalized medicine.
PathoScope 2.0: Statistical and computational methods for accurate characterization of microbes in sequencing samples
The rapid identification and quantification of pathogens present in a clinical sample is of high importance in controlling contagious diseases during an outbreak. For example, during the European E. coli outbreak of 2011, there was a 3-week delay in the correct identification of the pathogen strain O104:H4, which caused 3,800 infections and 54 deaths. PathoScope 2.0 is a complete software package for rapidly identifying and quantifying the microbial strains present in environmental or clinical sequencing samples. PathoScope uses a Bayesian statistical methodology based on a penalized mixture modeling approach to accurately identify and quantify the pathogens. We also present a confidence region for the identified pathogens so that an accurate diagnosis and the best possible treatment can be provided. We simulated sequencing reads from 25 strains of bacteria that are commonly found in humans. Our method accurately identified and quantified the pathogen strain both in pure samples with a single strain and in mixture samples with multiple strains of bacteria. It performed well even with low read coverage and in samples with multiple closely related strains.
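To illustrate the general idea behind mixture-model read reassignment, the following is a minimal sketch of a plain expectation-maximization (EM) loop that estimates genome proportions from reads that align to more than one genome. It is a simplified, unpenalized illustration, not PathoScope's actual algorithm: the function name `em_read_reassignment`, the score matrix layout, and the toy data are all assumptions made for this example.

```python
import numpy as np

def em_read_reassignment(scores, n_iter=100, tol=1e-8):
    """Estimate genome proportions from ambiguous read alignments (hypothetical helper).

    scores[i, j] is a non-negative alignment weight for read i against genome j
    (0 when the read does not align to that genome). Every read is assumed to
    have at least one positive score.
    """
    scores = np.asarray(scores, dtype=float)
    n_genomes = scores.shape[1]
    pi = np.full(n_genomes, 1.0 / n_genomes)  # start from uniform proportions
    resp = None
    for _ in range(n_iter):
        # E-step: posterior probability that read i originated from genome j
        weighted = scores * pi
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new proportions are the average responsibilities
        new_pi = resp.mean(axis=0)
        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break
    return pi, resp

# Toy data (made up): two reads align only to genome 0,
# two reads align equally well to genomes 0 and 1.
toy_scores = [[1, 0], [1, 0], [1, 1], [1, 1]]
proportions, responsibilities = em_read_reassignment(toy_scores)
```

In this toy example the ambiguous reads are progressively reassigned to the genome that also has uniquely aligned reads, so the estimated proportion of the second genome shrinks toward zero. A penalized or Bayesian variant would add prior terms to the M-step; that part is omitted here.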
PathoScope is available for download at the following link:
BatchQC: interactive software for evaluating sample and batch effects in genomic data
Sequencing and microarray samples are often collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment is needed and how correction should be applied before proceeding with a downstream analysis. BatchQC can also apply existing adjustment tools and allows users to evaluate their benefits interactively. We used the BatchQC pipeline on both simulated and real data to demonstrate the effectiveness of this software toolkit.
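As a rough illustration of one kind of batch diagnostic, the sketch below computes, for each gene, the fraction of its total variance that is explained by batch membership (the between-batch sum of squares divided by the total sum of squares). This is a generic one-way ANOVA style calculation written for this example, not BatchQC's implementation; the function name `batch_variance_explained` and the toy matrix are assumptions.

```python
import numpy as np

def batch_variance_explained(expr, batch):
    """Fraction of each gene's variance explained by batch (hypothetical helper).

    expr  : genes x samples matrix of expression values
    batch : one batch label per sample (column)
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    grand_mean = expr.mean(axis=1)
    total_ss = ((expr - grand_mean[:, None]) ** 2).sum(axis=1)
    between_ss = np.zeros(expr.shape[0])
    for label in np.unique(batch):
        cols = expr[:, batch == label]
        # Each batch contributes n_batch * (batch mean - grand mean)^2
        between_ss += cols.shape[1] * (cols.mean(axis=1) - grand_mean) ** 2
    # Genes with zero total variance are reported as 0 rather than NaN
    return np.divide(between_ss, total_ss,
                     out=np.zeros_like(between_ss), where=total_ss > 0)

# Toy data (made up): gene 0 differs only between batches,
# gene 1 varies only within batches.
toy_expr = [[1, 1, 5, 5],
            [1, 2, 1, 2]]
toy_batch = [0, 0, 1, 1]
fractions = batch_variance_explained(toy_expr, toy_batch)
```

Values near 1 suggest that a gene's variation is dominated by batch, which is the kind of signal that would motivate a batch adjustment before downstream analysis.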
PathoStat: A Comprehensive Toolkit for Microbiome Variation Analysis
The microbiome varies significantly between different environments. Studying the variation of microbiomes under different conditions within an organism or environment is key to diagnosing diseases and providing personalized treatments. We have developed a software module called PathoStat for statistical quantification of metagenomics samples, with metrics such as alpha and beta diversity to characterize microbial variation across conditions of interest. PathoStat is developed as an R package with a Shiny app that provides an interactive GUI for easy navigation. It offers a rich set of visualizations: relative abundance stacked bar plots that display the composition and diversity of the samples both within and across covariates of interest, heatmaps, alpha and beta diversity plots, Principal Component Analysis (PCA) plots, Principal Coordinate Analysis (PCoA) plots, and multi-dimensional confidence region plots. PathoStat also has a module dedicated to differential expression analysis using standard statistical methods such as limma. We have used PathoStat on a diet study to characterize the microbial differences across diets from RNA-Seq data collected from the fecal matter of subjects.
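For readers unfamiliar with the diversity metrics mentioned above, here is a minimal sketch of one common alpha diversity measure (the Shannon index, computed within a sample) and one common beta diversity measure (Bray-Curtis dissimilarity, computed between two samples). The choice of these two metrics, the function names, and the toy counts are assumptions for illustration; PathoStat itself is an R package and may compute these differently.

```python
import numpy as np

def shannon_index(counts):
    """Alpha diversity: Shannon index (natural log) of one sample's taxon counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()  # drop absent taxa, convert to proportions
    return float(-(p * np.log(p)).sum())

def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples' taxon counts."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.abs(a - b).sum() / (a + b).sum())

# Toy counts (made up) for three taxa in two samples
sample_x = [10, 10, 0]
sample_y = [0, 10, 10]
alpha_x = shannon_index(sample_x)
beta_xy = bray_curtis(sample_x, sample_y)
```

The Shannon index is 0 for a sample dominated by a single taxon and grows as counts spread more evenly over more taxa. Bray-Curtis ranges from 0 for identical samples to 1 for samples that share no taxa.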
Manimaran, Selby, Okrah, Ruberman, Leek, Quackenbush, Kains, Bravo, Johnson, “BatchQC: Interactive software for evaluating sample and batch effects in genomic data” (2016: Oxford Bioinformatics, Applications Note) http://bioinformatics.oxfordjournals.org/content/early/2016/08/30/bioinformatics.btw538
Hong*, Manimaran*, Shen, Perez-Rogers, Byrd, Castro-Nallar, Crandall, Johnson, “PathoScope 2.0: a complete computational framework for strain identification in environmental or clinical sequencing samples” (2014: Microbiome 2 (1), 1-15) *Co-First author http://www.microbiomejournal.com/content/2/1/33
Francis, Bendall, Manimaran, Hong, Clement, Castro-Nallar, Snell, Schaalje, Clement, Crandall, and Johnson, “Pathoscope: Species identification and strain attribution with unassembled sequencing data” (2013: Genome Research 23 (10), 1721-1729) http://genome.cshlp.org/content/23/10/1721
Byrd, Perez-Rogers, Manimaran, Castro-Nallar, Toma, McCaffrey, Siegel, Benson, Crandall, Johnson, “Clinical PathoScope: rapid alignment and filtration for accurate pathogen identification in clinical samples using unassembled sequencing data” (2014: BMC Bioinformatics 15 (1), 262)
Hong, Manimaran, Johnson, “PathoQC: Computationally Efficient Read Preprocessing and Quality Control for High-Throughput Sequencing Data Sets” (2015: Cancer Informatics 2014:Suppl. 1 167-176) http://la-press.com/article.php?article_id=4828
Castro-Nallar, Shen, Freishtat, Perez-Losada, Manimaran, Liu, Johnson and Crandall, “Integrating microbial and host transcriptomics to characterize asthma-associated microbial communities” (2015: BMC Medical Genomics) http://www.biomedcentral.com/1755-8794/8/50