Computational Genomic Models of Environmental & Chemical Carcinogenicity

INITIATION DATE:

01.01.2013

ARC DIRECTORS AND CO-DIRECTORS:

Stefano Monti, PhD, Professor of Computational Biology and Genomics

David Sherr, PhD, Professor of Environmental Health; Professor of Pathology and Laboratory Medicine

OVERVIEW OF GOALS AND MISSION:

The goal of the project is the development of accurate and cost-effective methods for the identification of threats to our health from exposure to chemical and environmental carcinogens. We propose the development of genomic models of carcinogenicity for cancer prevention and tailored treatment, based on a highly innovative approach integrating computational models and high-throughput, low-cost gene expression-based in-vitro assays. The proposed approach will allow us to predict with unprecedented speed the carcinogenic potential of individual or complex mixtures of chemical pollutants and/or therapeutics, with potentially far-reaching implications for preventive medicine and patients’ treatment stratification. Study outcomes will also facilitate prediction of the molecular targets and mechanisms of action of chemicals and therapeutics, and will provide a practical and efficient method of exposure and risk assessment. Work performed in the first year strongly supports the validity of both the technology used to rapidly generate thousands of genomic signatures and the computational models required to process dense datasets. This unique platform will be further exploited to define genomic signatures associated with environmental chemical-induced carcinogenesis in liver, lung, and breast tissue and to generate signatures associated with environmental chemical-induced tumor metastasis and “stem-like cell” induction. A more generalized application of the technology will involve the identification of transcriptional changes associated with exposure to environmental obesogens.

ARC Members:

Name of ARC member	Departmental and School Affiliation	Core Faculty? (denoted with *)
Stefano Monti (dir)	Computational Biomedicine/Medicine	*
David Sherr (dir)	School of Public Health	*
George Murphy	CReM/School of Medicine	*
Susan Jick	School of Public Health	*
Jennifer Schlezinger	School of Public Health	*
Avi Spira	Computational Biomedicine/Medicine	*
Catalina Perdomo	Computational Biomedicine/Medicine	*
Marc Lenburg	Computational Biomedicine/Medicine
David Center	CTSA/Medicine
Paola Sebastiani	Biostatistics/School of Public Health
Evan Johnson	Computational Biomedicine/Medicine
Maria Kukuruzinska	Molecular and Cell Biology/GSDM

Research Strategy:

The goal of the project is the development of accurate and cost-effective novel methods for the identification of threats to our health from exposure to chemical and environmental carcinogens. We propose the development of genomic models of carcinogenicity for cancer prevention and tailored treatment, based on a highly innovative approach integrating computational models and high-throughput, low-cost gene expression-based in-vitro assays. The proposed approach will allow us to predict with unprecedented speed the carcinogenic potential of individual or complex mixtures of chemical pollutants and/or therapeutics, with potentially far-reaching implications for preventive medicine and patients’ treatment stratification. Study outcomes will also facilitate prediction of the molecular targets and the mechanisms of action of chemical compounds and therapeutics, and will provide for a practical and efficient method of exposure and risk assessment. This unique technology will also be exploited to define environmental chemical effects on the breast cancer transcriptome associated with tumor metastasis and on adipogenesis programs in human bone marrow.

Our driving hypothesis — supported by preliminary data — is that the biological effects of a chemical compound, or a mixture of compounds, can be inferred from the expression profile of cell lines treated with that compound or mixture.

Our approach crucially relies on a novel high-throughput, low-cost gene expression assay measuring ~1,000 transcripts/sample (Luminex-1000, hereafter L1000), which will allow for the efficient profiling of a set of minimally immortalized normal human cell lines treated with 1,000s of compounds with known carcinogenic status. Computational analysis will be used to evaluate the generated exposure database and to develop predictive models of biologic activity, especially carcinogenicity, and exposure (Figure 1). In a preliminary study we used a rat-based expression dataset, the DrugMatrix, available through the National Institute of Environmental Health Sciences (NIEHS), to evaluate the applicability of toxicogenomics to model carcinogenicity¹. We applied advanced machine learning approaches to build a classifier that predicts (accuracy up to 78%) the carcinogenic potential of chemical compounds within the dataset. Of note, these results were obtained from a training set comprising only 110 compounds, and our simulations showed that predictive accuracy would be greatly improved by increasing the number of compounds profiled, a goal that can be readily achieved with our high-throughput platform. Indeed, using the BU high-throughput core, we have now generated preliminary data using 45 known carcinogens, 45 non-carcinogens, human breast cancer cells as chemical targets, and the L1000 platform to generate genomic outcomes. While the data are still being analyzed, they strongly support the feasibility of our platform and our central hypothesis.
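
The kind of evaluation described above, training a classifier on expression profiles of compounds with known carcinogenic status and estimating its accuracy, can be illustrated with a minimal sketch. The proposal does not specify which machine learning methods were used, so the nearest-centroid classifier and leave-one-out cross-validation below are stand-ins chosen for brevity. The data are synthetic, and the class sizes, gene counts, effect size, and all function names are assumptions made for illustration only.

```python
import random

def centroid(profiles):
    """Per-gene mean of a list of expression profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(train_x, train_y, query):
    """Assign the query profile to the class with the nearest centroid."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = centroid(members)
    return min(centroids, key=lambda label: sq_dist(centroids[label], query))

def loo_accuracy(xs, ys):
    """Leave-one-out cross-validation: hold out each compound in turn."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)

def make_synthetic(n_per_class=45, n_genes=50, shift=1.0, seed=0):
    """Synthetic profiles (assumption): only the first 10 genes carry signal."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, mu in (("carcinogen", shift), ("non-carcinogen", 0.0)):
        for _ in range(n_per_class):
            xs.append([rng.gauss(mu if g < 10 else 0.0, 1.0)
                       for g in range(n_genes)])
            ys.append(label)
    return xs, ys

xs, ys = make_synthetic()
accuracy = loo_accuracy(xs, ys)
print(f"leave-one-out accuracy on synthetic data: {accuracy:.2f}")
```

Holding out each compound before computing the centroids keeps the accuracy estimate from being inflated by the profile under test. The number printed here reflects only the synthetic signal strength and says nothing about real carcinogenicity data.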

Our proposal finds motivation in the persistence of high cancer incidence (~41%) and mortality (~21%), and in the substantial body of evidence suggesting a minor contribution of inherited genetic factors, and a significant if not dominant role of environmental chemicals, in causing sporadic cancer^2–4. It is telling that one of the major determinants of the recent drop in overall cancer deaths has been the decline in smoking-related lung cancers, mainly brought about by preventive measures informed by the study of carcinogens present in tobacco smoke⁵. Given the burgeoning number of chemicals present in our environment (over 80,000 registered by the EPA and totaling in the billions of pounds), our studies represent a much-needed, practical approach to determining whether commercial products, or mixtures thereof, are carcinogenic or otherwise biologically active before they go on the market. Our studies will also be applicable to the assessment of new therapeutics and food additives.

The significance of the project lies in the innovative integration of cutting-edge in-vitro technologies and in-silico methods to address the recognized need for the development of “high-throughput screening technologies and related data interpretation models” to advance “precautionary, prevention-oriented” approaches to the study of chemical carcinogenesis (Presidential Cancer Panel 2010)⁶.

The innovation of the proposal lies in the use of a high-dimensional assay (L1000)^7,8 that allows for the genome-wide characterization of the transcriptional response to chemical exposure at a cost unmatched by other technologies (~$8/sample, instead of ~$400), paired with an innovative use of minimally immortalized cell lines and computational genomic approaches for the analysis and interpretation of the screens’ results. The high-throughput format makes it possible to begin to evaluate the effects of environmental chemical mixtures, a primary area of concern to the NIH.

The deliverables will include: 1) a well-annotated Carcinogenome DB (CGDB), comprising 100,000s of expression profiles of multiple cell types treated with 1,000s of compounds and mixtures at multiple doses and time points; 2) carefully validated prediction models for the accurate assessment of the carcinogenicity of as-yet uncharacterized compounds and mixtures; 3) well-annotated signatures of carcinogenicity (SoC) and associated pathways playing a role in carcinogenesis; and 4) the high-throughput screening and computational infrastructure for the large-scale processing and characterization of new compounds and mixtures.

The project brings together an interdisciplinary team of investigators, integrating computational cancer genomics (Monti), and molecular biology and environmental science (Sherr), toward the advancement of public health and preventive medicine. The project also engages investigators outside BU, by tapping into the high-throughput screening infrastructure at the Broad Institute and by involving NIEHS and National Toxicology Program (NTP) scientists in the selection and annotation of the chemical compounds to be included in the study. Finally, this initiative will serve as an umbrella project to foster and catalyze synergistic interactions among multiple investigators, as we briefly outline below.

Synergies

iPSC-based screening. In collaboration with George Murphy, we will investigate the use of human iPSCs to evaluate gene/environment interactions in a high throughput format, toward the development of personalized profiles of exposure. The use of iPSCs will also allow us to model the effects of early exposure, and to study the different developmental windows of susceptibility to contaminants and drug exposure.

Smoking and Lung Cancer. In collaboration with the Spira lab, we will profile lung cell lines exposed to tobacco smoke components to fully characterize their signatures and to compare them to those of other exposure types. Additionally, the CGDB will be queried to assess the similarity between externally defined signatures of exposure to tobacco alternatives (such as e-cigarettes, nicotine lozenges, etc.) and our validated signatures of carcinogenicity and genotoxicity. Externally defined signatures from bronchial epithelial cells at different stages of differentiation exposed to whole tobacco or its constituents will similarly be used to query the CGDB.
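
Querying the CGDB with an externally defined signature could work roughly as sketched below. The proposal does not describe the scoring method, so the score used here (mean expression change of the signature's up-regulated genes minus that of its down-regulated genes) is a deliberately simple stand-in, not the project's method. The database contents, compound names, gene names, and function names are all hypothetical.

```python
def connectivity_score(profile, up_genes, down_genes):
    """Mean change of 'up' genes minus mean change of 'down' genes.

    `profile` maps gene name -> expression change (e.g. a z-score).
    Genes missing from the profile are skipped.
    """
    up = [profile[g] for g in up_genes if g in profile]
    down = [profile[g] for g in down_genes if g in profile]
    if not up or not down:
        return 0.0
    return sum(up) / len(up) - sum(down) / len(down)

def query_database(database, up_genes, down_genes):
    """Rank database profiles from most similar to most opposed."""
    scored = [(name, connectivity_score(profile, up_genes, down_genes))
              for name, profile in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy database with made-up values (assumption, for illustration only).
toy_db = {
    "compound_A": {"G1": 2.5, "G2": 1.8, "G3": -0.2, "G4": -1.1},
    "compound_B": {"G1": -0.3, "G2": 0.1, "G3": 0.4, "G4": 0.2},
    "compound_C": {"G1": -1.9, "G2": -2.2, "G3": 1.0, "G4": 1.5},
}

# Hypothetical external signature: G1 and G2 up, G3 and G4 down.
ranking = query_database(toy_db, up_genes=["G1", "G2"], down_genes=["G3", "G4"])
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

In this toy example a strongly positive score marks a profile that resembles the query signature, while a strongly negative score marks one that opposes it.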

Drug efficacy and repurposing. In collaboration with David Center, we will utilize our high-throughput screening capabilities for the generation and profiling of cell lines exposed to therapeutics. The profiles generated, integrated with the Carcinogenome DB, will allow us to characterize the response signatures of the drugs profiled, to evaluate their carcinogenic potential, as well as their capacity to “revert” carcinogenicity signatures.

Environmental Obesogens and Cancer. Another key contributor to our ARC, Jennifer Schlezinger (BU SPH), will assess the ability of environmental obesogens to alter genomic signatures of human bone marrow mesenchymal stem cells. Working with the National Toxicology Program within the NIEHS and their Tox21 initiative, Dr. Schlezinger has demonstrated that bipotential bone marrow mesenchymal stem cells are skewed towards adipogenesis at the expense of osteogenesis following exposure to common environmental chemicals including potential carcinogens. This shift compromises the development of hematopoietic cells which depend on osteoblast-derived stromal cells for growth and differentiation support. Here, she will exploit the L1000 platform to assess the spectrum of transcriptional effects of these chemicals on human bone-derived stromal cells with the goal of identifying genomic signatures of environmental obesogens. Particular emphasis will be placed on environmental PPAR chemicals that also show carcinogenic potential.

Breast Cancer Exposome. Dr. Sherr’s laboratory has demonstrated that several classes of environmental carcinogens, including dioxins, PCBs and aromatic hydrocarbons associated with breast cancer risk, aberrantly activate a cellular stress sensor, the aryl hydrocarbon receptor (AhR). His lab also demonstrated that hyper-activated AhR drives breast cancer cell invasion and enhances breast cancer stem cell survival and/or growth. Thus, environmental exposures are postulated to play a role in both induction and progression of cancer. Using multiplexed qPCR to complement the L1000 platform, Dr. Sherr will assess the signaling pathways through which the AhR enhances metastasis and cancer stem cell survival and determine the extent to which a large number of individual environmental chemicals, or complex mixtures thereof, exploit these same pathways to drive breast cancer progression. These studies are expected to reveal molecular mechanisms of AhR signaling while determining genomic signatures that predict environmental AhR ligands capable of initiating and/or exacerbating breast cancer.

Bibliography

1. David, A. R. & Zimmerman, M. R. Cancer: an old disease, a new disease or something in between? Nat Rev Cancer 10, 728–733 (2010).

2. Lee Davis, D. et al. The need to develop centers for environmental oncology. Biomed. Pharmacother. 61, 614–622 (2007).

3. Sorensen, T. I. A., Nielsen, G. G., Andersen, P. K. & Teasdale, T. W. Genetic and Environmental Influences on Premature Death in Adult Adoptees. N. Engl. J. Med. 318, 727–732 (1988).

4. Leffall, L. D. & Kripke, M. L. President’s Cancer Panel: Reducing Environmental Cancer Risk. 2008-2009 Annu. Rep. (National Cancer Institute, 2010). at <http://deainfo.nci.nih.gov/advisory/pcp/annualReports/index.htm>

5. Peck, D. et al. A method for high-throughput gene expression signature analysis. Genome Biol. 7, R61 (2006).

6. Subramanian, A., Narayan, R., Peck, D., Golub, T. & Lamb, J. Manuscript in preparation. (2011).

7. Lamb, J. The Connectivity Map: a new tool for biomedical research. Nat. Rev. Cancer 7, 54–60 (2007).

8. Smith, B. W. et al. The aryl hydrocarbon receptor directs hematopoietic progenitor cell expansion and differentiation. Blood 122, 376–385 (2013).