
Projects and Grants
The Ma'ayan Laboratory applies computational and mathematical methods to study the complexity of regulatory networks in mammalian cells. We apply graph-theory algorithms, machine-learning techniques and dynamical modeling to study how intracellular regulatory systems function as networks to control cellular processes such as differentiation, de-differentiation, apoptosis and proliferation. We develop software systems to help experimental biologists form novel hypotheses from high-throughput data, and develop theories about the structure and function of regulatory networks in mammalian systems.
Research Aims:
a. Assemble large-scale mammalian cellular signaling networks, protein-protein interactions, transcription-factor/DNA interactions, microRNA-mRNA and kinase-substrate interactions from publications and databases describing direct regulatory relationships between individual cellular components.
b. Many of the theoretical observations we have extracted so far from the topologies of biological networks are manifestations of general design principles observed in many complex systems, not just in biological networks. We plan to further explore how such principles emerge and how they are related.
c. Utilize the consolidated datasets we collect, together with the algorithms, visualization tools, modeling approaches and statistical methods that we develop, to extract patterns from the data collected by our experimentalist collaborators and to prioritize components and interactions for further functional experimentation.
a. Collect and Organize Data from the Public Domain
Cell signaling and gene regulatory networks in mammalian cells are the focus of biomedical research because such complex systems control cellular behavior. When cellular regulation mechanisms malfunction in an organism, the result is often disease. In the past five decades, cell and molecular biologists have accumulated enormous amounts of knowledge about cell regulation at the molecular level. Today, the rate of data accumulation resulting from emerging high-throughput biotechnologies is rapidly increasing. Such advances have the potential to unravel the complexity of cell regulation at the molecular level in a way that would enable us to control cells with drugs and genetically engineer cells for desired behaviors. However, we are not there yet. Many components and details about their interactions, particularly in mammalian cells, are still largely unknown. Hence, we do not yet have a holistic understanding of cellular regulation in mammalian cells. Integrating data from multiple sources to extract critical knowledge about regulatory networks, and developing new hypotheses based on such prior knowledge, is currently one of the major challenges in computational systems biology.
To address some of these challenges, during this exciting phase-transition era in regulatory biology, the Ma’ayan laboratory is applying engineering principles to develop new theories about the global organization of cell regulatory networks, as well as to develop tools that help experimental biologists extract more knowledge from high-throughput experimental results. We have identified interesting global emergent properties in the topology of biological regulatory networks, including mammalian cell signaling and gene regulatory networks, and developed acclaimed software tools to analyze proteomics and genomics experimental data in the context of prior biological knowledge about networks and annotated gene-sets. As part of our effort, we plan to continue to assemble large-scale mammalian cellular signaling networks, protein-protein interactions, transcription-factor/DNA interactions, microRNA-mRNA, and kinase-substrate interactions from publications and databases that describe direct regulatory relationships between individual molecular cellular components.
b. Understand the Structure and Dynamics of Cellular Regulatory Networks
Initial topology analysis of the networks we have collected and analyzed showed, for example, that negative feedback loops are more often found to include components close to the cell surface, whereas positive feedback loops are more prevalent with components present in the cytoplasm and the nucleus (Ma'ayan et al. Science 310:1078, 2005). We also found that pathways starting from some extra-cellular ligands have many more alternative paths to downstream effectors compared with most other extra-cellular ligands. We showed that this organizational architecture might be due to an evolutionary process of adaptation to a non-uniform extra-cellular environment (Ma'ayan et al. Physical Review E 73:061912, 2006). In collaboration with Eduardo Sontag, we also found that gene-regulatory and signaling networks might be designed to be close to Monotone Systems because negative feedback and negative feedforward loops are much less abundant in graphs representing gene and signaling regulatory networks (Ma'ayan et al. IET Systems Biology 2:206, 2008). Our topology analyses also uncovered that regulatory molecular networks in cells are depleted in feedback loops and feedback loops are nested in all the regulatory networks we examined (Ma’ayan et al. PNAS 105:19235, 2008). We recently proposed an evolutionary model that can be used to explain such architecture (MacArthur et al. Phys. Rev. Lett 16:168701, 2010).
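To illustrate the kind of topology analysis described above, the following minimal sketch enumerates the simple cycles of a small signed directed graph and labels each one as a positive or negative feedback loop by multiplying its edge signs. The toy network, its node names and the function name are hypothetical, chosen only for illustration; this is not the laboratory's code and not necessarily the procedure used in the cited studies.

```python
from typing import Dict, List, Tuple

# Signed directed graph: source -> list of (target, sign).
# sign is +1 for activation and -1 for inhibition.
Graph = Dict[str, List[Tuple[str, int]]]


def find_feedback_loops(graph: Graph) -> List[Tuple[List[str], int]]:
    """Return each simple cycle once, paired with the product of its edge signs.

    A product of +1 marks a positive feedback loop, -1 a negative one.
    Each cycle is reported only from its alphabetically smallest node,
    which avoids listing the same loop once per rotation.
    """
    loops: List[Tuple[List[str], int]] = []

    def dfs(start: str, node: str, path: List[str], sign: int) -> None:
        for target, edge_sign in graph.get(node, []):
            if target == start:
                # Closed a cycle back to the starting node.
                loops.append((path[:], sign * edge_sign))
            elif target > start and target not in path:
                path.append(target)
                dfs(start, target, path, sign * edge_sign)
                path.pop()

    for start in sorted(graph):
        dfs(start, start, [start], 1)
    return loops


# Hypothetical toy network, for illustration only.
toy_network: Graph = {
    "Receptor": [("Kinase", 1)],
    "Kinase": [("TF", 1), ("Phosphatase", 1)],
    "Phosphatase": [("Receptor", -1)],
    "TF": [("Kinase", 1)],
}

loops = find_feedback_loops(toy_network)
for members, sign in loops:
    kind = "positive" if sign > 0 else "negative"
    print(" -> ".join(members), ":", kind, "feedback loop")
```

Exhaustive cycle enumeration is only practical for small graphs; on large-scale networks the number of simple cycles grows very quickly, so a bound on loop length would normally be needed.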
Many of the theoretical observations we extracted from the topologies of biological networks are manifestations of general design principles observed in many complex systems, not just in biological networks, and we are interested in understanding how such principles emerge and are related.
c. Analyze Data from High-Content Experiments and Develop Novel Data Analysis and Data Visualization Methods and Software
More pragmatically, we are integrating our theoretical framework with experimental data. We are analyzing results from Protein/DNA arrays (Bromberg et al. Science 320:903, 2008), gene expression microarrays (Lu et al. Nature 462:358, 2009), and Mass-Spectrometry proteomics (Abul-Husn et al. Proteomics 9:3303, 2009) to place lists of genes and proteins, identified in experiments, in the context of prior biological knowledge about protein-protein, protein-DNA and cell signaling interactions and pathways. For this, we use the consolidated datasets we collect and the algorithms, visualization tools, modeling approaches and statistical methods we develop to extract patterns from the data and prioritize components and interactions for further functional experimentation.
So far we have developed several software tools that can be used to analyze and visualize high-throughput experimental data collected at different regulatory layers. These include: Lists2Networks, GATE, KEA, ChEA, Genes2Networks, AVIS and SNAVI.
• Lists2Networks (Lachmann and Ma’ayan BMC Bioinformatics, 11:87, 2010) is a web-based system that allows users to upload lists of mammalian genes/proteins onto a server-based program for integrated analysis. The system includes web-based tools to manipulate lists with different set operations, to expand lists using existing mammalian networks of protein-protein interactions, co-expression correlation, or background knowledge co-annotation correlation, as well as to apply gene-list enrichment analyses against many gene-list libraries of prior biological knowledge such as pathways, gene ontology terms, kinase-substrate, microRNA-mRNA and protein-protein interactions, metabolites, and protein domains. Such analyses can be applied to several lists at once against many prior knowledge libraries of gene-lists associated with specific annotations. The system also contains features that allow users to export networks and share lists with other users of the system.
• Grid Analysis of Time series Expression (GATE) (MacArthur et al. Bioinformatics, 26:143, 2010) is an integrated computational software platform for the analysis and visualization of high-dimensional biomolecular time series. GATE uses a correlation-based clustering algorithm to arrange molecular time series on a two-dimensional hexagonal array and dynamically colors individual hexagons according to the expression level of the molecular component to which they are assigned, to create animated movies of systems-level molecular regulatory dynamics. In order to infer potential regulatory control mechanisms from patterns of correlation, GATE also allows interactive interrogation of movies against a wide variety of prior knowledge datasets. GATE movies can be paused and are interactive, allowing users to reconstruct networks and perform functional enrichment analyses. Movies created with GATE can be saved in Flash format and can be inserted directly into PDF manuscript files as interactive figures. The GATE software was used to visualize and analyze data from a study published in Nature (Lu et al. Nature 462:358, 2009) in collaboration with Ihor Lemischka’s laboratory.
• Kinase enrichment analysis (KEA) (Lachmann and Ma'ayan Bioinformatics 25:684, 2009) is a web-based tool with an underlying database providing users with the ability to link lists of mammalian proteins/genes with the kinases that phosphorylate them. The system draws from several available kinase-substrate databases to compute kinase enrichment probability based on the distribution of kinase-substrate proportions in the background kinase-substrate database compared with kinases found to be associated with an input list of genes/proteins.
• ChEA is a web-based system where users can input lists of mammalian gene symbols, for which the program computes over-representation of transcription factor targets from the ChIP-X database. To build the mammalian ChIP-X database, we collected interactions from high-throughput ChIP experiments. The database currently contains 157,423 interactions, extracted from 66 publications, describing the binding of 74 transcription factors to 28,743 target genes. We use the database to analyze mRNA expression data by performing gene-list enrichment analysis with the ChIP-X database as the prior biological knowledge list library. The web-based software reports a ranked list of transcription factors whose targets show statistically significant overlap with the input list. The ChEA database also allowed us to reconstruct an initial network of transcription factors connected based on shared overlapping targets.
• Genes2Networks (Berger et al. BMC Bioinformatics 8:372, 2007) is a software system that integrates the content of ten mammalian interaction network datasets. Filtering techniques to prune low-confidence interactions were implemented. Genes2Networks is delivered as a web-based service using AJAX. The system can be used to extract relevant subnetworks created from "seed" lists of human Entrez gene symbols. The output includes a dynamic, linkable, three-color web-based network map, with a statistical analysis report that identifies significant intermediate nodes used to connect the seed list. Genes2Networks was used to identify SHOC2 as a novel Noonan-like syndrome disease-causing gene (Cordeddu et al. Nature Genetics 41:1022, 2009) and to predict components important for regulating neurite outgrowth in Neuro2A cells treated with the CB1R agonist HU-210 (Bromberg et al. Science 320:903, 2008).
• SNAVI (Ma'ayan et al. BMC Systems Biology 3:10, 2009) is a PC Windows desktop application that can be used to build websites from networks in text format, to find pathways from receptors to effectors, and to compute network statistics for large-scale networks.
• AVIS (Berger et al. Bioinformatics, 23:2803, 2007) is a Google gadget-compatible, web-based viewer that can be used to visualize cell signaling networks stored in text or any other tabular format.
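Several of the tools listed above (Lists2Networks, KEA and ChEA) rely on gene-list enrichment analysis: an input list is compared with each annotated gene set in a prior-knowledge library, and terms are ranked by how unlikely the observed overlap is by chance. The sketch below shows one common way to do this, a one-sided hypergeometric test. The choice of test, the function names, the background size and the toy library are assumptions made for illustration; the text above does not specify the statistic each tool uses, and this is not the tools' actual code.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_pvalue(overlap: int, list_size: int, set_size: int, background: int) -> float:
    """P(X >= overlap) when list_size genes are drawn without replacement
    from `background` genes, of which set_size belong to the annotated set."""
    total = comb(background, list_size)
    upper = min(list_size, set_size)
    tail = sum(
        comb(set_size, k) * comb(background - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def rank_terms(
    input_genes: Set[str],
    library: Dict[str, Set[str]],
    background: int,
) -> List[Tuple[str, int, float]]:
    """Return (term, overlap, p-value) tuples sorted by increasing p-value.

    Assumes every input gene and every library gene is part of the background.
    """
    results = []
    for term, targets in library.items():
        overlap = len(input_genes & targets)
        p = hypergeom_pvalue(overlap, len(input_genes), len(targets), background)
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: (r[2], r[0]))


# Hypothetical library and input list, for illustration only.
toy_library = {
    "TF_A": {"G1", "G2", "G3", "G4"},
    "TF_B": {"G5", "G6", "G7", "G8"},
}
toy_input = {"G1", "G2", "G3", "G9"}

ranked = rank_terms(toy_input, toy_library, background=20)
for term, overlap, p in ranked:
    print(f"{term}: overlap={overlap}, p={p:.4f}")
```

When many terms are tested at once, the raw p-values would normally be corrected for multiple testing before any term is called significant.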
The Ma’ayan laboratory's ultimate long-term goal is to understand how gene regulatory networks and cell signaling networks are altered in human disease, to predict how drugs can be used to counteract such changes, and to predict how drugs can cause side effects. To this end, we began developing a network that connects FDA-approved drugs and their known targets (Ma'ayan et al. Mount Sinai Journal of Medicine 74:27-32, 2007). We are also developing methods to predict drug combinations that inhibit the proliferation of cancer cells in a dish, and to enhance iPS reprogramming of somatic cells with drugs.
As we move forward, we believe that our efforts will make significant contributions to Translational Systems Biology, toward the identification of novel therapeutics to treat complex diseases such as cancer and type-2 diabetes, as well as significantly enhance studies that profile stem cells, improving our ability to reprogram iPS cells and to direct the differentiation of embryonic stem and iPS cells into specific lineages.
Avi Ma'ayan, PhD
Tel: 212-659-1739
Fax: 212-831-0114
Icahn Medical Institute
1425 Madison Avenue
Room 12-78 (Office), 12-76 (Lab)
One Gustave L. Levy Place
Box 1215
New York, NY 10029
New Database Could Speed Up Drug Discovery
Tech news feature on CNET
Animating Molecular Biology
Article in Biomedical Computation Review
Systematic Tracking of Cell Fate Changes
News and views article in Nature Biotechnology
Computational Honeycombs Drip with Data
News item in NIGMS Computing Life
Molecular Movies: New Software Animates Gene Expression Data
Technology observation on Scientific American Online
Stem Cells, Systems Biology and Human Feedback
News feature in Nature Reports Stem Cells



