by Bolan Linghu · 2009
Page count: 290
Abstract: In the postgenomic era, it remains a challenging task to understand the cellular functions of genes and how the dysfunction of a gene relates to a disease. Since genes work cooperatively on particular cellular tasks, a functional linkage network (FLN) can be used for function-related studies. In this network, the nodes represent genes and the weighted edges represent the degree of their functional association. Here I explore FLN construction, FLN-based gene-function prediction, and FLN-based prediction of new disease genes. In the first part of the dissertation, aiming to provide precise functional annotation for as many genes as possible, I explore and propose a two-step framework: (i) construction of a high-coverage and reliable FLN via data integration, and (ii) development of a reliable decision rule for functional annotation. This framework is tested in yeast and E. coli. In step one, I demonstrate that commonly used machine learning methods such as linear SVM and Naïve Bayes all combine heterogeneous data to produce reliable and high-coverage FLNs. In step two, empirical tuning of an adjustable decision rule on the FLN reveals that basing annotation on maximum edge weight results in the most precise annotation at high coverage. In the second part of the dissertation, I build and validate a human genome-scale FLN by data integration using a Naïve Bayes classifier. This FLN is then used to predict new candidate disease genes associated with 110 diseases. In particular, I hypothesize that the neighborhood of known disease genes tends to be enriched in genes that are also associated with the same disease. This is based on the observation that disease genes underlying common diseases tend to occur in distinct functional modules. The network thus enables one to identify previously unimplicated genes and to rank them by the likelihood of their involvement.
I show that this FLN is able to predict new disease genes for diverse diseases and outperforms networks based solely on protein-protein physical interactions. Additionally, based on the observation that disease genes underlying similar or related diseases tend to be functionally related, I illustrate that the FLN can also help to assess disease-disease associations.
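The two FLN-based ideas in the abstract, annotation by maximum edge weight and ranking of candidate disease genes by their network neighborhood, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the gene names, edge weights, function labels, and helper names (`build_fln`, `annotate_by_max_edge`, `rank_candidates`) are hypothetical, and the candidate score (the sum of edge weights to known disease genes) is one plausible choice, not necessarily the scoring used in the dissertation.

```python
from collections import defaultdict

# Toy FLN: undirected weighted edges (gene_a, gene_b, weight).
# Gene names and weights are invented for illustration only.
EDGES = [
    ("G1", "G2", 0.9),
    ("G1", "G3", 0.4),
    ("G2", "G4", 0.7),
    ("G3", "G4", 0.2),
    ("G4", "G5", 0.8),
]


def build_fln(edges):
    """Store the network as a symmetric adjacency map: gene -> {neighbor: weight}."""
    fln = defaultdict(dict)
    for a, b, w in edges:
        fln[a][b] = w
        fln[b][a] = w
    return fln


def annotate_by_max_edge(fln, gene, known_functions):
    """Return (function, weight) of the annotated neighbor linked by the
    largest edge weight, or None if no neighbor is annotated."""
    best = None
    for nbr, w in fln[gene].items():
        if nbr in known_functions and (best is None or w > best[1]):
            best = (nbr, w)
    if best is None:
        return None
    return known_functions[best[0]], best[1]


def rank_candidates(fln, disease_genes):
    """Rank genes not yet linked to the disease by the summed edge weight
    to known disease genes (an assumed scoring rule)."""
    scores = {}
    for gene, nbrs in fln.items():
        if gene in disease_genes:
            continue
        score = sum(w for nbr, w in nbrs.items() if nbr in disease_genes)
        if score > 0:
            scores[gene] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


fln = build_fln(EDGES)
functions = {"G1": "DNA repair", "G4": "transport"}  # hypothetical labels
annotation = annotate_by_max_edge(fln, "G2", functions)
ranking = rank_candidates(fln, {"G1", "G4"})
print(annotation)
print(ranking)
```

In this toy network, G2 inherits its label from G1 because that edge carries the largest weight, and G2 also ranks first as a disease candidate because it is strongly linked to both assumed disease genes.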