Announcements

2013/9/5 (Thu) Detecting and Correcting Spurious Transcriptome Inference due to RNAseq Reads Misalignment, Prof. Wei Wang

Talk title: Detecting and Correcting Spurious Transcriptome Inference due to RNAseq Reads Misalignment

Time: 9/5 10:30-11:30  

Place: EC 122

Speaker: Prof. Wei Wang, UCLA

Abstract:

RNA-seq techniques provide an unparalleled means for exploring a transcriptome with deep coverage and base-pair-level resolution. Various analysis tools have been developed to align and assemble RNA-seq data, such as the widely used TopHat/Cufflinks pipeline. A common observation is that a sizable fraction of the fragments/reads align to multiple locations in the genome. These multiple alignments pose substantial challenges to existing RNA-seq analysis tools. Inappropriate treatment may result in reporting spuriously expressed genes (false positives) and missing genes that are truly expressed (false negatives). Such errors affect subsequent analyses, such as differential expression analysis. In our study, we observed that about 3.5% of the transcripts reported by the TopHat/Cufflinks pipeline correspond to annotated nonfunctional pseudogenes. Moreover, about 10.0% of the reported transcripts are not annotated in the Ensembl database. These could be either novel expressed genes or false discoveries. We examined the underlying genomic features that lead to multiple alignments and investigated how they generate systematic errors in RNA-seq analysis. We developed a general tool, GeneScissors, which exploits machine learning techniques guided by biological knowledge to detect and correct spurious transcriptome inference by existing RNA-seq analysis methods. GeneScissors can predict spurious transcriptome calls due to misalignment with an accuracy close to 90%, which represents a substantial improvement over the widely used TopHat/Cufflinks or MapSplice/Cufflinks pipelines.

Biography:

Wei Wang is a professor in the Department of Computer Science at the University of California, Los Angeles. She received an MS degree from the State University of New York at Binghamton in 1995 and a PhD degree in Computer Science from the University of California, Los Angeles in 1999. She was a research staff member at the IBM T. J. Watson Research Center between 1999 and 2002, and a professor in Computer Science and a member of the Carolina Center for Genomic Sciences and Lineberger Comprehensive Cancer Center at the University of North Carolina at Chapel Hill from 2002 to 2012. Dr. Wang's research interests include big data, data mining, bioinformatics and computational biology, and databases. She has filed seven patents, and has published one monograph and more than one hundred and fifty research papers in international journals and major peer-reviewed conference proceedings.

Dr. Wang received the IBM Invention Achievement Awards in 2000 and 2001. In 2005 she received an NSF Faculty Early Career Development (CAREER) Award and was named a Microsoft Research New Faculty Fellow. She was honored with the 2007 Phillip and Ruth Hettleman Prize for Artistic and Scholarly Achievement at UNC. She also received the 2012 IEEE ICDM Outstanding Service Award. Dr. Wang has been an associate editor of the IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Knowledge Discovery in Data, Journal of Knowledge and Information Systems, and International Journal of Knowledge Discovery in Bioinformatics, and an editorial board member of the International Journal of Data Mining and Bioinformatics and the Open Artificial Intelligence Journal. She serves on the organization and program committees of international conferences including ACM SIGMOD, ACM SIGKDD, ACM BCB, VLDB, ICDE, EDBT, ACM CIKM, IEEE ICDM, SIAM DM, SSDBM, and BIBM.