
Classification of long non-coding RNAs

Project

Project Details

Program
Computer Science
Field of Study
Computer science, bioinformatics, electrical engineering, applied mathematics
Division
Computer, Electrical and Mathematical Sciences and Engineering
Center Affiliation
Computational Bioscience Research Center

Project Description

Long non-coding RNAs (lncRNAs) have been found to perform a variety of functions in many important biological processes. Classifying lncRNAs into distinct groups makes their functions easier to interpret and supports deeper mining of these transcribed sequences. lncRNA classification has therefore attracted much attention recently. The main technical difficulties are 1) the limited number of known lncRNAs, which gives a small training sample size, and 2) the widely varying lengths of lncRNAs. This project aims to apply the string kernel algorithms developed in Prof. Gao’s group to the lncRNA classification problem and to improve them further.
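One common way to let a kernel method compare sequences of very different lengths is the k-spectrum kernel with cosine normalization. The sketch below is a generic textbook version of that idea, not the string kernel algorithms developed in Prof. Gao’s group. The function names and the default k = 3 are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in an RNA/DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Unnormalized k-spectrum kernel: inner product of k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Iterate over the smaller table; missing keys in a Counter read as 0.
    if len(ca) > len(cb):
        ca, cb = cb, ca
    return float(sum(n * cb[kmer] for kmer, n in ca.items()))


def normalized_spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalized kernel K(a,b) / sqrt(K(a,a) * K(b,b)), in [0, 1].

    Raw k-mer counts grow with sequence length. Normalizing removes that
    effect, so a short and a long transcript with similar k-mer composition
    still score as similar.
    """
    kab = spectrum_kernel(a, b, k)
    kaa = spectrum_kernel(a, a, k)
    kbb = spectrum_kernel(b, b, k)
    if kaa == 0 or kbb == 0:
        return 0.0  # a sequence shorter than k has no k-mers
    return kab / sqrt(kaa * kbb)
```

For example, the normalized kernel scores a 20-nt and a 200-nt repeat of ACGU above 0.99. The raw spectrum kernel values for the two sequences compared with themselves differ by about two orders of magnitude.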

About the Researcher

Xin Gao
Professor, Computer Science Program
Computer, Electrical and Mathematical Sciences and Engineering Division


Education Profile

  • Ph.D. University of Waterloo, Canada, 2009
  • B.S. Tsinghua University, 2004

Research Interests

Gao's research lies at the intersection between computer science and biology. His work has two main focuses: 1) developing theory and methodology in the fields of machine learning and algorithms; and 2) solving key open problems in biological and medical fields through building computational models, developing machine-learning techniques, and designing effective and efficient algorithms. In particular, he aims to solve problems that occur along the path from protein amino acid sequences to their three-dimensional structures and functions that ultimately lead to their undesirable expression in complex biological networks.

Selected Publications

  • A. Abbas, X. Guo, B. Jing, and X. Gao. An automated framework for NMR resonance assignment through simultaneous slice picking and spin system forming. Journal of Biomolecular NMR. (2014). 59(2): 75-86.
  • H. Kuwahara, M. Fan, S. Wang, and X. Gao. A framework for scalable parameter estimation of gene circuit models using structural information. Bioinformatics. (2013). 29(13): i98-i107.
  • B. Xie, B. Jankovic, V. Bajic, L. Song, and X. Gao. Poly(A) motif prediction using spectral latent features from human DNA sequences. Bioinformatics. (2013). 29(13): i316-i325.
  • M. Maadooliat, X. Gao, and J. Huang. Assessing protein conformational sampling methods based on bivariate lag-distributions of backbone angles. Briefings in Bioinformatics. (2013). 14(6): 724-736.
  • Z. Liu, A. Abbas, B. Jing, and X. Gao. WaVPeak: picking NMR peaks through wavelet transform and volume-based filtering. Bioinformatics. (2012). 28(7): 914-920.
  • B. Alipanahi, X. Gao, E. Karakoc, L. Donaldson, A. Gutmanas, C. Arrowsmith, and M. Li. PICKY: a novel SVD-based NMR spectra peak picking method. Bioinformatics. (2009). 25(12): i268-i275.
  • X. Gao, D. Bu, J. Xu, and M. Li. Improving consensus contact prediction via server correlation reduction. BMC Structural Biology. (2009). 9: 28.

Desired Project Deliverables

The visiting student for this project is expected to complete the following deliverables:

  1. Give a thorough literature review of lncRNA classification methods and of potential machine learning methods that can be applied to this problem.
  2. Become familiar with the string kernel algorithms developed in Prof. Gao’s group.
  3. Gather an lncRNA dataset to serve as the benchmark set for this research.
  4. Conduct a comprehensive comparative study of state-of-the-art methods on the benchmark set.
  5. Apply the string kernel algorithms to lncRNA classification and evaluate their performance.
  6. If necessary, improve the string kernel algorithms to achieve better performance.
  7. Write a report summarizing the results.
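Only a small number of labelled lncRNAs is available, so leave-one-out cross-validation is one reasonable way to evaluate a kernel classifier in deliverables 4 and 5. The sketch below assumes a precomputed Gram matrix, which could come from any string kernel. It uses a kernel nearest-centroid classifier purely as a simple stand-in; an actual study would more likely use an SVM. The function names are hypothetical.

```python
def kernel_centroid_sq_dist(K, x, members):
    """Squared feature-space distance ||phi(x) - mean_{i in members} phi(x_i)||^2,
    computed only from Gram-matrix entries K[i][j]."""
    n = len(members)
    cross = sum(K[x][i] for i in members) / n
    within = sum(K[i][j] for i in members for j in members) / (n * n)
    return K[x][x] - 2.0 * cross + within


def loocv_accuracy(K, labels):
    """Leave-one-out accuracy of a kernel nearest-centroid classifier.

    K      -- precomputed n x n Gram matrix (list of lists of floats)
    labels -- class label of each of the n sequences
    """
    n = len(labels)
    correct = 0
    for held_out in range(n):
        # Group the remaining (training) indices by class label.
        by_class = {}
        for i in range(n):
            if i != held_out:
                by_class.setdefault(labels[i], []).append(i)
        # Predict the class whose feature-space centroid is closest.
        pred = min(
            by_class,
            key=lambda c: kernel_centroid_sq_dist(K, held_out, by_class[c]),
        )
        if pred == labels[held_out]:
            correct += 1
    return correct / n
```

With a toy Gram matrix whose within-class entries (0.8) exceed its between-class entries (0.1), every held-out sample is assigned to its own class, and the accuracy is 1.0.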