Contextualized Embeddings for Biomedical Data

Project Details

Program

BioScience

Field of Study

Machine Learning, Representation Learning

Division

Biological and Environmental Science and Engineering

Faculty Lab Link

https://www.kaust.edu.sa/en/study/faculty/ricardo-henao

Center Affiliation

Computational Bioscience Research Center

Project Description

Contextualized embeddings have revolutionized the field of machine learning, first as a means to encode text in natural language applications and later as a representational mechanism for other modalities, including image, longitudinal, and high-dimensional structured data. In recent years, embedding approaches have been proposed to address problems in biology and healthcare; however, many important questions require further investigation. For instance: i) how to effectively integrate embeddings for discrete elements with continuous measurements, ii) how to integrate granular temporal or ordering information into embeddings, and iii) how to effectively create embeddings for multimodal data. Successful applicants will work toward developing a model prototype addressing one of the questions above using state-of-the-art representation learning approaches based on deep learning architectures.
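As a rough illustration of questions i) and ii), the sketch below combines a lookup-table embedding for a discrete code with a linear projection of a continuous measurement and a sinusoidal encoding of an irregular timestamp. It is only one simple possibility, not the approach the project prescribes: the names (`code_table`, `value_weight`, `embed_events`), the sizes, the summation used to combine the three parts, and the random parameters standing in for learned ones are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 5  # hypothetical number of distinct discrete codes (e.g. lab test types)
DIM = 8         # hypothetical embedding dimension (must be even for the encoding below)

# Randomly initialised parameters; in an actual model these would be learned.
code_table = rng.normal(size=(VOCAB_SIZE, DIM))  # one vector per discrete code
value_weight = rng.normal(size=DIM)              # projects a scalar measurement to DIM
value_bias = rng.normal(size=DIM)


def time_encoding(t, dim=DIM):
    """Sinusoidal encoding of one or more (possibly irregular) timestamps."""
    t = np.asarray(t, dtype=float)
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    angles = t[..., None] * freqs
    enc = np.empty(t.shape + (dim,))
    enc[..., 0::2] = np.sin(angles)  # even positions
    enc[..., 1::2] = np.cos(angles)  # odd positions
    return enc


def embed_events(codes, values, times):
    """Return one DIM-dimensional vector per (code, value, time) event."""
    codes = np.asarray(codes)
    values = np.asarray(values, dtype=float)
    discrete = code_table[codes]                               # (n, DIM) lookup
    continuous = values[:, None] * value_weight + value_bias   # (n, DIM) projection
    return discrete + continuous + time_encoding(times)        # combined by summation


# Three events: the last two share a code but differ in value and time.
events = embed_events(codes=[0, 3, 3], values=[1.2, -0.4, 0.9], times=[0.0, 1.5, 7.0])
print(events.shape)
```

Summation keeps the output dimension fixed regardless of how many information sources are combined; concatenation or a learned gating function would be equally plausible choices to compare in a prototype.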

About the Researcher

Ricardo Henao Giraldo

Associate Professor, Bioengineering

Biological and Environmental Science and Engineering Division

Education Profile

Postdoctoral Associate, Duke University, 2015
Postdoctoral Researcher, University of Copenhagen, 2011
PhD, Technical University of Denmark, 2011
MSc, Universidad Nacional de Colombia, 2004
BEng, Universidad Nacional de Colombia, 2002

Research Interests

The theme of Professor Henao's research is the development of novel statistical methods and machine learning algorithms primarily based on probabilistic modeling. His expertise covers several fields, including applied statistics, signal processing, pattern recognition, and machine learning. His methods research focuses on hierarchical or multilayer probabilistic models that describe complex data (such as data characterized by high dimensionality, multiple modalities, more variables than observations, noisy measurements, missing values, or time series) in terms of low-dimensional representations for the purposes of hypothesis generation and improved predictive modeling. Most of his applied work is dedicated to the analysis of biological data such as gene expression, medical imaging, clinical narrative, and electronic health records. His recent work has focused on the development of sophisticated machine learning models, including deep learning approaches, for the analysis and interpretation of clinical and biological data, with applications to predictive modeling for diverse clinical outcomes.

Selected Publications

Wang, R., Yu, T., Zhao, H., Kim, S., Mitra, S., Zhang, R. & Henao, R. Few-Shot Class-Incremental Learning for Named Entity Recognition in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2022), 571-582.
Kong, F. & Henao, R. Efficient Classification of Very Large Images with Tiny Objects in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).
Chapfuwa, P., Tao, C., Li, C., Page, C., Goldstein, B., Carin, L. & Henao, R. Adversarial time-to-event modeling in Proceedings of the 35th International Conference on Machine Learning (2018).
Pu, Y., Gan, Z., Henao, R., Yuan, X., Li, C., Stevens, A. & Carin, L. Variational Autoencoder for Deep Learning of Images, Labels and Captions in Advances in Neural Information Processing Systems 29 (2016).
Tsalik, E. L., Henao, R., Nichols, M., Burke, T., Ko, E. R., McClain, M. T., Hudson, L. L., Mazur, A., Freeman, D. H., Veldman, T., et al. Host gene expression classifiers diagnose acute respiratory illness etiology. Science Translational Medicine 8 (2016).

Desired Project Deliverables

Deliverables include a literature review of the state of the art, an implementation of a model prototype, and experiments comparing the prototype to existing approaches in the literature.

Recommended Student Background

Machine Learning

Representation Learning

Deep Learning

Natural Language Processing
