
Contextualized Embeddings for Biomedical Data

Project

Project Details

Program
BioScience
Field of Study
Machine Learning, Representation Learning
Division
Biological and Environmental Science and Engineering
Center Affiliation
Computational Bioscience Research Center

Project Description

Contextualized embeddings have revolutionized machine learning. They were first used to encode text in natural language applications and were later adopted as a representational mechanism for other modalities, including images, longitudinal data, and high-dimensional structured data. In recent years, embedding approaches have been proposed to address problems in biology and healthcare; however, many important questions still require further investigation. For instance: i) how to effectively integrate embeddings of discrete elements with continuous measurements; ii) how to incorporate granular temporal or ordering information into embeddings; and iii) how to effectively create embeddings for multimodal data. Successful applicants will work toward developing a model prototype that addresses one of these questions using state-of-the-art representation learning approaches based on deep learning architectures.
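To make questions i) and ii) concrete, here is a minimal Python sketch of one simple baseline, offered only as an illustration and not as the project's method. Each event (a discrete code, an optional continuous value, and a timestamp) is embedded as the sum of a lookup-table vector for the code, the value multiplied by a shared value vector, and a sinusoidal encoding of the real-valued timestamp, adapted from Transformer positional encodings. The names `EventEmbedder` and `time_encoding`, the example codes, and the assumption that values are standardized (z-scored) are all hypothetical, and the weights are random rather than learned.

```python
import math
import random


def time_encoding(t: float, dim: int) -> list[float]:
    """Sinusoidal encoding of a continuous timestamp t (e.g., days since admission).

    Uses real-valued times instead of integer positions, so irregular gaps
    between events are reflected in the encoding.
    """
    enc = [0.0] * dim
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        enc[i] = math.sin(t * freq)
        if i + 1 < dim:
            enc[i + 1] = math.cos(t * freq)
    return enc


class EventEmbedder:
    """Hypothetical embedder for (code, value, time) events.

    Weights are drawn at random for illustration; a real model would learn them.
    """

    def __init__(self, vocab: list[str], dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.dim = dim
        self.index = {code: k for k, code in enumerate(vocab)}
        # One vector per discrete code (the "discrete element" embedding).
        self.code_table = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in vocab]
        # Shared direction along which a (standardized) continuous value shifts the embedding.
        self.value_vector = [rng.gauss(0.0, 0.1) for _ in range(dim)]

    def embed(self, code: str, value: float, t: float) -> list[float]:
        """Sum of code embedding, value-scaled vector, and time encoding.

        Codes without a measurement (e.g., a diagnosis) can pass value=0.0.
        """
        e_code = self.code_table[self.index[code]]
        e_time = time_encoding(t, self.dim)
        return [c + value * w + s for c, w, s in zip(e_code, self.value_vector, e_time)]


embedder = EventEmbedder(["glucose", "heart_rate", "dx:sepsis"], dim=8)
sequence = [("glucose", 1.3, 0.0), ("heart_rate", -0.4, 0.5), ("dx:sepsis", 0.0, 2.0)]
vectors = [embedder.embed(code, value, t) for code, value, t in sequence]
print(len(vectors), len(vectors[0]))  # 3 8
```

Summing the three components keeps the embedding size fixed; concatenation, or discretizing values into bins with their own embeddings, are alternatives whose trade-offs are the kind of open question described above.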

About the Researcher

Ricardo Henao Giraldo
Associate Professor, Bioengineering
Biological and Environmental Science and Engineering Division


Education Profile

  • Postdoctoral Associate, Duke University, 2015
  • Postdoctoral Researcher, University of Copenhagen, 2011
  • PhD, Technical University of Denmark, 2011
  • MSc, Universidad Nacional de Colombia, 2004
  • BEng, Universidad Nacional de Colombia, 2002

Research Interests

The theme of Professor Henao's research is the development of novel statistical methods and machine learning algorithms, primarily based on probabilistic modeling. His expertise covers several fields, including applied statistics, signal processing, pattern recognition, and machine learning. His methods research focuses on hierarchical or multilayer probabilistic models that describe complex data, such as data with high dimensionality, multiple modalities, more variables than observations, noisy measurements, missing values, or time-series structure, in terms of low-dimensional representations for the purposes of hypothesis generation and improved predictive modeling. Most of his applied work is dedicated to the analysis of biological data such as gene expression, medical imaging, clinical narratives, and electronic health records. His recent work has focused on developing sophisticated machine learning models, including deep learning approaches, for the analysis and interpretation of clinical and biological data, with applications to predictive modeling of diverse clinical outcomes.

Selected Publications

  • Wang, R., Yu, T., Zhao, H., Kim, S., Mitra, S., Zhang, R. & Henao, R. Few-Shot Class-Incremental Learning for Named Entity Recognition in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2022), 571-582.
  • Kong, F. & Henao, R. Efficient Classification of Very Large Images with Tiny Objects in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).
  • Chapfuwa, P., Tao, C., Li, C., Page, C., Goldstein, B., Carin, L. & Henao, R. Adversarial Time-to-Event Modeling in Proceedings of the 35th International Conference on Machine Learning (2018).
  • Pu, Y., Gan, Z., Henao, R., Yuan, X., Li, C., Stevens, A. & Carin, L. Variational Autoencoder for Deep Learning of Images, Labels and Captions in Advances in Neural Information Processing Systems 29 (2016).
  • Tsalik, E. L., Henao, R., Nichols, M., Burke, T., Ko, E. R., McClain, M. T., Hudson, L. L., Mazur, A., Freeman, D. H., Veldman, T., et al. Host gene expression classifiers diagnose acute respiratory illness etiology. Science Translational Medicine 8 (2016).

Desired Project Deliverables

Deliverables include a literature review of the state of the art, the implementation of a model prototype, and experiments comparing the prototype against existing approaches in the literature.

Recommended Student Background

Machine Learning
Representation Learning
Deep Learning
Natural Language Processing