Nonconvex optimization arises naturally in many machine learning problems, such as sparse learning, matrix and tensor factorization, graphical models, and deep learning. Traditionally, nonconvexity has been regarded as a curse in machine learning. Nonconvex methods nevertheless succeed in practice, and classical statistics and optimization theory do not explain this empirical success well, which prevents us from designing more efficient and effective algorithms in a systematic way. Our research aims to bridge this gap between practice and theory by developing new nonconvex machine learning algorithms and theories.
Deep learning has achieved great success in applications such as image processing, speech recognition, and the game of Go. However, why deep learning is so powerful remains elusive. The goal of this project is to understand the success of deep learning by studying and building its theoretical foundations.
High-dimensional data, where the dimension is comparable to or even larger than the sample size, are ubiquitous in modern Big Data applications, ranging from text, images, and video to electronic medical records, genomics data, and neuroscience data. For example, neuroimaging techniques such as functional magnetic resonance imaging (fMRI) generate massive high-resolution image datasets that are high-dimensional (e.g., more than 80,000 voxels per image). The analysis of fMRI data can lead to important discoveries for understanding brain disorders such as Alzheimer’s disease, attention deficit hyperactivity disorder, and depression. High-dimensional machine learning methods play a central role in learning from such data.
Gene regulatory networks (GRNs) are highly dynamic across tissue types. Graphical models have been used to estimate GRNs from gene expression data, because they distinguish direct interactions from indirect associations. However, most existing methods either estimate a GRN for a single cell or tissue type, estimate it in a tissue-naive way, or do not specifically address network rewiring between tissues. We have applied a new graphical model that addresses these limitations to study gene regulatory networks across different tissue types.
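To illustrate how a graphical model separates direct interactions from indirect associations, the following minimal sketch uses a generic Gaussian graphical model, in which a zero entry of the precision (inverse covariance) matrix corresponds to conditional independence. It is not the model used in this project; the three-gene chain A - B - C and the function name `partial_correlations` are hypothetical and chosen only for illustration.

```python
import numpy as np


def partial_correlations(cov):
    """Return the partial correlation matrix implied by a covariance matrix."""
    precision = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


# Hypothetical chain A - B - C: genes A and C interact only through B,
# so the (A, C) entry of the precision matrix is zero.
precision_true = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])
cov = np.linalg.inv(precision_true)

# Marginal correlation: A and C look associated (the value is 1/3 here).
std = np.sqrt(np.diag(cov))
marginal_corr = cov / np.outer(std, std)

# Partial correlation: the A-C association vanishes once B is accounted for.
pcorr = partial_correlations(cov)

print("marginal corr(A, C):", round(float(marginal_corr[0, 2]), 4))
print("partial  corr(A, C):", round(abs(float(pcorr[0, 2])), 4))
print("partial  corr(A, B):", round(float(pcorr[0, 1]), 4))
```

In this toy example the marginal correlation between A and C is nonzero even though they are not directly connected, while the partial correlation recovers the chain structure. With real expression data the covariance would have to be estimated, and in high dimensions a regularized estimator would typically be needed.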
Privacy is a critical issue that must be addressed when deploying large-scale distributed machine learning systems. Privacy protection concerns every party that participates in information sharing and exchange through modern information technologies. While privacy-preserving machine learning has been studied extensively, distributed privacy-preserving machine learning remains under-investigated. My collaborator and I have employed a multiparty computation protocol (garbled circuits) in distributed high-dimensional machine learning to aggregate local estimators. Our work enables organizations, such as hospitals and banks, that are willing to share their data to improve the performance of data mining and management while protecting privacy.
Reinforcement learning (RL) addresses sequential decision-making problems in which an agent repeatedly interacts with an environment and chooses actions to maximize the cumulative reward for a specific task. RL has achieved significant success in solving complex problems such as learning robotic motion skills, autonomous driving, and game playing. Our research focuses on the sample efficiency and the exploration-exploitation trade-off of RL, both of which are of central interest in modern machine learning.
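As a concrete picture of the exploration-exploitation trade-off, the sketch below runs the textbook epsilon-greedy rule on a simulated multi-armed bandit. It is only an illustration under stated assumptions, not an algorithm from this research: the function name `run_epsilon_greedy`, the Bernoulli reward model, and the arm means are all hypothetical.

```python
import numpy as np


def run_epsilon_greedy(true_means, epsilon, steps, seed=0):
    """Simulate epsilon-greedy on a Bernoulli bandit (illustrative only).

    With probability `epsilon` the agent explores a uniformly random arm;
    otherwise it exploits the arm with the highest estimated mean reward.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    estimates = np.zeros(n_arms)        # running mean reward per arm
    counts = np.zeros(n_arms, dtype=int)  # number of pulls per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))   # explore
        else:
            arm = int(np.argmax(estimates))   # exploit
        reward = float(rng.random() < true_means[arm])
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


# Hypothetical arm means; the last arm is the best one.
estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.8], epsilon=0.1, steps=5000)
print("pulls per arm:", counts)
print("estimated means:", np.round(estimates, 2))
```

A larger `epsilon` spends more interactions on exploration and fewer on the arm that currently looks best; a smaller one risks committing to a suboptimal arm early. Sample efficiency asks how few interactions are needed to act near-optimally.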
The novel coronavirus disease (COVID-19) has emerged as a global pandemic, and the global death toll had reached 100,000 as of April 10, 2020. The volume of data about COVID-19 is overwhelming, yet the use of these data for combating the disease is still in its early stages. The overarching goal of this project is to make good use of these data through machine learning: to better understand the spread of COVID-19, to facilitate informed decisions by policy makers, and to better allocate medical resources such as medical workers, personal protective equipment (PPE), and ventilators.