Dung Ngoc Nguyen

About Me

Hello and welcome! My full name is Dung Ngoc Nguyen, but most people call me Dung, with the D pronounced like the English Z.

I am currently a postdoctoral research fellow in Statistics under the supervision of Professor Alberto Roverato at the Department of Statistical Sciences, University of Padova, Italy. Our research focuses on developing new statistical tools for learning the structure of high-dimensional complex networks in non-standard experimental setups.

My research interests lie in Data Science. In particular, I apply machine learning and statistical methodology to analyze and interpret large, complex datasets, especially biological and genetic data.

I am a motivated young researcher who wishes to leave a lasting legacy in science, and I am always glad and enthusiastic to take part in collaborations.

Interests

  • Data Science
  • Statistical Learning
  • Probabilistic Graphical Models
  • Machine Learning

Education

  • Ph.D. in Statistical Sciences, 2018-2021

    Università degli Studi di Padova, Italy

  • M.S. in Applied Mathematics, 2017-2018

    Université de Tours, France

  • B.S. in Mathematics and Computer Sciences, 2013-2017

    Vietnam National University, Ho Chi Minh City University of Science, Vietnam

Publications

A non-asymptotic theory for model selection in high-dimensional mixture of experts via joint rank and variable selection

Mixture of experts (MoE) models are among the most popular and interesting combination techniques, with great potential for improving the performance of machine learning and statistical learning systems. We are the first to consider a polynomial softmax-gated block-diagonal mixture of experts (PSGaBloME) model for identifying potentially nonlinear regression relationships in complex, high-dimensional heterogeneous data, where the number of explanatory and response variables can be much larger than the sample size and hidden graph-structured interactions may exist. PSGaBloME models are characterized by several hyperparameters, including the number of mixture components, the complexity of the softmax gating networks and Gaussian mean experts, and the hidden block-diagonal structures of the covariance matrices. We contribute a non-asymptotic theory for model selection of such complex hyperparameters, using the slope heuristic approach within a penalized maximum likelihood estimation (PMLE) framework. In particular, we establish a non-asymptotic risk bound on the PMLE, in the form of an oracle inequality, under lower-bound assumptions on the penalty function. Furthermore, we propose two Lasso–MLE–rank procedures, based on a new generalized expectation–maximization algorithm, to tackle the estimation problem for the collection of PSGaBloME models.
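To make the "softmax-gated" and "Gaussian mean experts" ingredients concrete, here is a minimal sketch of a generic softmax-gated Gaussian mixture of experts conditional density for a scalar input and a scalar response. It is an illustration under simplifying assumptions, not the PSGaBloME model from the paper: the gating scores and expert means are linear rather than polynomial, there is no block-diagonal covariance, and no estimation (Lasso, MLE, or rank selection) is performed. The function names `softmax`, `gaussian_pdf`, and `moe_density` and the parameter tuples are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax of a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def moe_density(y, x, gates, experts):
    """Conditional density s(y | x) = sum_k g_k(x) * N(y; mu_k(x), sd_k^2).

    Hypothetical, simplified parameterization (linear, not polynomial):
      gates:   list of (w0, w1); gating score_k(x) = w0 + w1 * x
      experts: list of (b0, b1, sd); expert mean mu_k(x) = b0 + b1 * x
    """
    # Softmax gating: mixture weights depend on the input x and sum to 1.
    weights = softmax([w0 + w1 * x for (w0, w1) in gates])
    # Weighted sum of Gaussian expert densities evaluated at y.
    return sum(
        g * gaussian_pdf(y, b0 + b1 * x, sd)
        for g, (b0, b1, sd) in zip(weights, experts)
    )
```

For example, with `gates = [(0.0, 1.0), (0.0, -1.0)]`, both gating scores are zero at `x = 0`, so each expert receives weight 0.5; as `x` grows, the first expert gradually takes over. This input-dependent weighting is what distinguishes a mixture of experts from an ordinary mixture model.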

Achievements

Course: Machine Learning Specialization

Lecturer: Andrew Ng

Objectives: a broad introduction to modern machine learning, covering supervised learning (multiple linear regression, logistic regression, neural networks, and decision trees), unsupervised learning (clustering, dimensionality reduction, recommender systems), and some of the best practices used in Silicon Valley for artificial intelligence and machine learning innovation (evaluating and tuning models, taking a data-centric approach to improving performance, and more).

See certificate

Course: Statistical Aspects of Deep Neural Networks

Lecturer: Omiros Papaspiliopoulos

Objectives:

  • Cover the foundations of deep neural networks, with particular emphasis on the statistical aspects of this research agenda, e.g., high-dimensional regression, random features models, from PCA to autoencoders, optimisation for neural networks, from deep neural networks to stochastic processes, regularisation, stability, and adversarial training.
  • Give an overview of software and implementations in PyTorch.

See certificate

Projects

Algorithms for graphical models with symmetries

Summary: This activity is part of the AFOSR project and specifically aims to identify a network from a set of data when the reference context has a natural symmetric structure. Specifically, the network can be seen as the union of two interconnected blocks, where each vertex in the first block has a homologous vertex in the second block. This problem can be framed as a special case of a colored graphical model in which the model space forms a lattice. Model selection through greedy procedures therefore requires algorithms that explore the search space efficiently through local steps.
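The two-block symmetry described above can be sketched in code. The following is a minimal illustration, assuming vertices are encoded as `(block, index)` pairs with block `"L"` (first block) or `"R"` (second block), so that vertex `("L", i)` is homologous to `("R", i)`. Swapping every vertex with its homologue maps each edge to a "twin" edge; a graph is symmetric when every edge's twin is also present, and edges grouped with their twins are the natural candidates for sharing a colour (equal parameters) in a coloured model. The helper names are hypothetical, and the actual coloured model classes, their lattice, and the greedy search used in the project are not reproduced here.

```python
def homologous(v):
    """Map a vertex (block, index) to its homologous vertex in the other block."""
    block, i = v
    return ("R" if block == "L" else "L", i)

def twin(edge):
    """Image of an undirected edge under the block-swapping symmetry."""
    u, v = tuple(edge)
    return frozenset({homologous(u), homologous(v)})

def edge_type(edge):
    """Classify an edge as within the first block, within the second, or across."""
    blocks = {b for b, _ in edge}
    if blocks == {"L"}:
        return "within-L"
    if blocks == {"R"}:
        return "within-R"
    return "across"

def is_symmetric(edges):
    """True if, for every edge, its homologous twin is also an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(twin(e) in edge_set for e in edge_set)

def symmetric_pairs(edges):
    """Group edges into orbits {e, twin(e)}; an across edge (L,i)-(R,i) is its own twin."""
    edge_set = {frozenset(e) for e in edges}
    return {frozenset({e, twin(e)}) for e in edge_set}
```

For instance, the edge set `{L1-L2, R1-R2, L1-R1, L1-R2, L2-R1}` is symmetric and splits into three orbits: the within-block pair `{L1-L2, R1-R2}`, the self-twin edge `{L1-R1}`, and the across-block pair `{L1-R2, L2-R1}`. Dropping `R1-R2` breaks the symmetry.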

Talks

  • On model selection of coloured Gaussian graphical models for paired data
  • Model selection for coloured graphical models for paired data