Dung Ngoc Nguyen

About Me

Hello and welcome! My full name is Dung Ngoc Nguyen, but most people call me Dung, with the D pronounced like an English Z.

I am currently a postdoctoral research fellow in Statistics under the supervision of Professor Alberto Roverato at the Department of Statistical Sciences, University of Padova, Italy. Our research aims to develop new statistical tools for learning the structure of complex, high-dimensional networks in non-standard experimental setups.

My research interests lie in Data Science. In particular, I apply Machine Learning and statistical methodologies to analyze and interpret large, complex datasets, especially data related to biology and genetics.

I am a motivated young researcher who hopes to make a lasting contribution to science, and I welcome any opportunity to collaborate.

Interests

Data Science
Statistical Learning
Probabilistic Graphical Models
Machine Learning

Education

Ph.D. in Statistical Sciences, 2018-2021
Università degli Studi di Padova, Italy

M.S. in Applied Mathematics, 2017-2018
Université de Tours, France

B.S. in Mathematics and Computer Sciences, 2013-2017
Vietnam National University-Ho Chi Minh University of Science, Vietnam

Publications

Alberto Roverato, Dung Ngoc Nguyen

Mar. 9, 2023 arXiv:2303.05561

Exploration of the search space of Gaussian graphical models for paired data

We consider the problem of learning a Gaussian graphical model in the case where the observations come from two dependent groups sharing the same variables. We focus on a family of coloured Gaussian graphical models specifically suited for the paired data problem. Commonly, graphical models are ordered by the submodel relationship so that the search space is a lattice, called the model inclusion lattice. We introduce a novel order between models, named the twin order. We show that, embedded with this order, the model space is a lattice that, unlike the model inclusion lattice, is distributive. Furthermore, we provide the relevant rules for the computation of the neighbours of a model. The latter are more efficient than the same operations in the model inclusion lattice, and are then exploited to achieve a more efficient exploration of the search space. These results can be applied to improve the efficiency of both greedy and Bayesian model search procedures. Here we implement a stepwise backward elimination procedure and evaluate its performance by means of simulations. Finally, the procedure is applied to learn a brain network from fMRI data where the two groups correspond to the left and right hemispheres, respectively.

TrungTin Nguyen, Dung Ngoc Nguyen, Hien Duy Nguyen, Faicel Chamroukhi

Feb. 11, 2023 HAL-03984011

A non-asymptotic theory for model selection in high-dimensional mixture of experts via joint rank and variable selection

Mixture of experts (MoE) models are among the most popular and interesting combination techniques, with great potential for improving the performance of machine learning and statistical learning systems. We are the first to consider a polynomial softmax-gated block-diagonal mixture of experts (PSGaBloME) model for the identification of potentially nonlinear regression relationships for complex and high-dimensional heterogeneous data, where the number of explanatory and response variables can be much larger than the sample size and possibly hidden graph-structured interactions exist. These PSGaBloME models are characterized by several hyperparameters, including the number of mixture components, the complexity of softmax gating networks and Gaussian mean experts, and the hidden block-diagonal structures of covariance matrices. We contribute a nonasymptotic theory for model selection of such complex hyperparameters with the help of the slope heuristic approach in a penalized maximum likelihood estimation (PMLE) framework. In particular, we establish a non-asymptotic risk bound on the PMLE, which takes the form of an oracle inequality, given lower bound assumptions on the penalty function. Furthermore, we propose two Lasso–MLE–rank procedures, based on a new generalized expectation–maximization algorithm, to tackle the estimation problem of the collection of PSGaBloME models.

Alberto Roverato, Dung Ngoc Nguyen

Oct. 5, 2022 Proceedings of Machine Learning Research – The 11th International Conference on Probabilistic Graphical Models

Model inclusion lattice of coloured Gaussian graphical models for paired data

We consider the problem of learning a graphical model when the observations come from two groups sharing the same variables but, unlike the usual approach to the joint learning of graphical models, the two groups do not correspond to different populations and therefore produce dependent samples. A Gaussian graphical model for paired data may be implemented by applying the methodology developed for the family of graphical models with edge and vertex symmetries, also known as coloured graphical models. We identify a family of coloured graphical models suited for the paired data problem and investigate the structure of the corresponding model space. More specifically, we provide a comprehensive description of the lattice structure formed by this family of models under the model inclusion order. Furthermore, we give rules for the computation of the join and meet operations between models, which are useful in the exploration of the model space. These are then applied to implement a stepwise model search procedure and an application to the identification of a brain network from fMRI data is given.

Achievements

Course: Machine Learning Specialization

Coursera Dec. 24, 2022 – Dec. 30, 2022

Lecturer: Andrew Ng

Objectives: Provide a broad introduction to modern machine learning, including supervised learning (multiple linear regression, logistic regression, neural networks, and decision trees), unsupervised learning (clustering, dimensionality reduction, recommender systems), and some of the best practices used in Silicon Valley for artificial intelligence and machine learning innovation (evaluating and tuning models, taking a data-centric approach to improving performance, and more).

See certificate

Course: Statistical Aspects of Deep Neural Networks

Ph.D. Programme in Economics, Statistics and Data Science, University of Milano-Bicocca, Italy Oct. 5, 2021 – Oct. 27, 2021

Lecturer: Omiros Papaspiliopoulos

Objectives:

Cover the foundations of deep neural networks, with special emphasis on the more statistical aspects of this research agenda, e.g. high-dimensional regression, random features models, from PCA to autoencoders, optimisation for neural networks, from deep neural networks to stochastic processes, regularisation, stability and adversarial training, etc.
Give an overview of software and implementations in PyTorch.

See certificate

Projects

Algorithms for graphical models with symmetries

Short-term research contract, Department of Statistical Sciences "Paolo Fortunati", University of Bologna, Italy Aug. 1, 2021 – Oct. 31, 2021

Summary: The activity is part of the AFOSR project, and its specific objective is to identify a network from data when the reference context has a natural symmetric structure. Specifically, the network can be seen as the union of two interconnected blocks, where each vertex in the first block has a homologous vertex in the second block. This problem can be seen as a special case of a coloured graphical model in which the model space forms a lattice. Model selection through greedy procedures therefore requires algorithms that explore the search space efficiently through local steps.

Talks

On model selection of coloured Gaussian graphical models for paired data

Dec. 13, 2022 – Dec. 16, 2022 IMS International Conference on Statistics and Data Science, Florence, Italy.

Model selection for coloured graphical models for paired data

Sep. 21, 2022 – Sep. 23, 2022 International Conference on Statistical Methods and Models for Complex Data, Department of Statistical Sciences, University of Padova, Italy.