CUWB-IV: Frontiers in statistics and probability

CIMAT

UNAM

University of Bath

Minicourse speakers: titles and abstracts
Gareth Roberts

Retrospective Simulation

These lectures will take a fresh perspective on stochastic simulation for use in Bayesian Statistics and Applied Probability. Put simply, retrospective simulation techniques subvert the traditional order of steps in existing sampling algorithms (including inversion samplers, rejection and importance sampling, Markov chain Monte Carlo) in order to effect (often huge) gains in algorithmic efficiency. This short course will introduce the basic techniques, and illustrate them in a number of examples which will include non-reversible MCMC algorithms, simulation and importance sampling of diffusion processes, and draws for major football competitions.
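To give a flavour of the retrospective idea (an illustrative sketch, not taken from the lectures; the function name and interface are invented here): the sampler below returns a Bernoulli draw with success probability exp(-c) without ever computing exp(-c). It lazily refines the alternating-series bounds on exp(-c) only as far as the uniform draw requires — the "decide first, compute later" reordering that retrospective simulation exploits.

```python
import random

def retro_bernoulli_exp(c, rng=random):
    """Return True with probability exp(-c), for 0 < c <= 1, without
    evaluating exp(-c).  Successive partial sums of the alternating
    series exp(-c) = sum_k (-c)^k / k! bracket the true value, so we
    refine them only until the uniform draw is resolved."""
    u = rng.random()
    s, term, k = 1.0, 1.0, 0
    lower, upper = 0.0, 1.0
    while True:
        k += 1
        term *= -c / k
        s += term
        if k % 2 == 1:
            lower = s            # odd partial sums undershoot exp(-c)
        else:
            upper = s            # even partial sums overshoot exp(-c)
        if u <= lower:
            return True          # u is certainly below exp(-c): accept
        if u >= upper:
            return False         # u is certainly above exp(-c): reject
```

Most draws are resolved after one or two terms; the exact answer is obtained at a random, typically tiny, amount of work.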

Jere Koskela

An ancestral perspective on genetics and genetic algorithms

Populations of particles reproducing at random in discrete generations are canonical models of population genetic evolution. Predicting the genetic diversity in a sample from such a population quickly leads one to consider the random tree describing the common ancestry of that sample. Scaling limits of these random trees for large populations give rise to the field of coalescent theory and yield tractable predictions of genetic diversity, particularly for neutrally evolving populations. The same populations of particles describe a class of statistical learning algorithms known as particle filters. For particle filters, neutral evolution is applicable only to trivial statistical problems, so many of the innovations of coalescent theory are difficult to leverage. Indeed, the fields of mathematical population genetics and particle filtering have been largely disjoint for decades. These lectures will introduce standard tools of mathematical population genetics, as well as particle filters as general-purpose statistical algorithms. Emphasis will be given to the ancestral perspective, and to connections between the two distinct application domains.
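A minimal illustration of the shared particle system (an illustrative sketch, not course material; the linear-Gaussian model and all names are invented for this example): the bootstrap particle filter below also records parent indices at each resampling step, so the genealogy of the surviving particles — the ancestral object studied in coalescent theory — can be traced backwards.

```python
import random, math

def bootstrap_pf(ys, n=200, a=0.9, sx=1.0, sy=1.0, rng=random):
    """Bootstrap particle filter for x_t = a*x_{t-1} + N(0, sx^2),
    y_t = x_t + N(0, sy^2).  Records parent indices so that the
    genealogy of the particle system can be reconstructed."""
    xs = [rng.gauss(0.0, sx) for _ in range(n)]
    parents = []                  # parents[t][i] = parent slot of particle i
    for y in ys:
        # weight each particle by the observation density (up to a constant)
        ws = [math.exp(-0.5 * ((y - x) / sy) ** 2) for x in xs]
        # multinomial resampling: children pick parents with prob ∝ weight
        idx = rng.choices(range(n), weights=ws, k=n)
        parents.append(idx)
        # propagate each resampled particle through the dynamics
        xs = [a * xs[i] + rng.gauss(0.0, sx) for i in idx]
    return xs, parents

def ancestor(parents, t, i):
    """Trace particle i at time t back to its time-0 ancestor slot."""
    for ps in reversed(parents[: t + 1]):
        i = ps[i]
    return i
```

Tracing every final particle back with `ancestor` reveals the rapid coalescence of particle genealogies: after a modest number of resampling steps, the surviving particles share far fewer distinct founders than the population size.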

Adrian Gonzalez Casanova

and

Imanol Nuñez

Moment duality and the propagation of exchangeability

Heuristically, two processes are said to be dual if there exists a function that allows one process to be studied through the other. Sampling duality is a specific form of duality that utilizes a function S(n, x) representing the probability that all individuals in a sample of size n belong to a certain type, given that the total number (or frequency) of that type in the population is x. While this idea can be traced back implicitly to Blaise Pascal (1623–1662), it was explicitly formalized by Martin Möhle in 1999 in the context of population genetics. We will present applications of sampling duality to interacting particle systems such as the simple exclusion process. Additionally, we will discuss a universality result for the Fisher-KPP stochastic partial differential equation. A key focus will be the relationship between exchangeability and duality, providing insights into the lookdown construction. Finally, we will examine a characterization of exchangeable Markov chains and explore how it naturally connects with sampling duality.
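As a standard concrete instance of this duality (supplementary to the abstract): for the Wright–Fisher diffusion, the sampling function S(n, x) = x^n puts the diffusion in moment duality with the block-counting process of the Kingman coalescent.

```latex
Let $(X_t)_{t\ge 0}$ be the Wright--Fisher diffusion with generator
$\mathcal{L}f(x) = \tfrac{1}{2}x(1-x)f''(x)$ on $[0,1]$, and let
$(N_t)_{t\ge 0}$ be the block-counting process of the Kingman
coalescent, i.e.\ the pure-death chain that jumps from $n$ to $n-1$
at rate $\binom{n}{2}$.  Taking $S(n,x) = x^n$, the probability that
all $n$ sampled individuals carry the focal type when its frequency
is $x$, one obtains the moment duality
$$\mathbb{E}\left[X_t^n \,\middle|\, X_0 = x\right]
  = \mathbb{E}\left[x^{N_t} \,\middle|\, N_0 = n\right].$$
The identity follows by matching generators applied to $S$:
$$\tfrac{1}{2}\,x(1-x)\,\partial_x^2\, x^n
  = \binom{n}{2}\left(x^{n-1} - x^n\right),$$
which is exactly the rate at which the death chain carries
$x^n$ to $x^{n-1}$.
```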

Talks by:
 

Anita Behme (Dresden)
Emilien Joly (CIMAT)
Maria Fernanda Gil Leyva Villa (UNAM)
Kari Heine (Bath)
Lizbeth Peñaloza
Leticia Ramirez Ramirez (CIMAT)
Yi Yu (Warwick)
 
See abstracts below.

 

Schedule

The programme is still to be confirmed, but we plan on the following schedule:

TIME         Monday                     Tuesday                    Wednesday                  Thursday                   Friday
9:30-10:30   Gareth Roberts             Gonzalez Casanova + Nuñez  Gonzalez Casanova + Nuñez  Gareth Roberts             Jere Koskela
10:30-11:00  Coffee                     Coffee                     Coffee                     Coffee                     Coffee
11:00-12:00  Jere Koskela               Jere Koskela               Gareth Roberts             Gonzalez Casanova + Nuñez  Marifer Gil
12:00-12:45  Kari Heine                 Yi Yu                      Student                    Anita Behme                Student
12:45-13:00  Student                    Student                    Student                    Student                    Student
13:00-15:00  Lunch                      Lunch                      Free time                  Lunch                      END
15:00-16:00  Gonzalez Casanova + Nuñez  Gareth Roberts             --                         Jere Koskela               --
16:00-16:30  Coffee                     Coffee                     --                         Coffee                     --
16:30-17:15  Lizbeth Peñaloza           Leticia Ramirez            --                         Emilien Joly               --
17:15-17:30  Student                    Student                    --                         Student                    --

(Wednesday afternoon is free; the workshop ends at 13:00 on Friday.)

Titles and abstracts for the talks:

Anita Behme: Siegmund-duality for Markov processes

According to Siegmund (1976), two time-homogeneous Markov processes $X,Y$ on $\mathbb{R}_+$ are dual if, for all $t,x,y\geq 0$, $$\mathbb{P}^x(X_t\leq y) = \mathbb{P}^y(Y_t\geq x).$$ This duality is a helpful tool in applied probability, as it allows one (under suitable regularity conditions) to express the stationary law of one of the processes via hitting probabilities of the other. We recall a few well-known examples of pairs of dual Markov processes and their applications, add new case studies, and discuss how to find a dual process in the general context of Lévy-type processes. Further, we will shed some light on the connection between this duality and the related concept of time-reversal as used in the theory of semimartingales.
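To make the definition concrete (an illustrative sketch, not part of the talk; function names are ours): for a stochastically monotone transition matrix P on a finite state space {0, ..., M}, a Siegmund dual kernel Q can be read off directly from the defining identity, and since P D = D Qᵀ for D(x, y) = 1{x ≤ y}, the duality then propagates to all powers of the kernels.

```python
def siegmund_dual(P):
    """Siegmund dual of a stochastically monotone transition matrix P
    (a list of rows) on {0, ..., M}:
        Q[y][x] = P(x, [0, y]) - P(x + 1, [0, y]),
    with the second term taken as 0 when x = M.  Q may be
    substochastic; the missing mass is absorption of the dual chain."""
    M = len(P) - 1
    # F[x][y] = P(x, [0, y]): row-wise cumulative sums of P
    F = [[sum(row[: y + 1]) for y in range(M + 1)] for row in P]
    return [[F[x][y] - (F[x + 1][y] if x < M else 0.0)
             for x in range(M + 1)]
            for y in range(M + 1)]

def mat_mul(A, B):
    """Plain product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

For any t one then has P^t(x, [0, y]) = Q^t(y, [x, M]), which is the discrete-time, finite-state version of the displayed identity.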

 

Emilien Joly: GROS: A Unified Framework for Robust Aggregation in Metric Spaces with Applications to Machine Learning and Statistics
 
In this talk, I will present GROS (General Robust Aggregation Strategy), a novel framework for robustly combining estimators in metric spaces. GROS is inspired by the median-of-means approach but extends it to a much broader class of problems, including clustering, regression, bandits, set estimation, and topological data analysis. The key idea is simple yet powerful: partition the data into K groups, compute an estimator for each group, and then aggregate these estimators using a robust minimization procedure. The resulting estimator is provably sub-Gaussian and achieves a high breakdown point, making it resilient to outliers and adversarial data. I will also discuss how GROS can be efficiently implemented in practice, with only a constant factor loss in performance compared to the theoretical ideal. Finally, I will outline future directions for applying GROS to other domains where robustness to outliers or adversarial data is critical.
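The aggregation pattern that inspires GROS can be sketched in a few lines on the real line (GROS itself works in general metric spaces; the function below is an illustration of the median-of-means idea, not the authors' implementation):

```python
import random, statistics

def median_of_means(xs, k, rng=random):
    """Median-of-means on the real line: randomly partition the data
    into k blocks, average each block, and return the median of the
    block means.  A few wild outliers can ruin some blocks, but the
    median of the block means remains stable."""
    xs = list(xs)
    rng.shuffle(xs)                   # random partition into blocks
    m = len(xs) // k                  # block size (a remainder is dropped)
    means = [sum(xs[i * m:(i + 1) * m]) / m for i in range(k)]
    return statistics.median(means)
```

In a metric space the final step is replaced by a robust minimization over the K group estimators, which is where the sub-Gaussian guarantee of GROS comes from.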
 
Maria Fernanda Gil Leyva Villa: Ordered allocation sampling in Bayesian nonparametrics
 
Markov chain Monte Carlo (MCMC) methods such as Gibbs samplers and Metropolis-Hastings algorithms are standard tools for posterior inference in Bayesian statistics. For nonparametric models these methods are challenging to design, as the target distribution is often infinite-dimensional. The most efficient available algorithms rely on Pólya urn schemes that describe the evolution of a (possibly latent) exchangeable partition. Consequently, their application requires an explicit expression for the so-called Exchangeable Partition Probability Function (EPPF), which is only available for a handful of models. Here we propose to replace the exchangeable (unordered) partition with a partition whose blocks are in the least-element order. Given the long-run proportions of elements in each of these blocks, a conditional Pólya urn scheme is obtained. This yields a new class of MCMC methods that does not require an analytically tractable EPPF yet maintains the nice convergence properties of the most efficient available samplers.
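For context (an illustrative sketch, not the sampler proposed in the talk): the canonical Pólya urn scheme with a tractable EPPF is the Chinese restaurant process associated with the Dirichlet process, whose predictive rule is simple enough to state in a few lines.

```python
import random

def crp_partition(n, alpha, rng=random):
    """Chinese restaurant process, the Polya urn scheme of the
    Dirichlet process: element i starts a new block with probability
    alpha / (i + alpha), and otherwise joins an existing block with
    probability proportional to its current size."""
    blocks = []                        # current block sizes
    labels = []                        # labels[i] = block index of element i
    for i in range(n):
        if rng.random() < alpha / (i + alpha):
            labels.append(len(blocks))  # open a new block
            blocks.append(1)
        else:
            r = rng.random() * i        # i elements already placed
            acc = 0.0
            for j, size in enumerate(blocks):
                acc += size
                if r < acc:             # block j chosen w.p. size / i
                    labels.append(j)
                    blocks[j] += 1
                    break
    return labels, blocks
```

For general models no such closed-form predictive rule is available, which is precisely the obstacle the ordered-allocation construction above is designed to remove.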
 
Kari Heine: Augmented island resampling particle filter for particle MCMC
 
The ability to carry out computations in parallel is paramount to efficient implementations of computationally intensive algorithms. We investigate the applicability of the Augmented Island Resampling Particle Filter (AIRPF), an algorithm designed for parallel computing, to particle Markov chain Monte Carlo (PMCMC), and show that it produces a non-negative, unbiased estimator of the marginal likelihood, making it suitable for PMCMC. Moreover, we extend the stability results previously shown for the so-called αSMC algorithm to cover AIRPF. As a corollary, the error of AIRPF can be bounded uniformly in time by controlling the effective number of filters, a diagnostic analogous to the effective sample size. Such control can be implemented by adaptively constraining the interactions between the parallel filters. We demonstrate the superiority of AIRPF over independent bootstrap particle filters, not only numerically but also theoretically. In this context, we extend the previously proposed collision analysis approach to derive an explicit expression for the variance of the marginal likelihood estimate, and establish an unexpected connection between the filter network topology and the marginal likelihood variance in terms of the Fibonacci sequence.
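The "effective number" diagnostics mentioned in the abstract follow the pattern of the effective sample size of a set of importance weights (a standard formula, shown here only for illustration):

```python
def effective_sample_size(ws):
    """Effective sample size of nonnegative weights:
    ESS = (sum w)^2 / (sum w^2).  Equals len(ws) when all weights are
    equal, and tends to 1 as a single weight dominates."""
    s = sum(ws)
    return s * s / sum(w * w for w in ws)
```

Adaptive schemes of the kind described keep such a diagnostic above a threshold by modifying interactions (here, between filters rather than between particles) whenever it drops too low.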
 
Lizbeth Peñaloza: The time to the most recent common ancestor (TMRCA) of genealogies in populations of variable size
 
In biology, particularly in population genetics, the theory of coalescent processes is used to model the parental relationships of a given sample or population as we trace the ancestry of individuals backward in time, thus constructing a genealogical tree. Once we have a suitable coalescent model for the genealogy of a population, we can employ mathematical tools to tackle biological questions, such as determining the time to the most recent common ancestor (TMRCA) of a sample. In this talk, I will present results on the density and the moments of the TMRCA for time-inhomogeneous coalescent processes describing the genealogies of populations evolving under deterministically varying population size, using recent results on inhomogeneous phase-type random variables. This is joint work with Alejandro H. Wences, Matthias Steinrücken, and Arno Siri-Jégousse.
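For the constant-size baseline (an illustration, not the time-inhomogeneous setting of the talk), the Kingman-coalescent TMRCA of an n-sample is a sum of independent exponential waiting times, which is straightforward to simulate:

```python
import random

def tmrca(n, rng=random):
    """Time to the most recent common ancestor of an n-sample under
    the constant-size Kingman coalescent: while k lineages remain,
    the next pairwise merger occurs after an Exp(k*(k-1)/2) time."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)
    return t
```

Summing the expectations 2/(k(k-1)) gives E[TMRCA] = 2(1 - 1/n) in coalescent time units; deterministically varying population size replaces these homogeneous exponential rates with time-dependent ones, which is where the phase-type machinery of the talk enters.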
 
Leticia Ramirez Ramirez: Statistical Inference of Censored Data in Non-Homogeneous Poisson Processes
 
In this talk, we propose a method for statistical inference on censored data in the context of non-homogeneous Poisson processes, with a particular focus on Hawkes processes. These are flexible models for data in which previous events can either stimulate or inhibit future events. Owing to this flexibility, Hawkes processes can model complex temporal phenomena, such as the occurrence of earthquakes, financial transactions, or social interactions, where there is temporal dependence between events. We discuss the challenges that arise when working with censored data and present inference techniques that allow us to estimate the parameters of these models while accounting for censoring. We evaluate the effectiveness of these methods through computational experiments and finally apply the methodology to a real case in the field of mental health.
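A self-exciting process of the kind described can be simulated by Ogata-style thinning. The sketch below (an illustration with an exponential kernel and invented parameter names, not the inference method of the talk) exploits the fact that between events the intensity only decays, so the current intensity is a valid rejection bound:

```python
import random, math

def simulate_hawkes(mu, alpha, beta, horizon, rng=random):
    """Ogata thinning for a Hawkes process with intensity
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).
    Requires alpha < beta for stability.  Because the intensity only
    decays between events, mu + g is an upper bound for the next
    candidate point."""
    events, t, g = [], 0.0, 0.0     # g = current excitation above mu
    while True:
        lam_bar = mu + g            # upper bound on intensity after t
        w = rng.expovariate(lam_bar)
        t += w
        if t > horizon:
            return events
        g *= math.exp(-beta * w)    # decay over the waiting time
        if rng.random() < (mu + g) / lam_bar:   # thinning step
            events.append(t)
            g += alpha              # each event adds a jump of size alpha
```

The long-run event rate is mu / (1 - alpha/beta); censoring an observation window breaks the simple likelihood of this construction, which is the difficulty the talk addresses.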
 
Yi Yu: Optimal federated learning under differential privacy constraints

In this talk, I will start with an overview of the foundational concept of differential privacy (DP). I will then introduce three notions of DP tailored to the federated learning context, highlighting their relevance and implications in distributed settings. The core focus of the talk will be a functional data estimation problem under a hierarchical and heterogeneous DP framework. I will discuss how privacy constraints impact estimation accuracy and quantify these tradeoffs through the lens of minimax theory. Key aspects of the proofs will also be outlined, along with some numerical results.
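As a minimal concrete instance of a DP release (background illustration only, not the estimators of the talk), the classical Laplace mechanism adds noise calibrated to sensitivity/epsilon:

```python
import random

def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Release value plus centred Laplace(sensitivity / epsilon)
    noise.  If changing one individual's data moves `value` by at
    most `sensitivity`, the release is epsilon-differentially
    private.  A Laplace(b) variable is the difference of two
    independent Exponential(1/b) variables."""
    scale = sensitivity / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise
```

The privacy/accuracy tradeoff is already visible here: smaller epsilon means larger noise scale, and the minimax theory in the talk quantifies the analogous phenomenon for functional estimation under hierarchical DP constraints.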


Organizers
Dario Spanò, Daniel Kious, Andreas Kyprianou, Giuseppe Cannizzaro, Arno Siri-Jégousse, Sandra Palau, Juan Carlos Pardo, Victor Rivero, Paul Jenkins