CUWB-IV: Frontiers in statistics and probability

CIMAT, UNAM, University of Bath

All lectures will take place at IIMAS in Building C, room C13.

Minicourse speaker Title and Abstract
Gareth Roberts

Retrospective Simulation

These lectures will take a fresh perspective on stochastic simulation for use in Bayesian Statistics and Applied Probability. Put simply, retrospective simulation techniques subvert the traditional order of steps in existing sampling algorithms (including inversion samplers, rejection and importance sampling, Markov chain Monte Carlo) in order to effect (often huge) gains in algorithmic efficiency. This short course will introduce the basic techniques, and illustrate them in a number of examples which will include non-reversible MCMC algorithms, simulation and importance sampling of diffusion processes, and draws for major football competitions.

Jere Koskela

An ancestral perspective on genetics and genetic algorithms

Populations of particles reproducing at random in discrete generations are canonical models of population genetic evolution. Predicting the genetic diversity in a sample from such a population quickly leads one to consider the random tree describing the common ancestry of that sample. Scaling limits of these random trees for large populations give rise to the field of coalescent theory and yield tractable predictions of genetic diversity, particularly for neutrally evolving populations. The same populations of particles describe a class of statistical learning algorithms known as particle filters. For particle filters, neutral evolution is applicable only to trivial statistical problems, so many of the innovations of coalescent theory are difficult to leverage. Indeed, the fields of mathematical population genetics and particle filtering have been largely disjoint for decades. These lectures will introduce standard tools of mathematical population genetics, as well as particle filters as general-purpose statistical algorithms. Emphasis will be given to the ancestral perspective, and to connections between the two distinct application domains.

Adrian Gonzalez Casanova and Imanol Nuñez

Moment duality and the propagation of exchangeability

Heuristically, two processes are said to be dual if there exists a function that allows one process to be studied through the other. Sampling duality is a specific form of duality that utilizes a function S(n, x) which represents the probability that all individuals in a sample of size n belong to a certain type, given that the total number (or frequency) of that type in the population is x. While this idea can be traced back implicitly to Blaise Pascal (1623–1662), it was explicitly formalized by Martin Möhle in 1999 in the context of population genetics.  such as the simple exclusion process. Additionally, we will discuss a universality result for the Fisher-KPP stochastic partial differential equation. A key focus will be the relationship between exchangeability and duality, providing insights into the lookdown construction. Finally, we will examine a characterization of exchangeable Markov chains and explore how it naturally connects with sampling duality.
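
A standard special case makes the sampling function concrete (an illustrative textbook example, not necessarily the setting of the lectures). If individuals are sampled independently and the type has frequency $x$, then $S(n,x) = x^n$. Take $X$ to be the Wright-Fisher diffusion with generator $\frac{1}{2}x(1-x)\frac{d^2}{dx^2}$ and $N$ the block-counting process of the Kingman coalescent, which jumps from $n$ to $n-1$ at rate $\binom{n}{2}$. The two generators then agree on $S$:

$$\frac{1}{2}x(1-x)\frac{\partial^2}{\partial x^2}x^n = \binom{n}{2}\left(x^{n-1} - x^n\right),$$

which leads to the moment duality

$$\mathbb{E}_x\left[X_t^n\right] = \mathbb{E}_n\left[x^{N_t}\right], \qquad x\in[0,1],\ n\in\mathbb{N},\ t\geq 0.$$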

Talks by:
 

Anita Behme (Dresden)
Emilien Joly (CIMAT)
Maria Fernanda Gil Leyva Villa (UNAM)
Kari Heine (Bath)
Lizbeth Peñaloza
Leticia Ramirez Ramirez (CIMAT)
Yi Yu (Warwick)
 
See abstracts below.

 

Schedule

TIME        | Monday                    | Tuesday                   | Wednesday                                | Thursday                  | Friday
9:30-10:30  | Gareth Roberts            | Gonzalez Casanova + Nuñez | Cardenas, Krasnowska, Anzures, Angtuncio | Gareth Roberts            | Jere Koskela
10:30-11:00 | Coffee                    | Coffee                    | Coffee                                   | Coffee                    | Coffee
11:00-12:00 | Jere Koskela              | Jere Koskela              | Gareth Roberts                           | Gonzalez Casanova + Nuñez | Marifer Gil
12:00-12:45 | Kari Heine                | Yi Yu                     | Gonzalez Casanova + Nuñez                | Anita Behme               | END
12:45-13:00 | Quintanilla               | Molina                    |                                          | Hata                      |
13:00-15:00 | Lunch                     | Lunch                     | Free time                                | Lunch                     |
15:00-16:00 | Gonzalez Casanova + Nuñez | Gareth Roberts            |                                          | Jere Koskela              |
16:00-16:30 | Coffee                    | Coffee                    |                                          | Coffee                    |
16:30-17:15 | Lizbeth Peñaloza          | Leticia Ramirez           |                                          | Emilien Joly              |
17:15-17:30 | López                     | Bravo                     |                                          | Sapranidis                |

Titles and abstracts for the talks:

Anita Behme: Siegmund-duality for Markov processes

According to Siegmund (1976), two time-homogeneous Markov processes $X,Y$ on $\mathbb{R}_+$ are dual if, for all $t,x,y\geq 0$, $$\mathbb{P}^x(X_t\leq y) = \mathbb{P}^y(Y_t\geq x).$$ This duality is a helpful tool in applied probability, as it allows one (under suitable regularity conditions) to express the stationary law of one of the processes via hitting probabilities of the other process. We recall a few well-known examples of pairs of dual Markov processes and their applications, add new case studies, and discuss how to find a dual process in the general context of Lévy-type processes. Further, we will shed some light on the connection between the above duality and the related concept of time-reversal as used in the theory of semimartingales.
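
As a quick illustration of the duality identity above (a classical textbook example, not necessarily one from the talk), take $X$ to be Brownian motion absorbed at $0$ and $Y$ to be Brownian motion reflected at $0$. Writing $\Phi$ for the standard normal distribution function, the reflection principle gives, for $t>0$ and $x,y\geq 0$,

$$\mathbb{P}^x(X_t > y) = \Phi\left(\frac{x-y}{\sqrt{t}}\right) - \Phi\left(-\frac{x+y}{\sqrt{t}}\right) = \mathbb{P}^y(Y_t < x),$$

and taking complements yields $\mathbb{P}^x(X_t\leq y) = \mathbb{P}^y(Y_t\geq x)$.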

 

Emilien Joly: GROS: A Unified Framework for Robust Aggregation in Metric Spaces with Applications to Machine Learning and Statistics
 
In this talk, I will present GROS (General Robust Aggregation Strategy), a novel framework for robustly combining estimators in metric spaces. GROS is inspired by the median-of-means approach but extends it to a much broader class of problems, including clustering, regression, bandits, set estimation, and topological data analysis. The key idea is simple yet powerful: partition the data into K groups, compute an estimator for each group, and then aggregate these estimators using a robust minimization procedure. The resulting estimator is provably sub-Gaussian and achieves a high breakdown point, making it resilient to outliers and adversarial data. I will also discuss how GROS can be efficiently implemented in practice, with only a constant factor loss in performance compared to the theoretical ideal. Finally, I will outline future directions for applying GROS to other domains where robustness to outliers or adversarial data is critical.
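
The three steps described above (partition, estimate per group, aggregate robustly) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `robust_aggregate`, its parameters, and the specific aggregation rule are choices made here for illustration. The rule picks the group estimate whose smallest ball containing more than half of the group estimates has minimal radius, and it searches only over the group estimates themselves.

```python
import numpy as np


def robust_aggregate(data, estimator, distance, n_groups, seed=0):
    """Median-of-means-style aggregation (illustrative sketch, hypothetical API).

    data      : NumPy array of observations (split along the first axis)
    estimator : function mapping a group of observations to an estimate
    distance  : metric on the space where the estimates live
    n_groups  : number K of groups
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(data)            # random partition of the data
    groups = np.array_split(shuffled, n_groups)
    estimates = [estimator(g) for g in groups]  # one estimate per group

    # For each candidate, find the radius of the smallest ball around it
    # containing more than half of the K group estimates (itself included).
    k = n_groups // 2
    radii = []
    for candidate in estimates:
        dists = sorted(distance(candidate, other) for other in estimates)
        radii.append(dists[k])

    # Assumption: the minimization is restricted to the group estimates.
    return estimates[int(np.argmin(radii))]


# Usage: estimating a mean on the real line with 20 gross outliers.
rng = np.random.default_rng(1)
sample = np.concatenate([rng.normal(0.0, 1.0, size=1000), np.full(20, 1e6)])
estimate = robust_aggregate(sample, np.mean, lambda a, b: abs(a - b), n_groups=50)
```

With 50 groups and only 20 outliers, fewer than half of the groups can be contaminated, so the selected estimate comes from a clean group, whereas the plain sample mean is dragged far from zero.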
 
Maria Fernanda Gil Leyva Villa: Ordered allocation sampling in Bayesian nonparametrics
 
Markov chain Monte Carlo (MCMC) methods such as Gibbs samplers and Metropolis-Hastings algorithms are standard tools to perform posterior inference in Bayesian Statistics. For nonparametric models these methods are challenging to design, as the objective distribution is often infinite dimensional. The most efficient available algorithms rely on Pólya urn schemes that describe the evolution of a (possibly latent) exchangeable partition. Consequently, their application requires an explicit expression of the so-called Exchangeable Partition Probability Function (EPPF), which is only available for a handful of models. Here we propose to replace the exchangeable (unordered) partition with a partition whose blocks are in the least element order. Given the long-run proportions of elements in each of these blocks, a conditional Pólya urn scheme is obtained. This yields a new class of MCMC methods that does not require an analytically tractable EPPF and maintains the nice convergence properties of the most efficient available samplers.
 
Kari Heine: Augmented island resampling particle filter for particle MCMC
 
The ability to carry out computations in parallel is paramount to efficient implementations of computationally intensive algorithms. We investigate the applicability of the Augmented Island Resampling Particle Filter (AIRPF) - an algorithm designed for parallel computing - to particle Markov Chain Monte Carlo (PMCMC), and show that it produces a non-negative, unbiased estimator of the marginal likelihood, making it suitable for PMCMC. Moreover, we extend the stability results previously shown for the so-called αSMC algorithm to cover AIRPF. As a corollary, the error of AIRPF can be bounded uniformly in time by controlling the effective number of filters, which is a diagnostic analogous to the effective sample size. Such control can be implemented by adaptively constraining the interactions between the parallel filters. We demonstrate the superiority of AIRPF over independent Bootstrap Particle Filters, not only numerically, but also theoretically. In this context, we extend the previously proposed collision analysis approach to derive an explicit expression for the variance of the marginal likelihood estimate, and establish an unexpected connection between the filter network topology and the marginal likelihood variance in terms of the Fibonacci sequence.
 
Lizbeth Peñaloza: The time to the most recent common ancestor (TMRCA) of genealogies in populations of variable size.
 
In biology, particularly in population genetics, the theory of coalescent processes is used to model the parental relationships of a given sample or population as we trace the ancestry of individuals backward in time, thus constructing a genealogical tree. Once we have a suitable coalescent model for the genealogy of a population, we can employ mathematical tools to tackle biological questions, such as determining the time needed to reach the most recent common ancestor of one sample (TMRCA). In this talk, I will present results about the density and the moments of the TMRCA for time-inhomogeneous coalescent processes describing the genealogies of populations evolving under deterministically varying population size, using recent results on inhomogeneous phase-type random variables. This is joint work with Alejandro H. Wences, Matthias Steinrücken, and Arno Siri-Jégousse.
 
Leticia Ramirez Ramirez: Statistical Inference of Censored Data in Non-Homogeneous Poisson Processes
 
In this talk, we propose a method of statistical inference applied to censored data within the context of non-homogeneous Poisson processes, with a particular focus on Hawkes processes. These processes are flexible models that describe data where previous events can either stimulate or inhibit future events. Due to this flexibility, Hawkes processes can model complex temporal phenomena, such as the occurrence of earthquakes, financial transactions, or social interactions, where there is a temporal dependence between events. We also discuss the challenges that arise when working with censored data, and present inference techniques that allow us to estimate the parameters of these models while accounting for censoring. We evaluate the effectiveness of these methods through computational experiments and finally apply the methodology to a real case in the field of mental health.
 
Yi Yu: Optimal federated learning under differential privacy constraints

In this talk, I will start with an overview of the foundational concept of differential privacy (DP). I will then introduce three notions of DP tailored to the federated learning context, highlighting their relevance and implications in distributed settings. The core focus of this talk will be on a functional data estimation problem under a hierarchical and heterogeneous DP framework. I will discuss how privacy constraints impact estimation accuracy and quantify these tradeoffs through the lens of minimax theory. Key aspects of the proofs will also be outlined, along with some numerical results.

Titles and abstracts of the student talks:

 

Sebastian Quintanilla: Bayesian inference of genetic ancestry, using the sequentially Markov coalescent with memory

The genetic history of living beings is encoded in their DNA, and those sequences can be traced back to a common ancestor. Given a sample of DNA sequences from individuals of the same population, it is natural to ask how this genetic variety came about. Ancestral recombination graphs (ARGs) are mathematical objects that trace the genealogical history of a sample of sequences back to their common ancestor, in the presence of genetic recombination. Knowing the true ARG of a sample is impossible, so a stochastic process called the Coalescent with recombination (CwR) is used to model the unseen history. This process, although simple in its definition, can be computationally demanding when used for Bayesian inference of the true ARG. In this talk we will present different approximations to the CwR by using sequences of trees that satisfy the Markov property. We will focus on the potential benefits of adding memory to the tree sequence for performing Bayesian inference of the true ARG. We will also present a potential simplification of the inference, where the aim is to obtain the branch lengths of the local trees instead of the whole ARG, thereby reducing the dimension of the state space.

Kotaro Hata: Uniform Weak Convergence of Random Walks to Additive Processes

Additive processes are a class of stochastic processes that are continuous in probability and have independent increments. Brownian motion is an example of an additive process, and Donsker’s theorem is a limit theorem for Brownian motion. Our aim is therefore to obtain a limit theorem for additive processes. In this talk, I will define a new mode of convergence, “uniform weak convergence”, and then give a necessary and sufficient condition for random walks generated by an infinitesimal triangular array to converge weakly to an additive process uniformly. Afterwards, I will provide some examples in specific cases. This talk is based on joint work with Takahiro Hasebe (Hokkaido University).

Osvaldo Angtuncio: The coalescent structure of multitype continuous-time Bienaymé-Galton-Watson trees

In this talk we discuss the genealogy of a sample of $k\geq 2$ individuals from a continuous-time, finite-variance multitype Bienaymé-Galton-Watson tree that survives up to a large time. The technique relies on a change of measure, extending the work of Harris, Johnston and Roberts (2020). We will also discuss the law of the times when the particles in the sample coalesce, the types of the individuals when coalescing, and scaling limits of such laws. Finally, we will discuss the limiting coalescent process, which is a new type of multitype coalescent changing types in every infinitesimal time. This is joint work with Juan Carlos Pardo and Simon Harris.

Frank Bravo: Logistic Branching Brownian Motion

We study a spatial population dynamics model whose underlying branching structure follows the branching process with logistic growth defined by A. Lambert. Specifically, each particle splits into two at rate ρ, competes with every other particle at rate c, and moves according to Brownian motion. We present results on the hydrodynamic limit of the system and the speed of the rightmost particle under a weak competition regime.

Mario Molina: SGD with robust aggregation

The stochastic gradient descent (SGD) algorithm is one of the most widely used methods for training machine learning models. Despite its empirical success across many domains, the theoretical understanding of certain generalizations remains incomplete. In this project, we evaluate the performance and investigate theoretical properties of a median-based variant of SGD, aiming to improve its robustness in the presence of heavy-tailed noise or outliers.

Marco López: Degree-biased cutting of random recursive trees

We study a degree-biased cutting process on random recursive trees, where vertices are deleted with probability proportional to their degree. We verify the splitting property and explicitly obtain the distribution of the number of vertices deleted by each cut. This allows us to obtain a recursive formula for $K_n$, the number of cuts needed to destroy a random recursive tree of size $n$. Furthermore, we show that $K_n$ is stochastically dominated by $J_n$, the number of jumps made by a certain random walk with a barrier. We also obtain convergence in distribution of $J_n$ to a Cauchy random variable.

Edwin Anzures: Coalescent theory across species and coagulation-transport equations.

Stochastic models in population genetics allow us to study genetic variation in samples from one or more populations. The multispecies coalescent process is an adaptation of the single population coalescent model for the case of multiple species or populations separated by some geographic or external factor. In this short talk, I will explore the model's relationship with a coagulation-transport partial differential equation.

Janique Krasnowska: Coalescence in multi-type supercritical branching processes

Multi-type branching processes are used to model populations and other non-biological systems with similar dynamics. The coalescent process, which traces the genealogy of a sample from the final generation backward in time, can offer valuable insights into the system's behaviour. In this talk, we present a formula for the probability that two individuals share a common ancestor at a given time. We compare the efficiency of our method to direct simulation techniques.

Daniel Cárdenas: De Finetti's control problem linearly bounded

De Finetti’s stochastic control problem belongs to a class of optimal control problems that involve maximizing a functional in a random environment. This problem has significant applications in risk modeling. In this talk, we will present the solution to this control problem under the assumption of absolutely continuous and linearly bounded solutions, leading to solutions characterized by reflected Brownian motion. The solution and model discussed are based on the work of Renaud (2020).

Harry Sapranidis Mantelos: A Lamperti-type connection between self-similar Markov processes and Markov Additive Processes & some illustrations

We present a one-to-one connection between self-similar Markov processes (ssMps) on a Banach space equipped with a norm $\|\cdot\|$ and Markov Additive Processes (MAPs). This connection is analogous to the well-known one between positive self-similar Markov processes and Lévy processes via the renowned Lamperti transform, with the main difference that ours is $\|\cdot\|$-dependent. We then illustrate this connection between ssMps and MAPs through some interesting examples of ssMps in the orthant of $\mathbb{R}^d$ involving stable processes. In particular, we explore how our formulae vary with the choice of norm. Based on joint work with Andreas Kyprianou and Víctor Rivero.


Organizers
Dario Spanó, Daniel Kious, Andreas Kyprianou, Giuseppe Cannizaro, Arno Siri-Jégousse, Sandra Palau, Juan Carlos Pardo, Victor Rivero, Paul Jenkins