Van Dantzig Seminar

nationwide series of lectures in statistics



Van Dantzig Seminar: 9 October 2014

Programme:

14:00 - 14:05 Opening
14:05 - 15:05 Arnak Dalalyan (ENSAE/CREST and Université Paris-Est)
15:05 - 15:25 Break
15:25 - 16:25 Marco Grzegorczyk (University of Groningen)
16:30 - 17:30 Reception
Location: VU University Amsterdam, Main Building, De Boelelaan 1105, Room 15A05

Titles and abstracts

  • Arnak Dalalyan

    On the performance of the Lasso in terms of prediction loss

    Although the Lasso has been extensively studied, the relationship between its prediction performance and the correlations of the covariates is not yet fully understood. In this talk, we give new insights into this relationship in the context of regression with deterministic design. We show, in particular, that incorporating a simple correlation measure into the tuning parameter leads to nearly optimal prediction performance of the Lasso even when the elements of the dictionary are highly correlated. However, we also reveal that for moderately correlated dictionaries, the performance of the Lasso can be mediocre irrespective of the choice of the tuning parameter. To illustrate our approach with an important application, we deduce nearly optimal rates for the least-squares estimator with total variation penalty.

    Joint work with M. Hebiri and J. Lederer

    Download the slides
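
As a rough illustration of the quantities named in the abstract above, the following sketch fits the Lasso on a fixed design with highly correlated columns and evaluates the prediction loss for a few tuning parameters. Everything specific here is an assumption made for illustration, not taken from the talk: the objective normalization (1/(2n))·||y − Xβ||² + λ·||β||₁, the coordinate-descent solver, the simulated design, and the reference value λ = σ·sqrt(2·log(p)/n). The correlation measure that the speakers incorporate into the tuning parameter is not specified in the abstract and is not reproduced.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * ||x_j||^2 for each column
    resid = np.asarray(y, dtype=float).copy()  # residual y - X beta (beta = 0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j removed).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)  # keep resid = y - X beta
            beta[j] = new
    return beta

def prediction_loss(X, beta_hat, beta_star):
    """Prediction loss (1/n) * ||X (beta_hat - beta_star)||^2 on the fixed design."""
    n = X.shape[0]
    return float(np.sum((X @ (beta_hat - beta_star)) ** 2) / n)

# Hypothetical simulated design: columns share a common factor, so they are
# highly correlated (pairwise population correlation 0.81).
rng = np.random.default_rng(0)
n, p, sigma = 50, 20, 0.5
shared = rng.normal(size=(n, 1))
X = 0.9 * shared + np.sqrt(1 - 0.81) * rng.normal(size=(n, p))
beta_star = np.zeros(p)
beta_star[:3] = 1.0
y = X @ beta_star + sigma * rng.normal(size=n)

# Reference tuning parameter (an assumption, see the lead-in) and two rescalings.
lam_ref = sigma * np.sqrt(2 * np.log(p) / n)
for scale in (0.1, 1.0, 10.0):
    lam = scale * lam_ref
    loss = prediction_loss(X, lasso_cd(X, y, lam), beta_star)
    print(f"lambda = {lam:.4f}  prediction loss = {loss:.4f}")
```

Running the loop over several values of λ gives a quick sense of how sensitive the prediction loss is to the tuning parameter on a correlated design.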

  • Marco Grzegorczyk

    Bayesian regularization of non-homogeneous dynamic Bayesian networks by coupling interaction parameters

    The objective of systems biology research is the elucidation of the regulatory networks and signalling pathways of the cell. The ideal approach would be the deduction of a detailed mathematical description of the entire system in terms of a set of coupled non-linear differential equations. As high-throughput measurements are inherently stochastic and most kinetic rate constants cannot be measured directly, the parameters of the system would have to be estimated from the data. Unfortunately, standard optimization techniques in high-dimensional multimodal parameter spaces are not robust, and model selection is impeded by the fact that more complex pathway models would always provide a better explanation of the data than less complex ones, rendering this approach intrinsically susceptible to over-fitting. To assist the elucidation of regulatory networks, dynamic Bayesian networks can be employed. The idea is to simplify the mathematical description of the biological system by replacing coupled differential equations with conditional probability distributions. This results in a closed-form scoring function (the marginal likelihood) that depends only on the structure of the network and avoids the over-fitting problem. Markov chain Monte Carlo (MCMC) algorithms can be applied to search the space of network structures for those that are most consistent with the data.

    To relax the homogeneity assumption of classical dynamic Bayesian networks (DBNs), various recent studies have combined DBNs with multiple changepoint processes. The underlying assumption is that the parameters associated with time series segments delimited by multiple changepoints are a priori independent. However, the assumption of prior independence is unrealistic in many real-world applications, where the majority of segment-specific regulatory relationships among the interdependent quantities tend to undergo minor and gradual adaptations. Moreover, for sparse time series, as typically available in many systems biology applications, inference suffers from vague posterior distributions, and could borrow strength from a systematic mechanism of information coupling.

    There are two approaches to information coupling in time series segmented by multiple changepoints: sequential information coupling and global information coupling. In the former, information is shared between adjacent segments. In the latter, segments are treated as interchangeable units, and information is shared globally. Sequential information coupling is appropriate for a system in the process of development, e.g. in morphogenesis. Global information coupling, on the other hand, is more appropriate when time series segments are related to different experimental scenarios or environmental conditions. These coupling schemes have been applied to the regularization of DBNs with time-varying network structures, by penalizing network structure changes sequentially or globally. However, these approaches do not address information coupling with respect to the interaction parameters and assume complete parameter independence among time segments.

    In the talk I will present two novel non-homogeneous dynamic Bayesian network models for sequential [1] and global [2,3] information sharing with respect to the interaction parameters.

    REFERENCES:
    [1] Grzegorczyk, M. and Husmeier, D. (2012a): A non-homogeneous dynamic Bayesian network model with sequentially coupled interaction parameters for applications in systems and synthetic biology. Statistical Applications in Genetics and Molecular Biology (SAGMB), vol. 11(4), Article 7.
    [2] Grzegorczyk, M. and Husmeier, D. (2012b): Bayesian regularization of non-homogeneous dynamic Bayesian networks by globally coupling interaction parameters. Journal of Machine Learning Research (JMLR), Workshop and Conference Proceedings (AISTATS 2012), vol. 22, 467-476.
    [3] Grzegorczyk, M. and Husmeier, D. (2013): Regularization of non-homogeneous dynamic Bayesian networks with global information-coupling based on hierarchical Bayesian models. Machine Learning, vol. 91(1), 105-154.



    Download the slides
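
The difference between sequential and global information coupling described in the second abstract can be sketched with a toy example. This is not the speaker's model: it assumes a plain Gaussian linear regression per segment with known noise variance `sigma2` and a Gaussian prior with variance `tau2` on the segment-specific coefficients, and it ignores network structure and changepoint inference entirely. In the sequential variant the prior mean of each segment is the posterior mean of the preceding segment; in the global variant all segments share one prior mean, obtained here by a simple fixed-point iteration that stands in for a proper hierarchical treatment. All function names and parameters are hypothetical.

```python
import numpy as np

def posterior_mean(X, y, prior_mean, tau2=1.0, sigma2=1.0):
    """Posterior mean of b for y ~ N(X b, sigma2 I) with prior b ~ N(prior_mean, tau2 I)."""
    p = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(p) / tau2
    rhs = X.T @ y / sigma2 + prior_mean / tau2
    return np.linalg.solve(precision, rhs)

def sequential_coupling(segments, tau2=1.0, sigma2=1.0):
    """Each segment's prior mean is the posterior mean of the previous segment."""
    p = segments[0][0].shape[1]
    prior = np.zeros(p)  # uninformative starting point for the first segment
    means = []
    for X, y in segments:
        prior = posterior_mean(X, y, prior, tau2, sigma2)
        means.append(prior)
    return means

def global_coupling(segments, tau2=1.0, sigma2=1.0, n_iter=50):
    """All segments share one prior mean (segments treated as exchangeable)."""
    p = segments[0][0].shape[1]
    shared = np.zeros(p)
    for _ in range(n_iter):
        means = [posterior_mean(X, y, shared, tau2, sigma2) for X, y in segments]
        shared = np.mean(means, axis=0)  # crude stand-in for a hyperparameter update
    means = [posterior_mean(X, y, shared, tau2, sigma2) for X, y in segments]
    return means, shared

# Hypothetical sparse time series: three short segments whose true
# coefficients drift slightly from one segment to the next.
rng = np.random.default_rng(1)
true_params = [np.array([1.0, -0.5]), np.array([1.1, -0.4]), np.array([1.2, -0.3])]
segments = []
for b in true_params:
    X = rng.normal(size=(5, 2))
    segments.append((X, X @ b + 0.3 * rng.normal(size=5)))

seq_means = sequential_coupling(segments, tau2=0.5, sigma2=0.09)
glob_means, shared_mean = global_coupling(segments, tau2=0.5, sigma2=0.09)
for h, (s, g) in enumerate(zip(seq_means, glob_means), start=1):
    print(f"segment {h}: sequential {np.round(s, 3)}  global {np.round(g, 3)}")
```

With only five observations per segment, both schemes pull the segment-specific estimates toward information from other segments, which is the sense in which sparse time series can borrow strength.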


BTK, Amsterdam 2014