-
Arnak Dalalyan
On the performance of the Lasso in terms of prediction loss
Although the Lasso has been extensively studied, the relationship
between its prediction performance and the correlations of the covariates
is not yet fully understood. In this talk, we give new insights into
this relationship in the context of regression with deterministic design.
We show, in particular, that the incorporation of a simple correlation
measure into the tuning parameter leads to a nearly optimal prediction
performance of the Lasso even when the elements of the dictionary
are highly correlated. However, we also reveal that for moderately
correlated dictionaries, the performance of the Lasso can be mediocre irrespective
of the choice of the tuning parameter. To illustrate our approach
with an important application, we derive nearly optimal rates for
the least-squares estimator with a total variation penalty.
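The abstract does not specify the correlation measure built into the tuning parameter, so the Python sketch below does not attempt to reproduce it. It only illustrates, under our own assumptions (a squared loss normalized by 1/(2n), a fixed λ, and a plain cyclic coordinate-descent solver), the Lasso with a deterministic design, together with the standard rewriting of one-dimensional total-variation least squares as a Lasso whose dictionary consists of step (cumulative-sum) columns. Neighbouring columns of that dictionary are strongly correlated. The names `lasso_cd` and `tv_denoise` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, weights=None, n_iter=1000, tol=1e-12):
    """Minimise (1/(2n)) ||y - X b||^2 + lam * sum_j w_j |b_j| by cyclic coordinate descent."""
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    b = np.zeros(p)
    r = np.asarray(y, dtype=float).copy()      # residual y - X b, with b = 0 initially
    col_sq = (X ** 2).sum(axis=0) / n          # ||X_j||^2 / n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            z = X[:, j] @ r / n + col_sq[j] * old      # correlation with the partial residual
            new = soft_threshold(z, lam * w[j]) / col_sq[j]
            if new != old:
                r -= X[:, j] * (new - old)             # keep the residual up to date
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return b


def tv_denoise(y, lam, **kw):
    """1D total-variation least squares, rewritten as a Lasso on a step dictionary:
    theta = X @ beta with X[i, j] = 1 if j <= i, so beta_j = theta_j - theta_{j-1} for j >= 1."""
    n = len(y)
    X = np.tril(np.ones((n, n)))
    w = np.ones(n)
    w[0] = 0.0                                  # the initial level theta_0 is not penalised
    return X @ lasso_cd(X, y, lam, weights=w, **kw)
```

For example, `tv_denoise(np.array([0.0] * 5 + [1.0] * 5), 0.01, n_iter=5000)` should return values close to 0.02 on the first five points and 0.98 on the last five: the jump is shrunk by λn/5 on each side. Coordinate descent can converge slowly on such correlated dictionaries, which is why many sweeps are allowed here.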
Joint work with M. Hebiri and J. Lederer
-
Marco Grzegorczyk
Bayesian regularization of non-homogeneous dynamic Bayesian networks by coupling interaction parameters
The objective of systems biology research is the elucidation of the regulatory
networks and signalling pathways of the cell. Ideally, one would derive
a detailed mathematical description of the entire system
in terms of a set of coupled non-linear differential equations.
As high-throughput measurements are inherently stochastic and most
kinetic rate constants cannot be measured directly, the parameters of the system
would have to be estimated from the data. Unfortunately, standard optimization
techniques in high-dimensional multimodal parameter spaces are not robust,
and model selection is impeded by the fact that more complex pathway models
always fit the data at least as well as simpler ones,
rendering this approach intrinsically susceptible to over-fitting. To assist in
the elucidation of regulatory networks, dynamic Bayesian networks can
be employed. The idea is to simplify the mathematical description of
the biological system by replacing coupled differential equations with
conditional probability distributions. This yields a closed-form scoring function
(the marginal likelihood) that depends only on the structure
of the network and avoids the over-fitting problem. Markov chain Monte Carlo (MCMC)
algorithms can then be applied to search the space of network structures for those
that are most consistent with the data.
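To illustrate a closed-form score combined with MCMC structure search, the following Python sketch makes several simplifying assumptions that are ours, not the talk's: each node is regressed linearly on all nodes at the previous time point, the noise variance σ² is known, the coefficients have an N(0, δ²I) prior (so the marginal likelihood is a multivariate Gaussian density), the prior over parent sets is uniform up to a maximum fan-in, and each Metropolis-Hastings move toggles a single candidate parent. All function names are hypothetical.

```python
import numpy as np


def lagged_design(D, target):
    """Regression data for one DBN node: response D[t, target], predictors D[t-1, :]."""
    return D[1:, target], D[:-1, :]


def log_marginal_likelihood(y, X, parents, sigma2=1.0, delta2=1.0):
    """Closed-form log p(y | parent set) for y = X[:, parents] @ beta + N(0, sigma2 I),
    with beta ~ N(0, delta2 I) integrated out, i.e. y ~ N(0, sigma2 I + delta2 Xp Xp^T)."""
    n = len(y)
    Xp = X[:, np.array(sorted(parents), dtype=int)]
    cov = sigma2 * np.eye(n) + delta2 * Xp @ Xp.T
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(cov, y))


def structure_mcmc(y, X, n_steps=2000, max_fan_in=3, seed=0, **ml_kw):
    """Metropolis-Hastings over parent sets with a uniform prior (up to max_fan_in)."""
    rng = np.random.default_rng(seed)
    current = frozenset()
    current_score = log_marginal_likelihood(y, X, current, **ml_kw)
    samples = []
    for _ in range(n_steps):
        proposal = current ^ {int(rng.integers(X.shape[1]))}  # toggle one parent (symmetric move)
        if len(proposal) <= max_fan_in:                       # sets above the cap have prior 0
            score = log_marginal_likelihood(y, X, proposal, **ml_kw)
            if np.log(rng.random()) < score - current_score:  # MH acceptance ratio
                current, current_score = proposal, score
        samples.append(current)
    return samples
```

Because the proposal is symmetric and the structure prior is uniform, the acceptance ratio reduces to the ratio of marginal likelihoods. The fraction of post-burn-in samples that contain a given parent estimates its posterior edge probability.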
To relax the homogeneity assumption of classical dynamic Bayesian networks (DBNs),
various recent studies have combined DBNs with multiple changepoint
processes. The underlying assumption is that the parameters associated
with time series segments delimited by multiple changepoints are a priori
independent. However, the assumption of prior independence is unrealistic
in many real-world applications, where most segment-specific
regulatory relationships among the interdependent quantities undergo only
minor and gradual adaptations. Moreover, for the sparse time series typically
available in systems biology applications, inference suffers from vague
posterior distributions and could borrow strength from a systematic mechanism
of information coupling.
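The consequence of a priori independent segment parameters can be seen in a small Python sketch. It assumes, for illustration only, a linear Gaussian model with known noise variance, an N(0, τ²I) prior in every segment (`prior_var` below), and changepoints that are treated as known rather than inferred. Each segment's posterior then depends only on that segment's data, so short segments yield diffuse posteriors. The function names are hypothetical.

```python
import numpy as np


def split_segments(X, y, changepoints):
    """Cut the rows of (X, y) at the given changepoint indices."""
    bounds = [0, *sorted(changepoints), len(y)]
    return [(X[a:b], y[a:b]) for a, b in zip(bounds[:-1], bounds[1:])]


def independent_segment_posteriors(X, y, changepoints, prior_var=10.0, sigma2=1.0):
    """Posterior N(mean, cov) of each segment's coefficients under a priori independent
    N(0, prior_var I) priors: no information is shared between segments."""
    results = []
    for Xs, ys in split_segments(X, y, changepoints):
        d = Xs.shape[1]
        cov = np.linalg.inv(Xs.T @ Xs / sigma2 + np.eye(d) / prior_var)
        results.append((cov @ Xs.T @ ys / sigma2, cov))
    return results
```

With an intercept-only design `np.ones((40, 1))` and changepoints `[3, 20]`, the posterior variance is 1/(3 + 0.1) ≈ 0.32 for the three-point segment but 1/(20 + 0.1) ≈ 0.05 for the last segment.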
There are two approaches to information coupling in time series segmented
by multiple changepoints: sequential information coupling and global information
coupling. In the former, information is shared between adjacent segments.
In the latter, segments are treated as interchangeable units, and information
is shared globally. Sequential information coupling is appropriate for a system
in the process of development, e.g. in morphogenesis. Global information coupling,
on the other hand, is more appropriate when time series segments are related
to different experimental scenarios or environmental conditions. These coupling
schemes have been applied to the regularization of DBNs with time-varying network
structures by penalizing network structure changes sequentially or globally. However,
these approaches do not address information coupling with respect to the interaction
parameters and assume complete parameter independence among time segments.
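The two coupling schemes can be contrasted with a simplified Python sketch. It does not reproduce the models presented in the talk: it assumes a linear Gaussian model with known noise variance, fixed segments, and a coupling variance that is fixed here for simplicity. Sequential coupling centres the prior of segment h on the posterior mean of segment h-1; global coupling gives all segments a common prior mean m with its own Gaussian hyperprior and computes the exact joint posterior. All names and default values are hypothetical.

```python
import numpy as np


def gaussian_posterior_mean(X, y, prior_mean, prior_var, sigma2):
    """Posterior mean of w in y = X w + N(0, sigma2 I) under the prior w ~ N(prior_mean, prior_var I)."""
    d = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(d) / prior_var
    return np.linalg.solve(precision, X.T @ y / sigma2 + prior_mean / prior_var)


def sequential_coupling(segments, coupling_var, first_prior_var=10.0, sigma2=1.0):
    """Sequential coupling: segment h's prior is centred on segment h-1's posterior mean.
    `segments` is a list of (X_h, y_h) pairs in temporal order."""
    d = segments[0][0].shape[1]
    prior_mean, prior_var = np.zeros(d), first_prior_var   # segment 1: uncoupled prior
    means = []
    for X, y in segments:
        w = gaussian_posterior_mean(X, y, prior_mean, prior_var, sigma2)
        means.append(w)
        prior_mean, prior_var = w, coupling_var             # pass information forward
    return means


def global_coupling(segments, coupling_var, hyper_var=10.0, sigma2=1.0):
    """Global coupling: w_h ~ N(m, coupling_var I) for every segment, m ~ N(0, hyper_var I).
    Returns the joint posterior means of w_1..w_H and of m (exact, since all terms are Gaussian)."""
    H, d = len(segments), segments[0][0].shape[1]
    P = np.zeros(((H + 1) * d, (H + 1) * d))   # joint posterior precision of (w_1, ..., w_H, m)
    b = np.zeros((H + 1) * d)
    I = np.eye(d)
    mb = slice(H * d, (H + 1) * d)             # block of the shared mean m
    P[mb, mb] += I / hyper_var
    for h, (X, y) in enumerate(segments):
        wb = slice(h * d, (h + 1) * d)
        P[wb, wb] += X.T @ X / sigma2 + I / coupling_var
        P[mb, mb] += I / coupling_var
        P[wb, mb] -= I / coupling_var           # cross terms from ||w_h - m||^2
        P[mb, wb] -= I / coupling_var
        b[wb] += X.T @ y / sigma2
    z = np.linalg.solve(P, b)
    return [z[h * d:(h + 1) * d] for h in range(H)], z[mb]
```

Letting `coupling_var` grow recovers essentially independent per-segment estimates, while letting it shrink pulls all segment coefficients together: towards the shared mean m in the global scheme, and towards the first segment's estimate in the sequential one.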
In this talk, I will present two novel non-homogeneous dynamic Bayesian network
models for sequential [1] and global [2,3] information sharing
with respect to the interaction parameters.
REFERENCES:
[1] Grzegorczyk, M. and Husmeier, D. (2012a): A non-homogeneous dynamic Bayesian network model with sequentially coupled interaction parameters for applications in systems and synthetic biology. Statistical Applications in Genetics and Molecular Biology (SAGMB), vol. 11(4), Article 7.
[2] Grzegorczyk, M. and Husmeier, D. (2012b): Bayesian regularization of non-homogeneous dynamic Bayesian networks by globally coupling interaction parameters. Journal of Machine Learning Research (JMLR) Workshop and Conference Proceedings (AISTATS 2012), vol. 22, 467-476.
[3] Grzegorczyk, M. and Husmeier, D. (2013): Regularization of non-homogeneous dynamic Bayesian networks with global information-coupling based on hierarchical Bayesian models. Machine Learning, vol. 91(1), 105-154.