Intro to Sparse Identification of Nonlinear Dynamics (SINDy)
Sparse Identification of Nonlinear Dynamics (SINDy) is a systems identification method introduced in [Brunton et al., 2016]. Here the authors postulate that most dynamical systems given by a (autonomous) ordinary differential equation (ODE)
(where \( s(t) \in \mathbb R \) is assumed to be at least once differentiable with respect to time \( t \))
typically have relatively few terms on the right-hand side, given by the function \( f \). And thus, given time-series observations of \( s(t) \), written here as \( s_{t_i},\; i \in \{1,2,\ldots,n-1,n\} \),
one might be able to use sparse linear regression techniques to obtain a pointwise approximation of the derivative \(\dot s(t) \) in (1).
This results in a sparse linear regression estimator \(\xi \) (vector of coefficients with few non-zero entries),
by which one can approximate the function \( f \).
Finding a sparse estimator \(\xi \) corresponds to finding a parsimonious model for the process \( s(t) \).
If the derivatives \( \dot s_{t_i},\; i \in \{1,2,\ldots,n-1,n\} \) are not part of the observations (which they seldom are), then these need to be approximated using, for example, finite difference, or the derivatives of a fitted smoothing spline, because these make out the target in the sparse linear regression.
Let's elaborate a little on this:
Say you have \(n\) observations \( s_{t_i},\; i \in \{1,2,\ldots,n-1,n\} \) of some dynamical process \(s(t)\)
that is assumed to be differentiable with respect to time \(t\),
and that you want to approximate the governing equation of \(s(t)\) in the form of (1).
When using SINDy we assume that \( f(s_{t_i}) \) can be approximated as
where the \( \Theta \in \mathbb R^{n \times p} \) matrix has nonlinear functions of the observations as columns,
and \( \xi \in \mathbb R^p \) is a column vector of coefficients to be determined.
For example using polynomial basis function in \( \Theta \) up to 4th order (no intercept)
where we have \( p = 4 \).
Let \( \dot S \in \mathbb R^n \) be the column vector of derivatives, i.e. \( \dot S = (\dot s_{t_1}, \dot s_{t_2},\ldots, \dot s_{t_n})^\top \), then these can be written down (according to our assumptions) as
where \( \epsilon \) is the residual vector.
If one estimates \( \xi \) (minimizes the residual vector \( \epsilon \) ) using ordinary least squares (OLS), i.e.
then the solution
will have \( p \) non-zero entries. Nothing sparse about that...
Using SINDy, the point is rather to use sparse regression, that results in \( \xi^* \) having \( q,\; 0 < q < p \) non-zero entries,
corresponding to \( q \) terms in the governing equation.
If \( q \) is small, and the fit is acceptable, then we have succeeded in finding
a parsimonious approximation of the governing equations of \( s(t) \), i.e.
where \( \theta(s(t))^\top \) is a row vector of nonlinear functions of \( s(t) \).
I.e. using up to 4th order polynomial basis functions as above we have
which are the same functions as in the columns of \( \Theta \).
The Least Absolute Shrinkage and Selection Operator (LASSO) [Tibshirani, 1996] is an ideal candidate method to use in the sparse regression step,
which also is noted by the authors of [Brunton et al. 2016].
The LASSO finds the estimator by minimizing
where \( \lambda > 0 \) imposes a penalty on the \( \ell_1 \) norm of the estimator. Using the \( \ell_1 \) norm results in (depending on the magnitude of \( \lambda \)) some entries of \( \xi^* \) being effectively zero. Thereby, the LASSO acts as a model selection method parameterized by \( \lambda \). Note that the LASSO estimator converges to the OLS estimator as \( \lambda \rightarrow 0 \).
The solution to (2) has to be found numerically, and one can for example use the lars or glmnet packages in R to achieve this.
For estimating derivatives one can for example use smooth.spline from the stats package.
Conclusion
SINDy is a powerful, yet simple, method to approximate the governing equations of systems, for which there is some data available.
In addition, is is quite modular, in the sense that there are many options available for estimating derivatives, constructing the \( \Theta \) matrix,
and obtaining the linear estimator \( \xi \).
A crucial question is which basis functions to use in \( \Theta \). Here physical knowledge of the system under investigation can provide some clues.
Obtaining a system of ordinary differential equations (ODEs) can be achieved (in a sequential manner) by a straight-forward generalization of the above —
just include all predictors (and functions/interactions thereof) in the \( \Theta \) matrix, and fit the derivatives using for example LASSO.
In this series of blog articles, we will be using SINDy to obtain a parsimonious, model of the Networked SIR Model, i.e. we'll take data from simulations of a stochastic, continuous time, discrete state model, and obtain a deterministic, continuous time/state model by way of SINDy. In doing so, we hope via dynamical system techniques, to obtain some insights regarding the effects of network topology on the dynamics of the simulation.
References:
[Brunton et al. 2016] Brunton, S. L., Proctor, J. L., Kutz, J. N. & Bialek, W. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. U. S. A. 113, 3932–3937 (2016)
[Tibshirani, 1996] Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).