
Posts

Currently, I only have some (rather outdated) posts on the subject of my Master's thesis. I wonder how many years it will take me to finish the series? And how I might incorporate graph embeddings into the subject?

Centrality Measures and The Networked SIR Model

Note: In this, and all posts in this series, the terms "graph" and "network" are used interchangeably. The same applies to the terms "node", "vertex" and "individual".

In this post I will be talking about how observations of network centralities, from simulations of the networked SIR model (previous post), may be used as predictors when applying the SINDy system identification method (another previous post).

Centrality measures have been considered before in the literature in connection with the networked SIR model, for example in [Dekker, 2013], [Bucur et al., 2020] and [Yuan et al., 2013], but I have not seen papers where aggregates of centrality measures, for nodes in the different compartments, are tracked over the duration of a simulation, which is what we will be looking at here.

What is a network centrality measure?

In the following let \(V\) be the vertex set of a finite graph \(G\), with cardinality \(|V|=N\).

Network centrality measures offer a way to quantify connectivity properties of a vertex \(v_j \in V\) according to some criteria. Depending on the application, the measure establishes the importance of the vertex from a connectivity standpoint.

Well-known centrality measures include degree centrality, betweenness centrality, eigenvector centrality and closeness centrality, amongst others.

For example, the degree centrality \( C_D(v_j) \) of a vertex \(v_j\) is the number of edges incident to it, and so coincides with the vertex degree \( \deg(v_j) \).
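
As a quick sanity check (a minimal sketch, assuming the igraph package is installed), degree centrality can be read off directly with igraph's degree() function:

```r
library(igraph)

# Small example graph: vertex 2 is connected to vertices 1, 3 and 4
G <- graph_from_edgelist(matrix(c(1, 2,  2, 3,  2, 4), ncol = 2, byrow = TRUE),
                         directed = FALSE)

C_D <- degree(G)  # degree centrality coincides with the vertex degree
C_D               # vertex 2 has degree 3, the others degree 1
```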

The betweenness centrality \(C_B(v_j)\) of \(v_j \in V\) is defined as the sum, over all vertex pairs \(v_i, v_k \in V\) with \(i,k,j \in \{1,2,\ldots,N\}\), \(i \neq j\) and \(i < k \neq j\), of the fraction of shortest paths (counting edges) in \(G\) between \(v_i\) and \(v_k\) that pass through \(v_j\), times a normalizing factor.

\[ C_B(v_j) = \frac{2}{(N-1)(N-2)}\, \sum_{i \neq j} \sum_{i < k \neq j} \frac{\sigma_{ik}(j)}{\sigma_{ik}} \]

where \( \sigma_{ik}(j) \) denotes the number of shortest paths from \(v_i\) to \(v_k\) going through \(v_j\), and \(\sigma_{ik}\) is the total number of shortest paths from \(v_i\) to \(v_k\) [Dekker, 2013].
The normalizing factor \(\frac{2}{(N-1)(N-2)}\) above corresponds to the one used in the igraph package, when computing the betweenness centrality in R (using igraph::betweenness with the normalized=TRUE option), but one may come across other factors in the literature.
Betweenness centrality can for example be used to identify bottlenecks in traffic networks, since traffic usually takes the shortest route from A to B.
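
To make the normalization concrete, here is a small check (assuming igraph is installed) on a star graph, where every shortest path between two leaves goes through the center:

```r
library(igraph)

# In a star graph, all shortest paths between leaves pass through the
# center, so the center's normalized betweenness should be exactly 1
G <- make_star(5, mode = "undirected")  # vertex 1 is the center
C_B <- betweenness(G, normalized = TRUE)
C_B  # center: 1, leaves: 0
```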

Aggregates of centralities

In my master's thesis I wanted to find some way to use dynamic centrality observations from simulations as predictors when applying SINDy. My thinking was that an infected node's centrality, for example its degree centrality, would clearly influence the time-derivative target that we want to approximate when using SINDy. The solution I came up with was to use the fraction of the centrality sum accounted for by the individuals in the \(S(t),\,I(t)\) and \(R(t)\) vertex sets at time \(t\). If I may elaborate...

Let \( C_* \) be any per-node centrality measure on a graph \(G\) with vertices \(v_j \in V\); then \( C_*(G) = \sum_{j=1}^N C_*(v_j) \) is a graph invariant. The quantity

\[ s_C(t) = \frac{1}{C_*(G)} \sum_{v_j \in S(t)} C_*(v_j) \]

is an \(s(t)\)-like trajectory, but in terms of a fraction of a graph invariant instead of a fraction of the population. The case is similar for the \(i_C(t)\) and \(r_C(t)\) fractions. Depending on the measure \(C_*\) being used, the centrality-sum fractions (CSFs) above provide dynamical information about how the pathogen in the networked SIR model spreads through the population from a topological standpoint. It is somewhat unclear exactly what information is being conveyed, but the assumption is that it is useful in a machine learning context, i.e. when using SINDy.

Another thing to consider is that a model of the networked SIR dynamics using CSFs as predictors will have more than 3 equations: one additional equation for every CSF used as a predictor, unless one assumes that the CSFs are known functions of time, which would render the model less useful in an applied sense.
The good news is that the sum of the CSFs is a conserved quantity, i.e. \(s_C + i_C + r_C = 1\), which enables us to use some "tricks" when finding a model with SINDy. More on this in the final post in this series.

Simulation example

Let's see what these CSF trajectories look like when using degree centrality and betweenness centrality.
I've made a small modification to the networked-sir.R code from the previous post. The networked_sir() function now takes an additional vector C as an argument, giving some per-vertex centrality measure. The CSFs are then computed using the csf() function and returned in the observation data frame. In the simulations below, we use degree centrality and betweenness centrality respectively.
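
A minimal sketch of what such a csf() helper might look like (the actual implementation in networked-sir.R may differ in details); it assumes the state-vector encoding used in the simulation code, with 0 = susceptible, 1 = infected and 2 = recovered:

```r
# Sketch of csf(): X is the state vector (0 = susceptible, 1 = infected,
# 2 = recovered) and C a per-vertex centrality vector
csf <- function(X, C) {
  total <- sum(C)  # the graph invariant C_*(G)
  c(s_C = sum(C[X == 0]) / total,
    i_C = sum(C[X == 1]) / total,
    r_C = sum(C[X == 2]) / total)
}

X <- c(0, 0, 1, 2, 0)
C <- c(1, 2, 3, 1, 1)
csf(X, C)  # the three fractions sum to 1 by construction
```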

Plot 1: the networked SIR model using degree centrality for the CSFs.
Plot 2: the networked SIR model using betweenness centrality for the CSFs.

As can be seen from the plots above, the fractions of the centrality sum accounted for by the nodes in the \(S(t)\), \(I(t)\) and \(R(t)\) vertex sets are far from the same as the fractions of the population in those sets.
Comparing \(i(t)\) and \(i_C(t)\) in the two plots, we see that around the peak of the epidemic it is predominantly the nodes with high degree and high betweenness that are infected. Since an infected node with many edges (high degree) is likely to increase the infection rate of many susceptible neighbours, and thereby the estimated derivatives of \(i(t)\) and \(s(t)\), it seems natural to use the degree-CSFs as predictors in a regression where the target is the estimated derivative (i.e. the SINDy stage).
The case is not so clear for betweenness-CSFs, since a vertex can have high betweenness but low degree. Even so, the CSFs using degree and betweenness centrality are strongly correlated (\( \approx 0.97 \) for the two \(i_C(t)\) time series above).

Conclusion

It appears that using CSFs provides us with additional information about how a pathogen spreads through a population in the networked SIR model.
In the next post I will be using CSFs as predictors in sparse regression when using SINDy, and leave it up to the LASSO algorithm to decide whether the CSF predictors should be included in the resulting model.

References

[Dekker, 2013] Dekker, A. H. Network centrality and super-spreaders in infectious disease epidemiology. Proc. - 20th Int. Congr. Model. Simulation, MODSIM 2013 331–337 (2013) doi:10.36334/modsim.2013.a5.dekker.

[Bucur et al., 2020] Bucur, D. & Holme, P. Beyond ranking nodes: Predicting epidemic outbreak sizes by network centralities. PLoS Comput. Biol. 16, 1–20 (2020).

[Yuan et al., 2013] Yuan, C., Chai, Y., Li, P. & Yang, Y. Impact of the network structure on transmission dynamics in complex networks. IFAC Proc. Vol. 13, 218–223 (2013).

The Networked SIR model

In the previous post we introduced the SIR model and argued that a networked model might be more appropriate if one seeks to relax the assumption that transmission of a pathogen can occur from any individual in the \(I\) compartment to any individual in the \(S\) compartment (with the compartments viewed as sets of individuals instead of counts). In the networked model, transmission is only possible if there exists an edge, or symmetric relation, between the individuals in a network that models the social interactions of a population of size \(N\).

This of course introduces the new assumption that the network is a useful model of the social interactions over the duration of the epidemic (we will assume the network is static).

In this post I will show (with R code) how to perform stochastic simulations of a networked SIR model using the Gillespie algorithm [Gillespie, 1977], while representing the network using the igraph package.

Introduction

The networked SIR model is treated in e.g. [Kiss et al., 2017], and assumes that transmission of a pathogen, from an infected to a connected susceptible individual, takes place as a Poisson process with rate \(\hat \beta\) — the per-edge transmission rate, while recovery of an infected individual takes place as a Poisson process with rate \(\gamma\), independent of the network.

So, if a susceptible individual \(s_j \in S, \; j \in \{1,2,\ldots,N\} \) has \(n_j \in \mathbb N,\; 0 \leq n_j \leq N-1\) infected neighbours, then transmission to \(s_j\) takes place as a Poisson process with rate \( \lambda_j = n_j\,\hat \beta\).

A Poisson process is characterized by its exponentially distributed waiting times between events.
Let \( \tau_j \) denote the waiting time until \(s_j\) (with \(n_j\) infected neighbours) becomes infected, then \(\tau_j \) is exponentially distributed with density

\[ f_{\tau_j}(t;\,\lambda_j) = \lambda_j\,e^{-\lambda_j\,t} = n_j\,\hat \beta \,e^{-n_j\,\hat \beta\,t} \]

Suppose that at some time \(t \geq 0\) we have \(s_j, s_k \in S\), and both \(s_j\) and \(s_k\) have infected neighbours. Which individual gets infected first?
This is a case of so-called racing exponentials, and it turns out that \( P(s_j\ \text{is infected first}) = \frac{\lambda_j}{\lambda_j + \lambda_k} \), i.e. proportional to the rate, and similarly for \(s_k\). This fact is utilized in the Gillespie algorithm.
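
The racing-exponentials fact is easy to check empirically with a quick Monte Carlo experiment in base R:

```r
# Monte Carlo check of P(s_j is infected first) = lambda_j / (lambda_j + lambda_k)
set.seed(1)
lambda_j <- 2
lambda_k <- 1
n <- 1e5

tau_j <- rexp(n, rate = lambda_j)  # waiting times for s_j
tau_k <- rexp(n, rate = lambda_k)  # waiting times for s_k

mean(tau_j < tau_k)                # empirical P(s_j first), should be ~2/3
lambda_j / (lambda_j + lambda_k)   # theoretical value
```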

The Gillespie algorithm

The Gillespie algorithm was initially invented in order to perform stochastic simulations of chemical reaction systems, given by deterministic reaction-rate ODEs. Gillespie argued that stochastic effects could result in chemical reactions having realizations that were far from what the deterministic models predicted, and thus advocated using stochastic simulations as a complementary tool in conjunction with classical deterministic models when working with chemical reaction systems.

We will not concern ourselves with these academic discussions, but rather focus on using the Gillespie algorithm to simulate the networked SIR model.

We do this by simply considering infection and recovery events instead of chemical reaction events, since both types of systems assume exponentially distributed waiting times between events. All that is required is that we identify which types of events can occur, and their rates.

Say we have \(m\) types of events. At the core of the Gillespie algorithm is the joint distribution

\[ P(\tau,j),\qquad \tau \geq 0,\; j \in \{1,2,\ldots,m\} \]

which is the probability, given the state of the system at time \(t\), that the next event will occur at time \(t+\tau\) and will be of type \(j\).
It turns out that we can sample from this distribution by first sampling \(\tau\) from an exponential distribution with rate \(\lambda = \sum_{j=1}^m \lambda_j\) (the sum of all event rates at time \(t\)), and then sampling the event type \(j\) with probability proportional to its rate \(\lambda_j\).
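
In R, a single step of this sampling scheme can be sketched as follows (the rates here are hypothetical placeholders):

```r
# One step of the Gillespie sampling scheme: draw the waiting time tau,
# then the event type j with probability proportional to its rate
set.seed(42)
rates <- c(0.5, 1.5, 2.0)  # hypothetical rates for m = 3 event types
lambda <- sum(rates)

tau <- rexp(1, rate = lambda)                       # time to the next event
j <- sample(length(rates), size = 1, prob = rates)  # sample() normalizes prob
c(tau = tau, event = j)
```
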
But we are getting ahead of ourselves! We first need a network of individuals...

Network representation

We can use the igraph package in R to represent the network of individuals in the networked SIR model.
Every individual in the population of size \(N\) is represented by a vertex in the network. The question is now: What is the topology of this network? Do we have to painstakingly construct this network by hand or what?
Random graphs to the rescue! Unless we want to use a real-world network, we simply sample the network uniformly from the set of connected networks with a specified degree sequence, which, as the term suggests, gives the degree (number of connections) of every vertex. We can do this with the igraph package using the degree.sequence.game function. For example:

library(igraph)

seq <- c(3,3,2,1,1) # Degree sequence
G <- degree.sequence.game(seq, method="vl")
plot(G)
Plot of an example network with the prescribed degree sequence seq, produced by the code snippet above.

Note that the degree sequence needs to be graphical, i.e. realizable as a simple graph; a necessary condition is that the sequence sum is even, and whether the sequence is graphical can be determined using the Havel-Hakimi algorithm (Wikipedia).
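
For illustration, here is a bare-bones Havel-Hakimi check in base R (igraph also provides a check for this, so the sketch below is purely illustrative):

```r
# Havel-Hakimi test: repeatedly remove the largest degree k and subtract
# 1 from the next k degrees; the sequence is graphical iff we reach all zeros
is_graphical <- function(d) {
  d <- sort(d, decreasing = TRUE)
  while (length(d) > 0 && d[1] > 0) {
    k <- d[1]
    d <- d[-1]
    if (k > length(d)) return(FALSE)  # not enough vertices to connect to
    d[1:k] <- d[1:k] - 1
    if (any(d < 0)) return(FALSE)
    d <- sort(d, decreasing = TRUE)
  }
  TRUE
}

is_graphical(c(3, 3, 2, 1, 1))  # TRUE: the sequence used above
is_graphical(c(3, 1))           # FALSE
```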

So, if we have an assumption on the degree distribution of the population, from which a degree sequence can be sampled, then we can, given that the sequence is realizable, get an igraph object \(G\) that we can use in the implementation of the stochastic simulation.

Simulation code

In the implementation below we have a state vector \(X\) of length \(N\). The entries in the state vector can take on the values 0,1 or 2, corresponding to a susceptible, infected or recovered individual.
We also have a vector of rates \(\lambda\) of length \(N\), where some entries may be zero, depending on the state of the corresponding individual, and the state of the neighbours.
After initialization we sample the joint distribution \(P(\tau, j)\) mentioned above, and update \(X\) to reflect the new state of the system as a result of event \(j\). We also update the rates that are affected by this new state, i.e. some susceptible individuals may have more or fewer infected neighbours. Then repeat.
The simulation stops and returns the observations when there are no more infected individuals, or when \(t > t_{\max}\).
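
The loop described above can be condensed into the following sketch, using a plain adjacency matrix A in place of the igraph object (the actual networked-sir.R implementation may differ in details):

```r
# Condensed sketch of the Gillespie loop for the networked SIR model
networked_sir_sketch <- function(A, X0, beta_hat, gamma, tmax) {
  X <- X0  # 0 = susceptible, 1 = infected, 2 = recovered
  t <- 0
  obs <- data.frame(t = t, s = mean(X == 0), i = mean(X == 1), r = mean(X == 2))
  while (any(X == 1) && t <= tmax) {
    n_inf <- as.vector(A %*% (X == 1))          # infected neighbours per vertex
    rates <- ifelse(X == 0, beta_hat * n_inf,   # infection rates
             ifelse(X == 1, gamma, 0))          # recovery rates; recovered: 0
    lambda <- sum(rates)
    t <- t + rexp(1, rate = lambda)                    # waiting time
    j <- sample(length(X), 1, prob = rates / lambda)   # which vertex reacts
    X[j] <- X[j] + 1                                   # S -> I or I -> R
    obs <- rbind(obs, data.frame(t = t, s = mean(X == 0),
                                 i = mean(X == 1), r = mean(X == 2)))
  }
  obs
}

set.seed(7)
A <- matrix(0, 5, 5)
A[cbind(c(1, 2, 3, 4), c(2, 3, 4, 5))] <- 1
A <- A + t(A)  # a small path graph
out <- networked_sir_sketch(A, X0 = c(1, 0, 0, 0, 0),
                            beta_hat = 0.5, gamma = 1, tmax = 100)
tail(out, 1)  # final fractions; s + i + r = 1 throughout
```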

Let's try out this code by sampling a random graph with 1000 vertices (individuals), with a degree sequence drawn from a geometric distribution, and with \(\hat \beta = 0.5 \) and \(\gamma = 1 \). Initially 99% of the population is susceptible, while 1% is infected.

Plot of a simulation using networked-sir.R, with a prescribed degree sequence drawn from a geometric distribution with \(p=0.2\) and theoretical mean \(\frac{1}{p}=5\).

In the plot of the simulation, we see the stochastic effects in the \(s\), \(i\) and \(r\) trajectories. Compare this, for example, with the trajectories produced by the deterministic model in my previous post: The SIR Model.

In a future post we will examine the effects that different degree distributions have on the dynamics of the networked SIR model, and use the SINDy framework to obtain a parsimonious (deterministic and continuous) differential equation model of the networked SIR dynamics given different degree distributions.

I hope you liked the post. Please leave any questions or comments in the commenting section below!

References

[Kiss et al., 2017] Kiss, I. Z., Miller, J. C. & Simon, P. L. Mathematics of epidemics on networks: From exact to approximate models. in Interdisciplinary Applied Mathematics vol. 46 1–413 (Springer, 2017).

[Gillespie, 1977] Gillespie, D. T. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340–2361 (1977).

The Susceptible, Infected, Recovered (SIR) model

If you dabble in mathematical modeling, chances are that you (because of COVID-19) have been exposed to the SIR epidemiological model in the last year or two.

There is a wealth of material on the SIR model available online, so I will only cover the absolute basics needed to aid understanding of the networked SIR model later on in this blog-post series. At the very least, it will enable us to establish some notation.

The SIR model is a simple compartmental model of how an uncontrolled epidemic evolves in a population of size \(N\). I.e. it gives the change over time \( t \) (in the form of 3 coupled ODEs) of the fractions \(s(t)=S(t)/N,\,i(t)=I(t)/N\) and \(r(t)=R(t)/N\) of the population in the Susceptible (\(S(t)\)), Infected (\(I(t)\)) and Recovered (\(R(t)\)) compartments respectively at time \(t\), given that a certain fraction of the population is infected initially.

\[ \begin{split} \frac{d}{dt}\,s(t) = \dot s(t) &= -\beta \, s(t)\, i(t) \\ \frac{d}{dt}\,i(t) = \dot i(t) &= \beta \,s(t) \,i(t) - \gamma \,i(t) \qquad (1) \\ \frac{d}{dt}\,r(t) = \dot r(t) &= \gamma \, i(t) \\ \end{split} \]

where \( \beta > 0 \) is the infection rate, and \( \gamma > 0 \) is the recovery rate.
Note that \( s(t) + i(t) + r(t) = 1 \) is a conserved quantity, which also means that one of the equations is redundant (something we will make use of later when applying SINDy to stochastic simulations of the networked SIR model).

Even though the model is deterministic, a probabilistic interpretation can sometimes aid one's intuition.
The \(s,i\) and \(r\) fractions (dropping the dependence on \( t \)) can be viewed as the probabilities that an individual picked at random from the population belongs to the \(S,I\) or \(R\) compartments respectively. Hence, \(\frac{d}{dt}\,s = \dot s \) is the change in the \(s\) probability over an infinitesimal time interval \(dt\).

Taking this probabilistic view, we see that \(\dot s\) is negatively proportional to the probability of picking an \((S,I)\) pair of individuals at random (with replacement) from the population.

Because of conservation of probability (\(s+i+r=1\)), this quantity appears in the \( \dot i \) equation with positive sign, since \(S \rightarrow I \rightarrow R \) is the only trajectory an individual can take through the compartments.

In the standard SIR model an individual is permanently immune (or deceased) to the epidemic once recovered (in the \(R \) compartment). Therefore a recovered individual cannot end up in the \(S\) compartment again.

Hence individuals leaving the \(S\) compartment must enter the \(I\) compartment. Similarly, individuals leaving the \(I\) compartment at rate \(\gamma\) enter the \(R\) compartment at the same rate.

Basic analysis

It is easy to see that in order for there to be an equilibrium, we need \(i(t)=0\).
We also see that in order for \( i(t) \) to be increasing (i.e. an outbreak), we must have

\[ \begin{split} \dot i > 0 & \implies \beta \,s \,i - \gamma \,i > 0 \\ & \implies \beta \,s \,i > \gamma \,i \\ & \implies s > \frac{\gamma}{\beta} \quad (0 < i \leq 1) \end{split} \]

Since \(s(t)\) is always less than 1 (when there is an epidemic), then the ratio \( \frac{\gamma}{\beta} \) must be less than 1 in order for it to be possible that \(i(t)\) be increasing at any time \(t\).
The ratio \( \frac{\gamma}{\beta} \) is the reciprocal of the basic reproductive ratio \( R_0 = \frac{\beta}{\gamma} \) [Kiss et al., 2017].

Ok, let's see some examples of the SIR model in action using R!

Code example

We can numerically integrate the ODE system in (1) using deSolve::ode. Below we are using \(\beta = 2\) and \(\gamma = 1\), with initial conditions \(s(0) = 0.99, i(0)=0.01,r(0)=0\).
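The snippet is roughly the following (a minimal sketch, assuming the deSolve package is installed; plotting details kept simple):

```r
library(deSolve)

# The SIR system (1), in the form expected by deSolve::ode
sir <- function(t, y, parms) {
  with(as.list(c(y, parms)), {
    ds <- -beta * s * i
    di <-  beta * s * i - gamma * i
    dr <-  gamma * i
    list(c(ds, di, dr))
  })
}

y0    <- c(s = 0.99, i = 0.01, r = 0)
parms <- c(beta = 2, gamma = 1)
times <- seq(0, 20, by = 0.1)

out <- ode(y = y0, times = times, func = sir, parms = parms)
matplot(out[, "time"], out[, c("s", "i", "r")], type = "l",
        lty = 1, xlab = "t", ylab = "fraction")
```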

Plot of the system in (1), produced by the code snippet above.

In the figure above we see the characteristic trajectories of the SIR model when only a small fraction of the population is infected initially. In this scenario approximately 80% of the population is infected at some point in time, with the maximum fraction of infected being around 16% at \(t\approx 4\).

This is all well and good, but let's return to our probabilistic interpretation of the SIR model above: There, \( \dot s \) is negatively proportional to the probability of picking an \( (S,I) \) pair, hence assuming that all individuals in the population may interact with each other. Otherwise there couldn't be any pathogen transmission resulting in infected individuals.

One might argue that this assumption is unrealistic, especially for large populations, and that this could be mitigated by letting the interactions be dependent upon a graph/network, that models which interactions are possible within the population.

This is exactly what is done in the networked SIR model. There, the individuals/agents are modelled as vertices in a graph, with interactions only being possible between vertices where there exists an edge (symmetric physio-social relation).
This is also where the modelling ceases to be deterministic and continuous. Instead transmission from an infected to a susceptible agent takes place as a Poisson process, i.e. a random or stochastic process, with rate \(\hat \beta\) — the per-edge infection rate, while recovery of agents is a Poisson process with rate \(\gamma\), independent of the graph [Kiss et al., 2017].

The networked SIR model will be the subject of a future blog post, where we also cover how to do stochastic simulations of it using the Gillespie algorithm.

Stay tuned, and please leave your questions and comments in the commenting section below!

References

[Kiss et al., 2017] Kiss, I. Z., Miller, J. C. & Simon, P. L. Mathematics of epidemics on networks: From exact to approximate models. in Interdisciplinary Applied Mathematics vol. 46 1–413 (Springer, 2017).

Intro to Sparse Identification of Nonlinear Dynamics (SINDy)

Sparse Identification of Nonlinear Dynamics (SINDy) is a system identification method introduced in [Brunton et al., 2016]. There, the authors postulate that most dynamical systems given by an (autonomous) ordinary differential equation (ODE)

\[ {d\over dt}\,s(t) = \dot s(t)= f(s(t)) \qquad (1) \]

(where \( s(t) \in \mathbb R \) is assumed to be at least once differentiable with respect to time \( t \)) typically have relatively few terms on the right-hand side, given by the function \( f \). And thus, given time-series observations of \( s(t) \), written here as \( s_{t_i},\; i \in \{1,2,\ldots,n-1,n\} \), one might be able to use sparse linear regression techniques to obtain a pointwise approximation of the derivative \(\dot s(t) \) in (1).

This results in a sparse linear regression estimator \(\xi \) (vector of coefficients with few non-zero entries), by which one can approximate the function \( f \).
Finding a sparse estimator \(\xi \) corresponds to finding a parsimonious model for the process \( s(t) \).

If the derivatives \( \dot s_{t_i},\; i \in \{1,2,\ldots,n-1,n\} \) are not part of the observations (which they seldom are), then they need to be approximated, for example using finite differences or the derivatives of a fitted smoothing spline, since they form the target in the sparse linear regression.

Let's elaborate a little on this:
Say you have \(n\) observations \( s_{t_i},\; i \in \{1,2,\ldots,n-1,n\} \) of some dynamical process \(s(t)\) that is assumed to be differentiable with respect to time \(t\), and that you want to approximate the governing equation of \(s(t)\) in the form of (1).

When using SINDy we assume that \( f(s_{t_i}) \) can be approximated as

\[ f(s_{t_i}) \approx [\Theta\,\xi]_i \qquad (i\text{th entry of the vector})\]

where the \( \Theta \in \mathbb R^{n \times p} \) matrix has nonlinear functions of the observations as columns, and \( \xi \in \mathbb R^p \) is a column vector of coefficients to be determined.
For example using polynomial basis function in \( \Theta \) up to 4th order (no intercept)

\[ \Theta = \begin{bmatrix} s_{t_1} & s_{t_1}^2 & s_{t_1}^3 & s_{t_1}^4 \\ s_{t_2} & s_{t_2}^2 & s_{t_2}^3 & s_{t_2}^4 \\ \vdots & \vdots & \vdots & \vdots \\ s_{t_n} & s_{t_n}^2 & s_{t_n}^3 & s_{t_n}^4 \\ \end{bmatrix} \]

where we have \( p = 4 \).
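
Constructing such a \( \Theta \) matrix is a one-liner in R (the observations here are random stand-ins):

```r
# Building the n x p polynomial library matrix Theta (p = 4, no intercept)
set.seed(3)
s_t <- runif(10)               # stand-in for the observed time series
Theta <- outer(s_t, 1:4, `^`)  # columns: s, s^2, s^3, s^4
colnames(Theta) <- c("s", "s^2", "s^3", "s^4")
dim(Theta)  # 10 x 4, i.e. n x p
```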

Let \( \dot S \in \mathbb R^n \) be the column vector of derivatives, i.e. \( \dot S = (\dot s_{t_1}, \dot s_{t_2},\ldots, \dot s_{t_n})^\top \), then these can be written down (according to our assumptions) as

\[ \dot S = \Theta\,\xi + \epsilon\]

where \( \epsilon \) is the residual vector.

If one estimates \( \xi \) (minimizes the residual vector \( \epsilon \) ) using ordinary least squares (OLS), i.e.

\[ \xi^* = \text{arg}\min\limits_{\xi}\, ||\dot S - \Theta\,\xi||_2^2 \]

then the solution

\[ \xi^*=(\Theta^\top \Theta)^{-1}\Theta^\top \dot S \]

will in general have \( p \) non-zero entries. Nothing sparse about that...
The point of SINDy is rather to use sparse regression, resulting in \( \xi^* \) having \( q \) non-zero entries with \( 0 < q < p \), corresponding to \( q \) terms in the governing equation.
If \( q \) is small, and the fit is acceptable, then we have succeeded in finding a parsimonious approximation of the governing equations of \( s(t) \), i.e.

\[ \dot s(t) \approx \theta(s(t))^\top\,\xi^* \]

where \( \theta(s(t))^\top \) is a row vector of nonlinear functions of \( s(t) \).
I.e. using up to 4th order polynomial basis functions as above we have

\[ \dot s(t) \approx ( s(t), s(t)^2, s(t)^3, s(t)^4 )\,\xi^* \]

which are the same functions as in the columns of \( \Theta \).

The Least Absolute Shrinkage and Selection Operator (LASSO) [Tibshirani, 1996] is an ideal candidate for the sparse regression step, as also noted by the authors of [Brunton et al. 2016].
The LASSO finds the estimator by minimizing

\[ \xi^* = \text{arg}\min\limits_{\xi}\, ||\dot S - \Theta\,\xi||_2^2 + \lambda\,||\xi||_1 \qquad (2) \]

where \( \lambda > 0 \) imposes a penalty on the \( \ell_1 \) norm of the estimator. Using the \( \ell_1 \) norm results in (depending on the magnitude of \( \lambda \)) some entries of \( \xi^* \) being effectively zero. Thereby, the LASSO acts as a model selection method parameterized by \( \lambda \). Note that the LASSO estimator converges to the OLS estimator as \( \lambda \rightarrow 0 \).

The solution to (2) has to be found numerically, and one can for example use the lars or glmnet packages in R to achieve this.
For estimating derivatives one can for example use smooth.spline from the stats package.
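
Putting the pieces together, here is a sketch of the full pipeline on a toy process with known dynamics \( \dot s = -s \), i.e. \( s(t) = e^{-t} \) (assuming the glmnet package is installed; the data are synthetic):

```r
library(glmnet)

set.seed(11)
t_i <- seq(0, 4, by = 0.05)
s_t <- exp(-t_i) + rnorm(length(t_i), sd = 0.01)  # noisy observations of e^{-t}

# Estimate the derivative target with a smoothing spline
fit   <- smooth.spline(t_i, s_t)
S_dot <- predict(fit, t_i, deriv = 1)$y

# Polynomial library and sparse fit (alpha = 1 gives the LASSO penalty in (2))
Theta <- outer(s_t, 1:4, `^`)
lasso <- glmnet(Theta, S_dot, alpha = 1, intercept = FALSE)
coef(lasso, s = 0.001)  # the coefficient on the s-term should dominate
```

Since the true dynamics are \( \dot s = -s \), a successful fit should place most of the weight on the linear term, with the higher-order coefficients shrunk toward zero.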

Conclusion

SINDy is a powerful, yet simple, method to approximate the governing equations of systems, for which there is some data available.
In addition, it is quite modular, in the sense that there are many options available for estimating derivatives, constructing the \( \Theta \) matrix, and obtaining the linear estimator \( \xi \).
A crucial question is which basis functions to use in \( \Theta \). Here physical knowledge of the system under investigation can provide some clues.
Obtaining a system of ODEs can be achieved (in a sequential manner) by a straightforward generalization of the above: include all predictors (and functions/interactions thereof) in the \( \Theta \) matrix, and fit each derivative using, for example, the LASSO.

In this series of blog articles, we will be using SINDy to obtain a parsimonious model of the networked SIR model, i.e. we'll take data from simulations of a stochastic, continuous-time, discrete-state model, and obtain a deterministic, continuous time/state model by way of SINDy. In doing so, we hope, via dynamical-systems techniques, to obtain some insights regarding the effects of network topology on the dynamics of the simulation.

References

[Brunton et al. 2016] Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. U. S. A. 113, 3932–3937 (2016).

[Tibshirani, 1996] Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).