open_cp.seppexp

seppexp

Implements the ETAS (Epidemic Type Aftershock Sequence) model intensity estimation scheme outlined in Mohler et al. (2015). This model differs from, and is simpler than, the one used in the open_cp.sepp module:

This is an explicitly grid-based model. All events are assigned to the grid cell in which they occur, and we make no further use of their location.
For each cell, we produce an independent estimate of the background rate of events.
We model “self-excitation” only in time, as a simple exponential decay (much like the classical Hawkes model in financial mathematics). We assume the decay parameters are the same across all grid cells.
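
Written out, the three points above correspond to a conditional intensity of roughly the following form for a single grid cell. The cell index \(k\) and the event times \(t_i^{(k)}\) are notation introduced here for illustration only; \(\mu\), \(\theta\) and \(\omega\) are the parameters documented for p_matrix, with one background rate \(\mu_k\) per cell and a shared \(\theta\) and \(\omega\).

\[
\lambda_k(t) = \mu_k + \sum_{t_i^{(k)} < t} \theta \omega e^{-\omega (t - t_i^{(k)})}
\]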

References

Mohler et al., “Randomized Controlled Field Trials of Predictive Policing”, Journal of the American Statistical Association (2015), DOI:10.1080/01621459.2015.1077710
Lewis, Mohler, “A Nonparametric EM Algorithm for Multiscale Hawkes Processes”, in Proceedings of the 2011 Joint Statistical Meetings, pp. 1–16, http://math.scu.edu/~gmohler/EM_paper.pdf

class open_cp.seppexp.SEPPPredictor(region, grid_size, omega, theta, mu)

Bases: open_cp.predictors.DataTrainer

Returned by SEPPTrainer. Encapsulates the computed background rates and triggering parameters, and allows them to be evaluated on potentially different data to produce predictions.

background_prediction()

Make a “prediction” just using the background rate. Useful as it allows a direct comparison with the output of predict().

Returns:	Instance of `open_cp.predictors.GridPredictionArray`

background_rate(x, y)

Return the background rate in grid cell (x, y).

predict(predict_time, cutoff_time=None)

Make a prediction at a given time, using the data held by this instance. That is, evaluate the background rate plus the trigger kernel summed over the events before the prediction time. Optionally, you can limit the data used, although this departs from the underlying statistical model.

Parameters:	predict_time – Time point to make a prediction at.
	cutoff_time – Optionally, limit the input data to only be from before this time.
Returns:	Instance of `open_cp.predictors.GridPredictionArray`
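
The evaluation described for predict() can be sketched for plain numeric times as follows. This is a minimal illustration, not the library's code: `predict_cell`, the dictionary layout and the sample numbers are all hypothetical, times are floats rather than timestamps, and treating "before" as a strict inequality is an assumption.

```python
import math

def predict_cell(event_times, mu, omega, theta, predict_time, cutoff_time=None):
    """Background rate plus the exponential trigger summed over earlier events."""
    if cutoff_time is None:
        cutoff_time = predict_time
    # Only events before both the cutoff and the prediction time contribute.
    limit = min(cutoff_time, predict_time)
    trigger = sum(
        theta * omega * math.exp(-omega * (predict_time - t))
        for t in event_times
        if t < limit
    )
    return mu + trigger

# Hypothetical data: event times (in days) keyed by grid cell (x, y).
cells = {(0, 0): [1.0, 2.5, 4.0], (1, 0): []}
background = {(0, 0): 0.2, (1, 0): 0.1}
risk = {
    cell: predict_cell(times, background[cell], omega=1.0, theta=0.5, predict_time=5.0)
    for cell, times in cells.items()
}
```

A cell with no events reduces to its background rate, which is the quantity background_prediction() reports for every cell.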

class open_cp.seppexp.SEPPTrainer(region, grid_size=50)

Bases: open_cp.predictors.DataTrainer

Uses the algorithm described in Mohler et al. (2015). The input data is placed into grid cells, and a background rate is estimated for each cell. The parameters for the exponential decay model of self-excitation are also estimated. The returned object can be used to make predictions of risk from other data.

Parameters:	region – The rectangular region the grid should cover.
	grid_size – The size of grid to use.
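
Placing events into grid cells might look like the sketch below. It assumes square cells of side `grid_size` measured from the lower-left corner (`xmin`, `ymin`) of the region; the function name, the tuple layout of `events` and the sample coordinates are hypothetical, and events outside the region are not filtered out here.

```python
from collections import defaultdict

def assign_to_cells(events, xmin, ymin, grid_size):
    """Group event times by the grid cell containing each event.

    `events` is an iterable of (time, x, y) tuples.
    """
    cells = defaultdict(list)
    for t, x, y in events:
        cell = (int((x - xmin) // grid_size), int((y - ymin) // grid_size))
        cells[cell].append(t)
    # Later steps expect the times within each cell in increasing order.
    for times in cells.values():
        times.sort()
    return dict(cells)

events = [(3.0, 10.0, 20.0), (1.0, 40.0, 5.0), (2.0, 120.0, 60.0)]
cells = assign_to_cells(events, xmin=0.0, ymin=0.0, grid_size=50)
```

After this step only the cell index and the event times are kept, matching the model's assumption that locations within a cell are not used.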

train(cutoff_time=None, iterations=20)

Perform the (slow) training step on historical data. This estimates kernels, and returns an object which can make predictions.

Parameters:	cutoff_time – If specified, then limit the historical data to before this time.
Returns:	A `SEPPPredictor` instance.

open_cp.seppexp.maximisation(cells, omega, theta, mu, time_duration)

Perform an iteration of the EM algorithm.

Parameters:	cells – An array (of any shape), each entry of which is an array of times of events, in increasing order.
	mu – An array, of the same shape as cells, giving the background rate in each cell.
	time_duration – The total time range of the data.
Returns:	Triple (omega, theta, mu) of new estimates.
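
One iteration without edge correction can be sketched as below. This assumes the usual EM updates for an exponential-kernel self-exciting process (background rate from the diagonal responsibilities, theta from the triggered mass per event, omega from the responsibility-weighted time gaps); the library's own implementation may differ in detail. `em_step` and its dictionary-based inputs are hypothetical stand-ins for the array arguments documented above.

```python
import math

def em_step(cells, omega, theta, mu, time_duration):
    """One EM update without edge correction.

    `cells` maps a cell key to a sorted list of event times; `mu` maps the
    same keys to the current background rate of that cell.
    """
    triggered = 0.0     # sum of responsibilities p[i][j] over i < j, all cells
    weighted_gap = 0.0  # sum of p[i][j] * (t_j - t_i) over i < j, all cells
    event_count = 0
    new_mu = {}
    for key, times in cells.items():
        background_total = 0.0
        for j, tj in enumerate(times):
            # E-step for event j: unnormalised weight of each earlier event.
            weights = [theta * omega * math.exp(-omega * (tj - ti)) for ti in times[:j]]
            norm = mu[key] + sum(weights)
            if norm <= 0:
                continue
            background_total += mu[key] / norm
            for ti, w in zip(times[:j], weights):
                triggered += w / norm
                weighted_gap += (w / norm) * (tj - ti)
        event_count += len(times)
        # M-step for this cell: expected background events per unit time.
        new_mu[key] = background_total / time_duration
    new_omega = triggered / weighted_gap if weighted_gap > 0 else omega
    new_theta = triggered / event_count if event_count > 0 else theta
    return new_omega, new_theta, new_mu

new_omega, new_theta, new_mu = em_step({"a": [0.0, 1.0]}, 1.0, 0.5, {"a": 1.0}, 2.0)
```

In this sketch every event is attributed either to the background or to an earlier event, so the expected background count plus the expected triggered count equals the number of events.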

open_cp.seppexp.maximisation_corrected(cells, omega, theta, mu, time_duration)

Perform an iteration of the EM algorithm. This version applies “edge corrections” (see Lewis, Mohler) which take account of the fact that by looking at a finite time window, we ignore aftershocks which occur after the end of the time window. This leads to better parameter estimation when omega is small.

Parameters:	cells – An array (of any shape), each entry of which is an array of times of events, in increasing order.
	mu – An array, of the same shape as cells, giving the background rate in each cell.
	time_duration – The total time range of the data.
Returns:	Triple (omega, theta, mu) of new estimates.
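
To see why a small omega matters, note that the trigger density \(\omega e^{-\omega s}\) integrates to \(1 - e^{-\omega (T - t_i)}\) over the remaining window after an event at time \(t_i\), where \(T\) is the window length. The sketch below only evaluates that fraction; it does not reproduce the correction itself, and the function name and sample numbers are hypothetical.

```python
import math

def fraction_observed(event_time, omega, time_duration):
    """Expected fraction of an event's aftershocks falling inside the window."""
    remaining = time_duration - event_time
    return 1.0 - math.exp(-omega * remaining)

# Hypothetical 30-day window with an event on day 25.
fast = fraction_observed(25.0, omega=2.0, time_duration=30.0)   # short-lived triggering
slow = fraction_observed(25.0, omega=0.05, time_duration=30.0)  # long-lived triggering
```

With a large omega almost all aftershocks fall inside the window, whereas with a small omega most of them are missed, which is the case the edge correction is meant to address.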

open_cp.seppexp.p_matrix(points, omega, theta, mu)

Computes the probability matrix. Diagonal entries are the background rate, and entry [i,j] is g(points[j] - points[i]) for i<j, where \(g(t) = \theta \omega e^{-\omega t}\). Finally, we normalise the matrix so that each column sums to 1.

Parameters:	points – A one-dimensional array of the times of events, in increasing order.
	omega – The scale of the “triggering” exponential distribution.
	theta – The rate of the “triggering” intensity.
	mu – The background Poisson process rate.
Returns:	The normalised probability matrix.
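
The construction described above can be sketched with nested lists as follows. This is an illustration under the stated definition of g, not the library's implementation; `p_matrix_sketch` is a hypothetical name, the result is a list of lists rather than an array, and entries below the diagonal are left at zero.

```python
import math

def p_matrix_sketch(points, omega, theta, mu):
    """Upper-triangular matrix of cause probabilities, normalised by column."""
    n = len(points)
    p = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal: event j arose from the background process.
        p[j][j] = mu
        # Above the diagonal: event j was triggered by the earlier event i.
        for i in range(j):
            p[i][j] = theta * omega * math.exp(-omega * (points[j] - points[i]))
        column_sum = sum(p[i][j] for i in range(j + 1))
        if column_sum > 0:
            for i in range(j + 1):
                p[i][j] /= column_sum
    return p

p = p_matrix_sketch([0.0, 1.0, 3.0], omega=1.0, theta=0.5, mu=0.2)
```

Column j then reads as a probability distribution over the possible causes of event j: the background (row j) or one of the earlier events (rows i < j).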