open_cp.sepp

sepp

Implements the ETAS (Epidemic Type Aftershock-Sequences) model intensity estimation scheme outlined in Mohler et al. (2011).

As this is a statistical model, we separate out the statistical optimisation procedure into a separate class StocasticDecluster. This allows testing and exploration of the model without worry about real world issues such as time-stamps.

We can think of this algorithm in terms of a “machine learning” workflow, and separate a “training” stage from a “prediction” stage. The statistical model is that we have a “background” rate of random events, and then that existing events cause a time/space localised increase in risk, described by a “trigger” kernel. The trigger kernel does not vary with the time/space location of the event (which is perhaps a limit of the model). As such, both the background and trigger kernels should be fairly constant in time, and so if “trained” on historical data, should be valid to make predictions for, say, the next few weeks or months. (Over long time scales, we should expect the background kernel to change.)

This is also useful in practise, as the training stage is slow, but once trained, the kernels can quickly be evaluated to make predictions.

References

  1. Mohler et al, “Self-Exciting Point Process Modeling of Crime”, Journal of the American Statistical Association, 2011, DOI: 10.1198/jasa.2011.ap09546
  2. Rosser, Cheng, “Improving the Robustness and Accuracy of Crime Prediction with the Self-Exciting Point Process Through Isotropic Triggering”, Appl. Spatial Analysis, DOI: 10.1007/s12061-016-9198-y
class open_cp.sepp.AverageTimeAdjustedKernel(kernel, time_end)

Bases: open_cp.kernels.Kernel

Wraps a Kernel instance, which supports the space_kernel and time_kernel interface, and builds a new kernel which is constant in time. The new, constant time intensity is computed by taking an average of the middle half of the original time kernel.

Parameters:
  • kernel – The original kernel to delegate to.
  • time_end – We assume that the original kernel is roughly correct for times in the range 0 to time_end, and then sample the middle half of this interval.
set_scale(value)

Set the overall scaling factor; the returned kernel is multiplied by this value.

space_kernel(points)

The space component of this kernel; defers to the delegate kernel.

Parameters:points – Pair of (x,y) coords, or array of shape (2,N) representing N points.
time_kernel(points)

The time component of this kernel; in this case constant.

Parameters:points – Scalar or one-dimensional array of time points.
class open_cp.sepp.OptimisationResult(kernel, p, background_kernel, trigger_kernel, ell2_error, time_cutoff=None, space_cutoff=None)

Bases: object

Contains results of the optimisation process.

Parameters:
  • kernel – the overall estimated intensity kernel.
  • p – the estimated probability matrix.
  • background_kernel – the estimatede background event intensity kernel.
  • trigger_kernel – the estimated triggered event intensity kernel.
  • ell2_error – an array of the L^2 differences between successive estimates of the probability matrix. That these decay is a good indication of convergence.
  • time_cutoff – Optionally specify the maximum time extent of the trigger_kernel used in calculations.
  • space_cutoff – Optionally specify the maximum space extent of the trigger_kernel used in calculations.
class open_cp.sepp.SEPPPredictor(result, epoch_start, epoch_end)

Bases: open_cp.predictors.DataTrainer

Returned by SEPPTrainer encapsulated computed background and triggering kernels. This class allows these to be evaluated on potentially different data to produce predictions.

When making a prediction, the time component of the background kernel is ignored, using AverageTimeAdjustedKernel. This is allowed, because the kernel estimation used looks at time and space separately for the background kernel. We do this because KDE methods don’t allow us to “predict” into the future.

This class also stores information about the optimisation procedure.

background_kernel

The original, non-adjusted background kernel estimated by the training algorithm.

predict(predict_time, cutoff_time=None)

Make a prediction at a time, using the data held by this instance. That is, evaluate the background kernel plus the trigger kernel at events before the prediction time. Optionally you can limit the data used, though this is against the underlying statistical model.

Parameters:
  • predict_time – Time point to make a prediction at.
  • cutoff_time – Optionally, limit the input data to only be from before this time.
Returns:

Instance of ContinuousPrediction

trigger_kernel

The trigger / aftershock kernel estimated by the training algorithm.

class open_cp.sepp.SEPPTrainer(k_time=100, k_space=15)

Bases: open_cp.predictors.DataTrainer

Use the algorithm described in Mohler et al. 2011. The kernel estimation used is the “kth nearest neighbour variable bandwidth Gaussian” KDE. This is a two-step algorithm: this class “trains” itself on data, and returns a class which can then make predictions, possibly on other data.

Parameters:
  • k_time – The kth nearest neighbour to use in the KDE of the time kernel; defaults to 100.
  • k_space – The kth nearest neighbour to use in the KDE of space and space/time kernels; defaults to 15.
as_time_space_points(cutoff_time=None)

Return a copy of the input data as an array of shape (3,N) of time/space points (without units), as used by the declustering algorithm. Useful when trying to understand what the algorithm is doing.

space_cutoff

To speed up optimisation, set this to the minimal distance at which we think the spacial triggering will be effectively zero. For real data, 500m is a reasonable estimate.

time_cutoff

To speed up optimisation, set this to the minimal time gap at which we think the spacial triggering will be effectively zero. For real data, 120 days is a reasonable estimate.

train(cutoff_time=None, iterations=40)

Perform the (slow) training step on historical data. This estimates kernels, and returns an object which can make predictions.

Parameters:
  • cutoff_time – If specified, then limit the historical data to before this time.
  • iterations – The number of iterations of the optimisation algorithm to apply.
Returns:

A SEPPPredictor instance.

trigger_kernel_estimator

The kernel estimator to use for triggered events. Defaults to a kth nearest neighbour variable-bandwidth Gaussian kernel estimator with the value of k set in the constructor.

class open_cp.sepp.StocasticDecluster(background_kernel_estimator=None, trigger_kernel_estimator=None, initial_time_bandwidth=144.0, initial_space_bandwidth=50.0, space_cutoff=500.0, time_cutoff=172800.0, points=None)

Bases: object

Implements the ‘stocastic declustering algorithm’ from Mohler et al (2011). This allows estimation of two time-space kernels, one for the background events, and one the ‘trigger’ kernel which elevates risk according to past events.

This class works with floating-point data, and exposes elements of the underlying optimisation algorithm. It is designed for testing and experimentation.

Parameters:
  • background_kernel_estimator – The kernel estimator to use for background events.
  • trigger_kernel_estimator – The kernel estimator to use for triggered / aftershock events.
  • initial_time_bandwidth – The bandwidth in time to use when making an initial classification of data into background or triggered events. Default is 0.1 day**(-1) in units of minutes (so 0.1*24*60).
  • initial_space_bandwidth – The bandwidth in space to use when making an initial classification of data into background or triggered events. Default is 50 units.
  • space_cutoff – The maximum distance we believe the triggered kernel will extend to in space. Decrease this to improve the speed of the estimation, at the cost of possibly missing data. Default is 500 units.
  • time_cutoff – The maximum distance we believe the triggered kernel will extend to in time. Decrease this to improve the speed of the estimation, at the cost of possibly missing data. Default is 120 days, in units of minutes (so 120*24*60).
  • points – The three dimensional data. points[0] is the times of events, and points[1] and points[2] are the x and y coordinates.
next_iteration(p)

Perform a single iteration of the optimisation algorithm:

  1. Samples background and triggered events using the p matrix.
  2. Estimates kernels from these samples.
  3. Normalises these kernels.
  4. Computes the new p matrix from these kernels.
Parameters:p – The matrix of probabilities to sample from.
Returns:A triple (p, bkernel, tkernel) where p is the new probability matrix, bkernel the kernel for background events used to compute p, and tkernel the kernel for triggered events.
run_optimisation(iterations=20)

Runs the optimisation algorithm by taking an initial estimation of the probability matrix, and then running the optimisation step. If this step ever classifies most events as background, or as triggered, then optimisation will fail. Tuning the initial bandwidth parameters may help.

Parameters:iterations – The number of optimisation steps to perform.
Returns:OptimisationResult instance
open_cp.sepp.initial_p_matrix(points, initial_time_bandwidth=0.1, initial_space_bandwidth=50.0)

Returns an initial estimate of the probability matrix. Uses a Gaussian kernel in space, and an exponential kernel in time, both non-normalised. Diagonal (i.e. background “probabilities”) are set to 1. Finally the matrix is normalised.

Parameters:
  • points – The (time, x, y) data.
  • initial_time_bandwidth – The “scale” of the exponential.
  • initial_space_bandwidth – The standard deviation of the Gaussian.
open_cp.sepp.make_kernel(data, background_kernel, trigger_kernel)

Produce a kernel object which evaluates the background kernel, and the trigger kernel based on the space-time locations in the data.

Parameters:
  • data – An array of shape (3,N) giving the space-time locations events. Used when computing the triggered / aftershock events.
  • background_kernel – The kernel object giving the background risk intensity.
  • trigger_kernel – The kernel object giving the trigger / aftershock risk intensity.
Returns:

A kernel object which can be called on arrays on points.

open_cp.sepp.make_space_kernel(data, background_kernel, trigger_kernel, time, time_cutoff=None, space_cutoff=None)

Produce a kernel object which evaluates the background kernel, and the trigger kernel based on the space locations in the data, always using the fixed time as passed in.

Parameters:
  • data – An array of shape (3,N) giving the space-time locations events. Used when computing the triggered / aftershock events.
  • background_kernel – The kernel object giving the background risk intensity. We assume this has a method space_kernel which gives just the two dimensional spacial kernel.
  • trigger_kernel – The kernel object giving the trigger / aftershock risk intensity.
  • time – The fixed time coordinate to evaluate at.
  • time_cutoff – Optional; if set, then we assume the trigger_kernel is zero for times greater than this value (to speed up evaluation).
  • space_cutoff – Optional; if set, then we assume the trigger_kernel is zero for space distances greater than this value (to speed up evaluation).
Returns:

A kernel object which can be called on arrays of (2 dimensional space) points.

open_cp.sepp.p_matrix(points, background_kernel, trigger_kernel)

Computes the probability matrix.

Parameters:
  • points – The (time, x, y) data
  • background_kernel – The kernel giving the background event intensity.
  • trigger_kernel – The kernel giving the triggered event intensity.
Returns:

A matrix p such that p[i][i] is the probability event i is a background event, and p[i][j] is the probability event j is triggered by event i.

open_cp.sepp.p_matrix_fast(points, background_kernel, trigger_kernel, time_cutoff=150, space_cutoff=1)

Computes the probability matrix. Offers faster execution speed than p_matrix() by, in the calculation of triggered event probabilities, ignoring events which are beyond a space or time cutoff. These parameters should be set so that the trigger_kernel evaluates to (very close to) zero outside the cutoff zone.

Parameters:
  • points – The (time, x, y) data
  • background_kernel – The kernel giving the background event intensity.
  • trigger_kernel – The kernel giving the triggered event intensity.
  • time_cutoff – The maximum time between two events which can be considered in the trigging calculation.
  • space_cutoff – The maximum (two-dimensional Eucliean) distance between two events which can be considered in the trigging calculation.
Returns:

A matrix p such that p[i][i] is the probability event i is a background event, and p[i][j] is the probability event j is triggered by event i.

open_cp.sepp.sample_points(points, p)

Using the probability matrix, sample background and triggered points.

Parameters:
  • points – The (time, x, y) data.
  • p – The probability matrix.
Returns:

A pair of (backgrounds, triggered) where backgrounds is the (time, x, y) data of the points classified as being background events, and triggered is the (time, x, y) delta of the triggered events. That is, triggered represents the difference in space and time between each triggered event and the event which triggered it, as sampled from the probability matrix.