open_cp.sources package

open_cp.sources.chicago module

sources.chicago

Reads a CSV file in the format (as of April 2017) of data available from:

The default data is loaded from a file “chicago.csv” which should be downloaded from one of the above links.

The data is partly anonymous in that the address within a block is obscured, and the geocoding always returns a coordinate in the middle of a block.

open_cp.sources.chicago.default_burglary_data()

Load the default data, if available.

Returns: An instance of open_cp.data.TimedPoints or None.
open_cp.sources.chicago.load(filename, primary_description_names, to_meters=True)

Load data from a CSV file in the expected format.

Parameters:
  • filename – Name of the CSV file to load.
  • primary_description_names – Set of names to search for in the “primary description field”. E.g. pass {“THEFT”} to return only the “theft” crime type.
Returns:

An instance of open_cp.data.TimedPoints or None.
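As a rough illustration of the filtering that primary_description_names describes, the sketch below keeps only the rows whose crime type is in a given set. It is not the library's code: the column names ("PRIMARY DESCRIPTION", "X COORDINATE", "Y COORDINATE"), the helper name filter_rows and the sample rows are all assumptions made up for the example.

```python
import csv
import io

# Made-up sample in a hypothetical layout; the real file's columns may differ.
SAMPLE = """CASE,PRIMARY DESCRIPTION,X COORDINATE,Y COORDINATE
A1,THEFT,1150000,1900000
A2,BURGLARY,1160000,1910000
A3,THEFT,1170000,1920000
"""

def filter_rows(lines, primary_description_names):
    """Return (x, y) pairs for rows whose crime type is in the given set."""
    points = []
    for row in csv.DictReader(lines):
        if row["PRIMARY DESCRIPTION"] in primary_description_names:
            points.append((float(row["X COORDINATE"]), float(row["Y COORDINATE"])))
    return points

# Keep only the "theft" rows, as in the {"THEFT"} example above.
thefts = filter_rows(io.StringIO(SAMPLE), {"THEFT"})
```

A set is used for the names so that several crime types can be requested in one call.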

open_cp.sources.random module

sources.random

Produces synthetic data based upon simple random models.

Currently overlaps a bit with the Sampler classes from the sources.sepp module.

class open_cp.sources.random.KernelSampler(region, kernel, k_max)

Bases: object

A simple “sampler” class which can sample from a kernel defined on a rectangular region. Call an instance with an integer N to make N samples, returning an array of shape (2,N).

See also open_cp.sources.sepp.SpaceSampler

Parameters:
  • region – An open_cp.data.RectangularRegion instance describing the region the kernel is defined on.
  • kernel – The kernel, callable with an array of shape (2,k).
  • k_max – The maximum value the kernel takes (or an upper bound).
open_cp.sources.random.random_spatial(space_sampler, start_time, end_time, expected_number, time_rate_unit=numpy.timedelta64(1, 's'))

Simulate a homogeneous Poisson process in time with independent, identically distributed space locations.

Parameters:
  • space_sampler – The callable object to return the space coordinates. Expects to be called as space_sampler(N) and returns an array of shape (2,N) of (x,y) coordinates.
  • start_time – The start time of the simulation.
  • end_time – The end time of the simulation.
  • expected_number – The expected number of events to simulate.
  • time_rate_unit – The numpy.timedelta64 unit to use: this becomes the smallest interval of time we can simulate. By default, one second.
Returns:

An open_cp.data.TimedPoints instance giving the simulation.
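The time part of such a simulation can be sketched with the standard library alone. The function below is an illustration under stated assumptions, not the module's implementation: it uses datetime and timedelta in place of numpy types, the name poisson_times is hypothetical, and it generates only event times (space locations would then be drawn separately by calling space_sampler(N)).

```python
import random
from datetime import datetime, timedelta

def poisson_times(start_time, end_time, expected_number,
                  time_unit=timedelta(seconds=1), rng=None):
    """Simulate event times of a homogeneous Poisson process on [start_time, end_time).

    Gaps between consecutive events are exponential, with rate equal to
    expected_number divided by the window length measured in time units.
    """
    rng = rng or random.Random()
    units = (end_time - start_time) / time_unit  # window length as a float
    if expected_number <= 0 or units <= 0:
        return []
    rate = expected_number / units
    times = []
    t = rng.expovariate(rate)
    while t < units:
        # Truncate to whole time units: the smallest step we can represent.
        times.append(start_time + int(t) * time_unit)
        t += rng.expovariate(rate)
    return times

events = poisson_times(datetime(2017, 4, 1), datetime(2017, 4, 2), 50,
                       rng=random.Random(0))
```

The number of events returned is random; expected_number is only its mean.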

open_cp.sources.random.random_uniform(region, start_time, end_time, expected_number, time_rate_unit=numpy.timedelta64(1, 's'))

Simulate a homogeneous Poisson process in time with space locations chosen uniformly at random in a region.

Parameters:
  • region – An open_cp.data.RectangularRegion instance giving the region to sample space locations in.
  • start_time – The start time of the simulation.
  • end_time – The end time of the simulation.
  • expected_number – The expected number of events to simulate.
  • time_rate_unit – The numpy.timedelta64 unit to use: this becomes the smallest interval of time we can simulate. By default, one second.
Returns:

A TimedPoints instance giving the simulation.

open_cp.sources.random.rejection_sample_2d(kernel, k_max, samples=1, oversample=2)

A simple two-dimensional rejection sampler. The kernel is assumed to be defined on [0,1] times [0,1].

Parameters:
  • kernel – A callable object giving the kernel. Should be able to accept an array of shape (2, #points) and return an array of shape (#points).
  • k_max – The maximum value the kernel takes (or an upper bound).
  • samples – The number of samples to return.
  • oversample – Change this to improve performance. At each iteration, we test this many times more samples than we need. Make this parameter too large, and we “waste” random numbers. Make it too small, and we don’t utilise the parallel nature of numpy enough. Defaults to 2.
Returns:

If one sample is requested, an array [x,y] of the point sampled. Otherwise an array of shape (2,N) where N is the number of samples.
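The idea behind rejection sampling with an oversample factor can be sketched as follows. This is a plain-Python illustration, not the module's code: the function name is hypothetical, the kernel here takes two scalars rather than an array of shape (2, #points), and the result is a list of (x, y) pairs. In a numpy version each batch would be tested in one vectorised call, which is where the oversample trade-off comes from.

```python
import random

def rejection_sample_unit_square(kernel, k_max, samples=1, oversample=2, rng=None):
    """Draw points from the density proportional to kernel on [0,1] x [0,1].

    kernel(x, y) must return a value between 0 and k_max.
    """
    rng = rng or random.Random()
    accepted = []
    while len(accepted) < samples:
        # Propose more candidates than are still needed.
        batch = max(1, int((samples - len(accepted)) * oversample))
        for _ in range(batch):
            x, y = rng.random(), rng.random()
            height = rng.random() * k_max
            # Keep the candidate when the uniform height falls under the kernel.
            if height < kernel(x, y):
                accepted.append((x, y))
    # Surplus accepted points are discarded; they are independent draws.
    return accepted[:samples]

# Example: a kernel that increases linearly in x, bounded above by 1.
points = rejection_sample_unit_square(lambda x, y: x, 1.0, samples=200,
                                      rng=random.Random(1))
```

A tighter bound k_max means fewer rejected candidates; a loose bound is still correct but slower.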

open_cp.sources.sepp module

sources.sepp

Produces synthetic data based upon a “self-exciting” or “Hawkes model” point process. These are point processes where the conditional intensity function depends upon a background intensity (i.e. a homogeneous or possibly inhomogeneous Poisson process) and where each event in the past contributes a further (linearly additive) term governed by a trigger / aftershock kernel.
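In symbols, a standard way to write such a conditional intensity is given below. The notation is ours and does not come from the module: \mu is the background intensity, g is the trigger kernel, and (t_i, x_i, y_i) are the past events.

```latex
\lambda(t) = \mu(t) + \sum_{t_i < t} g(t - t_i)
\qquad\text{or, with space,}\qquad
\lambda(t, x, y) = \mu(t, x, y) + \sum_{t_i < t} g(t - t_i,\; x - x_i,\; y - y_i)
```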

Such models, with specific forms for the trigger kernel, are known as “epidemic type aftershock models” in the earthquake modelling literature.

Rather than rely upon external libraries (excepting numpy which we do use) we produce a number of base classes which define kernels and samplers, and provide some common kernels and samplers for backgrounds and triggers.

class open_cp.sources.sepp.Exponential(exp_rate=1, total_rate=1)

Bases: open_cp.sources.sepp.TimeKernel

An exponentially decaying kernel.

Parameters:
  • exp_rate – The “rate” parameter of the exponential.
  • total_rate – The overall scaling of the kernel. If this kernel is used to simulate a point process, then this is the expected number of events.
kernel_max(time_start, time_end)
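One kernel consistent with the parameter descriptions above is total_rate * exp_rate * exp(-exp_rate * t), which integrates to total_rate over t >= 0. The sketch below assumes that form; the function names are illustrative and nothing here is taken from the library's source. Because the function is decreasing, its largest value on a window is its value at the start of the window, which is the kind of bound kernel_max is meant to supply.

```python
import math

def exponential_kernel(t, exp_rate=1.0, total_rate=1.0):
    """Assumed form: total_rate * exp_rate * exp(-exp_rate * t), for t >= 0."""
    return total_rate * exp_rate * math.exp(-exp_rate * t)

def exponential_kernel_max(time_start, time_end, exp_rate=1.0, total_rate=1.0):
    """Upper bound on [time_start, time_end]: a decreasing function peaks at the start."""
    return exponential_kernel(time_start, exp_rate, total_rate)

def integral(f, a, b, steps=100_000):
    """Midpoint-rule integral, used only to check the normalisation numerically."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# The area under the kernel should be close to total_rate (here 3.0).
area = integral(lambda t: exponential_kernel(t, exp_rate=2.0, total_rate=3.0), 0.0, 20.0)
```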
class open_cp.sources.sepp.ExponentialDecaySampler(intensity, exp_rate)

Bases: open_cp.sources.sepp.Sampler

A one-dimensional time sampler, sampling from an exponentially decaying kernel.

Parameters:
  • exp_rate – The “rate” parameter of the exponential.
  • intensity – The expected number of events.
sample(start_time, end_time)
class open_cp.sources.sepp.GaussianSpaceSampler(mus, variances, correlation)

Bases: open_cp.sources.sepp.SpaceSampler

Returns samples from a multivariate normal distribution.

Parameters:
  • mus – A pair of the mean values of the Gaussian in each variable.
  • variances – A pair of the variances of the Gaussian in each variable.
  • correlation – The correlation between the two Gaussians.
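For two variables, the three parameters above determine the distribution completely. A minimal sketch of drawing such samples with the standard library is shown below; the function name and the (xs, ys) return layout are assumptions for the example, and the library itself may construct its samples differently.

```python
import math
import random

def sample_bivariate_gaussian(mus, variances, correlation, n, rng=None):
    """Draw n points from a two-dimensional Gaussian; returns (xs, ys)."""
    rng = rng or random.Random()
    sx, sy = math.sqrt(variances[0]), math.sqrt(variances[1])
    tail = math.sqrt(1.0 - correlation ** 2)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mus[0] + sx * z1)
        # Mixing z1 into y introduces the requested correlation with x.
        ys.append(mus[1] + sy * (correlation * z1 + tail * z2))
    return xs, ys

xs, ys = sample_bivariate_gaussian((1.0, -2.0), (4.0, 1.0), 0.5, 20000,
                                   rng=random.Random(0))
```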
class open_cp.sources.sepp.GridHawkesProcess(background_rates, theta, omega)

Bases: open_cp.sources.sepp.Sampler

Sample from a grid-based, Hawkes type (exponential decay self-excitation kernel) model, as used by Mohler et al, “Randomized Controlled Field Trials of Predictive Policing”, 2015.

Parameters:
  • background_rates – An array of arbitrary shape, giving the background rate in each “cell”.
  • theta – The overall “intensity” of trigger / aftershock events. Should be less than 1.
  • omega – The rate (or inverse scale) of the exponential kernel. Increase to make aftershock events more localised in time.
sample(start_time, end_time)

Returns an array of the same shape as background_rates, each entry of which is an array of zero or more event times.
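For a single cell, a process of this kind can be simulated by branching: draw the background events, then let every event trigger a random number of children at exponentially distributed delays. The sketch below illustrates that idea under the assumption that the trigger kernel is theta * omega * exp(-omega * t); the helper names are hypothetical and this is not the class's actual code.

```python
import math
import random

def poisson_count(mean, rng):
    """Knuth's method for a Poisson random count; adequate for small means."""
    limit, count, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def hawkes_cell(background_rate, theta, omega, start_time, end_time, rng=None):
    """Sorted event times for one cell on [start_time, end_time)."""
    rng = rng or random.Random()
    duration = end_time - start_time
    # Background events: a Poisson count, placed uniformly in the window.
    pending = [start_time + rng.random() * duration
               for _ in range(poisson_count(background_rate * duration, rng))]
    events = []
    while pending:
        parent = pending.pop()
        events.append(parent)
        # Each event triggers Poisson(theta) children after Exp(omega) delays.
        for _ in range(poisson_count(theta, rng)):
            child = parent + rng.expovariate(omega)
            if child < end_time:
                pending.append(child)
    return sorted(events)

times = hawkes_cell(0.5, 0.5, 2.0, 0.0, 20.0, rng=random.Random(3))
```

With theta below 1 each event has fewer than one child on average, so the cascade ends with probability one.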

sample_to_randomised_grid(start_time, end_time, grid_size)

Assuming that the background rate is a two-dimensional array, generate (uniformly at random) event locations so that, when confined to a grid, the time-stamps agree with the simulated data for that grid cell. We treat the input background rate as a matrix, so it has entries [row, col] or [y, x].

Returns: An array of shape (3,N) of N sampled points.
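The [row, col] versus [y, x] convention is easy to get backwards, so a small sketch may help. It assumes the grid has its origin at (0, 0), that cells are squares of side grid_size, and that the output is ordered as times, x coordinates, y coordinates; these choices and the function name are assumptions for illustration only.

```python
import random

def randomise_in_cells(cell_times, grid_size, rng=None):
    """cell_times[row][col] is a list of event times for that grid cell.

    Returns three lists (times, xs, ys), sorted by time.
    """
    rng = rng or random.Random()
    points = []
    for row, cells in enumerate(cell_times):
        for col, times in enumerate(cells):
            for t in times:
                # The column index gives x and the row index gives y.
                x = (col + rng.random()) * grid_size
                y = (row + rng.random()) * grid_size
                points.append((t, x, y))
    points.sort()
    return ([p[0] for p in points],
            [p[1] for p in points],
            [p[2] for p in points])

cells = [[[0.5], []],
         [[0.2, 0.9], [0.1]]]
ts, xs, ys = randomise_in_cells(cells, 10.0, rng=random.Random(0))
```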
class open_cp.sources.sepp.HomogeneousPoisson(rate=1)

Bases: open_cp.sources.sepp.TimeKernel

A constant kernel, representing a homogeneous Poisson process.

Parameters: rate – The rate of the process: the expected number of events per time unit.
kernel_max(time_start, time_end)
class open_cp.sources.sepp.HomogeneousPoissonSampler(rate)

Bases: open_cp.sources.sepp.Sampler

A one-dimensional time sampler, sampling from a homogeneous Poisson process.

Parameters: rate – The rate of the process: the expected number of events per time unit.
sample(start_time, end_time)
class open_cp.sources.sepp.InhomogeneousPoisson(region, kernel)

Bases: open_cp.sources.sepp.Sampler

A simple rejection (aka Ogata thinning) sampler.

Parameters:
  • region – The spatial extent of the simulation.
  • kernel – Should follow the interface of SpaceTimeKernel.
sample(start_time, end_time)
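The rejection idea for a time-varying rate can be sketched in one dimension: generate candidate events at a constant rate that bounds the true intensity, then keep each candidate with probability equal to the ratio of the two. The code below is a time-only illustration with hypothetical names; the class described above works with a space/time kernel and a region, which this sketch leaves out.

```python
import math
import random

def thinning_sample(intensity, intensity_max, start_time, end_time, rng=None):
    """Event times with rate intensity(t), where intensity(t) <= intensity_max."""
    rng = rng or random.Random()
    events = []
    # Candidates arrive as a homogeneous Poisson process of rate intensity_max.
    t = start_time + rng.expovariate(intensity_max)
    while t < end_time:
        # Accept the candidate with probability intensity(t) / intensity_max.
        if rng.random() * intensity_max < intensity(t):
            events.append(t)
        t += rng.expovariate(intensity_max)
    return events

# Example: an exponentially decaying rate, bounded above by its value at 0.
decaying = thinning_sample(lambda t: 5.0 * math.exp(-t), 5.0, 0.0, 10.0,
                           rng=random.Random(0))
```

The bound plays the role of kernel_max: it need not be tight, but it must never be smaller than the intensity on the window.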
class open_cp.sources.sepp.InhomogeneousPoissonFactors(time_kernel, space_sampler)

Bases: open_cp.sources.sepp.Sampler

A time/space sampler where the kernel factorises into a time kernel and a space kernel. For efficiency, we use a space sampler.

Parameters:
  • time_kernel – Should follow the interface of TimeKernel
  • space_sampler – Should follow the interface of SpaceSampler
sample(start_time, end_time)
class open_cp.sources.sepp.PoissonTimeGaussianSpace(time_rate, mus, variances, correlation)

Bases: open_cp.sources.sepp.SpaceTimeKernel

A kernel which is a constant rate Poisson process in time, and a two dimensional Gaussian kernel in space (see https://en.wikipedia.org/wiki/Multivariate_normal_distribution).

Parameters:
  • time_rate – The rate of the Poisson process in time.
  • mus – A pair of the mean values of the Gaussian in each variable.
  • variances – A pair of the variances of the Gaussian in each variable.
  • correlation – The correlation between the two Gaussians.
intensity(t, x, y)
kernel_max(time_start, time_end)
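Written out, an intensity of this kind is the time rate multiplied by the two-dimensional Gaussian density. The sketch below assumes exactly that product and uses the standard bivariate normal formula; the function name is hypothetical and scalar inputs are used instead of numpy arrays.

```python
import math

def poisson_time_gaussian_space(t, x, y, time_rate, mus, variances, correlation):
    """Assumed intensity: time_rate times the bivariate normal density at (x, y)."""
    sx, sy = math.sqrt(variances[0]), math.sqrt(variances[1])
    one_minus_rho2 = 1.0 - correlation ** 2
    # Standardised offsets from the mean in each variable.
    dx, dy = (x - mus[0]) / sx, (y - mus[1]) / sy
    exponent = -(dx * dx - 2.0 * correlation * dx * dy + dy * dy) / (2.0 * one_minus_rho2)
    normalisation = 2.0 * math.pi * sx * sy * math.sqrt(one_minus_rho2)
    # The value does not depend on t: the process is homogeneous in time.
    return time_rate * math.exp(exponent) / normalisation

peak = poisson_time_gaussian_space(0.0, 0.0, 0.0, 2.0, (0.0, 0.0), (1.0, 1.0), 0.0)
```

The density is largest at the mean, so the value there is a valid upper bound for rejection sampling over any time range.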
class open_cp.sources.sepp.Sampler

Bases: object

Sample from a point process.

sample(start_time, end_time)

Find a sample from a point process.

Parameters:
  • start_time – The start of the time window to sample from.
  • end_time – The end of the time window to sample from.
Returns:

An array of shape (3,n) of space/time coordinates. The data should always be _sorted_ in time.

class open_cp.sources.sepp.SelfExcitingPointProcess(background_sampler=None, trigger_sampler=None)

Bases: open_cp.sources.sepp.Sampler

Sample from a self-exciting point process model. Can sample in arbitrary dimensions: if the samplers return one-dimensional points then we simulate a time-only process. If the samplers return multi-dimensional points, then we use the first coordinate as time, and the remaining coordinates as space.

Parameters:
  • background_sampler – Should follow the interface of Sampler
  • trigger_sampler – Should follow the interface of Sampler
class Sample(points, backgrounds, trigger_deltas, trigger_points)

Bases: object

Contains details of the sample as returned by SelfExcitingPointProcess. This can be useful when, for example, checking the correctness of the simulation.

Parameters:
  • points – All points from the sampled process.
  • backgrounds – All the background events.
  • trigger_deltas – The “deltas” between trigger and triggered (aka parent and child) points.
  • trigger_points – With the same ordering as trigger_deltas, the position of the trigger (aka parent) point.
SelfExcitingPointProcess.sample(start_time, end_time)
SelfExcitingPointProcess.sample_with_details(start_time, end_time)

Takes a sample from the process, but returns the details as a Sample instance.

class open_cp.sources.sepp.SpaceSampler

Bases: object

Base class for classes which can return samples from a space (two dimensional) distribution.

class open_cp.sources.sepp.SpaceTimeKernel

Bases: open_cp.kernels.Kernel

To produce a kernel as required by the samplers in this package, either extend this abstract class, implementing intensity(t, x, y), or provide your own class which has the same signature as __call__ and the property kernel_max.

intensity(t, x, y)

t, x and y will be one-dimensional numpy arrays of the same length.

Returns: A numpy array of the same length as the input.
kernel_max(time_start, time_end)

Return a value which is greater than or equal to the maximum intensity of the kernel over the time range (and for any space input).

set_scale()
class open_cp.sources.sepp.TimeKernel

Bases: open_cp.kernels.Kernel

A one dimensional kernel which can estimate its upper bound, for use with rejection sampling.

kernel_max(time_start, time_end)

Return a value which is greater than or equal to the maximum intensity of the kernel over the time range.

set_scale()
class open_cp.sources.sepp.UniformRegionSampler(region)

Bases: open_cp.sources.sepp.SpaceSampler

Returns space samples chosen uniformly from a rectangular region.

Parameters: region – An instance of RectangularRegion giving the region.
open_cp.sources.sepp.make_time_unit(length_of_time, minimal_time_unit=numpy.timedelta64(1, 'ms'))

Utility method to create a time_unit.

Parameters:
  • length_of_time – A time delta object, representing the length of time “one unit” should represent: e.g. an hour, a day, etc.
  • minimal_time_unit – The minimal time length the resulting data represents. Defaults to milliseconds.
open_cp.sources.sepp.scale_to_real_time(points, start_time, time_unit=numpy.timedelta64(60, 's'))

Transform abstract time/space data to real timestamps.

Parameters:
  • points – Array of shape (3,n) representing time/space coordinates.
  • start_time – The time to map 0.0 to.
  • time_unit – The duration of unit time, by default 60 seconds (so one minute, but giving the resulting data a resolution of seconds). See make_time_unit().
Returns:

An instance of open_cp.data.TimedPoints
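The mapping from abstract times to timestamps is a linear one: time 0.0 becomes start_time and each unit of abstract time adds one time_unit. The sketch below shows this with datetime and timedelta from the standard library rather than numpy types; the function name to_timestamps is hypothetical, and it converts only the time coordinate.

```python
from datetime import datetime, timedelta

def to_timestamps(times, start_time, time_unit=timedelta(seconds=60)):
    """Map abstract times so that 0.0 -> start_time and 1.0 -> start_time + time_unit."""
    return [start_time + t * time_unit for t in times]

# With the default unit of one minute, 1.5 units is 90 seconds.
stamps = to_timestamps([0.0, 1.5, 60.0], datetime(2017, 4, 1))
```

Note that timedelta keeps microsecond resolution, whereas a numpy.timedelta64 unit such as 60 seconds has integer resolution in its own unit, which is why the library offers make_time_unit() to choose a finer one.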

open_cp.sources.ukpolice module

sources.ukpolice

Reads a CSV file in the format (as of April 2017) of data available from:

The default data is loaded from a file “uk_police.csv” which should be downloaded from one of the above links. Data from more than one month needs to be manually joined.

The data is partly anonymous in that the address is a street name (or other non-uniquely identifying location) and geocoding resolves to the centre of streets. Most importantly, all timestamps are only to a _monthly_ resolution.

open_cp.sources.ukpolice.default_burglary_data()

Load the default data, if available.

Returns: An instance of open_cp.data.TimedPoints or None.
open_cp.sources.ukpolice.load(filename, primary_description_names)

Load data from a CSV file in the expected format.

Parameters:
  • filename – Name of the CSV file to load.
  • primary_description_names – Set of names to search for in the “primary description field”. E.g. pass {“Burglary”} to return only the “burglary” crime type.
Returns:

An instance of open_cp.data.TimedPoints or None.
