open_cp.sources package
open_cp.sources.chicago module
sources.chicago
Reads a CSV file in the format (as of April 2017) of data available from:
- https://catalog.data.gov/dataset/crimes-one-year-prior-to-present-e171f
- https://catalog.data.gov/dataset/crimes-2001-to-present-398a4
The default data is loaded from a file “chicago.csv”, which should be downloaded from one of the above links.
The data is partly anonymous in that the address within a block is obscured, and the geocoding always returns a coordinate in the middle of a block.
open_cp.sources.chicago.default_burglary_data()
Load the default data, if available.
Returns: An instance of open_cp.data.TimedPoints, or None.
open_cp.sources.chicago.load(filename, primary_description_names, to_meters=True)
Load data from a CSV file in the expected format.
Parameters:
- filename – Name of the CSV file to load.
- primary_description_names – Set of names to search for in the “primary description” field. E.g. pass {“THEFT”} to return only the “theft” crime type.
Returns: An instance of open_cp.data.TimedPoints, or None.
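The following dependency-free sketch illustrates the kind of filtering that primary_description_names controls: it keeps only the rows whose primary description is in the given set. It is not open_cp's parsing code. The column names (“CASE#”, “PRIMARY DESCRIPTION”, “X COORDINATE”, “Y COORDINATE”) and the helper filter_rows are assumptions made for illustration, and the rows are made up.

```python
import csv
import io


def filter_rows(csv_file, primary_description_names, field="PRIMARY DESCRIPTION"):
    """Hypothetical helper: keep rows whose `field` value is in the given set.

    The column name `field` is an assumed header, not taken from open_cp.
    """
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row[field].strip() in primary_description_names]


# A tiny in-memory CSV standing in for "chicago.csv" (made-up rows).
sample_csv = io.StringIO(
    "CASE#,PRIMARY DESCRIPTION,X COORDINATE,Y COORDINATE\n"
    "A1,THEFT,100,200\n"
    "A2,BURGLARY,110,210\n"
    "A3,THEFT,120,220\n"
)
thefts = filter_rows(sample_csv, {"THEFT"})
print([row["CASE#"] for row in thefts])  # ['A1', 'A3']
```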
open_cp.sources.random module
sources.random
Produces synthetic data based upon simple random models.
This currently overlaps somewhat with the Sampler classes in the sources.sepp module.
class open_cp.sources.random.KernelSampler(region, kernel, k_max)
Bases: object
A simple “sampler” class which can sample from a kernel defined on a rectangular region. Call an instance as sampler(N) to make N samples, returning an array of shape (2,N).
See also: open_cp.sources.sepp.SpaceSampler
Parameters:
- region – An open_cp.data.RectangularRegion instance describing the region the kernel is defined on.
- kernel – The kernel, callable with an array of shape (2,k).
- k_max – The maximum value the kernel takes (or an upper bound).
open_cp.sources.random.random_spatial(space_sampler, start_time, end_time, expected_number, time_rate_unit=numpy.timedelta64(1, 's'))
Simulate a homogeneous Poisson process in time with independent, identically distributed space locations.
Parameters:
- space_sampler – The callable object which returns the space coordinates. It is called as space_sampler(N) and returns an array of shape (2,N) of (x,y) coordinates.
- start_time – The start time of the simulation.
- end_time – The end time of the simulation.
- expected_number – The expected number of events to simulate.
- time_rate_unit – The numpy.timedelta64 unit to use: this becomes the smallest interval of time we can simulate. By default, one second.
Returns: An open_cp.data.TimedPoints instance giving the simulation.
open_cp.sources.random.random_uniform(region, start_time, end_time, expected_number, time_rate_unit=numpy.timedelta64(1, 's'))
Simulate a homogeneous Poisson process in time with space locations chosen uniformly at random in a region.
Parameters:
- region – An open_cp.data.RectangularRegion instance giving the region in which to sample space locations.
- start_time – The start time of the simulation.
- end_time – The end time of the simulation.
- expected_number – The expected number of events to simulate.
- time_rate_unit – The numpy.timedelta64 unit to use: this becomes the smallest interval of time we can simulate. By default, one second.
Returns: An open_cp.data.TimedPoints instance giving the simulation.
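Below is a minimal, dependency-free sketch of the idea behind random_uniform. It makes several assumptions:
- times are plain floats rather than numpy.datetime64 values;
- the region is a hypothetical (xmin, xmax, ymin, ymax) tuple rather than a RectangularRegion;
- the time_rate_unit discretisation is omitted.

The sketch relies on a standard fact: conditional on the number of events, the event times of a homogeneous Poisson process are independent and uniform on the window. The helpers poisson_variate and random_uniform_sketch are hypothetical and are not part of open_cp.

```python
import random


def poisson_variate(lam, rng):
    """Draw a Poisson(lam) count: the number of unit-rate arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def random_uniform_sketch(region, start_time, end_time, expected_number, rng=None):
    """Hypothetical: homogeneous Poisson process in time, uniform locations in a rectangle.

    `region` is an assumed (xmin, xmax, ymin, ymax) tuple and times are floats.
    Returns (times, xs, ys) with the times sorted.
    """
    rng = rng or random.Random()
    xmin, xmax, ymin, ymax = region
    # The total count is Poisson; given the count, event times are i.i.d. uniform.
    n = poisson_variate(expected_number, rng)
    times = sorted(rng.uniform(start_time, end_time) for _ in range(n))
    xs = [rng.uniform(xmin, xmax) for _ in range(n)]
    ys = [rng.uniform(ymin, ymax) for _ in range(n)]
    return times, xs, ys


times, xs, ys = random_uniform_sketch((0.0, 10.0, 0.0, 5.0), 0.0, 100.0, 50, random.Random(0))
```

Replacing the uniform location draws with any callable that returns N points gives the analogous sketch for random_spatial.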
open_cp.sources.random.rejection_sample_2d(kernel, k_max, samples=1, oversample=2)
A simple two-dimensional rejection sampler. The kernel is assumed to be defined on [0,1] × [0,1].
Parameters:
- kernel – A callable object giving the kernel. It should accept an array of shape (2, #points) and return an array of shape (#points).
- k_max – The maximum value the kernel takes (or an upper bound).
- samples – The number of samples to return.
- oversample – Change this to improve performance. At each iteration, we test this many times more samples than we need. Make this parameter too large, and we “waste” random numbers. Make it too small, and we don’t exploit the vectorised nature of numpy enough. Defaults to 2.
Returns: If only one sample is required, an array [x,y] of the sampled point. Otherwise an array of shape (2,N), where N is the number of samples.
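The following pure-Python sketch shows the rejection step. A uniform proposal (x, y) is accepted with probability kernel(x, y) / k_max, which is why k_max must bound the kernel. The sketch makes three assumptions:
- the kernel takes two floats rather than a (2, #points) array;
- points are handled one at a time, so oversample only mimics the batch size that the numpy version would vectorise over;
- rejection_sample_2d_sketch is a hypothetical name.

```python
import random


def rejection_sample_2d_sketch(kernel, k_max, samples=1, oversample=2, rng=None):
    """Hypothetical rejection sampler for a density proportional to kernel(x, y) on [0,1] x [0,1].

    kernel takes two floats here (an assumption of this sketch), and k_max must bound it.
    """
    rng = rng or random.Random()
    xs, ys = [], []
    while len(xs) < samples:
        # Propose a batch larger than the number still needed, by the oversample factor.
        batch = max(1, int((samples - len(xs)) * oversample))
        for _ in range(batch):
            x, y = rng.random(), rng.random()
            # Accept the proposal with probability kernel(x, y) / k_max.
            if rng.random() * k_max < kernel(x, y):
                xs.append(x)
                ys.append(y)
    xs, ys = xs[:samples], ys[:samples]
    if samples == 1:
        return [xs[0], ys[0]]
    return [xs, ys]


# kernel(x, y) = x has x-marginal density 2x, so sampled x values average about 2/3.
xs, ys = rejection_sample_2d_sketch(lambda x, y: x, 1.0, samples=1000, rng=random.Random(1))
```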
open_cp.sources.sepp module
sources.sepp
Produces synthetic data based upon a “self-exciting” or “Hawkes model” point process. These are point processes where the conditional intensity function depends upon a background intensity (i.e. a homogeneous or possibly inhomogeneous Poisson process), and where each past event contributes a further (linearly additive) term governed by a trigger / aftershock kernel.
Such models, with specific forms for the trigger kernel, are known as “epidemic-type aftershock sequence” (ETAS) models in the earthquake modelling literature.
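The conditional intensity described above can be written in symbols. The notation is introduced here for illustration and is not taken from the module. Let μ be the background intensity, g the trigger kernel, and (t_i, x_i, y_i) the past events, and assume, as is common, that the trigger depends only on the offset from the parent event. For a time-only process, drop the spatial arguments.

```latex
\lambda(t, x, y) = \mu(t, x, y) + \sum_{i \,:\, t_i < t} g\left(t - t_i,\; x - x_i,\; y - y_i\right)
```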
Rather than relying upon external libraries (other than numpy, which we do use), we provide a number of base classes which define kernels and samplers, together with some common kernels and samplers for backgrounds and triggers.
class open_cp.sources.sepp.Exponential(exp_rate=1, total_rate=1)
Bases: open_cp.sources.sepp.TimeKernel
An exponentially decaying kernel.
Parameters:
- exp_rate – The “rate” parameter of the exponential.
- total_rate – The overall scaling of the kernel. If this kernel is used to simulate a point process, then this is the expected number of events.
kernel_max(time_start, time_end)
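The sketch below writes out one natural reading of this kernel, k(t) = total_rate · exp_rate · exp(−exp_rate · t) for t ≥ 0. Its integral over [0, ∞) is exactly total_rate, which matches the description of total_rate as the expected number of events. Because the kernel is non-increasing, evaluating it at the left end of the window bounds it from above. ExponentialKernelSketch is a hypothetical class, and the precise open_cp formula is an assumption here.

```python
import math


class ExponentialKernelSketch:
    """Hypothetical stand-in for an exponentially decaying time kernel.

    k(t) = total_rate * exp_rate * exp(-exp_rate * t) for t >= 0, and 0 for t < 0.
    Its integral over [0, inf) is total_rate, the expected number of events.
    """

    def __init__(self, exp_rate=1.0, total_rate=1.0):
        self.exp_rate = exp_rate
        self.total_rate = total_rate

    def __call__(self, t):
        if t < 0:
            return 0.0
        return self.total_rate * self.exp_rate * math.exp(-self.exp_rate * t)

    def kernel_max(self, time_start, time_end):
        # The kernel is non-increasing on [0, inf), so its value at max(time_start, 0)
        # bounds it over the window (exactly, whenever time_end >= 0).
        return self(max(time_start, 0.0))


k = ExponentialKernelSketch(exp_rate=2.0, total_rate=5.0)
print(k(0.0))  # 10.0
```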
class open_cp.sources.sepp.ExponentialDecaySampler(intensity, exp_rate)
Bases: open_cp.sources.sepp.Sampler
A one-dimensional time sampler, sampling from an exponentially decaying kernel.
Parameters:
- intensity – The expected number of events.
- exp_rate – The “rate” parameter of the exponential.
sample(start_time, end_time)
class open_cp.sources.sepp.GaussianSpaceSampler(mus, variances, correlation)
Bases: open_cp.sources.sepp.SpaceSampler
Returns samples from a multivariate normal distribution.
Parameters:
- mus – A pair of the mean values of the Gaussian in each variable.
- variances – A pair of the variances of the Gaussian in each variable.
- correlation – The correlation between the two variables.
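A standard way to draw correlated bivariate normal samples starts from two independent standard normals z1 and z2. It then applies the Cholesky factor of the 2×2 correlation matrix, giving y the component ρ·z1 + √(1 − ρ²)·z2. The stdlib sketch below uses that construction. The helper gaussian_space_sample is hypothetical and need not match how GaussianSpaceSampler is implemented.

```python
import math
import random


def gaussian_space_sample(mus, variances, correlation, n, rng=None):
    """Hypothetical helper: n bivariate normal points, returned as [xs, ys] (a (2, n) layout)."""
    rng = rng or random.Random()
    sx, sy = math.sqrt(variances[0]), math.sqrt(variances[1])
    rho = correlation
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Cholesky factor of [[1, rho], [rho, 1]]: y mixes z1 into an independent z2.
        xs.append(mus[0] + sx * z1)
        ys.append(mus[1] + sy * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return [xs, ys]


xs, ys = gaussian_space_sample((1.0, -2.0), (4.0, 1.0), 0.6, 1000, random.Random(2))
```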
class open_cp.sources.sepp.GridHawkesProcess(background_rates, theta, omega)
Bases: open_cp.sources.sepp.Sampler
Sample from a grid-based, Hawkes-type (exponential decay self-excitation kernel) model, as used by Mohler et al., “Randomized Controlled Field Trials of Predictive Policing”, 2015.
Parameters:
- background_rates – An array of arbitrary shape, giving the background rate in each “cell”.
- theta – The overall “intensity” of trigger / aftershock events, i.e. the expected number of events directly triggered by each event. Should be less than 1, or the process explodes.
- omega – The rate (or inverse scale) of the exponential kernel. Increase to make aftershock events more localised in time.
sample(start_time, end_time)
Returns an array of the same shape as the background rates, each entry of which is an array of zero or more event times.
sample_to_randomised_grid(start_time, end_time, grid_size)
Assuming that the background rate is a two-dimensional array, generate (uniformly at random) event locations so that, when confined to the grid, the timestamps agree with the simulated data for each grid cell. We treat the input background rate as a matrix, so it has entries [row, col], i.e. [y, x].
Returns: An array of shape (3,N) of N sampled points.
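The sketch below is one way to simulate such a model per cell, assuming the per-cell intensity has the form λ(t) = μ + Σ θω·exp(−ω(t − t_i)) over earlier events t_i in that cell. It uses the branching (cluster) construction:
- background events arrive at rate μ;
- every event independently triggers Poisson(θ) children;
- each child is delayed by an Exp(ω) time.

It then places each event uniformly inside its cell, using the [row, col] = [y, x] convention described above. Three further assumptions apply: the grid origin is (0, 0), cells are squares of side grid_size, and times are floats. All function names are hypothetical.

```python
import random


def _poisson(lam, rng):
    """Poisson(lam) count: the number of unit-rate arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def hawkes_cell_times(mu, theta, omega, start_time, end_time, rng):
    """Hypothetical: sorted event times in one cell, built as a branching process."""
    # Background events: homogeneous Poisson process with rate mu.
    pending = []
    if mu > 0:
        t = start_time + rng.expovariate(mu)
        while t < end_time:
            pending.append(t)
            t += rng.expovariate(mu)
    events = []
    while pending:
        parent = pending.pop()
        events.append(parent)
        # Each event directly triggers Poisson(theta) children, delayed by Exp(omega).
        for _ in range(_poisson(theta, rng)):
            child = parent + rng.expovariate(omega)
            if child < end_time:
                pending.append(child)
    return sorted(events)


def randomised_grid_points(background_rates, theta, omega, start_time, end_time,
                           grid_size, rng=None):
    """Hypothetical: return [times, xs, ys]; background_rates[row][col] is cell (y=row, x=col).

    Assumes the grid origin is (0, 0) and cells are squares of side grid_size.
    """
    rng = rng or random.Random()
    points = []
    for row, rates in enumerate(background_rates):
        for col, mu in enumerate(rates):
            for t in hawkes_cell_times(mu, theta, omega, start_time, end_time, rng):
                # Place each event uniformly at random inside its own cell.
                x = (col + rng.random()) * grid_size
                y = (row + rng.random()) * grid_size
                points.append((t, x, y))
    points.sort()
    return [[p[0] for p in points], [p[1] for p in points], [p[2] for p in points]]
```

Since each background event leads on average to 1/(1 − θ) events in total, a cell with rate μ produces roughly μ(end_time − start_time)/(1 − θ) events when edge effects are small.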
class open_cp.sources.sepp.HomogeneousPoisson(rate=1)
Bases: open_cp.sources.sepp.TimeKernel
A constant kernel, representing a homogeneous Poisson process.
Parameters: rate – The rate of the process: the expected number of events per time unit.
kernel_max(time_start, time_end)
class open_cp.sources.sepp.HomogeneousPoissonSampler(rate)
Bases: open_cp.sources.sepp.Sampler
A one-dimensional time sampler, sampling from a homogeneous Poisson process.
Parameters: rate – The rate of the process: the expected number of events per time unit.
sample(start_time, end_time)
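A homogeneous Poisson process in time can be sampled by accumulating independent Exp(rate) gaps between successive events until the window ends. The sketch below uses float times and a hypothetical helper homogeneous_poisson_times, and it need not match how HomogeneousPoissonSampler is implemented.

```python
import random


def homogeneous_poisson_times(rate, start_time, end_time, rng=None):
    """Hypothetical: sorted event times of a rate-`rate` Poisson process on [start_time, end_time)."""
    rng = rng or random.Random()
    times = []
    # Gaps between consecutive events are independent Exp(rate) variables.
    t = start_time + rng.expovariate(rate)
    while t < end_time:
        times.append(t)
        t += rng.expovariate(rate)
    return times


times = homogeneous_poisson_times(2.0, 0.0, 10.0, random.Random(5))
```

The times come out already sorted, as the Sampler interface requires.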
class open_cp.sources.sepp.InhomogeneousPoisson(region, kernel)
Bases: open_cp.sources.sepp.Sampler
A simple rejection (aka Ogata thinning) sampler.
Parameters:
- region – The spatial extent of the simulation.
- kernel – Should follow the interface of SpaceTimeKernel.
sample(start_time, end_time)
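Thinning works in two steps. First, it simulates a dominating homogeneous process with rate k_max per unit area per unit time. It then keeps each candidate (t, x, y) with probability intensity(t, x, y) / k_max. The stdlib sketch below assumes that the intensity takes floats, that k_max bounds it over the whole window, and that the region is a hypothetical (xmin, xmax, ymin, ymax) tuple standing in for a RectangularRegion. The function name is also hypothetical.

```python
import random


def thinning_sample(intensity, k_max, region, start_time, end_time, rng=None):
    """Hypothetical thinning sampler for a space-time inhomogeneous Poisson process.

    intensity(t, x, y) -> float, bounded above by k_max on the whole window.
    region = (xmin, xmax, ymin, ymax) is an assumed stand-in for RectangularRegion.
    """
    rng = rng or random.Random()
    xmin, xmax, ymin, ymax = region
    area = (xmax - xmin) * (ymax - ymin)
    # Dominating homogeneous process: rate k_max per unit area per unit time.
    bound_rate = k_max * area
    points = []
    t = start_time + rng.expovariate(bound_rate)
    while t < end_time:
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        # Keep the candidate with probability intensity / k_max.
        if rng.random() * k_max < intensity(t, x, y):
            points.append((t, x, y))
        t += rng.expovariate(bound_rate)
    return points


pts = thinning_sample(lambda t, x, y: x, 1.0, (0.0, 1.0, 0.0, 1.0), 0.0, 100.0, random.Random(6))
```

A tight k_max matters for efficiency: the expected fraction of rejected candidates grows as the bound gets looser.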
class open_cp.sources.sepp.InhomogeneousPoissonFactors(time_kernel, space_sampler)
Bases: open_cp.sources.sepp.Sampler
A time/space sampler where the kernel factorises into a time kernel and a space kernel. For efficiency, we use a space sampler.
Parameters:
- time_kernel – Should follow the interface of TimeKernel.
- space_sampler – Should follow the interface of SpaceSampler.
sample(start_time, end_time)
class open_cp.sources.sepp.PoissonTimeGaussianSpace(time_rate, mus, variances, correlation)
Bases: open_cp.sources.sepp.SpaceTimeKernel
A kernel which is a constant-rate Poisson process in time, and a two-dimensional Gaussian kernel in space (see https://en.wikipedia.org/wiki/Multivariate_normal_distribution).
Parameters:
- time_rate – The rate of the Poisson process in time.
- mus – A pair of the mean values of the Gaussian in each variable.
- variances – A pair of the variances of the Gaussian in each variable.
- correlation – The correlation between the two variables.
intensity(t, x, y)
kernel_max(time_start, time_end)
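Taking the kernel to be time_rate times the bivariate normal density gives two consequences. First, it integrates over space to time_rate per unit time. Second, its maximum is attained at the mean, where it equals time_rate / (2π σx σy √(1 − ρ²)) for every time window. The sketch below writes this out with the standard library. The class PoissonTimeGaussianSpaceSketch is hypothetical, and the exact scaling used by open_cp is an assumption.

```python
import math


class PoissonTimeGaussianSpaceSketch:
    """Hypothetical stand-in: a constant rate in time times a bivariate normal density in space."""

    def __init__(self, time_rate, mus, variances, correlation):
        self.time_rate = time_rate
        self.mus = mus
        self.sx = math.sqrt(variances[0])
        self.sy = math.sqrt(variances[1])
        self.rho = correlation
        # Normalising constant of the bivariate normal density.
        self.norm = 2 * math.pi * self.sx * self.sy * math.sqrt(1 - correlation ** 2)

    def intensity(self, t, x, y):
        # Standardised offsets from the mean; t is unused because the time rate is constant.
        dx = (x - self.mus[0]) / self.sx
        dy = (y - self.mus[1]) / self.sy
        q = (dx * dx - 2 * self.rho * dx * dy + dy * dy) / (1 - self.rho ** 2)
        return self.time_rate * math.exp(-q / 2) / self.norm

    def kernel_max(self, time_start, time_end):
        # The density peaks at the mean, so this bound holds for every time window.
        return self.time_rate / self.norm


k = PoissonTimeGaussianSpaceSketch(3.0, (0.0, 0.0), (2.0, 0.5), 0.5)
```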
class open_cp.sources.sepp.Sampler
Bases: object
Sample from a point process.
sample(start_time, end_time)
Find a sample from a point process.
Parameters:
- start_time – The start of the time window to sample from.
- end_time – The end of the time window to sample from.
Returns: An array of shape (3,n) of space/time coordinates. The data should always be _sorted_ in time.
class open_cp.sources.sepp.SelfExcitingPointProcess(background_sampler=None, trigger_sampler=None)
Bases: open_cp.sources.sepp.Sampler
Sample from a self-exciting point process model. Can sample in arbitrary dimensions: if the samplers return one-dimensional points, then we simulate a time-only process. If the samplers return multi-dimensional points, then we use the first coordinate as time, and the remaining coordinates as space.
class Sample(points, backgrounds, trigger_deltas, trigger_points)
Bases: object
Contains details of the sample as returned by SelfExcitingPointProcess. This can be useful when, for example, checking the correctness of the simulation.
Parameters:
- points – All points from the sampled process.
- backgrounds – All the background events.
- trigger_deltas – The “deltas” between trigger and triggered (aka parent and child) points.
- trigger_points – With the same ordering as trigger_deltas, the position of the trigger (aka parent) point.
SelfExcitingPointProcess.sample(start_time, end_time)
SelfExcitingPointProcess.sample_with_details(start_time, end_time)
Takes a sample from the process, but returns the details of the sample (as a Sample instance).
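The sketch below makes the Sample fields concrete using the branching construction for a time-only process. Every event, whether background or triggered, passes through a trigger sampler that returns the delays of its children. Each delay is recorded in trigger_deltas, and the parent's time is recorded in trigger_points. The sketch assumes that background_sampler(start, end) returns event times and that trigger_sampler(0.0, length) returns delays below length. The names SampleDetails, sample_with_details_sketch, background and trigger are all hypothetical.

```python
import collections
import random

# Hypothetical record with the same fields as SelfExcitingPointProcess.Sample.
SampleDetails = collections.namedtuple(
    "SampleDetails", ["points", "backgrounds", "trigger_deltas", "trigger_points"]
)


def sample_with_details_sketch(background_sampler, trigger_sampler, start_time, end_time):
    """Hypothetical: simulate a time-only self-exciting process by branching.

    background_sampler(start, end) -> event times in [start, end).
    trigger_sampler(0.0, length)   -> delays in [0, length) of the events one parent triggers.
    """
    backgrounds = sorted(background_sampler(start_time, end_time))
    to_process = list(backgrounds)
    points, trigger_deltas, trigger_points = [], [], []
    while to_process:
        parent = to_process.pop()
        points.append(parent)
        # Every event, background or triggered, may itself trigger further events.
        for delta in trigger_sampler(0.0, end_time - parent):
            trigger_deltas.append(delta)
            trigger_points.append(parent)
            to_process.append(parent + delta)
    return SampleDetails(sorted(points), backgrounds, trigger_deltas, trigger_points)


rng = random.Random(3)


def background(start, end, rate=0.5):
    # Homogeneous Poisson background: exponential gaps between events.
    times, t = [], start + rng.expovariate(rate)
    while t < end:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def trigger(start, end, theta=0.4, omega=1.0):
    # Poisson(theta) children with Exp(omega) delays, kept only inside [start, end).
    deltas, s = [], rng.expovariate(1.0)
    while s < theta:
        delay = start + rng.expovariate(omega)
        if delay < end:
            deltas.append(delay)
        s += rng.expovariate(1.0)
    return deltas


details = sample_with_details_sketch(background, trigger, 0.0, 1000.0)
```

One check of correctness is that every point is either a background event or a triggered one. Hence len(points) == len(backgrounds) + len(trigger_deltas).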
class open_cp.sources.sepp.SpaceSampler
Bases: object
Base class for classes which can return samples from a space (two-dimensional) distribution.
class open_cp.sources.sepp.SpaceTimeKernel
Bases: open_cp.kernels.Kernel
To produce a kernel as required by the samplers in this package, either extend this abstract class and implement intensity(t, x, y), or provide your own class which has the same __call__ signature and a kernel_max method.
intensity(t, x, y)
t, x and y will be one-dimensional numpy arrays of the same length.
Returns: A numpy array of the same length as the input.
kernel_max(time_start, time_end)
Return a value which is greater than or equal to the maximum intensity of the kernel over the time range (and for any space input).
set_scale()
class open_cp.sources.sepp.TimeKernel
Bases: open_cp.kernels.Kernel
A one-dimensional kernel which can estimate its upper bound, for use with rejection sampling.
kernel_max(time_start, time_end)
Return a value which is greater than or equal to the maximum intensity of the kernel over the time range.
set_scale()
class open_cp.sources.sepp.UniformRegionSampler(region)
Bases: open_cp.sources.sepp.SpaceSampler
Returns space samples chosen uniformly from a rectangular region.
Parameters: region – An instance of open_cp.data.RectangularRegion giving the region.
open_cp.sources.sepp.make_time_unit(length_of_time, minimal_time_unit=numpy.timedelta64(1, 'ms'))
Utility function to create a time_unit (for example, for scale_to_real_time()).
Parameters:
- length_of_time – A time delta object representing the length of time “one unit” should represent: e.g. an hour, a day, etc.
- minimal_time_unit – The minimal time length the resulting data represents. Defaults to milliseconds.
open_cp.sources.sepp.scale_to_real_time(points, start_time, time_unit=numpy.timedelta64(60, 's'))
Transform abstract time/space data to real timestamps.
Parameters:
- points – Array of shape (3,n) representing time/space coordinates.
- start_time – The time to map 0.0 to.
- time_unit – The duration of unit time; by default 60 seconds (so one minute, but giving the resulting data a resolution of seconds). See make_time_unit().
Returns: An instance of open_cp.data.TimedPoints.
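The time mapping amounts to real_time = start_time + t × time_unit, with the spatial rows passed through unchanged. The stdlib sketch below shows it using datetime and timedelta in place of numpy datetime64, so the resolution is microseconds rather than that of time_unit. It returns plain lists rather than a TimedPoints instance. The function scale_to_real_time_sketch is hypothetical.

```python
import datetime


def scale_to_real_time_sketch(points, start_time, time_unit=datetime.timedelta(seconds=60)):
    """Hypothetical: map abstract times points[0] to datetimes; spatial rows are unchanged."""
    times = [start_time + t * time_unit for t in points[0]]
    return times, list(points[1]), list(points[2])


points = [[0.0, 1.5, 2.25], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
times, xs, ys = scale_to_real_time_sketch(points, datetime.datetime(2017, 4, 1))
print(times[2])  # 2017-04-01 00:02:15
```

With the default one-minute unit, an abstract time of 2.25 becomes 135 seconds after start_time.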
open_cp.sources.ukpolice module
sources.ukpolice
Reads a CSV file in the format (as of April 2017) of data available from:
The default data is loaded from a file “uk_police.csv”, which should be downloaded from one of the above links. Data from more than one month must be joined manually.
The data is partly anonymous in that the address is a street name (or other non-uniquely identifying location) and geocoding resolves to the centre of streets. Most importantly, all timestamps are only to a _monthly_ resolution.
open_cp.sources.ukpolice.default_burglary_data()
Load the default data, if available.
Returns: An instance of open_cp.data.TimedPoints, or None.
open_cp.sources.ukpolice.load(filename, primary_description_names)
Load data from a CSV file in the expected format.
Parameters:
- filename – Name of the CSV file to load.
- primary_description_names – Set of names to search for in the “primary description” field. E.g. pass {“Burglary”} to return only the “burglary” crime type.
Returns: An instance of open_cp.data.TimedPoints, or None.
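Since each download covers a single month, the following sketch shows one way to join several monthly CSV files and keep a chosen crime type before loading. The column names “Month” and “Crime type” are assumptions about the file layout, the helper join_monthly_csvs is hypothetical, and the rows are made up.

```python
import csv
import io


def join_monthly_csvs(files, crime_types, field="Crime type"):
    """Hypothetical helper: concatenate monthly CSV files, keeping rows of the given crime types.

    The column name `field` is an assumed header, not taken from open_cp.
    """
    rows = []
    for f in files:
        for row in csv.DictReader(f):
            if row[field] in crime_types:
                rows.append(row)
    return rows


# Two made-up monthly files; note the timestamps only resolve to a month.
march = io.StringIO("Month,Crime type,Longitude,Latitude\n"
                    "2017-03,Burglary,0.10,0.20\n"
                    "2017-03,Vehicle crime,0.11,0.21\n")
april = io.StringIO("Month,Crime type,Longitude,Latitude\n"
                    "2017-04,Burglary,0.12,0.22\n")
burglaries = join_monthly_csvs([march, april], {"Burglary"})
print([r["Month"] for r in burglaries])  # ['2017-03', '2017-04']
```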