Pipelines#

Poisson Kriging pipelines#

pyinterpolate.filter_blocks(semivariogram_model: TheoreticalVariogram, point_support: PointSupport, number_of_neighbors, kriging_type='ata', data_crs=None, raise_when_negative_prediction=True, raise_when_negative_error=False, negative_prediction_to_zero=False, verbose=True) GeoDataFrame[source]

Function filters block data using Poisson Kriging. By filtering we understand computing aggregated values again using point support data for ratios regularization.

Parameters:
semivariogram_modelTheoreticalVariogram

The fitted variogram model.

point_supportPointSupport

Blocks and their point supports.

number_of_neighborsint

The minimum number of neighbours that potentially affect a block.

kriging_typestr, default=’ata’
A type of Poisson Kriging operation. Available methods:
  • 'ata': Area-to-Area Poisson Kriging.

  • 'atp': Area-to-Point Poisson Kriging.

  • 'cb': Centroid-based Poisson Kriging.

data_crsstr, default=None

Data crs, look into: https://geopandas.org/projections.html. If None given then returned GeoDataFrame doesn’t have a crs.

raise_when_negative_predictionbool, default=True

Raise error when prediction is negative.

raise_when_negative_errorbool, default=True

Raise error when prediction error is negative.

negative_prediction_to_zerobool, default=False

When predicted value is below zero then set it to zero.

verbosebool, default=True

Show progress bar

Returns:
: GeoPandas GeoDataFrame

Regularized set of ps_blocks: ['id', 'geometry', 'reg.est', 'reg.err', 'rmse']

Examples

>>> import os
>>> import geopandas as gpd
>>> from pyinterpolate import (
...     filter_blocks,
...     Blocks,
...     ExperimentalVariogram,
...     PointSupport,
...     TheoreticalVariogram
... )
>>>
>>>
>>> FILENAME = 'cancer_data.gpkg'
>>> LAYER_NAME = 'areas'
>>> DS = gpd.read_file(FILENAME, layer=LAYER_NAME)
>>> AREA_VALUES = 'rate'
>>> AREA_INDEX = 'FIPS'
>>> AREA_GEOMETRY = 'geometry'
>>> PS_LAYER_NAME = 'points'
>>> PS_VALUES = 'POP10'
>>> PS_GEOMETRY = 'geometry'
>>> PS = gpd.read_file(FILENAME, layer=PS_LAYER_NAME)
>>>
>>> CANCER_DATA = {
...    'ds': DS,
...    'index_column_name': AREA_INDEX,
...    'value_column_name': AREA_VALUES,
...    'geometry_column_name': AREA_GEOMETRY
... }
>>> POINT_SUPPORT_DATA = {
...     'ps': PS,
...     'value_column_name': PS_VALUES,
...     'geometry_column_name': PS_GEOMETRY
... }
>>> BLOCKS = Blocks(**CANCER_DATA)
>>> indexes = BLOCKS.block_indexes
>>>
>>> PS = PointSupport(
...     points=POINT_SUPPORT_DATA['ps'],
...     ps_blocks=BLOCKS,
...     points_value_column=POINT_SUPPORT_DATA['value_column_name'],
...     points_geometry_column=POINT_SUPPORT_DATA['geometry_column_name']
... )
>>>
>>> EXPERIMENTAL = ExperimentalVariogram(
...     ds=BLOCKS.representative_points_array(),
...     step_size=40000,
...     max_range=300001
... )
>>>
>>> THEO = TheoreticalVariogram()
>>> THEO.autofit(
...     experimental_variogram=EXPERIMENTAL,
...     sill=150
... )
>>> filtered = filter_blocks(
...     semivariogram_model=THEO,
...     point_support=PS,
...     number_of_neighbors=8,
...     kriging_type='cb',
...     raise_when_negative_error=False,
...     verbose=False
... )
>>> print(filtered.columns)
Index(['id', 'geometry', 'reg.est', 'reg.err', 'rmse'], dtype='object')
pyinterpolate.smooth_blocks(semivariogram_model: TheoreticalVariogram, point_support: PointSupport, number_of_neighbors, data_crs=None, raise_when_negative_prediction=True, raise_when_negative_error=True, negative_prediction_to_zero=False, verbose=True) GeoDataFrame[source]

Function smooths aggregated block values, and transform those into point support.

Parameters:
semivariogram_modelTheoreticalVariogram

The fitted variogram model.

point_supportPointSupport

Blocks and their point supports.

number_of_neighborsint

The minimum number of neighbours that potentially affect a block.

data_crsstr, default=None

Data crs, look into: https://geopandas.org/projections.html. If None given then returned GeoDataFrame doesn’t have a crs.

raise_when_negative_predictionbool, default=True

Raise error when prediction is negative.

raise_when_negative_errorbool, default=True

Raise error when prediction error is negative.

negative_prediction_to_zerobool, default=False

When predicted value is below zero then set it to zero.

verbosebool, default=True

Show progress bar

Returns:
: GeoPandas GeoDataFrame

columns = ['id', 'geometry', 'reg.est', 'reg.err', 'rmse']

Examples

>>> import os
>>> import geopandas as gpd
>>> from pyinterpolate import (
...     filter_blocks,
...     Blocks,
...     ExperimentalVariogram,
...     PointSupport,
...     TheoreticalVariogram
... )
>>>
>>>
>>> FILENAME = 'cancer_data.gpkg'
>>> LAYER_NAME = 'areas'
>>> DS = gpd.read_file(FILENAME, layer=LAYER_NAME)
>>> AREA_VALUES = 'rate'
>>> AREA_INDEX = 'FIPS'
>>> AREA_GEOMETRY = 'geometry'
>>> PS_LAYER_NAME = 'points'
>>> PS_VALUES = 'POP10'
>>> PS_GEOMETRY = 'geometry'
>>> PS = gpd.read_file(FILENAME, layer=PS_LAYER_NAME)
>>>
>>> CANCER_DATA = {
...    'ds': DS,
...    'index_column_name': AREA_INDEX,
...    'value_column_name': AREA_VALUES,
...    'geometry_column_name': AREA_GEOMETRY
... }
>>> POINT_SUPPORT_DATA = {
...     'ps': PS,
...     'value_column_name': PS_VALUES,
...     'geometry_column_name': PS_GEOMETRY
... }
>>> BLOCKS = Blocks(**CANCER_DATA)
>>> indexes = BLOCKS.block_indexes
>>>
>>> PS = PointSupport(
...     points=POINT_SUPPORT_DATA['ps'],
...     ps_blocks=BLOCKS,
...     points_value_column=POINT_SUPPORT_DATA['value_column_name'],
...     points_geometry_column=POINT_SUPPORT_DATA['geometry_column_name']
... )
>>>
>>> EXPERIMENTAL = ExperimentalVariogram(
...     ds=BLOCKS.representative_points_array(),
...     step_size=40000,
...     max_range=300001
... )
>>>
>>> THEO = TheoreticalVariogram()
>>> THEO.autofit(
...     experimental_variogram=EXPERIMENTAL,
...     sill=150
... )
>>> smoothed = smooth_blocks(
...     semivariogram_model=THEO,
...     point_support=PS,
...     number_of_neighbors=8,
...     verbose=True
... )
>>> print(smoothed.columns)
Index(['id', 'geometry', 'reg.est', 'reg.err', 'rmse'], dtype='object')

Ordinary Kriging pipelines#

pyinterpolate.interpolate_points(theoretical_model: TheoreticalVariogram, unknown_locations: ArrayLike, known_locations: ArrayLike = None, known_values: ArrayLike = None, known_geometries: ArrayLike = None, neighbors_range=None, no_neighbors=4, max_tick=5.0, use_all_neighbors_in_range=False, allow_approximate_solutions=False, progress_bar=True) ndarray[source]

Function predicts values at unknown locations with Ordinary Kriging.

Parameters:
theoretical_modelTheoreticalVariogram

Fitted theoretical variogram model.

unknown_locationsnumpy array

Points where you want to estimate value [(x, y), ...] <-> [(lon, lat), ...].

known_locationsnumpy array, optional

The known locations: [x, y, value].

known_valuesArrayLike, optional

Observation in the i-th geometry (from known_geometries). Optional parameter, if not given then known_locations must be provided.

known_geometriesArrayLike, optional

Array or similar structure with geometries. It must have the same length as known_values. Optional parameter, if not given then known_locations must be provided. Point type geometry.

neighbors_rangefloat, default=None

The maximum distance where we search for the neighbors. If None is given then range is selected from the theoretical model’s rang attribute.

no_neighborsint, default = 4

The number of the n-closest neighbors used for interpolation.

max_tickfloat, default=5.

Maximum number of degrees for neighbors search angle.

use_all_neighbors_in_rangebool, default = False

True: if the real number of neighbors within the neighbors_range is greater than the number_of_neighbors parameter then take all of them anyway.

allow_approximate_solutionsbool, default=False

Allows the approximation of kriging weights based on the OLS algorithm. We don’t recommend set it to True if you don’t know what are you doing. This parameter can be useful when you have clusters in your dataset, that can lead to singular or near-singular matrix creation. But the better idea is to get rid of those clusters.

progress_barbool, default = True

Shows progress bar

Returns:
: numpy array

[predicted value, variance error, longitude (x), latitude (y)]

Examples

>>> import geopandas as gpd
>>> import numpy as np
>>> import pandas as pd
>>>
>>> from pyinterpolate import (build_experimental_variogram,
...     build_theoretical_variogram)
>>> from pyinterpolate.core.pipelines.interpolate import interpolate_points
>>>
>>> dem = gpd.read_file('dem.gpkg')
>>> unknown_locations = gpd.read_file('unknown_locations.gpkg')
>>> step_size = 500
>>> max_range = 10000
>>> exp_variogram = build_experimental_variogram(
...     values=dem['dem'],
...     geometries=dem['geometry'],
...     step_size=step_size,
...     max_range=max_range
... )
>>> theo_variogram = build_theoretical_variogram(exp_variogram)
>>> interp = interpolate_points(
...     theoretical_model=theo_variogram,
...     unknown_locations=unknown_locations['geometry'],
...     known_values=dem['dem'],
...     known_geometries=dem['geometry']
... )
>>> print(interp[0])
[7.91222896e+01 9.72740449e+01 2.38012302e+05 5.51466805e+05]
pyinterpolate.interpolate_points_dask(theoretical_model: TheoreticalVariogram, unknown_locations: ArrayLike, known_locations: ArrayLike = None, known_values: ArrayLike = None, known_geometries: ArrayLike = None, neighbors_range=None, no_neighbors=4, max_tick=5.0, use_all_neighbors_in_range=False, allow_approximate_solutions=False, number_of_workers=1, progress_bar=True) ndarray[source]

Function predicts values at unknown locations with Ordinary Kriging using Dask backend, makes sense when you must interpolate large number of points.

Parameters:
theoretical_modelTheoreticalVariogram

Fitted theoretical variogram model.

unknown_locationsnumpy array

Points where you want to estimate value [(x, y), ...] <-> [(lon, lat), ...].

known_locationsnumpy array, optional

The known locations: [x, y, value].

known_valuesArrayLike, optional

Observation in the i-th geometry (from known_geometries). Optional parameter, if not given then known_locations must be provided.

known_geometriesArrayLike, optional

Array or similar structure with geometries. It must have the same length as known_values. Optional parameter, if not given then known_locations must be provided. Point type geometry.

neighbors_rangefloat, default=None

The maximum distance where we search for the neighbors. If None is given then range is selected from the theoretical model’s rang attribute.

no_neighborsint, default = 4

The number of the n-closest neighbors used for interpolation.

max_tickfloat, default=5.

Maximum number of degrees for neighbors search angle.

use_all_neighbors_in_rangebool, default = False

True: if the real number of neighbors within the neighbors_range is greater than the number_of_neighbors parameter then take all of them anyway.

allow_approximate_solutionsbool, default=False

Allows the approximation of kriging weights based on the OLS algorithm. We don’t recommend set it to True if you don’t know what are you doing. This parameter can be useful when you have clusters in your dataset, that can lead to singular or near-singular matrix creation. But the better idea is to get rid of those clusters.

number_of_workersint, default = 1

How many processing units can be used for predictions. Increase it only for a very large number of interpolated points (~10k+).

progress_barbool, default = True

Shows progress bar

Returns:
: numpy array

[predicted value, variance error, longitude (x), latitude (y)]

Examples

>>> import geopandas as gpd
>>> import numpy as np
>>> import pandas as pd
>>>
>>> from pyinterpolate import (build_experimental_variogram,
...     build_theoretical_variogram)
>>> from pyinterpolate.core.pipelines.interpolate import interpolate_points_dask
>>>
>>> dem = gpd.read_file('dem.gpkg')
>>> unknown_locations = gpd.read_file('unknown_locations.gpkg')
>>> step_size = 500
>>> max_range = 10000
>>> exp_variogram = build_experimental_variogram(
...     values=dem['dem'],
...     geometries=dem['geometry'],
...     step_size=step_size,
...     max_range=max_range
... )
>>> theo_variogram = build_theoretical_variogram(exp_variogram)
>>> interp = interpolate_points_dask(
...     theoretical_model=theo_variogram,
...     unknown_locations=unknown_locations['geometry'],
...     known_values=dem['dem'],
...     known_geometries=dem['geometry']
... )
>>> print(interp[0])
[7.91222896e+01 9.72740449e+01 2.38012302e+05 5.51466805e+05]