Pipelines#
Poisson Kriging pipelines#
- pyinterpolate.filter_blocks(semivariogram_model: TheoreticalVariogram, point_support: PointSupport, number_of_neighbors, kriging_type='ata', data_crs=None, raise_when_negative_prediction=True, raise_when_negative_error=False, negative_prediction_to_zero=False, verbose=True) GeoDataFrame[source]
Function filters block data using Poisson Kriging. By filtering we understand computing aggregated values again using point support data for ratios regularization.
- Parameters:
- semivariogram_modelTheoreticalVariogram
The fitted variogram model.
- point_supportPointSupport
Blocks and their point supports.
- number_of_neighborsint
The minimum number of neighbours that potentially affect a block.
- kriging_typestr, default=’ata’
- A type of Poisson Kriging operation. Available methods:
'ata': Area-to-Area Poisson Kriging.'atp': Area-to-Point Poisson Kriging.'cb': Centroid-based Poisson Kriging.
- data_crsstr, default=None
Data crs, look into: https://geopandas.org/projections.html. If None given then returned GeoDataFrame doesn’t have a crs.
- raise_when_negative_predictionbool, default=True
Raise error when prediction is negative.
- raise_when_negative_errorbool, default=True
Raise error when prediction error is negative.
- negative_prediction_to_zerobool, default=False
When predicted value is below zero then set it to zero.
- verbosebool, default=True
Show progress bar
- Returns:
- : GeoPandas GeoDataFrame
Regularized set of ps_blocks:
['id', 'geometry', 'reg.est', 'reg.err', 'rmse']
Examples
>>> import os >>> import geopandas as gpd >>> from pyinterpolate import ( ... filter_blocks, ... Blocks, ... ExperimentalVariogram, ... PointSupport, ... TheoreticalVariogram ... ) >>> >>> >>> FILENAME = 'cancer_data.gpkg' >>> LAYER_NAME = 'areas' >>> DS = gpd.read_file(FILENAME, layer=LAYER_NAME) >>> AREA_VALUES = 'rate' >>> AREA_INDEX = 'FIPS' >>> AREA_GEOMETRY = 'geometry' >>> PS_LAYER_NAME = 'points' >>> PS_VALUES = 'POP10' >>> PS_GEOMETRY = 'geometry' >>> PS = gpd.read_file(FILENAME, layer=PS_LAYER_NAME) >>> >>> CANCER_DATA = { ... 'ds': DS, ... 'index_column_name': AREA_INDEX, ... 'value_column_name': AREA_VALUES, ... 'geometry_column_name': AREA_GEOMETRY ... } >>> POINT_SUPPORT_DATA = { ... 'ps': PS, ... 'value_column_name': PS_VALUES, ... 'geometry_column_name': PS_GEOMETRY ... } >>> BLOCKS = Blocks(**CANCER_DATA) >>> indexes = BLOCKS.block_indexes >>> >>> PS = PointSupport( ... points=POINT_SUPPORT_DATA['ps'], ... ps_blocks=BLOCKS, ... points_value_column=POINT_SUPPORT_DATA['value_column_name'], ... points_geometry_column=POINT_SUPPORT_DATA['geometry_column_name'] ... ) >>> >>> EXPERIMENTAL = ExperimentalVariogram( ... ds=BLOCKS.representative_points_array(), ... step_size=40000, ... max_range=300001 ... ) >>> >>> THEO = TheoreticalVariogram() >>> THEO.autofit( ... experimental_variogram=EXPERIMENTAL, ... sill=150 ... ) >>> filtered = filter_blocks( ... semivariogram_model=THEO, ... point_support=PS, ... number_of_neighbors=8, ... kriging_type='cb', ... raise_when_negative_error=False, ... verbose=False ... ) >>> print(filtered.columns) Index(['id', 'geometry', 'reg.est', 'reg.err', 'rmse'], dtype='object')
- pyinterpolate.smooth_blocks(semivariogram_model: TheoreticalVariogram, point_support: PointSupport, number_of_neighbors, data_crs=None, raise_when_negative_prediction=True, raise_when_negative_error=True, negative_prediction_to_zero=False, verbose=True) GeoDataFrame[source]
Function smooths aggregated block values, and transform those into point support.
- Parameters:
- semivariogram_modelTheoreticalVariogram
The fitted variogram model.
- point_supportPointSupport
Blocks and their point supports.
- number_of_neighborsint
The minimum number of neighbours that potentially affect a block.
- data_crsstr, default=None
Data crs, look into: https://geopandas.org/projections.html. If
Nonegiven then returned GeoDataFrame doesn’t have a crs.- raise_when_negative_predictionbool, default=True
Raise error when prediction is negative.
- raise_when_negative_errorbool, default=True
Raise error when prediction error is negative.
- negative_prediction_to_zerobool, default=False
When predicted value is below zero then set it to zero.
- verbosebool, default=True
Show progress bar
- Returns:
- : GeoPandas GeoDataFrame
columns =
['id', 'geometry', 'reg.est', 'reg.err', 'rmse']
Examples
>>> import os >>> import geopandas as gpd >>> from pyinterpolate import ( ... filter_blocks, ... Blocks, ... ExperimentalVariogram, ... PointSupport, ... TheoreticalVariogram ... ) >>> >>> >>> FILENAME = 'cancer_data.gpkg' >>> LAYER_NAME = 'areas' >>> DS = gpd.read_file(FILENAME, layer=LAYER_NAME) >>> AREA_VALUES = 'rate' >>> AREA_INDEX = 'FIPS' >>> AREA_GEOMETRY = 'geometry' >>> PS_LAYER_NAME = 'points' >>> PS_VALUES = 'POP10' >>> PS_GEOMETRY = 'geometry' >>> PS = gpd.read_file(FILENAME, layer=PS_LAYER_NAME) >>> >>> CANCER_DATA = { ... 'ds': DS, ... 'index_column_name': AREA_INDEX, ... 'value_column_name': AREA_VALUES, ... 'geometry_column_name': AREA_GEOMETRY ... } >>> POINT_SUPPORT_DATA = { ... 'ps': PS, ... 'value_column_name': PS_VALUES, ... 'geometry_column_name': PS_GEOMETRY ... } >>> BLOCKS = Blocks(**CANCER_DATA) >>> indexes = BLOCKS.block_indexes >>> >>> PS = PointSupport( ... points=POINT_SUPPORT_DATA['ps'], ... ps_blocks=BLOCKS, ... points_value_column=POINT_SUPPORT_DATA['value_column_name'], ... points_geometry_column=POINT_SUPPORT_DATA['geometry_column_name'] ... ) >>> >>> EXPERIMENTAL = ExperimentalVariogram( ... ds=BLOCKS.representative_points_array(), ... step_size=40000, ... max_range=300001 ... ) >>> >>> THEO = TheoreticalVariogram() >>> THEO.autofit( ... experimental_variogram=EXPERIMENTAL, ... sill=150 ... ) >>> smoothed = smooth_blocks( ... semivariogram_model=THEO, ... point_support=PS, ... number_of_neighbors=8, ... verbose=True ... ) >>> print(smoothed.columns) Index(['id', 'geometry', 'reg.est', 'reg.err', 'rmse'], dtype='object')
Ordinary Kriging pipelines#
- pyinterpolate.interpolate_points(theoretical_model: TheoreticalVariogram, unknown_locations: ArrayLike, known_locations: ArrayLike = None, known_values: ArrayLike = None, known_geometries: ArrayLike = None, neighbors_range=None, no_neighbors=4, max_tick=5.0, use_all_neighbors_in_range=False, allow_approximate_solutions=False, progress_bar=True) ndarray[source]
Function predicts values at unknown locations with Ordinary Kriging.
- Parameters:
- theoretical_modelTheoreticalVariogram
Fitted theoretical variogram model.
- unknown_locationsnumpy array
Points where you want to estimate value
[(x, y), ...] <-> [(lon, lat), ...].- known_locationsnumpy array, optional
The known locations:
[x, y, value].- known_valuesArrayLike, optional
Observation in the i-th geometry (from
known_geometries). Optional parameter, if not given thenknown_locationsmust be provided.- known_geometriesArrayLike, optional
Array or similar structure with geometries. It must have the same length as
known_values. Optional parameter, if not given thenknown_locationsmust be provided. Point type geometry.- neighbors_rangefloat, default=None
The maximum distance where we search for the neighbors. If
Noneis given then range is selected from the theoretical model’srangattribute.- no_neighborsint, default = 4
The number of the n-closest neighbors used for interpolation.
- max_tickfloat, default=5.
Maximum number of degrees for neighbors search angle.
- use_all_neighbors_in_rangebool, default = False
True: if the real number of neighbors within theneighbors_rangeis greater than thenumber_of_neighborsparameter then take all of them anyway.- allow_approximate_solutionsbool, default=False
Allows the approximation of kriging weights based on the OLS algorithm. We don’t recommend set it to
Trueif you don’t know what are you doing. This parameter can be useful when you have clusters in your dataset, that can lead to singular or near-singular matrix creation. But the better idea is to get rid of those clusters.- progress_barbool, default = True
Shows progress bar
- Returns:
- : numpy array
[predicted value, variance error, longitude (x), latitude (y)]
Examples
>>> import geopandas as gpd >>> import numpy as np >>> import pandas as pd >>> >>> from pyinterpolate import (build_experimental_variogram, ... build_theoretical_variogram) >>> from pyinterpolate.core.pipelines.interpolate import interpolate_points >>> >>> dem = gpd.read_file('dem.gpkg') >>> unknown_locations = gpd.read_file('unknown_locations.gpkg') >>> step_size = 500 >>> max_range = 10000 >>> exp_variogram = build_experimental_variogram( ... values=dem['dem'], ... geometries=dem['geometry'], ... step_size=step_size, ... max_range=max_range ... ) >>> theo_variogram = build_theoretical_variogram(exp_variogram) >>> interp = interpolate_points( ... theoretical_model=theo_variogram, ... unknown_locations=unknown_locations['geometry'], ... known_values=dem['dem'], ... known_geometries=dem['geometry'] ... ) >>> print(interp[0]) [7.91222896e+01 9.72740449e+01 2.38012302e+05 5.51466805e+05]
- pyinterpolate.interpolate_points_dask(theoretical_model: TheoreticalVariogram, unknown_locations: ArrayLike, known_locations: ArrayLike = None, known_values: ArrayLike = None, known_geometries: ArrayLike = None, neighbors_range=None, no_neighbors=4, max_tick=5.0, use_all_neighbors_in_range=False, allow_approximate_solutions=False, number_of_workers=1, progress_bar=True) ndarray[source]
Function predicts values at unknown locations with Ordinary Kriging using Dask backend, makes sense when you must interpolate large number of points.
- Parameters:
- theoretical_modelTheoreticalVariogram
Fitted theoretical variogram model.
- unknown_locationsnumpy array
Points where you want to estimate value
[(x, y), ...] <-> [(lon, lat), ...].- known_locationsnumpy array, optional
The known locations:
[x, y, value].- known_valuesArrayLike, optional
Observation in the i-th geometry (from
known_geometries). Optional parameter, if not given thenknown_locationsmust be provided.- known_geometriesArrayLike, optional
Array or similar structure with geometries. It must have the same length as
known_values. Optional parameter, if not given thenknown_locationsmust be provided. Point type geometry.- neighbors_rangefloat, default=None
The maximum distance where we search for the neighbors. If
Noneis given then range is selected from the theoretical model’srangattribute.- no_neighborsint, default = 4
The number of the n-closest neighbors used for interpolation.
- max_tickfloat, default=5.
Maximum number of degrees for neighbors search angle.
- use_all_neighbors_in_rangebool, default = False
True: if the real number of neighbors within theneighbors_rangeis greater than thenumber_of_neighborsparameter then take all of them anyway.- allow_approximate_solutionsbool, default=False
Allows the approximation of kriging weights based on the OLS algorithm. We don’t recommend set it to
Trueif you don’t know what are you doing. This parameter can be useful when you have clusters in your dataset, that can lead to singular or near-singular matrix creation. But the better idea is to get rid of those clusters.- number_of_workersint, default = 1
How many processing units can be used for predictions. Increase it only for a very large number of interpolated points (~10k+).
- progress_barbool, default = True
Shows progress bar
- Returns:
- : numpy array
[predicted value, variance error, longitude (x), latitude (y)]
Examples
>>> import geopandas as gpd >>> import numpy as np >>> import pandas as pd >>> >>> from pyinterpolate import (build_experimental_variogram, ... build_theoretical_variogram) >>> from pyinterpolate.core.pipelines.interpolate import interpolate_points_dask >>> >>> dem = gpd.read_file('dem.gpkg') >>> unknown_locations = gpd.read_file('unknown_locations.gpkg') >>> step_size = 500 >>> max_range = 10000 >>> exp_variogram = build_experimental_variogram( ... values=dem['dem'], ... geometries=dem['geometry'], ... step_size=step_size, ... max_range=max_range ... ) >>> theo_variogram = build_theoretical_variogram(exp_variogram) >>> interp = interpolate_points_dask( ... theoretical_model=theo_variogram, ... unknown_locations=unknown_locations['geometry'], ... known_values=dem['dem'], ... known_geometries=dem['geometry'] ... ) >>> print(interp[0]) [7.91222896e+01 9.72740449e+01 2.38012302e+05 5.51466805e+05]