Core data structures

Blocks

class pyinterpolate.Blocks(ds: GeoDataFrame = None, values: ArrayLike = None, geometries: ArrayLike = None, value_column_name: str = None, geometry_column_name='geometry', index_column_name=None, representative_points_column_name=None, representative_points_from_centroid=False, representative_points_from_random_sample=False, representative_points_from_largest_area=True, distances_between_representative_points=True, angles_between_representative_points=False)

Class representing aggregated block data.

Parameters:
ds : gpd.GeoDataFrame, optional

Dataset with block values. Must be provided if the values and geometries parameters are not given.

values : ArrayLike, optional

Aggregated values of each block. Optional; if not given, ds must be provided.

geometries : ArrayLike, optional

Array or similar structure with block geometries. It must have the same length as values. Optional; if not given, ds must be provided.

value_column_name : Any, optional

Name of the column with block rates. Required when ds is given; otherwise it defaults to ‘values’.

geometry_column_name : Any, default = ‘geometry’

Name of the column with block geometries.

index_column_name : Any, optional

Name of the indexing column.

representative_points_column_name : Any, optional

Name of the column with representative points or coordinates.

representative_points_from_centroid : bool, default = False

Calculate representative points from block centroids.

representative_points_from_random_sample : bool, default = False

Calculate representative points by sampling a random point from each block geometry.

representative_points_from_largest_area : bool, default = True

When a block is a MultiPolygon, take its representative point from the polygon with the largest area.

distances_between_representative_points : bool, default = True

Calculate distances between representative points during class initialization.

angles_between_representative_points : bool, default = False

Calculate angles between representative points during class initialization.
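The centroid and largest-area options can be pictured with a small NumPy sketch. It assumes each polygon is a plain list of (x, y) vertices without holes, and a MultiPolygon is a list of such rings; `polygon_area_and_centroid` and `representative_point` are hypothetical helpers, not pyinterpolate functions (the library itself works on shapely geometries stored in a GeoDataFrame).

```python
import numpy as np


def polygon_area_and_centroid(coords):
    """Hypothetical helper: signed area and centroid of a simple polygon (shoelace formula)."""
    xy = np.asarray(coords, dtype=float)
    if np.allclose(xy[0], xy[-1]):
        xy = xy[:-1]  # drop a closing vertex so it is not counted twice
    x, y = xy[:, 0], xy[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    cross = x * y_next - x_next * y
    area = cross.sum() / 2.0
    cx = ((x + x_next) * cross).sum() / (6.0 * area)
    cy = ((y + y_next) * cross).sum() / (6.0 * area)
    return float(area), (float(cx), float(cy))


def representative_point(parts):
    """Hypothetical helper: centroid of the largest part of a (multi)polygon.

    Mimics representative_points_from_centroid=True combined with
    representative_points_from_largest_area=True.
    """
    areas_and_centroids = [polygon_area_and_centroid(part) for part in parts]
    _, centroid = max(areas_and_centroids, key=lambda ac: abs(ac[0]))
    return centroid


small_square = [(10, 10), (11, 10), (11, 11), (10, 11)]
big_square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(representative_point([small_square, big_square]))  # (2.0, 2.0)
```

Using the absolute area makes the choice independent of vertex winding order.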

Attributes:
ds : gpd.GeoDataFrame

Dataset with block values.

value_column_name : Any

Name of the column with block rates.

geometry_column_name : Any, default = ‘geometry’

Name of the column with block geometries.

index_column_name : Any, optional

Name of the indexing column.

rep_points_column_name : Any, optional

Name of the column with representative points or coordinates.

angles : numpy array

Angles between the blocks’ representative points.

distances : numpy array

Distances between the blocks’ representative points.

Methods

block_data

(property) Longitude, latitude, and value as a numpy array.

block_indexes

(property) Block indexes as a numpy array.

block_representative_points

(property) Representative points (lon, lat) as a numpy array.

block_values

(property) Block values as a numpy array.

block_coordinates(block_id)

Single block representative point.

block_real_value(block_id)

Single block observation.

calculate_angles_between_rep_points()

Angles between blocks, calculated from each representative point to all other representative points. If update is True, the angles attribute is updated. Returns a dictionary with block indexes as keys and, as values, the angles to other blocks ordered like the dictionary keys.

calculate_distances_between_rep_points()

Distances between blocks, calculated from each representative point to all other representative points. If update is True, the distances attribute is updated. Returns a DataFrame with block indexes as both index and columns and distances as values.

get_blocks_values()

Values of multiple blocks.

pop()

Experimental. Removes the block with the specified index from the dataset and returns it as a Blocks object. Modifies the object in place.

representative_points_array()

NumPy array with representative points: longitude, latitude, and value.

select_distances_between_blocks()

Select distances between a given block and all other blocks.

transform_crs()

Transforms the Blocks coordinate reference system (CRS).

Raises:
AttributeError

If both representative_points_from_centroid and representative_points_from_random_sample are set to True.
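A minimal sketch of the rule above, using a hypothetical `check_representative_point_flags` function (not part of pyinterpolate) that raises the same exception type when both flags are enabled:

```python
def check_representative_point_flags(from_centroid: bool, from_random_sample: bool) -> None:
    """Hypothetical validator mirroring the documented Raises rule."""
    if from_centroid and from_random_sample:
        raise AttributeError(
            "Choose either representative_points_from_centroid or "
            "representative_points_from_random_sample, not both."
        )


check_representative_point_flags(True, False)  # accepted, nothing happens
try:
    check_representative_point_flags(True, True)
except AttributeError as err:
    print(f"rejected: {err}")
```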

See also

PointSupport

Class that relies heavily on Blocks for semivariogram deconvolution.

Examples

>>> import os
>>> import geopandas as gpd
>>> from pyinterpolate import Blocks
>>>
>>>
>>> FILENAME = 'cancer_data.gpkg'
>>> LAYER_NAME = 'areas'
>>> DS = gpd.read_file(FILENAME, layer=LAYER_NAME)
>>> AREA_VALUES = 'rate'
>>> AREA_INDEX = 'FIPS'
>>> AREA_GEOMETRY = 'geometry'
>>>
>>> CANCER_DATA = {
...    'ds': DS,
...    'index_column_name': AREA_INDEX,
...    'value_column_name': AREA_VALUES,
...    'geometry_column_name': AREA_GEOMETRY
... }
>>> block = Blocks(**CANCER_DATA)
>>> print(block.ds.columns)
Index(['FIPS', 'rate', 'geometry', 'rep_points', 'lon', 'lat'],
      dtype='object')

block_coordinates(block_id: Hashable)

Returns the representative point of a block.

Parameters:
block_id : Hashable

Returns:
: Point

property block_data

Returns block data.

Returns:
: numpy array

Block data [x, y, value].

property block_indexes

Returns index column values.

Returns:
: numpy array

Block indexes.

block_real_value(block_id: Hashable)

Returns the total value of a block.

Parameters:
block_id : Hashable

Returns:
: float

property block_representative_points

Returns block representative coordinates.

Returns:
: numpy array

Block representative coordinates.

property block_values

Returns block values.

Returns:
: numpy array

Block values.

calculate_angles_between_rep_points(update=True) -> Dict

Calculates angles from each representative point to all other representative points.

Parameters:
update : bool, default = True

Update the angles attribute.

Returns:
: Dict

Keys are block indexes; values are the angles to other blocks, ordered like the dictionary keys.
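The dictionary layout described above can be sketched with NumPy. The angle convention used here (degrees, measured counterclockwise from the positive x-axis, wrapped to [0, 360)) is an assumption for illustration; the library's reference direction and units may differ. `angles_between_points` is a hypothetical helper.

```python
import numpy as np


def angles_between_points(index, coords):
    """Hypothetical helper: {block index: angles to every block, ordered like `index`}."""
    xy = np.asarray(coords, dtype=float)
    angles = {}
    for key, (x0, y0) in zip(index, xy):
        dx = xy[:, 0] - x0
        dy = xy[:, 1] - y0
        # Assumed convention: degrees counterclockwise from the +x axis, in [0, 360).
        # arctan2(0, 0) is 0, so the angle from a block to itself is reported as 0.
        angles[key] = np.degrees(np.arctan2(dy, dx)) % 360
    return angles


result = angles_between_points(["A", "B", "C"], [(0, 0), (1, 0), (0, 1)])
print(result["A"])  # angles from A to A, B and C
```

Because the self-angle is 0, it cannot be told apart from a neighbor lying exactly along the +x axis; a real implementation might mask it with NaN instead.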

calculate_distances_between_rep_points(update=True) -> DataFrame

Calculates distances between the blocks’ representative points.

Parameters:
update : bool, default = True

Update the distances attribute.

Returns:
: DataFrame

Index and columns are block ids; values are distances between blocks.
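Assuming projected (planar) coordinates, the distance table can be sketched with NumPy broadcasting. `distance_matrix` is a hypothetical helper; entry [i, j] is the distance between block_ids[i] and block_ids[j].

```python
import numpy as np


def distance_matrix(coords):
    """Hypothetical helper: Euclidean distances between all pairs of representative points."""
    xy = np.asarray(coords, dtype=float)
    diff = xy[:, np.newaxis, :] - xy[np.newaxis, :, :]  # shape (n, n, 2)
    return np.sqrt((diff ** 2).sum(axis=-1))


block_ids = [101, 102, 103]
rep_points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
d = distance_matrix(rep_points)
print(d)
```

To get the labelled layout described above, the matrix can be wrapped as `pandas.DataFrame(d, index=block_ids, columns=block_ids)`.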

get_blocks_values(indexes: ArrayLike = None)

Returns values of observations aggregated within blocks.

Parameters:
indexes : Array-like, optional

Indexes of blocks to get values from. If not given, values of all blocks are returned.

Returns:
: numpy array

pop(block_index: str | Hashable)

Removes the block with the specified index from the dataset and returns it as a Blocks object.

Parameters:
block_index : Union[str, Hashable]

Index of the block to remove.

Returns:
: Blocks

Single block as the Blocks object.

representative_points_array()

Returns an array with the blocks’ representative points and values.

Returns:
: numpy array

[lon, lat, value]

select_distances_between_blocks(block_id, other_blocks=None) -> ndarray

Selects distances between the specified block(s) and other blocks.

Parameters:
block_id

Single block ID or list with IDs to retrieve.

other_blocks : optional

Blocks to which distances are selected. If not given, distances to all other blocks are returned.

Returns:
: numpy array

Rows correspond to the selected block ids, columns to the other blocks.
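Selecting a sub-matrix by block id can be sketched with `np.ix_`. The ids and distances below are made up for illustration, and `select_distances` is a hypothetical stand-in for the method; in this sketch, omitting `other_blocks` returns every column, including the selected block itself at distance 0.

```python
import numpy as np

block_ids = np.array([101, 102, 103, 104])
# Made-up symmetric distance matrix; rows and columns follow block_ids.
distances = np.array([
    [0.0, 5.0, 10.0, 2.0],
    [5.0, 0.0, 5.0, 4.0],
    [10.0, 5.0, 0.0, 9.0],
    [2.0, 4.0, 9.0, 0.0],
])


def select_distances(block_id, other_blocks=None):
    """Hypothetical stand-in: rows for `block_id` (one id or a list), columns for `other_blocks`."""
    rows = np.atleast_1d(block_id)
    cols = block_ids if other_blocks is None else np.atleast_1d(other_blocks)
    # Translate block ids into positional indexes of the matrix.
    position = {bid: i for i, bid in enumerate(block_ids)}
    row_pos = [position[b] for b in rows]
    col_pos = [position[b] for b in cols]
    return distances[np.ix_(row_pos, col_pos)]


print(select_distances(101, other_blocks=[103, 104]))
```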

transform_crs(target_crs, inplace=True)

Transforms the Blocks CRS.

Parameters:
target_crs : pyproj.CRS or EPSG code

The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.

inplace : bool, default = True

If True, transform the object in place; otherwise return the transformed object and leave the original instance unchanged.

Point Support

class pyinterpolate.PointSupport(blocks: Blocks, points: GeoDataFrame = None, values: ArrayLike = None, geometries: ArrayLike = None, points_value_column: str = None, points_geometry_column: str = None, store_dropped_points: bool = False, use_point_support_crs: bool = False, no_possible_neighbors=0, verbose=True)

Class representing blocks and their point support.

Parameters:
blocks : Blocks

Blocks object with polygon data.

points : gpd.GeoDataFrame, optional

Point support data with a geometry (Point) column and a value column. Must be provided if the values and geometries parameters are not given.

values : ArrayLike, optional

Point support values, one per point. Optional; if not given, points must be provided.

geometries : ArrayLike, optional

Array or similar structure with point geometries. It must have the same length as values. Optional; if not given, points must be provided.

points_value_column : str, optional

Name of the point support column with point values (e.g. population).

points_geometry_column : str, optional

Name of the point support column with point geometries.

store_dropped_points : bool, default = False

Whether to store points that were not joined with any block.

use_point_support_crs : bool, default = False

Whether to use the point support CRS instead of the blocks CRS. Both datasets are reprojected into a single CRS, and this parameter decides which one is used.

no_possible_neighbors : int, default = 0

The maximum number of closest blocks used to calculate distances between point support coordinates. The default, 0, means that all blocks are used.

verbose : bool, default = True

Show information about the progress of the calculations.

Attributes:
blocks : Blocks

Blocks object with polygon data.

blocks_distances : numpy array

Distances between the blocks’ representative points.

blocks_index_column : str

Name of the column with block indexes.

dropped_points : GeoDataFrame, optional

Points that were not joined with any block because they do not overlap any block spatially. None if store_dropped_points was set to False.

point_support : GeoDataFrame

Columns: lon_col_name, lat_col_name, point-support-value, block-index.

point_support_blocks_index_name : str, optional

Name of the column with block indexes in the point support. If the Blocks object does not define an index column name, the default name "blocks_index" is used.

unique_blocks : numpy array

Unique block indexes from the point support.

Methods

lon_col_name

(str, property) Name of the column with longitude.

lat_col_name

(str, property) Name of the column with latitude.

get_distances_between_known_blocks()

Returns distances between the given blocks.

get_point_to_block_indexes()

Returns block indexes for each point, in the same order as the points are stored in point_support.

get_points_array()

Returns point coordinates and their values as a numpy array.

point_support_totals()

Retrieves total point support values for the given blocks.

Notes

The PointSupport class stores information about the points within polygons. During regularization, in-block semivariograms are estimated from each polygon’s point support, and semivariances are calculated between the point supports of neighbouring blocks.

The class takes the point support grid and the block data (polygons), then performs a spatial join that assigns each point to the area it falls into. The core attribute is point_support, a GeoDataFrame with the columns:

  • lon_col_name: longitude as a float,

  • lat_col_name: latitude as a float,

  • point-support-value: the column with the point support values,

  • block-index: the column with the index of the block that contains the point.
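The assignment step can be pictured with a dependency-free sketch. pyinterpolate performs it as a spatial join on GeoDataFrames; the ray-casting test below is a swapped-in illustration of the same idea. `point_in_polygon` and `join_points_to_blocks` are hypothetical helpers, polygons are plain vertex lists without holes, and a point lying exactly on a shared boundary may be assigned to either block.

```python
import numpy as np


def point_in_polygon(x, y, ring):
    """Ray casting: True if (x, y) lies inside the polygon given by `ring`."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray running right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_points_to_blocks(points, values, blocks):
    """Hypothetical helper: rows [lon, lat, value, block_index] plus unmatched points."""
    rows, dropped = [], []
    for (x, y), v in zip(points, values):
        owner = next(
            (bid for bid, ring in blocks.items() if point_in_polygon(x, y, ring)), None
        )
        if owner is None:
            dropped.append((x, y, v))  # comparable to the dropped_points attribute
        else:
            rows.append((x, y, v, owner))
    return np.array(rows, dtype=float), dropped


blocks = {
    1: [(0, 0), (2, 0), (2, 2), (0, 2)],
    2: [(2, 0), (4, 0), (4, 2), (2, 2)],
}
joined, dropped = join_points_to_blocks([(0.5, 0.5), (3, 1), (5, 5)], [10, 20, 30], blocks)
print(joined)   # columns: lon, lat, value, block index
print(dropped)  # points outside every block
```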

Examples

>>> import geopandas as gpd
>>> from pyinterpolate import (
...     Blocks, ExperimentalVariogram, PointSupport, TheoreticalVariogram
... )
>>>
>>>
>>> FILENAME = 'cancer_data.gpkg'
>>> LAYER_NAME = 'areas'
>>> DS = gpd.read_file(FILENAME, layer=LAYER_NAME)
>>> AREA_VALUES = 'rate'
>>> AREA_INDEX = 'FIPS'
>>> AREA_GEOMETRY = 'geometry'
>>> PS_LAYER_NAME = 'points'
>>> PS_VALUES = 'POP10'
>>> PS_GEOMETRY = 'geometry'
>>> PS = gpd.read_file(FILENAME, layer=PS_LAYER_NAME)
>>>
>>> CANCER_DATA = {
...    'ds': DS,
...    'index_column_name': AREA_INDEX,
...    'value_column_name': AREA_VALUES,
...    'geometry_column_name': AREA_GEOMETRY
... }
>>> POINT_SUPPORT_DATA = {
...     'ps': PS,
...     'value_column_name': PS_VALUES,
...     'geometry_column_name': PS_GEOMETRY
... }
>>> block = Blocks(**CANCER_DATA)
>>>
>>> ps = PointSupport(
...     points=POINT_SUPPORT_DATA['ps'],
...     blocks=block,
...     points_value_column=POINT_SUPPORT_DATA['value_column_name'],
...     points_geometry_column=POINT_SUPPORT_DATA['geometry_column_name']
... )
>>> print(ps.unique_blocks[:2])
[42049. 42039.]

get_distances_between_known_blocks(block_ids: List | ndarray) -> ndarray

Returns distances between known blocks.

Parameters:
block_ids : Union[List, np.ndarray]

List with block indexes.

Returns:
: numpy array

Distances from each given block to all other given blocks; rows and columns are ordered like the input block_ids list.

get_point_to_block_indexes() -> Series

Returns block indexes for each point, in the same order as the points are stored in point_support.

Returns:
: pandas Series

(point support index: block index)

get_points_array(block_id=None) -> ndarray

Returns point coordinates and their values as a numpy array.

Parameters:
block_id : Any

Block whose points should be retrieved. If not given, all points are returned.

Returns:
: numpy array

(lon_col_name, lat_col_name, value)

point_support_totals(blocks: Iterable)

Retrieves total point support values for the given blocks.

Parameters:
blocks : Iterable

Block indexes.

Returns:
: numpy array

Retrieved values.
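Summing point values per block is a group-by operation. The sketch below assumes point support rows shaped like [lon, lat, value, block_index]; this `point_support_totals` is a hypothetical stand-in for the method, and returning 0.0 for a block with no points is a choice of the sketch, not documented library behaviour.

```python
from collections import defaultdict

import numpy as np


def point_support_totals(point_rows, blocks):
    """Hypothetical stand-in: total point support value for each block in `blocks`."""
    totals = defaultdict(float)
    for _lon, _lat, value, block_index in point_rows:
        totals[block_index] += value
    # Output order follows `blocks`; blocks without points get 0.0 in this sketch.
    return np.array([totals.get(b, 0.0) for b in blocks])


rows = [(0.5, 0.5, 10, "A"), (1.5, 0.5, 15, "A"), (3.0, 1.0, 20, "B")]
print(point_support_totals(rows, ["B", "A", "C"]))
```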

class pyinterpolate.PointSupportDistance(verbose=True)

Class calculates and stores distances between point supports of multiple blocks.

Parameters:
verbose : bool, default = True

Show progress.

Attributes:
weighted_block_to_block_distances : DataFrame

Indexes: block indexes, Columns: block indexes, Cells: distances.

distances_between_point_supports : Dict

(block_a, block_b): [[value_a(i), value_b(j), distance(i-j)], ...]

no_closest_neighbors : int

Number of closest neighbors for each block.

closest_neighbors : Dict

Block id: [the closest blocks].
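The layout of distances_between_point_supports, one [value_a(i), value_b(j), distance(i-j)] row per point pair, can be reproduced with NumPy. This sketch assumes each block's point support is an array of [x, y, value] rows in planar coordinates; `pair_distances` is a hypothetical helper. The intermediate matrix `d` has one row per point of block_a and one column per point of block_b.

```python
import numpy as np


def pair_distances(points_a, points_b):
    """Hypothetical helper: rows [value_a(i), value_b(j), distance(i-j)] for all point pairs.

    points_a, points_b: arrays with columns [x, y, value].
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # All pairwise distances, shape (len(a), len(b)).
    d = np.sqrt(((a[:, np.newaxis, :2] - b[np.newaxis, :, :2]) ** 2).sum(axis=-1))
    # Repeat/tile the values so row i * len(b) + j holds the pair (i, j), matching d.ravel().
    value_a = np.repeat(a[:, 2], len(b))
    value_b = np.tile(b[:, 2], len(a))
    return np.column_stack([value_a, value_b, d.ravel()])


block_a = [(0.0, 0.0, 10.0), (0.0, 1.0, 5.0)]
block_b = [(3.0, 0.0, 2.0), (3.0, 4.0, 4.0)]
print(pair_distances(block_a, block_b))
```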

Methods

calc_pair_distances()

Returns distances between the point supports of two blocks and updates the distances dictionary.

calculate_point_support_distances()

Calculates distances between point supports.

calculate_weighted_block_to_block_distances()

Calculates weighted distances between blocks using their point supports.

get_weighted_distance()

Returns weighted distance to a block.

Raises:
AttributeError

If weighted block-to-block distances have not been calculated and the user requests the closest neighbors (calculate_point_support_distances() with no_closest_neighbors > 0).

Examples

>>> import geopandas as gpd
>>> from pyinterpolate import (
...     Blocks, ExperimentalVariogram, PointSupport, PointSupportDistance,
...     TheoreticalVariogram,
... )
>>>
>>>
>>> FILENAME = 'cancer_data.gpkg'
>>> LAYER_NAME = 'areas'
>>> DS = gpd.read_file(FILENAME, layer=LAYER_NAME)
>>> AREA_VALUES = 'rate'
>>> AREA_INDEX = 'FIPS'
>>> AREA_GEOMETRY = 'geometry'
>>> PS_LAYER_NAME = 'points'
>>> PS_VALUES = 'POP10'
>>> PS_GEOMETRY = 'geometry'
>>> PS = gpd.read_file(FILENAME, layer=PS_LAYER_NAME)
>>>
>>> CANCER_DATA = {
...    'ds': DS,
...    'index_column_name': AREA_INDEX,
...    'value_column_name': AREA_VALUES,
...    'geometry_column_name': AREA_GEOMETRY
... }
>>> POINT_SUPPORT_DATA = {
...     'ps': PS,
...     'value_column_name': PS_VALUES,
...     'geometry_column_name': PS_GEOMETRY
... }
>>> block = Blocks(**CANCER_DATA)
>>>
>>> ps = PointSupport(
...     points=POINT_SUPPORT_DATA['ps'],
...     blocks=block,
...     points_value_column=POINT_SUPPORT_DATA['value_column_name'],
...     points_geometry_column=POINT_SUPPORT_DATA['geometry_column_name']
... )
>>>
>>> pds = PointSupportDistance(verbose=False)
>>> distances_between_neighbors = pds.calculate_point_support_distances(
...     point_support=ps,
...     block_id=36033,
...     no_closest_neighbors=2
... )
>>> print(distances_between_neighbors.keys())  # dict[tuple: array]
dict_keys([(36033, 36019), (36019, 36033), (36033, 36089), (36089, 36033)])
>>> print(pds.closest_neighbors)
{36033: [36019, 36089]}

calc_pair_distances(point_support, block_pair: Tuple, update=True) -> ndarray

Returns distances between the point supports of two blocks and updates the distances dictionary.

Parameters:
point_support : PointSupport

Blocks and their point supports.

block_pair : Tuple

(block_a, block_b)

update : bool, default = True

If True, the distances dictionary is updated.

Returns:
: numpy array

Distances between point supports from two blocks.

calculate_point_support_distances(point_support, block_id, no_closest_neighbors: int = 0) -> dict

Calculates distances between point supports.

Parameters:
point_support : PointSupport

Blocks and their point supports.

block_id : int

The unique id of a block.

no_closest_neighbors : int, default = 0

Number of closest neighbors. If 0 (the default), distances to all blocks are returned.

Returns:
: Dict

Dictionary with distances between point supports of a given block and its neighbors. Key is a block pair, and value is a numpy array with distances, where each row represents a point from a given block and each column represents a point from its neighbor.
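Picking the closest neighbors from a weighted distance matrix can be sketched as follows. The ids and distances are made up, `closest_neighbors` is a hypothetical helper, and treating 0 as "all neighbors" mirrors the parameter description above.

```python
import numpy as np


def closest_neighbors(block_id, block_ids, weighted_distances, n=0):
    """Hypothetical helper: ids of the `n` blocks closest to `block_id` (all if n == 0)."""
    ids = list(block_ids)
    row = np.asarray(weighted_distances, dtype=float)[ids.index(block_id)]
    # A stable sort keeps the input order for tied distances.
    order = np.argsort(row, kind="stable")
    neighbors = [ids[i] for i in order if ids[i] != block_id]  # exclude the block itself
    return neighbors if n == 0 else neighbors[:n]


block_ids = [1, 2, 3, 4]
weighted = [
    [0.0, 7.5, 2.0, 4.0],
    [7.5, 0.0, 6.0, 3.5],
    [2.0, 6.0, 0.0, 5.0],
    [4.0, 3.5, 5.0, 0.0],
]
print(closest_neighbors(1, block_ids, weighted, n=2))
```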

calculate_weighted_block_to_block_distances(point_support, return_distances=False)

Calculates weighted distances between blocks using their point supports.

Parameters:
point_support : PointSupport

Blocks and their point supports.

return_distances : bool, default = False

Whether to return the DataFrame with distances.

Returns:
: pd.DataFrame

Indexes: block indexes, Columns: block indexes, Cells: distances.
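One common way to weight block-to-block distances by point support (used in area-to-area and Poisson kriging) is D(a, b) = sum(w_i * w_j * d_ij) / sum(w_i * w_j), where w_i and w_j are point support values (e.g. population) of points in blocks a and b, and d_ij is the distance between those points. The sketch below assumes that definition; treat it as an assumption rather than the library's exact weighting. `weighted_block_distance` is a hypothetical helper.

```python
import numpy as np


def weighted_block_distance(points_a, points_b):
    """Hypothetical helper: point-support-weighted distance between two blocks.

    points_a, points_b: arrays with columns [x, y, weight], e.g. population.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    d = np.sqrt(((a[:, np.newaxis, :2] - b[np.newaxis, :, :2]) ** 2).sum(axis=-1))
    w = np.outer(a[:, 2], b[:, 2])  # w[i, j] = weight_a(i) * weight_b(j)
    return float((w * d).sum() / w.sum())


block_a = [(0.0, 0.0, 1.0)]
block_b = [(3.0, 0.0, 1.0), (6.0, 0.0, 3.0)]
print(weighted_block_distance(block_a, block_b))  # 5.25
```

Heavily populated points pull the block-to-block distance toward their own pairwise distances, which is why the result (5.25) lies closer to 6 than to 3.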

get_weighted_distance(block_id) -> Series

Returns weighted distance to a block.

Parameters:
block_id : Union[Hashable, str]

Block unique index.

Returns:
: pd.Series

Weighted distances between the block and the other blocks’ centroids.