Core data structures
Blocks
- class pyinterpolate.Blocks(ds: GeoDataFrame = None, values: ArrayLike = None, geometries: ArrayLike = None, value_column_name: str = None, geometry_column_name='geometry', index_column_name=None, representative_points_column_name=None, representative_points_from_centroid=False, representative_points_from_random_sample=False, representative_points_from_largest_area=True, distances_between_representative_points=True, angles_between_representative_points=False)
Class represents aggregated blocks data.
- Parameters:
- ds: gpd.GeoDataFrame, optional
Dataset with block values. Must be provided if the values and geometries parameters are not given.
- values: ArrayLike, optional
Aggregated values of each block. If not given, then ds must be provided.
- geometries: ArrayLike, optional
Array or similar structure with geometries. It must have the same length as values. If not given, then ds must be provided.
- value_column_name: Any, optional
Name of the column with block rates. Must be provided when the ds parameter is given; otherwise it defaults to ‘values’.
- geometry_column_name: Any, default = ‘geometry’
Name of the column with a block geometry.
- index_column_name: Any, optional
Name of the indexing column.
- representative_points_column_name: Any, optional
The column with representative points or coordinates.
- representative_points_from_centroid: bool, default = False
Calculate representative points from block centroids.
- representative_points_from_random_sample: bool, default = False
Calculate representative points from a point sampled from the block geometry.
- representative_points_from_largest_area: bool, default = True
When a MultiPolygon is passed, sample the representative point from its largest area.
- distances_between_representative_points: bool, default = True
Calculate distances between representative points during class initialization.
- angles_between_representative_points: bool, default = False
Calculate angles between representative points during class initialization.
- Attributes:
- ds: gpd.GeoDataFrame
Dataset with block values.
- value_column_name: Any
Name of the column with block rates.
- geometry_column_name: Any, default = ‘geometry’
Name of the column with a block geometry.
- index_column_name: Any, optional
Name of the indexing column.
- rep_points_column_name: Any, optional
The column with representative points or coordinates.
- angles: numpy array
Angles between the blocks’ representative points.
- distances: numpy array
Distances between the blocks’ representative points.
Methods
block_data()
Longitude, latitude, and value as numpy array.
block_indexes()
Block indexes as numpy array.
block_representative_points()
Representative points - lon, lat as numpy array.
block_values()
Block values as numpy array.
block_coordinates(block_id)
Single block representative point.
block_real_value(block_id)
Single block observation.
calculate_angles_between_rep_points()
Angles between blocks, calculated as angles between each representative point and the others. If update is True, the angles attribute is updated. Returns a dictionary with block indexes as keys and, as values, angles to the other blocks ordered the same way as the dictionary keys.
calculate_distances_between_rep_points()
Distances between blocks, calculated as distances between each representative point and the others. If update is True, the distances attribute is updated. Returns a DataFrame with block indexes as both columns and index, and distances as values.
get_blocks_values()
Get multiple blocks values.
pop()
Experimental. Removes the block with the specified index from the dataset and returns the removed block as a Blocks object. Alters the object.
representative_points_array()
Numpy array with representative points - longitude, latitude, and value.
select_distances_between_blocks()
Select distances between a given block and all other blocks.
transform_crs()
Transform Blocks Coordinate Reference System.
- Raises:
- AttributeError
If both representative_points_from_centroid and representative_points_from_random_sample are set to True.
See also
PointSupport
Class heavily using Blocks for the semivariogram deconvolution.
Examples
>>> import os
>>> import geopandas as gpd
>>> from pyinterpolate import Blocks
>>>
>>>
>>> FILENAME = 'cancer_data.gpkg'
>>> LAYER_NAME = 'areas'
>>> DS = gpd.read_file(FILENAME, layer=LAYER_NAME)
>>> AREA_VALUES = 'rate'
>>> AREA_INDEX = 'FIPS'
>>> AREA_GEOMETRY = 'geometry'
>>>
>>> CANCER_DATA = {
...     'ds': DS,
...     'index_column_name': AREA_INDEX,
...     'value_column_name': AREA_VALUES,
...     'geometry_column_name': AREA_GEOMETRY
... }
>>> block = Blocks(**CANCER_DATA)
>>> print(block.ds.columns)
Index(['FIPS', 'rate', 'geometry', 'rep_points', 'lon', 'lat'], dtype='object')
- block_coordinates(block_id: Hashable)
Gets block representative point.
- Parameters:
- block_id: Hashable
- Returns:
- : Point
- property block_data
Returns block data.
- Returns:
- : numpy array
Block data [x, y, value].
- property block_indexes
Returns index column values.
- Returns:
- : numpy array
Block indexes.
- block_real_value(block_id: Hashable)
Gets block total value.
- Parameters:
- block_id: Hashable
- Returns:
- : float
- property block_representative_points
Returns block representative coordinates.
- Returns:
- : numpy array
Block representative coordinates.
- property block_values
Returns block values.
- Returns:
- : numpy array
Block values.
- calculate_angles_between_rep_points(update=True) -> Dict
Angles from each representative point to all other representative points.
- Parameters:
- update: bool, default = True
Update the angles attribute.
- Returns:
- : Dict
block index: angles to other blocks ordered like block indexes (keys) in a dictionary
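The shape of the returned dictionary can be illustrated with a minimal plain-Python sketch. This is not the library's implementation: it assumes angles are measured in degrees, counter-clockwise from the positive x-axis, via atan2, and the convention that Blocks actually uses may differ. The function name `angles_between_points` and the sample coordinates are hypothetical.

```python
import math


def angles_between_points(points):
    """Map each block index to the angles (degrees) towards every block.

    `points` is a dict {block_index: (x, y)}. Each list of angles follows
    the order of the dictionary keys.
    Assumption: angle = atan2(dy, dx), normalised to [0, 360).
    """
    keys = list(points)
    result = {}
    for a in keys:
        ax, ay = points[a]
        row = []
        for b in keys:
            bx, by = points[b]
            # For a == b the direction is undefined; atan2(0, 0) yields 0.0,
            # which is kept here as a placeholder.
            angle = math.degrees(math.atan2(by - ay, bx - ax)) % 360.0
            row.append(angle)
        result[a] = row
    return result


rep_points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0)}
angles = angles_between_points(rep_points)
print(angles["A"])  # angles from A to A, B and C, in key order
```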
- calculate_distances_between_rep_points(update=True) -> DataFrame
Gets distances between the blocks’ representative points.
- Parameters:
- update: bool, default = True
Update the distances attribute.
- Returns:
- : DataFrame
Columns and indexes are blocks ids, values are distances between blocks.
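As a rough illustration of the returned table, the sketch below builds a square distance table keyed by block id on both axes. It assumes plain Euclidean distance and uses a nested dictionary as a stand-in for the DataFrame; `distance_table` and the sample points are hypothetical and not part of pyinterpolate.

```python
import math


def distance_table(points):
    """Return {row_id: {col_id: distance}} for all pairs of points.

    `points` is a dict {block_index: (x, y)}.
    Assumption: Euclidean distance between representative points.
    """
    return {
        a: {b: math.dist(pa, pb) for b, pb in points.items()}
        for a, pa in points.items()
    }


rep_points = {10: (0.0, 0.0), 20: (3.0, 4.0), 30: (6.0, 8.0)}
table = distance_table(rep_points)
print(table[10][20])  # distance between blocks 10 and 20
```

The table is symmetric with zeros on the diagonal, so a row and the matching column carry the same information.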
- get_blocks_values(indexes: ArrayLike = None)
Returns values of observations aggregated within blocks.
- Parameters:
- indexes: Array-like, optional
Indexes of blocks to get values from. If not given then all blocks are returned.
- Returns:
- : numpy array
- pop(block_index: str | Hashable)
Removes the block with the specified index from the dataset and returns the removed block as a Blocks object.
- Parameters:
- block_index: Union[str, Hashable]
Index of the block to remove.
- Returns:
- : Blocks
Single block as the Blocks object.
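Because pop() both mutates the object and returns a new one, a small stand-in may help to picture the behaviour. The `MiniBlocks` class below is hypothetical and only mimics the described semantics with a dictionary; it says nothing about how Blocks stores its data.

```python
class MiniBlocks:
    """Hypothetical stand-in for Blocks: maps block index -> value."""

    def __init__(self, records):
        self.records = dict(records)

    def pop(self, block_index):
        # Remove the record from this container (the container is altered)
        # and wrap the removed record in a new container of the same type.
        value = self.records.pop(block_index)
        return MiniBlocks({block_index: value})


blocks = MiniBlocks({"a": 1.5, "b": 2.0, "c": 0.7})
removed = blocks.pop("b")
print(sorted(blocks.records), removed.records)
```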
- representative_points_array()
Returns array with blocks’ representative points.
- Returns:
- : numpy array
[lon, lat, value]
- select_distances_between_blocks(block_id, other_blocks=None) -> ndarray
Selects distances between the specified blocks and all other blocks.
- Parameters:
- block_id
Single block ID or list with IDs to retrieve.
- other_blocks: optional
Blocks to measure distances to. If not given, distances to all other blocks are returned.
- Returns:
- : numpy array
Index is block id, columns are other blocks.
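The selection logic can be sketched on a precomputed distance table. This is a simplified illustration for a single block id, using a nested dictionary instead of the library's data structures. Whether the library excludes the selected block itself when other_blocks is omitted is an assumption made here; `select_distances` is a hypothetical name.

```python
def select_distances(table, block_id, other_blocks=None):
    """Pick distances from `block_id` to `other_blocks`.

    `table` is {block: {other_block: distance}}.
    Assumption: with no `other_blocks`, every block except `block_id`
    is used, in the order stored in the table.
    """
    row = table[block_id]
    if other_blocks is None:
        other_blocks = [b for b in row if b != block_id]
    return [row[b] for b in other_blocks]


table = {
    1: {1: 0.0, 2: 5.0, 3: 10.0},
    2: {1: 5.0, 2: 0.0, 3: 5.0},
    3: {1: 10.0, 2: 5.0, 3: 0.0},
}
print(select_distances(table, 1))
print(select_distances(table, 1, other_blocks=[3]))
```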
- transform_crs(target_crs, inplace=True)
Transforms the Blocks CRS.
- Parameters:
- target_crs: pyproj.CRS or EPSG code
The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.
- inplace: bool, default = True
When True, transform the object’s instance in place; otherwise return the modified object and leave the old instance unchanged.
Point Support
- class pyinterpolate.PointSupport(blocks: Blocks, points: GeoDataFrame = None, values: ArrayLike = None, geometries: ArrayLike = None, points_value_column: str = None, points_geometry_column: str = None, store_dropped_points: bool = False, use_point_support_crs: bool = False, no_possible_neighbors=0, verbose=True)
Class represents blocks and their point support.
- Parameters:
- blocks: Blocks
Blocks object with polygons data.
- points: gpd.GeoDataFrame, optional
Point support data. It should have a geometry (Point) column and a value column. Must be provided if the values and geometries parameters are not given.
- values: ArrayLike, optional
Aggregated values of each block. If not given, then points must be provided.
- geometries: ArrayLike, optional
Array or similar structure with geometries. It must have the same length as values. If not given, then points must be provided.
- points_value_column: str, optional
The name of the point-support column with points values (e.g. population).
- points_geometry_column: str, optional
The name of the point-support column with a geometry.
- store_dropped_points: bool = False
Should the object store points which weren’t joined with blocks?
- use_point_support_crs: bool = False
Should the object use the point support CRS instead of the blocks CRS? Both datasets are reprojected to a common CRS, and this parameter decides which CRS that is.
- no_possible_neighbors: int, default = 0
The maximum number of the closest blocks used for the calculation of distances between point support coordinates. The default 0 indicates that all blocks are used.
- verbose: bool = True
Show information about the progress of the calculations.
- Attributes:
- blocks: Blocks
Blocks object with polygons data.
- blocks_distances: numpy array
Distances between the blocks’ representative points.
- blocks_index_column: str
Name of the column with block indexes.
- dropped_points: GeoDataFrame, optional
Points which weren’t joined with blocks (due to the lack of spatial overlap). The attribute can be None if the parameter store_dropped_points was set to False.
- point_support: GeoDataFrame
Columns: lon_col_name, lat_col_name, point-support-value, block-index.
- point_support_blocks_index_name: str, optional
Name of the column with block indexes in the point support. If the column name is not given in the Blocks object, then the default name "blocks_index" is used.
- unique_blocks: numpy array
Unique block indexes from the point support.
Methods
lon_col_name
(str, property) Name of the column with longitude.
lat_col_name
(str, property) Name of the column with latitude.
get_distances_between_known_blocks()
Returns distances between given blocks.
get_point_to_block_indexes()
Returns block indexes for each point, in the same order as points are stored in point_support.
get_points_array()
Returns point coordinates and their values as a numpy array.
point_support_totals()
Retrieves total point support values for given blocks.
Notes
The PointSupport class structure is designed to store the information about the points within polygons. During the regularization process, the inblock semivariograms are estimated from the polygon’s point support, and semivariances are calculated between point supports of neighbouring blocks.
The class takes the point support grid and blocks data (polygons). Then, a spatial join is performed and points are assigned to the areas where they fall. The core attribute is point_support. It is a GeoDataFrame with columns:
- lon_col_name - a floating representation of longitude,
- lat_col_name - a floating representation of latitude,
- point-support-value - the attribute describing the name of a column with the point-support’s value,
- block-index - the name of a column directing to the block index values.
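The join described in these notes can be pictured with a small, dependency-free sketch. It is not how PointSupport is implemented (the library works on GeoDataFrames); it assumes simple polygons given as vertex lists and uses a ray-casting point-in-polygon test. The names `point_in_polygon` and `join_points_to_blocks`, and the sample data, are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_points_to_blocks(points, blocks):
    """Assign each point to the first block that contains it.

    Returns (joined, dropped): joined rows are (x, y, value, block_index),
    dropped rows are the points that fall in no block.
    """
    joined, dropped = [], []
    for x, y, value in points:
        for block_index, polygon in blocks.items():
            if point_in_polygon(x, y, polygon):
                joined.append((x, y, value, block_index))
                break
        else:
            dropped.append((x, y, value))
    return joined, dropped


blocks = {
    "west": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "east": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = [(1.0, 1.0, 120), (3.0, 0.5, 80), (5.0, 1.0, 10)]
joined, dropped = join_points_to_blocks(points, blocks)
print(joined)
print(dropped)
```

The joined rows mirror the four columns of point_support, and the leftover list corresponds to what dropped_points would hold when store_dropped_points is True.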
Examples
>>> import os
>>> import geopandas as gpd
>>> from pyinterpolate import (
...     Blocks, ExperimentalVariogram, PointSupport, TheoreticalVariogram
... )
>>>
>>>
>>> FILENAME = 'cancer_data.gpkg'
>>> LAYER_NAME = 'areas'
>>> DS = gpd.read_file(FILENAME, layer=LAYER_NAME)
>>> AREA_VALUES = 'rate'
>>> AREA_INDEX = 'FIPS'
>>> AREA_GEOMETRY = 'geometry'
>>> PS_LAYER_NAME = 'points'
>>> PS_VALUES = 'POP10'
>>> PS_GEOMETRY = 'geometry'
>>> PS = gpd.read_file(FILENAME, layer=PS_LAYER_NAME)
>>>
>>> CANCER_DATA = {
...     'ds': DS,
...     'index_column_name': AREA_INDEX,
...     'value_column_name': AREA_VALUES,
...     'geometry_column_name': AREA_GEOMETRY
... }
>>> POINT_SUPPORT_DATA = {
...     'ps': PS,
...     'value_column_name': PS_VALUES,
...     'geometry_column_name': PS_GEOMETRY
... }
>>> block = Blocks(**CANCER_DATA)
>>>
>>> ps = PointSupport(
...     points=POINT_SUPPORT_DATA['ps'],
...     blocks=block,
...     points_value_column=POINT_SUPPORT_DATA['value_column_name'],
...     points_geometry_column=POINT_SUPPORT_DATA['geometry_column_name']
... )
>>> print(ps.unique_blocks[:2])
[42049. 42039.]
- get_distances_between_known_blocks(block_ids: List | ndarray) -> ndarray
Returns distances between known blocks.
- Parameters:
- block_ids: Union[List, np.ndarray]
List with block indexes.
- Returns:
- : numpy array
Distances from blocks to all other blocks (ordered the same way as the input block_ids list, where rows and columns represent the block indexes).
- get_point_to_block_indexes() -> Series
Returns block indexes for each point, in the same order as points are stored in point_support.
- Returns:
- : pandas Series
(point support index: block index)
- get_points_array(block_id=None) -> ndarray
Returns point coordinates and their values as a numpy array.
- Parameters:
- block_id: Any
Block for which points should be retrieved. If not given, all points are returned.
- Returns:
- : numpy array
(lon_col_name, lat_col_name, value)
- point_support_totals(blocks: Iterable)
Retrieves total point support values for given blocks.
- Parameters:
- blocks: Iterable
Block indexes.
- Returns:
- : numpy array
Retrieved values.
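Conceptually, the totals are per-block sums of the point-support values. The sketch below shows that aggregation on plain tuples. It assumes rows shaped like (lon, lat, value, block_index) and returns 0.0 for a block without points, which may not match the library's behaviour; `totals_per_block` is a hypothetical name.

```python
from collections import defaultdict


def totals_per_block(point_support, blocks):
    """Sum point-support values for each requested block index.

    `point_support` rows are (lon, lat, value, block_index).
    The result follows the order of `blocks`.
    """
    sums = defaultdict(float)
    for _lon, _lat, value, block_index in point_support:
        sums[block_index] += value
    return [sums[b] for b in blocks]


point_support = [
    (1.0, 1.0, 120.0, "west"),
    (1.5, 0.5, 30.0, "west"),
    (3.0, 0.5, 80.0, "east"),
]
print(totals_per_block(point_support, ["east", "west"]))
```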
- class pyinterpolate.PointSupportDistance(verbose=True)
Class calculates and stores distances between point supports of multiple blocks.
- Parameters:
- verbose: bool, default = True
Show progress.
- Attributes:
- weighted_block_to_block_distances: DataFrame
Indexes: block indexes, Columns: block indexes, Cells: distances.
- distances_between_point_supports: Dict
(block_a, block_b): [[value_a(i), value_b(j), distance(i-j)], ...]
- no_closest_neighbors: int
Number of closest neighbors for each block.
- closest_neighbors: Dict
Block id: [the closest blocks].
Methods
calc_pair_distances()
Returns distances between point supports from two blocks and updates distances dictionary.
calculate_point_support_distances()
Calculates distances between point supports.
calculate_weighted_block_to_block_distances()
Calculates weighted distances between blocks using their point supports.
get_weighted_distance()
Returns weighted distance to a block.
- Raises:
- AttributeError
When weighted block-to-block distances are not calculated and the user wants to find the closest neighbors (using the calculate_point_support_distances() method with no_closest_neighbors > 0).
Examples
>>> import os
>>> import geopandas as gpd
>>> from pyinterpolate import (
...     Blocks, ExperimentalVariogram, PointSupport, PointSupportDistance,
...     TheoreticalVariogram,
... )
>>>
>>>
>>> FILENAME = 'cancer_data.gpkg'
>>> LAYER_NAME = 'areas'
>>> DS = gpd.read_file(FILENAME, layer=LAYER_NAME)
>>> AREA_VALUES = 'rate'
>>> AREA_INDEX = 'FIPS'
>>> AREA_GEOMETRY = 'geometry'
>>> PS_LAYER_NAME = 'points'
>>> PS_VALUES = 'POP10'
>>> PS_GEOMETRY = 'geometry'
>>> PS = gpd.read_file(FILENAME, layer=PS_LAYER_NAME)
>>>
>>> CANCER_DATA = {
...     'ds': DS,
...     'index_column_name': AREA_INDEX,
...     'value_column_name': AREA_VALUES,
...     'geometry_column_name': AREA_GEOMETRY
... }
>>> POINT_SUPPORT_DATA = {
...     'ps': PS,
...     'value_column_name': PS_VALUES,
...     'geometry_column_name': PS_GEOMETRY
... }
>>> block = Blocks(**CANCER_DATA)
>>>
>>> ps = PointSupport(
...     points=POINT_SUPPORT_DATA['ps'],
...     blocks=block,
...     points_value_column=POINT_SUPPORT_DATA['value_column_name'],
...     points_geometry_column=POINT_SUPPORT_DATA['geometry_column_name']
... )
>>>
>>> pds = PointSupportDistance(verbose=False)
>>> distances_between_neighbors = pds.calculate_point_support_distances(
...     point_support=ps,
...     block_id=36033,
...     no_closest_neighbors=2
... )
>>> print(distances_between_neighbors.keys())  # dict[tuple: array]
dict_keys([(36033, 36019), (36019, 36033), (36033, 36089), (36089, 36033)])
>>> print(pds.closest_neighbors)
{36033: [36019, 36089]}
- calc_pair_distances(point_support, block_pair: Tuple, update=True) -> ndarray
Returns distances between point supports from two blocks and updates distances dictionary.
- Parameters:
- point_support: PointSupport
Blocks and their point supports.
- block_pair: Tuple
(block_a, block_b)
- update: bool, default = True
If True then distances are updated in the distances’ dictionary.
- Returns:
- : numpy array
Distances between point supports from two blocks.
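To make the pairwise structure concrete, the sketch below computes one row per point pair, following the [value_a(i), value_b(j), distance(i-j)] layout listed for distances_between_point_supports. It is an illustration in plain Python that assumes Euclidean distance; `pair_distances` and the sample points are hypothetical.

```python
import math


def pair_distances(points_a, points_b):
    """Rows of [value_a, value_b, distance] for every point pair.

    Each input is a list of (x, y, value) tuples describing the point
    support of one block. Assumption: Euclidean distance.
    """
    return [
        [va, vb, math.dist((xa, ya), (xb, yb))]
        for xa, ya, va in points_a
        for xb, yb, vb in points_b
    ]


block_a = [(0.0, 0.0, 10.0), (0.0, 1.0, 30.0)]
block_b = [(3.0, 4.0, 5.0)]
rows = pair_distances(block_a, block_b)
for row in rows:
    print(row)
```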
- calculate_point_support_distances(point_support, block_id, no_closest_neighbors: int = 0) -> dict
Calculates distances between point supports.
- Parameters:
- point_support: PointSupport
Blocks and their point supports.
- block_id: int
The unique id of a block.
- no_closest_neighbors: int, default = 0
Number of the closest neighbors. If default then all distances are returned.
- Returns:
- : Dict
Dictionary with distances between point supports of a given block and its neighbors. Key is a block pair, and value is a numpy array with distances, where each row represents a point from a given block and each column represents a point from its neighbor.
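The role of no_closest_neighbors can be sketched as a simple nearest-block selection over a block-to-block distance table. This is an assumption about the selection step only, not the library's code: the nested dictionary stands in for weighted_block_to_block_distances, and `closest_neighbors` is a hypothetical helper.

```python
def closest_neighbors(distances, block_id, no_closest_neighbors=0):
    """Return ids of the nearest blocks, closest first.

    `distances` is {block: {other_block: distance}}.
    Assumption: 0 means "keep every other block".
    """
    row = distances[block_id]
    others = [b for b in row if b != block_id]
    others.sort(key=lambda b: row[b])
    if no_closest_neighbors > 0:
        others = others[:no_closest_neighbors]
    return others


weighted = {1: {1: 0.0, 2: 7.5, 3: 2.0, 4: 4.1}}
print(closest_neighbors(weighted, 1, no_closest_neighbors=2))
print(closest_neighbors(weighted, 1))
```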
- calculate_weighted_block_to_block_distances(point_support, return_distances=False)
Calculates weighted distances between blocks using their point supports.
- Parameters:
- point_support: PointSupport
Blocks and their point supports.
- return_distances: bool, default = False
Should the method return a DataFrame with distances?
- Returns:
- : pd.DataFrame
Indexes: block indexes, Columns: block indexes, Cells: distances.
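The source does not state the weighting formula. A common choice in area-to-area kriging is the average of point-to-point distances weighted by the product of the point-support values, and the sketch below assumes exactly that. Treat it as an illustration of the idea rather than a description of what this method computes; `weighted_distance` is a hypothetical name.

```python
def weighted_distance(rows):
    """Weighted mean distance between two point supports.

    `rows` holds [value_a, value_b, distance] for every point pair.
    Assumed formula: sum(va * vb * d) / sum(va * vb).
    """
    numerator = sum(va * vb * d for va, vb, d in rows)
    denominator = sum(va * vb for va, vb, _d in rows)
    return numerator / denominator


rows = [
    [10.0, 5.0, 5.0],  # lightly populated pair, 5 units apart
    [30.0, 5.0, 3.0],  # heavily populated pair, 3 units apart
]
print(weighted_distance(rows))
```

With these weights the result is pulled towards the distance of the heavier pair, which is the reason for weighting by point-support values instead of measuring between centroids.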
- get_weighted_distance(block_id) -> Series
Returns weighted distance to a block.
- Parameters:
- block_id: Union[Hashable, str]
Block unique index.
- Returns:
- : pd.Series
Weighted distances between the block and other blocks centroids.