A.2.1 Directional semivariograms#

Table of Contents:#

Read point data,
Create directional and isotropic semivariograms,
Compare semivariograms,
Compare triangular vs elliptical neighbors selection methods.

Introduction#

In this tutorial, we will learn about directional semivariograms, how to set the angle of direction, and the tolerance parameter. We compare two neighbor selection methods: triangular, which is fast, and elliptical, which is accurate.

A directional process#

Not every spatial process may be described by isotropic semivariograms. Sometimes we see a specific trend in one direction (N-S, W-E, or NE-SW, NW-SE). A well-performed analysis includes directions. That’s why we will learn to write code in pyinterpolate that retrieves the directional semivariogram.

We use the air pollution data.

Note: if you are a machine learning enthusiast, it will be easier to understand direction as an additional feature to model semivariance.

[3]:

import geopandas as gpd
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

# Experimental variogram
from pyinterpolate import build_experimental_variogram

1) Read and show point data#

[4]:

DATASET = 'PM2.5'

[5]:

ds = pd.read_csv('samples/directional_variogram_PM25_20220922.csv', index_col='station_id')
ds.head()

[5]:

	y	x	PM2.5
station_id
659	50.529892	22.112467	7.12765
736	54.380279	18.620274	4.66432
861	54.167847	19.410942	3.00739
266	51.259431	22.569133	7.10000
355	51.856692	19.421231	2.00000

[6]:

def df2gdf(df, lon_col='x', lat_col='y', epsg=4326, crs=None):
    """
    Function transforms DataFrame into GeoDataFrame.

    Parameters
    ----------
    df : pandas DataFrame

    lon_col : str, default = 'x'
        Longitude column name.

    lat_col : str, default = 'y'
        Latitude column name.

    epsg : Union[int, str], default = 4326
        EPSG number of projection.

    crs : str, default = None
        Coordinate Reference System of data.

    Returns
    -------
    gdf : GeoPandas GeoDataFrame
        GeoDataFrame with set geometry column ('geometry'), CRS, and all columns from the passed DataFrame.

    """

    geometry = gpd.points_from_xy(x=df[lon_col], y=df[lat_col])
    geometry.name = 'geometry'
    gdf = gpd.GeoDataFrame(df, geometry=geometry)

    if crs is None:
        gdf.set_crs(epsg=epsg, inplace=True)
    else:
        gdf.set_crs(crs=crs, inplace=True)

    return gdf

[7]:

gds = df2gdf(ds)
gds.head()

[7]:

	y	x	PM2.5	geometry
station_id
659	50.529892	22.112467	7.12765	POINT (22.11247 50.52989)
736	54.380279	18.620274	4.66432	POINT (18.62027 54.38028)
861	54.167847	19.410942	3.00739	POINT (19.41094 54.16785)
266	51.259431	22.569133	7.10000	POINT (22.56913 51.25943)
355	51.856692	19.421231	2.00000	POINT (19.42123 51.85669)

[8]:

gds.isna().any()

[8]:

y           False
x           False
PM2.5        True
geometry    False
dtype: bool

[9]:

gds = gds.dropna(axis=0)

[10]:

gds = gds.to_crs(epsg=2180)
gds.plot(figsize=(10, 10), column=DATASET, legend=True, markersize=20, alpha=0.7)
plt.title('PM2.5 concentrations in Poland')

[10]:

Text(0.5, 1.0, 'PM2.5 concentrations in Poland')

../../_images/usage_tutorials_A21-Directional-Semivariogram_9_1.png

2) Create the experimental semivariogram#

If we recall the tutorial A.1.1 Semivariogram Estimation, there are three main parameters to set for the build_experimental_variogram() function:

input_array: numpy array with coordinates and observed values, for example: [[0, 0, 10], [0, 1, 20]],
step_size: we must divide our analysis area into discrete lags. Lags are intervals (usually circular) within which we check if the point has a neighbor. For example, if we look into the lag 500, then we are going to compare one point with other points in a distance (0, 1000] from this point,
max_range: This parameter represents the possible maximum range of spatial dependency. This parameter should be at most half of the extent.

But that’s not everything! We didn’t use three other parameters:

direction: it is a float in the range [0, 360]. We set the direction of the semivariogram:

0 or 180: is WE direction,
90 or 270 is NS direction,
45 or 225 is NE-SW direction,
135 or 315 is NW-SE direction.

tolerance: it is a float in the range [0, 1]. If we leave tolerance with default 1, we will always get an isotropic semivariogram. Another edge case is if we set tolerance to 0, then points must be placed on a single line with the beginning in the origin of the coordinate system and the angle given by the y-axis and direction parameter. If tolerance is > 0 and < 1, the bin is selected as an elliptical area with a major axis pointed in the same direction as the line for 0 tolerance.

The major axis size is (tolerance * step_size),
The minor axis size is ((1 - tolerance) * step_size),
The baseline point is at the center of the ellipse.

method: it is str with possible values:

t or triangle for triangular neighbors selection which is fast and recommended for a big dataset,
e or ellipse for elliptical neighbors selection which is accurate but slow, recommended for small-size datasets.

The best idea is to visualize it on a cartesian plane:

The visualization of directional variogram calculation

The top plane shows a unit circle that is the omnidirectional variogram (black circle). Within it, we see two ellipses: one is bright green, and another is dark green.
The bottom plane also shows a unit circle and two ellipses: the brighter yellow, and darker purple.
The long arrows within both circles are radiuses of the omnidirectional variogram or the semi-major axis of a directional ellipse. The step_size parameter controls its length. The short arrows are present only in the directional variograms and are the semi-minor axes. The tolerance parameter controls their length, and it is always:
a fraction of step_size,
1, in this case, a semivariogram is omnidirectional,
a value very close to 0 (but not 0) - an ellipse falls into the line.
The good idea is to set tolerance to 0.5 and gradually make it smaller or larger (depending on the spatial features).

We can start the analysis by understanding how our parameters affect the semivariogram bins range. We will set tolerance to 0.2 in each case to better visualize the effects of a leading direction.

Case 1: W-E Direction#

[11]:

BIN_RADIUS = 60000  # meters
MAX_RANGE = 400000  # meters
TOLERANCE = 0.2

[12]:

inp_arr = np.array(list(zip(gds['geometry'].x,
                            gds['geometry'].y,
                            gds[DATASET])))

[13]:

we_variogram = build_experimental_variogram(
    input_array=inp_arr,
    step_size=BIN_RADIUS,
    max_range=MAX_RANGE,
    direction=0,
    tolerance=TOLERANCE,
    method='e'
)

we_variogram.plot(plot_semivariance=True)

../../_images/usage_tutorials_A21-Directional-Semivariogram_16_0.png

Case 2: N-S Direction#

[14]:

ns_variogram = build_experimental_variogram(
    input_array=inp_arr,
    step_size=BIN_RADIUS,
    max_range=MAX_RANGE,
    direction=90,
    tolerance=TOLERANCE,
    method='e'
)

ns_variogram.plot(plot_semivariance=True)

../../_images/usage_tutorials_A21-Directional-Semivariogram_18_0.png

Case 3: NW-SE Direction#

[15]:

nw_se_variogram = build_experimental_variogram(
    input_array=inp_arr,
    step_size=BIN_RADIUS,
    max_range=MAX_RANGE,
    direction=135,
    tolerance=TOLERANCE,
    method='e'
)

nw_se_variogram.plot(plot_semivariance=True)

../../_images/usage_tutorials_A21-Directional-Semivariogram_20_0.png

Case 4: NE-SW Direction#

[16]:

ne_sw_variogram = build_experimental_variogram(
    input_array=inp_arr,
    step_size=BIN_RADIUS,
    max_range=MAX_RANGE,
    direction=45,
    tolerance=TOLERANCE,
    method='e'
)

ne_sw_variogram.plot(plot_semivariance=True)

../../_images/usage_tutorials_A21-Directional-Semivariogram_22_0.png

Case 5: Isotropic Variogram#

[17]:

iso_variogram = build_experimental_variogram(
    input_array=inp_arr,
    step_size=BIN_RADIUS,
    max_range=MAX_RANGE
)

iso_variogram.plot()

../../_images/usage_tutorials_A21-Directional-Semivariogram_24_0.png

3) Compare semivariograms#

We have created a set of variograms. What did we observe?

The NE-SW variogram is very weak at describing a short-range variation (compare it to the map of air pollution from the beginning of the tutorial. Points in this direction are relatively similar).
The N-S variogram works well for a short range.
The W-E variogram catches too much variability, and lags must be longer for this direction.
The NW-SE variogram looks good and shows approximately linear variability change in a distance function. It has the smallest variance from all variograms.

We can visualize and compare all variograms simultaneously, and we are sure that the y-axis is the same for every plot.

[18]:

_lags = iso_variogram.lags
_ns = ns_variogram.experimental_semivariances
_we = we_variogram.experimental_semivariances
_nw_se = nw_se_variogram.experimental_semivariances
_ne_sw = ne_sw_variogram.experimental_semivariances
_iso = iso_variogram.experimental_semivariances

plt.figure(figsize=(20, 8))
plt.plot(_lags, _iso, color='black')
plt.plot(_lags, _ns, '--', color='#c2a5cf')
plt.plot(_lags, _we, '--', color='#e7d4e8')
plt.plot(_lags, _nw_se, '--', color='#5aae61')
plt.plot(_lags, _ne_sw, '--', color='#1b7837')
plt.title('Comparison of experimental semivariance models')
plt.legend(['Isotropic',
            'NS',
            'WE',
            'NW-SE',
            'NE-SW'])
plt.xlabel('Distance')
plt.ylabel('Semivariance')
plt.show()

../../_images/usage_tutorials_A21-Directional-Semivariogram_26_0.png

What do you think about this comparison? Do you agree that the NW-SE variogram best fits the data?

4) Bonus: Compare calculation times and results#

Let’s compare calculation times and results from two methods of neighbors interpolation: t (triangular) and e (elliptical). We will follow data in the NW-SE direction.

[19]:

from datetime import datetime

[20]:

t0e = datetime.now()

# Here elliptical

nw_se_variogram_elliptical = build_experimental_variogram(
    input_array=inp_arr,
    step_size=BIN_RADIUS,
    max_range=MAX_RANGE,
    direction=135,
    tolerance=TOLERANCE,
    method='e'
)

tfin_e = (datetime.now() - t0e).total_seconds()

[21]:

t0t = datetime.now()

# Here triangular

nw_se_variogram_triangular = build_experimental_variogram(
    input_array=inp_arr,
    step_size=BIN_RADIUS,
    max_range=MAX_RANGE,
    direction=135,
    tolerance=TOLERANCE,
    method='t'
)

tfin_t = (datetime.now() - t0t).total_seconds()

[22]:

msg = f'The triangular method is {tfin_e / tfin_t} times faster than the elliptical selection.'

print(msg)

The triangular method is 2.7558084695886036 times faster than the elliptical selection.

For this scenario, the triangular method is 1.5x faster than the elliptical selection. However, this difference will be more significant with more points and point pairs, and triangular selection is 5x faster for big datasets.

Are variograms different? Let’s compare!

[23]:

_lags = iso_variogram.lags
_nw_se_ell = nw_se_variogram_elliptical.experimental_semivariances
_nw_se_tri = nw_se_variogram_triangular.experimental_semivariances

plt.figure(figsize=(20, 8))
plt.plot(_lags, _nw_se_ell)
plt.plot(_lags, _nw_se_tri)
plt.title('Comparison of experimental semivariance models for NW-SE direction.')
plt.legend(['Elliptical selection', 'Triangular selection'])
plt.xlabel('Distance')
plt.ylabel('Semivariance')
plt.show()

../../_images/usage_tutorials_A21-Directional-Semivariogram_34_0.png

The general shape is the same, but the variance of a semivariogram from a triangular selection is lower; it is harder to distinguish the covariance between neighboring points and could be dangerously close to the nugget effect. You should consider this (especially) in automatic calculations to avoid strange results or models that are not better than a simple IDW technique.

But semivariogram for the elliptical selection can have different parameters than for the triangular selection. If you choose a different step size and tolerance, the semivariogram’s shape may change and could better describe dissimilarities.

Where to go from here?#

A.2.2 Variogram Point Cloud
A.2.3 Experimental Variogram and Variogram Point Cloud Classes
B.1.1 Ordinary and Simple Kriging
B.3.1 Directional Ordinary Kriging

Changelog#

Date	Change description	Author
2023-08-21	The tutorial was refreshed and set along with the 0.5.0 version of the package	@SimonMolinsky
2023-04-15	Update: 0.4.1	@SimonMolinsky
2023-03-04	Removed import for the download air quality dataset	@SimonMolinsky
2022-11-05	Tutorial updated for the 0.3.5 version of the package	@SimonMolinsky
2022-10-31	Bug update: wrong direction of angle from origin, it was clockwise but now it is counterclockwise	@SimonMolinsky
2022-10-21	Bug update: wrong name of the N-S variogram	@SimonMolinsky
2022-10-17	Bug update: directions are calculated in a valid way. New feature: added comparison between elliptical and triangular neighbors selection	@SimonMolinsky
2022-09-22	The first version of tutorial	@SimonMolinsky

[ ]: