Directional Semivariogram#

In this tutorial, we will learn about directional semivariograms: how to set the direction angle and the tolerance parameter, and how both change the experimental variogram. We then compare variograms calculated in four leading directions with the isotropic variogram.

Prerequisites#

  • Domain:

    • semivariance and covariance functions

  • Package:

    • TheoreticalVariogram, ExperimentalVariogram

  • Programming:

    • Python basics

Table of contents#

  1. A directional process.

  2. Create directional and isotropic semivariograms.

  3. Compare directional semivariograms.

1. Directional process#

Not every spatial process can be described by an isotropic semivariogram. Sometimes we see a specific trend in one direction (N-S, W-E, NE-SW, or NW-SE). Pyinterpolate lets us model semivariance in a specific direction, or in all directions at once in the omnidirectional case. If you want to go straight to the code, check the API examples in the notebooks (1) tutorials/api-examples/a-1-2-directional-experimental-variogram and (2) tutorials/api-examples/a-1-4-directional-variogram-class. This guide is more detailed and is recommended for users who are not familiar with the concept of a directional semivariogram.

Note: if you are a machine learning enthusiast, then you can treat the direction as an additional feature to model semivariance.
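To make the idea of a directional process concrete, here is a minimal NumPy sketch. It does not use the tutorial data: the surface below is a hypothetical, synthetic grid whose values change quickly along the W-E axis and slowly along the N-S axis. Comparing the semivariance of neighboring cells in both directions shows why a single isotropic variogram would hide part of the structure.

```python
import numpy as np

# Hypothetical synthetic surface on a regular 50 x 50 grid (not the tutorial data).
# Values change quickly along the W-E axis (x) and slowly along the N-S axis (y).
x, y = np.meshgrid(np.arange(50), np.arange(50))
z = np.sin(x / 3.0) + 0.1 * np.sin(y / 30.0)

# Semivariance at a distance of one grid cell, computed separately per direction:
# half of the mean squared difference between neighboring cells.
gamma_we = 0.5 * np.mean((z[:, 1:] - z[:, :-1]) ** 2)  # neighbors along x
gamma_ns = 0.5 * np.mean((z[1:, :] - z[:-1, :]) ** 2)  # neighbors along y

print(f"W-E semivariance at lag 1: {gamma_we:.6f}")
print(f"N-S semivariance at lag 1: {gamma_ns:.6f}")
```

The W-E value is several orders of magnitude larger than the N-S value, so the dissimilarity between neighbors depends on the direction in which we look.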

2. Create directional and isotropic semivariograms#

[1]:
import geopandas as gpd
from pyinterpolate import ExperimentalVariogram, DirectionalVariogram

import matplotlib.pyplot as plt
[2]:
VALUE_COL = 'PM2.5'
df = gpd.read_file('../data/air_pollution.gpkg', layer='pm2_5')
df.set_index('station_id', inplace=True)
[3]:
df.head()
[3]:
              PM2.5                       geometry
station_id
659         7.12765   POINT (720513.167 300494.94)
736         4.66432  POINT (475344.002 724090.975)
861         3.00739   POINT (526820.46 700473.261)
266         7.10000  POINT (748935.825 383022.087)
355         2.00000  POINT (529001.219 443458.908)
[4]:
df.plot(figsize=(10, 10), column=VALUE_COL, legend=True, markersize=50, alpha=0.7, marker="h", cmap='Reds')
plt.title('PM2.5 concentrations in Poland')
plt.show()
../../../_images/usage_tutorials_functional_2-1-directional-semivariogram_5_0.png

Including direction in experimental variogram#

Recall that in the first tutorial, 1-1-semivariogram-exploration, we set three parameters:

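  1. the input data: in earlier releases this was ds, a numpy array with coordinates and observed values, for example: [[0, 0, 10], [0, 1, 20]]; in this tutorial we pass the values and geometries parameters instead,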

  2. step_size: we must divide our analysis area into discrete lags. Lags are intervals (usually circular) within which we check if the point has a neighbor. For example, if step_size is 500 and we look into the lag 500, then we compare one point with other points at a distance (0, 500] from this point,

  3. max_range: This parameter represents the possible maximum range of spatial dependency. This parameter should be at most half of the extent.
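The role of step_size and max_range can be sketched with plain NumPy. The function below is an illustration, not pyinterpolate's implementation: the name experimental_semivariance is hypothetical, and the binning rule (each lag covers the interval (lag - step_size, lag]) is an assumption that follows the description above; the package may differ in details.

```python
import numpy as np


def experimental_semivariance(coords, values, step_size, max_range):
    """Omnidirectional semivariance per lag; each bin covers (lag - step_size, lag]."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # Every pair of points is counted once (i < j)
    i, j = np.triu_indices(len(values), k=1)
    dists = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diffs = (values[i] - values[j]) ** 2

    n_lags = int(max_range // step_size)
    output = []
    for k in range(1, n_lags + 1):
        lag = k * step_size
        in_bin = (dists > lag - step_size) & (dists <= lag)
        n_pairs = int(in_bin.sum())
        # Semivariance: half of the mean squared difference of paired values
        gamma = sq_diffs[in_bin].sum() / (2 * n_pairs) if n_pairs else float("nan")
        output.append((float(lag), float(gamma), n_pairs))
    return output


# Four points on a line, one unit apart
points = [[0, 0], [1, 0], [2, 0], [3, 0]]
observations = [1, 2, 4, 7]
result = experimental_semivariance(points, observations, step_size=1, max_range=3)
for lag, gamma, n_pairs in result:
    print(lag, round(gamma, 3), n_pairs)
```

Each row holds the lag, the semivariance, and the number of point pairs that fell into the bin. The number of pairs drops with distance, which is one reason to keep max_range well below the full extent of the data.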

But that’s not everything! We can use other parameters:

  1. direction: it is a float in the range [0, 360]. We set the direction of the semivariogram:

  • 0 or 180 is the W-E direction,

  • 90 or 270 is the N-S direction,

  • 45 or 225 is the NE-SW direction,

  • 135 or 315 is the NW-SE direction.

  2. tolerance: it is a float in the range [0, 1]. If we leave tolerance at the default 1, we will always get an isotropic semivariogram. The other edge case is tolerance set to 0: then points must lie on a single line that begins at the origin of the coordinate system, with the angle given by the y-axis and the direction parameter. If tolerance is > 0 and < 1, the bin is selected as an elliptical area with a major axis pointed in the same direction as the line for 0 tolerance.

  • The semi-major axis size is step_size,

  • The semi-minor axis size is (tolerance * step_size),

  • The baseline point is at the center of the ellipse.

Those parameters are used to estimate semivariance and covariance in a leading direction. The direction and tolerance parameters might be better described with the picture:

The visualization of directional variogram calculation

  • The top plane shows the black unit circle that represents the omnidirectional variogram. Within it, we see two ellipses: one is bright green and another is dark green.

  • The bottom plane shows a unit circle and two ellipses: the brighter yellow, and darker purple.

  • The long arrows within both circles are the radii of the omnidirectional variogram or the semi-major axes of the directional ellipses. The step_size parameter controls their length. The shorter arrows appear only in the directional variograms and represent the semi-minor axes. The tolerance parameter controls their length as a fraction of step_size:

  • a value between 0 and 1 gives an ellipse whose semi-minor axis is that fraction of step_size,

  • 1 makes the semi-minor axis as long as the semi-major axis, so the semivariogram is omnidirectional,

  • a value very close to 0 (but not 0) collapses the ellipse into a line.

  • A good starting point is to set tolerance to 0.5 and gradually make it smaller or larger (depending on the spatial properties of the dataset).
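The geometry from the figure can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not pyinterpolate's implementation: the function name in_directional_ellipse is hypothetical, the direction is assumed to be measured counter-clockwise from the W-E axis (which matches 0 = W-E, 45 = NE-SW, 90 = N-S), and the semi-axes are taken as step_size and tolerance * step_size, as the figure description suggests.

```python
import numpy as np


def in_directional_ellipse(origin, points, direction, tolerance, semi_major):
    """Boolean mask of points that fall inside the search ellipse centered on origin."""
    if not 0 < tolerance <= 1:
        raise ValueError("tolerance must be in the range (0, 1]")
    origin = np.asarray(origin, dtype=float)
    points = np.asarray(points, dtype=float)
    semi_minor = tolerance * semi_major

    # Assumption: angle in degrees, counter-clockwise from the W-E (x) axis
    theta = np.radians(direction)
    dx, dy = (points - origin).T
    # Rotate the offsets so that the leading direction becomes the first axis
    along = dx * np.cos(theta) + dy * np.sin(theta)
    across = -dx * np.sin(theta) + dy * np.cos(theta)
    return (along / semi_major) ** 2 + (across / semi_minor) ** 2 <= 1.0


neighbors = [[30000, 0], [0, 30000], [20000, 20000], [5000, 5000]]
for angle in (0, 90, 45):
    mask = in_directional_ellipse((0, 0), neighbors, angle, tolerance=0.2, semi_major=40000)
    print(angle, mask.tolist())

# With tolerance equal to 1 the ellipse becomes a circle (omnidirectional case)
print(in_directional_ellipse((0, 0), neighbors, 0, tolerance=1.0, semi_major=40000).tolist())
```

With a tolerance of 0.2, a distant point is selected only when it lies close to the leading direction, while the nearby point at (5000, 5000) is selected in every case. Because the ellipse is symmetric about its center, angles that differ by 180 degrees select the same points.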


We can start the analysis by understanding how our parameters affect the semivariogram bins range. We will set tolerance to 0.2 in each case to better visualize the effects of the leading direction.

Case 1: West-East direction#

[5]:
SEMI_MAJOR_AXIS_SIZE = 40000  # meters
MAX_RANGE = 400000  # meters
TOLERANCE = 0.2
WE_DIRECTION = 0  # or 180
[6]:
exp_var = ExperimentalVariogram(
    values=df[VALUE_COL],
    geometries=df['geometry'],
    step_size=SEMI_MAJOR_AXIS_SIZE,
    max_range=MAX_RANGE,
    direction=WE_DIRECTION,
    tolerance=TOLERANCE
)
exp_var.plot(semivariance=True, covariance=False, variance=False)
../../../_images/usage_tutorials_functional_2-1-directional-semivariogram_9_0.png

Case 2: North-South direction#

[7]:
NS_DIRECTION = 90  # or 270
[8]:
exp_var = ExperimentalVariogram(
    values=df[VALUE_COL],
    geometries=df['geometry'],
    step_size=SEMI_MAJOR_AXIS_SIZE,
    max_range=MAX_RANGE,
    direction=NS_DIRECTION,
    tolerance=TOLERANCE
)
exp_var.plot(semivariance=True, covariance=False, variance=False)
../../../_images/usage_tutorials_functional_2-1-directional-semivariogram_12_0.png

Case 3: Northwest-Southeast direction#

[9]:
NW_SE_DIRECTION = 135  # or 315
[10]:
exp_var = ExperimentalVariogram(
    values=df[VALUE_COL],
    geometries=df['geometry'],
    step_size=SEMI_MAJOR_AXIS_SIZE,
    max_range=MAX_RANGE,
    direction=NW_SE_DIRECTION,
    tolerance=TOLERANCE
)
exp_var.plot(semivariance=True, covariance=False, variance=False)
../../../_images/usage_tutorials_functional_2-1-directional-semivariogram_15_0.png

Case 4: Northeast-Southwest direction#

[11]:
NE_SW_DIRECTION = 45  # or 225
[12]:
exp_var = ExperimentalVariogram(
    values=df[VALUE_COL],
    geometries=df['geometry'],
    step_size=SEMI_MAJOR_AXIS_SIZE,
    max_range=MAX_RANGE,
    direction=NE_SW_DIRECTION,
    tolerance=TOLERANCE
)
exp_var.plot(semivariance=True, covariance=False, variance=False)
../../../_images/usage_tutorials_functional_2-1-directional-semivariogram_18_0.png

Case 5: Isotropic variogram - no leading direction#

[13]:
exp_var = ExperimentalVariogram(
    values=df[VALUE_COL],
    geometries=df['geometry'],
    step_size=SEMI_MAJOR_AXIS_SIZE,
    max_range=MAX_RANGE
)
exp_var.plot(semivariance=True, covariance=False, variance=False)
../../../_images/usage_tutorials_functional_2-1-directional-semivariogram_20_0.png

3. Compare semivariograms#

We have created a set of variograms. What do we observe?

  • The NE-SW variogram is very weak at describing short-range variation. Compare it to the map of air pollution from the beginning of the tutorial: points in this direction are relatively similar.

  • The N-S variogram works well for a short range.

  • The W-E variogram catches too much variability, and lags must be longer for this direction.

  • The NW-SE variogram looks good and shows an approximately linear change of variability as a function of distance. It has the smallest variance of all variograms.

We can visualize and compare all variograms simultaneously, with the same y-axis for every plot. We will use a dedicated class, DirectionalVariogram, to calculate semivariances in all directions at once!

[14]:
dir_var = DirectionalVariogram(
    values=df[VALUE_COL],
    geometries=df['geometry'],
    step_size=SEMI_MAJOR_AXIS_SIZE,
    max_range=MAX_RANGE,
    tolerance=TOLERANCE
)
[15]:
dir_var.show()
../../../_images/usage_tutorials_functional_2-1-directional-semivariogram_23_0.png

In your opinion, which semivariogram model is optimal? Should we pick the isotropic (omnidirectional) variogram, or one of the directional variograms? I’d check how the variograms behave at close distances - in this comparison NE-SW shouldn’t be used. But no decision is definitive; the best we can do is to perform cross-validation on different directional variograms and with different step sizes (bin widths).
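To build intuition for such a comparison, it helps to repeat it on data with a known leading direction. The sketch below is NumPy-only and is not how pyinterpolate computes its variograms: the function directional_semivariance is hypothetical, the dataset is synthetic (values depend only on the W-E coordinate), and the pair selection rule is an assumption (a pair belongs to lag k when it lies inside the ellipse for lag k but outside the ellipse for lag k - 1, with the semi-minor axis equal to tolerance times the semi-major axis).

```python
import numpy as np


def directional_semivariance(coords, values, step_size, max_range, direction, tolerance):
    """Semivariance per lag, using only point pairs aligned with the given direction."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)
    dx, dy = (coords[j] - coords[i]).T
    sq_diffs = (values[j] - values[i]) ** 2

    # Assumption: angle in degrees, counter-clockwise from the W-E (x) axis
    theta = np.radians(direction)
    along = dx * np.cos(theta) + dy * np.sin(theta)
    across = -dx * np.sin(theta) + dy * np.cos(theta)
    # "Elliptical distance": equals lag on the boundary of the ellipse with
    # semi-major axis lag and semi-minor axis tolerance * lag
    ell_dist = np.sqrt(along ** 2 + (across / tolerance) ** 2)

    gammas = []
    for k in range(1, int(max_range // step_size) + 1):
        in_bin = (ell_dist > (k - 1) * step_size) & (ell_dist <= k * step_size)
        n_pairs = in_bin.sum()
        gammas.append(sq_diffs[in_bin].sum() / (2 * n_pairs) if n_pairs else np.nan)
    return np.array(gammas)


# Synthetic process: values change along the W-E axis only
rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(400, 2))
values = np.sin(coords[:, 0] / 10.0)

settings = {
    "W-E": (0, 0.2),
    "N-S": (90, 0.2),
    "NE-SW": (45, 0.2),
    "NW-SE": (135, 0.2),
    "isotropic": (0, 1.0),  # tolerance 1 turns the ellipse into a circle
}
results = {
    name: directional_semivariance(coords, values, 5, 20, direction, tolerance)
    for name, (direction, tolerance) in settings.items()
}
for name, gammas in results.items():
    print(f"{name:>10}: {np.round(gammas, 3)}")
```

On this dataset the W-E semivariances grow quickly with distance, while the N-S ones stay close to zero, and the isotropic variogram averages over both behaviors. Real data rarely separates this cleanly, which is why cross-validation remains the more reliable way to choose a direction.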

Changelog#

Date | Changes | Author

2025-11-07 | Used values and geometries parameters instead of ds in the experimental variogram initialization | @SimonMolinsky (Szymon Moliński)

2025-09-16 | Removed dir_neighbors_selection_method parameter | @SimonMolinsky (Szymon Moliński)

2025-04-24 | Tutorial has been adapted to the 1.0 release | @SimonMolinsky (Szymon Moliński)