Models evaluation

Cross-validation

pyinterpolate.evaluate.cross_validation.validate_kriging(theoretical_model: TheoreticalVariogram, points: ArrayLike = None, values: ArrayLike = None, geometries: ArrayLike = None, how: str = 'ok', neighbors_range: float | None = None, no_neighbors: int = 4, use_all_neighbors_in_range=False, sk_mean: float | None = None, allow_approximate_solutions=False, progress_bar: bool = True) → Tuple[float, float, ndarray]

Function performs cross-validation of kriging models.

Parameters:

theoretical_model (TheoreticalVariogram)

Fitted variogram model.

points (ArrayLike, optional)

Known points and their values [x, y, value].

values (ArrayLike, optional)

Observations; the i-th value is the observation at the i-th geometry from geometries. Optional parameter: if not given, points must be provided.

geometries (ArrayLike, optional)

Array or similar structure with Point geometries. It must have the same length as values. Optional parameter: if not given, points must be provided.

how (str, default=‘ok’)

Select the kind of kriging to perform:

‘ok’: ordinary kriging,

‘sk’: simple kriging; if selected, the sk_mean parameter must be provided.

neighbors_range (float, default=None)

The maximum distance within which neighbors are searched. If None is given, the range is taken from the rang attribute of theoretical_model.

no_neighbors (int, default=4)

The number of the n-closest neighbors used for interpolation.

use_all_neighbors_in_range (bool, default=False)

If True and the number of neighbors within neighbors_range is greater than no_neighbors, all neighbors in range are used and their number is not clipped.

sk_mean (float, default=None)

The mean value of the process over the study area. It must be known before processing, which is why Simple Kriging has a limited number of applications: you need multiple samples and a well-known area to know this parameter.

allow_approximate_solutions (bool, default=False)

Allows the approximation of kriging weights based on the OLS algorithm. We don’t recommend setting it to True unless you know what you are doing. This parameter can be useful when your dataset contains clusters, which can lead to singular or near-singular matrices.

progress_bar (bool, default=True)

Show process status.
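The interplay of neighbors_range, no_neighbors and use_all_neighbors_in_range can be illustrated with a small NumPy sketch. The select_neighbors helper below is hypothetical and not part of pyinterpolate's documented API; it assumes Euclidean distances and, when fewer than no_neighbors points lie within range, simply returns the points it finds, which may differ from the library's actual behavior.

```python
import numpy as np


def select_neighbors(known_xy, target_xy, neighbors_range, no_neighbors,
                     use_all_neighbors_in_range=False):
    """Hypothetical helper: indices of known points used to predict target_xy."""
    known_xy = np.asarray(known_xy, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    # Euclidean distance from every known point to the target location
    distances = np.linalg.norm(known_xy - target_xy, axis=1)
    # Sort indices by distance (stable, so ties keep their input order)
    order = np.argsort(distances, kind="stable")
    # Keep only the points within the search radius
    in_range = order[distances[order] <= neighbors_range]
    if use_all_neighbors_in_range and len(in_range) > no_neighbors:
        # Do not clip: use every neighbor found in range
        return in_range
    # Otherwise keep at most the no_neighbors closest points
    return in_range[:no_neighbors]


known = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
print(select_neighbors(known, [0.5, 0.0], neighbors_range=3.0, no_neighbors=2))
# [0 1]
print(select_neighbors(known, [0.5, 0.0], neighbors_range=3.0, no_neighbors=2,
                       use_all_neighbors_in_range=True))
# [0 1 2 3]
```

With clipping, only the two closest points inside the 3-unit radius are used; with use_all_neighbors_in_range=True, all four in-range points are kept, while the point at x=10 is excluded in both cases.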

Returns:

Tuple

Function returns a tuple with:

Mean Prediction Error,

Mean Kriging Error: ratio of the variance of prediction errors to the average kriging variance,

array with: [coordinate x, coordinate y, prediction error, kriging estimate error, z-value, z-ci-min, z-ci-max]
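The two summary values can be sketched in NumPy from per-point prediction errors and kriging variances. This is an illustration under assumptions: cross_validation_summary is a hypothetical helper, and the use of the population variance (ddof=0) is a guess rather than a confirmed detail of validate_kriging. As a rule of thumb, a Mean Prediction Error close to 0 suggests unbiased predictions, and a Mean Kriging Error close to 1 suggests that kriging variances match the observed spread of errors.

```python
import numpy as np


def cross_validation_summary(prediction_errors, kriging_variances):
    """Hypothetical helper mirroring the two summary values described above."""
    prediction_errors = np.asarray(prediction_errors, dtype=float)
    kriging_variances = np.asarray(kriging_variances, dtype=float)
    # Mean Prediction Error: average of the cross-validation prediction errors
    mean_prediction_error = float(np.mean(prediction_errors))
    # Mean Kriging Error: variance of errors divided by the average kriging variance
    # (population variance, ddof=0, is an assumption)
    mean_kriging_error = float(np.var(prediction_errors) / np.mean(kriging_variances))
    return mean_prediction_error, mean_kriging_error


mpe, mke = cross_validation_summary([0.5, -0.5, 1.0, -1.0], [0.5, 0.5, 0.5, 0.5])
print(mpe, mke)  # 0.0 1.25
```

Here the errors are unbiased (mean 0), but their variance (0.625) exceeds the average kriging variance (0.5), so the ratio of 1.25 hints that the model is somewhat overconfident.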

References

Clark, I. (2004), The Art of Cross Validation in Geostatistical Applications.
Clark, I. (1979), Does Geostatistics Work, Proc. 16th APCOM, pp. 213-225.

Examples

>>> from pyinterpolate import (
...     ExperimentalVariogram,
...     validate_kriging,
...     TheoreticalVariogram
... )
>>>
>>>
>>> POINTS_DATA = ...  # load dataset
>>> POINTS_VARIOGRAM = ExperimentalVariogram(POINTS_DATA,
...                                          step_size=1,
...                                          max_range=6)
>>> THEORETICAL_MODEL = TheoreticalVariogram()
>>> THEORETICAL_MODEL.autofit(experimental_variogram=POINTS_VARIOGRAM,
...                           models_group='linear',
...                           nugget=0.0)
>>> validation_results = validate_kriging(
...     theoretical_model=THEORETICAL_MODEL,
...     values=POINTS_DATA[:, -1],
...     geometries=POINTS_DATA[:, :-1],
...     no_neighbors=4,
...     progress_bar=False
... )
>>> print(validation_results[0])  # mean prediction error
-0.01613441673494531
>>> print(validation_results[1])  # mean kriging error
1.6386630811210166

Metrics

pyinterpolate.evaluate.metrics.forecast_bias(predicted_array: ndarray, real_array: ndarray) → float

Function calculates forecast bias of prediction.

Parameters:

predicted_array (numpy array): Predicted values.
real_array (numpy array): Real observations.

Returns:

fb (float): Forecast Bias of prediction.

Notes

How do we interpret forecast bias? Here are two important properties:

A large positive value means that our observations are usually higher than predictions: the model underestimates the observed values.
A large negative value means that our predictions are usually higher than observations: the model overestimates the observed values.

Equation:

\[e_{fb} = \frac{\sum_{i}^{N}{y_{i} - \bar{y_{i}}}}{N}\]

where:

\(e_{fb}\) - forecast bias,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.
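The equation reduces to the mean of observation minus prediction. The plain NumPy sketch below (forecast_bias_sketch is a hypothetical name, not the library function) reproduces the value from the Examples section and shows how the sign maps to under- and overestimation.

```python
import numpy as np


def forecast_bias_sketch(predicted_array, real_array):
    """Mean of (observation - prediction), following the forecast bias equation."""
    predicted_array = np.asarray(predicted_array, dtype=float)
    real_array = np.asarray(real_array, dtype=float)
    return float(np.mean(real_array - predicted_array))


arr = np.array([1, 2, 3, 4, 5])
preds = np.array([1, 2, 2, 5, 6])
print(forecast_bias_sketch(preds, arr))    # -0.2: predictions too high on average
print(forecast_bias_sketch(arr - 1, arr))  # 1.0: predictions too low on average
```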

Examples

>>> import numpy as np
>>> from pyinterpolate.evaluate.metrics import forecast_bias
>>>
>>>
>>> arr = np.array([1, 2, 3, 4, 5])
>>> preds = np.array([1, 2, 2, 5, 6])
>>> bias = forecast_bias(preds, arr)
>>> print(bias)
-0.2

pyinterpolate.evaluate.metrics.mean_absolute_error(predicted_array: ndarray, real_array: ndarray) → float

Function calculates Mean Absolute Error (MAE) of prediction.

Parameters:

predicted_array (numpy array): Predicted values.
real_array (numpy array): Observations.

Returns:

mae (float): Mean Absolute Error (MAE) of prediction.

Notes

Equation:

\[e_{mae} = \frac{\sum_{i}^{N}{|y_{i} - \bar{y_{i}}|}}{N}\]

where:

\(e_{mae}\) - mean absolute error,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.
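In NumPy terms, MAE is the mean of the absolute differences. The sketch below uses a hypothetical helper name and the same arrays as the Examples section.

```python
import numpy as np


def mean_absolute_error_sketch(predicted_array, real_array):
    """Average absolute difference between observations and predictions."""
    diff = np.asarray(real_array, dtype=float) - np.asarray(predicted_array, dtype=float)
    return float(np.mean(np.abs(diff)))


arr = np.array([1, 2, 3, 4, 5])
preds = np.array([1, 2, 2, 5, 6])
print(mean_absolute_error_sketch(preds, arr))  # 0.6
```

Unlike forecast bias, the absolute value stops positive and negative errors from cancelling, so three unit errors of mixed sign give 0.6 here instead of -0.2.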

Examples

>>> import numpy as np
>>> from pyinterpolate.evaluate.metrics import mean_absolute_error
>>>
>>>
>>> arr = np.array([1, 2, 3, 4, 5])
>>> preds = np.array([1, 2, 2, 5, 6])
>>> err = mean_absolute_error(preds, arr)
>>> print(err)
0.6

pyinterpolate.evaluate.metrics.root_mean_squared_error(predicted_array: ndarray, real_array: ndarray) → float

Function calculates Root Mean Squared Error of predictions.

Parameters:

predicted_array (numpy array): Predictions.
real_array (numpy array): Observations.

Returns:

rmse (float): Root Mean Squared Error of prediction.

Notes

Equation:

\[e_{rmse} = \sqrt{\frac{\sum_{i}^{N}(y_{i} - \bar{y_{i}})^2}{N}}\]

where:

\(e_{rmse}\) - root mean squared error,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.
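A direct NumPy transcription of the RMSE equation (root_mean_squared_error_sketch is a hypothetical name) reproduces the value from the Examples section: three unit errors over five observations give the square root of 0.6.

```python
import numpy as np


def root_mean_squared_error_sketch(predicted_array, real_array):
    """Square root of the mean squared difference between observations and predictions."""
    diff = np.asarray(real_array, dtype=float) - np.asarray(predicted_array, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


arr = np.array([1, 2, 3, 4, 5])
preds = np.array([1, 2, 2, 5, 6])
print(root_mean_squared_error_sketch(preds, arr))  # 0.7745966692414834
```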

Examples

>>> import numpy as np
>>> from pyinterpolate.evaluate.metrics import root_mean_squared_error
>>>
>>>
>>> arr = np.array([1, 2, 3, 4, 5])
>>> preds = np.array([1, 2, 2, 5, 6])
>>> rmse = root_mean_squared_error(preds, arr)
>>> print(rmse)
0.7745966692414834

pyinterpolate.evaluate.metrics.symmetric_mean_absolute_percentage_error(predicted_array: ndarray, real_array: ndarray, check_undefined=True) → float

Function calculates the Symmetric Mean Absolute Percentage Error (SMAPE) of predictions, allowing researchers to compare different models.

Parameters:

predicted_array (numpy array): Predictions.
real_array (numpy array): Observations.
check_undefined (bool, default=True): Check if there are cases when both prediction and observation are equal to 0.

Returns:

smape (float): Symmetric Mean Absolute Percentage Error.

Warns:

UndefinedSMAPEWarning: Observation and prediction are both equal to 0. SMAPE of this pair is undefined; the algorithm assumes that it equals 0.

Notes

Symmetric Mean Absolute Percentage Error is an accuracy measure that returns the prediction error in percent. It is a relative evaluation metric. It shouldn’t be used alone because SMAPE penalizes underforecasting more than overforecasting; compare it with Forecast Bias to get a full view of the model properties. Because it is relative and does not depend on the scale of the data, SMAPE is better suited than RMSE or FB for comparing multiple models and algorithms.

More about SMAPE here: https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error

Equation:

\[e_{smape} = \frac{100}{N} \sum_{i}^{N}{\frac{|\bar{y_{i}} - y_{i}|}{|y_{i}|+|\bar{y_{i}}|}}\]

where:

\(e_{smape}\) - symmetric mean absolute percentage error,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.
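The SMAPE equation, including the documented convention that a pair where both values are 0 contributes 0, can be sketched with NumPy as follows. smape_sketch is a hypothetical name, and it skips the warning that the library emits for such pairs. The last two calls show the asymmetry mentioned in the Notes: the same absolute error of 10 costs more when the forecast is too low.

```python
import numpy as np


def smape_sketch(predicted_array, real_array):
    """SMAPE in percent; 0/0 pairs contribute 0 instead of being undefined."""
    pred = np.asarray(predicted_array, dtype=float)
    real = np.asarray(real_array, dtype=float)
    numerator = np.abs(pred - real)
    denominator = np.abs(real) + np.abs(pred)
    # Divide only where the denominator is non-zero; other entries stay 0
    terms = np.divide(numerator, denominator,
                      out=np.zeros_like(numerator), where=denominator != 0)
    # 100/N * sum(...) is the same as 100 * mean(...)
    return float(100 * np.mean(terms))


arr = np.array([1, 2, 3, 4, 5])
preds = np.array([1, 2, 2, 5, 6])
print(round(smape_sketch(preds, arr), 4))    # 8.0404

# Same absolute error, different SMAPE: under-forecasting is penalized more
print(round(smape_sketch([90], [100]), 3))   # 5.263
print(round(smape_sketch([110], [100]), 3))  # 4.762
```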

Examples

>>> import numpy as np
>>> from pyinterpolate.evaluate.metrics import symmetric_mean_absolute_percentage_error
>>>
>>>
>>> arr = np.array([1, 2, 3, 4, 5])
>>> preds = np.array([1, 2, 2, 5, 6])
>>> smape = symmetric_mean_absolute_percentage_error(preds, arr)
>>> print(smape)
8.040404040404042

pyinterpolate.evaluate.metrics.weighted_root_mean_squared_error(predicted_array: ndarray, real_array: ndarray, weighting_method: str, lag_points_distribution=None) → float

Function weights the RMSE of each lag by a specific weighting factor.

Parameters:

predicted_array (numpy array): Predictions.
real_array (numpy array): Observations.
weighting_method (str): The name of the method used to weight errors at given lags. Available methods:
- closest: lags at a close range have greater weights,
- distant: lags that are further away have greater weights,
- dense: error is weighted by the number of point pairs within a lag.
lag_points_distribution (numpy array, optional): Number of point pairs per lag.

Returns:

wrmse (float): Weighted Root Mean Squared Error.

Raises:

AttributeError: The lag_points_distribution parameter is undefined when “dense” method is set.

Notes

Error weighting is useful when we want to force the semivariogram model to better represent semivariances at specific ranges. The most popular is the "closest" method: it produces a model that fits the semivariogram better at close distances.

Equations:

"closest"

\[e_{wrmse} = \sqrt{\frac{\sum_{i}^{N}(y_{i} - \bar{y_{i}})^2 \cdot \frac{N-i}{N}}{N}}\]

where:

\(e_{wrmse}\) - weighted root mean squared error,
\(i\) - lag, i > 0,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.

"distant"

\[e_{wrmse} = \sqrt{\frac{\sum_{i}^{N}(y_{i} - \bar{y_{i}})^2 \cdot \frac{i}{N}}{N}}\]

where:

\(e_{wrmse}\) - weighted root mean squared error,
\(i\) - lag, i > 0,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.

"dense"

\[e_{wrmse} = \sqrt{\frac{\sum_{i}^{N}(y_{i} - \bar{y_{i}})^2 \cdot \frac{p_{i}}{P}}{N}}\]

where:

\(e_{wrmse}\) - weighted root mean squared error,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(p_{i}\) - number of point pairs within the i-th lag,
\(P\) - number of all point pairs (the sum of all \(p_{i}\)),
\(N\) - number of observations.
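The three weighting schemes can be transcribed into a NumPy sketch, indexing lags as i = 1..N and taking P as the sum of lag_points_distribution; with these assumptions the sketch reproduces all three values from the Examples section. weighted_rmse_sketch is a hypothetical name and not the library implementation.

```python
import numpy as np


def weighted_rmse_sketch(predicted_array, real_array, weighting_method,
                         lag_points_distribution=None):
    """Weighted RMSE following the "closest", "distant" and "dense" equations."""
    pred = np.asarray(predicted_array, dtype=float)
    real = np.asarray(real_array, dtype=float)
    n = len(real)
    i = np.arange(1, n + 1)  # lag index, i > 0
    if weighting_method == 'closest':
        weights = (n - i) / n  # early lags get larger weights
    elif weighting_method == 'distant':
        weights = i / n  # late lags get larger weights
    elif weighting_method == 'dense':
        if lag_points_distribution is None:
            raise AttributeError('lag_points_distribution is required for "dense"')
        p = np.asarray(lag_points_distribution, dtype=float)
        weights = p / p.sum()  # share of all point pairs in each lag
    else:
        raise ValueError(f'Unknown weighting method: {weighting_method}')
    squared_errors = (real - pred) ** 2
    return float(np.sqrt(np.sum(squared_errors * weights) / n))


arr = np.array([1, 2, 3, 4, 5])
preds = np.array([1, 2, 2, 5, 6])
lag_dist = np.array([2, 4, 8, 16, 32])
for method in ('closest', 'distant'):
    print(method, weighted_rmse_sketch(preds, arr, method))
print('dense', weighted_rmse_sketch(preds, arr, 'dense', lag_dist))
```

Because the errors sit in the last three lags, "closest" down-weights them and gives the smallest value, while "distant" and "dense" emphasize them.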

Examples

>>> import numpy as np
>>> from pyinterpolate.evaluate.metrics import weighted_root_mean_squared_error
>>>
>>>
>>> arr = np.array([1, 2, 3, 4, 5])
>>> preds = np.array([1, 2, 2, 5, 6])
>>> lag_dist = np.array([2, 4, 8, 16, 32])
>>> wrmse_closest = weighted_root_mean_squared_error(preds, arr, 'closest')
>>> wrmse_distant = weighted_root_mean_squared_error(preds, arr, 'distant')
>>> wrmse_dense = weighted_root_mean_squared_error(preds,
...                                                arr,
...                                                'dense',
...                                                lag_dist)
>>> print(wrmse_closest, wrmse_distant, wrmse_dense)
0.3464101615137755 0.6928203230275509 0.4250237185032414
