Models evaluation
Cross-validation
- pyinterpolate.evaluate.cross_validation.validate_kriging(theoretical_model: TheoreticalVariogram, points: ArrayLike = None, values: ArrayLike = None, geometries: ArrayLike = None, how: str = 'ok', neighbors_range: float | None = None, no_neighbors: int = 4, use_all_neighbors_in_range=False, sk_mean: float | None = None, allow_approximate_solutions=False, progress_bar: bool = True) -> Tuple[float, float, ndarray]
Function performs cross-validation of kriging models.
- Parameters:
- theoretical_model : TheoreticalVariogram
Fitted variogram model.
- points : ArrayLike, optional
Known points and their values: [x, y, value].
- values : ArrayLike, optional
Observation in the i-th geometry (from geometries). Optional; if not given, then points must be provided.
- geometries : ArrayLike, optional
Array or similar structure with geometries (Point type). It must have the same length as values. Optional; if not given, then points must be provided.
- how : str, default = 'ok'
The kind of kriging to perform:
'ok': ordinary kriging,
'sk': simple kriging; if it is set, then the sk_mean parameter must be provided.
- neighbors_range : float, default = None
The maximum distance within which neighbors are searched. If None is given, then the range is taken from the rang attribute of theoretical_model.
- no_neighbors : int, default = 4
The number of the n-closest neighbors used for interpolation.
- use_all_neighbors_in_range : bool, default = False
True: if the number of neighbors within neighbors_range is greater than the no_neighbors parameter, then use all of them and do not clip their number.
- sk_mean : float, default = None
The mean value of a process over a study area. It should be known before processing, which is why Simple Kriging has a limited number of applications: you need multiple samples and a well-known area to know this parameter.
- allow_approximate_solutions : bool, default = False
Allows the approximation of kriging weights based on the OLS algorithm. We don’t recommend setting it to True if you don’t know what you are doing. This parameter can be useful when you have clusters in your dataset, which can lead to singular or near-singular matrices.
- progress_bar : bool, default = True
Show process status.
- Returns:
- : Tuple
Function returns tuple with:
Mean Prediction Error,
Mean Kriging Error: ratio of variance of prediction errors to the average variance error of kriging,
array with:
[coordinate x, coordinate y, prediction error, kriging estimate error, z-value, z-ci-min, z-ci-max]
References
Clark, I., (2004), The Art of Cross Validation in Geostatistical Applications.
Clark, I., (1979), Does Geostatistics Work, Proc. 16th APCOM, pp. 213-225.
Examples
>>> from pyinterpolate import (
...     ExperimentalVariogram,
...     validate_kriging,
...     TheoreticalVariogram
... )
>>>
>>>
>>> POINTS_DATA = ...  # load dataset
>>> POINTS_VARIOGRAM = ExperimentalVariogram(POINTS_DATA,
...                                          step_size=1,
...                                          max_range=6)
>>> THEORETICAL_MODEL = TheoreticalVariogram()
>>> THEORETICAL_MODEL.autofit(experimental_variogram=POINTS_VARIOGRAM,
...                           models_group='linear',
...                           nugget=0.0)
>>> validation_results = validate_kriging(
...     theoretical_model=THEORETICAL_MODEL,
...     values=POINTS_DATA[:, -1],
...     geometries=POINTS_DATA[:, :-1],
...     no_neighbors=4,
...     progress_bar=False
... )
>>> print(validation_results[0])  # mean prediction error
-0.01613441673494531
>>> print(validation_results[1])  # mean kriging error
1.6386630811210166
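The idea behind this kind of validation can be sketched in plain Python. The block below is only an illustration under stated assumptions, not the pyinterpolate implementation: it assumes a leave-one-out scheme, it assumes the prediction error is observation minus estimate, and it swaps the kriging estimator for a simple inverse-distance-weighted mean. The helpers select_neighbors, idw_estimate and leave_one_out_errors are hypothetical names, and the fallback to the n closest points when too few lie within the range is also an assumption.

```python
import math


def select_neighbors(target, candidates, neighbors_range, no_neighbors=4,
                     use_all_neighbors_in_range=False):
    """Pick neighbors of target (x, y) from a list of (x, y, value) tuples."""
    ranked = sorted(
        ((math.hypot(p[0] - target[0], p[1] - target[1]), p) for p in candidates),
        key=lambda pair: pair[0],
    )
    in_range = [p for dist, p in ranked if dist <= neighbors_range]
    if use_all_neighbors_in_range and len(in_range) > no_neighbors:
        # Keep every neighbor inside the search radius, do not clip.
        return in_range
    # Assumption: otherwise take the n closest points, even beyond the range.
    return [p for _, p in ranked[:no_neighbors]]


def idw_estimate(target, neighbors, power=2.0):
    """Inverse-distance-weighted mean; a stand-in for the kriging estimator."""
    numerator, denominator = 0.0, 0.0
    for x, y, value in neighbors:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value  # exact hit, no interpolation needed
        weight = 1.0 / dist ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def leave_one_out_errors(points, neighbors_range, no_neighbors=4):
    """Remove each point in turn, re-estimate it, and record the error."""
    errors = []
    for i, (x, y, value) in enumerate(points):
        others = points[:i] + points[i + 1:]
        neighbors = select_neighbors((x, y), others, neighbors_range, no_neighbors)
        errors.append(value - idw_estimate((x, y), neighbors))
    return errors


# Four corners of a unit square plus its center: [x, y, value].
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0),
          (1.0, 1.0, 3.0), (0.5, 0.5, 2.0)]
errors = leave_one_out_errors(points, neighbors_range=1.5)
mean_prediction_error = sum(errors) / len(errors)
print(mean_prediction_error)
```

In this toy dataset the lowest corner is overestimated and the highest corner is underestimated by the same amount, so the mean error is close to zero even though individual errors are not. This is why a mean prediction error is usually read together with a spread-based measure.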
Metrics
- pyinterpolate.evaluate.metrics.forecast_bias(predicted_array: ndarray, real_array: ndarray) -> float
Function calculates forecast bias of prediction.
- Parameters:
- predicted_array : numpy array
Predicted values.
- real_array : numpy array
Real observations.
- Returns:
- fb : float
Forecast Bias of prediction.
Notes
- How do we interpret forecast bias? Here are two important properties:
A large positive value means that observations are usually higher than predictions: the model underestimates.
A large negative value means that predictions are usually higher than observations: the model overestimates.
Equation:
\[e_{fb} = \frac{\sum_{i=1}^{N}{(y_{i} - \bar{y_{i}})}}{N}\]
- where:
\(e_{fb}\) - forecast bias,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.
Examples
>>> import numpy as np
>>> from pyinterpolate.evaluate.metrics import forecast_bias
>>>
>>>
>>> arr = np.array([1, 2, 3, 4, 5])
>>> preds = np.array([1, 2, 2, 5, 6])
>>> bias = forecast_bias(preds, arr)
>>> print(bias)
-0.2
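To make the sign convention concrete, here is a minimal plain-Python sketch of the forecast bias equation. The function forecast_bias_sketch is a hypothetical helper written for illustration; it follows the formula in the Notes and is not the library code.

```python
def forecast_bias_sketch(predicted, observed):
    """Mean of (observation - prediction), following the documented equation."""
    if len(predicted) != len(observed):
        raise ValueError("Both sequences must have the same length.")
    return sum(y - y_hat for y_hat, y in zip(predicted, observed)) / len(observed)


observed = [1, 2, 3, 4, 5]
too_high = [2, 3, 4, 5, 6]  # every prediction is 1 above the observation
too_low = [0, 1, 2, 3, 4]   # every prediction is 1 below the observation

bias_high = forecast_bias_sketch(too_high, observed)  # negative: overestimation
bias_low = forecast_bias_sketch(too_low, observed)    # positive: underestimation
print(bias_high, bias_low)
```

Predictions that are consistently too high give a negative bias, and predictions that are consistently too low give a positive one.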
- pyinterpolate.evaluate.metrics.mean_absolute_error(predicted_array: ndarray, real_array: ndarray) -> float
Function calculates Mean Absolute Error (MAE) of prediction.
- Parameters:
- predicted_array : numpy array
Predicted values.
- real_array : numpy array
Observations.
- Returns:
- mae : float
Mean Absolute Error (MAE) of prediction.
- Equation:
\[e_{mae} = \frac{\sum_{i=1}^{N}{|y_{i} - \bar{y_{i}}|}}{N}\]
- where:
\(e_{mae}\) - mean absolute error,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.
Examples
>>> import numpy as np
>>> from pyinterpolate.evaluate.metrics import mean_absolute_error
>>>
>>>
>>> arr = np.array([1, 2, 3, 4, 5])
>>> preds = np.array([1, 2, 2, 5, 6])
>>> err = mean_absolute_error(preds, arr)
>>> print(err)
0.6
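The following plain-Python sketch shows why MAE is informative next to a signed error measure: positive and negative errors cancel in a mean signed error but not in MAE. Both helper names are hypothetical and exist only for this illustration.

```python
def mean_absolute_error_sketch(predicted, observed):
    """Mean of |observation - prediction|, following the documented equation."""
    return sum(abs(y - y_hat) for y_hat, y in zip(predicted, observed)) / len(observed)


def mean_signed_error_sketch(predicted, observed):
    """Mean of (observation - prediction); errors of opposite sign cancel out."""
    return sum(y - y_hat for y_hat, y in zip(predicted, observed)) / len(observed)


observed = [1, 2, 3, 4]
predicted = [2, 1, 4, 3]  # alternately 1 too high and 1 too low

signed = mean_signed_error_sketch(predicted, observed)
mae = mean_absolute_error_sketch(predicted, observed)
print(signed, mae)
```

Here the signed error is zero although every single prediction is off by one, while MAE reports the typical size of the miss.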
- pyinterpolate.evaluate.metrics.root_mean_squared_error(predicted_array: ndarray, real_array: ndarray) -> float
Function calculates Root Mean Squared Error of predictions.
- Parameters:
- predicted_array : numpy array
Predictions.
- real_array : numpy array
Observations.
- Returns:
- rmse : float
Root Mean Squared Error of prediction.
Notes
Equation:
\[e_{rmse} = \sqrt{\frac{\sum_{i=1}^{N}{(y_{i} - \bar{y_{i}})^2}}{N}}\]
- where:
\(e_{rmse}\) - root mean squared error,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.
Examples
>>> import numpy as np
>>> from pyinterpolate.evaluate.metrics import root_mean_squared_error
>>>
>>>
>>> arr = np.array([1, 2, 3, 4, 5])
>>> preds = np.array([1, 2, 2, 5, 6])
>>> rmse = root_mean_squared_error(preds, arr)
>>> print(rmse)
0.7745966692414834
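Because errors are squared before averaging, RMSE reacts more strongly to a few large errors than MAE does. The sketch below illustrates this with two prediction sets that share the same MAE; rmse_sketch and mae_sketch are hypothetical helpers that follow the documented equations, not the library functions.

```python
import math


def rmse_sketch(predicted, observed):
    """Square root of the mean squared (observation - prediction)."""
    squared = [(y - y_hat) ** 2 for y_hat, y in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mae_sketch(predicted, observed):
    """Mean absolute (observation - prediction), used here for comparison."""
    return sum(abs(y - y_hat) for y_hat, y in zip(predicted, observed)) / len(observed)


observed = [0, 0, 0, 0]
evenly_off = [1, 1, 1, 1]   # four small errors
one_outlier = [0, 0, 0, 4]  # a single large error

rmse_even = rmse_sketch(evenly_off, observed)
rmse_outlier = rmse_sketch(one_outlier, observed)
print(mae_sketch(evenly_off, observed), mae_sketch(one_outlier, observed))
print(rmse_even, rmse_outlier)
```

Both prediction sets have the same MAE, but the set with one outlier has twice the RMSE.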
- pyinterpolate.evaluate.metrics.symmetric_mean_absolute_percentage_error(predicted_array: ndarray, real_array: ndarray, check_undefined=True) -> float
Function calculates Symmetric Mean Absolute Percentage Error (SMAPE) of predictions, allowing researchers to compare different models.
- Parameters:
- predicted_array : numpy array
Predictions.
- real_array : numpy array
Observations.
- check_undefined : bool, default = True
Check if there are cases when prediction and observation are both equal to 0.
- Returns:
- smape : float
Symmetric Mean Absolute Percentage Error.
- Warns:
- UndefinedSMAPEWarning
Observation and prediction are both equal to 0, so SMAPE of this pair is undefined; the algorithm assumes that it equals 0.
Notes
Symmetric Mean Absolute Percentage Error is an accuracy measure that returns prediction error in percent. It is a relative evaluation metric. It shouldn’t be used alone, because SMAPE penalizes underforecasting more heavily than overforecasting; compare it with Forecast Bias to get a full view of the model properties. SMAPE is better than RMSE or FB for comparing multiple models and algorithms.
More about SMAPE here: https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error
Equation:
\[e_{smape} = \frac{100}{N} \sum_{i=1}^{N}{\frac{|\bar{y_{i}} - y_{i}|}{|y_{i}|+|\bar{y_{i}}|}}\]
- where:
\(e_{smape}\) - symmetric mean absolute percentage error,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.
Examples
>>> import numpy as np
>>> from pyinterpolate.evaluate.metrics import symmetric_mean_absolute_percentage_error
>>>
>>>
>>> arr = np.array([1, 2, 3, 4, 5])
>>> preds = np.array([1, 2, 2, 5, 6])
>>> smape = symmetric_mean_absolute_percentage_error(preds, arr)
>>> print(smape)
8.040404040404042
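The asymmetry mentioned in the Notes and the handling of undefined pairs can be sketched in plain Python. The helper smape_sketch is hypothetical; it follows the documented equation and treats a pair where both values are 0 as contributing 0, but unlike the library function it does not emit a warning.

```python
def smape_sketch(predicted, observed):
    """100 / N * sum(|prediction - observation| / (|observation| + |prediction|))."""
    total = 0.0
    for y_hat, y in zip(predicted, observed):
        denominator = abs(y) + abs(y_hat)
        if denominator == 0:
            # Undefined pair (both values are 0): counted as 0, as documented.
            continue
        total += abs(y_hat - y) / denominator
    return 100.0 * total / len(observed)


# The same absolute miss of 50 units, once below and once above the observation.
under = smape_sketch([50], [100])
over = smape_sketch([150], [100])
print(under, over)

# A pair of zeros does not break the calculation.
with_zeros = smape_sketch([0, 2], [0, 2])
print(with_zeros)
```

The underforecast yields the larger value because the denominator shrinks when the prediction is small.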
- pyinterpolate.evaluate.metrics.weighted_root_mean_squared_error(predicted_array: ndarray, real_array: ndarray, weighting_method: str, lag_points_distribution=None) -> float
Function weights RMSE of each lag by a specific weighting factor.
- Parameters:
- predicted_array : numpy array
Predictions.
- real_array : numpy array
Observations.
- weighting_method : str
The name of the method used to weight errors at given lags. Available methods:
closest: lags at a close range have greater weights,
distant: lags that are further away have greater weights,
dense: error is weighted by the number of point pairs within a lag.
- lag_points_distribution : numpy array, optional
Number of point pairs per lag.
- Returns:
- wrmse : float
Weighted Root Mean Squared Error.
- Raises:
- AttributeError
The lag_points_distribution parameter is undefined when the "dense" method is set.
Notes
Error weighting is useful when we want to force the semivariogram model to better represent semivariances at specific ranges. The most popular is the "closest" method: we create a model that fits the semivariogram better at close distances.
Equations:
"closest"
\[e_{wrmse} = \sqrt{\frac{\sum_{i=1}^{N}{(y_{i} - \bar{y_{i}})^2 \cdot \frac{N-i}{N}}}{N}}\]
- where:
\(e_{wrmse}\) - weighted root mean squared error,
\(i\) - lag, i > 0,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.
"distant"
\[e_{wrmse} = \sqrt{\frac{\sum_{i=1}^{N}{(y_{i} - \bar{y_{i}})^2 \cdot \frac{i}{N}}}{N}}\]
- where:
\(e_{wrmse}\) - weighted root mean squared error,
\(i\) - lag, i > 0,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(N\) - number of observations.
"dense"
\[e_{wrmse} = \sqrt{\frac{\sum_{i=1}^{N}{(y_{i} - \bar{y_{i}})^2 \cdot \frac{p_{i}}{P}}}{N}}\]
- where:
\(e_{wrmse}\) - weighted root mean squared error,
\(y_{i}\) - i-th observation,
\(\bar{y_{i}}\) - i-th prediction,
\(p_{i}\) - number of point pairs within the i-th lag,
\(P\) - total number of point pairs,
\(N\) - number of observations.
Examples
>>> import numpy as np
>>> from pyinterpolate.evaluate.metrics import weighted_root_mean_squared_error
>>>
>>>
>>> arr = np.array([1, 2, 3, 4, 5])
>>> preds = np.array([1, 2, 2, 5, 6])
>>> lag_dist = np.array([2, 4, 8, 16, 32])
>>> wrmse_closest = weighted_root_mean_squared_error(preds, arr, 'closest')
>>> wrmse_distant = weighted_root_mean_squared_error(preds, arr, 'distant')
>>> wrmse_dense = weighted_root_mean_squared_error(preds,
...                                                arr,
...                                                'dense',
...                                                lag_dist)
>>> print(wrmse_closest, wrmse_distant, wrmse_dense)
0.3464101615137755 0.6928203230275509 0.4250237185032414
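The three weighting schemes can be traced by hand with a short plain-Python sketch. The function weighted_rmse_sketch is a hypothetical helper that follows the equations in the Notes, with lags indexed from 1 to N; it is an illustration and not the library implementation.

```python
import math


def weighted_rmse_sketch(predicted, observed, weighting_method,
                         lag_points_distribution=None):
    """Weighted RMSE following the documented 'closest', 'distant', 'dense' equations."""
    n = len(observed)
    squared = [(y - y_hat) ** 2 for y_hat, y in zip(predicted, observed)]

    if weighting_method == "closest":
        # Weight (N - i) / N: the first lags count the most.
        weights = [(n - i) / n for i in range(1, n + 1)]
    elif weighting_method == "distant":
        # Weight i / N: the last lags count the most.
        weights = [i / n for i in range(1, n + 1)]
    elif weighting_method == "dense":
        if lag_points_distribution is None:
            raise AttributeError("lag_points_distribution is required for 'dense'.")
        total_pairs = sum(lag_points_distribution)
        # Weight p_i / P: lags with more point pairs count the most.
        weights = [p / total_pairs for p in lag_points_distribution]
    else:
        raise ValueError(f"Unknown weighting method: {weighting_method}")

    weighted_sum = sum(err * w for err, w in zip(squared, weights))
    return math.sqrt(weighted_sum / n)


observed = [1, 2, 3, 4, 5]
predicted = [1, 2, 2, 5, 6]
pairs_per_lag = [2, 4, 8, 16, 32]

closest = weighted_rmse_sketch(predicted, observed, "closest")
distant = weighted_rmse_sketch(predicted, observed, "distant")
dense = weighted_rmse_sketch(predicted, observed, "dense", pairs_per_lag)
print(closest, distant, dense)
```

With the same inputs as the example above, all errors sit in the last three lags, so the "distant" weighting reports a larger error than the "closest" one.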