{ "cells": [ { "metadata": {}, "cell_type": "markdown", "source": [ "# Spatial Dependency Index\n", "\n", "In this tutorial, we will learn how to estimate the **spatial dependency index**. The algorithm is based on the following work:\n", "\n", "> [1] CAMBARDELLA, C.A.; MOORMAN, T.B.; PARKIN, T.B.; KARLEN, D.L.; NOVAK, J.M.; TURCO, R.F.; KONOPKA, A.E. Field-scale variability of soil properties in central Iowa soils. Soil Science Society of America Journal, v. 58, n. 5, p. 1501-1511, 1994.\n", "\n", "## Prerequisites\n", "\n", "- **Domain**:\n", " - semivariance and covariance functions\n", "- **Package**:\n", " - `TheoreticalVariogram`\n", "- **Programming**:\n", " - Python basics\n", "\n", "## Table of contents\n", "\n", "1. What is the spatial dependency index?\n", "2. Why do we use the spatial dependency index?\n", "3. Example: Comparing the spatial dependency index of different elements over the same extent.\n", "4. API links.\n", "\n", "## 1. What is the spatial dependency index?\n", "\n", "The spatial dependency index (SDI) measures the strength of the spatial dependence in the process we are modeling. SDI is normalized to the interval between 0 and 1. Therefore, we can transform it into percentages and assign an order of spatial dependency from weak to strong.\n", "\n", "The SDI is the ratio of the nugget to the total variance (sill) of a model, expressed in percent:\n", "\n", "$$SDI = \\frac{nugget}{sill} \\times 100$$\n", "\n", "Whenever we fit a theoretical variogram with the pyinterpolate package, SDI is calculated, and we will take advantage of it in the examples. 
Two values represent SDI:\n", "\n", "- **numeric**, the ratio of nugget to sill in percent,\n", "- **categorical**, a description of the strength of spatial dependency.\n", "\n", "There are four levels of spatial dependency.\n", "\n", "| Lower Limit in % (included) | Upper Limit in % (excluded) | Strength |\n", "|------------------------------|------------------------|-----------------------|\n", "| 0 | 25 | strong |\n", "| 25 | 75 | moderate |\n", "| 75 | 95 | weak |\n", "| 95 | inf | no spatial dependence |\n", "\n", "**The lower the ratio, the stronger the spatial dependence**. If the ratio is greater than 75 percent, we should be cautious with spatial modeling because spatial similarities at the analysed scale may not explain the process.\n", "\n", "\n" ], "id": "6e96876692e3fa89" }, { "metadata": {}, "cell_type": "markdown", "source": [ "## 2. Why do we use the spatial dependency index?\n", "\n", "In a world where Tobler’s Law applied to every spatial phenomenon, we could always use kriging without a second thought. We would know that spatial dependence exists and that close neighbors are always similar.\n", "\n", "That world is not our world! Not every process follows Tobler’s Law. We can find different elements and chemical compounds sampled over the same area and at the same scale whose concentrations are distributed randomly, without any spatial dependency. An example can be seen in publication [1].\n", "\n", "The spatial dependency index level is the first indicator that we can use to decide what to do next with a spatial dataset:\n", "\n", "- Strong: just krige it!\n", "- Moderate: some other factor might explain part of the process variation.\n", "- Weak: other, non-spatial processes have more influence on the data than spatial similarities.\n", "- No spatial dependence: the process is random, or spatial dependencies cannot explain the variance.\n", "\n", "**Note**: Be careful! 
The last two levels are red flags, but a process that shows weak spatial dependence at one scale may still be explained by spatial relations at a different scale. A practical example is a comparison of rental apartment pricing: if you look at the scale of hundreds of kilometers (multiple cities within a country), the spatial dependence may be very weak or absent. On the other hand, prices of apartments close to each other (up to 10 kilometers, about 6 miles) tend to show a classic variogram curve. The reason is simple: most managers and algorithms use pricing information from the closest neighbors, and neighborhood prices are affected by the same external objects or events." ], "id": "242242c5dc49ca8b" }, { "metadata": {}, "cell_type": "markdown", "source": [ "## 3. Example: Spatial dependency over the same study extent but for different elements\n", "\n", "We will compare the spatial dependency index of four elements: cadmium, copper, lead, and zinc, using the meuse dataset. The dataset comes from:\n", "\n", "> Pebesma, Edzer. (2009). 
The meuse data set: a tutorial for the gstat R package. [Link to the publication](https://cran.r-project.org/web/packages/gstat/vignettes/gstat.pdf)" ], "id": "4600b58224287df6" }, { "metadata": { "ExecuteTime": { "end_time": "2025-12-21T14:56:55.949429Z", "start_time": "2025-12-21T14:56:52.665312Z" } }, "cell_type": "code", "source": [ "import numpy as np\n", "import pandas as pd\n", "import pyinterpolate as ptp" ], "id": "2341aa78e62d7ccb", "outputs": [], "execution_count": 1 }, { "metadata": { "ExecuteTime": { "end_time": "2025-12-21T14:56:56.119320Z", "start_time": "2025-12-21T14:56:56.115842Z" } }, "cell_type": "code", "source": [ "MEUSE_FILE = '../data/meuse/meuse.csv'\n", "\n", "# Variogram parameters\n", "STEP_SIZE = 100\n", "MAX_RANGE = 1600\n", "ALLOWED_MODELS = 'safe'\n", "\n", "\n", "# Elements\n", "ELEMENTS = ['cadmium', 'copper', 'zinc', 'lead']\n", "COLS = ['x', 'y']\n", "COLS.extend(ELEMENTS)" ], "id": "2114c4c981cd72f0", "outputs": [], "execution_count": 2 }, { "metadata": { "ExecuteTime": { "end_time": "2025-12-21T14:56:56.164797Z", "start_time": "2025-12-21T14:56:56.141165Z" } }, "cell_type": "code", "source": [ "df = pd.read_csv(MEUSE_FILE, usecols=COLS)\n", "df.head()" ], "id": "7efad3523a0a4ee5", "outputs": [ { "data": { "text/plain": [ " x y cadmium copper lead zinc\n", "0 181072 333611 11.7 85 299 1022\n", "1 181025 333558 8.6 81 277 1141\n", "2 181165 333537 6.5 68 199 640\n", "3 181298 333484 2.6 81 116 257\n", "4 181307 333330 2.8 48 117 269" ], "text/html": [ "
| \n", " | x | \n", "y | \n", "cadmium | \n", "copper | \n", "lead | \n", "zinc | \n", "
|---|---|---|---|---|---|---|
| 0 | \n", "181072 | \n", "333611 | \n", "11.7 | \n", "85 | \n", "299 | \n", "1022 | \n", "
| 1 | \n", "181025 | \n", "333558 | \n", "8.6 | \n", "81 | \n", "277 | \n", "1141 | \n", "
| 2 | \n", "181165 | \n", "333537 | \n", "6.5 | \n", "68 | \n", "199 | \n", "640 | \n", "
| 3 | \n", "181298 | \n", "333484 | \n", "2.6 | \n", "81 | \n", "116 | \n", "257 | \n", "
| 4 | \n", "181307 | \n", "333330 | \n", "2.8 | \n", "48 | \n", "117 | \n", "269 | \n", "