Examples of spatial querying

Binder

In a first section, different examples will be shown on how to use existing webservices to setup spatial queries:

  • Filter data within a certain community using its geographic borders

  • Filter data within a geographic boundary of a feature in layer “Bekken”

  • Filter data within a geographic boundary of a feature in layer “Hydrogeologische homogene zones”

In other situations, a spatial query is need to be defined by a file (e.g. an ESRI shapefile) containing information on the particular area of interest. In a second section of this notebook, the usage of file based queries is shown.

Note: To run the file based examples, the installation of the fiona package and/or the geopandas package is required. See the installation instructions.

[56]:
%matplotlib inline
from io import BytesIO
import matplotlib.pyplot as plt

from owslib.etree import etree
from owslib.fes import PropertyIsEqualTo
from owslib.wfs import WebFeatureService

from pydov.search.boring import BoringSearch
from pydov.search.grondwaterfilter import GrondwaterFilterSearch

from pydov.util.location import (
    GmlFilter,
    Within,
    GeometryFilter,
    GeopandasFilter
)

from pydov.util.owsutil import get_namespace

# import the necessary modules (not included in the requirements of pydov!)
import folium
from folium.plugins import MarkerCluster
from pyproj import Transformer
import fiona
from fiona.crs import from_epsg
import geopandas

# convert the coordinates to lat/lon for folium
def convert_latlon(x1, y1):
    transformer = Transformer.from_crs("epsg:31370", "epsg:4326", always_xy=True)
    x2,y2 = transformer.transform(x1, y1)
    return x2, y2

# convert to bytes if necessary
def to_bytes(data):
    if isinstance(data, bytes):
        return data
    elif isinstance(data, str):
        return data.encode('utf8')

Spatial querying using web services

Filter data within a certain community using its geographic borders

This example will use a third party WFS service to retrieve to geometry of (the boundary of) a community and subsequently use this boundary as a spatial filter to query boreholes from the DOV service.

We have to start by defining the third party WFS service that contains the featuretype (layer) that contains the geometry we’re interested in:

[57]:
gemeentegrenzen = WebFeatureService(
    'https://geo.api.vlaanderen.be/VRBG/wfs',
    version='1.1.0')

In this case we know in advance which featuretype from the WFS service we need (VRBG:RefGem) but are unsure which field (attribute) we can use to get the community we want.

We can get the schema (i.e. all the available fields) of a layer to find a field to query on:

[58]:
gemeentegrenzen.get_schema('VRBG:Refgem')['properties']
[58]:
{'UIDN': 'decimal',
 'OIDN': 'decimal',
 'TERRID': 'decimal',
 'NISCODE': 'string',
 'NAAM': 'string',
 'DATPUBLBS': 'date',
 'NUMAC': 'string'}

We can now build a search query to get the feature (including geometry) of the community with the NAAM of Lievegem:

[59]:
naam_filter = PropertyIsEqualTo(propertyname='NAAM', literal='Lievegem')

By default, the feature is returned as a GML feature, which stands for Geography Markup Language and is a XML grammar used to express geographical features. We change the outputFormat to GML3.2, since pydov’s GmlFilter expects GML version 3.2.

[60]:
gemeente_gml = gemeentegrenzen.getfeature(
    typename='VRBG:Refgem',
    filter=etree.tostring(naam_filter.toXML()).decode("utf8"),
    outputFormat='text/xml; subtype=gml/3.2').read()

Next, we’ll use the GML feature inside a GmlFilter query to find all boreholes that are located Within the community borders:

[61]:
bs = BoringSearch()
df = bs.search(
    location=GmlFilter(gemeente_gml, Within),
    return_fields=('pkey_boring', 'x', 'y'))
[000/001] .

The df DataFrame now contains the pkey_boring together with the x and y coordinates:

[62]:
df.head()
[62]:
pkey_boring x y
0 https://www.dov.vlaanderen.be/data/boring/1894... 99605.0 198147.0
1 https://www.dov.vlaanderen.be/data/boring/1894... 99209.0 198139.0
2 https://www.dov.vlaanderen.be/data/boring/1894... 99103.0 198002.0
3 https://www.dov.vlaanderen.be/data/boring/1894... 98629.0 198057.0
4 https://www.dov.vlaanderen.be/data/boring/1894... 98518.0 197907.0

We can show these result on the map as well:

[63]:
df['lon'], df['lat'] = zip(*map(convert_latlon, df['x'], df['y']))

# convert to list
loclist = df[['lat', 'lon']].values.tolist()
[64]:
# initialize the Folium map on the centre of the selected locations, play with the zoom until ok
fmap = folium.Map(location=[df['lat'].mean(), df['lon'].mean()], zoom_start=10)
marker_cluster = MarkerCluster().add_to(fmap)
for loc in range(0, len(loclist)):
    folium.Marker(loclist[loc], popup=df['pkey_boring'][loc]).add_to(marker_cluster)
fmap
[64]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Filter data within a geographic boundary of a feature in layer “Bekken”

This example will use a third party WFS service to retrieve to geometry of a waterbody and subsequently use this boundary as a spatial filter to query groundwater screens from the DOV service.

We have to start by defining the third party WFS service that contains the featuretype (layer) that contains the geometry we’re interested in:

[65]:
bekkens = WebFeatureService(
    'https://geo.api.vlaanderen.be/Watersystemen/wfs',
    version='1.1.0'
)

We can list all the available featuretypes (layers) in the service if we are unsure which one to use:

[66]:
list(bekkens.contents)
[66]:
['Watersystemen:WsBekken',
 'Watersystemen:WsDeelbek',
 'Watersystemen:WsStrmgeb']

We can also get the schema (i.e. all the available fields) of a layer to find a field to query on:

[67]:
bekkens.get_schema('Watersystemen:WsBekken')['properties']
[67]:
{'OIDN': 'decimal',
 'UIDN': 'decimal',
 'BEKNR': 'short',
 'BEKNAAM': 'string',
 'STRMGEB': 'string'}

We can get all the distinct values of a specific field:

[68]:
namespace = get_namespace(bekkens, 'Watersystemen:WsBekken')

tree = etree.fromstring(to_bytes(bekkens.getfeature('Watersystemen:WsBekken', propertyname='BEKNAAM').read()))
set((i.text for i in tree.findall('.//{%s}BEKNAAM' % namespace)))
[68]:
{'Bekken van de Brugse Polders',
 'Bekken van de Gentse Kanalen',
 'Benedenscheldebekken',
 'Bovenscheldebekken',
 'Demerbekken',
 'Denderbekken',
 'Dijle- en Zennebekken',
 'IJzerbekken',
 'Leiebekken',
 'Maasbekken',
 'Netebekken'}

We can now build a search query to get the feature (including geometry) of the waterbody with the BEKNAAM of Demerbekken:

[69]:
naam_filter = PropertyIsEqualTo(propertyname='BEKNAAM', literal='Demerbekken')
[70]:
bekken_poly = bekkens.getfeature(
    typename='Watersystemen:WsBekken',
    filter=etree.tostring(naam_filter.toXML()).decode("utf8"),
    outputFormat='text/xml; subtype=gml/3.2').read()

And use this GML feature inside a GmlFilter query to find all groundwater screens that are located Within the waterbody:

[71]:
filter_search = GrondwaterFilterSearch()
df = filter_search.search(
    max_features = 100,  # Note - we limit the results here to 100 for the demo case
    location=GmlFilter(bekken_poly, Within),
    return_fields=('pkey_filter', 'x', 'y')
)
[000/001] .
[72]:
df.head()
[72]:
pkey_filter x y
0 https://www.dov.vlaanderen.be/data/filter/1990... 185017.0 179983.0
1 https://www.dov.vlaanderen.be/data/filter/1991... 185334.0 179400.0
2 https://www.dov.vlaanderen.be/data/filter/1992... 185266.0 179615.0
3 https://www.dov.vlaanderen.be/data/filter/1993... 184583.0 179916.0
4 https://www.dov.vlaanderen.be/data/filter/1993... 184818.0 179818.0

We can show the result on the map:

[73]:
df['lon'], df['lat'] = zip(*map(convert_latlon, df['x'], df['y']))
# convert to list
loclist = df[['lat', 'lon']].values.tolist()
[74]:
# initialize the Folium map on the centre of the selected locations, play with the zoom until ok
fmap = folium.Map(location=[df['lat'].mean(), df['lon'].mean()], zoom_start=10)
marker_cluster = MarkerCluster().add_to(fmap)
for loc in range(0, len(loclist)):
    folium.Marker(loclist[loc], popup=df['pkey_filter'][loc]).add_to(marker_cluster)
fmap
[74]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Filter data within a geographic boundary of a feature in layer “HHZ”

This example will use the DOV WFS service to retrieve to geometry of a HHZ (hydrogeological homogenous area) and subsequently use this boundary as a spatial filter to query groundwater screens from the DOV service.

We can query the HHZ layer from DOV using pydov:

[75]:
from pydov.search.generic import WfsSearch

hhz = WfsSearch('gw_varia:hhz')

Next we can get the field of the layer to find a field to query on:

[76]:
hhz.get_fields()
[76]:
{'id': {'name': 'id',
  'definition': None,
  'type': 'string',
  'notnull': False,
  'query': True,
  'cost': 1},
 'hhz_naam': {'name': 'hhz_naam',
  'definition': None,
  'type': 'string',
  'notnull': False,
  'query': True,
  'cost': 1},
 'opp_m2': {'name': 'opp_m2',
  'definition': None,
  'type': 'integer',
  'notnull': False,
  'query': True,
  'cost': 1},
 'hhz': {'name': 'hhz',
  'definition': None,
  'type': 'string',
  'notnull': False,
  'query': True,
  'cost': 1},
 'geom': {'name': 'geom',
  'definition': None,
  'type': 'geometry',
  'notnull': False,
  'query': False,
  'cost': 1}}

We can now build a search query to get the feature where hhz_naam equals Formatie van Mol:

[77]:
from owslib.fes2 import PropertyIsEqualTo as PropertyIsEqualTo2

naam_filter = PropertyIsEqualTo2(
    propertyname='hhz_naam', literal='Formatie van Mol')
[78]:
hhz_poly = hhz.search(query=naam_filter, return_fields=('id', 'hhz_naam', 'geom'))
hhz_poly
[000/001] .
[78]:
id hhz_naam geom
0 2 Formatie van Mol MULTIPOLYGON (((203474.0966 223036.6511, 20424...

And turn this result into a GeoPandas GeoDataFrame:

[79]:
hhz_poly = geopandas.GeoDataFrame(hhz_poly, geometry='geom', crs='EPSG:31370')
hhz_poly
[79]:
id hhz_naam geom
0 2 Formatie van Mol MULTIPOLYGON (((203474.097 223036.651, 204245....

And subsequently use this geometry in a search for groundwater screens:

[80]:
from pydov.types.fields import GeometryReturnField

filter_search = GrondwaterFilterSearch()
df = filter_search.search(
    max_features = 100,  # Note - we limit the results here to 100 for the demo case
    location=GeopandasFilter(hhz_poly, Within),
    return_fields=('pkey_filter', GeometryReturnField('geom', 4326))
)
df
[000/001] .
[80]:
pkey_filter geom
0 https://www.dov.vlaanderen.be/data/filter/2019... POINT Z (5.1536 51.2177 0)
1 https://www.dov.vlaanderen.be/data/filter/2019... POINT Z (5.1536 51.2178 0)
2 https://www.dov.vlaanderen.be/data/filter/2019... POINT Z (5.1453 51.2135 0)
3 https://www.dov.vlaanderen.be/data/filter/2019... POINT Z (5.1531 51.2133 0)
4 https://www.dov.vlaanderen.be/data/filter/2019... POINT Z (5.1373 51.2176 0)
... ... ...
95 https://www.dov.vlaanderen.be/data/filter/2023... POINT Z (5.1379 51.2183 0)
96 https://www.dov.vlaanderen.be/data/filter/2023... POINT Z (5.0593 51.2267 0)
97 https://www.dov.vlaanderen.be/data/filter/2023... POINT Z (5.134 51.219 0)
98 https://www.dov.vlaanderen.be/data/filter/2023... POINT Z (5.1462 51.2699 0)
99 https://www.dov.vlaanderen.be/data/filter/1900... POINT Z (5.096 51.2891 0)

100 rows × 2 columns

And plot the results on a map:

[81]:
loclist = [[point.xy[1][0], point.xy[0][0]] for point in df.geom]
[82]:
# initialize the Folium map on the centre of the selected locations, play with the zoom until ok
fmap = folium.Map(zoom_start=12)
marker_cluster = MarkerCluster().add_to(fmap)

# add filters
for loc in range(0, len(loclist)):
    folium.Marker(loclist[loc], popup=df['pkey_filter'][loc]).add_to(marker_cluster)

# add HHZ polygon
hhz_poly_series = geopandas.GeoSeries(hhz_poly["geom"]).simplify(tolerance=0.001)
hhz_poly_json = hhz_poly_series.to_json()
hhz_poly_json = folium.GeoJson(data=hhz_poly_series, style_function=lambda x: {"fillColor": "orange"})
hhz_poly_json.add_to(fmap)

fmap.fit_bounds(fmap.get_bounds())
fmap
[82]:
Make this Notebook Trusted to load map: File -> Trust Notebook

File based spatial querying

Using a vector file

Intead of using a webservice, we can also use a file as input to define a spatial query. The pydov package provides the GeometryFilter to convert a vector file (e.g. ESRI shape file) into a GML file which can be used in the same way as the previous examples. In this case, we want all the boringen within the polygons defined in the vector file:

We have an example shape file prepared here, but this could be any other shape file as well:

[83]:
shapefile = "../../tests/data/util/location/polygon_multiple_31370.shp"
[84]:
df = bs.search(
    location=GeometryFilter(shapefile, Within),
    return_fields=('pkey_boring', 'x', 'y'))
[000/001] .

Fiona is used to do the data transformation, which is why this is an additional dependency when using the GeometryFilter functionality.

The resulting data points can be represented on a map:

[85]:
df['lon'], df['lat'] = zip(*map(convert_latlon, df['x'], df['y']))
# convert to list
loclist = df[['lat', 'lon']].values.tolist()
[86]:
# initialize the Folium map on the centre of the selected locations, play with the zoom until ok
fmap = folium.Map(location=[df['lat'].mean(), df['lon'].mean()], zoom_start=10)
marker_cluster = MarkerCluster().add_to(fmap)
for loc in range(0, len(loclist)):
    folium.Marker(loclist[loc], popup=df['pkey_boring'][loc]).add_to(marker_cluster)
fmap
[86]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Interact with GeoPandas

In the previous example, the vector file as available on disk was used to create the spatial query. However, when working on geospatial vector data in Python the usage of GeoPandas is convenient as it combines the data table functionalities of the Pandas package with spatial functionalities. GeoPandas also relies on fiona to read vector files, so using GeoPandas DataFrames can be used to define spatial filters as well.

[87]:
import geopandas as gpd

shapefile = "../../tests/data/util/location/polygon_multiple_31370.shp"

# User reads geometry-file from disk (shapefile, geojson,...)
gdf = gpd.read_file(shapefile)
gdf["name"] = ["site 1", "site 2"]  # add dummy names to each of the polygones in the vector file for the demo case
gdf
[87]:
gml_id geometry name
0 polygon_multiple_31370.0 POLYGON ((108636.150 194960.844, 109195.574 19... site 1
1 polygon_multiple_31370.1 POLYGON ((107485.786 196741.544, 108297.344 19... site 2

Using the GeoDataFrame directly to define a spatial query:

[88]:
my_filter = GeopandasFilter(gdf, Within)

bs = BoringSearch()
df = bs.search(
    location=my_filter,
    return_fields=('pkey_boring', 'x', 'y'))
[000/001] .
[89]:
df
[89]:
pkey_boring x y
0 https://www.dov.vlaanderen.be/data/boring/2018... 108025.00 196593.00
1 https://www.dov.vlaanderen.be/data/boring/2019... 107947.29 196640.52
2 https://www.dov.vlaanderen.be/data/boring/2020... 107991.00 196706.00
3 https://www.dov.vlaanderen.be/data/boring/2022... 107842.53 196371.08
4 https://www.dov.vlaanderen.be/data/boring/2023... 108144.95 196771.09
5 https://www.dov.vlaanderen.be/data/boring/1893... 108900.00 194425.00
6 https://www.dov.vlaanderen.be/data/boring/1894... 107618.00 196709.00
7 https://www.dov.vlaanderen.be/data/boring/1894... 107791.00 196516.00
8 https://www.dov.vlaanderen.be/data/boring/1895... 109050.00 194990.00
9 https://www.dov.vlaanderen.be/data/boring/1895... 108742.00 194936.00

A user can do all sort of spatial and non-spatial operations in Geopandas. For example, one is only interested in the Boreholes of site 1 and selects this polygon only:

Using the GeopandasFilter, we can use this to create a spatial query:

[90]:
gdf_subselection = gdf[gdf["name"] == "site 1"]

As long as the result is a GeoDataFrame with a crs defined, it can be used as input to define a spatial filter:

[91]:
my_filter = GeopandasFilter(gdf_subselection, Within)

bs = BoringSearch()
df = bs.search(
    location=my_filter,
    return_fields=('pkey_boring', 'x', 'y'))
[000/001] .

The returned set of boreholes is now limited to Site 1:

[92]:
df
[92]:
pkey_boring x y
0 https://www.dov.vlaanderen.be/data/boring/1893... 108900.0 194425.0
1 https://www.dov.vlaanderen.be/data/boring/1895... 109050.0 194990.0
2 https://www.dov.vlaanderen.be/data/boring/1895... 108742.0 194936.0

Note: To select a single row (i.e. feature) of a GeoDataFrame, make sure the result is still a valid GeoDataFrame:

[93]:
gdf.iloc[[0]]
[93]:
gml_id geometry name
0 polygon_multiple_31370.0 POLYGON ((108636.150 194960.844, 109195.574 19... site 1

instead of:

[94]:
gdf.iloc[0]
[94]:
gml_id                               polygon_multiple_31370.0
geometry    POLYGON ((108636.150020818 194960.844295764, 1...
name                                                   site 1
Name: 0, dtype: object

The latter won’t be a valid input to define a spatial query.