Example of XML caching for pydov

Binder

Introduction

To speed up subsequent queries involving similar data, pydov uses a caching mechanism where raw DOV XML data is cached locally for later reuse. For regular usage of the package and data requests, the cache will be a convenient feature speeding up the time for subsequent queries. However, in case you want to alter the configuration or cache handling, this notebook illustrates some use cases on the cache handling.

Use cases:

  • Check cached files

  • Speed up subsequent queries

  • Disabling the cache

  • Changing the location of cached data

  • Changing the maximum age of cached data

  • Cleaning the cache

[1]:
# check pydov path
import warnings; warnings.simplefilter('ignore')
import pydov

Use cases

Check cached files

[2]:
from pydov.search.boring import BoringSearch
boring = BoringSearch()

The pydov.cache.cachedir defines the directory on the file system used to cache DOV files:

[3]:
# check the cache dir
import os
import pydov.util.caching
cachedir = pydov.cache.cachedir
print(cachedir)
print('directories: ', os.listdir(cachedir))
/tmp/pydov
directories:  ['filter', 'boring', 'sondering', 'grondmonster']

Speed up subsequent queries

To illustrate the convenience of the caching during subsequent data requests, consider the following request, while measuring the time:

[4]:
from pydov.util.location import Within, Box

# Get all borehole data in a bounding box (llx, llxy, ulx, uly) and timeit
%time df = boring.search(location=Within(Box(150145, 205030, 155150, 206935)))
[000/001] .
[000/255] ..................................................
[050/255] ..................................................
[100/255] ..................................................
[150/255] ..................................................
[200/255] ..................................................
[250/255] .....
CPU times: user 2.86 s, sys: 241 ms, total: 3.11 s
Wall time: 30.3 s
[5]:
# The structure of cachedir implies a separate directory for each query type, since permalinks are not unique across types
# In this example 'boring' will be queried, therefore list xmls in the cache of the 'boring' type
# list files present
print('number of files: ', len(os.listdir(os.path.join(pydov.cache.cachedir, 'boring'))))
print('files present: ', os.listdir(os.path.join(pydov.cache.cachedir, 'boring')))
number of files:  736
files present:  ['2023-206524.xml.gz', '1986-059816.xml.gz', '1966-068248.xml.gz', '2021-196430.xml.gz', '1890-111945.xml.gz', '2023-203136.xml.gz', '2016-141571.xml.gz', '2019-166049.xml.gz', '2022-195501.xml.gz', '2018-170389.xml.gz', '1928-103238.xml.gz', '1882-112987.xml.gz', '2020-169408.xml.gz', '2021-181731.xml.gz', '2020-172695.xml.gz', '1986-005598.xml.gz', '1986-005594.xml.gz', '2019-168260.xml.gz', '2022-201616.xml.gz', '2022-191134.xml.gz', '2020-176580.xml.gz', '2019-168265.xml.gz', '2016-134350.xml.gz', '2022-191184.xml.gz', '1961-068369.xml.gz', '1970-061366.xml.gz', '1952-068355.xml.gz', '2023-203141.xml.gz', '2020-175375.xml.gz', '1936-087893.xml.gz', '2022-191158.xml.gz', '2016-134368.xml.gz', '2021-196427.xml.gz', '2020-175376.xml.gz', '1970-061365.xml.gz', '1988-091630.xml.gz', '1890-111943.xml.gz', '2022-191151.xml.gz', '1879-121412.xml.gz', '2023-203143.xml.gz', '2022-199618.xml.gz', '1970-061443.xml.gz', '1891-091514.xml.gz', '2023-202562.xml.gz', '1936-103122.xml.gz', '1934-068197.xml.gz', '2016-142109.xml.gz', '1976-015780.xml.gz', '2022-191191.xml.gz', '1894-121258.xml.gz', '2022-200455.xml.gz', '2022-191154.xml.gz', '1889-113047.xml.gz', '2021-186963.xml.gz', '1890-113075.xml.gz', '2020-174745.xml.gz', '1986-005596.xml.gz', '1940-068202.xml.gz', '1891-091509.xml.gz', '1970-061442.xml.gz', '2021-184635.xml.gz', '2021-181736.xml.gz', '1938-112188.xml.gz', '2022-191726.xml.gz', '1986-091461.xml.gz', '1908-112155.xml.gz', '1952-002633.xml.gz', '2018-167528.xml.gz', '1941-068352.xml.gz', '2022-191721.xml.gz', '2018-170390.xml.gz', '1924-087778.xml.gz', '1974-010351.xml.gz', '1969-033216.xml.gz', '1959-091378.xml.gz', '1936-094599.xml.gz', '2022-191181.xml.gz', '2022-199273.xml.gz', '2023-206425.xml.gz', '1968-094528.xml.gz', '2022-191164.xml.gz', '1929-091562.xml.gz', '1935-091406.xml.gz', '2016-147823.xml.gz', '2020-172694.xml.gz', '1882-113024.xml.gz', '1935-091232.xml.gz', '2021-196423.xml.gz', '2017-148854.xml.gz', '1924-087766.xml.gz', '2020-168267.xml.gz', '1890-111829.xml.gz', '1891-113078.xml.gz', '1882-112986.xml.gz', '2022-191725.xml.gz', '1969-033208.xml.gz', '1970-018762.xml.gz', '1924-087777.xml.gz', '1985-084552.xml.gz', '2017-152860.xml.gz', '1923-091570.xml.gz', '2021-185819.xml.gz', '1891-113079.xml.gz', '2022-198226.xml.gz', '2020-177881.xml.gz', '1945-121832.xml.gz', '1953-121327.xml.gz', '1952-113085.xml.gz', '2022-191189.xml.gz', '2021-183848.xml.gz', '2023-207903.xml.gz', '2021-196419.xml.gz', '1957-121836.xml.gz', '2021-181737.xml.gz', '1970-061454.xml.gz', '1909-109621.xml.gz', '1891-091530.xml.gz', '2019-160294.xml.gz', '2022-191717.xml.gz', '1890-111948.xml.gz', '2022-191718.xml.gz', '2020-175374.xml.gz', '1879-121292.xml.gz', '2021-186450.xml.gz', '1923-087753.xml.gz', '2022-199619.xml.gz', '1959-094557.xml.gz', '1946-091485.xml.gz', '2022-195502.xml.gz', '2019-168261.xml.gz', '1927-094597.xml.gz', '2021-181730.xml.gz', '1879-121293.xml.gz', '2020-175766.xml.gz', '1881-112999.xml.gz', '1924-087807.xml.gz', '1891-113033.xml.gz', '2022-199051.xml.gz', '1895-091534.xml.gz', '1934-091575.xml.gz', '2022-198227.xml.gz', '2023-203134.xml.gz', '1966-033099.xml.gz', '2022-200938.xml.gz', '2019-168264.xml.gz', '2021-180997.xml.gz', '2021-191079.xml.gz', '1928-091572.xml.gz', '1973-068234.xml.gz', '2022-189875.xml.gz', '1935-087854.xml.gz', '2016-142141.xml.gz', '2022-189945.xml.gz', '2021-196418.xml.gz', '2022-193074.xml.gz', '2020-176579.xml.gz', '2022-191190.xml.gz', '1971-094532.xml.gz', '1927-068216.xml.gz', '2021-182867.xml.gz', '1924-087806.xml.gz', '1969-033213.xml.gz', '1895-121247.xml.gz', '1969-033211.xml.gz', '2022-189876.xml.gz', '2021-187046.xml.gz', '2021-196431.xml.gz', '1879-121809.xml.gz', '1909-087707.xml.gz', '1891-091471.xml.gz', '1969-092689.xml.gz', '1890-091504.xml.gz', '1957-091543.xml.gz', '1995-103203.xml.gz', '2023-203144.xml.gz', '1966-068241.xml.gz', '1970-061450.xml.gz', '2023-206359.xml.gz', '1890-091529.xml.gz', '1891-091516.xml.gz', '2022-191862.xml.gz', '2008-065101.xml.gz', '2022-191155.xml.gz', '1969-033220.xml.gz', '1976-094607.xml.gz', '1932-091250.xml.gz', '1964-087972.xml.gz', '2023-206426.xml.gz', '1952-091336.xml.gz', '1891-091597.xml.gz', '1905-111901.xml.gz', '2020-172700.xml.gz', '1952-088151.xml.gz', '1924-087804.xml.gz', '2021-196420.xml.gz', '2021-188778.xml.gz', '1946-087912.xml.gz', '1952-094522.xml.gz', '2022-191188.xml.gz', '1953-121361.xml.gz', '1969-092687.xml.gz', '1891-091512.xml.gz', '1928-091244.xml.gz', '1913-068213.xml.gz', '1924-087772.xml.gz', '1923-090650.xml.gz', '2022-191855.xml.gz', '2018-155580.xml.gz', '1971-068395.xml.gz', '1978-099973.xml.gz', '2020-172704.xml.gz', '1922-092796.xml.gz', '2022-191153.xml.gz', '1961-091576.xml.gz', '1926-091211.xml.gz', '1970-061364.xml.gz', '1973-104727.xml.gz', '1929-091574.xml.gz', '2023-206203.xml.gz', '1995-103205.xml.gz', '1909-109622.xml.gz', '1938-094603.xml.gz', '1932-091539.xml.gz', '1879-121387.xml.gz', '2023-206413.xml.gz', '1978-068205.xml.gz', '1890-091528.xml.gz', '1923-121200.xml.gz', '1891-091472.xml.gz', '2022-191152.xml.gz', '1952-113090.xml.gz', '2020-171440.xml.gz', '2019-160705.xml.gz', '2018-173275.xml.gz', '2021-190781.xml.gz', '2022-191166.xml.gz', '2018-170439.xml.gz', '2020-176201.xml.gz', '2022-191193.xml.gz', '2019-160757.xml.gz', '2022-192594.xml.gz', '2023-203145.xml.gz', '1882-112988.xml.gz', '1891-091445.xml.gz', '1978-121458.xml.gz', '1891-091631.xml.gz', '2021-183850.xml.gz', '1969-091577.xml.gz', '2019-172227.xml.gz', '2023-206163.xml.gz', '1952-091297.xml.gz', '1984-094526.xml.gz', '2022-191710.xml.gz', '1976-015781.xml.gz', '1986-059814.xml.gz', '2023-203139.xml.gz', '1879-121401.xml.gz', '1925-091571.xml.gz', '1890-111899.xml.gz', '2022-197336.xml.gz', '2021-181728.xml.gz', '1909-091535.xml.gz', '1969-092688.xml.gz', '1891-091343.xml.gz', '2016-134380.xml.gz', '2019-166024.xml.gz', '1940-112192.xml.gz', '2022-191194.xml.gz', '1891-113064.xml.gz', '1882-113025.xml.gz', '2004-103984.xml.gz', '2022-198429.xml.gz', '1924-087768.xml.gz', '2023-203138.xml.gz', '1938-121359.xml.gz', '2007-010646.xml.gz', '1996-088010.xml.gz', '1952-024802.xml.gz', '1926-091214.xml.gz', '2021-191080.xml.gz', '1935-091233.xml.gz', '2019-168259.xml.gz', '1891-091508.xml.gz', '2022-195742.xml.gz', '1892-091517.xml.gz', '2021-196428.xml.gz', '2021-196425.xml.gz', '1938-091287.xml.gz', '1895-091518.xml.gz', '1990-088001.xml.gz', '2019-161547.xml.gz', '1909-087703.xml.gz', '1891-113073.xml.gz', '1957-121785.xml.gz', '1969-033215.xml.gz', '1927-091240.xml.gz', '2023-207349.xml.gz', '2023-207902.xml.gz', '1983-094525.xml.gz', '2016-143137.xml.gz', '2021-183857.xml.gz', '2022-201614.xml.gz', '1909-087731.xml.gz', '1952-024799.xml.gz', '1986-059815.xml.gz', '2019-161338.xml.gz', '1909-091520.xml.gz', '1891-113076.xml.gz', '2020-176860.xml.gz', '1963-068390.xml.gz', '1928-068195.xml.gz', '1908-112164.xml.gz', '1960-121786.xml.gz', '1969-033219.xml.gz', '2018-153957.xml.gz', '1895-121241.xml.gz', '1952-113084.xml.gz', '1978-099972.xml.gz', '1904-103731.xml.gz', '2022-191162.xml.gz', '1938-094519.xml.gz', '1953-121362.xml.gz', '2020-171308.xml.gz', '2023-201422.xml.gz', '1952-068354.xml.gz', '2016-142126.xml.gz', '1894-122153.xml.gz', '1995-010545.xml.gz', '2021-184423.xml.gz', '2022-191720.xml.gz', '2017-151389.xml.gz', '1891-091507.xml.gz', '2021-196422.xml.gz', '2019-168026.xml.gz', '2020-177266.xml.gz', '2023-201562.xml.gz', '1879-122256.xml.gz', '2018-156632.xml.gz', '2022-201627.xml.gz', '2021-181734.xml.gz', '1916-087721.xml.gz', '1955-091542.xml.gz', '2020-172699.xml.gz', '1890-113082.xml.gz', '2021-181004.xml.gz', '2022-191456.xml.gz', '1891-091510.xml.gz', '2023-202585.xml.gz', '2022-201615.xml.gz', '2020-168268.xml.gz', '2022-191180.xml.gz', '2021-183161.xml.gz', '2022-191863.xml.gz', '2022-191187.xml.gz', '2022-201617.xml.gz', '2018-157784.xml.gz', '1938-121360.xml.gz', '1966-033098.xml.gz', '2022-191715.xml.gz', '1958-091402.xml.gz', '1936-068221.xml.gz', '1933-068219.xml.gz', '1922-094529.xml.gz', '1996-103230.xml.gz', '2022-199620.xml.gz', '1966-033097.xml.gz', '1959-091457.xml.gz', '1890-091502.xml.gz', '1894-109609.xml.gz', '1963-087945.xml.gz', '1952-091295.xml.gz', '2022-191185.xml.gz', '1952-094523.xml.gz', '1890-091505.xml.gz', '1935-091235.xml.gz', '1879-121811.xml.gz', '2016-142140.xml.gz', '2019-167509.xml.gz', '2022-198243.xml.gz', '1966-033096.xml.gz', '1879-121808.xml.gz', '2022-191859.xml.gz', '1957-121835.xml.gz', '2017-172309.xml.gz', '1935-091125.xml.gz', '2018-155266.xml.gz', '2022-191170.xml.gz', '1978-099967.xml.gz', '2022-191723.xml.gz', '2021-196436.xml.gz', '1976-015298.xml.gz', '1891-091513.xml.gz', '1931-087878.xml.gz', '1881-113011.xml.gz', '2021-196435.xml.gz', '1895-121244.xml.gz', '1882-113044.xml.gz', '2020-172697.xml.gz', '2006-025147.xml.gz', '1891-121889.xml.gz', '2020-177265.xml.gz', '1891-071422.xml.gz', '1909-068365.xml.gz', '2021-186746.xml.gz', '2020-175377.xml.gz', '1955-068368.xml.gz', '2022-191712.xml.gz', '1891-091531.xml.gz', '2023-206416.xml.gz', '1894-122154.xml.gz', '1908-112154.xml.gz', '1952-113042.xml.gz', '2022-191864.xml.gz', '2022-191215.xml.gz', '2022-191192.xml.gz', '1976-015782.xml.gz', '1973-018152.xml.gz', '2022-191856.xml.gz', '2023-206424.xml.gz', '1931-068218.xml.gz', '1891-113050.xml.gz', '2022-191719.xml.gz', '1969-033207.xml.gz', '1891-091159.xml.gz', '2019-166213.xml.gz', '2018-170089.xml.gz', '1969-092685.xml.gz', '1952-062035.xml.gz', '1958-094600.xml.gz', '1909-087666.xml.gz', '1962-091596.xml.gz', '2018-154057.xml.gz', '1986-005597.xml.gz', '2016-147062.xml.gz', '1909-068404.xml.gz', '2023-206415.xml.gz', '2020-174046.xml.gz', '1970-104899.xml.gz', '1909-091522.xml.gz', '2019-166704.xml.gz', '1982-091333.xml.gz', '1936-091541.xml.gz', '2021-196433.xml.gz', '2020-172698.xml.gz', '1882-113038.xml.gz', '2022-191709.xml.gz', '1931-088084.xml.gz', '2022-201723.xml.gz', '2022-191858.xml.gz', '1923-087761.xml.gz', '1952-112195.xml.gz', '1890-113083.xml.gz', '1944-068245.xml.gz', '2022-191727.xml.gz', '1882-113046.xml.gz', '1927-068194.xml.gz', '1970-068250.xml.gz', '2022-201626.xml.gz', '1996-000906.xml.gz', '2018-157294.xml.gz', '2021-196434.xml.gz', '1953-068356.xml.gz', '2022-196183.xml.gz', '1957-068207.xml.gz', '1976-015297.xml.gz', '1891-091515.xml.gz', '1928-087838.xml.gz', '2023-203135.xml.gz', '1975-010345.xml.gz', '1973-104728.xml.gz', '1969-033212.xml.gz', '2023-203894.xml.gz', '2023-203133.xml.gz', '1909-087673.xml.gz', '2016-146150.xml.gz', '1970-061363.xml.gz', '2021-196426.xml.gz', '2016-141578.xml.gz', '1927-068330.xml.gz', '2021-196417.xml.gz', '1970-061446.xml.gz', '2016-134379.xml.gz', '2020-172703.xml.gz', '2022-191165.xml.gz', '2016-134344.xml.gz', '1908-112143.xml.gz', '2021-187270.xml.gz', '1973-081811.xml.gz', '1977-068239.xml.gz', '1925-091337.xml.gz', '2018-167985.xml.gz', '1962-091594.xml.gz', '1970-061362.xml.gz', '1932-087853.xml.gz', '1895-121242.xml.gz', '1923-091569.xml.gz', '2021-181733.xml.gz', '1882-121827.xml.gz', '2021-181732.xml.gz', '1946-068204.xml.gz', '2023-204993.xml.gz', '2022-191168.xml.gz', '2022-191171.xml.gz', '2003-063628.xml.gz', '1909-087706.xml.gz', '2022-191716.xml.gz', '1959-091194.xml.gz', '2023-206421.xml.gz', '1929-091219.xml.gz', '2022-191169.xml.gz', '1954-094534.xml.gz', '1969-033206.xml.gz', '1978-012352.xml.gz', '1879-121424.xml.gz', '1909-087674.xml.gz', '2020-175378.xml.gz', '2023-206422.xml.gz', '1891-068242.xml.gz', '1894-122155.xml.gz', '2022-196184.xml.gz', '1882-113023.xml.gz', '2023-207905.xml.gz', '2021-181735.xml.gz', '2021-191078.xml.gz', '1969-033214.xml.gz', '2016-141616.xml.gz', '1952-113041.xml.gz', '1958-068217.xml.gz', '2021-188296.xml.gz', '1924-091536.xml.gz', '1958-068224.xml.gz', '1961-002130.xml.gz', '1970-018763.xml.gz', '1969-033218.xml.gz', '2020-176854.xml.gz', '2023-206419.xml.gz', '2018-157193.xml.gz', '2022-196735.xml.gz', '1909-087668.xml.gz', '2023-208716.xml.gz', '1955-094535.xml.gz', '2020-176796.xml.gz', '1895-121248.xml.gz', '2022-203539.xml.gz', '1932-121315.xml.gz', '1973-060207.xml.gz', '1890-111915.xml.gz', '1938-094533.xml.gz', '2022-192704.xml.gz', '2022-195743.xml.gz', '1969-094602.xml.gz', '2016-142142.xml.gz', '1957-068389.xml.gz', '2016-142102.xml.gz', '1891-091470.xml.gz', '2021-185182.xml.gz', '2016-142111.xml.gz', '1890-111954.xml.gz', '1937-068201.xml.gz', '1974-068394.xml.gz', '2022-191724.xml.gz', '2020-169780.xml.gz', '1895-121232.xml.gz', '2023-203146.xml.gz', '2022-191857.xml.gz', '2006-025144.xml.gz', '1970-104897.xml.gz', '1959-094558.xml.gz', '2017-153161.xml.gz', '1905-112246.xml.gz', '1969-092686.xml.gz', '2020-179730.xml.gz', '1952-024801.xml.gz', '1909-091521.xml.gz', '1934-091395.xml.gz', '1990-190466.xml.gz', '2022-191860.xml.gz', '2022-191183.xml.gz', '1952-094521.xml.gz', '1940-068351.xml.gz', '1940-088141.xml.gz', '1984-081833.xml.gz', '2003-002175.xml.gz', '2022-191186.xml.gz', '2016-134398.xml.gz', '1890-111957.xml.gz', '2023-203132.xml.gz', '1926-087792.xml.gz', '2020-177353.xml.gz', '2018-170437.xml.gz', '1963-068392.xml.gz', '2021-191081.xml.gz', '2022-191182.xml.gz', '2022-192972.xml.gz', '2023-207904.xml.gz', '1890-111921.xml.gz', '1947-092550.xml.gz', '1960-094540.xml.gz', '2003-067109.xml.gz', '1970-104898.xml.gz', '1973-104723.xml.gz', '2018-156633.xml.gz', '1890-111944.xml.gz', '2016-142101.xml.gz', '2016-141585.xml.gz', '1970-018757.xml.gz', '1882-113039.xml.gz', '2023-200409.xml.gz', '2021-185164.xml.gz', '2020-172702.xml.gz', '2022-191713.xml.gz', '2020-177041.xml.gz', '2021-178632.xml.gz', '1926-068230.xml.gz', '2018-156634.xml.gz', '1882-121831.xml.gz', '2022-194770.xml.gz', '1933-091443.xml.gz', '1909-113081.xml.gz', '2021-181738.xml.gz', '2016-134345.xml.gz', '2023-206423.xml.gz', '1961-002129.xml.gz', '1929-091246.xml.gz', '1890-111952.xml.gz', '1952-113071.xml.gz', '1984-081834.xml.gz', '1931-088092.xml.gz', '1891-091511.xml.gz', '2020-170885.xml.gz', '1995-010546.xml.gz', '1891-103136.xml.gz', '2022-191195.xml.gz', '1891-091506.xml.gz', '1976-015779.xml.gz', '1908-112163.xml.gz', '1961-094591.xml.gz', '2020-172608.xml.gz', '1882-113045.xml.gz', '1952-091338.xml.gz', '1935-092743.xml.gz', '1970-061447.xml.gz', '2006-103884.xml.gz', '1924-087776.xml.gz', '2019-162035.xml.gz', '1909-087705.xml.gz', '1935-091407.xml.gz', '1978-099971.xml.gz', '1933-094598.xml.gz', '2022-191714.xml.gz', '1979-094542.xml.gz', '1935-091234.xml.gz', '2021-196429.xml.gz', '2016-142251.xml.gz', '1923-121199.xml.gz', '2023-200743.xml.gz', '2023-203137.xml.gz', '1957-121837.xml.gz', '1936-103735.xml.gz', '1890-111941.xml.gz', '2016-142143.xml.gz', '1970-061444.xml.gz', '2020-171309.xml.gz', '2021-196432.xml.gz', '1890-112969.xml.gz', '1954-068387.xml.gz', '1938-068244.xml.gz', '1969-033217.xml.gz', '1976-014856.xml.gz', '1935-119451.xml.gz', '1919-068191.xml.gz', '1909-087667.xml.gz', '1952-024803.xml.gz', '2021-196424.xml.gz', '1934-091540.xml.gz', '1970-104900.xml.gz', '1961-002131.xml.gz', '1973-060208.xml.gz', '1909-087732.xml.gz', '2020-172701.xml.gz', '2018-157386.xml.gz', '2023-206420.xml.gz', '2023-206417.xml.gz', '2016-143136.xml.gz', '2019-168263.xml.gz', '2020-172696.xml.gz', '2023-203126.xml.gz', '2023-206879.xml.gz', '1922-087727.xml.gz', '2021-181729.xml.gz', '2020-174025.xml.gz', '1909-087704.xml.gz', '1996-081802.xml.gz', '2023-203140.xml.gz', '2018-170438.xml.gz', '2021-187101.xml.gz', '1909-087702.xml.gz', '1931-088093.xml.gz', '1929-087846.xml.gz', '1970-061445.xml.gz', '2023-203142.xml.gz', '2017-152011.xml.gz', '1934-090611.xml.gz', '1999-002658.xml.gz', '2020-175379.xml.gz', '1996-021717.xml.gz', '1915-103737.xml.gz', '1905-111902.xml.gz', '2022-201722.xml.gz', '1966-094604.xml.gz', '1926-091212.xml.gz', '1891-091533.xml.gz', '2022-191861.xml.gz', '2023-206414.xml.gz', '2022-191711.xml.gz', '1940-112191.xml.gz', '2023-205273.xml.gz', '1891-113077.xml.gz', '1957-068367.xml.gz', '1929-091538.xml.gz', '1936-122224.xml.gz', '2020-175467.xml.gz', '1891-113049.xml.gz', '2022-193726.xml.gz', '2021-188201.xml.gz', '1890-111949.xml.gz', '1924-091537.xml.gz', '1879-121812.xml.gz', '1969-033209.xml.gz', '1879-119364.xml.gz', '2022-191167.xml.gz', '2019-166707.xml.gz', '1926-087820.xml.gz', '1879-121810.xml.gz', '1987-119382.xml.gz', '1891-076561.xml.gz']

Rerun the previous request and timeit again:

[6]:
%time df = boring.search(location=Within(Box(150145, 205030, 155150, 206935)))
[000/001] .
[000/255] cccccccccccccccccccccccccccccccccccccccccccccccccc
[050/255] cccccccccccccccccccccccccccccccccccccccccccccccccc
[100/255] cccccccccccccccccccccccccccccccccccccccccccccccccc
[150/255] cccccccccccccccccccccccccccccccccccccccccccccccccc
[200/255] cccccccccccccccccccccccccccccccccccccccccccccccccc
[250/255] ccccc
CPU times: user 395 ms, sys: 81.3 ms, total: 476 ms
Wall time: 708 ms

The use of the cache decreased the runtime by a factor 100 in the current example. This will increase drastically if more permalinks are queried since the download takes much longer than the IO at runtime.

Disabling the cache

You can (temporarily!) disable the caching mechanism. This disables both the saving of newly downloaded data in the cache, as well as reusing existing data in the cache. It remains valid for the time being of the instantiated pydov.cache object. It does not delete existing data in the cache.

[7]:
# list number of files
print('number of files: ', len(os.listdir(os.path.join(cachedir, 'boring'))))
number of files:  736
[8]:
# disable caching
cache_orig = pydov.cache
pydov.cache = None
# new query
df = boring.search(location=Within(Box(151000, 205930, 153000, 206000)))
print(df.head())
[000/001] .
[000/002] ..
                                         pkey_boring     boornummer         x  \
0  https://www.dov.vlaanderen.be/data/boring/1895...   kb15d43w-B47  151600.0
1  https://www.dov.vlaanderen.be/data/boring/1984...  kb15d43w-B403  151041.0

          y  mv_mtaw  start_boring_mtaw   gemeente  diepte_boring_van  \
0  205998.0    15.00              15.00  Antwerpen                0.0
1  205933.0    21.07              21.07  Antwerpen                0.0

   diepte_boring_tot datum_aanvang                                uitvoerder  \
0                3.3    1895-01-04                                  onbekend
1                7.0    1984-09-26  Universiteit Gent - Geologisch Instituut

   boorgatmeting  diepte_methode_van  diepte_methode_tot   boormethode
0          False                 0.0                 3.3      onbekend
1          False                 0.0                 7.0  droge boring
[9]:
# list number of files
print('number of files: ', len(os.listdir(os.path.join(cachedir, 'boring'))))
number of files:  736

Hence, no new files were added to the cache when disabling it.

The caching is disabled by removing the pydov.cache object from the namespace. If you want to enable caching again you must instantiate it anew.

[10]:
pydov.cache = cache_orig

Changing the location of cached data

By default, pydov stores the cache in a temporary directory provided by the user’s operating system. On Windows, the cache is usually located in: C:\Users\username\AppData\Local\Temp\pydov\ If you want the cached xml files to be saved in another location you can define your own cache for the current runtime. Mind that this does not change the location of previously saved data. No lookup in the old datafolder will be performed after changing the directory’s location. Besides controlling the cache’s location, this also allows using different scripts or projects.

[11]:
import pydov.util.caching

pydov.cache = pydov.util.caching.GzipTextFileCache(
    cachedir=r'C:\temp\pydov'
    )
[12]:
cachedir = pydov.cache.cachedir
print(cachedir)
C:\temp\pydov
[13]:
# for the sake of the example, change dir location back
pydov.cache = cache_orig
cachedir = pydov.cache.cachedir

Changing the maximum age of cached data

If you work with rapidly changing data or want to control when cached data is renewed, you can do so by changing the maximum age of cached data to be considered valid for the currenct runtime. You can use ‘weeks’, ‘days’ or any other common datetime format. If a cached version exists and is younger than the maximum age, it is used in favor of renewing the data from DOV services. If no cached version exists or is older than the maximum age, the data is renewed and saved in the cache. Note that data older than the maximum age is not automatically deleted from the cache.

[14]:
import pydov.util.caching
import datetime
pydov.cache = pydov.util.caching.GzipTextFileCache(
    max_age=datetime.timedelta(seconds=1)
    )
print(pydov.cache.max_age)
0:00:01
[15]:
from time import ctime
print(os.listdir(os.path.join(cachedir, 'boring'))[0])
ctime(os.path.getmtime(os.path.join(os.path.join(cachedir, 'boring'),
                                    os.listdir(os.path.join(cachedir, 'boring'))[0]
                                   )
                      )
     )
2023-206524.xml.gz
[15]:
'Tue Oct  3 14:28:24 2023'
[16]:
# rerun previous query
%time df = boring.search(location=Within(Box(150145, 205030, 155150, 206935)))
[000/001] .
[000/255] ..................................................
[050/255] ..................................................
[100/255] ..................................................
[150/255] ..................................................
[200/255] ..................................................
[250/255] .....
CPU times: user 2.54 s, sys: 348 ms, total: 2.89 s
Wall time: 35.5 s
[17]:
from time import ctime
print(os.listdir(os.path.join(cachedir, 'boring'))[0])
ctime(os.path.getmtime(os.path.join(os.path.join(cachedir, 'boring'),
                                    os.listdir(os.path.join(cachedir, 'boring'))[0]
                                   )
                      )
     )
2023-206524.xml.gz
[17]:
'Tue Oct  3 14:28:24 2023'

Cleaning the cache

Since we use a temporary directory provided by the operating system, we rely on the operating system to clean the folder when it deems necessary.

To clean the cache, removing all records older than the maximum age

[18]:
from time import sleep
[19]:
print('number of files before clean: ', len(os.listdir(os.path.join(cachedir, 'boring'))))
sleep(2) # remember we've put the caching age on 1 second
pydov.cache.clean()
print('number of files after clean: ', len(os.listdir(os.path.join(cachedir, 'boring'))))
number of files before clean:  736
number of files after clean:  0

Should you want to remove the pydov cache from code yourself, you can do so as illustrated below. Note that this will erase the entire cache, not only the records older than the maximum age:

[20]:
pydov.cache.remove()
# check existence of the cache directory:
print(os.path.exists(os.path.join(cachedir, 'boring')))
False

Disabling stale responses on error

By default, pydov will return stale data (i.e. XML documents still present in the cache, but no longer considered valid) in case it fails to download a fresh copy from the DOV webservices. We believe this behaviour to benefit most users, as we think stale data is still better than no data at all.

If your application cannot afford stale data, you can switch the default behaviour by issuing:

[21]:
pydov.cache.stale_on_error = False

This will cause pydov not to return stale data and instead set the XML fields to NaN, as if the stale data wasn’t available.

Custom caching

By default, pydov caches files on disk as gzipped XML documents. Should you for any reason want to use plain text XML documents instead, you can do so by using the PlainTextFileCache instead of the GzipTextFileCache.
Mind that this can increase the disk usage of the cache by 10x.:
[22]:
import pydov.util.caching
pydov.cache = pydov.util.caching.PlainTextFileCache()

Implementing custom caching

Should you want to implement your own caching mechanism, you can do so by subclassing :class:pydov.util.caching.AbstractCache and implementing its abstract methods get, clean and remove. Hereby you can use the available methods _get_remote to request data from the DOV webservices and _emit_cache_hit to notify hooks a file has been retrieved from the cache.

Note that the get method will be called from multiple threads simultaneously, so implementations must be threadsafe or use locking.

A (naive) implementation for an in-memory cache would be something like:

[23]:
from pydov.util.caching import AbstractCache

class MemoryCache(AbstractCache):
    def __init__(self):
        self.cache = {}

    def get(self, url):
        if url not in self.cache:
            self.cache[url] = self._get_remote(url)
        else:
            self._emit_cache_hit(url)
        return self.cache[url]

    def clean(self):
        self.cache = {}

    def remove(self):
        self.cache = {}

pydov.cache = MemoryCache()