Example of XML caching for pydov
Introduction
To speed up subsequent queries involving similar data, pydov uses a caching mechanism that stores the raw DOV XML data locally for later reuse. For regular use of the package, the cache is a convenient feature that speeds up subsequent queries. However, if you want to alter the configuration or the cache handling, this notebook illustrates some use cases.
Use cases:
Check cached files
Speed up subsequent queries
Disabling the cache
Changing the location of cached data
Changing the maximum age of cached data
Cleaning the cache
[1]:
# suppress warnings and import pydov
import warnings
warnings.simplefilter('ignore')
import pydov
Use cases
Check cached files
[2]:
from pydov.search.boring import BoringSearch
boring = BoringSearch()
The pydov.cache.cachedir attribute defines the directory on the file system used to cache DOV files:
[3]:
# check the cache dir
import os
import pydov.util.caching
cachedir = pydov.cache.cachedir
print(cachedir)
print('directories: ', os.listdir(cachedir))
/tmp/pydov
directories: ['filter', 'boring', 'sondering', 'grondmonster']
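To get a quick overview of how many files are cached for each type, a small helper can walk the cache directory. This is a minimal sketch using only the standard library; count_cached_files is a hypothetical helper, not part of pydov, and it assumes the layout shown above with one subdirectory per type (the 'filter' directory is counted like any other).

```python
import os


def count_cached_files(cachedir):
    """Return a dict mapping each subdirectory of cachedir (e.g. 'boring')
    to the number of files it contains. Hypothetical helper, not pydov API."""
    counts = {}
    for name in sorted(os.listdir(cachedir)):
        subdir = os.path.join(cachedir, name)
        if os.path.isdir(subdir):
            # one entry per cached object type
            counts[name] = len(os.listdir(subdir))
    return counts
```

For example, count_cached_files(pydov.cache.cachedir) returns one count per directory listed above.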
Speed up subsequent queries
To illustrate how caching speeds up subsequent data requests, consider the following request and measure its runtime:
[4]:
from pydov.util.location import Within, Box
# Get all borehole data in a bounding box (llx, lly, urx, ury) and time the request
%time df = boring.search(location=Within(Box(150145, 205030, 155150, 206935)))
[000/001] .
[000/255] ..................................................
[050/255] ..................................................
[100/255] ..................................................
[150/255] ..................................................
[200/255] ..................................................
[250/255] .....
CPU times: user 2.86 s, sys: 241 ms, total: 3.11 s
Wall time: 30.3 s
[5]:
# The cache directory holds a separate subdirectory for each object type, since permalinks are not unique across types
# This example queries 'boring', so list the XML files in the 'boring' subdirectory
print('number of files: ', len(os.listdir(os.path.join(pydov.cache.cachedir, 'boring'))))
print('files present: ', os.listdir(os.path.join(pydov.cache.cachedir, 'boring')))
number of files: 736
files present: ['2023-206524.xml.gz', '1986-059816.xml.gz', '1966-068248.xml.gz', '2021-196430.xml.gz', '1890-111945.xml.gz', '2023-203136.xml.gz', '2016-141571.xml.gz', '2019-166049.xml.gz', '2022-195501.xml.gz', '2018-170389.xml.gz', '1928-103238.xml.gz', '1882-112987.xml.gz', '2020-169408.xml.gz', '2021-181731.xml.gz', '2020-172695.xml.gz', '1986-005598.xml.gz', '1986-005594.xml.gz', '2019-168260.xml.gz', '2022-201616.xml.gz', '2022-191134.xml.gz', '2020-176580.xml.gz', '2019-168265.xml.gz', '2016-134350.xml.gz', '2022-191184.xml.gz', '1961-068369.xml.gz', '1970-061366.xml.gz', '1952-068355.xml.gz', '2023-203141.xml.gz', '2020-175375.xml.gz', '1936-087893.xml.gz', '2022-191158.xml.gz', '2016-134368.xml.gz', '2021-196427.xml.gz', '2020-175376.xml.gz', '1970-061365.xml.gz', '1988-091630.xml.gz', '1890-111943.xml.gz', '2022-191151.xml.gz', '1879-121412.xml.gz', '2023-203143.xml.gz', '2022-199618.xml.gz', '1970-061443.xml.gz', '1891-091514.xml.gz', '2023-202562.xml.gz', '1936-103122.xml.gz', '1934-068197.xml.gz', '2016-142109.xml.gz', '1976-015780.xml.gz', '2022-191191.xml.gz', '1894-121258.xml.gz', '2022-200455.xml.gz', '2022-191154.xml.gz', '1889-113047.xml.gz', '2021-186963.xml.gz', '1890-113075.xml.gz', '2020-174745.xml.gz', '1986-005596.xml.gz', '1940-068202.xml.gz', '1891-091509.xml.gz', '1970-061442.xml.gz', '2021-184635.xml.gz', '2021-181736.xml.gz', '1938-112188.xml.gz', '2022-191726.xml.gz', '1986-091461.xml.gz', '1908-112155.xml.gz', '1952-002633.xml.gz', '2018-167528.xml.gz', '1941-068352.xml.gz', '2022-191721.xml.gz', '2018-170390.xml.gz', '1924-087778.xml.gz', '1974-010351.xml.gz', '1969-033216.xml.gz', '1959-091378.xml.gz', '1936-094599.xml.gz', '2022-191181.xml.gz', '2022-199273.xml.gz', '2023-206425.xml.gz', '1968-094528.xml.gz', '2022-191164.xml.gz', '1929-091562.xml.gz', '1935-091406.xml.gz', '2016-147823.xml.gz', '2020-172694.xml.gz', '1882-113024.xml.gz', '1935-091232.xml.gz', '2021-196423.xml.gz', '2017-148854.xml.gz', '1924-087766.xml.gz', 
'2020-168267.xml.gz', '1890-111829.xml.gz', '1891-113078.xml.gz', '1882-112986.xml.gz', '2022-191725.xml.gz', '1969-033208.xml.gz', '1970-018762.xml.gz', '1924-087777.xml.gz', '1985-084552.xml.gz', '2017-152860.xml.gz', '1923-091570.xml.gz', '2021-185819.xml.gz', '1891-113079.xml.gz', '2022-198226.xml.gz', '2020-177881.xml.gz', '1945-121832.xml.gz', '1953-121327.xml.gz', '1952-113085.xml.gz', '2022-191189.xml.gz', '2021-183848.xml.gz', '2023-207903.xml.gz', '2021-196419.xml.gz', '1957-121836.xml.gz', '2021-181737.xml.gz', '1970-061454.xml.gz', '1909-109621.xml.gz', '1891-091530.xml.gz', '2019-160294.xml.gz', '2022-191717.xml.gz', '1890-111948.xml.gz', '2022-191718.xml.gz', '2020-175374.xml.gz', '1879-121292.xml.gz', '2021-186450.xml.gz', '1923-087753.xml.gz', '2022-199619.xml.gz', '1959-094557.xml.gz', '1946-091485.xml.gz', '2022-195502.xml.gz', '2019-168261.xml.gz', '1927-094597.xml.gz', '2021-181730.xml.gz', '1879-121293.xml.gz', '2020-175766.xml.gz', '1881-112999.xml.gz', '1924-087807.xml.gz', '1891-113033.xml.gz', '2022-199051.xml.gz', '1895-091534.xml.gz', '1934-091575.xml.gz', '2022-198227.xml.gz', '2023-203134.xml.gz', '1966-033099.xml.gz', '2022-200938.xml.gz', '2019-168264.xml.gz', '2021-180997.xml.gz', '2021-191079.xml.gz', '1928-091572.xml.gz', '1973-068234.xml.gz', '2022-189875.xml.gz', '1935-087854.xml.gz', '2016-142141.xml.gz', '2022-189945.xml.gz', '2021-196418.xml.gz', '2022-193074.xml.gz', '2020-176579.xml.gz', '2022-191190.xml.gz', '1971-094532.xml.gz', '1927-068216.xml.gz', '2021-182867.xml.gz', '1924-087806.xml.gz', '1969-033213.xml.gz', '1895-121247.xml.gz', '1969-033211.xml.gz', '2022-189876.xml.gz', '2021-187046.xml.gz', '2021-196431.xml.gz', '1879-121809.xml.gz', '1909-087707.xml.gz', '1891-091471.xml.gz', '1969-092689.xml.gz', '1890-091504.xml.gz', '1957-091543.xml.gz', '1995-103203.xml.gz', '2023-203144.xml.gz', '1966-068241.xml.gz', '1970-061450.xml.gz', '2023-206359.xml.gz', '1890-091529.xml.gz', '1891-091516.xml.gz', 
'2022-191862.xml.gz', '2008-065101.xml.gz', '2022-191155.xml.gz', '1969-033220.xml.gz', '1976-094607.xml.gz', '1932-091250.xml.gz', '1964-087972.xml.gz', '2023-206426.xml.gz', '1952-091336.xml.gz', '1891-091597.xml.gz', '1905-111901.xml.gz', '2020-172700.xml.gz', '1952-088151.xml.gz', '1924-087804.xml.gz', '2021-196420.xml.gz', '2021-188778.xml.gz', '1946-087912.xml.gz', '1952-094522.xml.gz', '2022-191188.xml.gz', '1953-121361.xml.gz', '1969-092687.xml.gz', '1891-091512.xml.gz', '1928-091244.xml.gz', '1913-068213.xml.gz', '1924-087772.xml.gz', '1923-090650.xml.gz', '2022-191855.xml.gz', '2018-155580.xml.gz', '1971-068395.xml.gz', '1978-099973.xml.gz', '2020-172704.xml.gz', '1922-092796.xml.gz', '2022-191153.xml.gz', '1961-091576.xml.gz', '1926-091211.xml.gz', '1970-061364.xml.gz', '1973-104727.xml.gz', '1929-091574.xml.gz', '2023-206203.xml.gz', '1995-103205.xml.gz', '1909-109622.xml.gz', '1938-094603.xml.gz', '1932-091539.xml.gz', '1879-121387.xml.gz', '2023-206413.xml.gz', '1978-068205.xml.gz', '1890-091528.xml.gz', '1923-121200.xml.gz', '1891-091472.xml.gz', '2022-191152.xml.gz', '1952-113090.xml.gz', '2020-171440.xml.gz', '2019-160705.xml.gz', '2018-173275.xml.gz', '2021-190781.xml.gz', '2022-191166.xml.gz', '2018-170439.xml.gz', '2020-176201.xml.gz', '2022-191193.xml.gz', '2019-160757.xml.gz', '2022-192594.xml.gz', '2023-203145.xml.gz', '1882-112988.xml.gz', '1891-091445.xml.gz', '1978-121458.xml.gz', '1891-091631.xml.gz', '2021-183850.xml.gz', '1969-091577.xml.gz', '2019-172227.xml.gz', '2023-206163.xml.gz', '1952-091297.xml.gz', '1984-094526.xml.gz', '2022-191710.xml.gz', '1976-015781.xml.gz', '1986-059814.xml.gz', '2023-203139.xml.gz', '1879-121401.xml.gz', '1925-091571.xml.gz', '1890-111899.xml.gz', '2022-197336.xml.gz', '2021-181728.xml.gz', '1909-091535.xml.gz', '1969-092688.xml.gz', '1891-091343.xml.gz', '2016-134380.xml.gz', '2019-166024.xml.gz', '1940-112192.xml.gz', '2022-191194.xml.gz', '1891-113064.xml.gz', '1882-113025.xml.gz', 
'2004-103984.xml.gz', '2022-198429.xml.gz', '1924-087768.xml.gz', '2023-203138.xml.gz', '1938-121359.xml.gz', '2007-010646.xml.gz', '1996-088010.xml.gz', '1952-024802.xml.gz', '1926-091214.xml.gz', '2021-191080.xml.gz', '1935-091233.xml.gz', '2019-168259.xml.gz', '1891-091508.xml.gz', '2022-195742.xml.gz', '1892-091517.xml.gz', '2021-196428.xml.gz', '2021-196425.xml.gz', '1938-091287.xml.gz', '1895-091518.xml.gz', '1990-088001.xml.gz', '2019-161547.xml.gz', '1909-087703.xml.gz', '1891-113073.xml.gz', '1957-121785.xml.gz', '1969-033215.xml.gz', '1927-091240.xml.gz', '2023-207349.xml.gz', '2023-207902.xml.gz', '1983-094525.xml.gz', '2016-143137.xml.gz', '2021-183857.xml.gz', '2022-201614.xml.gz', '1909-087731.xml.gz', '1952-024799.xml.gz', '1986-059815.xml.gz', '2019-161338.xml.gz', '1909-091520.xml.gz', '1891-113076.xml.gz', '2020-176860.xml.gz', '1963-068390.xml.gz', '1928-068195.xml.gz', '1908-112164.xml.gz', '1960-121786.xml.gz', '1969-033219.xml.gz', '2018-153957.xml.gz', '1895-121241.xml.gz', '1952-113084.xml.gz', '1978-099972.xml.gz', '1904-103731.xml.gz', '2022-191162.xml.gz', '1938-094519.xml.gz', '1953-121362.xml.gz', '2020-171308.xml.gz', '2023-201422.xml.gz', '1952-068354.xml.gz', '2016-142126.xml.gz', '1894-122153.xml.gz', '1995-010545.xml.gz', '2021-184423.xml.gz', '2022-191720.xml.gz', '2017-151389.xml.gz', '1891-091507.xml.gz', '2021-196422.xml.gz', '2019-168026.xml.gz', '2020-177266.xml.gz', '2023-201562.xml.gz', '1879-122256.xml.gz', '2018-156632.xml.gz', '2022-201627.xml.gz', '2021-181734.xml.gz', '1916-087721.xml.gz', '1955-091542.xml.gz', '2020-172699.xml.gz', '1890-113082.xml.gz', '2021-181004.xml.gz', '2022-191456.xml.gz', '1891-091510.xml.gz', '2023-202585.xml.gz', '2022-201615.xml.gz', '2020-168268.xml.gz', '2022-191180.xml.gz', '2021-183161.xml.gz', '2022-191863.xml.gz', '2022-191187.xml.gz', '2022-201617.xml.gz', '2018-157784.xml.gz', '1938-121360.xml.gz', '1966-033098.xml.gz', '2022-191715.xml.gz', '1958-091402.xml.gz', 
'1936-068221.xml.gz', '1933-068219.xml.gz', '1922-094529.xml.gz', '1996-103230.xml.gz', '2022-199620.xml.gz', '1966-033097.xml.gz', '1959-091457.xml.gz', '1890-091502.xml.gz', '1894-109609.xml.gz', '1963-087945.xml.gz', '1952-091295.xml.gz', '2022-191185.xml.gz', '1952-094523.xml.gz', '1890-091505.xml.gz', '1935-091235.xml.gz', '1879-121811.xml.gz', '2016-142140.xml.gz', '2019-167509.xml.gz', '2022-198243.xml.gz', '1966-033096.xml.gz', '1879-121808.xml.gz', '2022-191859.xml.gz', '1957-121835.xml.gz', '2017-172309.xml.gz', '1935-091125.xml.gz', '2018-155266.xml.gz', '2022-191170.xml.gz', '1978-099967.xml.gz', '2022-191723.xml.gz', '2021-196436.xml.gz', '1976-015298.xml.gz', '1891-091513.xml.gz', '1931-087878.xml.gz', '1881-113011.xml.gz', '2021-196435.xml.gz', '1895-121244.xml.gz', '1882-113044.xml.gz', '2020-172697.xml.gz', '2006-025147.xml.gz', '1891-121889.xml.gz', '2020-177265.xml.gz', '1891-071422.xml.gz', '1909-068365.xml.gz', '2021-186746.xml.gz', '2020-175377.xml.gz', '1955-068368.xml.gz', '2022-191712.xml.gz', '1891-091531.xml.gz', '2023-206416.xml.gz', '1894-122154.xml.gz', '1908-112154.xml.gz', '1952-113042.xml.gz', '2022-191864.xml.gz', '2022-191215.xml.gz', '2022-191192.xml.gz', '1976-015782.xml.gz', '1973-018152.xml.gz', '2022-191856.xml.gz', '2023-206424.xml.gz', '1931-068218.xml.gz', '1891-113050.xml.gz', '2022-191719.xml.gz', '1969-033207.xml.gz', '1891-091159.xml.gz', '2019-166213.xml.gz', '2018-170089.xml.gz', '1969-092685.xml.gz', '1952-062035.xml.gz', '1958-094600.xml.gz', '1909-087666.xml.gz', '1962-091596.xml.gz', '2018-154057.xml.gz', '1986-005597.xml.gz', '2016-147062.xml.gz', '1909-068404.xml.gz', '2023-206415.xml.gz', '2020-174046.xml.gz', '1970-104899.xml.gz', '1909-091522.xml.gz', '2019-166704.xml.gz', '1982-091333.xml.gz', '1936-091541.xml.gz', '2021-196433.xml.gz', '2020-172698.xml.gz', '1882-113038.xml.gz', '2022-191709.xml.gz', '1931-088084.xml.gz', '2022-201723.xml.gz', '2022-191858.xml.gz', '1923-087761.xml.gz', 
'1952-112195.xml.gz', '1890-113083.xml.gz', '1944-068245.xml.gz', '2022-191727.xml.gz', '1882-113046.xml.gz', '1927-068194.xml.gz', '1970-068250.xml.gz', '2022-201626.xml.gz', '1996-000906.xml.gz', '2018-157294.xml.gz', '2021-196434.xml.gz', '1953-068356.xml.gz', '2022-196183.xml.gz', '1957-068207.xml.gz', '1976-015297.xml.gz', '1891-091515.xml.gz', '1928-087838.xml.gz', '2023-203135.xml.gz', '1975-010345.xml.gz', '1973-104728.xml.gz', '1969-033212.xml.gz', '2023-203894.xml.gz', '2023-203133.xml.gz', '1909-087673.xml.gz', '2016-146150.xml.gz', '1970-061363.xml.gz', '2021-196426.xml.gz', '2016-141578.xml.gz', '1927-068330.xml.gz', '2021-196417.xml.gz', '1970-061446.xml.gz', '2016-134379.xml.gz', '2020-172703.xml.gz', '2022-191165.xml.gz', '2016-134344.xml.gz', '1908-112143.xml.gz', '2021-187270.xml.gz', '1973-081811.xml.gz', '1977-068239.xml.gz', '1925-091337.xml.gz', '2018-167985.xml.gz', '1962-091594.xml.gz', '1970-061362.xml.gz', '1932-087853.xml.gz', '1895-121242.xml.gz', '1923-091569.xml.gz', '2021-181733.xml.gz', '1882-121827.xml.gz', '2021-181732.xml.gz', '1946-068204.xml.gz', '2023-204993.xml.gz', '2022-191168.xml.gz', '2022-191171.xml.gz', '2003-063628.xml.gz', '1909-087706.xml.gz', '2022-191716.xml.gz', '1959-091194.xml.gz', '2023-206421.xml.gz', '1929-091219.xml.gz', '2022-191169.xml.gz', '1954-094534.xml.gz', '1969-033206.xml.gz', '1978-012352.xml.gz', '1879-121424.xml.gz', '1909-087674.xml.gz', '2020-175378.xml.gz', '2023-206422.xml.gz', '1891-068242.xml.gz', '1894-122155.xml.gz', '2022-196184.xml.gz', '1882-113023.xml.gz', '2023-207905.xml.gz', '2021-181735.xml.gz', '2021-191078.xml.gz', '1969-033214.xml.gz', '2016-141616.xml.gz', '1952-113041.xml.gz', '1958-068217.xml.gz', '2021-188296.xml.gz', '1924-091536.xml.gz', '1958-068224.xml.gz', '1961-002130.xml.gz', '1970-018763.xml.gz', '1969-033218.xml.gz', '2020-176854.xml.gz', '2023-206419.xml.gz', '2018-157193.xml.gz', '2022-196735.xml.gz', '1909-087668.xml.gz', '2023-208716.xml.gz', 
'1955-094535.xml.gz', '2020-176796.xml.gz', '1895-121248.xml.gz', '2022-203539.xml.gz', '1932-121315.xml.gz', '1973-060207.xml.gz', '1890-111915.xml.gz', '1938-094533.xml.gz', '2022-192704.xml.gz', '2022-195743.xml.gz', '1969-094602.xml.gz', '2016-142142.xml.gz', '1957-068389.xml.gz', '2016-142102.xml.gz', '1891-091470.xml.gz', '2021-185182.xml.gz', '2016-142111.xml.gz', '1890-111954.xml.gz', '1937-068201.xml.gz', '1974-068394.xml.gz', '2022-191724.xml.gz', '2020-169780.xml.gz', '1895-121232.xml.gz', '2023-203146.xml.gz', '2022-191857.xml.gz', '2006-025144.xml.gz', '1970-104897.xml.gz', '1959-094558.xml.gz', '2017-153161.xml.gz', '1905-112246.xml.gz', '1969-092686.xml.gz', '2020-179730.xml.gz', '1952-024801.xml.gz', '1909-091521.xml.gz', '1934-091395.xml.gz', '1990-190466.xml.gz', '2022-191860.xml.gz', '2022-191183.xml.gz', '1952-094521.xml.gz', '1940-068351.xml.gz', '1940-088141.xml.gz', '1984-081833.xml.gz', '2003-002175.xml.gz', '2022-191186.xml.gz', '2016-134398.xml.gz', '1890-111957.xml.gz', '2023-203132.xml.gz', '1926-087792.xml.gz', '2020-177353.xml.gz', '2018-170437.xml.gz', '1963-068392.xml.gz', '2021-191081.xml.gz', '2022-191182.xml.gz', '2022-192972.xml.gz', '2023-207904.xml.gz', '1890-111921.xml.gz', '1947-092550.xml.gz', '1960-094540.xml.gz', '2003-067109.xml.gz', '1970-104898.xml.gz', '1973-104723.xml.gz', '2018-156633.xml.gz', '1890-111944.xml.gz', '2016-142101.xml.gz', '2016-141585.xml.gz', '1970-018757.xml.gz', '1882-113039.xml.gz', '2023-200409.xml.gz', '2021-185164.xml.gz', '2020-172702.xml.gz', '2022-191713.xml.gz', '2020-177041.xml.gz', '2021-178632.xml.gz', '1926-068230.xml.gz', '2018-156634.xml.gz', '1882-121831.xml.gz', '2022-194770.xml.gz', '1933-091443.xml.gz', '1909-113081.xml.gz', '2021-181738.xml.gz', '2016-134345.xml.gz', '2023-206423.xml.gz', '1961-002129.xml.gz', '1929-091246.xml.gz', '1890-111952.xml.gz', '1952-113071.xml.gz', '1984-081834.xml.gz', '1931-088092.xml.gz', '1891-091511.xml.gz', '2020-170885.xml.gz', 
'1995-010546.xml.gz', '1891-103136.xml.gz', '2022-191195.xml.gz', '1891-091506.xml.gz', '1976-015779.xml.gz', '1908-112163.xml.gz', '1961-094591.xml.gz', '2020-172608.xml.gz', '1882-113045.xml.gz', '1952-091338.xml.gz', '1935-092743.xml.gz', '1970-061447.xml.gz', '2006-103884.xml.gz', '1924-087776.xml.gz', '2019-162035.xml.gz', '1909-087705.xml.gz', '1935-091407.xml.gz', '1978-099971.xml.gz', '1933-094598.xml.gz', '2022-191714.xml.gz', '1979-094542.xml.gz', '1935-091234.xml.gz', '2021-196429.xml.gz', '2016-142251.xml.gz', '1923-121199.xml.gz', '2023-200743.xml.gz', '2023-203137.xml.gz', '1957-121837.xml.gz', '1936-103735.xml.gz', '1890-111941.xml.gz', '2016-142143.xml.gz', '1970-061444.xml.gz', '2020-171309.xml.gz', '2021-196432.xml.gz', '1890-112969.xml.gz', '1954-068387.xml.gz', '1938-068244.xml.gz', '1969-033217.xml.gz', '1976-014856.xml.gz', '1935-119451.xml.gz', '1919-068191.xml.gz', '1909-087667.xml.gz', '1952-024803.xml.gz', '2021-196424.xml.gz', '1934-091540.xml.gz', '1970-104900.xml.gz', '1961-002131.xml.gz', '1973-060208.xml.gz', '1909-087732.xml.gz', '2020-172701.xml.gz', '2018-157386.xml.gz', '2023-206420.xml.gz', '2023-206417.xml.gz', '2016-143136.xml.gz', '2019-168263.xml.gz', '2020-172696.xml.gz', '2023-203126.xml.gz', '2023-206879.xml.gz', '1922-087727.xml.gz', '2021-181729.xml.gz', '2020-174025.xml.gz', '1909-087704.xml.gz', '1996-081802.xml.gz', '2023-203140.xml.gz', '2018-170438.xml.gz', '2021-187101.xml.gz', '1909-087702.xml.gz', '1931-088093.xml.gz', '1929-087846.xml.gz', '1970-061445.xml.gz', '2023-203142.xml.gz', '2017-152011.xml.gz', '1934-090611.xml.gz', '1999-002658.xml.gz', '2020-175379.xml.gz', '1996-021717.xml.gz', '1915-103737.xml.gz', '1905-111902.xml.gz', '2022-201722.xml.gz', '1966-094604.xml.gz', '1926-091212.xml.gz', '1891-091533.xml.gz', '2022-191861.xml.gz', '2023-206414.xml.gz', '2022-191711.xml.gz', '1940-112191.xml.gz', '2023-205273.xml.gz', '1891-113077.xml.gz', '1957-068367.xml.gz', '1929-091538.xml.gz', 
'1936-122224.xml.gz', '2020-175467.xml.gz', '1891-113049.xml.gz', '2022-193726.xml.gz', '2021-188201.xml.gz', '1890-111949.xml.gz', '1924-091537.xml.gz', '1879-121812.xml.gz', '1969-033209.xml.gz', '1879-119364.xml.gz', '2022-191167.xml.gz', '2019-166707.xml.gz', '1926-087820.xml.gz', '1879-121810.xml.gz', '1987-119382.xml.gz', '1891-076561.xml.gz']
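The cached files are gzip-compressed XML documents (hence the .xml.gz extension), so they can be inspected with the standard library. A minimal sketch, assuming each file holds a single XML document; read_cached_xml is a hypothetical helper, not part of pydov.

```python
import gzip
import xml.etree.ElementTree as ET


def read_cached_xml(path):
    """Decompress a cached .xml.gz file and parse it as XML.

    Assumes the file is a gzip-compressed XML document.
    """
    with gzip.open(path, 'rb') as f:
        data = f.read()
    # ElementTree accepts bytes and honours the XML encoding declaration
    return ET.fromstring(data)


# Hypothetical usage on one of the files listed above:
# root = read_cached_xml(os.path.join(pydov.cache.cachedir, 'boring', '2023-206524.xml.gz'))
# print(root.tag)
```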
Rerun the previous request and time it again:
[6]:
%time df = boring.search(location=Within(Box(150145, 205030, 155150, 206935)))
[000/001] .
[000/255] cccccccccccccccccccccccccccccccccccccccccccccccccc
[050/255] cccccccccccccccccccccccccccccccccccccccccccccccccc
[100/255] cccccccccccccccccccccccccccccccccccccccccccccccccc
[150/255] cccccccccccccccccccccccccccccccccccccccccccccccccc
[200/255] cccccccccccccccccccccccccccccccccccccccccccccccccc
[250/255] ccccc
CPU times: user 395 ms, sys: 81.3 ms, total: 476 ms
Wall time: 708 ms
Using the cache reduced the wall time from 30.3 s to 0.7 s in this example, a speedup of roughly a factor 40. The absolute time saved grows with the number of permalinks queried, since downloading a file from the DOV services takes much longer than reading it from the local disk.
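The speedup follows directly from the wall times reported by %time for the two identical queries. Network conditions vary, so rerunning the notebook will give different numbers.

```python
# Wall-clock times reported by %time for the two identical queries above
wall_download = 30.3  # seconds, XML files downloaded from DOV
wall_cached = 0.708   # seconds, XML files read from the local cache

speedup = wall_download / wall_cached
time_saved = wall_download - wall_cached
print(f'speedup: {speedup:.1f}x, time saved: {time_saved:.1f} s')
# prints: speedup: 42.8x, time saved: 29.6 s
```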
Disabling the cache
You can temporarily disable the caching mechanism. This disables both saving newly downloaded data in the cache and reusing data already in the cache. Caching stays disabled until a cache object is assigned to pydov.cache again. Disabling the cache does not delete existing data in the cache.
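Conceptually, disabling the cache skips both the lookup and the store step. A sketch of that logic, assuming a dict in place of the cache directory; fetch and download are hypothetical names and this is not pydov's internal code.

```python
def fetch(url, cache, download):
    """Sketch of a cache-aware fetch (hypothetical, not pydov code).

    cache: a dict-like store, or None when caching is disabled.
    download: a callable that retrieves url from the remote service.
    """
    if cache is None:
        # Caching disabled: always download, never read from or write to the cache
        return download(url)
    if url in cache:
        # Cache hit: reuse the stored copy
        return cache[url]
    # Cache miss: download and store the result for later reuse
    cache[url] = download(url)
    return cache[url]
```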
[7]:
# list number of files
print('number of files: ', len(os.listdir(os.path.join(cachedir, 'boring'))))
number of files: 736
[8]:
# disable caching
cache_orig = pydov.cache
pydov.cache = None
# new query
df = boring.search(location=Within(Box(151000, 205930, 153000, 206000)))
print(df.head())
[000/001] .
[000/002] ..
pkey_boring boornummer x \
0 https://www.dov.vlaanderen.be/data/boring/1895... kb15d43w-B47 151600.0
1 https://www.dov.vlaanderen.be/data/boring/1984... kb15d43w-B403 151041.0
y mv_mtaw start_boring_mtaw gemeente diepte_boring_van \
0 205998.0 15.00 15.00 Antwerpen 0.0
1 205933.0 21.07 21.07 Antwerpen 0.0
diepte_boring_tot datum_aanvang uitvoerder \
0 3.3 1895-01-04 onbekend
1 7.0 1984-09-26 Universiteit Gent - Geologisch Instituut
boorgatmeting diepte_methode_van diepte_methode_tot boormethode
0 False 0.0 3.3 onbekend
1 False 0.0 7.0 droge boring
[9]:
# list number of files
print('number of files: ', len(os.listdir(os.path.join(cachedir, 'boring'))))
number of files: 736
No new files were added to the cache while it was disabled.
Caching is disabled by setting pydov.cache to None. To enable caching again, assign a cache object to pydov.cache: either the original object saved earlier, as done below, or a newly instantiated one.
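When the cache is disabled for a single query only, it is easy to forget to restore it. A minimal sketch of a context manager that sets the cache to None and always restores the original object afterwards; caching_disabled is a hypothetical helper, not part of pydov, and it works on any object with a cache attribute, such as the pydov module.

```python
from contextlib import contextmanager


@contextmanager
def caching_disabled(module):
    """Temporarily set module.cache to None and restore the original
    cache object afterwards, even if the body raises an exception."""
    original = module.cache
    module.cache = None
    try:
        yield
    finally:
        module.cache = original


# Hypothetical usage:
# with caching_disabled(pydov):
#     df = boring.search(location=Within(Box(151000, 205930, 153000, 206000)))
```

With this helper, the original cache is restored even if the query in the with block raises an error.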
[10]:
pydov.cache = cache_orig
Changing the location of cached data
By default, pydov stores the cache in a temporary directory provided by the user’s operating system. On Windows, the cache is usually located in: C:\Users\username\AppData\Local\Temp\pydov\
If you want the cached XML files to be saved in another location, you can define your own cache for the current runtime. Note that this does not move previously saved data, and no lookup in the old cache directory is performed after the location has changed. Besides controlling the cache's location, this also allows using a separate cache for different scripts or projects.
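A minimal sketch for keeping one cache per project inside the operating system's temporary directory; project_cachedir and the pydov_<project> naming scheme are hypothetical choices for illustration. The resulting path can be passed as the cachedir argument of GzipTextFileCache, as in the next cell.

```python
import os
import tempfile


def project_cachedir(project):
    """Return a per-project cache directory inside the OS temp dir,
    e.g. /tmp/pydov_myproject on Linux (hypothetical naming scheme)."""
    return os.path.join(tempfile.gettempdir(), 'pydov_' + project)


# Hypothetical usage:
# pydov.cache = pydov.util.caching.GzipTextFileCache(
#     cachedir=project_cachedir('myproject'))
```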
[11]:
import pydov.util.caching
pydov.cache = pydov.util.caching.GzipTextFileCache(
cachedir=r'C:\temp\pydov'
)
[12]:
cachedir = pydov.cache.cachedir
print(cachedir)
C:\temp\pydov
[13]:
# for the sake of the example, change dir location back
pydov.cache = cache_orig
cachedir = pydov.cache.cachedir
Changing the maximum age of cached data
If you work with rapidly changing data or want to control when cached data is renewed, you can change the maximum age for cached data to be considered valid during the current runtime. The maximum age is a datetime.timedelta, so it can be expressed in weeks, days, seconds or any other unit that timedelta supports. If a cached version exists and is younger than the maximum age, it is used instead of downloading the data again from the DOV services. If no cached version exists, or it is older than the maximum age, the data is downloaded again and saved in the cache. Note that data older than the maximum age is not automatically deleted from the cache.
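The rule described above boils down to comparing a cached file's age with the maximum age. The sketch below illustrates that rule with the standard library only; is_fresh is a hypothetical helper, and using the file's modification time as its age is an assumption about how a file-based cache can track age, not a description of pydov's internals.

```python
import datetime
import os


def is_fresh(path, max_age, now=None):
    """Return True if the file at path was modified less than max_age ago.

    max_age is a datetime.timedelta; now defaults to the current local time.
    """
    if now is None:
        now = datetime.datetime.now()
    modified = datetime.datetime.fromtimestamp(os.path.getmtime(path))
    return now - modified < max_age


# Hypothetical usage: is a cached file younger than two weeks?
# is_fresh(os.path.join(cachedir, 'boring', '2023-206524.xml.gz'),
#          datetime.timedelta(weeks=2))
```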
[14]:
import pydov.util.caching
import datetime
pydov.cache = pydov.util.caching.GzipTextFileCache(
max_age=datetime.timedelta(seconds=1)
)
print(pydov.cache.max_age)
0:00:01
[15]:
from time import ctime
# name and modification time of the first file in the 'boring' cache
first_file = os.listdir(os.path.join(cachedir, 'boring'))[0]
print(first_file)
ctime(os.path.getmtime(os.path.join(cachedir, 'boring', first_file)))
2023-206524.xml.gz
[15]:
'Tue Oct 3 14:28:24 2023'
[16]:
# rerun previous query
%time df = boring.search(location=Within(Box(150145, 205030, 155150, 206935)))
[000/001] .
[000/255] ..................................................
[050/255] ..................................................
[100/255] ..................................................
[150/255] ..................................................
[200/255] ..................................................
[250/255] .....
CPU times: user 2.54 s, sys: 348 ms, total: 2.89 s
Wall time: 35.5 s
[17]:
from time import ctime
# name and modification time of the first file in the 'boring' cache
first_file = os.listdir(os.path.join(cachedir, 'boring'))[0]
print(first_file)
ctime(os.path.getmtime(os.path.join(cachedir, 'boring', first_file)))
2023-206524.xml.gz
[17]:
'Tue Oct 3 14:28:24 2023'
Cleaning the cache
Since pydov uses a temporary directory provided by the operating system by default, it relies on the operating system to clean that folder when it deems necessary.
To clean the cache yourself, removing all records older than the maximum age, call the clean() method:
[18]:
from time import sleep
[19]:
print('number of files before clean: ', len(os.listdir(os.path.join(cachedir, 'boring'))))
sleep(2)  # wait longer than the maximum age of 1 second set above
pydov.cache.clean()
print('number of files after clean: ', len(os.listdir(os.path.join(cachedir, 'boring'))))
number of files before clean: 736
number of files after clean: 0
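The effect of clean() can be approximated with the standard library, which may help when inspecting or tidying a cache directory outside of pydov. This is a sketch, not pydov's implementation: clean_cache is a hypothetical helper that deletes files whose modification time is older than max_age and leaves the directories in place.

```python
import datetime
import os


def clean_cache(cachedir, max_age):
    """Delete every file under cachedir whose modification time is older
    than max_age (a datetime.timedelta). Returns the number of files removed."""
    cutoff = datetime.datetime.now() - max_age
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(cachedir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            modified = datetime.datetime.fromtimestamp(os.path.getmtime(path))
            if modified < cutoff:
                os.remove(path)
                removed += 1
    return removed
```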
Should you want to remove the pydov cache from code yourself, you can do so as illustrated below. Note that this will erase the entire cache, not only the records older than the maximum age:
[20]:
pydov.cache.remove()
# check existence of the cache directory:
print(os.path.exists(os.path.join(cachedir, 'boring')))
False
Disabling stale responses on error
By default, pydov returns stale data (i.e. XML documents still present in the cache, but no longer considered valid) when it fails to download a fresh copy from the DOV webservices. We believe this behaviour benefits most users, as stale data is still better than no data at all.
If your application cannot afford stale data, you can switch the default behaviour by issuing:
[21]:
pydov.cache.stale_on_error = False
This will cause pydov not to return stale data and instead set the XML fields to NaN, as if the stale data wasn’t available.
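The decision described above can be summarised in a few lines. The sketch below is hypothetical (get_with_fallback, download and is_fresh are illustrative names, and a dict stands in for the cache directory); it assumes network failures surface as OSError, of which ConnectionError is a subclass.

```python
def get_with_fallback(url, cache, download, is_fresh, stale_on_error=True):
    """Sketch of stale-on-error handling (hypothetical, not pydov code).

    cache: dict mapping url -> cached document.
    is_fresh: callable(url) -> bool, True if the cached copy is within max_age.
    Returns the document, or None when no usable data is available.
    """
    if url in cache and is_fresh(url):
        # Valid cached copy: no download needed
        return cache[url]
    try:
        cache[url] = download(url)
        return cache[url]
    except OSError:
        if stale_on_error and url in cache:
            # Download failed: fall back to the stale cached copy
            return cache[url]
        # No usable data; pydov sets the XML fields to NaN in this case
        return None
```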
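Custom caching
Instead of the default GzipTextFileCache, which stores the XML files gzip-compressed, you can use a PlainTextFileCache, which stores them as uncompressed text: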
[22]:
import pydov.util.caching
pydov.cache = pydov.util.caching.PlainTextFileCache()
Implementing custom caching¶
Should you want to implement your own caching mechanism, you can do so by subclassing pydov.util.caching.AbstractCache and implementing its abstract methods get, clean and remove. In doing so, you can use the available methods _get_remote to request data from the DOV webservices and _emit_cache_hit to notify hooks that a file has been retrieved from the cache.
Note that the get method will be called from multiple threads simultaneously, so implementations must be thread-safe or use locking.
A (naive) implementation for an in-memory cache would be something like:
[23]:
from pydov.util.caching import AbstractCache


class MemoryCache(AbstractCache):
    """Naive in-memory cache: not thread-safe, contents are lost on exit."""

    def __init__(self):
        self.cache = {}

    def get(self, url):
        if url not in self.cache:
            # cache miss: download the XML from the DOV webservices
            self.cache[url] = self._get_remote(url)
        else:
            # cache hit: notify the registered hooks
            self._emit_cache_hit(url)
        return self.cache[url]

    def clean(self):
        self.cache = {}

    def remove(self):
        self.cache = {}


pydov.cache = MemoryCache()
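Because get may be called from several threads at once, the naive MemoryCache is not safe to use as is. A minimal thread-safe variant guards the dictionary with a threading.Lock. To keep the sketch runnable without pydov, it does not subclass AbstractCache and takes a fetch callable in place of self._get_remote (and skips _emit_cache_hit); in a real subclass you would call those two methods instead. LockedMemoryCache and fetch are hypothetical names.

```python
import threading


class LockedMemoryCache:
    """Thread-safe in-memory cache sketch with get/clean/remove methods.

    fetch stands in for AbstractCache._get_remote so the sketch runs
    without pydov.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._lock = threading.Lock()
        self.cache = {}

    def get(self, url):
        with self._lock:
            if url in self.cache:
                return self.cache[url]
        # Download outside the lock so other threads can fetch other urls
        data = self._fetch(url)
        with self._lock:
            # If another thread stored this url meanwhile, keep its copy
            return self.cache.setdefault(url, data)

    def clean(self):
        with self._lock:
            self.cache = {}

    def remove(self):
        with self._lock:
            self.cache = {}
```

The download happens outside the lock, so different URLs can still be fetched in parallel; the trade-off is that two threads requesting the same URL at the same moment may both download it, in which case setdefault keeps the first stored copy.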