Retrieve time series data from the SurvStat web service.
Source: R/survstat-loader.R
This function gets a weekly time series of disease count or incidence data
from the Robert Koch Institute SurvStat web service. The time series can be
stratified by any combination of age, geography, disease, and disease subtype.
Queries to SurvStat are cached and paged, but multidimensional extracts can
still require a large amount of downloading.
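Here is a rough illustration of how quickly multidimensional extracts grow. In a fully crossed stratification, the number of separate time series is the product of the number of levels in each dimension. The sketch below reuses the group counts printed in the examples on this page: 11 age groups, 16 states, and 80 diseases in 2024. It assumes every combination is requested. How many web requests this turns into depends on the package's caching and paging, which is not modelled here.

```r
# Number of levels in each stratifying dimension, taken from the group
# counts printed in the examples on this page.
levels_per_dimension <- c(age = 11, state = 16, disease = 80)

# A fully crossed extract holds one weekly time series per combination
# (assumption: every combination is requested).
n_series <- prod(levels_per_dimension)
n_series
#> [1] 14080
```

Each of these series covers every requested week. Adding a dimension therefore multiplies the size of the extract rather than adding to it.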
Arguments
- disease
the disease of interest as a SurvStat key; see rsurvstat::diseases for a current list. This is technically optional: if omitted, counts for all diseases are returned. Keys are the same as the options in the SurvStat user interface. The IfSG and state variants of diseases are counts reported either directly to the Robert Koch Institute or indirectly via state departments.
- measure
one of "Count" (default) or "Incidence" (per 100,000 per week or per year, depending on the context).
- ...
not used, must be empty.
- age_group
(optional) the age group of interest as a SurvStat key; see rsurvstat::age_groups for a list of valid options.
- age_range
(optional) a length-2 vector giving the minimum and maximum ages to consider.
- disease_subtype
if TRUE, the returned count is broken down by disease or pathogen subtype (assuming disease was provided).
- years
(optional) a vector of years to restrict the response to. This can help keep returned pages small if the SurvStat service hits a data transfer limit.
- geography
(optional) a geographical breakdown. This can be given as a character string, which must be one of "state", "nuts", or "county". These select the 16-region FedStateKey71Map, the 38-region NutsKey71Map, or the 411-region CountyKey71Map data, respectively. Alternatively, it can be given as an sf data frame that subsets one of these maps, in which case only that subset of regions is returned.
- trim_zeros
removes zero counts from the ends of the time series. One of "both" (from the start and end), "leading" (from the start only; the default), or "none".
- .progress
by default a progress bar is shown, which can be useful if many downloads are needed to fulfil the request. Set this to FALSE to disable it.
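The trim_zeros options can be illustrated on a single weekly series of counts. The sketch below uses a hypothetical helper, trim_zeros_sketch(), written only to illustrate the documented behaviour; it is not the package's implementation. It rests on two assumptions this page does not confirm:

- trimming is applied to each time series independently;
- an all-zero series ends up empty whenever trimming is enabled.

```r
# Hypothetical helper illustrating the documented trim_zeros options on a
# single vector of weekly counts. Not the package's implementation.
trim_zeros_sketch <- function(x, trim = c("leading", "both", "none")) {
  trim <- match.arg(trim)
  nonzero <- which(x != 0)
  # Assumption: an all-zero series is emptied when any trimming is applied.
  if (length(nonzero) == 0) {
    return(if (trim == "none") x else x[0])
  }
  # Start at the first non-zero week unless trimming is switched off.
  first <- if (trim %in% c("leading", "both")) min(nonzero) else 1L
  # Only "both" also removes trailing zeros.
  last <- if (trim == "both") max(nonzero) else length(x)
  x[first:last]
}

counts <- c(0, 0, 3, 5, 0, 2, 0, 0)
trim_zeros_sketch(counts)           # "leading" (the default)
#> [1] 3 5 0 2 0 0
trim_zeros_sketch(counts, "both")
#> [1] 3 5 0 2
trim_zeros_sketch(counts, "none")
#> [1] 0 0 3 5 0 2 0 0
```

Internal zeros (weeks with no reports between non-zero weeks) are kept under every option, since only the ends of the series are trimmed.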
Value
a data frame with at least a weekly date column and either a count or an
incidence column. It will most likely also have disease_name and
disease_code columns, plus some of age_name, age_code, age_low,
age_high, geo_code, geo_name, disease_subtype_code and
disease_subtype_name, depending on the options. The data frame is grouped
so that each group contains a single time series.
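Because the result is grouped with one time series per group, dplyr verbs applied to it operate on each series separately. The sketch below does not use a real download. Instead it builds a small made-up data frame with only a few of the documented columns (age_name, date, count) and computes a running total within each age stratum. The values and age labels are hypothetical.

```r
library(dplyr)

# Toy data shaped like the grouped output described above: two age strata,
# three weeks each. Values are made up for illustration only.
series <- tibble(
  age_name = rep(c("0-14", "15+"), each = 3),
  date = rep(as.Date(c("2024-01-01", "2024-01-08", "2024-01-15")), times = 2),
  count = c(1, 4, 2, 10, 0, 5)
) %>%
  group_by(age_name)

# Each group holds exactly one time series, so grouped verbs act per series:
# sort by date within each group, then accumulate counts within each group.
series <- series %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(cumulative = cumsum(count))

series$cumulative
#> [1]  1  5  7 10 10 15
```

With real output, the same pattern applies whatever the grouping columns are, because the grouping already identifies each series.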
Examples
# \donttest{
# age stratified
get_timeseries(
diseases$`COVID-19`,
measure = "Count",
age_group = age_groups$children_coarse
) %>% dplyr::glimpse()
#> Rows: 3,948
#> Columns: 8
#> Groups: age_name, age_code, disease_name, disease_code, age_low, age_high [11]
#> $ age_name <chr> "0–14", "0–14", "0–14", "0–14", "0–14", "0–14", "0–14", "…
#> $ age_code <chr> "[AlterPerson80].[AgeGroupName3].&[A00..14]", "[AlterPers…
#> $ count <dbl> 1, 2, 1, 4, 39, 197, 580, 910, 1018, 813, 575, 531, 401, …
#> $ date <date> 2020-02-03, 2020-02-10, 2020-02-17, 2020-02-24, 2020-03-…
#> $ disease_name <chr> "COVID-19", "COVID-19", "COVID-19", "COVID-19", "COVID-19…
#> $ disease_code <chr> "[KategorieNz].[Krankheit DE].&[COVID-19]", "[KategorieNz…
#> $ age_low <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ age_high <dbl> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 1…
# geographic
get_timeseries(
diseases$`COVID-19`,
measure = "Count",
geography = "state"
) %>% dplyr::glimpse()
#> Rows: 5,715
#> Columns: 6
#> Groups: geo_name, geo_code, disease_name, disease_code [16]
#> $ geo_name <chr> "Baden-Württemberg", "Baden-Württemberg", "Baden-Württemb…
#> $ geo_code <chr> "[DeutschlandNodes].[Kreise71Web].[FedStateKey71].&[08]",…
#> $ count <dbl> 21, 213, 1337, 4467, 7238, 7523, 5305, 3210, 2438, 1229, …
#> $ date <date> 2020-02-24, 2020-03-02, 2020-03-09, 2020-03-16, 2020-03-…
#> $ disease_name <chr> "COVID-19", "COVID-19", "COVID-19", "COVID-19", "COVID-19…
#> $ disease_code <chr> "[KategorieNz].[Krankheit DE].&[COVID-19]", "[KategorieNz…
# disease stratified, subset of years:
get_timeseries(
measure = "Count",
years = 2024
) %>% dplyr::glimpse()
#> Rows: 3,861
#> Columns: 5
#> Groups: disease_name, disease_code [80]
#> $ disease_name <chr> "Acinetobacter", "Acinetobacter", "Acinetobacter", "Acine…
#> $ disease_code <chr> "[PathogenOut].[KategorieNz].[Krankheit DE].&[Acinetobact…
#> $ count <dbl> 16, 22, 26, 16, 14, 14, 16, 9, 14, 11, 18, 9, 13, 13, 14,…
#> $ date <date> 2024-01-08, 2024-01-15, 2024-01-22, 2024-01-29, 2024-02-…
#> $ year <dbl> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 202…
# }