SurvStat can be queried for either counts or incidence. By querying both
metrics across the whole range of disease notifications for a given year,
we can infer the stratified population size that SurvStat uses
to calculate its incidence. This population is then modelled with a local
polynomial over time, which lets us fill in weekly population denominators.
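The idea can be sketched in a few lines of base R. This is a minimal illustration and not the package's implementation: the yearly numbers are made up, incidence is assumed to be expressed as cases per 100,000 population, and `stats::loess()` stands in for whichever local polynomial smoother the package actually uses.

```r
# Hypothetical yearly totals for one stratum (made-up numbers, not SurvStat output).
yearly <- data.frame(
  year      = 2018:2024,
  count     = c(8290, 16620, 8316, 4162, 8436, 16940, 8358),
  incidence = c(10, 20, 10, 5, 10, 20, 10)  # assumed unit: cases per 100,000
)

# If incidence = count / population * 1e5, the denominator can be recovered.
yearly$population <- yearly$count / yearly$incidence * 1e5

# Place each yearly estimate at mid-year and fit a local polynomial over time.
# span and degree are illustrative choices for this tiny data set.
yearly$t <- yearly$year + 0.5
fit <- stats::loess(population ~ t, data = yearly, span = 1, degree = 1)

# Predict on a weekly grid inside the observed time range
# (loess does not extrapolate beyond the data by default).
weekly <- data.frame(t = seq(min(yearly$t), max(yearly$t), by = 1 / 52))
weekly$population <- stats::predict(fit, newdata = weekly)

head(weekly)
```

The weekly `population` column produced this way plays the role of the denominator that `fit_population()` attaches to a count dataframe.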
Usage
fit_population(count_df, .progress = TRUE)
infer_population(
age_group = NULL,
geography = NULL,
years = NULL,
.progress = TRUE
)
Arguments
- count_df
a dataframe from the output of get_timeseries() or get_snapshot()
- .progress
by default a progress bar is shown, which may be important if many downloads are needed to fulfil the request. It can be disabled by setting this to FALSE.
- age_group
(optional) the age group of interest as a SurvStat key; see rsurvstat::age_groups for a list of valid options.
- geography
(optional) one of "state", "nuts", or "county" to define the resolution of the query. Does not accept an sf map or a subset of one (unlike get_timeseries()).
- years
(optional) a vector of years to limit the response to. This may be useful to limit the size of returned pages in the event the SurvStat service hits a data transfer limit.
Value
fit_population(): the count_df dataframe with an additional population column
infer_population(): a dataframe with geography, age grouping, year and population columns
Examples
# \donttest{
# snapshot:
get_snapshot(
disease = diseases$`COVID-19`,
geography = "state",
season = 2024
) %>%
fit_population() %>%
dplyr::glimpse()
#> Rows: 16
#> Columns: 8
#> $ count <dbl> 35131, 55904, 11793, 14524, 2034, 7622, 22942, 8962, 2856…
#> $ geo_name <chr> "Baden-Württemberg", "Bayern", "Berlin", "Brandenburg", "…
#> $ geo_code <chr> "[DeutschlandNodes].[Kreise71Web].[FedStateKey71].&[08]",…
#> $ year <dbl> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 202…
#> $ start_week <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
#> $ disease_name <chr> "COVID-19", "COVID-19", "COVID-19", "COVID-19", "COVID-19…
#> $ disease_code <chr> "[KategorieNz].[Krankheit DE].&[COVID-19]", "[KategorieNz…
#> $ population <dbl> 11245936.7, 13248882.4, 3685260.2, 2556744.0, 704877.2, 1…
# timeseries
# A weekly population estimate is inferred from the yearly data:
get_timeseries(
diseases$`COVID-19`,
measure = "Count",
age_group = age_groups$children_coarse
) %>%
fit_population() %>%
dplyr::glimpse()
#> Rows: 3,594
#> Columns: 9
#> Groups: age_name, age_code, disease_name, disease_code, age_low, age_high [10]
#> $ age_name <chr> "0–14", "0–14", "0–14", "0–14", "0–14", "0–14", "0–14", "…
#> $ age_code <chr> "[AlterPerson80].[AgeGroupName3].&[A00..14]", "[AlterPers…
#> $ disease_name <chr> "COVID-19", "COVID-19", "COVID-19", "COVID-19", "COVID-19…
#> $ disease_code <chr> "[KategorieNz].[Krankheit DE].&[COVID-19]", "[KategorieNz…
#> $ age_low <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ age_high <dbl> 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 1…
#> $ date <date> 2020-02-03, 2020-02-10, 2020-02-17, 2020-02-24, 2020-03-…
#> $ count <dbl> 1, 2, 1, 4, 39, 197, 580, 910, 1018, 813, 575, 531, 401, …
#> $ population <dbl> 11458705, 11459860, 11460989, 11462092, 11463170, 1146422…
# }
# \donttest{
infer_population(years = 2020:2025) %>% dplyr::glimpse()
#> Rows: 6
#> Columns: 2
#> $ population <dbl> 83576914, 83577199, 84669145, 84358845, 83237155, 83155010
#> $ year <int> 2025, 2024, 2023, 2022, 2021, 2020
# }