Summarise data from a line list to a time-series of counts.
Source: R/time-summarise.R
time_summarise.Rd
This is principally designed to take a record of single events and produce a summary time-series count of events by group, class and date. The default behaviour is to guess the cadence of the input data and summarise the event line list to a (set of) regular time-series counts for use in incidence and growth rate estimates.
Usage
time_summarise(
  df = i_dated,
  unit,
  anchor = "start",
  rectangular = FALSE,
  ...,
  .fill = list(count = 0)
)
Arguments
- df
a line list of data you want to summarise, optionally grouped. If this is grouped then each group is treated independently. The remaining columns must contain a date column and may contain a class column. If a count column is present the counts will be summed, otherwise each individual row will be counted as a single event (as a line list).
- unit
a period e.g. "1 week"
- anchor
a date, "start", "end", or a weekday name (e.g. "mon"). The anchor always falls on the start of one of the time periods the data is cut into.
- rectangular
should the resulting time series be the same length for all groups? Only set this if you can be sure that your data is complete for all subgroups, because missing data will be treated as zero counts. This matters when leading or trailing missing data in one subgroup is due to a reporting delay in that subgroup, in which case a rectangular time series will erroneously fill in zero counts for the missing data.
- ...
a spec for a dplyr::summarise(...) - optional, and if not provided a count = dplyr::n() or a count = sum(count) is performed.
- .fill
a list, similar to the fill argument of tidyr::complete, of values to fill variables with
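The effect of a rectangular result combined with a zero fill can be illustrated with a minimal sketch. It uses tidyr::complete directly rather than time_summarise, and the `counts` data and its `group` column are hypothetical, so treat it as an illustration of the idea and not as the function's implementation.

```r
library(dplyr)
library(tidyr)

# Hypothetical daily counts for two groups; group "b" has no row for the last day,
# for example because of a reporting delay.
counts <- tibble(
  group = c("a", "a", "a", "b", "b"),
  date = as.Date("2024-01-01") + c(0, 1, 2, 0, 1),
  count = c(3, 1, 4, 2, 5)
)

# Rectangular result: every group gets every observed date,
# and missing group/date combinations are filled with a zero count.
rectangular <- counts %>%
  complete(group, date, fill = list(count = 0))

print(rectangular)
```

Group "b" now carries a zero for 2024-01-03 even though nothing was actually reported for that day, which is the erroneous fill that the rectangular argument description warns about.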
Value
The output depends on whether or not the input was grouped and had a class column. The most detailed output will be:
A dataframe containing the following columns:
denom (positive_integer) - Total test counts associated with the specified timeframe
count (positive_integer) - Positive case counts associated with the specified timeframe
time (as.time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
No mandatory groupings.
No default value.
or a more minimal output if the input is only a plain list of dated events:
A dataframe containing the following columns:
count (positive_integer) - Positive case counts associated with the specified timeframe
time (as.time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
No mandatory groupings.
No default value.
Details
If the data is given with a class column the time series are interpreted as having a denominator, consisting of all the different classes within a time period. This may be subtypes (e.g. variants, serotypes) or markers for test positivity. In either case the resulting time series will have counts for all classes and denominators for the combination.
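As a rough sketch of what counts per class with a shared denominator look like, the following uses plain dplyr on a hypothetical line list. The `events` data, the `week_start` column and the fixed 7-day bucketing are assumptions made for illustration; this is not how time_summarise is implemented, and it returns plain dates, not a time_period.

```r
library(dplyr)

# Hypothetical line list: one row per event, with a date and a class.
events <- tibble(
  date = as.Date("2024-01-01") + c(0, 1, 1, 8, 9, 9, 10),
  class = c("positive", "negative", "negative",
            "positive", "positive", "negative", "negative")
)

weekly <- events %>%
  # Assign each event to a non-overlapping 7-day period anchored at the first date.
  mutate(week_start = min(date) + 7 * (as.integer(date - min(date)) %/% 7)) %>%
  # Count events per period and class.
  count(week_start, class, name = "count") %>%
  # The denominator is the total over all classes within the same period.
  group_by(week_start) %>%
  mutate(denom = sum(count)) %>%
  ungroup()

print(weekly)
```

Dividing count by denom for the "positive" class then gives a per-period test positivity proportion.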
There is flexibility for other kinds of summarisation if the raw data is not count based (e.g. means of continuous variables), but in this case the slider package is usually going to be better, as time_summarise will only look at non-overlapping time periods with fixed lengths.
There is another use case where an existing time series at a particular frequency is aggregated to a less frequent basis (e.g. moving from a daily time series to a weekly one). In this case the input will contain a count column. In this mode no checks are made that the more frequent events are all present before summarisation, so the result may include different numbers of input periods (e.g. going from weeks to months, each month may contain 4 or 5 weeks).
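The uneven number of input periods is easy to see in a small sketch that sums a weekly series into calendar months with plain dplyr. The `weekly` data and the `month_start` and `n_weeks` columns are hypothetical, and calendar-month bucketing is an assumption made for illustration, not necessarily how time_summarise defines a month.

```r
library(dplyr)

# Hypothetical weekly time series: one row per week with a pre-computed count.
weekly <- tibble(
  date = seq(as.Date("2024-01-01"), by = "1 week", length.out = 9),
  count = rep(10L, 9)
)

monthly <- weekly %>%
  # Bucket each week by the calendar month containing its start date.
  mutate(month_start = as.Date(format(date, "%Y-%m-01"))) %>%
  group_by(month_start) %>%
  # Existing counts are summed, not re-counted row by row.
  summarise(count = sum(count), n_weeks = n(), .groups = "drop")

print(monthly)
```

Here January aggregates five weekly values and February four, so the monthly counts differ even though every week has the same count.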