Extract one or more comparisons for inserting into text.
Source: R/compare_population.R
group_comparison.Rd
At some point we need to take information from the tables produced by tableone and place it into the main text of the document. It is annoying if this cannot be done automatically. The group_comparison() function enables extraction of one or more head-to-head comparisons and provides a fairly flexible mechanism for building the precise format desired.
Usage
group_comparison(
  t1_signif,
  variable = NULL,
  subgroup = NULL,
  intervention = NULL,
  percent_fmt = "%1.1f%%",
  p_format = names(.pvalue.defaults),
  no_summary = FALSE,
  summary_glue = NULL,
  summary_arrange = NULL,
  summary_sep = ", ",
  summary_last = " versus ",
  no_signif = FALSE,
  signif_glue = NULL,
  signif_sep = NULL,
  signif_last = NULL
)
Arguments
- t1_signif
a t1_signif as produced by as_t1_signif() or compare_population(..., raw_output = TRUE).
- variable
a variable or set of variables to compare. If missing, a set of appropriate values is displayed based on the columns of t1_signif.
- subgroup
a subgroup or set of subgroups to compare.
- intervention
the side or sides of the intervention to select. N.b. using this effectively prevents any statistical comparison as only one side will be available.
- percent_fmt
a sprintf format string that is applied to probability fields in the summary data to convert them to percentages.
- p_format
the format of the p-values: one of "sampl", "nejm", "jama", "lancet", "aim". Any value here is overridden by options("tableone.pvalue_formatter" = function(...)).
- no_summary
only extract significance test values
- summary_glue
a glue specification that maps the summary statistics to a readable string.
- summary_arrange
an expression by which to order the summary output
- summary_sep
a separator to combine the summary output (see glue::glue_collapse()).
- summary_last
a separator to combine the last 2 summary outputs (see glue::glue_collapse()).
- no_signif
do not try to include significance in the output. Sometimes this is the only option if not enough of the comparison is retained by the variable, subgroup, and intervention filters. (Specifically, if only a comparison between different subgroups remains, because the p-values relate to a different comparison, the one between intervention groups.)
- signif_glue
a glue specification that combines the summary output with the result of the significance tests to give a complete comparison.
- signif_sep
a separator to combine complete comparisons (see glue::glue_collapse()).
- signif_last
a separator to combine the last 2 complete comparisons (see glue::glue_collapse()).
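The formatting arguments above lean on two existing tools: sprintf() for percent_fmt and glue::glue_collapse() for the separator arguments. The following is a minimal sketch of those two building blocks on their own, not of the internals of group_comparison(); it assumes that proportions are scaled by 100 before percent_fmt is applied, which is consistent with the example output for this function but is not stated explicitly.

```r
# percent_fmt is a sprintf() format string; "%%" is a literal percent sign.
# Assumption: proportions are multiplied by 100 before formatting.
pct <- sprintf("%1.1f%%", 100 * c(387 / 16572, 1223 / 37368))
pct
#> [1] "2.3%" "3.3%"

# summary_sep / summary_last (and signif_sep / signif_last) play the role of
# the `sep` and `last` arguments of glue::glue_collapse().
two <- glue::glue_collapse(c("clear", "colored"), sep = ", ", last = " versus ")
two
#> clear versus colored

# With more than two items, `sep` joins all but the final pair.
three <- glue::glue_collapse(c("a", "b", "c"), sep = ", ", last = " versus ")
three
#> a, b versus c
```

With only two intervention groups the default summary_sep is never used; it only matters when three or more summaries are combined.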
Value
Ideally a single string, but what is returned depends on how much the input is constrained, and the function sometimes provides guidance about what to do next. The function is intended to be used interactively until a satisfactory result is obtained.
Examples
tmp = diamonds %>%
  dplyr::group_by(is_colored) %>%
  set_units(price, units = "£") %>%
  compare_population(-color, raw_output = TRUE)
#> median_iqr summary for carat
#> subtype_count summary for cut
#> subtype_count summary for clarity
#> median_iqr summary for depth
#> median_iqr summary for table
#> median_iqr summary for price
#> median_iqr summary for x
#> median_iqr summary for y
#> median_iqr summary for z
#> 2-sided wilcoxon test on carat
#> chi-sq trend test on cut
#> chi-sq trend test on clarity
#> 2-sided wilcoxon test on depth
#> 2-sided wilcoxon test on table
#> 2-sided wilcoxon test on price
#> 2-sided wilcoxon test on x
#> 2-sided wilcoxon test on y
#> 2-sided wilcoxon test on z
# The tabular output is retrieved by converting to a huxtable
# as_huxtable(tmp, layout="simple")
# An unqualified group_comparison call gives informative messages
# about what can be compared:
tmp %>% group_comparison()
#> * `variable` can be any of:
#> `carat`,`cut`,`clarity`,`depth`,`table`,`price`,`x`,`y`,`z`
#> * `subgroup` can be any of:
#> `Fair`,`Good`,`Very Good`,`Premium`,`Ideal`,`I1`,`SI2`,`SI1`,`VS2`,`VS1`,`VVS2`,`VVS1`,`IF`
#> * `intervention` can be any of:
#> `clear`,`colored`
#> * `summary_glue` can use any of the following variables:
#> `variable`, `unit`, `is_colored`, `q.0.025`, `q.0.05`, `q.0.25`, `q.0.5`, `q.0.75`, `q.0.95`, `q.0.975`, `n`, `N`, `.order2`, `subgroup`, `x`, `prob.0.5`, `prob.0.025`, `prob.0.975`, `N_total`, `N_present`
# filtering down the data gets us to a specific comparison:
tmp %>% group_comparison(variable = "cut", subgroup="Fair") %>% dplyr::glimpse()
#> * `intervention` can be any of:
#> `clear`,`colored`
#> * `summary_glue` can use any of the following variables:
#> `variable`, `unit`, `is_colored`, `subgroup`, `n`, `x`, `prob.0.5`, `prob.0.025`, `prob.0.975`, `.order2`, `N`, `N_total`, `N_present`
#> Rows: 2
#> Columns: 13
#> Groups: variable, subgroup, is_colored [2]
#> $ variable <chr> "cut", "cut"
#> $ unit <chr> "", ""
#> $ is_colored <fct> clear, colored
#> $ subgroup <fct> Fair, Fair
#> $ n <int> 16572, 37368
#> $ x <int> 387, 1223
#> $ prob.0.5 <chr> "2.3%", "3.3%"
#> $ prob.0.025 <chr> "2.1%", "3.1%"
#> $ prob.0.975 <chr> "2.6%", "3.5%"
#> $ .order2 <int> 1, 1
#> $ N <int> 16572, 37368
#> $ N_total <int> 53940, 53940
#> $ N_present <int> 53940, 53940
# With further interactive exploration the
# data available for that comparison can be made into a glue string
# (prob.0.5 is already formatted by percent_fmt, so no extra "%" is needed)
tmp %>% group_comparison(variable = "cut", subgroup = "Fair", intervention = "clear",
  summary_glue = "{is_colored}: {x}/{n} ({prob.0.5})",
  signif_glue = "{variable}={subgroup}; {text}; Overall p-value for '{variable}': {p.value}.")
#> [1] "clear: 387/16572 (2.3%)"
# Group comparisons like those above, which select individual subgroups, can be
# confusing because the p-value is at the variable level. This is less of an
# issue for continuous or binary values.
tmp %>% group_comparison(
  variable = "price",
  summary_glue = "{is_colored}: {unit}{q.0.5}; IQR: {q.0.25} \u2014 {q.0.75} (n={n})",
  signif_glue = "{variable}: {text}; P-value {p.value}.")
#> * `intervention` can be any of:
#> `clear`,`colored`
#> price: clear: £1781; IQR: 894.75 — 4092 (n=16572) versus colored: £2777.5; IQR: 990 — 6006.25 (n=37368); P-value <0.001.
# Sometimes we only want to extract a p-value:
tmp %>%
  group_comparison(variable = "cut", subgroup = "Fair", no_summary = TRUE) %>%
  dplyr::glimpse()
#> * `intervention` can be any of:
#> `clear`,`colored`
#> * `signif_glue` can use any of the following variables:
#> `variable`, `p.value`, `p.method`
#> Rows: 1
#> Columns: 3
#> Groups: variable [1]
#> $ variable <chr> "cut"
#> $ p.value <chr> "0.6"
#> $ p.method <chr> "Chi-squared Test for Trend in Proportions"
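To see how a summary_glue template relates to the columns listed in the messages above, it can help to apply glue directly to a small data frame. The sketch below copies the values for variable = "cut", subgroup = "Fair" from the glimpse() output into a hand-built data frame (the name `fair` is hypothetical) and formats them with glue::glue_data() and glue::glue_collapse(). It illustrates the templating idea only and is not a description of how group_comparison() is implemented.

```r
library(glue)

# Values copied by hand from the glimpse() output for cut = "Fair".
fair <- data.frame(
  is_colored = c("clear", "colored"),
  x = c(387L, 1223L),
  n = c(16572L, 37368L),
  prob.0.5 = c("2.3%", "3.3%"),  # already formatted as percentages
  stringsAsFactors = FALSE
)

# One string per intervention group, as a summary_glue template would give.
per_group <- glue_data(fair, "{is_colored}: {x}/{n} ({prob.0.5})")

# Combine the groups using the default summary_sep and summary_last values.
text <- glue_collapse(per_group, sep = ", ", last = " versus ")
text
#> clear: 387/16572 (2.3%) versus colored: 1223/37368 (3.3%)
```

Because each column becomes a variable inside the braces, any of the names reported under `summary_glue` can be used in the template in the same way.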