This function helps construct group-wise cross-correlation matrices and other between-column
comparisons from a data frame. We assume we have data with a major grouping
and a set of data columns that we wish to compare to each other. The
columns to compare are specified as a formula or as a tidyselect expression when the
var_grp_df is constructed, and every pair of those columns is then compared.
Arguments
- var_grp_df
a data frame with major and data groupings
- ...
a set of named functions. Each function must take 2 vectors of the type of the columns being compared and generate a single result (which may be a complex S3 object such as an lm). Such functions might be, for example, chisq.test for factor columns or cor for numeric columns.
- .diagonal
should a column be compared with itself? This is usually FALSE
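The contract for the functions passed in ... can be illustrated with base R alone. The sketch below is not taken from the package: the helper names pearson, chi_p and linear_fit are hypothetical, and the (x, y) argument order is an assumption. Each helper takes two vectors and returns a single result.

```r
# Hypothetical comparison functions: two vectors in, one result out.
# The (x, y) argument order is an assumption made for illustration.

# Numeric columns: a single correlation coefficient.
pearson <- function(x, y) stats::cor(x, y)

# Factor columns: a single p-value from a chi-squared test.
chi_p <- function(x, y) stats::chisq.test(x, y)$p.value

# A complex S3 result: a fitted linear model of y on x.
linear_fit <- function(x, y) stats::lm(y ~ x)

# Calling the helpers directly on columns of built-in data sets.
r <- pearson(iris$Sepal.Length, iris$Petal.Length)
p <- chi_p(warpbreaks$wool, warpbreaks$tension)
fit <- linear_fit(iris$Sepal.Length, iris$Petal.Length)
```

Functions that return a single number, such as pearson and chi_p, would be expected to give an ordinary column, whereas linear_fit returns an S3 object and would give a list column.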
Value
a data frame containing the major z groupings and the unique pairwise
combinations of the compared columns, named in the y and x columns. The named
comparisons provided in ... form the other columns. If these are not
primitive types this will be a list column.
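The shape of the returned value can be mimicked in base R. The following is a rough sketch only, not the package's implementation: compare_sketch is a hypothetical helper, its output columns are literally named z, y, x and value, and the order in which the two columns are passed to fn is an assumption.

```r
# Hypothetical helper, not part of the package: compares every pair of
# columns in `cols` within each level of the grouping column `group`.
compare_sketch <- function(df, group, cols, fn, diagonal = FALSE) {
  # Every ordered (y, x) pair of columns within every level of the grouping.
  grid <- expand.grid(
    x = cols, y = cols, z = as.character(unique(df[[group]])),
    stringsAsFactors = FALSE
  )
  # Drop self-comparisons unless they are requested.
  if (!diagonal) grid <- grid[grid$x != grid$y, ]
  results <- lapply(seq_len(nrow(grid)), function(i) {
    rows <- df[[group]] == grid$z[i]
    fn(df[[grid$x[i]]][rows], df[[grid$y[i]]][rows])
  })
  # Length-one atomic results become a plain column;
  # anything else is kept as a list column.
  simple <- all(vapply(
    results, function(r) is.atomic(r) && length(r) == 1L, logical(1)
  ))
  grid$value <- if (simple) unlist(results) else results
  rownames(grid) <- NULL
  grid[, c("z", "y", "x", "value")]
}

num_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
# A numeric result gives an ordinary numeric column.
cors <- compare_sketch(iris, "Species", num_cols, stats::cor)
# An S3 result (here an lm fit) gives a list column.
fits <- compare_sketch(iris, "Species", num_cols, function(x, y) stats::lm(y ~ x))
```

With three species and four numeric columns, excluding the diagonal leaves 12 ordered pairs per species, which matches the 36 rows reported in the first example on this page.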
Details
Although the examples here are functional, we generally expect these to be wrapped
within a function within a package where the comparisons are pre-defined, and
the var_group framework is hidden from the user.
Examples
iris %>% dplyr::group_by(Species) %>% var_group(~ .) %>%
  var_group_compare(
    correlation = cor
  )
#> 3 group(s): Species.
#> (subgroup) y ~ x + correlation (data)
#> # A tibble: 36 × 4
#>    Species y            x            correlation
#>  * <fct>   <chr>        <chr>              <dbl>
#>  1 setosa  Petal.Length Petal.Width        0.332
#>  2 setosa  Petal.Length Sepal.Length       0.267
#>  3 setosa  Petal.Length Sepal.Width        0.178
#>  4 setosa  Petal.Width  Petal.Length       0.332
#>  5 setosa  Petal.Width  Sepal.Length       0.278
#>  6 setosa  Petal.Width  Sepal.Width        0.233
#>  7 setosa  Sepal.Length Petal.Length       0.267
#>  8 setosa  Sepal.Length Petal.Width        0.278
#>  9 setosa  Sepal.Length Sepal.Width        0.743
#> 10 setosa  Sepal.Width  Petal.Length       0.178
#> # ℹ 26 more rows
ggplot2::diamonds %>% var_group(tidyselect::where(is.factor)) %>%
  var_group_compare(
    chi.p.value = ~ stats::chisq.test(.x, .y)$p.value
  )
#> 1 group(s): .
#> (subgroup) y ~ x + chi.p.value (data)
#> # A tibble: 6 × 3
#>   y       x       chi.p.value
#> * <chr>   <chr>         <dbl>
#> 1 clarity color      0
#> 2 clarity cut        0
#> 3 color   clarity    0
#> 4 color   cut        1.39e-51
#> 5 cut     clarity    0
#> 6 cut     color      1.39e-51