Formula versus tidyselect
interface
tableone
is implemented to allow a subset of columns in
a large dataset to be pulled into a table without any fuss. It is also
designed with a workflow in mind that involves building statistical
models from the data later. We assume the data follows a general pattern
in that there are one observation per row, individual columns are
specific data points in those observations and may be one of:
-
outcome
: something that we will be assessing in a statistical model, maybe a continuous outcome, or a time measure, or a logical measure. -
intervention
: the thing that is varied between the different observations -
covariates
: the other factors that may influence the outcome that we want to control for.
In the end we will want to construct a model that takes the following high level structure:
outcome ~ intervention + covariate_1 + covariate_2 + ... + covariate_n
Simple population description example
Before we build a model we need to firstly compare the distribution
of the covariates in the population and secondly compare them in the
intervention and non-intervention groups, usually done without reference
to outcome. To demonstrate this we are using the
survival::cgd
data set.
cgd = survival::cgd %>%
# filter to include only the first visit
dplyr::filter(enum==1) %>%
# make the steroids and propylac columns into a logical value
# see later for a better way of doing this.
dplyr::mutate(
steroids = as.logical(steroids),
propylac = as.logical(propylac)
)
# A basic unstratified population description table is as follows:
cgd %>% describe_population(tidyselect::everything())
#> Warning: Unknown or uninitialised column: `level`.
Variable | Characteristic | Value | Count (N=128) |
---|---|---|---|
id | Median [IQR] | 64.5 [32.8—96.2] | 128 |
center | Harvard Medical Sch % [95% CI] | 3.1% [1.2%—7.8%] | 4/128 |
Scripps Institute % [95% CI] | 12.5% [7.8%—19.3%] | 16/128 | |
Copenhagen % [95% CI] | 3.1% [1.2%—7.8%] | 4/128 | |
NIH % [95% CI] | 20.3% [14.3%—28.1%] | 26/128 | |
L.A. Children's Hosp % [95% CI] | 6.2% [3.2%—11.8%] | 8/128 | |
Mott Children's Hosp % [95% CI] | 7.0% [3.7%—12.8%] | 9/128 | |
Univ. of Utah % [95% CI] | 3.1% [1.2%—7.8%] | 4/128 | |
Univ. of Washington % [95% CI] | 3.1% [1.2%—7.8%] | 4/128 | |
Univ. of Minnesota % [95% CI] | 4.7% [2.2%—9.8%] | 6/128 | |
Univ. of Zurich % [95% CI] | 12.5% [7.8%—19.3%] | 16/128 | |
Texas Children's Hosp % [95% CI] | 6.2% [3.2%—11.8%] | 8/128 | |
Amsterdam % [95% CI] | 14.8% [9.7%—22.0%] | 19/128 | |
Mt. Sinai Medical Ctr % [95% CI] | 3.1% [1.2%—7.8%] | 4/128 | |
random | — | 128 | |
treat | placebo % [95% CI] | 50.8% [42.2%—59.3%] | 65/128 |
rIFN-g % [95% CI] | 49.2% [40.7%—57.8%] | 63/128 | |
sex | male % [95% CI] | 81.2% [73.6%—87.1%] | 104/128 |
female % [95% CI] | 18.8% [12.9%—26.4%] | 24/128 | |
age | Median [IQR] | 12 [7—22] | 128 |
height | Median [IQR] | 141 [116—170] | 128 |
weight | Median [IQR] | 34.8 [20.7—59.2] | 128 |
inherit | X-linked % [95% CI] | 67.2% [58.7%—74.7%] | 86/128 |
autosomal % [95% CI] | 32.8% [25.3%—41.3%] | 42/128 | |
steroids | false % [95% CI] | 97.7% [93.3%—99.2%] | 125/128 |
true % [95% CI] | 2.3% [0.8%—6.7%] | 3/128 | |
propylac | false % [95% CI] | 13.3% [8.5%—20.2%] | 17/128 |
true % [95% CI] | 86.7% [79.8%—91.5%] | 111/128 | |
hos.cat | US:NIH % [95% CI] | 20.3% [14.3%—28.1%] | 26/128 |
US:other % [95% CI] | 49.2% [40.7%—57.8%] | 63/128 | |
Europe:Amsterdam % [95% CI] | 14.8% [9.7%—22.0%] | 19/128 | |
Europe:other % [95% CI] | 15.6% [10.3%—22.9%] | 20/128 | |
tstart | Median [IQR] | 0 [0—0] | 128 |
enum | Median [IQR] | 1 [1—1] | 128 |
tstop | Median [IQR] | 269 [197—304] | 128 |
status | Median [IQR] | 0 [0—1] | 128 |
Normal distributions determined by the Anderson-Darling test (P>0.005) |
This could have been specified using the formula interface. In this
example we have taken an example of the formula we might wish to use for
a survival model and we reuse it to give us a more targetted descriptive
table. It is also possible to supply tableone
with a
relabelling function that maps column names to printable labels, as
demonstrated here:
# define a formula - this might be reused in model building later
formula = Surv(tstart, tstop, status) ~ treat +
sex + age + height + weight + inherit + steroids + hos.cat
# set a table relabelling function
rename_cols = function(col) {
dplyr::case_when(
col == "hos.cat" ~ "Location",
col == "steroids" ~ "Steroid treatment",
TRUE ~ stringr::str_to_sentence(col)
)
}
options("tableone.labeller"=rename_cols)
# create a simple description
cgd %>% describe_population(formula)
Variable | Characteristic | Value | Count (N=128) |
---|---|---|---|
Treat | placebo % [95% CI] | 50.8% [42.2%—59.3%] | 65/128 |
rIFN-g % [95% CI] | 49.2% [40.7%—57.8%] | 63/128 | |
Sex | male % [95% CI] | 81.2% [73.6%—87.1%] | 104/128 |
female % [95% CI] | 18.8% [12.9%—26.4%] | 24/128 | |
Age | Median [IQR] | 12 [7—22] | 128 |
Height | Median [IQR] | 141 [116—170] | 128 |
Weight | Median [IQR] | 34.8 [20.7—59.2] | 128 |
Inherit | X-linked % [95% CI] | 67.2% [58.7%—74.7%] | 86/128 |
autosomal % [95% CI] | 32.8% [25.3%—41.3%] | 42/128 | |
Steroid treatment | false % [95% CI] | 97.7% [93.3%—99.2%] | 125/128 |
true % [95% CI] | 2.3% [0.8%—6.7%] | 3/128 | |
Location | US:NIH % [95% CI] | 20.3% [14.3%—28.1%] | 26/128 |
US:other % [95% CI] | 49.2% [40.7%—57.8%] | 63/128 | |
Europe:Amsterdam % [95% CI] | 14.8% [9.7%—22.0%] | 19/128 | |
Europe:other % [95% CI] | 15.6% [10.3%—22.9%] | 20/128 | |
Normal distributions determined by the Anderson-Darling test (P>0.005) |
The relabelling function can either be passed to each invocation of
tableone
functions or as an option as shown here, which
makes the labeller available to all subsequent calls. This is useful if
you are generating many tables from a single dataset.
We will generally use the formula interface from here on but for
exploration of larger datasets with more covariates the
tidyselect
interface may be more useful.
Comparing the population by intervention
In this example a more useful table compares the treatment groups. We
can use the same formula syntax for this, but in this case the first
predictor is assumed to be the intervention and the data set is compared
by intervention (in this case the treat
column). From this
we can conclude that the population is well distributed between placebo
and treatment groups and there is no major bias in the randomisation
process:
# same as above
formula = Surv(tstart, tstop, status) ~ treat +
sex + age + height + weight + inherit + steroids + hos.cat
# labelling function is still active
cgd %>% compare_population(formula)
placebo | rIFN-g | |||
---|---|---|---|---|
Variable | Characteristic | Value (N=65) | Value (N=63) | P value |
Sex | male % [95% CI] (n) | 81.5% [70.4%—89.1%] (53) | 81.0% [69.6%—88.8%] (51) | 1 † |
female % [95% CI] (n) | 18.5% [10.9%—29.6%] (12) | 19.0% [11.2%—30.4%] (12) | ||
Age | Median [IQR] | 14 [7—24] | 12 [7—19.5] | 0.56 †† |
Height | Median [IQR] | 143 [115—171] | 139 [119—167] | 0.45 ††† |
Weight | Median [IQR] | 36.1 [21.6—63.7] | 34.4 [20.6—53.7] | 0.4 ††† |
Inherit | X-linked % [95% CI] (n) | 63.1% [50.9%—73.8%] (41) | 71.4% [59.3%—81.1%] (45) | 0.35 † |
autosomal % [95% CI] (n) | 36.9% [26.2%—49.1%] (24) | 28.6% [18.9%—40.7%] (18) | ||
Steroid treatment | false % [95% CI] (n) | 96.9% [89.5%—99.2%] (63) | 98.4% [91.5%—99.7%] (62) | 1 † |
true % [95% CI] (n) | 3.1% [0.8%—10.5%] (2) | 1.6% [0.3%—8.5%] (1) | ||
Location | US:NIH % [95% CI] (n) | 16.9% [9.7%—27.8%] (11) | 23.8% [15.0%—35.6%] (15) | 0.7 † |
US:other % [95% CI] (n) | 49.2% [37.5%—61.1%] (32) | 49.2% [37.3%—61.2%] (31) | ||
Europe:Amsterdam % [95% CI] (n) | 15.4% [8.6%—26.1%] (10) | 14.3% [7.7%—25.0%] (9) | ||
Europe:other % [95% CI] (n) | 18.5% [10.9%—29.6%] (12) | 12.7% [6.6%—23.1%] (8) | ||
†, Fisher's exact test (categorical); ††, 2 sample Wilcoxon Rank Sum test (continuous); †††, 2 sample Kolmogorov-Smirnov test (continuous) Normal distributions determined by the Anderson-Darling test (P>0.005) An adjusted P value of 0.00714 may be considered significant. |
Alternatively if we were using the tidyselect
interface
this alternate syntax would have given us the same table. Note that we
must group the data by intervention, for the tidyselect
to
work as intended:
cgd %>% dplyr::group_by(treat) %>%
compare_population(sex,age,height,weight,inherit,steroids,hos.cat)
Analysis of missing data
We need to make sure that not only is the data equivalent between the intervention groups but also that missing data is not unevenly distributed or excessive. Reporting on the frequency of missing data stratified by intervention is also easy, and to demonstrate this we make a data set with 10% of the placebo arm having missing values, but 25% of the treatment arm:
# generate a dataset with values missing not at random compared to the intervention:
cgd_treat = cgd %>% dplyr::mutate(treat = as.character(treat)) %>% dplyr::filter(treat != "placebo")
cgd_placebo = cgd %>% dplyr::mutate(treat = as.character(treat)) %>% dplyr::filter(treat == "placebo")
set.seed(100)
mnar_cgd = dplyr::bind_rows(
cgd_placebo %>% .make_missing(p_missing = 0.1),
cgd_treat %>% .make_missing(p_missing = 0.25)
)
Comparing this new data set we see that there is significant
differences in some of the data (but not the steroids
variable). As this is quite a small dataset it is not sufficiently
powered to reliably detect the difference in missingness at this level
(15% difference).
# compare the MNAR dataset against the intervention:
formula = Surv(tstart, tstop, status) ~ treat +
sex + age + height + weight + inherit + steroids + hos.cat
mnar_cgd %>% compare_missing(formula)
placebo | rIFN-g | ||
---|---|---|---|
variable | missing % (N) | missing % (N) | P value |
Sex | 12.3% (8/65) | 23.8% (15/63) | 0.11 |
Age | 4.6% (3/65) | 23.8% (15/63) | 0.002 |
Height | 6.2% (4/65) | 20.6% (13/63) | 0.019 |
Weight | 10.8% (7/65) | 23.8% (15/63) | 0.062 |
Inherit | 12.3% (8/65) | 28.6% (18/63) | 0.028 |
Steroid treatment | 13.8% (9/65) | 28.6% (18/63) | 0.052 |
Location | 9.2% (6/65) | 19.0% (12/63) | 0.13 |
More than 10% of data is missing for variables Sex, Age, Height, Weight, Inherit, Steroid treatment, Location. Data is missing not at random (compared to Treat) at a p-value<0.007 (0.05 over 7 comparisons) for variables Age. |
with this analysis it is useful to be able to update the analysis formula removing the variables with missing data so that we are confident the models are based on reasonable data.
# formula can also be a list of formulae
new_formula = mnar_cgd %>% tableone::remove_missing(formula)
#> More than 10% of data is missing for variables Sex, Age, Height, Weight, Inherit, Steroid treatment, Location.
#> Data is missing not at random (compared to Treat) at a p-value<0.007 (0.05 over 7 comparisons) for variables Age.
print(new_formula)
#> [[1]]
#> Surv(tstart, tstop, status) ~ treat
Conversion of discrete data
Using this new data set with missing data it may be necessary to discretise some or all of the data, or convert logical values into properly named factors.
decade = function(x) sprintf("%d-%d",x-(x%%10),x-(x%%10)+9)
discrete_cgd = mnar_cgd %>%
# pick out the first episode
dplyr::filter(enum == 1) %>%
# convert data
make_factors(
steroids,propylac,age,weight,height,
.logical = c("received","not received"),
.numeric = list(
age="{decade(value)}",
weight="{ifelse(value<20,'<20','20+')}",
height="{ifelse(value<mean(value, na.rm=TRUE),'below average','above average')}"
)
)
formula = Surv(tstart, tstop, status) ~ treat +
sex + age + height + weight + inherit + steroids + hos.cat
old = options("tableone.show_pvalue_method"=TRUE)
# This comparison implicitly ignores missing values.
t = discrete_cgd %>% compare_population(formula)
options(old)
t
placebo | rIFN-g | |||
---|---|---|---|---|
Variable | Characteristic | Value (N=61) | Value (N=49) | P value |
Sex | male % [95% CI] (n) | 77.8% [65.1%—86.8%] (42) | 80.6% [65.0%—90.2%] (29) | — † |
female % [95% CI] (n) | 22.2% [13.2%—34.9%] (12) | 19.4% [9.8%—35.0%] (7) | ||
Age | 0-9 % [95% CI] (n) | 31.0% [20.6%—43.8%] (18) | 47.4% [32.5%—62.7%] (18) | — †† |
10-19 % [95% CI] (n) | 31.0% [20.6%—43.8%] (18) | 23.7% [13.0%—39.2%] (9) | ||
20-29 % [95% CI] (n) | 31.0% [20.6%—43.8%] (18) | 13.2% [5.8%—27.3%] (5) | ||
30-39 % [95% CI] (n) | 6.9% [2.7%—16.4%] (4) | 13.2% [5.8%—27.3%] (5) | ||
40-49 % [95% CI] (n) | 0.0% [0.0%—6.2%] (0) | 2.6% [0.5%—13.5%] (1) | ||
Height | below average % [95% CI] (n) | 43.9% [31.8%—56.7%] (25) | 52.5% [37.5%—67.1%] (21) | — †† |
above average % [95% CI] (n) | 56.1% [43.3%—68.2%] (32) | 47.5% [32.9%—62.5%] (19) | ||
Weight | <20 % [95% CI] (n) | 21.4% [12.7%—33.8%] (12) | 20.5% [10.8%—35.5%] (8) | — †† |
20+ % [95% CI] (n) | 78.6% [66.2%—87.3%] (44) | 79.5% [64.5%—89.2%] (31) | ||
Inherit | X-linked % [95% CI] (n) | 58.5% [45.1%—70.7%] (31) | 68.4% [52.5%—80.9%] (26) | — † |
autosomal % [95% CI] (n) | 41.5% [29.3%—54.9%] (22) | 31.6% [19.1%—47.5%] (12) | ||
Steroid treatment | received % [95% CI] (n) | 1.9% [0.3%—9.9%] (1) | 0.0% [0.0%—10.4%] (0) | — † |
not received % [95% CI] (n) | 98.1% [90.1%—99.7%] (52) | 100.0% [89.6%—100.0%] (33) | ||
Location | US:NIH % [95% CI] (n) | 19.3% [11.1%—31.3%] (11) | 20.5% [10.8%—35.5%] (8) | — † |
US:other % [95% CI] (n) | 50.9% [38.3%—63.4%] (29) | 51.3% [36.2%—66.1%] (20) | ||
Europe:Amsterdam % [95% CI] (n) | 14.0% [7.3%—25.3%] (8) | 12.8% [5.6%—26.7%] (5) | ||
Europe:other % [95% CI] (n) | 15.8% [8.5%—27.4%] (9) | 15.4% [7.2%—29.7%] (6) | ||
†, Not calculated due to missing values (categorical); ††, Not calculated due to missing values (ordered) An adjusted P value of 0.00714 may be considered significant. |
# N.B. The following option is involved when converting integer data
# which decides how many levels of integer data are considered discrete
# and when to decide integer data can be treated as continuous:
options("tableone.max_discrete_levels"=0)
# and is described in the documentation for make_factors().
Making missing factors explicit:
In the comparison above missing values were not included, and we
should be cautious of the findings. Because of the missingness
tableone
will not calculate p-values. If factor values are
missing (as in this case) then we can include them as a new group and
get a more robust comparison which includes the distribution of
missingness, and for which we can calculate a p-value. However
previously ordered variables, are now regarded as unordered as we cannot
determine the value of a missing level.
discrete_cgd %>% explicit_na() %>% compare_population(formula)
placebo | rIFN-g | |||
---|---|---|---|---|
Variable | Characteristic | Value (N=61) | Value (N=49) | P value |
Sex | male % [95% CI] (n) | 68.9% [56.4%—79.1%] (42) | 59.2% [45.2%—71.8%] (29) | 0.13 † |
female % [95% CI] (n) | 19.7% [11.6%—31.3%] (12) | 14.3% [7.1%—26.7%] (7) | ||
<missing> % [95% CI] (n) | 11.5% [5.7%—21.8%] (7) | 26.5% [16.2%—40.3%] (13) | ||
Age | 0-9 % [95% CI] (n) | 29.5% [19.6%—41.9%] (18) | 36.7% [24.7%—50.7%] (18) | 0.0064 † |
10-19 % [95% CI] (n) | 29.5% [19.6%—41.9%] (18) | 18.4% [10.0%—31.4%] (9) | ||
20-29 % [95% CI] (n) | 29.5% [19.6%—41.9%] (18) | 10.2% [4.4%—21.8%] (5) | ||
30-39 % [95% CI] (n) | 6.6% [2.6%—15.7%] (4) | 10.2% [4.4%—21.8%] (5) | ||
40-49 % [95% CI] (n) | 0.0% [-0.0%—5.9%] (0) | 2.0% [0.4%—10.7%] (1) | ||
<missing> % [95% CI] (n) | 4.9% [1.7%—13.5%] (3) | 22.4% [13.0%—35.9%] (11) | ||
Height | below average % [95% CI] (n) | 41.0% [29.5%—53.5%] (25) | 42.9% [30.0%—56.7%] (21) | 0.12 † |
above average % [95% CI] (n) | 52.5% [40.2%—64.5%] (32) | 38.8% [26.4%—52.8%] (19) | ||
<missing> % [95% CI] (n) | 6.6% [2.6%—15.7%] (4) | 18.4% [10.0%—31.4%] (9) | ||
Weight | <20 % [95% CI] (n) | 19.7% [11.6%—31.3%] (12) | 16.3% [8.5%—29.0%] (8) | 0.21 † |
20+ % [95% CI] (n) | 72.1% [59.8%—81.8%] (44) | 63.3% [49.3%—75.3%] (31) | ||
<missing> % [95% CI] (n) | 8.2% [3.6%—17.8%] (5) | 20.4% [11.5%—33.6%] (10) | ||
Inherit | X-linked % [95% CI] (n) | 50.8% [38.6%—62.9%] (31) | 53.1% [39.4%—66.3%] (26) | 0.3 † |
autosomal % [95% CI] (n) | 36.1% [25.2%—48.6%] (22) | 24.5% [14.6%—38.1%] (12) | ||
<missing> % [95% CI] (n) | 13.1% [6.8%—23.8%] (8) | 22.4% [13.0%—35.9%] (11) | ||
Steroid treatment | received % [95% CI] (n) | 1.6% [0.3%—8.7%] (1) | 0.0% [0.0%—7.3%] (0) | 0.019 † |
not received % [95% CI] (n) | 85.2% [74.3%—92.0%] (52) | 67.3% [53.4%—78.8%] (33) | ||
<missing> % [95% CI] (n) | 13.1% [6.8%—23.8%] (8) | 32.7% [21.2%—46.6%] (16) | ||
Location | US:NIH % [95% CI] (n) | 18.0% [10.4%—29.5%] (11) | 16.3% [8.5%—29.0%] (8) | 0.33 † |
US:other % [95% CI] (n) | 47.5% [35.5%—59.8%] (29) | 40.8% [28.2%—54.8%] (20) | ||
Europe:Amsterdam % [95% CI] (n) | 13.1% [6.8%—23.8%] (8) | 10.2% [4.4%—21.8%] (5) | ||
Europe:other % [95% CI] (n) | 14.8% [8.0%—25.7%] (9) | 12.2% [5.7%—24.2%] (6) | ||
<missing> % [95% CI] (n) | 6.6% [2.6%—15.7%] (4) | 20.4% [11.5%—33.6%] (10) | ||
†, Fisher's exact test (categorical) An adjusted P value of 0.00714 may be considered significant. |
Non biomedical data
Beyond the bio-medical example tableone
can make any
more general comparison between data that has a structure like:
~ group + observation_1 + observation_2 + ... + observation_n
We will use the iris
and the diamonds
datasets to demonstrate this more general use case for
tableone
.
# revert the labeller setting to the default
# and additionally hide the footer.
old = options(
"tableone.labeller"=NULL,
"tableone.show_pvalue_method"=FALSE,
"tableone.hide_footer"=TRUE)
# the heuristics detect that Petals in the iris data set are not normally
# distributed and hence report median and IQR:
iris %>% dplyr::group_by(Species) %>% compare_population(tidyselect::everything())
setosa | versicolor | virginica | |||
---|---|---|---|---|---|
Variable | Characteristic | Value (N=50) | Value (N=50) | Value (N=50) | P value |
Sepal.Length | Mean ± SD | 5.01 ± 0.352 | 5.94 ± 0.516 | 6.59 ± 0.636 | <0.001 |
Sepal.Width | Mean ± SD | 3.43 ± 0.379 | 2.77 ± 0.314 | 2.97 ± 0.322 | <0.001 |
Petal.Length | Median [IQR] | 1.5 [1.4—1.58] | 4.35 [4—4.6] | 5.55 [5.1—5.88] | <0.001 |
Petal.Width | Median [IQR] | 0.2 [0.2—0.3] | 1.3 [1.2—1.5] | 2 [1.8—2.3] | <0.001 |
options(old)
The missing_diamonds
data set which is included in this
package has 10% of the values removed. This demonstrates the need for
reporting the denominator.
# The counts sometimes seem redundant if there is no missing information:
# however in a data set with missing values the denominators are important:
missing_diamonds %>% describe_population(tidyselect::everything())
Variable | Characteristic | Value | Count (N=53940) |
---|---|---|---|
Carat | Median [IQR] | 0.7 [0.4—1.04] | 48682 |
Cut | Fair % [95% CI] | 3.0% [2.8%—3.2%] | 1454/48553 |
Good % [95% CI] | 9.2% [8.9%—9.5%] | 4462/48553 | |
Very Good % [95% CI] | 22.3% [21.9%—22.6%] | 10816/48553 | |
Premium % [95% CI] | 25.7% [25.3%—26.1%] | 12460/48553 | |
Ideal % [95% CI] | 39.9% [39.4%—40.3%] | 19361/48553 | |
Color | D % [95% CI] | 12.5% [12.2%—12.8%] | 6079/48569 |
E % [95% CI] | 18.3% [18.0%—18.6%] | 8886/48569 | |
F % [95% CI] | 17.7% [17.4%—18.1%] | 8613/48569 | |
G % [95% CI] | 20.9% [20.5%—21.2%] | 10137/48569 | |
H % [95% CI] | 15.4% [15.1%—15.7%] | 7466/48569 | |
I % [95% CI] | 10.0% [9.8%—10.3%] | 4876/48569 | |
J % [95% CI] | 5.2% [5.0%—5.4%] | 2512/48569 | |
Clarity | I1 % [95% CI] | 1.4% [1.3%—1.5%] | 664/48527 |
SI2 % [95% CI] | 17.0% [16.7%—17.4%] | 8265/48527 | |
SI1 % [95% CI] | 24.2% [23.8%—24.6%] | 11756/48527 | |
VS2 % [95% CI] | 22.7% [22.3%—23.1%] | 11020/48527 | |
VS1 % [95% CI] | 15.2% [14.8%—15.5%] | 7355/48527 | |
VVS2 % [95% CI] | 9.4% [9.2%—9.7%] | 4570/48527 | |
VVS1 % [95% CI] | 6.8% [6.6%—7.0%] | 3298/48527 | |
IF % [95% CI] | 3.3% [3.1%—3.5%] | 1599/48527 | |
Depth | Median [IQR] | 61.8 [61—62.5] | 48584 |
Table | Median [IQR] | 57 [56—59] | 48707 |
Price | Median [IQR] | 2.41e+03 [952—5.33e+03] | 48675 |
X | Median [IQR] | 5.69 [4.72—6.54] | 48577 |
Y | Median [IQR] | 5.71 [4.72—6.54] | 48578 |
Z | Median [IQR] | 3.52 [2.91—4.03] | 48559 |
Is_colored | clear % [95% CI] | 30.7% [30.3%—31.1%] | 16572/53940 |
colored % [95% CI] | 69.3% [68.9%—69.7%] | 37368/53940 | |
Normal distributions determined by the Anderson-Darling test (P>0.005) |