
A quantile shift approach to main effects and interactions in a 2-by-2 design reproducibility package

Version 2 2023-12-06, 15:44
Version 1 2023-05-19, 13:30
online resource
posted on 2023-12-06, 15:44 authored by Rand Wilcox, Guillaume Rousselet

Data, knitted notebooks and figures in pdf format for the article.

The code to reproduce the figures, analyses and simulations in the article is available on GitHub.

arXiv preprint


Below is a breakdown of the different notebooks.  


## Method 1: compare quantiles of marginal distributions


sim_fp.Rmd

Estimate the FWER, the probability of at least one false positive among the 9 deciles. Use a percentile bootstrap, as in Wilcox et al. (2014), and a bootstrap-t. Unlike Wilcox et al. (2014), both use the same bootstrap samples for all deciles.
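The bootstrap comparison can be sketched in Python; the notebooks themselves are R Markdown. This is a minimal sketch for two independent groups, not the authors' code. Several details are assumptions:

- the function name `decile_boot_pvalues`;
- the p-value definition, which is twice the smaller tail proportion of bootstrap differences around zero;
- the use of `np.quantile` instead of HD.

How the marginal distributions of the 2-by-2 design are formed is not shown. The `shared` flag switches between one set of bootstrap samples for all deciles and a separate set for each decile.

```python
import numpy as np


def decile_boot_pvalues(x, y, nboot=2000, shared=True, seed=None):
    """Two-sided percentile-bootstrap p-values for the 9 decile differences q(x) - q(y).

    shared=True  -> one set of bootstrap samples is reused for all deciles.
    shared=False -> a fresh set of bootstrap samples is drawn for each decile.
    Deciles come from np.quantile's default 'linear' method (R's type 7).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    probs = np.arange(1, 10) / 10
    boot_diff = np.empty((probs.size, nboot))  # rows = deciles, columns = bootstrap samples
    for j in range(1 if shared else probs.size):
        # resample each group independently, with replacement
        xb = rng.choice(x, size=(nboot, x.size), replace=True)
        yb = rng.choice(y, size=(nboot, y.size), replace=True)
        d = np.quantile(xb, probs, axis=1) - np.quantile(yb, probs, axis=1)  # shape (9, nboot)
        if shared:
            boot_diff[:] = d
        else:
            boot_diff[j] = d[j]  # keep only decile j from this set of samples
    # p* = P(d* < 0) + 0.5 * P(d* = 0); two-sided p = 2 * min(p*, 1 - p*)
    pstar = np.mean(boot_diff < 0, axis=1) + 0.5 * np.mean(boot_diff == 0, axis=1)
    return 2 * np.minimum(pstar, 1 - pstar)
```

Sharing one set of samples needs 9 times fewer resamples than drawing a set per decile. The check in sim_fp_b1b9.Rmd found no evidence that separate samples improve performance.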

Compare the Harrell-Davis estimator (HD; Harrell & Davis, 1982) to quantile(type=7) (QT7), the default in R (Hyndman & Fan, 1996; Wicklin, 2017).
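A minimal Python sketch of the two estimators being compared, assuming SciPy is available. `hd` is a hypothetical helper, not the code used in the notebooks. NumPy's `np.quantile`, with its default `'linear'` method, uses the same interpolation as R's `quantile(type=7)`.

```python
import numpy as np
from scipy.stats import beta


def hd(x, q=0.5):
    """Harrell-Davis (1982) estimate of the q-th quantile.

    A weighted average of all order statistics: X_(i) receives weight
    W_i = I_{i/n}(a, b) - I_{(i-1)/n}(a, b), where I is the Beta(a, b) CDF,
    a = (n + 1) q and b = (n + 1)(1 - q).
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    a, b = (n + 1) * q, (n + 1) * (1 - q)
    w = np.diff(beta.cdf(np.arange(n + 1) / n, a, b))  # n weights that sum to 1
    return float(np.sum(w * x))


sample = np.array([1, 1, 2, 2, 2, 3, 5, 8, 13, 21])
for q in (0.1, 0.5, 0.9):
    # np.quantile's default 'linear' method matches R's quantile(type = 7)
    print(q, hd(sample, q), np.quantile(sample, q))
```

QT7 interpolates between at most two order statistics. HD instead averages all of them with beta weights.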

Compare two methods to correct for multiple comparisons: Hochberg (1988), as used in Wilcox et al. (2014), and FDR (Benjamini & Hochberg, 1995). Also compare to standard ANOVAs on means and on 20% trimmed means (no bootstrap; see chapter 7 in Wilcox, 2017).
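Both corrections are step-up procedures applied to the sorted p-values. Below is a minimal sketch of the adjusted p-values, assuming the same definitions as R's `p.adjust(p, method = "hochberg")` and `p.adjust(p, method = "BH")`. The helper name `p_adjust` is hypothetical.

```python
import numpy as np


def p_adjust(p, method="BH"):
    """Adjusted p-values for Hochberg (1988) or Benjamini-Hochberg (1995) FDR.

    Working from the largest p-value down, scale each p-value, take the
    running minimum, and cap at 1.
    """
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)[::-1]   # indices of the p-values, largest first
    rank = np.arange(m, 0, -1)    # rank of each sorted p-value: m, m-1, ..., 1
    if method == "BH":
        scaled = p[order] * m / rank
    elif method == "hochberg":
        scaled = p[order] * (m - rank + 1)
    else:
        raise ValueError("method must be 'BH' or 'hochberg'")
    adj = np.minimum(1.0, np.minimum.accumulate(scaled))
    out = np.empty(m)
    out[order] = adj              # restore the original order
    return out


pvals = [0.01, 0.04, 0.03, 0.5]
print(p_adjust(pvals, "BH"), p_adjust(pvals, "hochberg"))
```

A decile is declared significant when its adjusted p-value is at most alpha. Hochberg scales the p-value of rank r by (m - r + 1) and BH scales it by m / r. Since (m - r + 1) >= m / r for every rank, the Hochberg-adjusted values are never smaller than the BH ones.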


Conclusion: all methods tested are a bit too conservative. FDR gives an FWER closer to the nominal level than Hochberg across all conditions. HD dominates QT7 in all situations. Recommendation: use HD + FDR as the default.


Generated article figures:

- fig_art_fp_norm_lnorm.pdf  = Figure 2  

- fig_art_fp_pois_bbr9.pdf = Figure 3  

- fig_art_fp_ind_norm_bbr9.pdf  = Figure 4  


sim_fp_b1b9.Rmd

Check whether using separate bootstrap samples for every decile improves coverage.


Conclusion: No evidence for better performance using separate bootstrap samples for every decile.

Keep using the same bootstrap samples for all deciles for computational efficiency.

Overall, the approach is conservative: the FWER is below the nominal level at all sample sizes, for all distributions, and for both main effects and the interaction.


sim_tp.Rmd

Estimate familywise power, the probability of at least one true positive among the 9 deciles.

Use a percentile bootstrap with the same samples for all deciles.

Compare the Harrell-Davis quantile estimator (HD) to quantile(type=7) (QT7).

Also compare to standard ANOVAs on means and on 20% trimmed means (no bootstrap).
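The FWER in the false-positive notebooks and familywise power here come from the same count over simulated experiments. This is a minimal sketch. It assumes the corrected p-values are stored one row per simulated experiment and that a decile is rejected when p <= alpha. `familywise_rate` is a hypothetical name.

```python
import numpy as np


def familywise_rate(pvalues, alpha=0.05):
    """Proportion of simulated experiments with at least one rejection.

    pvalues: array of shape (n_simulations, n_deciles), already corrected for
    multiple comparisons. With no true effects this estimates the FWER; with
    true effects at the deciles it estimates familywise power.
    """
    p = np.asarray(pvalues, dtype=float)
    return float(np.mean(np.any(p <= alpha, axis=1)))


# Illustration only: 9 independent, uncorrected null p-values per experiment
rng = np.random.default_rng(0)
null_p = rng.uniform(size=(100_000, 9))
print(familywise_rate(null_p))  # near 1 - 0.95**9, about 0.37
```

The uniform example shows why a correction is needed. Without one, 9 independent tests at alpha = 0.05 give an FWER of 1 - 0.95^9, about 0.37. Decile tests are not independent, so this is only an illustration.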


Conclusion: overall, no method dominates. In some situations ANOVA on means outperforms the quantile methods. In others, power for ANOVA on means collapses while the quantile methods retain high power. FDR performs as well as or better than Hochberg in all situations. HD is always more powerful than QT7, especially when dealing with tied values.

Recommendation: use HD + FDR by default.


Generated article figures:

- fig_art_tp_norm_lnorm.pdf = Figure 5  

- fig_art_tp_pois_bb.pdf = Figure 6  


## Method 2: compare quantiles of distributions of all pairwise differences


sim_fp_apd.Rmd

Strategy: for each level of A, compute all pairwise differences between the observations in B1 and those in B2. Then compare the quantiles (here deciles) of these two distributions. Estimate the FWER, the probability of at least one false positive among the 9 deciles.

Use the percentile bootstrap only, with the boot1 method (the same bootstrap samples for all deciles).
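Below is a minimal sketch of the all-pairwise-differences (APD) interaction estimate, without the bootstrap step. The function names are hypothetical. `np.quantile` (QT7) stands in for HD, which the notebooks recommend.

```python
import numpy as np

PROBS = np.arange(1, 10) / 10  # the 9 deciles


def apd_deciles(b1, b2):
    """Deciles of all pairwise differences b1[i] - b2[j] (n1 * n2 values)."""
    diffs = np.subtract.outer(np.asarray(b1, float), np.asarray(b2, float)).ravel()
    return np.quantile(diffs, PROBS)


def apd_interaction(a1b1, a1b2, a2b1, a2b2):
    """Decile differences between the APD distributions at A1 and at A2."""
    return apd_deciles(a1b1, a1b2) - apd_deciles(a2b1, a2b2)


rng = np.random.default_rng(0)
x, y = rng.normal(size=40), rng.normal(size=40)
# Shifting A1B1 by 1 shifts every APD decile at A1 by 1,
# so each interaction decile equals 1 (up to rounding).
print(apd_interaction(x + 1, y, x, y))
```

With n = 40 per cell, each call to `apd_deciles` works on 1,600 differences.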


Conclusion: the FWER is below the nominal level in all situations. At individual deciles, the Type I error rate is close to the nominal level, particularly for central deciles and n > 20. HD and QT7 perform similarly for continuous distributions. When sampling from distributions with tied values, HD is much closer to the nominal level than QT7. FDR outperforms Hochberg in all situations.

Recommendation: use HD + FDR by default.


Generated article figures:  

- fig_art_fp_apd.pdf = Figure 7  

- fig_art_fp_apd_ind_norm_clnorm.pdf = Figure 8  

- fig_art_fp_apd_ind_pois_bbr1.pdf = Figure 9    


sim_fp_apd_b1b9.Rmd

Strategy: for each level of A, compute all pairwise differences between the observations in B1 and those in B2. Then compare the quantiles (here deciles) of these two distributions. Estimate the FWER, the probability of at least one false positive among the 9 deciles. Use HD only. Compare bootstrapping with the same samples for all deciles against separate samples for each decile. Do this for a single sample size only: with n = 40, each distribution of all pairwise differences contains 40 × 40 = 1,600 observations.


Conclusion: No evidence for better performance using separate bootstrap samples for every decile.

Keep using the same bootstrap samples for all deciles for computational efficiency.

Overall, the approach is conservative for all distributions. At individual deciles, the error rate is closer to the nominal level near the centre of the distribution and a bit conservative at the extremes, but overall close to 0.05.


sim_tp_apd.Rmd 

Estimate familywise power, the probability of at least one true positive among the 9 deciles.

Use a percentile bootstrap with the same samples for all deciles.

Compare HD to QT7.

Also compare to standard ANOVAs on means and on 20% trimmed means (no bootstrap).


Conclusion: No method dominates. In some situations ANOVA on means outperforms the others, yet in other situations, power completely collapses for the ANOVA on means but not for the quantile method. HD and QT7 perform similarly for continuous distributions, but in the presence of tied values, HD dominates QT7.  


Generated article figure:

- fig_art_tp_apd.pdf = Figure 10  


## Illustrations


hd.Rmd

Illustrate the beta weights used to compute the Harrell-Davis quantile estimates.  


Generated article figure:

- fig_ex_hd_beta_weights.pdf  = Figure 1  


apd_ex.Rmd

Demonstrate that when dealing with distributions of all pairwise differences, interchanging the rows and columns can yield different interaction results.  
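The asymmetry can be checked numerically. This sketch uses hypothetical lognormal samples and QT7 deciles via `np.quantile`; it is not the notebook's code. The mean of all pairwise differences equals the difference of the means. So a mean-based interaction is the same whichever factor defines the rows, while the decile-based interaction generally is not.

```python
import numpy as np

PROBS = np.arange(1, 10) / 10


def apd(u, v):
    """All pairwise differences u[i] - v[j], flattened."""
    return np.subtract.outer(u, v).ravel()


rng = np.random.default_rng(0)
# hypothetical skewed samples for the four cells (keys: level of A, level of B)
c = {cell: rng.lognormal(size=30) for cell in ("a1b1", "a1b2", "a2b1", "a2b2")}

# A as rows: APD(B1 - B2) at A1 vs. at A2
q_rows_a = np.quantile(apd(c["a1b1"], c["a1b2"]), PROBS) - np.quantile(apd(c["a2b1"], c["a2b2"]), PROBS)
# rows and columns interchanged: APD(A1 - A2) at B1 vs. at B2
q_rows_b = np.quantile(apd(c["a1b1"], c["a2b1"]), PROBS) - np.quantile(apd(c["a1b2"], c["a2b2"]), PROBS)

# For means, both layouts reduce to (m11 - m12) - (m21 - m22)
mean_rows_a = apd(c["a1b1"], c["a1b2"]).mean() - apd(c["a2b1"], c["a2b2"]).mean()
mean_rows_b = apd(c["a1b1"], c["a2b1"]).mean() - apd(c["a1b2"], c["a2b2"]).mean()

print(np.round(q_rows_a - q_rows_b, 3))      # generally non-zero
print(np.isclose(mean_rows_a, mean_rows_b))  # True
```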


kurtosis_estimation.Rmd

Estimate the kurtosis of samples from a lognormal distribution and a contaminated lognormal distribution.  
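A minimal sketch of such estimates with `scipy.stats.kurtosis`, using a normal sample as a reference. The contamination scheme below is a hypothetical choice for illustration. The notebook's contaminated lognormal may be defined differently.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
n = 100_000

normal = rng.normal(size=n)
lnorm = rng.lognormal(mean=0.0, sigma=1.0, size=n)
# Hypothetical contamination scheme: each observation comes, with probability 0.1,
# from a lognormal with sigma = 2 instead of sigma = 1.
contam = np.where(rng.random(n) < 0.1,
                  rng.lognormal(0.0, 2.0, size=n),
                  rng.lognormal(0.0, 1.0, size=n))

# fisher=False returns Pearson kurtosis (3 for a normal distribution)
k_norm, k_lnorm, k_contam = (kurtosis(s, fisher=False) for s in (normal, lnorm, contam))
print(k_norm, k_lnorm, k_contam)
```

For reference, the Pearson kurtosis of a lognormal with sigma = 1 is e^4 + 2e^3 + 3e^2 - 3, about 113.9. Sample estimates from such heavy-tailed distributions are unstable and can differ substantially from it, even with large n.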


examples.Rmd

Illustrate the distributions used in the power simulations. Generate samples at the maximum sample size. Plot the marginals, shift functions for main effects and interaction, distributions of all pairwise differences, their quantiles, and their quantile differences (interaction).

Illustrate the health example presented in the article.


Generated article figures:

- fig_ex_A1B1C_dec.pdf = Figure 11  

- fig_ex_A1B1C_apd.pdf = Figure 12  

