Determining the Sample Size for a Cluster-Randomised Trial Using Knowledge Elicitation: Bayesian Hierarchical Modelling of the Intracluster Correlation Coefficient

Tishkovskaya, Svetlana orcid iconORCID: 0000-0003-3087-6380, Sutton, Chris, Thomas, Lois Helene orcid iconORCID: 0000-0001-5218-6546 and Watkins, Caroline Leigh orcid iconORCID: 0000-0002-9403-3772 (2023) Determining the Sample Size for a Cluster-Randomised Trial Using Knowledge Elicitation: Bayesian Hierarchical Modelling of the Intracluster Correlation Coefficient. Clinical Trials . ISSN 1740-7745

[thumbnail of VOR]
PDF (VOR) - Published Version
Available under License Creative Commons Attribution.


Official URL:


The intracluster correlation coefficient is a key input parameter for sample size determination in cluster-randomised trials. Sample size is very sensitive to small differences in the intracluster correlation coefficient, so it is vital to have a robust intracluster correlation coefficient estimate. This is often problematic because either a relevant intracluster correlation coefficient estimate is not available or the available estimate is imprecise due to being based on small-scale studies with low numbers of clusters. Misspecification may lead to an underpowered or inefficiently large and potentially unethical trial.

We apply a Bayesian approach to produce an intracluster correlation coefficient estimate and hence propose sample size for a planned cluster-randomised trial of the effectiveness of a systematic voiding programme for post-stroke incontinence. A Bayesian hierarchical model is used to combine intracluster correlation coefficient estimates from other relevant trials making use of the wealth of intracluster correlation coefficient information available in published research. We employ knowledge elicitation process to assess the relevance of each intracluster correlation coefficient estimate to the planned trial setting. The team of expert reviewers assigned relevance weights to each study, and each outcome within the study, hence informing parameters of Bayesian modelling. To measure the performance of experts, agreement and reliability methods were applied.

The 34 intracluster correlation coefficient estimates extracted from 16 previously published trials were combined in the Bayesian hierarchical model using aggregated relevance weights elicited from the experts. The intracluster correlation coefficients available from external sources were used to construct a posterior distribution of the targeted intracluster correlation coefficient which was summarised as a posterior median with a 95% credible interval informing researchers about the range of plausible sample size values. The estimated intracluster correlation coefficient determined a sample size of between 450 (25 clusters) and 480 (20 clusters), compared to 500–600 from a classical approach. The use of quantiles, and other parameters, from the estimated posterior distribution is illustrated and the impact on sample size described.

Accounting for uncertainty in an unknown intracluster correlation coefficient, trials can be designed with a more robust sample size. The approach presented provides the possibility of incorporating intracluster correlation coefficients from various cluster-randomised trial settings which can differ from the planned study, with the difference being accounted for in the modelling. By using expert knowledge to elicit relevance weights and synthesising the externally available intracluster correlation coefficient estimates, information is used more efficiently than in a classical approach, where the intracluster correlation coefficient estimates tend to be less robust and overly conservative. The intracluster correlation coefficient estimate constructed is likely to produce a smaller sample size on average than the conventional strategy of choosing a conservative intracluster correlation coefficient estimate. This may therefore result in substantial time and resources savings.

Repository Staff Only: item control page