Introduction
    set.seed(12345) # to get stable results
    nPractitioners <- 15 # 15 practitioners
    valRate <- .55 # fixed recovery rate
    vecN <- sample(18:50, nPractitioners) # now create different dataset sizes
    ### simulate across those nPractitioners with sizes vecN and all same recovery rate, valRate
    vecNrecovered <- rbinom(length(vecN), vecN, valRate) # numbers recovered per practitioner
    vecRates <- vecNrecovered / vecN # convert to rates
    ### now get to percentages and format nicely
    vecPercRatesTxt <- paste0(sprintf("%4.1f", 100 * vecRates), "%")
The issue is that when you rank things to create a league table you must come out with “winners and losers” (unless everyone had exactly the same score/value on whatever it is that you are ranking). That doesn’t mean that there is any meaningful, systematic, replicable difference: in the extreme, there may be none even between the highest ranked and the lowest.
Here is a little simulation of 15 practitioners ranked on (though I don’t really approve of this!) the rates of clients achieving “reliable improvement”. Let’s say that after a few months the practitioners have seen these numbers of clients: 31, 36, 33, 43, 45, 41, 47, 28, 46, 19, 39, 44, 23, 24 and 27 and, simulating, have recovery rates of 64.5%, 41.7%, 54.5%, 58.1%, 42.2%, 51.2%, 53.2%, 57.1%, 52.2%, 52.6%, 61.5%, 54.5%, 47.8%, 79.2% and 63.0%. These can be ranked to create a league table.
    library(tidyverse) # for tibble(), the dplyr verbs and the pipe
    library(flextable) # for the formatted table

    tibble(PractID = 1:length(vecN), # create IDs for the practitioners
           n = vecN, # pull in their dataset sizes
           nRecovered = vecNrecovered, # and their recovered numbers
           RateRecovered = vecRates, # as rates
           tmp = 1 - vecRates, # useful to rank
           propnRecovered = sprintf("%4.2f", vecRates), # nice format of proportions
           ### and now the already nicely formatted percentage recovery rates
           percRecovered = vecPercRatesTxt) -> tmpTib

    tmpTib %>%
      ### sort on the numeric rate: sorting on the two decimal place text
      ### version can misorder rates that round to the same string
      arrange(desc(RateRecovered)) %>%
      mutate(position = rank(tmp)) %>%
      select(-c(tmp, RateRecovered)) %>%
      flextable() %>%
      autofit()
That looks like a pretty clear league table: only one tie and a huge spread of recovery rates from 79.2% down to 41.7%, so the best rate is almost twice the worst.
However, the reality is that the rates all arose from simulating sampling from a population in which the rate is .55, i.e. 55%. These differences are simply down to sampling vagaries.
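One way to get a feel for how much spread sampling alone can produce is to repeat the simulation many times. The sketch below is not from the original analysis: it assumes fresh dataset sizes drawn from 18 to 50 for each repeat, the same true rate of .55 for everyone, and an arbitrary threshold of 30 percentage points for a "large" gap. The names `nReps`, `simSpread` and `vecSpreads` are hypothetical.

```r
set.seed(12345) # for reproducibility of this sketch only
nPractitioners <- 15
valRate <- .55 # same true recovery rate for every practitioner
nReps <- 1000 # hypothetical number of repeated league tables

### simulate one league table and return the best rate minus the worst rate
simSpread <- function() {
  vecN <- sample(18:50, nPractitioners) # dataset sizes for this repeat
  vecRates <- rbinom(nPractitioners, vecN, valRate) / vecN
  max(vecRates) - min(vecRates)
}

vecSpreads <- replicate(nReps, simSpread())

### distribution of the best-to-worst gap across the simulated tables
summary(vecSpreads)
### proportion of tables with a gap of at least 30 percentage points
### (the .30 threshold is an arbitrary choice for illustration)
mean(vecSpreads >= .30)
```

Every one of these simulated tables has a "top" and a "bottom" practitioner even though, by construction, no practitioner is really any better than another.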
This can be shown by adding 95% confidence intervals (CIs) around the rates.
That makes it easy to see that the precision of estimation of any long term recovery rate for each of these practitioners is low, given the fairly low numbers of clients each had seen by the time the first league table was constructed. For only one of the practitioners (ID 14) does the 95% CI exclude the overall rate across all 526 clients seen by the 15 practitioners. That rate was 0.55 and its 95% CI was from 0.50 to 0.59, which reminds us that even with an n of 526 the precision of estimation of a rate is probably much lower, i.e. the 95% CI is much wider, than we might imagine.
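The interval check described above can be sketched in base R. This is not the code behind the original table or plot: it assumes exact (Clopper-Pearson) intervals from `binom.test()`, which may differ slightly from whichever CI method was actually used, and the recovered counts are reconstructed from the client numbers and rounded percentages quoted earlier. The names `matCI`, `overallRate`, `overallCI` and `vecExcludes` are hypothetical.

```r
### client numbers as quoted in the text
vecN <- c(31, 36, 33, 43, 45, 41, 47, 28, 46, 19, 39, 44, 23, 24, 27)
### recovered counts reconstructed from the quoted percentages (an assumption)
vecNrecovered <- c(20, 15, 18, 25, 19, 21, 25, 16, 24, 10, 24, 24, 11, 19, 17)

### exact 95% CI per practitioner: row 1 is lower limit, row 2 is upper limit
matCI <- mapply(function(x, n) binom.test(x, n)$conf.int,
                vecNrecovered, vecN)

### pooled rate across all clients and its exact 95% CI
overallRate <- sum(vecNrecovered) / sum(vecN)
overallCI <- binom.test(sum(vecNrecovered), sum(vecN))$conf.int

### which practitioners have intervals that exclude the pooled rate?
vecExcludes <- matCI[1, ] > overallRate | matCI[2, ] < overallRate
which(vecExcludes)

### pooled rate and its CI to two decimal places
round(c(rate = overallRate, lower = overallCI[1], upper = overallCI[2]), 2)
```

Under these assumptions the only interval that excludes the pooled rate should be that of practitioner 14, in line with the text. A different interval method (for example Wilson intervals) would shift the limits a little.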
Summary/moral
Beware league tables with small individual dataset sizes and without 95% CIs.
Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
Citation
For attribution, please cite this work as
Evans (2025, Feb. 23). Chris (Evans) R SAFAQ: League tables. Retrieved from https://www.psyctc.org/R_blog/posts/2025-02-23-league-tables/
BibTeX citation
@misc{evans2025league,
author = {Evans, Chris},
title = {Chris (Evans) R SAFAQ: League tables},
url = {https://www.psyctc.org/R_blog/posts/2025-02-23-league-tables/},
year = {2025}
}