We want to communicate how effective each strategy in our evidence toolkit is, so that decision makers can quickly see what does and doesn't work.
The cut-offs for, and interpretations of, effect sizes in INSPIRE are taken from Funder and Ozer (2019) and are listed below. When an evidence summary draws on more than one meta-analysis, we report the main effect from the meta-analysis with the highest quality rating.
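The selection rule in the last sentence can be sketched in Python as follows. `MetaAnalysis`, its fields, and `reported_effect` are hypothetical names used only for illustration, and the quality score is assumed to be the count of ➕ icons from the quality rating. The text does not say how ties are broken; this sketch assumes the first-listed meta-analysis wins, which is what Python's built-in `max` does when several items share the top score.

```python
from dataclasses import dataclass


@dataclass
class MetaAnalysis:
    """Hypothetical record for one meta-analysis in an evidence summary."""
    name: str
    quality: int      # quality rating as a count of ➕ icons (1 to 5)
    effect_d: float   # main effect, expressed as Cohen's d


def reported_effect(analyses: list[MetaAnalysis]) -> float:
    """Return the main effect from the highest-quality meta-analysis.

    Assumption: ties go to the first-listed meta-analysis, because max()
    returns the first maximal element it encounters.
    """
    if not analyses:
        raise ValueError("an evidence summary needs at least one meta-analysis")
    best = max(analyses, key=lambda m: m.quality)
    return best.effect_d
```

For example, given one meta-analysis rated ➕➕➕ with d = 0.35 and another rated ➕➕➕➕ with d = 0.22, this sketch reports 0.22.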
➕ Indicates an effect that is likely inconsequential in both the short and the long run (Cohen's d < 0.1 or equivalent)
➕➕ Indicates an effect that is very small for the explanation of single events but potentially consequential in the not-very-long run (Cohen's d = 0.1 or equivalent)
➕➕➕ Indicates an effect that is still small at the level of single events but potentially more ultimately consequential (Cohen's d = 0.2 or equivalent)
➕➕➕➕ Indicates a medium-sized effect that is of some explanatory and practical use even in the short run, and therefore even more important (Cohen's d = 0.4 or equivalent)
➕➕➕➕➕ Indicates a large effect that is potentially powerful in both the short and the long run (Cohen's d > 0.6 or equivalent)
For negative effects, each ➕ icon is replaced with a ➖ icon (e.g., if d = −0.2, the rating appears as ➖➖➖).
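A minimal sketch of this mapping, assuming each d value listed above is the lower bound of its band (so 0.1 ≤ |d| < 0.2 earns ➕➕ and |d| ≥ 0.6 earns ➕➕➕➕➕). The list itself gives single anchor values, so these exact band edges are an assumption. For the "or equivalent" case, the sketch also includes `r_to_d`, one common conversion from a correlation r to Cohen's d, d = 2r / √(1 − r²). All function and constant names are hypothetical.

```python
import math

# Assumed band edges: each Cohen's d anchor in the list is treated as the
# lower bound of its band (e.g. 0.1 <= |d| < 0.2 earns two icons).
BAND_LOWER_BOUNDS = (0.1, 0.2, 0.4, 0.6)


def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d using d = 2r / sqrt(1 - r**2)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2 * r / math.sqrt(1 - r ** 2)


def effect_icons(d: float) -> str:
    """Return the ➕/➖ effect-size rating for a Cohen's d value."""
    size = abs(d)
    # One icon below 0.1, plus one for every band lower bound that |d| reaches.
    count = 1 + sum(size >= edge for edge in BAND_LOWER_BOUNDS)
    icon = "➖" if d < 0 else "➕"
    return icon * count
```

Under these assumptions, `effect_icons(-0.2)` returns ➖➖➖, matching the example above, and `effect_icons(r_to_d(0.30))` returns ➕➕➕➕➕ because r = 0.30 converts to d ≈ 0.63.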
For quality ratings, the question we want to answer is: "How sure are we that this works as well as we say it does?"
We draw on two established approaches to assessing quality, GRADE and AMSTAR 2, and have combined them so that our quality ratings are robust, relevant to educational research, and transparent.
All meta-analyses start with a rating of ➕➕➕➕➕.
They lose points for each of the following problems:
1. The meta-analysis is at high risk of bias because it meets fewer than 4 of the following 7 criteria:
   1. The review question and inclusion criteria were focused and based on PICO (Population, Intervention, Comparison, Outcome)
   2. The included study designs were clearly specified and explained
   3. Excluded articles were clearly specified and justified
   4. The literature search was comprehensive
   5. Articles were screened and selected in duplicate
   6. Data extraction was performed in duplicate
   7. All included articles were described in adequate detail
2. The number of included studies is low (k < 5), or the confidence interval is wide (extending 0.3 or more on either side of d, i.e., d ± 0.3)
3. The reliability of the effect sizes is poor (specifically, there is unexplained heterogeneity with I² ≥ 0.75, i.e., 75%)
4. The meta-analysis only looked at one sample (e.g., only psychology students, or only postgraduates), so its findings might not generalise to other samples
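The scoring rule above can be sketched as follows. It assumes each problem costs exactly one ➕ (the text says only that meta-analyses "lose points"), treats a confidence interval as wide when its half-width is at least 0.3 (assuming a roughly symmetric interval around d), and represents whether heterogeneity is explained as a yes/no judgment supplied by the reviewer. `QualityInputs`, its fields, and the function names are hypothetical. `i_squared_from_q` uses the standard Higgins and Thompson formula I² = (Q − df) / Q, with df = k − 1 and negative values set to 0.

```python
from dataclasses import dataclass


@dataclass
class QualityInputs:
    """Hypothetical summary of one meta-analysis for quality scoring."""
    criteria_met: int              # how many of the 7 risk-of-bias criteria are met
    k: int                         # number of included studies
    ci_lower: float                # lower confidence bound for d
    ci_upper: float                # upper confidence bound for d
    i_squared: float               # I² as a proportion (0 to 1)
    heterogeneity_explained: bool  # reviewer's judgment, e.g. after moderator analyses
    single_sample: bool            # True if only one kind of sample was studied


def i_squared_from_q(q: float, k: int) -> float:
    """I² = (Q - df) / Q with df = k - 1, set to 0 when Q <= df."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q)


def quality_icons(m: QualityInputs) -> str:
    """Start at five ➕ and drop one for each problem found (assumed scoring)."""
    half_width = (m.ci_upper - m.ci_lower) / 2  # assumes a roughly symmetric CI
    problems = [
        m.criteria_met < 4,                                     # high risk of bias
        m.k < 5 or half_width >= 0.3,                           # few studies or wide CI
        m.i_squared >= 0.75 and not m.heterogeneity_explained,  # unexplained heterogeneity
        m.single_sample,                                        # may not generalise
    ]
    return "➕" * (5 - sum(problems))
```

Because there are four possible problems and every meta-analysis starts at five, the lowest possible rating under this sketch is ➕.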