How we write summaries

Literature selection

Our aim

We want to communicate the best available evidence in higher education. We search exclusively for meta-analyses because they are the most systematic and robust way to answer 'big' questions, but we acknowledge the limitations of this kind of research too.

How we do it

We search databases for meta-analyses in higher education (or meta-analyses that include moderators with higher education samples) in which experimental (including quasi-experimental) findings have been summarised. We prioritise research that uses learning or academic performance as the outcome, but also consider related outcomes such as student engagement, student satisfaction, and student motivation. We also prioritise research topics that are relevant to a wide audience and multiple disciplines.

Composing summaries

Our aim

We want to provide digestible summaries of the best available evidence so busy educators can apply robust research findings quickly.

How we do it

We have a standardised format for each summary that begins with, at most, three applications based on the evidence. We prioritise the best quality meta-analyses from each topic when deciding on which applications to include in the summary. The rest of the summary includes information to deepen knowledge and understanding of the topic, including information about the topic, more detail about the research findings, reference to relevant underlying theories and frameworks, and a description of the quality of the research that makes up the summary. To ensure consistency across the summaries, educational researchers work with a purpose-built AI agent to compose the summaries. Humans select the research and do a first-pass summary before comparing their work to the output from the AI agent. This ensures human experts are driving the work and their work is being validated by a trained AI tool.

Impact

Our aim

We want to be able to communicate how effective each strategy in our evidence toolkit is so decision makers can quickly determine what does and doesn't work.

How we do it

The cut-offs for, and interpretations of, effect sizes for INSPIRE are taken from the work of Funder and Ozer (2019). Information about these cut-offs and interpretations is below. When more than one meta-analysis is used in an evidence summary, we report the main effect from the meta-analysis rated as having the highest quality.

Cut-offs and interpretations

➕ Indicates a likely inconsequential effect in the short and long run (Cohen's d < 0.1 or equivalent)
➕➕ Indicates an effect that is very small for the explanation of single events but potentially consequential in the not-very-long run (Cohen's d = 0.1 or equivalent)
➕➕➕ Indicates an effect that is still small at the level of single events but potentially more ultimately consequential (Cohen's d = 0.2 or equivalent)
➕➕➕➕ Indicates an effect of medium size that is of some explanatory and practical use even in the short run and therefore even more important (Cohen's d = 0.4 or equivalent)
➕➕➕➕➕ Indicates an effect that is large and potentially powerful in both the short and the long run (Cohen's d > 0.6 or equivalent)

For negative effects, the ➕ icons are replaced with the equivalent number of ➖ icons (e.g., if d = -0.2, the rating will appear as ➖➖➖).
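
The mapping from an effect size to an icon rating can be sketched in a few lines of Python. This is a minimal illustration, not the INSPIRE implementation. The function name `impact_rating` is hypothetical, and the sketch assumes that each listed value of d (0.1, 0.2, 0.4) is the lower bound of its band, so that, for example, any d from 0.4 up to and including 0.6 receives four icons. The cut-offs above do not state how values between the listed points are handled, so that reading is an assumption.

```python
def impact_rating(d: float) -> str:
    """Return an icon rating for a Cohen's d value.

    Assumption (not stated in the source): each listed cut-off is the
    lower bound of its band, and the sign of d only changes the icon.
    """
    magnitude = abs(d)

    if magnitude > 0.6:
        icons = 5  # large effect
    elif magnitude >= 0.4:
        icons = 4  # medium effect
    elif magnitude >= 0.2:
        icons = 3  # small but potentially consequential effect
    elif magnitude >= 0.1:
        icons = 2  # very small effect
    else:
        icons = 1  # likely inconsequential effect

    # Negative effects use the minus icon, as described above.
    symbol = "➖" if d < 0 else "➕"
    return symbol * icons


print(impact_rating(-0.2))  # the example from the text: three minus icons
print(impact_rating(0.45))
```

Keeping the sign separate from the magnitude means the same bands apply to positive and negative effects, which matches the description of the ➖ icons above.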

Quality

Our aim

What we want to know is "how sure are we that this works as well as we say it does?"

How we do it

We use two established approaches to assessing quality, the GRADE and AMSTAR2 approaches. We’ve combined these approaches to make our quality ratings robust, relevant for educational research, and transparent.

Approach

All meta-analyses start with ➕➕➕➕➕
They lose points for the following problems:
1. The included studies are at high risk of bias, based on meeting fewer than 4 of the following criteria:
      1. Included a focused question and inclusion criteria based on PICO
      2. Included study designs were clearly specified and explained
      3. Excluded articles were clearly specified and justified
      4. The literature search was comprehensive
      5. Articles were screened and selected in duplicate
      6. Data extraction was performed in duplicate
      7. All included articles were described in adequate detail
2. The number of included studies is low (k < 5) or the confidence intervals are wide (width = d ± 0.3)
3. The reliability of the effect sizes is poor (specifically, there is unexplained heterogeneity, with an I² greater than or equal to 0.75)
4. The study only looked at one sample (e.g., only psychology students; only postgraduates) and so might not generalise to other samples
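
The deduction approach above can be sketched as follows. This is a hedged illustration rather than the actual scoring tool. The names `MetaAnalysis` and `quality_rating` are hypothetical, and three details are assumptions because the text does not specify them: each problem costs exactly one ➕, a "wide" confidence interval means a half-width of 0.3 or more around d, and I² is expressed as a proportion between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class MetaAnalysis:
    criteria_met: int      # how many of the 7 risk-of-bias criteria are met
    k: int                 # number of included studies
    ci_half_width: float   # half-width of the confidence interval around d
    i_squared: float       # heterogeneity, as a proportion (0 to 1)
    single_sample: bool    # True if only one type of sample was examined


def quality_rating(m: MetaAnalysis) -> str:
    """Start at five icons and deduct one per problem (assumed penalty)."""
    points = 5

    # Problem 1: high risk of bias (fewer than 4 of the 7 criteria met).
    if m.criteria_met < 4:
        points -= 1

    # Problem 2: few studies, or wide confidence intervals
    # (assumed here to mean a half-width of 0.3 or more).
    if m.k < 5 or m.ci_half_width >= 0.3:
        points -= 1

    # Problem 3: substantial unexplained heterogeneity.
    if m.i_squared >= 0.75:
        points -= 1

    # Problem 4: a single sample type, which limits generalisability.
    if m.single_sample:
        points -= 1

    return "➕" * points


example = MetaAnalysis(criteria_met=6, k=12, ci_half_width=0.15,
                       i_squared=0.80, single_sample=False)
print(quality_rating(example))  # loses one icon for heterogeneity only
```

Under these assumptions the lowest possible rating is a single ➕, because there are four problems and five starting icons.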