Today we're publishing the results of a NESTA-funded research project we carried out in 2019-20 to compare assessing primary writing with Comparative Judgement (CJ) and the Teacher Assessment Framework (TAF).
We compared the grades teachers gave to 349 pieces of Year 6 writing. First, we got a panel of experienced teachers led by a local authority moderator to repeatedly grade each piece of writing using the TAF. Then we assessed each piece of writing using CJ.
We graded pieces with one of the three TAF grades: Working Towards, Expected Standard, Greater Depth. We called these the three 'wide' grades. We also subdivided each grade into three, to allow us to assess accuracy more precisely. We called these the nine 'fine' grades.
Q1: How reliable is each assessment method? We found that for any individual piece of writing, the chance that two teachers using the TAF would agree on the wide grade was 64%. For CJ, the agreement was 86%.
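One way to picture an agreement figure like this is as the share of marker pairs who gave a script the same grade. The sketch below is only an illustration of that idea: the function name `pairwise_agreement`, the pooling across scripts and the example grades are all assumptions, not the study's actual method or data.

```python
from itertools import combinations


def pairwise_agreement(grades_by_script):
    """Share of marker pairs that gave the same grade, pooled over all scripts."""
    agree = 0
    total = 0
    for grades in grades_by_script.values():
        # Compare every pair of markers who graded this script.
        for first, second in combinations(grades, 2):
            agree += first == second
            total += 1
    if total == 0:
        raise ValueError("need at least one script with two or more grades")
    return agree / total


# Made-up wide grades for illustration only, not data from the study.
example = {
    "script_1": ["EXS", "EXS", "WTS"],
    "script_2": ["GDS", "GDS", "GDS"],
}
rate = pairwise_agreement(example)
print(f"{rate:.0%} of marker pairs agreed")
```

Here four of the six marker pairs agree, so the rate is about 67%.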
We calculated the size of these disagreements using the nine fine grades. With the TAF, the disagreement between two markers was +/- 2 fine grades. A script given EXS-B by marker 1 could get anything from WTS-A to GDS-C from marker 2. With CJ, the disagreement was +/- 1 fine grade.
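To make the +/- 2 example concrete, here is a small sketch that lays the nine fine grades out on a single scale. The full label list and its ordering (C lowest, A highest within each wide grade) are inferred from the EXS-B example above rather than stated in the study, and `grade_range` is a hypothetical helper.

```python
# Assumed ordering of the nine fine grades, lowest to highest.
FINE_GRADES = [
    "WTS-C", "WTS-B", "WTS-A",
    "EXS-C", "EXS-B", "EXS-A",
    "GDS-C", "GDS-B", "GDS-A",
]


def grade_range(grade, spread):
    """Lowest and highest fine grade within `spread` steps of `grade`."""
    position = FINE_GRADES.index(grade)
    # Clamp at the ends of the scale so the range never runs off the list.
    lowest = max(0, position - spread)
    highest = min(len(FINE_GRADES) - 1, position + spread)
    return FINE_GRADES[lowest], FINE_GRADES[highest]


taf_range = grade_range("EXS-B", 2)  # TAF: +/- 2 fine grades
cj_range = grade_range("EXS-B", 1)   # CJ: +/- 1 fine grade
print(taf_range, cj_range)
```

With a spread of two, EXS-B stretches across all three wide grades; with a spread of one it stays inside Expected Standard.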
Q2: How efficient is each method? You can make the TAF more reliable by aggregating the grades of lots of teachers. However, this increases the time taken.
You would need to aggregate four separate individual TAF grades – that is, to quadruple-mark each piece of writing – in order to reach the level of reliability equivalent to that of a single Comparative Judgement session. This would take twice as long as a CJ session.
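As a rough sketch of what aggregating four grades could look like, the code below takes the median position on the fine-grade scale. The study refers to a median-TAF grade but does not say how a tie between two middle grades is resolved, so the use of `median_low` (taking the lower of the two) is an assumption, as are the grade ordering and the example grades.

```python
from statistics import median_low

# Assumed ordering of the nine fine grades, lowest to highest.
FINE_GRADES = [
    "WTS-C", "WTS-B", "WTS-A",
    "EXS-C", "EXS-B", "EXS-A",
    "GDS-C", "GDS-B", "GDS-A",
]


def aggregate_grades(grades):
    """Combine several markers' fine grades into one by taking the median."""
    positions = [FINE_GRADES.index(grade) for grade in grades]
    # median_low always returns one of the observed positions, so the
    # result is a real grade even when the number of markers is even.
    return FINE_GRADES[median_low(positions)]


# Made-up grades from four markers for one script, for illustration only.
four_markers = ["EXS-C", "EXS-B", "EXS-A", "GDS-C"]
combined = aggregate_grades(four_markers)
print(combined)
```

No single marker here decides the outcome: the one Greater Depth grade is outweighed by the three Expected Standard grades.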
Q3: How valid is each method? If we assume the average TAF grade is the most reliable one, then this grade does agree quite closely with the CJ grade. 79% of scripts got the same wide grade with both methods. 92% were within one fine grade of each other.
This suggests teachers apply a broadly similar understanding of good writing when they use CJ as when they use the TAF. However, further analysis of script content would be useful here: there are some differences in grades and it may be they reflect slightly different emphases.
Perhaps the real insight is the power of aggregation. No one individual marker or judger holds the true standard within their head, but by combining their judgements we can reach a consensus. This is true whether we are using the TAF or CJ - but it is quicker with CJ!
We have also published the full set of 349 scripts together with their median-TAF grade, every grade they were given by the individual teachers and the CJ grade. This is a really valuable resource for all primary teachers. https://observablehq.com/@nmm/the-reliability-of-grading-writing-using-the-teacher-asses
So far as we know, it is the most extensive and rigorous exemplification exercise carried out on the new TAF. It consists of writing by real children from representative schools, completed in consistent conditions and then assessed many times by experienced, trained teachers.
Over the next few days we will also blog about some more of the detail from this study. Enjoy!
You can follow @daisychristo.