Back in 2002, US academic Richard Stiggins argued that if the US wished to maximise student achievement, it would need to pay far greater attention to improving classroom assessment. He went on to argue that both assessment of learning and assessment for learning were essential, and that while one was in place, the other was not.

It was a call that reverberated internationally, and in 2008, as part of the Melbourne Declaration, the Australian Ministers for Education agreed that assessment in Australia would be rigorous and comprehensive. It would draw on a combination of the professional judgements of teachers and testing, including national testing. The ink from the Ministers’ signatures could not have dried before NAPLAN was launched. But what happened to the rigorous and comprehensive use of teacher professional judgements?

Nearly a decade later, and after considerable research through the University of Western Australia (UWA), WA is leading the nation in reinstating the status of teacher professional judgement. The initiative was initially led by UWA and the WA Primary Principals’ Association (WAPPA), and supported by the WA Independent Primary School Heads of Australia (IPSHA). The School Curriculum and Standards Authority has now taken charge, and there is considerable collaboration across all education sectors in WA. At the risk of sounding twee, this is our story!

Nearly 15 years ago my colleague Dr Stephen Humphry and I began two broad, ongoing, large-scale research studies. The first examined anomalous patterns in the data obtained from the writing component of the WA Literacy and Numeracy assessment program. This work was central to the State’s decision to pull back from Student Outcome Statements, but it also led to the development of a marking guide which later became the template for the NAPLAN writing marking guide.

The second field of study is the one that excited us, and it is this work that led to the discovery that teachers make highly reliable judgements of students’ performance. With the help of WAPPA, we drew on this research to develop a two-stage assessment process that is accessible to classroom teachers. First, we calibrate student performances to develop assessment scales. Teachers then assess their students using the calibrated assessment scales.
To calibrate the scales, we:
1    We devise assessment tasks that elicit information about the developmental aspects of student performance, and we use these tasks to collect a large sample of student work.

2    A small team of dedicated teachers compares pairs of student performances, each time selecting the better of the two. In over 100 studies, from early childhood through to tertiary courses, we have found that teachers are highly consistent in judging the relative differences between pieces of student work. We typically obtain a reliability of 0.95, where 1 indicates perfect agreement between the judgements.

3    We analyse all the calibrated performances, examining qualitative differences between them, to develop performance descriptors or learning progressions. This work is not dissimilar to previous attempts at describing student development, such as the work done to develop the First Steps Continua. However, there is one essential difference. Empirical data from teacher judgements allow us to calibrate a very large sample of student work, and it is the calibrated work that is used to inform the writing of the descriptors.

4    The final step in developing a scale for teachers to use is the selection of a subset of performances which function as calibrated exemplars.
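For readers curious about how pairwise judgements become a scale, the steps above can be sketched in code. The source does not specify the statistical model used, so this is a minimal, illustrative sketch using a simple Bradley-Terry model, one well-known way of converting "A beat B" judgements into scale values; all names and data here are assumptions, not the actual Brightpath implementation.

```python
from collections import defaultdict
import math

def calibrate(comparisons, iterations=200):
    """Estimate a scale value for each performance from pairwise judgements.

    comparisons: list of (winner, loser) identifier pairs.
    Returns a dict of log-strength estimates (higher = better performance).
    Assumes every performance wins at least one comparison.
    """
    wins = defaultdict(int)    # total wins per performance
    games = defaultdict(int)   # number of comparisons per unordered pair
    items = set()
    for w, l in comparisons:
        wins[w] += 1
        games[frozenset((w, l))] += 1
        items.update((w, l))

    # Iterative minorisation-maximisation updates for the Bradley-Terry model.
    strength = {i: 1.0 for i in items}
    for _ in range(iterations):
        new = {}
        for i in items:
            denom = 0.0
            for j in items:
                if i != j:
                    n = games[frozenset((i, j))]
                    if n:
                        denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom if denom else strength[i]
        # Normalise so the geometric mean strength is 1 (fixes the scale origin).
        gm = math.exp(sum(math.log(v) for v in new.values()) / len(new))
        strength = {i: v / gm for i, v in new.items()}

    return {i: math.log(v) for i, v in strength.items()}
```

With judgements such as "A beat B nine times out of ten", the estimates order the performances consistently with the judges, which is the sense in which the comparisons "calibrate" the work onto a common scale.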

It is worth pausing for a moment to draw your attention to an important observation. The assessment scale and the learning progressions are developed concurrently. Traditionally, curriculum experts devise the curriculum or the learning progressions and then work is done to develop assessments of the curriculum or the progressions. In our work, the learning progressions and the assessment scales emerge simultaneously.

Once the assessment scales are developed:

1    Classroom teachers collect their students’ performances using typical classroom tasks.

2    They compare a student’s performance to the calibrated exemplars, deciding which exemplar the performance is closest to, or which two exemplars it falls between, to place their student on the scale.
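The second step amounts to reading a scaled score off the exemplars. A minimal sketch of that mapping, under the assumption (not stated in the source) that a performance judged between two exemplars is scored at their midpoint:

```python
def scaled_score(exemplars, closest=None, between=None):
    """Convert a teacher's judgement against calibrated exemplars into a score.

    exemplars: dict mapping exemplar id -> calibrated scale score.
    Pass either closest=<id> (the performance matches that exemplar) or
    between=(lower_id, upper_id) (the performance falls between the two).
    """
    if closest is not None:
        return exemplars[closest]
    lo, hi = between
    # Illustrative choice: score midway between the two exemplars.
    return (exemplars[lo] + exemplars[hi]) / 2
```

Because the exemplars carry calibrated scale values, the teacher's qualitative judgement ("closest to E2", "between E2 and E3") yields a quantitative score without the teacher ever handling numbers directly.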

Software named Brightpath has been developed to make the assessment process readily available to schools. A particular advantage of the software is that the judgements, or scores, are captured in reports as teachers work. Virtually instantaneously, teachers and school leaders can access comprehensive reports about student performance. Figure 1 shows the assessment facility, and Figures 2 and 3 show the types of reports that can be provided because a scaled score is obtained from the teachers’ judgements.

Nearly 200 primary schools across all three sectors are now using Brightpath, and in 2017 all WA primary schools will be invited to use it. It is anticipated that from 2018 lower secondary schools will also have access to the assessment scales (or rulers, as they are called in Brightpath).

For the first time, teacher judgements can be used to evaluate student growth in learning and to evaluate school programs in much the same way that NAPLAN data are used. Teachers are also provided with grades derived from the empirical data on student performance, which is a significant step for primary education in Australia.

As part of the licence agreement with UWA, we will continue to research the assessment process, and our particular focus for the next year will be investigating the extent to which the assessment process leads to more effective teaching. Early evidence is very promising, but it is important that we study this closely. After all, this is our opportunity to ensure that classroom assessment does in fact maximise student achievement.