Algorithms Alone Can’t Solve Complex Human Problems – UK Exam Results U-Turn

Exam results decided by an Ofqual algorithm for up to 97% of A-level and GCSE students in England will now be scrapped, and the projected ‘mock’ grades decided by individual teachers will be reinstated, in a humiliating U-turn for the government.

The controversial algorithm, implemented to give standardized grades to students in lieu of exams, marked down approximately 40% of all A-level results and disproportionately affected students from poorer backgrounds and state schools compared to privately educated students.

WHAT WENT WRONG?
If there were ever going to be a suitable algorithm for accurately and fairly predicting exam results, it would have to be trained on years' worth of personal educational data and account for a huge number of mitigating factors, particularly considering the erratic emotional state of your average A-level student. Without access to this pool of granular training data, Ofqual's algorithm instead relied on the historical grades of individual test centers, the rankings of students given by test centers and teachers, and previous exam results. Teachers' projected grades (which will now stand as the official grades for both A-level and GCSE results) were also used, but only as a secondary filter after the school's historical attainment, and there were massive discrepancies between the projected results and those decided by the algorithm.

The controversy arose because teachers' grades and individual past attainment were far from the primary factors influencing the algorithm's decisions. Instead, the two most important inputs were the grade distribution of a student's test center between 2017 and 2019, and the 'ranking' of each student, that is, the test center's and teacher's estimate of how that student would perform relative to their peers. Only after these two factors was a student's own past performance brought into the prediction, with A-level students judged on their GCSE performance, and GCSE students on the Key Stage 2 assessments taken at the age of eleven.

This structure meant that grades were calculated primarily from the previous attainment of a student's class or subject, with the individual's grade determined by their position in the ranking: if the ranking placed you in the middle of the class, you received a grade equivalent to last year's middle of the class, with previous results factored in only to adjust for each individual. Using a child's previous exam results may have gone some way towards more personalized predictions, but using them only to fine-tune the final grade shows how much more influential this environmental, peer-based approach was, and it does not account for the changes that any human goes through between the ages of 11 and 17.
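To make that mechanism concrete, here is a minimal sketch in Python of how a rank-based standardization of this kind could work. The function name, the example distribution and the rounding details are illustrative assumptions, not Ofqual's published model.

```python
# Illustrative sketch of rank-based standardization; an assumption of how such a
# model could work, not Ofqual's published specification.

def grades_from_history(historical_shares: dict[str, float],
                        ranked_students: list[str]) -> dict[str, str]:
    """Re-impose a centre's 2017-2019 grade distribution on the current cohort,
    slotting students in by their teacher-supplied rank (best-ranked first)."""
    n = len(ranked_students)
    slots: list[str] = []
    for grade, share in historical_shares.items():  # keys ordered best to worst
        slots.extend([grade] * round(share * n))
    # Pad or trim any rounding drift with the lowest grade.
    slots = (slots + [list(historical_shares)[-1]] * n)[:n]
    return dict(zip(ranked_students, slots))


# Hypothetical centre: 20% A, 30% B, 30% C, 20% D over 2017-2019.
history = {"A": 0.20, "B": 0.30, "C": 0.30, "D": 0.20}
ranking = [f"student_{i:02d}" for i in range(1, 11)]  # rank 1 = strongest
print(grades_from_history(history, ranking))
# students 01-02 get A, 03-05 get B, 06-08 get C, 09-10 get D
```

What the sketch makes explicit is that no individual input appears in the main assignment step: only the rank and the centre's history do, with prior attainment then used to nudge the result.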

This is concerning for a number of reasons:

  • Calculating grades based on the historical attainment of one test center assumes that location has a profound effect on grades, implying that students from historically disadvantaged backgrounds are incapable of achieving higher grades than previous years and that students from higher-achieving schools will always get better results (the short example after this list shows this cap in action),
  • It assumes a normal distribution of grades throughout a child’s educational career and a linear progression of every student within a class or subject area (on the contrary, research suggests that C students have greater long-term success than A students), and
  • It perpetuates the already biased educational framework that means smaller classes get better results, precisely the kind of bias that this approach towards standardization was purported to fix.
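A standalone example with made-up numbers shows the cap in action: a model that re-imposes the historical distribution cannot award more top grades than the centre's past, however strongly teachers rate the current cohort.

```python
# Made-up numbers: a centre that historically awarded 10% A grades cannot yield
# more than ~10% A grades this year, regardless of teachers' assessments.
teacher_assessed = ["A"] * 8 + ["B"] * 12             # teachers project 40% As
historical_shares = {"A": 0.1, "B": 0.4, "C": 0.5}     # centre's 2017-2019 pattern

n = len(teacher_assessed)
slots = []
for grade, share in historical_shares.items():
    slots.extend([grade] * round(share * n))

print("Teacher-assessed A grades:", teacher_assessed.count("A"))  # 8
print("Algorithm-capped A grades:", slots.count("A"))             # 2
```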

SMALL SUBJECTS, A BIG ADVANTAGE
According to data analysis by the social mobility charity UpReach, the “small subject” effect alone led to a 4.7% increase in A and A* grades at private schools, compared with 0.3% at Sixth Form Colleges. It also meant that Sixth Form College students were 20% more likely to have their teacher-assessed grade downgraded than privately educated students, with some Sixth Form students receiving results up to six grades below their projections. Smaller classes tend to produce a narrower grade distribution, since a small class is likely to contain a narrower range of abilities; the extra attention afforded to each student is also touted as a major benefit of the smaller classes found in private schools. Smaller classes likewise appear more frequently in niche subjects such as Latin that are not necessarily taught in state schools, so including these subjects contradicts the idea of ‘standardizing’ grades across schools that do not share the same curriculum.
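The mechanism behind the “small subject” effect was that the statistical model was relaxed for very small entry groups, which leaned instead on teacher-assessed grades. The sketch below illustrates such a fallback rule; the thresholds and the linear blend are assumptions for illustration, not Ofqual's exact published parameters.

```python
# Illustrative fallback for small entry groups. The thresholds (5 and 15) and the
# linear taper are assumptions for illustration, not Ofqual's published parameters.

def teacher_grade_weight(class_size: int, full_teacher: int = 5, full_model: int = 15) -> float:
    """Weight (0..1) given to the teacher-assessed grade: tiny classes rely
    entirely on teachers, large classes entirely on the historical model."""
    if class_size <= full_teacher:
        return 1.0
    if class_size >= full_model:
        return 0.0
    return (full_model - class_size) / (full_model - full_teacher)


# A Latin set of four at a private school vs. a Sociology class of 28 at a college.
print(teacher_grade_weight(4))   # -> 1.0  (teacher grade stands)
print(teacher_grade_weight(28))  # -> 0.0  (historical distribution decides)
```

Because independent schools are far more likely to run entry groups below the lower threshold, a rule of this shape routes their students towards the generally more generous teacher assessments.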

Classical subjects also saw 10.4% more A* or A grades than the prior year, whereas subjects more commonly taught at Sixth Form colleges, such as Psychology and Sociology, saw a meager increase of between 0.2% and 0.5%. This “rampant grade inflation”, as UpReach calls it, is another example of the discriminatory bias inherent in the algorithm, and it implies a level of trust in private school teachers that is not afforded to sixth form teachers in the state system, despite evidence suggesting that private schools do not necessarily provide a better education. This classist sentiment is evident in UpReach’s findings: in Latin, 97.4% of schools had teacher-assessed grades factored in, whereas in Sociology only 35.2% of schools had teachers’ projections counted rather than just their rankings. Considering that around 30 times more students studied Sociology A-level at Sixth Form and FE colleges than at private schools, and that 70% of the schools offering Latin and History of Art at A-level were independent schools, this issue of trust and equity is thrown into stark relief.

TRUSTING TOO MUCH
Until such time as we can train an algorithm on all the data it needs, factoring in every possible circumstance and mitigating factor, humans will always be needed to temper the raw calculations of a robot, and we will need to put our own biases aside in order to trust those adjustments.

What we can learn from this situation is that for AI to be beneficial to everyone, we need to make sure that the humans deciding when to rely on statistics and when to trust human instinct can also be trusted to act in everyone’s best interests. For this generation, about to enter the UK workforce in earnest, it will take a lot to win that trust back.

originally posted on forbes.com by Charles Towers-Clark