Judging Bias in a Nutshell

This chart is a snapshot of psychological biases and other factors that can lead to inaccurate scores in gymnastics judging. Nearly all judges experience many of these at some point in their judging careers, with some factors being more conscious than others. When judges and rule-makers are more aware of when and how different biases are apt to occur, they can take steps to reduce and combat their effects.

 

The intention here isn't to place blame on judges, or to make excuses for scoring blunders, but to inform stakeholders of the paths for reducing inaccuracies. For more details on how each of these phenomena occurs, and how to reduce them, follow the links to the deep dive on each subject as we bring each of those articles online. Which ones would you like to see prioritized? Tell us through the feedback form at the bottom of the page.

Note: This is better viewed on the desktop site, rather than mobile.

Bias Name(s)

What is it?

More info or examples

Conscious or unconscious?

What can be (or has been) done about it?

What else should you know?

Nationality - Group affiliation bias

Showing favoritism to a particular nation, team, or group; or unjustly penalizing a rival nation, team, or group.

Cold War rivalries. Judges from Eastern Bloc communist nations giving higher scores to their own gymnasts and lower scores to gymnasts from Western capitalist countries, and vice versa.

Mostly conscious

Judges are not allowed to judge competitions/sessions for which they have a close affiliation.

The FIG Judges' Evaluation Programme (JEP) reviews judges' scores for this bias. Judges who are shown to have a consistent bias are sanctioned.

The first thing most people think of when they think of bias.

Reputation bias

Giving scores based on expectation of performance for a recognized or well-known gymnast.

When judges judge the gymnast, not the routine. One of the reasons why gymnasts get higher scores domestically than internationally is because they are better known within the country than abroad.

Can be conscious or subconscious

Awareness of when this tends to occur. 
Mental separation of the identity of the gymnast from the routine. 
Experience judging at higher levels of competitions.

Can influence the D score, with well-known gymnasts getting credit for poorly performed skills/connections that a lesser-known gymnast would not.

Home advantage

When judges give higher scores to the home team than it deserves.

NCAA competitions: when a team wins more of its competitions at home than away in a balanced schedule.

Can be conscious or subconscious

Judge assignments by a neutral party, not the host organization. If the host organization is "hiring" the judges then there is a perceived expectation for favoritism.

Home advantage also affects athlete performance (familiarity with the arena, equipment, etc.). Sometimes better home scores are justified.

Audience influenced bias

When the audience, coach, or team reactions influence a judge's score, or the judge gets "swept up in the moment", and fails to remain objective.

2004 Athens Olympics High Bar Final. Alexei Nemov performed an audience-wowing routine. The spectators protested his score, delaying the competition for nearly 20 minutes until the score was changed. The score was raised, but the final placement was not affected.

Mostly conscious

Greater physical distance between the spectators/coaches and officials.
Security measures for officials.
Judges scoring consistently with what they have written.

It is also possible that darkening the spectator seating in the arena (2015, 2017, & 2019 Worlds) helps to make judges less aware of audience reactions, but this has not been scientifically studied.

Difficulty bias

When judges are more lenient with E score deductions for gymnasts performing more difficult routines.

Difficulty bias was very prevalent under the 10.0 scoring system. In that system judges could not differentiate between routines based on difficulty, so they compensated with lighter judging on the execution of the routine.

Conscious rationalization

This has been greatly reduced at the international level by splitting D and E jury duties into different panels and going to an open-ended scoring system.

The 1964 MAG Code of Points actually institutionalized a difficulty bias, stating: "Series of value presenting great risks or marked originality will be judged more favorably, in the matter of small faults in execution than those lacking originality, risk and value."

Halo error

Judging (or making an adjustment in judging) based on an overall (global) impression instead of specific deductions.

After a routine is completed, a judge looks at their deductions and makes an alteration to the math because it "seemed" a little harsh or light.

Conscious rationalization

Halo errors are reduced by having specific judging criteria and fewer global deductions. Judges need to trust the deductions they have written.

This is a common bias for spectators to experience since they "judge" routines based on an overall impression instead of specific, detailed criteria for each element performed.

Memory bias

Assuming a skill is executed in a way you have previously seen it done, even though it may not have occurred that way in the judged competition.

Used as an argument for not permitting judges to watch podium training.

Subconscious in nature

Mentally separating the gymnast's identity from the element. Watching multiple gymnasts perform the same element lessens the association with one particular gymnast.

More common with less experienced judges who rely on heuristics (rules of thumb). They don't know where to focus on an element, and assume that it has "typical" errors.

Overall order bias

Judges sometimes suppress scores early in a competition to "leave room" for possibly better routines later on. Occurs over the scope of the event/day. Caused by fatigue or poor calibration.

Fewer gymnasts qualify for finals from the first subdivisions of Worlds than from any other sessions. (This holds for WAG; it does not hold true for MAG.)

Subconscious

Effective calibration. Judges having a general idea of the level of competition before the competition begins. This can be done through watching podium training. Take measures to avoid fatigue (frequent breaks, stay hydrated, stay as active as possible between rotations/sessions).

Typically thought of as "score inflation" as the competition progresses, but can go in a downward direction too.

(Direct) Sequential order bias / Within-team order bias

When judges expect routines to be better as the competition progresses and let that affect their observation of those routines.

Frequently happens in team competitions where the coach can select the order of the competitors, or in seeded competitions (e.g. AA finals or some World AA Cup events).

Subconscious

Random draw of athletes. Alternating athletes from different teams (as has been done at World and Olympic Team Finals since 2018).

Strategy is a part of sport, though gymnastics offers limited opportunities to employ it. A savvy coach who really understands how this bias functions can use it to their advantage beyond just having the best gymnasts go last or toward the end of the lineup.

Conformity bias

Conformity: The tendency to align one's behavior with that of those around them. Occurs most frequently in panel situations.

When judges align their scores to what they anticipate the other judges will give. Is stronger if judgments are difficult or if the judge is insecure about their judgments.

Semi-conscious

"Closed scoring systems" where judges are completely unaware of other judges' scores. (Although this is rarely possible, and judges still make adjustments based on what they anticipate/assume other judges will do.) Alleviating pressure on judges to score in an expected range. Pre-competition calibration as a panel.

Also includes when judges feel pressured to judge in a certain way. (e.g. a meet director is asking judges to judge a novice competition lighter, or to really differentiate between routines to avoid ties at a championship competition).

Implicit bias

Judging gymnasts according to stereotypes associated with a specific personal trait (e.g. race, religion, body type, etc.).

Judging heavier-set gymnasts more harshly than thin gymnasts.

Mostly subconscious

Judges' education. Review of scores to bring awareness to the issue. Punitive measures against judges who are aware of this bias but do not take measures to change.

No research studies have measured the effect of implicit bias in gymnastics judging.

Noise

A random variation in evaluations by different people, or at different times. Not a bias, but nevertheless can cause inaccurate scores.

Encompasses things from genuine mistakes to inaccuracies due to fatigue.

Subconscious

Awareness of when "noisy" situations tend to occur. Precautions to fend off fatigue. Using judging pairs or panels to double-check scores. Not mixing multiple levels in the same session.

We've written an article about this very topic!

Cheating

A conscious decision not to adhere to the rules in order to achieve a desired result.

Paying off judges for favorable scores. Score trading.

Post competition adjustment of scores.

Conscious

Selection of judges by a neutral organization.
Judges as paid professionals who can be fired for unethical behavior.
Use of the JEP to detect questionable activity.
Sanctions against offenders.

Goes against the Judge's Oath to "Officiate with complete impartiality, respecting and abiding by the rules… in the true spirit of sportsmanship."

Domestic scoring

When gymnasts receive higher scores at national competitions than they would at international competitions.

Can have multiple causes ranging from NGBs and coaches putting pressure on judges to give certain scores to some countries having limited numbers of FIG Brevet judges. Reputation bias is also very strong at domestic competitions due to how well known gymnasts are within their country.

Can be conscious or subconscious

Effective systems for selecting judges for national championship competitions.
Have some neutral judges at national championship competitions.
Oversight and review of scores by neutral persons.

Nations with few FIG Brevets rely on judges with little or no international experience to judge national championship competitions. In such cases the Brevet judges are usually the D jurors and the E jurors are those with less international experience, leading to a multitude of subconscious biases.

Inexperience

When a judge is judging at a level or competitive situation that they have not been in many times before.

Inexperience is different from incompetence. You can have a very competent judge who has not had the experience of judging certain levels of competition, and there can be judges with much experience who are still not competent (i.e. they have not kept up with rule changes).

Mostly subconscious

Pairing of judges to gain experience alongside a mentor judge.
Allowing judges to shadow judge at higher level competitions.

Can also affect judges who have been judging for several years, but find themselves judging at a higher level of competition for the first time (e.g., first time judges at National, Continental, or World Championship competitions).
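Several rows above mention reviewing judges' scores to detect bias. As a rough illustration only, the Python sketch below compares how far one judge's score sits from the median of the other panel judges, for gymnasts from the judge's own nation versus everyone else. This is not the FIG JEP's actual method, which is not described here; the names `routines`, `split_deviations`, and `nationality_gap`, the nation codes, and all scores are made up for this example.

```python
from statistics import mean, median

# Hypothetical execution scores. Each entry is one routine: the gymnast's
# nation and the score given by each panel judge. All values are invented.
routines = [
    {"nation": "AAA", "scores": {"J1": 8.7, "J2": 8.4, "J3": 8.5, "J4": 8.4}},
    {"nation": "BBB", "scores": {"J1": 8.2, "J2": 8.3, "J3": 8.2, "J4": 8.3}},
    {"nation": "AAA", "scores": {"J1": 9.0, "J2": 8.8, "J3": 8.7, "J4": 8.8}},
    {"nation": "CCC", "scores": {"J1": 8.5, "J2": 8.5, "J3": 8.6, "J4": 8.5}},
]


def split_deviations(routines, judge, judge_nation):
    """Return (own, other): the judge's deviations from the median of the
    OTHER judges, split by whether the gymnast shares the judge's nation."""
    own, other = [], []
    for routine in routines:
        others = [s for j, s in routine["scores"].items() if j != judge]
        deviation = routine["scores"][judge] - median(others)
        if routine["nation"] == judge_nation:
            own.append(deviation)
        else:
            other.append(deviation)
    return own, other


def nationality_gap(routines, judge, judge_nation):
    """Mean deviation for own-nation gymnasts minus mean deviation for others.
    Assumes the judge scored at least one routine in each group."""
    own, other = split_deviations(routines, judge, judge_nation)
    return mean(own) - mean(other)


# Judge J1 is assumed (for illustration) to be affiliated with nation "AAA".
print(round(nationality_gap(routines, "J1", "AAA"), 2))
```

A gap that stays positive across many routines would hint at favoritism, while a gap near zero with widely scattered deviations looks more like noise than bias. A handful of routines, as in this toy data, is far too few to conclude anything about a real judge.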

What is your experience? Do you have examples of when you have seen any of these phenomena occur?
(You can use the feedback form to share your experience.)