
Author Topic: Nats judging scores.  (Read 1220 times)

Online Dick Byron

  • Vendor
  • Trade Count: (0)
  • Captain
  • ***
  • Posts: 516
Nats judging scores.
« on: March 11, 2022, 10:38:17 AM »
I am just wondering. What would be normal or abnormal in the spread between judges on the same circle. Example, (judge one scores 500) (judge 2 scores 420). Just wondering.
Dick Byron

Online EricV

  • Trade Count: (0)
  • Lieutenant
  • ***
  • Posts: 120
Re: Nats judging scores.
« Reply #1 on: March 11, 2022, 11:45:40 AM »
I am just wondering. What would be normal or abnormal in the spread between judges on the same circle. Example, (judge one scores 500) (judge 2 scores 420). Just wondering.
Dick Byron

Hi Dick,

Since I have the least to lose (I'm not competing these days), I'll be brave and answer your question.

There is no "Normal or Abnormal" spread on scores. It is what it is. As long as the low judge is consistently low on whatever his particular bailiwick is that you do wrong, and equally harsh on that same thing with all the other pilots, then he is doing fine.

Now, a pilot can still get a high score from that low judge if they don't "trigger" him in the area he is harsh on, but that is not to say he is "wrong".  Consistency is key, and just looking at one or two other scores never tells the whole story. The reverse is true of the high judge as well: he may reward a particular thing well done more than others do. Consistency is the key again.

Trust me, I went down this rabbit hole once, and I only got better scores when I just looked at my own issues to fix and ignored everything else.

Hope that helps,
EricV

Offline Howard Rush

  • 22 supporter
  • Trade Count: (0)
  • Admiral
  • *
  • Posts: 7812
Re: Nats judging scores.
« Reply #2 on: March 11, 2022, 12:58:12 PM »
Eric is right, as usual.  Paul Walker and Charles Buffalano came up with a method (attached) for evaluating judge performance that compares each judge's ranking of contestants (not scores) to the scoreboard ranking.  Bill Lee did something similar; it gives similar results. 
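
For anyone curious what that kind of comparison looks like mechanically, here is a rough Python sketch of the general idea (made-up scores; this is not the actual Walker/Buffalano spreadsheet, which is in the attachment): rank each judge's raw scores, rank the combined scoreboard, and see how far the two orderings disagree.

# Rough sketch only - hypothetical scores, not the Walker/Buffalano method itself.
def placings(scores):
    # 1 = best; ties broken by input order for simplicity
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for place, idx in enumerate(order, start=1):
        ranks[idx] = place
    return ranks

# rows = judges, columns = pilots A..E (invented numbers)
judge_scores = [
    [520, 505, 498, 480, 460],   # judge 1
    [460, 470, 450, 430, 410],   # judge 2: lower overall, nearly the same order
]
scoreboard = [sum(col) for col in zip(*judge_scores)]
board = placings(scoreboard)

for j, scores in enumerate(judge_scores, start=1):
    mine = placings(scores)
    off = sum(abs(a - b) for a, b in zip(mine, board))
    print(f"judge {j}: placings {mine}, disagreement with scoreboard = {off}")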


The Jive Combat Team
Making combat and stunt great again

Offline Sean McEntee

  • Trade Count: (0)
  • Captain
  • *****
  • Posts: 873
Re: Nats judging scores.
« Reply #3 on: March 12, 2022, 08:59:42 AM »
With all of the talk in MLB about robotic umpires, imagine if we were able to have robotic judging.  LL~

Offline Tim Wescott

  • 2016 supporter
  • Trade Count: (0)
  • Admiral
  • *
  • Posts: 12808
Re: Nats judging scores.
« Reply #4 on: March 12, 2022, 09:44:04 AM »
With all of the talk in MLB about robotic umpires, imagine if we were able to have robotic judging.  LL~

And then robotic pilots, and robotic spectators!  We could sit around and watch the Nats on TV, while robots cursed and threw beer cans at the screen!
AMA 64232

The problem with electric is that once you get the smoke generator and sound system installed, the plane is too heavy.

Online Ken Culbertson

  • 24 supporter
  • Trade Count: (0)
  • Admiral
  • *
  • Posts: 6131
Re: Nats judging scores.
« Reply #5 on: March 12, 2022, 10:05:21 AM »
Eric is right, as usual.  Paul Walker and Charles Buffalano came up with a method (attached) for evaluating judge performance that compares each judge's ranking of contestants (not scores) to the scoreboard ranking.  Bill Lee did something similar; it gives similar results.
This was a big deal locally in the late 70's with the advent of the PAMPA tier system.  Somehow it has faded into irrelevancy as it probably should.  With the 500 floor for Expert there was a lot of talk about regional differences, especially locally where a 500 had you close to Gieseke.

My daughter was a competitive ice skater.  Back then they used an Ordinal system nearly identical to what Paul and Charles suggest.  The system throws out the high and low score then converts the middle remaining into ordinals then totals them into a single ordinal.   It takes a minimum of 5 judges to wash out bias but it does work.  It adds a bunch to manual tabulation but with today's technology that is easy to overcome.  It is something to consider for National level judging.  Finding 5 qualified judges to use locally would mean that no one flew!  By the way, becoming a judge involved written exams and several "shadow judging" competitions where you needed to closely match the remaining judges.  We don't need any of that but the ordinal system is worth looking at.
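
For anyone who wants to see the mechanics, here is a minimal Python sketch of one way to read that ordinal tabulation (my own illustration with invented scores - the real skating/PAMPA rules may differ in detail): each judge's scores become placings, each pilot's high and low placing are thrown out, and the remaining placings are totaled; the lowest total wins.

# Illustrative only - hypothetical panel of 5 judges scoring 4 pilots.
def placings(scores):
    # Convert one judge's raw scores into placings (1 = that judge's best).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    out = [0] * len(scores)
    for place, idx in enumerate(order, start=1):
        out[idx] = place
    return out

panel = [
    [540, 530, 510, 495],
    [520, 525, 500, 470],
    [505, 498, 502, 460],
    [498, 492, 480, 455],
    [515, 520, 505, 465],
]
per_judge = [placings(row) for row in panel]

for pilot in range(4):
    ords = sorted(row[pilot] for row in per_judge)
    kept = ords[1:-1]            # throw out the high and low placing
    print(f"pilot {pilot + 1}: placings {ords}, total of the middle three = {sum(kept)}")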

Ken
AMA 15382
If it is not broke you are not trying hard enough.
USAF 1968-1974 TAC

Online Brett Buck

  • Trade Count: (0)
  • Admiral
  • ******
  • Posts: 13744
Re: Nats judging scores.
« Reply #6 on: March 12, 2022, 12:30:41 PM »
This was a big deal locally in the late 70's with the advent of the PAMPA tier system.  Somehow it has faded into irrelevancy as it probably should.  With the 500 floor for Expert there was a lot of talk about regional differences, especially locally where a 500 had you close to Gieseke.

My daughter was a competitive ice skater.  Back then they used an Ordinal system nearly identical to what Paul and Charles suggest.  The system throws out the high and low score then converts the middle remaining into ordinals then totals them into a single ordinal.   It takes a minimum of 5 judges to wash out bias but it does work.  It adds a bunch to manual tabulation but with today's technology that is easy to overcome.  It is something to consider for National level judging.  Finding 5 qualified judges to use locally would mean that no one flew!  By the way, becoming a judge involved written exams and several "shadow judging" competitions where you needed to closely match the remaining judges.  We don't need any of that but the ordinal system is worth looking at.

Ken

      There's a lot of stuff you can do along these lines. One defect with most of them, including ordinals, is that you have to wait until a round is over to do the necessary calculations, because you don't know where someone might rank until the end. That's not fatal, but it is quite different.
   
    But the kicker to any of these methods is that *they never result in a significant change* to the results. I and numerous others have taken sets of scores and processed them in any number of ways, and as long as the systems were entirely objective, it rarely made any difference in the answer over what we do normally.

     One of the consistent fallacies of almost all these methods is the premise that the absolute (as opposed to relative) scores mean something relevant, or that the scores are somehow a random/stochastic process in which the "correct" answer is hidden, to be extracted with clever use of statistics. Any system that counts on either premise is inherently faulty.

   My opinion is this - the ranking each judge gives the pilots is the relevant part of the scores they give; the absolute value means absolutely nothing out of this context. Think of it this way - the judges' scores are all correct, by definition, but they report them in different units - like someone measuring pencils with various rulers, one calibrated in inches, one in centimeters, and one in feet. Even a perfect measurement yields a different absolute number from each judge - say, 8.125, 20.6375, 0.67708333 - and in our case, we have no idea and no way to determine what "units" the judges are using and convert them to a consistent set. But you certainly cannot use statistics on it, and you certainly cannot say, a posteriori, that the 20.6375 is "obviously out to lunch" and toss it.


     Ordinal scoring, or normalizing the set of scores from each judge, has the effect of washing out some of this - but averaging them together and seeing which one comes out the biggest ALSO comes pretty close to normalizing it in most cases, assuming that the differences are "bias" differences (the high judge is always 4 points/maneuver higher than the rest) instead of "scale factor" errors (where the range is larger for the same level of error).
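
   To make the bias-versus-scale-factor point concrete, here is a toy Python example (invented numbers): a judge who is a constant 40 points low shifts everyone's total equally, so plain addition and per-judge normalization produce the same ranking. A scale-factor difference is the case normalization is meant to catch.

# Toy illustration only - made-up scores for three pilots.
pilots = ["A", "B", "C"]
judge_hi = [560, 540, 525]                  # the "high" judge
judge_lo = [s - 40 for s in judge_hi]       # same ordering, 40 points lower

totals = {p: h + l for p, h, l in zip(pilots, judge_hi, judge_lo)}
print(sorted(totals, key=totals.get, reverse=True))       # ['A', 'B', 'C']

def normalize(scores):
    # Simple min-max normalization of one judge's scores to the 0..1 range.
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

norm_totals = {p: a + b for p, a, b in
               zip(pilots, normalize(judge_hi), normalize(judge_lo))}
print(sorted(norm_totals, key=norm_totals.get, reverse=True))  # same order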

   Having beaten various score sets to death over the years, I find that any system that avoids the faulty premises noted above ends up giving you *the same answer* as what we already do - in every case I looked at, the winner still won, and the top 3 were at most swapped around with each other. It got less good as the ranking went down - but at most the contest is to pick the winner and top 3, not to rank #35 and #36 with unerring accuracy. That's what *skill classes* were originally intended for.

     One of the systems we still use that assumes faulty premise #1 (that the absolute number means something important) is skill class score ranges, which are obviously flawed and obviously not working as intended*. But no one seems to care about that, and even if they did (like me), they can't come up with anything better.

    Brett

*if for no other reason than the "average" flight quality has gotten *drastically* better with the widespread availability of good engines with "overkill" performance. I mentioned it in another thread, but we have a young man locally who did the usual - got a good (but very modest) airplane, and jumped straight from "not able to complete an outside loop" to an advanced flier in *a few flying sessions* over the period of a few months, also with no significant bad habits.

   The corollary (if it has not been obvious from virtually every post I ever made) is that if you want to succeed in stunt in the 21st century, you have to *forget about vintage-type engines and techniques*, that is, anything before about 1988. Anything else is a difficult/nearly impossible row to hoe, and *you will not be able to out-work and out-practice anyone* to overcome it, because they are working just as hard as you are, just without crippling limitations.


« Last Edit: March 12, 2022, 01:00:42 PM by Brett Buck »

Online Ken Culbertson

  • 24 supporter
  • Trade Count: (0)
  • Admiral
  • *
  • Posts: 6131
Re: Nats judging scores.
« Reply #7 on: March 12, 2022, 01:57:01 PM »
..... but at most the contest is to pick the winner and top 3, not to rank #35 and #36 with unerring accuracy.
This is about the only part of your response that I disagree with.  In any contest I have attended with any significant number of fliers there are multiple contests going on within that contest.  Success to #36 may have been to beat #35 and so on. I also see the potential in ordinals to improve qualifying.  I am willing to wager that at least 5, more or less, of the fliers that miss the cut are better than the last 5 or so that made the cut.  IMHO, by merging the ordinals from the various groups all the way to last you have a better chance of truly getting the top 20.  But I see your point that trading one flawed system for another flawed system is pointless.   

Ken
AMA 15382
If it is not broke you are not trying hard enough.
USAF 1968-1974 TAC

Online Brett Buck

  • Trade Count: (0)
  • Admiral
  • ******
  • Posts: 13744
Re: Nats judging scores.
« Reply #8 on: March 12, 2022, 02:40:49 PM »
This is about the only part of your response that I disagree with.  In any contest I have attended with any significant number of fliers there are multiple contests going on within that contest.  Success to #36 may have been to beat #35 and so on. I also see the potential in ordinals to improve qualifying.  I am willing to wager that at least 5, more or less, of the fliers that miss the cut are better than the last 5 or so that made the cut.  IMHO, by merging the ordinals from the various groups all the way to last you have a better chance of truly getting the top 20.  But I see your point that trading one flawed system for another flawed system is pointless.   


   If you are finishing 35/36, that is about the same as 1/2/3 in Advanced (at least as originally intended), where, again, it works pretty well. The problem is that there is so much compression of essentially all the competitors into Advanced and Expert, and essentially no real competition in Intermediate or Beginner. So this is where the original goals of skill classes have been lost.

    This used to be a big problem with age classes: you had 50-60 entries and, because it was patently hopeless for most people to beat Gieseke, McFarland, Gialdini, etc., people actually cared where they finished relative to their peer group, even when it was duking it out for 30th place. The idea of skill classes is to break up this group so that instead of fighting it out for 30th in Open, you could do the same for 1/2/3 in Advanced and maybe even get recognized for it with a trophy. Right now, that appears to be hopelessly broken.

     Almost everyone skips Beginner and goes straight to Intermediate, and if you use modern techniques with some knowledgeable help, and haven't spent decades learning bad habits with ancient equipment, you might skip straight to Advanced (as per my example). So, you will have literal beginners in their first contest in Advanced. It seems rather ridiculous to me to have people who couldn't fly a pattern 3 months ago going straight to the "one step below David Fitzgerald" class; it seems ridiculous to everyone else too, so people move up or get forced up by local peer pressure into Expert, packing into two classes what used to be a reasonable distribution across four - and at least locally, into one class, Expert.

    So, what level of flying constitutes a "beginner" (which is literally true in my example) has *drastically changed* and now puts you in the middle of Advanced.

    I would also note that there was a second issue with Beginner and Intermediate back when there was genuine competition in those classes. The problem is that it is devastatingly difficult to judge which array of unrecognizable maneuvers was worse than another array of unrecognizable maneuvers. But at least you had a chance when it was broken out by itself.

     But *people don't do that any more*: "beginner" can mean reasonable shapes, reasonable sizes, with only occasional blown maneuvers, at least in decent conditions. Or rather, people who pay attention and have decent help start there. That used to be what happened at the high end of Advanced/low Expert; now we have literal beginners doing it, and we have them entering Advanced to find meaningful competition.


   If I was king of the universe, and I wanted to even it all out, I would reshuffle it and some people currently flying Expert might wind up in Intermediate - or Beginner. As it turns out, I am *not* the King of the Universe, and no one else seems to care much about this issue (this thread notwithstanding), so I am not going to do anything about it. It doesn't affect me in the least.

    So we have wound up, again, with 40ish expert entries and people having their own battles over 30th who care what the answer is, and a not-very-good system for coming up with the result. And no particularly sound way of doing any better.

      Brett

p.s. If anyone else wants to see what I mean, take the results from any big local contest; figure there are 40 total unique entries - 25 Expert, 12 Advanced, 2 Intermediate, 1 Beginner, typically. Assume the results are all exactly right, put them all together, and divide it into 4 groups of 10. 5 Expert entries wind up in Intermediate, based on reality and trying to balance the field. People might take immediate offense because "I am an Expert, not Intermediate" - but that's not what the results say, *because I am redefining what "Intermediate" means*. That's why my "reshuffle" idea included redefinition to provide "cover" for doing it.
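
    Here is that back-of-the-envelope exercise written out as a few lines of Python (the entry counts are the hypothetical ones above; for simplicity it assumes every Expert outscored every Advanced flier, and so on down):

# Pool all 40 hypothetical entries, keep the assumed finishing order, and
# cut the field into four groups of 10 - the redefined classes.
field = ["Expert"] * 25 + ["Advanced"] * 12 + ["Intermediate"] * 2 + ["Beginner"] * 1

new_labels = ["Expert", "Advanced", "Intermediate", "Beginner"]
for quartile, label in enumerate(new_labels):
    block = field[quartile * 10:(quartile + 1) * 10]
    print(f"redefined {label}: "
          f"{block.count('Expert')} current Experts, "
          f"{block.count('Advanced')} Advanced, "
          f"{block.count('Intermediate') + block.count('Beginner')} Int/Beg")
# -> 5 current Experts land in the redefined Intermediate group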

     Note that it is also *exactly equivalent* to my A Main/B Main NATS qualifying idea from about 10 years ago (that, to be fair, Scott Riese first suggested - when we were having exactly the same discussion **22 years ago**).

p.p.s. I also note that this might have another side effect- guys that are still struggling with old techniques or have not taken/don't want to take the effort to learn or unlearn the problems they have developed, might be flying expert now, and would be *uncompetitive in the newly redefined beginner* after. Part of the psychological problem with this idea is that it would make people face up to some uncomfortable facts.
« Last Edit: March 12, 2022, 03:05:24 PM by Brett Buck »

Offline Doug Moon

  • 24 supporter
  • Trade Count: (0)
  • Admiral
  • *
  • Posts: 2194
Re: Nats judging scores.
« Reply #9 on: March 16, 2022, 01:35:46 PM »
      Note that it is also *exactly equivalent* to my A Main/B Main NATS qualifying idea from about 10 years ago (that, to be fair, Scott Riese first suggested - when we were having exactly the same discussion **22 years ago**).


This has me curious...care to expand?  I am always wondering about different ideas for qualifying etc...

Very nice write up above about the reshuffle. Makes very logical sense.

Doug Moon
AMA 496454
Dougmoon12@yahoo.com

Online Brett Buck

  • Trade Count: (0)
  • Admiral
  • ******
  • Posts: 13744
Re: Nats judging scores.
« Reply #10 on: March 16, 2022, 02:44:40 PM »
This has me curious...care to expand?  I am always wondering about different ideas for qualifying etc...

Very nice write up above about the reshuffle. Makes very logical sense.

   Thanks! I am glad to hear someone actually read it, instead of (like a few) reading "change beginner" as "another elitist that doesn't care about beginners" and starting to vent. Quite the contrary, of course - the reason Beginner serves very little purpose as it stands, and there are very few entries, is that we care about and help beginners so much that they can go from nothing to pretty darn good, and skip straight to Advanced, in a few months.

   The side observation, which should be a surprise to no one, is that if someone, for whatever reason is not aware of or not interested in modern techniques, and/or has no expert assistance, they have been falling behind to the point that some expert pilots we have now would logically fall into the lower quartile of the skill distribution, i.e. might wind up in Beginner. It shows how far the event has advanced, in particular, how much better the "average" flights are than they used to be. This is of course a huge psychological issue with changing anything.

   For those who might object to this observation, the solution is obvious. But there is also the hurdle that if you haven't been flying good equipment from the start, you have learned many very bad habits that mostly have to be unlearned, which might be pretty difficult. It can be done, most of the current top experts started in the dark ages and had to change, but it is a hard problem to overcome.

    And plenty of people will vociferously argue the entire premise and think they are going to take their Nobler/Fox 35 out in the pasture, practice harder than everyone else, and teach us all a lesson!  It's certainly not impossible, just very unlikely. I would add, I am in the same boat, trying to compete with something bordering on obsolete myself. But at least I'm not fooling myself about it. 

   The previous discussion of this topic of A Main/B Main qualifying is here

https://stunthanger.com/smf/rules-discussions/eliminate-advanced-and-expert-at-the-nats/msg370966/#msg370966

   It is more or less the same as what I suggested earlier in this thread - the difference being that you sort people the same way, but on-the-fly at a contest, based on their performance that day, rather than choosing it before the fact. I am convinced it would work pretty well with enough contestants (like the NATs), but there are some reasonable counter-arguments. I like it far better than trying to make skill classes the only official events; all I see happening there is a bunch of people whining and fighting over the dreaded "sandbagging" for the next 30 years. It's already kind of like that with just Advanced.

   Anyone worried about sandbagging ought instead to practice, learn to trim, learn to set up engines, and get out of Advanced into Open or Expert - you don't have to worry about it there!

    Brett

Online Ken Culbertson

  • 24 supporter
  • Trade Count: (0)
  • Admiral
  • *
  • Posts: 6131
Re: Nats judging scores.
« Reply #11 on: March 16, 2022, 04:54:37 PM »
Interesting idea.  That is close to how it is done at many levels in swimming.  There is an "A" and a "B" final.  The top 8 qualifiers go to the "A" final, 9-16 go to the "B".  At the NCAA national level, the "A" finalists are your "All Americans"; the "B" finalists and DQ's from the "A" group become "All American - Honorable Mention".  In other cases they are split by qualifying times: the slower ones go to the "B" final, the faster ones to the "A", and then the times are combined.

Ken



AMA 15382
If it is not broke you are not trying hard enough.
USAF 1968-1974 TAC

Online Brett Buck

  • Trade Count: (0)
  • Admiral
  • ******
  • Posts: 13744
Re: Nats judging scores.
« Reply #12 on: March 16, 2022, 06:48:01 PM »
Interesting idea.  That is close to how it is done at many levels in swimming.  There is an "A" and a "B" final.  The top 8 qualifiers go to the "A" final, 9-16 go to the "B".  At the NCAA national level, the "A" finalists are your "All Americans"; the "B" finalists and DQ's from the "A" group become "All American - Honorable Mention".  In other cases they are split by qualifying times: the slower ones go to the "B" final, the faster ones to the "A", and then the times are combined.

    For those who missed the other thread, the basic idea is to take all the current NATs events (Open, Advanced, and the unofficial Intermediate), and fly them all together for qualifying, with no consideration for skill class or any predisposition aside from the usual seeding when you break the field into 4 groups.

   After qualifying, you take the top 5 from each circle, call that the A Main, that gets you the Top 20 for Friday - more or less like it is now, except that guys that would have entered Advanced or Intermediate have the same shot to make it, based on the qualification flights.

     Also, for Friday, you take finishers 6-10 from each circle; they go into a different Top 20 called the B Main. This replaces the current Advanced Top 20 - you might have erstwhile "Experts" who didn't fly so well, or guys who would have been in Intermediate who performed better than they expected, make this flyoff.
   
  Optionally, you take 11-15, have a C Main.

  Top 5 from the A Main go to the flyoff on Saturday more or less like always.

     There's really no change for the potential winners who would have made the flyoff under the current system. Where it is different is that you select the Top 20, then the second 20, and maybe the third 20. This will more-or-less correctly rank everyone in the contest - and, truth be told, the guy who finished 21st in Open is probably a lot better than the best Advanced pilot, so now #21 doesn't get shafted as badly as before. It also holds out the prospect of someone who would have flown Advanced making it into the Top 20 in the A Main, where now the division prevents that.

   Additionally, the guys who flew the unofficial Intermediate event get 4 flights on the L-Pad with the same judges and the same conditions as everyone else, instead of two flights on the grass. Again, the skill variability in Intermediate is huge; there might be guys who fly way better than the rest and make it all the way to the B Main. You also know exactly how you stacked up relative to the potential winners; there is no ambiguity or guessing about it.

   Everything else runs more-or-less as normal, you get two flights on Wednesday, Two on Thursday, add the best Wednesday and best Thursday scores, you still only compete against the guys in your circle, and if you were to finish 6th in Advanced or Open, you are almost a qualifier  - in the current system your week is over, in this system you are still competing on Friday, just for a lesser goal. The workload is *slightly* increased,  because there will be about 3-4 additional entrants (former Intermediate fliers) per circle.

  Note also that, depending on the turnout, you could select 16 (4 per circle) instead of 20 per A or B Main, to make it more reasonable and maintain the competitiveness.
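
  To show how little bookkeeping the selection itself takes, here is a small Python sketch of the A/B/C Main split described above (circle names and pilot labels are invented; each circle's list is assumed to already be ranked by best-Wednesday-plus-best-Thursday total):

# Illustrative only - two of the four circles shown, made-up pilot labels.
circles = {
    "Circle 1": ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08",
                 "P09", "P10", "P11", "P12", "P13", "P14", "P15"],
    "Circle 2": ["Q01", "Q02", "Q03", "Q04", "Q05", "Q06", "Q07", "Q08",
                 "Q09", "Q10", "Q11", "Q12", "Q13", "Q14", "Q15"],
    # ... circles 3 and 4 work the same way
}

def make_mains(ranked_by_circle, cut=5):
    # cut=5 gives 20-pilot mains with four circles; cut=4 gives 16, per the note above.
    mains = {"A": [], "B": [], "C": []}
    for ranking in ranked_by_circle.values():
        mains["A"] += ranking[0:cut]             # places 1-5 on each circle
        mains["B"] += ranking[cut:2 * cut]       # places 6-10
        mains["C"] += ranking[2 * cut:3 * cut]   # places 11-15 (optional)
    return mains

for name, pilots in make_mains(circles).items():
    print(f"{name} Main ({len(pilots)} pilots): {pilots}")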

  For the winners, the system looks more-or-less unchanged. It rewards the guys who miss cuts (sometimes by fractions of a point) and recognizes that they flew pretty good, too, much better than separating people into pre-ordained distinctions.

    The other issue, that I consider almost irrelevant, is that you would actually enter Open, which requires BOM. You are going into a contest where if you do really well, you might win the Open National Championship - so, you have to do it the same way everyone else did.

      Brett

     

Online Ken Culbertson

  • 24 supporter
  • Trade Count: (0)
  • Admiral
  • *
  • Posts: 6131
Re: Nats judging scores.
« Reply #13 on: March 16, 2022, 09:02:22 PM »
  For the winners, the system looks more-or-less unchanged. It rewards the guys who miss cuts (sometimes by fractions of a point) and recognizes that they flew pretty good, too, much better than separating people into pre-ordained distinctions.

    The other issue, that I consider almost irrelevant, is that you would actually enter Open, which requires BOM. You are going into a contest where if you do really well, you might win the Open National Championship - so, you have to do it the same way everyone else did.

      Brett

     
This is a really good idea (all of the post - I trimmed the quote to save space).  Using ordinals for scoring would make it perfect, but I will leave that to those with skin in the game.  With computers it takes no more time than averaging.  Make this happen - its time has come.  Only drawback will be the BOM.

Ken
AMA 15382
If it is not broke you are not trying hard enough.
USAF 1968-1974 TAC

