stunthanger.com
General control line discussion => Open Forum => Topic started by: RC Storick on July 22, 2014, 04:02:18 PM
-
Why is it that the program has the head judge and the experienced judges judging Advanced instead of Open? Seems to me something is amiss in the program. Not saying it would make any difference, but it sure seems odd to me. Anyone else notice this?
It seems that the judges who track well and don't use the full range are rewarded, and the ones who judge accordingly are penalized.
There are too many anomalies. For instance (not saying it would make a difference), I started out in group D with Jose, and then after the reshuffle I ended up in group C flying against all the NATS champs. Water under the bridge, but I watch everything.
-
Why is it that the program has the head judge and the experienced judges judging Advanced instead of Open? Seems to me something is amiss in the program. Not saying it would make any difference, but it sure seems odd to me. Anyone else notice this?
It seems that the judges who track well and don't use the full range are rewarded, and the ones who judge accordingly are penalized.
There are too many anomalies. For instance (not saying it would make a difference), I started out in group D with Jose, and then after the reshuffle I ended up in group C flying against all the NATS champs. Water under the bridge, but I watch everything.
It's always something, huh Bob?
-
It's always something, huh Bob?
Yep. This was told to me by two different NATS judges.
-
You'd need to ask Bob and Curt how they assigned judges.
You seem to have an erroneous idea of how the program works. Whether a judge uses the full range of scores does not affect his ranking. His ranking is determined by how well his ordering of contestants agrees with the consensus.
That program is public. I don't know if I've sent you a copy, but I have sent it to a bunch of people, particularly critics, and I have explained many times on these fora how the judge ranking works. I have also put out many requests to tell me if it is statistically valid and to review how scores are normalized with the new "How the Nats Will Be Ran" process introduced last year. I have received no input. Hey, I'm not asking folks to do the math, just to read what's there. If you think something is wrong, at least look at what I've written.
-
It would be so refreshing if just one time there could be a Nats without someone implying some sort of "conspiracy theory" regarding the outcome. Wouldn't that be cool, Robert?
-
You'd need to ask Bob and Curt how they assigned judges.
You seem to have an erroneous idea of how the program works. Whether a judge uses the full range of scores does not affect his ranking. His ranking is determined by how well his ordering of contestants agrees with the consensus.
That program is public. I don't know if I've sent you a copy, but I have sent it to a bunch of people, particularly critics, and I have explained many times on these fora how the judge ranking works. I have also put out many requests to tell me if it is statistically valid and to review how scores are normalized with the new "How the Nats Will Be Ran" process introduced last year. I have received no input. Hey, I'm not asking folks to do the math, just to read what's there. If you think something is wrong, at least look at what I've written.
It seems that every time something is questioned, the guy doing the questioning is labeled a critic. I guess that's why the job landed on me. If no one speaks out on anything, everything will remain the same. Thanks for your work on the program, Howard; I'm not discounting that. But since judging is a subjective process, should not the selection of the judges also be subjective, at the head judge's discretion?
Meaning the judges with the most experience would judge Open on Top 20 day, not Advanced.
It would be so refreshing if just one time there could be a Nats without someone implying some sort of "conspiracy theory" regarding the outcome. Wouldn't that be cool, Robert?
Bill, when was the last NATS you attended? I am not saying there is any conspiracy in the judging. I guess no one is allowed to question the status quo?
The definition of insanity is doing the same thing over and over and expecting different results.
-
It seems that the judges who track well and don't use the full range are rewarded, and the ones who judge accordingly are penalized.
Biggest pile of bonk! Instead of "I'm watching" or "some judge who wants to remain nameless told me" drivel, you should actually get trained and spend ONE, just ONE NATS as a judge.
For YEARS, I've been reading on SSW and SH about conspiracies and other drivel about NATS judging. I am a thinking man and I REFUSE to base my own perceptions on someone else's opinion. Instead, I spent FOUR years volunteering to judge every contest I could get to and then ACTUALLY volunteering to go judge at the NATS. I spent close to 3 THOUSAND dollars on that trip, stood for 8 hours in the blistering sun while pilots hung out in the shade, relaxing and talking. In the end, I learned for myself what NATS judging is about. What you are suggesting is COMPLETELY UNFOUNDED AND 100% HEARSAY.
-
(I use the word "you" in this post as a general term, not directed at anyone in particular.)
I think the program picks the judges correctly.
It takes the judges who agreed with the majority and groups them.
You should want the judges who see the pattern the same way, or close to it, judging you. It will create consistency. If you have judges who are all using different criteria, then you will have no clue what is going on when you look at your scores/sheets. Your frustration will only multiply at that point. Judges who are agreeing across the board with the overall outcome should be using similar criteria. Their scoring range is not and should not be taken into account. Their years of experience, I would think, should likewise not be taken into account. At the point where the program is picking the judges for the next rounds, they have all been through the NATS judging meeting, warm-ups, discussions, etc.
I feel the judging was fair. On Friday my first flight was so-so. My second flight may have been my best-judged flight of the week, and they scored me for it too.
Thank you Howard for creating a program that can track all of that information and create a judging corps based on their production and not their name/experience.
Also, as a judge working the NATS, you know you are selected based on your output and not your name or years of experience in the game. The program cannot be biased in that way. It only knows numbers, and that's it. As a judge you have to be honest and fair and call it like you see it, and let the program decide the group of judges for each class of flying. It's a great situation to be in if you ask me.
-
The definition of insanity is doing the same thing over and over and expecting different results.
Who said we need different results? There has not been ONE instance of the wrong pilot getting the win since the current judging program was instituted!
Please don't hide behind the innocence of "Am I not allowed to ask questions?". Your questions are similar to asking regular folks "How often do you expose yourself to little children?" or "When was the last time you beat up your wife?".
-
As a judge you have to be honest and fair and call it like you see it, and let the program decide the group of judges for each class of flying. It's a great situation to be in if you ask me.
I guess Mark, Steve, and Dale (don't forget Bruce) must not have chosen the right numbers, as opposed to the new judges. I can see this is leading nowhere. The status quo shall remain in play. I have learned my lesson.
Anytime someone questions the status quo, it goes against the Holy Grail of stunt.
Were the pilots chosen in the right order? Probably. I was just speaking up about how they got there.
-
You guys are unfairly jumping on Sparky. I know what he is talking about but to even mention that something is amiss makes you look like a poor sport. There truly is a problem with a formula that puts the Head Judge anywhere except with the best pilots. The head judge is the person responsible for training all the other judges. If we cannot trust his judgment, then who can we trust?
Derek
P.S. Mark had to be entered manually into the finals on Saturday. Thankfully the ED felt that it was wrong to have him anywhere else.
I have also offered a solution to make the formula work better but it was met with resistance...
-
Who said we need different results? There has not been ONE instance of the wrong pilot getting the win since the current judging program was instituted!
Please don't hide behind the innocence of "Am I not allowed to ask questions?". Your questions are similar to asking regular folks "How often do you expose yourself to little children?" or "When was the last time you beat up your wife?".
I am stating FACTS! Read this
You guys are unfairly jumping on Sparky. I know what he is talking about but to even mention that something is amiss makes you look like a poor sport. There truly is a problem with a formula that puts the Head Judge anywhere except with the best pilots. The head judge is the person responsible for training all the other judges. If we cannot trust his judgment, then who can we trust?
Derek
P.S. Mark had to be entered manually into the finals on Saturday. Thankfully the ED felt that it was wrong to have him anywhere else.
I have also offered a solution to make the formula work better but it was met with resistance...
I had no dog in the fight, so as an outside observer I can only report what I saw.
-
I just want to state that I am not angry about anything that happened at the Nats; I could have flown a little better and made the cut. I just honestly feel that the formula does not work as it was intended. You know what they say about good intentions...
Derek
-
Just as an observer from the other events, Stunt is fortunate to have enough volunteers to even think about accepting some and rejecting others. Most events are hard-pressed to fill the minimum requirement.
-
You guys are unfairly jumping on Sparky. I know what he is talking about but to even mention that something is amiss makes you look like a poor sport. There truly is a problem with a formula that puts the Head Judge anywhere except with the best pilots. The head judge is the person responsible for training all the other judges. If we cannot trust his judgment, then who can we trust?
Derek
P.S. Mark had to be entered manually into the finals on Saturday. Thankfully the ED felt that it was wrong to have him anywhere else.
I have also offered a solution to make the formula work better but it was met with resistance...
Derek is making a very good point!!!!!!
-
Derek is making a very good point!!!!!!
No one listens to me. It took a top 5 flier to chime in before it was a good point. If I am watching, so is everyone else. Interesting.
I also find it interesting that someone said they had an idea to get Ball State students to judge (like it was a new idea) when I mentioned that 6 years ago. They would be good at it if only for the fact that they are half the age of most of the current field and can see twice as well. I heard back then that they don't know what to look for (I say good!), but they could be trained just like the judging pool we have now.
Could we get a pool from them? I don't know, but if I got a GO I would be willing to make the phone calls.
-
Derek is making a very good point!!!!!!
Great! Now all that is needed is a formula for the program. We went to a formula because people didn't like the subjective picks.
No matter what is done someone will complain.
-
No matter what is done someone will complain.
It was an observation, not a complaint. Here is a novel idea: use something like this, http://www.random.org/integers/, and then no one could complain. It would choose flight order and circle selection in a matter of seconds, not hours, and it's totally random.
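A minimal sketch of that idea, using Python's standard random module as a stand-in for random.org (which offers a web form and an HTTP API, neither shown here); the pilot names and circle count are invented for illustration:

```python
import random

def draw_flight_order(pilots, num_circles, seed=None):
    """Shuffle the entry list, then deal pilots onto circles
    round-robin; each circle's list is its random flight order."""
    rng = random.Random(seed)  # seed is only for reproducible demos
    shuffled = pilots[:]
    rng.shuffle(shuffled)
    circles = {c: [] for c in range(1, num_circles + 1)}
    for i, pilot in enumerate(shuffled):
        circles[i % num_circles + 1].append(pilot)
    return circles

# Illustrative use: 16 made-up pilots onto 4 circles.
pilots = ["Pilot %d" % n for n in range(1, 17)]
for circle, order in draw_flight_order(pilots, 4).items():
    print("Circle", circle, ":", order)
```

The whole draw takes milliseconds, which is the point being made: the slow part of the current process is not the randomness.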
Thanks to everyone who made this contest possible.
-
Just as an observer from the other events, Stunt is fortunate to have enough volunteers to even think about accepting some and rejecting others. Most events are hard-pressed to fill the minimum requirement.
You are absolutely correct! And we had a great pool of judges, great tabulators, great ED and assistant ED, and great pull testers. We could not have a contest without these people!
Derek
-
Derek is making a very good point!!!!!!
Great! Now all that is needed is a formula for the program. We went to a formula because people didn't like the subjective picks.
No matter what is done someone will complain.
-
Great! Now all that is needed is a formula for the program. We went to a formula because people didn't like the subjective picks.
No matter what is done someone will complain.
You are correct, someone will always complain.
A few years ago I offered Howard some suggestions but they didn't go very far.
Derek
-
Let me make this crystal clear! I am not complaining; I am only giving an observation of what goes on. Anytime someone questions the status quo, it's against the HOLY GRAIL of stunt. http://www.random.org/integers/ If this system were used, on the true luck of the draw, the end results may or may not change. It does not matter to me how you want to play. My lessons have been learned. Play with that link and envision how it would work. I am sure that if all the top guys were put on one circle, there would be some complaining then.
-
There truly is a problem with a formula that puts the Head Judge anywhere except with the best pilots. The head judge is the person responsible for training all the other judges. If we cannot trust his judgment, then who can we trust?
P.S. Mark had to be entered manually into the finals on Saturday. Thankfully the ED felt that it was wrong to have him anywhere else.
I have also offered a solution to make the formula work better but it was met with resistance...
As I remember, your suggestion had some subjective stuff in it. My initial agreement with Paul when he ran the Nats was that I would help him with the tabulation program only if everything in it was open and objective, because I didn't want any of the noisy, chronic losers to have anything to complain about when I won. I had seen a lot of trouble at previous Nats caused by those losers exploiting the subjective seeding and judge selection used at the time because objective formulas weren't available. I think the objective judge selection we have could use improvement, but even a mediocre objective method beats any subjective method. As I have said many times, the "How the Nats Will Be Ran" proposal that introduced "Expert" to the Nats may have created normalization errors in the judge-picking formula. So let's see some math.
If the head judge didn't rate as well as the others, I wouldn't have intervened.
-
Let me make this crystal clear! I am not complaining; I am only giving an observation of what goes on. Anytime someone questions the status quo, it's against the HOLY GRAIL of stunt. http://www.random.org/integers/ If this system were used, on the true luck of the draw, the end results may or may not change.
That's how we pick the flight order for local contests. Some may see this as mysterious and as giving opportunity for manipulation, so we use ping-pong balls at the Nats. Paul didn't even want to use Betty Adamisin's method to map the first-round draw into the other three rounds, even though it's open and objective.
Peabody thinks we should pick judges randomly, as I think you are suggesting. I think that idea has merit.
-
As I remember, your suggestion had some subjective stuff in it.
You are probably correct. I understand and respect your stance on the matter.
And as Paul stated, whatever method we use, there will be some who do not like it. All in all, what we have now is better than what we used to have...
Derek
-
You guys are unfairly jumping on Sparky. I know what he is talking about but to even mention that something is amiss makes you look like a poor sport. There truly is a problem with a formula that puts the Head Judge anywhere except with the best pilots. The head judge is the person responsible for training all the other judges. If we cannot trust his judgment, then who can we trust?
Derek,
I disagree. Last year Mark WAS NOT the most consistent judge. Granted, he was No. 2, but not because he is the head judge. If Mark had been off this year, the system would have identified it.
-
Great! Now all that is needed is a formula for the program. We went to a formula because people didn't like the subjective picks.
No matter what is done someone will complain.
Let's not jump to conclusions here. I am in no way, shape, or form complaining. I just thought he made a good point.
The contest was awesome! The weather was great. It was cool and I got to wear my favorite ElectriFly.com t-shirt many times! :) I flew well and others flew well, and David flew best when it mattered. This is all discussion stuff.
I do find it odd when people come up with ideas that were already used and complained about in the past as a new way to do it. Robert says use Ball State students. That would be similar to Navy judges back in the day, yes? I have heard they were partial to the kiddos or Navy-painted planes, etc., etc., etc.
It's all just discussion from here. No complaining!!!!!!! NONE, ZIP, ZERO, NADA!!!!
-
That would be similar to Navy judges back in the day, yes?
No
-
Derek,
I disagree. Last year Mark WAS NOT the most consistent judge. Granted, he was No. 2, but not because he is the head judge. If Mark had been off this year, the system would have identified it.
Hey Steve,
I guess we will just have to agree to disagree. I don't care if he was not the most consistent if he was the most correct.
This is not worth arguing over anyway; the guys who come to judge every year always do their best, and that is all we can ask of any of them.
Derek
-
I don't care if he was not the most consistent if he was the most correct.
So come up with a formula that measures correctness.
-
So come up with a formula that measures correctness.
Correctness is subjective.
Derek
-
I've looked at stuff like full-scale aerobatics scoring methods. Folks seem to think that the method we use measures correctness. I'm not so sure, and I haven't seen any convincing statistical justification of it. Other sports use it, so you'd think somebody would have written a scholarly paper sometime. I would be interested in discussing the math after August.
-
I've looked at stuff like full-scale aerobatics scoring methods. Folks seem to think that the method we use measures correctness. I'm not so sure, and I haven't seen any convincing statistical justification of it. Other sports use it, so you'd think somebody would have written a scholarly paper sometime. I would be interested in discussing the math after August.
Sounds good Howard, keep practicing and go kick some ass in Poland.
Derek
-
I've looked at stuff like full-scale aerobatics scoring methods. Folks seem to think that the method we use measures correctness. I'm not so sure, and I haven't seen any convincing statistical justification of it. Other sports use it, so you'd think somebody would have written a scholarly paper sometime. I would be interested in discussing the math after August.
Look at Olympic gymnastics or figure skating, diving, or any of the 100 Olympic judging aspects.
-
I've looked at stuff like full-scale aerobatics scoring methods. Folks seem to think that the method we use measures correctness. I'm not so sure, and I haven't seen any convincing statistical justification of it. Other sports use it, so you'd think somebody would have written a scholarly paper sometime. I would be interested in discussing the math after August.
Howard,
I am not arguing with you or anybody else here. George Buffalano developed a methodology for our stunt contests based on the program used for full-scale aerobatics. Bob Baron was pushing hard for this. It is based on normalizing the scores given by each judge for each maneuver. The system can detect incorrect scoring by individual judges and, to some degree, bias, and can offer a ranking of the judges. George performed analyses for the CIAM after several World Championships using his program. On my recommendation and with urging from Bob Baron, George wrote a paper in Stunt News several years ago that presented this concept. I wrote an introduction for George on that article. The article received absolutely zero response from the PAMPA membership. I made reference to that article, and to when it was published, on one of these forums a year or so ago, and again, zero response.
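The formula itself isn't given here, but the usual way to normalize scores "by each judge for each maneuver" is a z-score: subtract that judge's mean for the maneuver and divide by his spread. A minimal sketch under that assumption (not necessarily Buffalano's exact math):

```python
from statistics import mean, pstdev

def normalize_judge(scores):
    """Z-score one judge's raw scores for a single maneuver across
    all pilots, removing his personal level and spread."""
    mu = mean(scores)
    sigma = pstdev(scores) or 1.0  # guard: a judge who gives one flat score
    return [round((s - mu) / sigma, 3) for s in scores]

# Two judges scoring the same four pilots on one maneuver.
# Judge B runs about 5 points lower but orders the pilots identically,
# so his normalized scores come out the same: no disagreement detected.
judge_a = [36, 34, 31, 28]
judge_b = [31, 29, 26, 23]
print(normalize_judge(judge_a))
print(normalize_judge(judge_b))
```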
One characteristic of this program, as I understand it, is that no rankings or scores are available until all of the flights are completed, the scores are entered into the computer, the computer does its thing, and then the final scores are printed and the rankings of the pilots become available. I do not know if our event is ready for that process.
The full-scale aerobatic community has embraced this program, as bias in the international aerobatic judging community was previously causing considerable strife and dissension, to the point of threatening to eliminate the international program.
Keith
-
Interesting thread. As one of the judges this year, I can say this: the training wasn't what you'd call "ground up"; it was really more of a discussion session to point out safety and procedural issues, with some discussion about where maneuvers begin and end, and other nuances. All the judges were experienced. The most interesting point was that in the consistency rating there was a very small variation, the least consistent being a 2.25, which to my understanding is a very narrow range. This, also to my understanding, would have qualified any of this year's judges as top 20 last year, when the range neared 5. The Head Judge is not necessarily the most qualified; he is an organizer, not a boss. Mark did an exceptional job in my opinion and himself admitted he is not an "expert", whatever that might be.
Personally, it was a very rewarding, however exhausting, experience. The daily warm-up flights put everyone on the same page first thing. For me, it was a chance to compare my style and observations with judges other than our local guys. I was pleased to find that I fit in very closely with the group, seeing the same things and scoring very consistently.
-
Let me make this crystal clear! I am not complaining; I am only giving an observation of what goes on. Anytime someone questions the status quo, it's against the HOLY GRAIL of stunt.
Not at all; people question it all the time. Almost all of the time the questions are carefully and clearly answered, but do not resolve the issue, because the answer is at odds with the questioner's preconceived notions, or doesn't make mathematical sense. Usually that's because they try to determine what needs to be changed first, and then work backwards to come up with a problem and the justification for the change.
http://www.random.org/integers/ If this system were used, on the true luck of the draw, the end results may or may not change. It does not matter to me how you want to play. My lessons have been learned. Play with that link and envision how it would work. I am sure that if all the top guys were put on one circle, there would be some complaining then.
No one involved fails to understand random numbers, or how "random" they need to be.
Random placement was done for years; there were bitter complaints afterward. Manual seeding was used; there were fewer complaints, although it indirectly led to *death threats*. Now automatic seeding is used, which more-or-less implements the manual method, and there are very few complaints. It was a little off this year because Billy and I had some gaps in our attendance, so we seeded lower than normal.
Same with judge placement, now no one can plausibly accuse anyone else of rigging the system to favor particular fliers. It wasn't happening with the manual method, and it is not happening now, so of course the results of the contest are largely as before.
Brett
-
I'm so thankful that we have Mark and the rest of the judges. I think the judging is now more consistent than ever.
I judged Classic with Steve Smith as my partner, and after the contest was over I compared some of his scores with mine and was somewhat amazed at how close we were with our numbers. In the end I think we got it right. Bill Rutherford
-
Hey Howard... as far as picking judges randomly, has anyone considered having them grab a marked ping-pong ball from a small paper bag as the pilots do?
John 8)
-
Hey Howard... as far as picking judges randomly, has anyone considered having them grab a marked ping-pong ball from a small paper bag as the pilots do?
As much work as it is to judge the Nats, maybe it would be more appropriate to have the ED draw a ping-pong ball out of a paper bag and then chase the guy down to make him judge.
-
I am not arguing with you or anybody else here. George Buffalano developed a methodology for our stunt contests based on the program used for full-scale aerobatics. Bob Baron was pushing hard for this. It is based on normalizing the scores given by each judge for each maneuver. The system can detect incorrect scoring by individual judges and, to some degree, bias, and can offer a ranking of the judges. George performed analyses for the CIAM after several World Championships using his program. On my recommendation and with urging from Bob Baron, George wrote a paper in Stunt News several years ago that presented this concept. I wrote an introduction for George on that article. The article received absolutely zero response from the PAMPA membership. I made reference to that article, and to when it was published, on one of these forums a year or so ago, and again, zero response.
One characteristic of this program, as I understand it, is that no rankings or scores are available until all of the flights are completed, the scores are entered into the computer, the computer does its thing, and then the final scores are printed and the rankings of the pilots become available. I do not know if our event is ready for that process.
The full-scale aerobatic community has embraced this program, as bias in the international aerobatic judging community was previously causing considerable strife and dissension, to the point of threatening to eliminate the international program.
I think we had this conversation before. I don't remember reading Dr. Buffalano's piece, but I got the reference to the full-scale aerobatics method, which is equivalent to what we now use except for the difference in how contestants are scored. I remain skeptical about its statistical validity. I had some correspondence with Dr. Buffalano. As I recall, he was more interested in the problem of the differences between the top fliers' flights being smaller than what judges can distinguish. That's an interesting problem, too. Phil Cartier has also studied this. The upshot is to use as many judges as possible on finals day.
I think our current judge ranking method is erroneous because of changes brought by the addition of Expert. I continue to ask for help fixing it.
-
Look at Olympic gymnastics or figure skating, diving, or any of the 100 Olympic judging aspects.
Send me their formulae. What I've seen of them is that they emphasize minimizing the effect of dishonest judges. We don't have that problem.
-
Look at Olympic gymnastics or figure skating, diving, or any of the 100 Olympic judging aspects.
In the Olympics they use high-res stop action photography to micro-analyze every little move and second-guess the jury.
In model airplanes we don't have it, couldn't afford it, and are better off without it. I expect the Olympic events have enough cash to make judging really worthwhile, unlike us.
Maybe the chance of catching a lucky break from the judges is what keeps some people interested.
-
Send me their formulae. What I've seen of them is that they emphasize minimizing the effect of dishonest judges. We don't have that problem.
There are some who don't buy that argument, as recently as last week. They are wrong, but sincere.
Our problem is fundamentally different. We have the judges we have, and they are pretty uniformly good. But, you have to have some of them do Open and the others do Advanced (now that Expert will be gone again), Junior, and Senior. You have to decide somehow which do which. Picking them by hand using the tracking method led to false but persistent accusations of cheating, so we now accomplish the tracking with a computer algorithm that does mostly the same thing. The one consequential difference is that there is no check and no distinction that would select out "high" or "low" judges, which was definitely a feature of the old manual system.
The one legitimate issue with the current system is that it counts on what amounts to majority voting logic to determine "consistency". But there are only 3 judges per circle when the important tracking decisions are made. It's entirely possible that you get a group with one highly experienced judge and two relative newbies. You might expect the experienced judge to frequently be inconsistent with the newbies, which will tend to defeat the tracking system. I think this explains Derek's problem. You really need a lot of people judging the same flights to make it work, but we can't do that. I think the manual tracking method, with the added personal judgement possible, might have been less prone to these sorts of seeming anomalies. And got someone to threaten to shoot Gary McClellan.
Brett
-
All deleted. The discussion really doesn't concern or involve me.
Floyd
-
The one legitimate issue with the current system is that it counts on what amounts to majority voting logic to determine "consistency". But there are only 3 judges per circle when the important tracking decisions are made. It's entirely possible that you get a group with one highly experienced judge and two relative newbies. You might expect the experienced judge to frequently be inconsistent with the newbies, which will tend to defeat the tracking system. I think this explains Derek's problem. You really need a lot of people judging the same flights to make it work, but we can't do that. I think the manual tracking method, with the added personal judgement possible, might have been less prone to these sorts of seeming anomalies. And got someone to threaten to shoot Gary McClellan.
Brett
Thank you Brett, that is exactly my problem with the formula.
No Howard, I do not have a mathematical solution to fix it either. The only thing I could come up with was to remove the top two pilots from each circle. This would take the "Big Names" out of the equation just in case there was any unintentional bias. It still does not fix the problem that Brett described which, in my opinion, is the real problem.
Derek
-
Anything objective that you can do manually you should be able to write a formula for.
-
It's entirely possible that you get a group with one highly experienced judge and two relative newbies. You might expect the experienced judge to frequently be inconsistent with the newbies, which will tend to defeat the tracking system.
I would expect the experienced judge to be more closely correlated with either newbie than the newbies would be with each other.
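That expectation is easy to check with a toy simulation: model the experienced judge as reading the true flight quality with little noise and the newbies with more, then compare rank correlations. All the numbers below are invented for illustration:

```python
import random

def rank_correlation(x, y):
    """Spearman's rho, simplified: assumes no tied values."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

rng = random.Random(1)
true_quality = [rng.uniform(450, 580) for _ in range(20)]  # 20 flights

def judged_scores(noise):
    """A judge's totals = true quality plus Gaussian scoring noise."""
    return [q + rng.gauss(0, noise) for q in true_quality]

expert, newbie1, newbie2 = judged_scores(5), judged_scores(20), judged_scores(20)
print("expert vs newbie:", rank_correlation(expert, newbie1))
print("newbie vs newbie:", rank_correlation(newbie1, newbie2))
```

On most seeds the expert-newbie correlation comes out higher than the newbie-newbie one, since the newbies' errors are independent of each other.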
-
The one consequential difference is that there is no check and no distinction that would select out "high" or "low" judges, which was definitely a feature of the old manual system.
Nor should there be. The high or low judge may be the best judge.
-
I think the word "newbie" sounds funny.... :)
-
I would expect the experienced judge to be more closely correlated with either newbie than the newbies would be with each other.
Possibly, but not certainly.
Derek
-
Nor should there be. The high or low judge may be the best judge.
Agreed. Shareen and I ran down one of our walkie-talkie batteries arguing about this between Cheyenne, WY and North Platte, NE one year, and apparently it was a lively topic of discussion inside their van every year as well. They selected for a total score range of around 535 for the best pilots. The guys averaging 570 were frequently excluded regardless of how well it correlated to the other rankings. I think this has the effect of selecting for narrow scoring range as well as a total score target, which is why we routinely had a few points total difference from 1-5th place.
Brett
-
Agreed. Shareen and I ran down one of our walkie-talkie batteries arguing about this between Cheyenne, WY and North Platte, NE one year, and apparently it was a lively topic of discussion inside their van every year as well. They selected for a total score range of around 535 for the best pilots. The guys averaging 570 were frequently excluded regardless of how well it correlated to the other rankings. I think this has the effect of selecting for narrow scoring range as well as a total score target, which is why we routinely had a few points total difference from 1-5th place.
Brett
Which is not necessarily a bad thing. The top guys usually fly within a few points of each other. It really takes a standout pattern to score 10 points higher than the competition in top 5 situations.
Derek
-
Dave Cook has some great thoughts on judging, and judging the computers of today....
As I recall, he advocated tossing the high and low judges' score per maneuver when more than three were used...
He has some cool ideas, although a bit contrary....
-
Dave Cook has some great thoughts on judging, and judging the computers of today....
As I recall, he advocated tossing the high and low judges' score per maneuver when more than three were used...
He has some cool ideas, although a bit contrary....
I do not think you will ever see more than 3 judges per circle used until Saturday, and dumping the high and low judge, or some of their scores, is the wrong thing to do; many times you are dumping the best, most consistent judge and keeping the one who is bracketing.
What does "judging the computers" mean? And what does that have to do with judges scoring a stunt contest? Computers just do whatever they are told to do, and calculate really fast.
Randy
-
Dave Cook has some great thoughts on judging, and judging the computers of today....
As I recall, he advocated tossing the high and low judges' score per maneuver when more than three were used...
He has some cool ideas, although a bit contrary....
Tossing out high/low is never a good idea; it operates on the presumption that there is a correct score. It will automatically toss out judges who run "high" or "low" normally, whether they are doing a good job or not. Buffalano's method (which involves normalizing the scores first, and is very similar to the methods I used to evaluate some previous results) solves this issue but has the fatal flaw of having to wait until all the scores are available. It is more-or-less the same tracking method we already use to select judges, but it looks at an entire round and then excludes judges for that round itself, at which point the others are used.
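A small made-up example of that failure mode: with four judges, a judge who runs consistently low but orders the flights correctly gets discarded on every maneuver, and his information never counts:

```python
def trimmed_mean(scores):
    """Average after discarding the single high and single low score."""
    s = sorted(scores)
    return sum(s[1:-1]) / (len(s) - 2)

# Judge D tracks flight quality (flight1 better than flight2, agreeing
# with A and C) but scores about 7 points low, so the trim discards him
# both times. Judge B is the one bracketing (he reverses the order), yet
# his scores survive because his level sits mid-range.
flight1 = {"A": 34, "B": 30, "C": 32, "D": 25}
flight2 = {"A": 28, "B": 33, "C": 30, "D": 21}
for flight in (flight1, flight2):
    print(trimmed_mean(flight.values()))
```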
Again, this might work for Top 20 and Top 5, but there aren't going to ever be enough judges per circle to do it for qualifying. And, when I have taken individual sets of results and applied various methods, it never once changed the winner, and it only occasionally changed any of the placements by one or two.
If you process the scores with 3-4-5 different methods, and they all yield the same results, it tells you that all the fiddling or tweaking with format and processing is more-or-less irrelevant, AND, that you should probably choose the simplest version.
Brett
-
Here is a question. How can you accurately track a judge until the contest is over? Or is it done on a whim, and what is being tracked?
-
I would expect the experienced judge to be more closely correlated with either newbie than the newbies would be with each other.
On average, that is probably true. It does raise the likelihood of occasional anomalies in the selection process. It seems unlikely to have any significant effect on the results.
Brett
-
Here is a question. How can you accurately track a judge until the contest is over? Or is it done on a whim, and what is being tracked?
We track them during qualification rounds. That ranking information can be used to assign judges on Friday. Saturday judge assignments are (or can be) based on judge rankings through Friday.
-
Here is a question. How can you accurately track a judge until the contest is over? Or is it done on a whim, and what is being tracked?
You have the same sort of information at any point in the contest, including when it is over. So Top 20 selections are made based on the tracking from Qualifying, and Top 5 from the Top 20 day tracking.
It is certainly not being done on a whim; it is entirely algorithmic, and you can have the algorithm to examine for yourself. What is being tracked is as described: a comparison of each individual judge's flight score vs. the others on the judging panel.
Brett
-
So what is being tracked? If a judge gives a bad score to a top guy? If a judge is too far from the rest? If a judge is not in a certain range?
So if two judges give a 32-35 and the low man gives a 22, who's right? Is it not supposed to be subjective?
-
So what is being tracked? If a judge gives a bad score to a top guy? If a judge is too far from the rest? If a judge is not in a certain range?
I don't know how to explain it any better: it compares each individual total flight score to the average of the others' for the same flight, without regard to who it is or to the raw absolute value. Those with the best correlation to the others are ranked higher; those who deviate are ranked lower. Basically, it is who is furthest from the rest (after normalizing over the day).
The algorithm is yours for the asking; in fact, Howard is anxious to provide the source code so you can see how it works in any detail necessary.
Brett
-
So what is being tracked? If a judge gives a bad score to a top guy? If a judge is too far from the rest? If a judge is not in a certain range?
So if two judges give a 32-35 and the low man gives a 22, who's right? Is it not supposed to be subjective?
It's not done on a maneuver-by-maneuver basis; it's on the total flight score.
Brett
-
It would not make much sense to me, so no need to send it. I just see it as: it's supposed to be subjective until the math kicks in, and then it's not.
So if one judge is 40 points off, does he not make the cut? Who was right, the low or the high?
My new program awaits at the post office right now.
-
I just see it as: it's supposed to be subjective until the math kicks in, and then it's not.
The current judge tracking algorithm is not subjective in any way. The scores themselves are subjective, but once the scores are entered, there is no subjectivity at all. That's the entire point of having an algorithm, as opposed to the very similar eyeball method used by Warren and Shareen.
Brett
-
It doesn't go by score. It looks at the order in which each judge ranks contestants: 1st, 2nd, 3rd, and so forth. Each time a judge's ranking of a guy is a notch off the average ranking (the one that appears on the scoreboard), he gets dinged a point. There's some other stuff involved, but that's the main idea. I've posted the formula here or on SSW. It's done flight-by-flight, but it could also be done maneuver-by-maneuver. I'll change it if somebody shows me that maneuver-by-maneuver is better.
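That description translates almost directly to code. A sketch of just the ding count as stated above, leaving out the "other stuff"; taking the consensus to be the order of the averaged scores (the scoreboard order) is an assumption here, and lower totals mean better agreement:

```python
def consensus_ranking(panel):
    """Order pilots by the panel's averaged flight scores (the
    scoreboard order); rank 0 is the highest average."""
    pilots = next(iter(panel.values())).keys()
    avg = {p: sum(judge[p] for judge in panel.values()) / len(panel)
           for p in pilots}
    order = sorted(avg, key=avg.get, reverse=True)
    return {p: i for i, p in enumerate(order)}

def ding_judges(panel):
    """One ding per notch a judge's own ordering of a contestant
    sits off the consensus ordering; lower totals = closer agreement."""
    consensus = consensus_ranking(panel)
    dings = {}
    for judge, scores in panel.items():
        order = sorted(scores, key=scores.get, reverse=True)
        own = {p: i for i, p in enumerate(order)}
        dings[judge] = sum(abs(own[p] - consensus[p]) for p in scores)
    return dings

# Made-up flight totals: judge -> {pilot: total flight score}.
panel = {
    "J1": {"P1": 540, "P2": 525, "P3": 512},
    "J2": {"P1": 500, "P2": 490, "P3": 478},  # scores low, same order: 0 dings
    "J3": {"P1": 510, "P2": 530, "P3": 470},  # swaps P1 and P2: 2 dings
}
print(ding_judges(panel))  # {'J1': 0, 'J2': 0, 'J3': 2}
```

Note that J2's low absolute scores cost him nothing, which matches the earlier point that using the full range, or not, doesn't affect a judge's ranking.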
-
I was just trying to understand how a subjective event can be handled by a computer program that by its nature is not subjective. Just as is always said, the best will always win. I can see the extreme complaining if it were done by luck of the draw and all the best guys wound up on one circle and only 5 got to move on.
When a judge looks at a Vincent van Gogh or a Michelangelo and compares it to a Renoir or a Warhol, and the same set of judges looks at those artists a year later, not one of the judges will change his mind, though they are all great in their own right. Very similar to what we have today: the same judges judging the same fliers on a different day.
By the way, you would not want my critique of these painters. But as in any subjective event, you will get different opinions.
-
Dave Cook has some great thoughts on judging, and judging the computers of today....
As I recall, he advocated tossing the high and low judges' score per maneuver when more than three were used...
He has some cool ideas, although a bit contrary....
I am going to add to Brett's response to this idea of throwing out high and low scores. As Brett said, it is not a good idea, and it is not a good idea for a number of reasons. You may be throwing out the scores of the judges who are more accurately assessing the relative merits of one flight over another. Furthermore, it has been my experience that some judges, concerned that their scores may be consistently thrown out for being high or low, will tend to narrow their range of scoring when the individual maneuvers being flown indeed deserve a wider range of scoring.
In the analyses by Dr. Buffalano for the CIAM a number of years ago, he was asked to run the judges' scores from the results of several World Championships, in one case averaging all of the judges' scores for each flight and in the second case discarding the high and low scores for each flight. This was more than 15 years ago, in the day when 5 judges were used in the finals (15 pilots, best 2 of 3 flights used to determine finals placement). The results were basically the same, with sometimes a change in who was awarded, say, 7th or 8th place. This was also for a period when there was not the blatant and obvious bias that has been experienced during certain years.
Keith
-
I was just trying to understand how a subjective event can be handled by a computer program that by its nature is not subjective. Just as is always said, the best will always win. I can see the extreme complaining if it were done by luck of the draw and all the best guys wound up on one circle and only 5 got to move on.
Now you are talking about the seeding. That, too, is entirely algorithmic, and not subjective. It uses selected contest results (NATS and TT) to come up with a seeding rank for each pilot who has a result in the contests. It is far from perfect as indicated by my ranking and Billy's ranking, since we were missing a fair number of these contests, and at least Billy hasn't forgotten how to fly.
You are absolutely right about the random pilot group selection. The big problem would not be the top pilots failing to qualify, it would be with the guys on the edge getting locked out on one circle, and a free ride on a different circle. I thought it worked a lot better when we had two Open and two Advanced circles and took 10 from each, because the quantization effect is reduced. But in any case the current system has far fewer complaints than anything before it.
Brett
-
See my post in debate section. D>K
-
I am sure my next statement will not sit well with the status quo, but here goes. Take a look at how many NATS winners we have had since moving to Muncie. Far fewer than when the Navy NATS was in play. I have to ask: is it normalization? Is it because it's the same people judging the same fliers on a different day? If this were not the case, the Worlds would be won by those same guys every year. But they are not, and the reason is not that one set of judges is less consistent than the other. It's that different judges look for a different set of mistakes.
There is always opposition to a new source of judges for the judging pool, because they don't know what they are looking for. I say good!
Or they don't know the pattern. I guess we were all born knowing the pattern. So among us we choose the same set of judges (most of whom are our friends) to judge year after year and expect them to pick a winner. They do a good job as it is, but is it really? I made the statement that everyone sees art differently, just as every person sees a concours plane differently. That being said, a pattern will not look the same to one person as to another, just as it looks different from inside the circle. I think it was Doug in this thread who said something about the Navy way of judging being influenced by the paint job. OK. A while back I said to let a computer judge the flight, and I was the one met with opposition, as it would take the presentation out of the equation (white pants, etc.). 25 years ago I might have pursued this system, but it's too late.
The top guys should embrace outside judging to see who really is the best. But as is, not much will change.
-
I am sure my next statement will not sit well with the status quo, but here goes. Take a look at how many NATS winners we have had since moving to Muncie. Far fewer than when the Navy NATS was in play. I have to ask: is it normalization? Is it because it's the same people judging the same fliers on a different day? If this were not the case, the Worlds would be won by those same guys every year. But they are not, and the reason is not that one set of judges is less consistent than the other. It's that different judges look for a different set of mistakes.
There is always opposition to a new source of judges for the judging pool, because they don't know what they are looking for. I say good!
Or they don't know the pattern. I guess we were all born knowing the pattern. So among us we choose the same set of judges (most of whom are our friends) to judge year after year and expect them to pick a winner. They do a good job as it is, but is it really? I made the statement that everyone sees art differently, just as every person sees a concours plane differently. That being said, a pattern will not look the same to one person as to another, just as it looks different from inside the circle. I think it was Doug in this thread who said something about the Navy way of judging being influenced by the paint job. OK. A while back I said to let a computer judge the flight, and I was the one met with opposition, as it would take the presentation out of the equation (white pants, etc.). 25 years ago I might have pursued this system, but it's too late.
The top guys should embrace outside judging to see who really is the best. But as is, not much will change.
Robert,
I guess I know what you mean by embracing "outside judging". But before you criticize the judging corps at the Nats for the past 20 or 30 years, maybe you should first take the time to identify all of the names who have judged the Open event at the Nats since the Nats stopped using Navy judges, which was some time before moving to Muncie. I think you will be astounded by the numbers involved. Yet over those years, with all sorts of judging combinations from a relatively large pool, a fairly limited number of pilots have consistently found their way to the top scoring positions. In your own way, you have condemned the efforts of a great number of honest and hard-working volunteers.
Even during the Navy Nats when different judges were used each year, there was still a group of people who consistently moved to the top. (Thinking in terms of Werwage, Gieseke, McFarland, Gialdini, Mathis.)
What is so hard to comprehend about the fact that some people just fly better than others and will consistently win, place well, or QUALIFY at the Nats regardless of who is judging?
We have a happy situation now where there are some new faces appearing in the top 5 and top 20. Can it really be that these new faces are flying well and that the old stodgy, set in their ways judging corps is really recognizing new talent?
Keith
-
Keith, you won't be the only one to chastise me for my opinion. But switching one judge for another year after year does not show much change to me. I in no way wish to discredit or take away any of the work done in the past. It's the future I am looking at.
-
I would expect the experienced judge to be more closely correlated with either newbie than the newbies would be with each other.
Howard, could you go back to my post and comment, please? To my understanding, the consistency of the judges was very close this year, with the highest inconsistency 2.25 or 2.5 across the 12 judges. I keep reading about the odd man out with some wildly varying scores, but I don't think we had that at all. I apologize that I have not read your description and don't remember ever seeing it, but perhaps you could lay out some examples of the point spread of a judging panel and what is good or better, bad or worse.
-
I would offer two other possible contributing reasons for the number of Nats champs lessening since the days of the "Navy Nats", besides just pointing at judging.
1) The switch to the Top 5 format. Instead of the finals being Top 20 day, with the best single flight winning, it became best 2 out of 3. That rewards consistency, rather than being able to "drill in" one single great pattern.
2) The switch to Muncie meant that the site conditions are more-or-less known. (Competitors who've flown there year after year know how the wind acts when it's blowing in certain directions, etc.) When the Nats moved from location to location, everyone was in the same boat - time to trim the plane to the conditions of the new location, and figure things out quickly. At Muncie, for a lot of the top guys, "trimming" is more making fine-tuned changes to a plane rather than significant changes.
And there's probably a lot of other factors as well. (Engines are better, flyers are better at trimming planes now than they were years ago ...) All I'm saying is that I don't think anyone can point at a single reason for "why less people win the Nats" now.
-
Howard, could you go back to my post and comment, please? To my understanding, the consistency of the judges was very close this year, with the highest inconsistency 2.25 or 2.5 across the 12 judges. I keep reading about the odd man out with some wildly varying scores, but I don't think we had that at all. I apologize that I have not read your description and don't remember ever seeing it, but perhaps you could lay out some examples of the point spread of a judging panel and what is good or better, bad or worse.
My comment that you're quoting is about a hypothetical situation in which an experienced and presumably accurate judge gets on a circle with two rookie and presumably inaccurate judges. Would that unfairly affect the experienced guy's ranking? Unless the rookies make the same errors as each other consistently, I wouldn't think so, but I think that's amenable to analysis. I don't have time now, but I'll do something after August.
I don't have the program with all the data in it from the 2014 Nats. I have some from previous Nats, but don't have time to look at them now. I would take the rankings from the last two Nats with a grain of salt. I fear that adding "Expert" messed up the judge rating. The gist of it is that a judge will have more contestant ratings that deviate from the official, averaged ratings the more contestants he judges: resolution gets finer with the number of contestants. We're now comparing judges from circles with greater differences in number of contestants than before. Had Bob not fiddled with the program to put "Expert" on two circles instead of one, this might have distorted judge rankings even more. So we need to find a way to normalize the judge scores better. That's what I've been asking for help with.
To be fair, even if we recombine Expert and Advanced at the Nats, dwindling attendance may put them on fewer than four circles, and the normalization problem will still need to be addressed.
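To make the resolution problem concrete: a judge who sees more contestants has more chances to be a notch off, so raw ding totals aren't comparable across circles of different sizes. One naive correction, certainly not the program's actual formula, is to compare dings per contestant:

```python
def dings_per_contestant(total_dings, num_contestants):
    """Naive normalization of a judge's ding total by field size.
    A 6-ding day over 20 flights then compares to a 3-ding day over
    10, though the expected ding count may well not grow linearly
    with field size, which is the open question."""
    return total_dings / num_contestants

print(dings_per_contestant(6, 20))  # 0.3
print(dings_per_contestant(3, 10))  # 0.3 -- now directly comparable
```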
-
My comment that you're quoting is about a hypothetical situation in which an experienced and presumably accurate judge gets on a circle with two rookie and presumably inaccurate judges. Would that unfairly affect the experienced guy's ranking? Unless the rookies make the same errors as each other consistently, I wouldn't think so, but I think that's amenable to analysis. I don't have time now, but I'll do something after August.
I don't have the program with all the data in it from the 2014 Nats. I have some from previous Nats, but don't have time to look at them now. I would take the rankings from the last two Nats with a grain of salt. I fear that adding "Expert" messed up the judge rating. The gist of it is that a judge will have more contestant ratings that deviate from the official, averaged ratings the more contestants he judges: resolution gets finer with the number of contestants. We're now comparing judges from circles with greater differences in number of contestants than before. Had Bob not fiddled with the program to put "Expert" on two circles instead of one, this might have distorted judge rankings even more. So we need to find a way to normalize the judge scores better. That's what I've been asking for help with.
To be fair, even if we recombine Expert and Advanced at the Nats, dwindling attendance may put them on fewer than four circles, and the normalization problem will still need to be addressed.
Howard, it takes no more work for EXPERT other than adding an E or an A on the scoreboard when writing the score down.
-
And there's probably a lot of other factors as well. (Engines are better, flyers are better at trimming planes now than they were years ago ...) All I'm saying is that I don't think anyone can point at a single reason for "why less people win the Nats" now.
I'd venture to guess those few that have won also reflect a disproportionate ratio of handle time to keyboard time.
-
Howard, it takes no more work for EXPERT other than adding an E or an A on the scoreboard when writing the score down.
I have to disagree; it is quite a bit more work than just writing a letter down. What if you end up with 2 Experts on circle A, 2 on circle B, 0 on circle C, and 6 on circle D? You cannot take an equal number from each circle, and the same goes for Advanced. The seeding has to be manually manipulated to get the correct number of guys on each circle. The only way to do this is by using math to figure out the seeding number for all of the Advanced and Expert pilots. That is what has taken so long the past two years.
It has been suggested that we just use a different system to score Expert, but this would add quite a bit of work for the ED. One of the really nice things about Howard's program is that it prints all the score sheets in order for each judge. The hours of printing, labeling, and stacking score sheets are eliminated with the program. If you tried to run two separate programs, you would have a lot of excess work trying to mix the Expert pilots in with the pre-organized Open and Advanced pilots. Also, the flight order would be all screwed up, because after the ping-pong draw the program takes care of the rest.
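For illustration, one way to handle the balancing described above is to deal each class across the circles separately, in seeding order; the names and counts below are invented:

```python
def balance_circles(entries, circles):
    """entries: {pilot: (event, seed)}. Deal each event's pilots onto
    the circles round-robin in seeding order, so every circle gets its
    share of Expert and of Advanced pilots."""
    assignment = {c: [] for c in circles}
    by_event = {}
    for pilot, (event, seed) in entries.items():
        by_event.setdefault(event, []).append((seed, pilot))
    for event, pilots in sorted(by_event.items()):
        for i, (_, pilot) in enumerate(sorted(pilots)):
            assignment[circles[i % len(circles)]].append((event, pilot))
    return assignment

# 4 Experts and 4 Advanced pilots onto 4 circles: one of each per circle.
entries = {
    "P1": ("Expert", 1), "P2": ("Expert", 2),
    "P3": ("Expert", 3), "P4": ("Expert", 4),
    "P5": ("Advanced", 1), "P6": ("Advanced", 2),
    "P7": ("Advanced", 3), "P8": ("Advanced", 4),
}
print(balance_circles(entries, ["A", "B", "C", "D"]))
```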
Derek
-
I'd venture to guess those few that have won also reflect a disproportionate ratio of handle time to keyboard time.
The best answer so far.
-
All I'm saying is that I don't think anyone can point at a single reason for "why less people win the Nats" now.
I can! Dave and Paul are very hard to beat.
Derek
-
Howard, it takes no more work for EXPERT other than adding an E or an A on the scoreboard when writing the score down.
Well, you figure out the judge scoring normalization problem for starters. I figure that integrating Expert into the program would take me 200-some hours. Although it would be fun, I didn't do it this year because I was busy getting ready for a stunt contest. Fortunately, Steve Yampolsky figured out how to accommodate all three events regardless of entry numbers, maintaining the same Nats format, ensuring a meaningful qualification round (not taking 20 out of 23), and balancing the circles. Go back over the discussion from two years ago and you can see what's involved.
-
Instead of the finals being Top 20 day, with the best single flight winning, it became best 2 out of 3. That rewards consistency, rather than being able to "drill in" one single great pattern.
Can we retroactively switch to this format for 2014? I like that idea for some reason...
Brett
-
My comment that you're quoting is about a hypothetical situation in which an experienced and presumably accurate judge gets on a circle with two rookie and presumably inaccurate judges. Would that unfairly affect the experienced guy's ranking? Unless the rookies make the same errors as each other consistently, I wouldn't think so, but I think that's amenable to analysis. I don't have time now, but I'll do something after August.
I don't have the program with all the data in it from the 2014 Nats. I have some from previous Nats, but don't have time to look at them now. I would take the rankings from the last two Nats with a grain of salt. I fear that adding "Expert" messed up the judge rating. The gist of it is that a judge will have more contestant ratings that deviate from the official, averaged ratings the more contestants he judges: resolution gets finer with the number of contestants. We're now comparing judges from circles with greater differences in number of contestants than before. Had Bob not fiddled with the program to put "Expert" on two circles instead of one, this might have distorted judge rankings even more. So we need to find a way to normalize the judge scores better. That's what I've been asking for help with.
To be fair, even if we recombine Expert and Advanced at the Nats, dwindling attendance may put them on fewer than four circles, and the normalization problem will still need to be addressed.
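To make the resolution point concrete, here is a toy illustration (mine, not the actual program): a judge of fixed accuracy accumulates more total rank deviation from the consensus order the more contestants he judges, so raw deviation totals from circles of different sizes can't be compared without some normalization.

# Toy illustration (not the actual program): for a judge of fixed accuracy,
# total deviation from the consensus order grows with circle size, so raw
# totals from circles of different sizes need normalizing before comparison.
import random

def mean_total_rank_deviation(n_contestants, noise=2.0, trials=2000):
    total = 0.0
    for _ in range(trials):
        consensus = list(range(n_contestants))  # 0 = best, per the scoreboard
        # The judge sees the consensus order through some random noise.
        judged = sorted(consensus, key=lambda r: r + random.uniform(-noise, noise))
        total += sum(abs(judged.index(c) - c) for c in consensus)
    return total / trials

for n in (8, 12, 16, 20):
    print(n, round(mean_total_rank_deviation(n), 1))  # the total climbs with n

Dividing the total by the number of contestants judged is the crudest possible fix; whether something like that is statistically sound is exactly the open question.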
No Howard, my comment is not hypothetical. I was one of this year's judges, and I'm questioning the consistency spread.
-
I can! Dave and Paul are very hard to beat.
Derek
You got that right!
It's the same guys winning because stunt is odd in a way that you can dominate for 50 years. Billy is a great example: I think he has a win in 5 different decades. What other physical activity can you say that about? And when you do find one, which I am sure there are some, you will probably see the same winners in that arena from year to year as well. Being in Muncie every year has tailored who will show up and who won't. The traveling Nats would bring different people from different areas, but the hardcore psycho top guns would travel anyway and usually be in the equation for the win at the end. In 1994, when I attended the Lubbock, TX Nats for my first time as a beginner, Paul W and Bob H were in the flyoff. Last year, when I made the flyoff, Bob H and Paul W were there as well. A good pattern holds up over time. For those who think it should change more often: get out your SN issues from the last 20+ years and look at the entries; the same names keep on bringing it year after year. I only hope my name is one they comment about in 20 years as always in the hunt and how come... Three 2nd places will have to start translating to more wins sooner or later... I hope... A note to Dave and Paul: I think the Nats will be in Kansas next year...... ;D ;D ;D ;D
-
I am not saying who won was not the best pilots or that the judging is unfair. I just can't for the life of me figure out why any mention of any of the subjects here ruffles feathers with the status quo. To all who have won in the past: "YOU'RE THE GREATEST!" "You're the best at something that pays nothing." That's a quote from a world champ's wife.
-
No Howard, my comment is not hypothetical. I was one of this year's judges, and I'm questioning the consistency spread.
I understand. You quoted something I said, then went on to ask about this year's consistency spread. I responded to both even though I suspected you were only interested in the latter. I wasn't able to give you a satisfactory response because I don't have the data to see what happened at this year's Nats.
-
I am not saying who won was not the best pilots or that the judging is unfair. I just can't for the life of me figure out why any mention of any of the subjects here ruffles feathers with the status quo. To all who have won in the past: "YOU'RE THE GREATEST!" "You're the best at something that pays nothing." That's a quote from a world champ's wife.
Then what are you saying? I don't think I follow.
Are you genuinely asking why the same guys rise to the top, and wanting to know their secrets?
Or
Are you asking that question because you think it should be otherwise?
PLEASE NOTE: THIS IS FRIENDLY DISCUSSION.... Those are not pointed questions to try to stir it up, so to speak. Just discussion.
-
This thread started out as: why do they put inexperienced judges on Top 20 days and leave the judges with experience judging Advanced? Anytime a question comes up, the powers that be take great offense. I am not the only one who sees a problem with the status quo. I cannot change anything on my own, and in 10 years, when there are only 20 guys to compete for Top 20, it won't matter.
-
This thread started out as: why do they put inexperienced judges on Top 20 days and leave the judges with experience judging Advanced? Anytime a question comes up, the powers that be take great offense. I am not the only one who sees a problem with the status quo. I cannot change anything on my own, and in 10 years, when there are only 20 guys to compete for Top 20, it won't matter.
Yeah, I hear that too. And each year I wonder, "Is this the year with the big drop in attendance?" And each year it's 36-38 guys battling it out....
-
This thread started out as: why do they put inexperienced judges on Top 20 days and leave the judges with experience judging Advanced? Anytime a question comes up, the powers that be take great offense. I am not the only one who sees a problem with the status quo. I cannot change anything on my own, and in 10 years, when there are only 20 guys to compete for Top 20, it won't matter.
You are correct, but other than a few comments at the very beginning of this thread, it has been positive and informative. I do not see anyone getting their panties in a wad; people are just asking questions and debating ideas. That is what makes this forum what it is.
Derek
-
This thread started out as: why do they put inexperienced judges on Top 20 days and leave the judges with experience judging Advanced? Anytime a question comes up, the powers that be take great offense. I am not the only one who sees a problem with the status quo. I cannot change anything on my own, and in 10 years, when there are only 20 guys to compete for Top 20, it won't matter.
The method used is completely public, at least as I wrote it. I don't know whether Bob used it rigorously to pick the Top 20 judges. Dig in and understand the method, then suggest an improvement, rather than just saying it's wrong. I think it does have flaws, which I have described here.
Lots of people have put in a lot of work to put on a Nats that's fair, objective, and picks the best as winners. I think you owe it to them to understand the process before you criticize it or -- worse yet -- talk the PAMPA EC into making capricious changes to it. If you do understand the process and propose and justify improvements, they will be welcomed, and I'll do my best to get them incorporated.
-
I don't know what to say, because all I get is BS from everyone, so it's best not to say anything about the sacred way things are done anymore. Just keep on with what you're doing.
-
Yeah, I hear that too. And each year I wonder, "Is this the year with the big drop in attendance?" And each year it's 36-38 guys battling it out....
So far you're right, with the same guys in the top 20 give or take 1 or 2, but it's coming, trust me.
-
Since some judges tend to consistently score high or low, I would think that the current method of normalizing the total score would create a bracketing effect. Its unintended result would be to throw out the total high score or total low score, despite those scores possibly being more accurate. Is there a way to normalize the individual maneuver scores so you can compare how consistent the judges are, maneuver to maneuver (by normalizing the total score and then going back and, on that basis, normalizing each individual maneuver score)? If most of the judges score a particular maneuver high (normalized score) and one judge scores significantly lower or higher (normalized score), would that indicate that that particular judge was not as accurate as the others? Suppose you weighted the normalized scores of the individual judges by a goodness factor (based on judging experience or acknowledged expertise in judging) to give an indication of which normalized maneuver score should be considered "normative" or "basis."
Scott
-
My belief is that when essentially the same judges judge essentially the same fliers, using the same system to "grade" the judges and fliers, the results will be predictable. I have no doubt that a lot of the same fliers would appear in the Top 20 with an entirely new judging corps, but I believe that all fliers deserve to be viewed by fresh judges.
A flier that complains about the judging may be showing sour grapes..... if he/she goes off to the same contest, the judges are different, and the results are the same, he/she should take up a stopwatch event..... but if the judges are not entirely different, then there is a shadow of doubt.
-
Since some judges tend to consistently score high or low, I would think that the current method of normalizing the total score would create a bracketing effect. Its unintended result would be to throw out the total high score or total low score, despite those scores possibly being more accurate. Is there a way to normalize the individual maneuver scores so you can compare how consistent the judges are, maneuver to maneuver (by normalizing the total score and then going back and, on that basis, normalizing each individual maneuver score)? If most of the judges score a particular maneuver high (normalized score) and one judge scores significantly lower or higher (normalized score), would that indicate that that particular judge was not as accurate as the others? Suppose you weighted the normalized scores of the individual judges by a goodness factor (based on judging experience or acknowledged expertise in judging) to give an indication of which normalized maneuver score should be considered "normative" or "basis."
I'm not sure what you are saying. A contestant's score for a flight is the average of the flight scores of all the judges on that circle. Are you suggesting that we not average the scores, but adjust them for judging like the full-scale aerobatics guys do? It would take a rules change, and I would resist that unless I could see (and understand) proof of its statistical validity.
Judge assessment is something different. It's based on the order in which each judge ranks the contestants compared to the official ranking of the contestants (the one on the scoreboard). Here is a description of our judge assessment, written when my dementia had not progressed to today's state: http://www.clstunt.com/htdocs/dc/dcboard.php?az=show_topic&forum=103&topic_id=327442&mesg_id=327442&listing_type=search . I should mention that Paul introduced the exceedance term to penalize favoritism. He set it to zero, where it has remained, either because favoritism was not a problem or because we couldn't agree on how to spell exceedance.
As I mentioned in the SSW writeup, we could change the judge assessment to maneuver-by-maneuver. I'd do it and recommend that it be adopted if somebody can show me it's better than flight-by-flight.
One interesting property of judging is that the guy who uses a narrow scoring band has less influence on the contest's outcome than a guy who uses the full range. For example, if he gives all maneuvers a 34, he doesn't affect the outcome at all.
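For readers who want the flavor of a rank-agreement measure, here is a generic Spearman-style sketch with invented scores. This is not the actual formula in the program (see the link above for that); it just shows how ordering, not score level or band width, drives the assessment, and why the all-34s judge drops out entirely.

# Generic rank-agreement sketch (invented scores; not the actual Nats formula).
def ranks(scores):
    """Rank positions (1 = highest score); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    r = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(a, b):
    """Correlation of rank orders: +1 = same ordering, 0 = no relation."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb) if va and vb else 0.0

official = [560, 548, 541, 533, 520]    # consensus flight totals
agreeing = [562, 547, 543, 530, 522]    # different numbers, same order -> 1.0
flat = [34, 34, 34, 34, 34]             # all ties -> no discrimination -> 0.0
print(round(spearman(official, agreeing), 3), round(spearman(official, flat), 3))

Note the flat judge: with every score identical, his ranking carries no information. That is the "no influence" property just described, and it is also why a rank-based assessment can't mistake him for the best judge merely because his totals happen to match the average.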
-
My belief is that when, essentially, the same judges, judge essentially the same fliers, using the same system to "grade" the judges and fliers, the results will be predictable. I have no doubt that a lot of the same fliers would appear in the top 20 with an entirely new judging corps, but believe that all fliers all deserve to be viewed by fresh judges.
A flier that complains about the judging may be sour grapes.....if he/she goes off to the same contest and there are different judges and the results are the same, he/she should take up a stopwatch event.....but if the judges are not entirely different, then there is a shadow of doubt.
Now Richard, you haven't looked at the judge roster lately. I see a couple of new guys this year, a handful of guys who've only done it twice or so, and some guys who haven't done it for several years. I only see three regulars. And out of the hundreds of judges Bob had to pick from, I think he got a particularly competent group.
-
Now Richard, you haven't looked at the judge roster lately. I see a couple of new guys this year, a handful of guys who've only done it twice or so, and some guys who haven't done it for several years. I only see three regulars. And out of the hundreds of judges Bob had to pick from, I think he got a particularly competent group.
I agree, I could not have done a finer job myself...and I have tried.
Derek
-
I am not saying who won was not the best pilots or that the judging is unfair. I just can't for the life of me figure out why any mention of any of the subjects here ruffles feathers with the status quo.
The quick answer is that you are trying to provoke this kind of response. The question is answered clearly and concisely, but you don't bother to even read the response or take the time to try to understand it. There have already been very extensive discussions about this for years now; you can have the exact source code yourself, put in your own numbers, and see exactly what it is doing. You have had several qualitative descriptions.
Of course, you don't actually care what the answer is; you want to grind your own axe. And bear in mind, I know what that is in detail, since we talked about it for a while. At one moment, you disingenuously ask how the judge selection works, but you immediately respond with "why don't we bring in outside judges?" Then later, you appear to complain that the judge selection algorithm fails to pick the most experienced judges -- which is the diametric opposite of the "outside judges" theory, which ensures the least experienced judges we could possibly find.
Then you ask whether or not "normalization" is a factor. You have absolutely no idea what "normalization" means in this context, of course, and Howard stated earlier that it was not used. The ranking method amounts to the same thing as normalizing the scores and then choosing by the resulting normalized score, but normalization is not used directly.
In short, you aren't listening to the answers. That's why people get upset: you are just tossing around crap about the work people have spent many careful hours and years trying to craft, and you have no idea of, or no interest in, how it actually works. You just know it's garbage and whatever pops into your head is going to "fix" the issue.
As I told you at the NATs, there are plenty of us willing to help you -- but not to get abused and denigrated for trying. As I mentioned at the NATs, the scoreboard is telling you something *very important*, and you aren't willing to listen. To get out of your current rut, you are going to have to do MANY things differently than you do now, and the first and most important is to accept that you are going to have to pay attention to other people's inputs to succeed. David does, and you don't -- see the difference there?
Brett
-
I'm not sure what you are saying. A contestant's score for a flight is the average of the flight scores of all the judges on that circle. Are you suggesting that we not average the scores, but adjust them for judging like the full-scale aerobatics guys do? It would take a rules change, and I would resist that unless I could see (and understand) proof of its statistical validity.
Judge assessment is something different. It's based on the order in which each judge ranks the contestants compared to the official ranking of the contestants (the one on the scoreboard). Here is a description of our judge assessment, written when my dementia had not progressed to today's state: http://www.clstunt.com/htdocs/dc/dcboard.php?az=show_topic&forum=103&topic_id=327442&mesg_id=327442&listing_type=search . I should mention that Paul introduced the exceedance term to penalize favoritism. He set it to zero, where it has remained, either because favoritism was not a problem or because we couldn't agree on how to spell exceedance.
As I mentioned in the SSW writeup, we could change the judge assessment to maneuver-by-maneuver. I'd do it and recommend that it be adopted if somebody can show me it's better than flight-by-flight.
One interesting property of judging is that the guy who uses a narrow scoring band has less influence on the contest's outcome than a guy who uses the full range. For example, if he gives all maneuvers a 34, he doesn't affect the outcome at all.
Howard: Sorry that I didn't explain myself better. I am not talking about changing the score. But aren't we essentially using the scores to "adjust" the judges, by eliminating those judges that do not conform to the "average"? This, in effect, would bracket the scores simply by selecting the judges closest to average for judging the subsequent rounds. I'm suggesting that we work maneuver-by-maneuver: first normalize to the average, then normalize the individual maneuver scores (a rough sketch of what I mean follows below). A "goodness" factor, based on a judge's experience or acknowledged ability, would then be used to weight individual scores when summing to an average maneuver score against which all scores would be compared.
The problem with using total scores for comparing and selecting judges is embodied in your example: what happens when a judge scores 35 on every maneuver and coincidentally ends up with a total score that matches the average? This judge would be considered the "best" judge. Correct?
Scott
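If I follow, one possible reading of that proposal in code might look like this. It is strictly a sketch with made-up numbers and an arbitrary threshold, not a worked-out method, and the goodness weighting is left out since nobody has defined it yet.

# One possible reading of the proposal above (made-up data, arbitrary threshold):
# 1. scale each judge's scores so every judge's flight total matches the panel
#    mean total (removes high/low scorers), then
# 2. compare each judge's scaled maneuver scores to the panel average per maneuver.

scores = {                      # scores[judge] = maneuver scores for one flight
    "J1": [34, 36, 33, 35],
    "J2": [28, 30, 27, 29],     # scores low overall, but tracks the others
    "J3": [34, 36, 38, 35],     # disagrees on maneuver 3
}

panel_mean_total = sum(sum(s) for s in scores.values()) / len(scores)
scaled = {j: [m * panel_mean_total / sum(s) for m in s] for j, s in scores.items()}

n_maneuvers = len(next(iter(scores.values())))
for k in range(n_maneuvers):
    avg = sum(scaled[j][k] for j in scaled) / len(scaled)
    for j in scaled:
        dev = scaled[j][k] - avg
        if abs(dev) > 1.5:      # arbitrary threshold, illustration only
            print("maneuver %d: %s deviates by %+.1f" % (k + 1, j, dev))

Run on these numbers, it flags only J3 on maneuver 3; J2's uniformly low scores wash out in step 1, which seems to be the point of the proposal.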
-
Um, it's clear, isn't it, that we don't use contestant scores at all to assess judges? We look at the order in which they rank contestants. This avoids the high-low issue and the narrow-wide issue. Please put your method in a formula. I still don't understand it.
I don't see how to make an objective goodness factor.
-
Um, it's clear, isn't it, that we don't use contestant scores at all to assess judges? We look at the order in which they rank contestants. This avoids the high-low issue and the narrow-wide issue. Please put your method in a formula. I still don't understand it.
I don't see how to make an objective goodness factor.
Howard, the method is called a "normalized cross-correlation." It is a measure of how much two series of measurements resemble each other. It is a common mathematical analysis tool; the "normalized" part removes the high-low concerns. I have used the technique to analyze F2B judging from past World Championships (a bare-bones sketch follows below).
If you wish, we can talk about this later, after Poland, since we're both pretty tight for time right now.
Bill
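For anyone curious what that looks like, here is a bare-bones normalized cross-correlation of two judges' maneuver-score series, with invented numbers; Bill's actual F2B analysis is surely more involved.

# Bare-bones normalized cross-correlation (invented numbers).
# Subtracting each judge's mean and dividing by his spread removes the
# high/low and narrow-band/wide-band differences; what remains is how
# similarly the two series move, on a scale from -1 to +1.
def ncc(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (sa * sb)

judge_a = [34, 36, 31, 38, 33]           # wide band, scores high
judge_b = [27, 28, 26, 29, 27]           # narrow band, scores low, same shape
print(round(ncc(judge_a, judge_b), 2))   # ~0.99: they see the flight alike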
-
Howard: thanks for your patience with me. I obviously am not doing a very good job of explaining this. And Bill: thanks for saying what I was trying to say.
It seems to me that comparing total scores (and thus the resulting placement of pilots, as in 1st, 2nd, and 3rd) is more concerned with excluding favoritism by judges. By comparing the judges' maneuver scores, one is trying to actually discern which judges are doing the best job of scoring. I think that developing this method would benefit our judging corps by helping show who is on the mark and who isn't. I don't think that (within reason) it should matter whether a particular judge scores high or low, as long as they score consistently. But total score comparisons, as done now, tend to drive towards an average. If your goal is excellence, your target should not be "average".
Career-wise, I wish I had known this stuff about 20 years ago to use as a management tool.
Bill: I'm not sure how much I can help, but would be glad to go over anything you suggest.
I hope you guys do great in Poland!
Scott
-
Scott, thank you for your patience. I'd like to discuss this in a month or so.
-
By comparing the judges' maneuver scores, one is trying to actually discern which judges are doing the best job of scoring. I think that developing this method would benefit our judging corps by helping show who is on the mark and who isn't.
Hi Scott:
I don't fly at the nats (yet), but I think I'd rather fly in a contest where the judges are allowed to be persnickety about different things, rather than one where they're all expected to judge exactly alike. Judging is, of necessity, subjective -- asking the judges to be totally regimented in individual maneuver scores is going to suppress that.
And I must say, I'm quite happy about the fact that we don't pay the judges. When you start paying people based on a performance metric, you usually end up with two things: an overall rise in that metric, and a deep appreciation of how raising that metric in unexpected ways can really mess up your overall productivity.
http://www.dilbert.com/strips/comic/1995-11-13/
-
I don't fly at the nats (yet), but I think I'd rather fly in a contest where the judges are allowed to be persnickety about different things, rather than one where they're all expected to judge exactly alike. Judging is, of necessity, subjective -- asking the judges to be totally regimented in individual maneuver scores is going to suppress that.
http://www.dilbert.com/strips/comic/1995-11-13/
Tim,
With all due respect to your analytical ability, I think you have missed the message here if you think that the Nats judges are "expected to judge exactly alike" or that the judges are to be "totally regimented in individual maneuver scores".
Yes, judges are expected to recognize errors, but they are also expected to assign their own point value to the maneuver based on how well they saw the maneuver flown. Yes, it is recognized that the event is subjectively judged, and yes, it is expected that each judge will have his/her own assessment of each maneuver. If every judge scored each maneuver exactly the same and "correct," we would only need one judge. When we see 3 or 5 or 6 judges on one circle, it is recognition that judging is far from a perfect process.
Keith
-
Interesting thread. As one of the judges this year, I can say this: the training wasn't what you'd call "ground up"; it was really more of a discussion session to point out safety and procedural issues, with some discussion about where maneuvers begin and end, and other nuances. All the judges were experienced. The most interesting point was that in the consistency rating there was a very small variation, the least consistent being a 2.25, which to my understanding is a very narrow range. This, also to my understanding, would have qualified any of this year's judges for Top 20 last year, which had a range nearing 5. The Head Judge is not necessarily the most qualified; he is an organizer, not a boss. Mark did an exceptional job in my opinion, and he himself admitted he is not an "expert," whatever that might be.
Personally, it was a very rewarding, however exhausting, experience. The daily warm-up flights put everyone on the same page first thing. For me, it was a chance to compare my style and observations with judges other than our local guys. I was pleased to find that I fit in very closely with the group, seeing the same things and scoring very consistently.
Mr Ryan,
Mark IS an organizer, and not a boss, as you stated. BUT, if he is not an expert judge, then NOBODY is. He has been doing this for about 35 years. Some may disagree with his scores, but that's human nature. The "expert" reference may have been about flying or something else.
-
I'm not responding to anyone in particular, nor am I advocating any of the proposals I've read here.
My observation with non-professional judges is that their first couple or few flights in a day are judged differently from the remaining ones, which come after the judge set is "broken in." If you are the first flight of the day and the judges are cold, watch out!
Floyd