Mr. Buk,
Yes, it is a very interesting discussion. Almost everyone knows what the "mean" is, but few know exactly what its limitations are.
Why use three or five judges if you don't think any of the scores will be "wrong"?
I think the correct term is "extreme value" instead of "wrong value". The extreme value, different from the other values in the group, is not necessarily wrong, but the probability of it being right is very low.
The reason you use more than one judge is that THE SCORE IS A MATTER OF OPINION, not a measurement of an objective number. They may (and do) weight different types of errors differently, but *not incorrectly*. If a particular flier manages to get good scores from a group of competent judges all weighting different errors fairly, that indicates that the pilot is making fewer errors of all types, and therefore should win. If you just had one judge, and that one judge weighted one aspect of a flight more heavily, that selects (more or less at random) for the guy who does that one thing better, not the one who does a reasonable range of things well.
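To put some toy numbers on that, here's a quick Python sketch - the pilots, error counts, and judge weights are all invented for illustration, not from any real contest - showing how a panel of judges who weight errors differently but fairly favors the well-rounded flier, while a single judge with a pet error type picks the specialist:

```python
# Toy model: two pilots, three error types, made-up error counts.
# Lower total deduction = better flight.
pilots = {
    "well_rounded": {"shapes": 2, "intersections": 2, "bottoms": 2},
    "specialist":   {"shapes": 5, "intersections": 5, "bottoms": 0},
}

# Three judges who weight the same error types differently (all "fair").
panel = [
    {"shapes": 3, "intersections": 2, "bottoms": 2},
    {"shapes": 2, "intersections": 3, "bottoms": 2},
    {"shapes": 2, "intersections": 2, "bottoms": 3},
]

def deduction(errors, weights):
    return sum(errors[e] * weights[e] for e in errors)

for name, errors in pilots.items():
    per_judge = [deduction(errors, w) for w in panel]
    print(name, per_judge, "panel average:", sum(per_judge) / len(per_judge))
# well_rounded averages 14 points off, specialist about 23.3 - panel picks
# the well-rounded flier.

# A single judge who weights "bottoms" very heavily picks the specialist,
# even though the panel as a whole would not.
bottoms_judge = {"shapes": 1, "intersections": 1, "bottoms": 10}
for name, errors in pilots.items():
    print(name, "single-judge deduction:", deduction(errors, bottoms_judge))
# well_rounded loses 24 points, specialist only 10.
```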
Moreover, judges in real life may in fact be looking for the same sorts of errors, but deduct more (or less) *per error* than someone else. That results in the "high judge/low judge" phenomenon. There are judges whose scores are known to run high or low but nonetheless rank the fliers in the correct order - someone might give a mediocre flight a 525 and a great one 570, someone else might give the same flights a 425 and a 470, but BOTH rank the flights in the right order. There is no reason to believe that the 525 is more "correct" than the 425. **
Those are AMA scores, of course, but replace them with 900 and 1100 for FAI and the same point applies.
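Here's the same 525/570 vs. 425/470 example from the paragraph above as a trivial Python check, just to make the point that an offset in one judge's raw scores doesn't change the order:

```python
# The "high judge / low judge" effect: one judge's raw scores run 100
# points below the other's, but both rank the flights the same way,
# so neither is more "correct".
high_judge = {"mediocre": 525, "great": 570}
low_judge  = {"mediocre": 425, "great": 470}

def ranking(scores):
    # Flights ordered best-first by this judge's raw scores.
    return sorted(scores, key=scores.get, reverse=True)

print(ranking(high_judge))  # ['great', 'mediocre']
print(ranking(low_judge))   # ['great', 'mediocre'] - identical order
```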
This is the essential fallacy of blindly throwing out the high and low scores on a particular flight, and also the fallacy of using the median raw score. This has long been understood, and that is why there is judge selection from one round to the next based on "tracking".
Whatever method you use (mean or median), you substitute the scores of each judge (3, 5, 7, or better 20 or more) with only one number. I didn't understand when you say "...and should be removed (much less two of them)? ..."
In your example case, with 3 judges and using the median, you use only one score - the one in the middle. The other two are not used, despite the fact that one of the judges saw far more errors, or deducted more for a particular set of errors. With the average (or sum, same thing) the score is directly determined from all of the judges' inputs.
It's a degenerate case, but that also makes the flaw of the method obvious.
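To spell out the degenerate case with made-up three-judge scores, using Python's statistics module:

```python
# Degenerate three-judge case: the median keeps only the middle score,
# while the mean (or sum) responds to every judge's input.
import statistics

scores = [470, 520, 525]   # made-up raw scores from three judges

print(statistics.median(scores))  # 520 - the 470 and the 525 are ignored
print(statistics.mean(scores))    # 505 - all three judges move the result

# If the low judge had seen even more errors, the median wouldn't budge:
print(statistics.median([430, 520, 525]))  # still 520
print(statistics.mean([430, 520, 525]))    # ~491.7 - the mean reflects it
```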
Brett
** p.s. By the way, this suggests that what you might do is wait until you get all the scores from a round, execute a tracking program, and then exclude a particular judge's scores from EVERYBODY's flights, then calculate and post the scores. This seems to directly address the possibility that some judge was intentionally biased, which should show up as his scores reflecting a bias towards a particular flier. This is the essence of 99% of the judging complaints, particularly the "judge is my flying buddy" or "West Coast judge packing" complaints. Even this is fallacious reasoning - there's absolutely no way, from studying the scores, of distinguishing intentionally biased scores from a legitimate preference for the same type of flying/array of errors.
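For what it's worth, here's a rough Python sketch of that procedure - the fliers, scores, and crude rank-agreement measure are all invented here, and a real tracking program would presumably use a proper rank statistic, but it shows the mechanics: score the round, find the judge who tracks worst against the panel consensus, and drop that judge's column from every flight before recomputing:

```python
# Sketch of the post-round procedure: score the whole round, "track" each
# judge against the panel consensus, drop the worst-tracking judge's
# scores from EVERY flight, then recompute. All numbers are made up.
scores = {                      # flier -> [judge A, judge B, judge C]
    "flier1": [560, 555, 430],
    "flier2": [540, 535, 450],
    "flier3": [520, 515, 470],
}
n_judges = 3

def ranking(get_score):
    """Fliers ordered best-first by the given scoring function."""
    return sorted(scores, key=get_score, reverse=True)

consensus = ranking(lambda f: sum(scores[f]))

def disagreements(judge):
    """Count of flier pairs this judge orders opposite to the consensus."""
    pos = {f: i for i, f in enumerate(ranking(lambda f: scores[f][judge]))}
    cons = {f: i for i, f in enumerate(consensus)}
    fl = list(scores)
    return sum(1 for i in range(len(fl)) for j in range(i + 1, len(fl))
               if (pos[fl[i]] - pos[fl[j]]) * (cons[fl[i]] - cons[fl[j]]) < 0)

worst = max(range(n_judges), key=disagreements)
print("dropping judge", worst)   # judge C (index 2) ranks the round backwards

# Recompute every flier's total without the dropped judge's column.
for f, s in scores.items():
    kept = [x for i, x in enumerate(s) if i != worst]
    print(f, sum(kept))
```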
A real example with real names for once, with a made-up hypothetical case. Judge Peabody and pilots Urtnowski and Fancher, same circle. Just say for argument's sake that we had 5 judges, Peabody and 4 others. Peabody and Windy are both from the same area, have many of the same contest experiences, see the same people fly most of the time, and have the same basic idea of how stunt should be flown. Peabody sees Fancher twice a year; they have no common experience, see different people fly, and may have very different ideas on how errors should be weighted. The other 4 judges have no common experiences with EITHER Ted or Windy or anyone else.
Get out there at the NATs. Fly some flights: 4 of the judges track perfectly all day, all ranking the 40 flights they see in the same order. Say they all think Ted was first and Windy was third in the round. Peabody's scoring sticks out: it ranks Windy 1st and Ted 10th. Aha, we see what is going on, that antichrist of stunt is a ringer for Windy and is killing Ted out of spite, right?
But of course that's jumping to a conclusion. It is ABSOLUTELY IMPOSSIBLE to make that conclusion from looking at the scores after the fact. There's no way to distinguish this apparent "cheating" from a case where, naturally enough, Peabody is looking for or weighting the errors differently, and it would be perfectly reasonable to expect that Windy has concentrated on removing the types of errors that are heavily weighted by people in the Northeast, while Ted has concentrated on removing other types of errors. It is entirely legitimate (and inevitable) that both the absolute and relative weighting of different types of errors may differ from judge to judge, so the "anomalous" tracking that Peabody's results seem to show may be cheating, or may be the result of looking for different errors and legitimately deducting for them.
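You can demonstrate the indistinguishability with a toy calculation - the error counts, weights, and base score below are all invented - where two completely different hypotheses about Peabody produce the exact same pair of scores:

```python
# Two hypotheses, one observation: the same scores can come from a judge
# legitimately weighting errors differently OR from a deliberate bias.
windy_errors = {"corners": 4, "bottoms": 1}   # few heavily-weighted errors
ted_errors   = {"corners": 1, "bottoms": 4}

def score(errors, weights, bias=0):
    return 600 - sum(errors[e] * weights[e] for e in errors) + bias

# Hypothesis 1: Peabody honestly weights corners lightly, bottoms heavily.
h1 = (score(windy_errors, {"corners": 2, "bottoms": 10}),
      score(ted_errors,   {"corners": 2, "bottoms": 10}))

# Hypothesis 2: Peabody weights both error types evenly at 8.4 points
# each, but pads Windy's flight by 24 points on purpose.
h2 = (score(windy_errors, {"corners": 8.4, "bottoms": 8.4}, bias=24),
      score(ted_errors,   {"corners": 8.4, "bottoms": 8.4}))

print(h1)  # (582, 558)
print(h2)  # (582.0, 558.0) - identical scores, so the numbers alone
           # cannot tell an honest difference in weighting from cheating
```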
That doesn't mean the tracking method used (which has always looked for these sorts of "anomalies") is incorrect or shouldn't be used. We think it picks judges that have a balanced range of error weights. But it does mean you can't ever determine that someone is doing it on purpose.
I would note that if you replace "Peabody" with "McClellan" and "Windy" with "Bob Baron" you have the gist of the "Anatomy of a Team Trials", where careful mathematical analysis of scores was used to leap to idiotic conclusions about Gary's motivations.
And for the record, I have never considered Peabody's scores of my flying to be erroneous or biased, despite our extremely contentious relationship; I just used him as an example. I did have a bit of a problem with his comments at the 2002 Judge Training on the topic of the "West Coast Hourglass", but even that could have been a discussion of the geometry. He was wrong, as I showed in my SN article "Fun facts about the Hourglass", but being wrong and judging unfairly are two entirely different things. - bb