I'm trying to wedge your terminology into my terminology, and things aren't fitting. Do you mean observer in the control-theory-ish sense? I would peg the KF as being that critter in this case. If not, please elucidate.

What I think we are talking about is taking accelerometer and gyro data to determine position and attitude, and then incorporating absolute position and attitude references to correct for the cumulative errors arising from gyro/accelerometer scale factor, misalignment, and drift.

Absolute position and attitude indicators are frequently referred to as observers. What a typical inertial navigation system does is integrate the gyro and accelerometer data, include the geometry/kinematics, and generate an estimated position and attitude. In the absence of other information these estimates accumulate growing errors over long periods of time, but they are pretty clean (in terms of noise) and have decent bandwidth over short periods. Every so often you need to correct for the tendency of the error to grow by incorporating an external absolute reference - GPS, star tracker, magnetometer, sextant data. These are typically pretty noisy from second to second, and not available at a sufficient rate to use continuously, but they don't wander off over time. These are what I mean by observers.

Since the observers are known to be noisy, you don't want to take any individual reading of absolute position and/or attitude and just wipe out your estimate with it. What you want to do is nudge your estimate towards the observer information slowly: take the difference between the observer reference and your integrated estimate from the gyros/accelerometers, multiply it by some factor less than one, and add that to the estimate. Now the estimate is closer to the observer. If it's off in the same direction the next time, you nudge the estimate closer, etc. Over time, if your estimate was really off, it will approach the average of the noisy observer readings (which is presumed to be accurate) even though none of the individual readings is really correct due to noise. So you wind up with an estimate that is much less noisy than the observer and runs right down the middle of the observer readings. The degree to which you nudge towards the observer is the system gain.
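To make the nudging concrete, here is a minimal sketch in Python. The true angle, noise level, gain, and the function name `nudge` are all made up for illustration, and the gyro integration step is left out so that only the correction is visible.

```python
import random

def nudge(estimate, reading, gain):
    """Move the estimate a fraction `gain` (0 < gain < 1) of the way to the reading."""
    return estimate + gain * (reading - estimate)

random.seed(0)       # fixed seed so the run is repeatable
TRUE_ANGLE = 10.0    # degrees; made-up truth for the demo
NOISE_SIGMA = 2.0    # degrees; made-up observer noise
GAIN = 0.02          # made-up fixed system gain

estimate = 0.0       # start 10 degrees off on purpose
for _ in range(2000):
    reading = TRUE_ANGLE + random.gauss(0.0, NOISE_SIGMA)  # noisy observer reading
    estimate = nudge(estimate, reading, GAIN)

print(f"final estimate: {estimate:.2f} (truth {TRUE_ANGLE})")
```

A smaller `GAIN` filters more of the observer noise but takes longer to pull in the initial 10 degree error; a larger one does the opposite.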

Note also that if there is drift (the sensor is always telling you a rate or acceleration that is in error by a fixed amount regardless of the actual rate), the estimate never gets to the average of the observer readings. It will approach it until the update from the observer at each step is the same size as the drift accumulated over that step. Of course, this allows you to estimate the drift, since you know that the average error is persistently in the same direction and you know roughly what its magnitude is. In fact the average error over time is proportional to the drift. You can then use this drift estimate to correct the gyro information before you integrate it, so over time the estimate of the position or attitude is corrected. With a little more work you can use similar reasoning to determine errors in the gyro or accelerometer that are not fixed but are some function of the rate - the scale factor error - and error sources related to the absolute orientation - the misalignment.
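The drift argument can be sketched the same way. This is an illustrative toy, not anyone's production scheme: the vehicle sits still, the observer is noise-free, and the numbers and names (`run`, `bias_gain`) are assumptions. The first run shows the estimate hanging off the reference by an amount proportional to the drift; the second feeds the persistent error back into a drift estimate.

```python
def run(gyro_bias, gain, dt, steps, bias_gain=0.0):
    """Vehicle sits still at angle 0; the gyro reports only its bias.

    Returns (final estimate, last innovation, final bias estimate).
    """
    estimate = 0.0
    bias_estimate = 0.0
    innovation = 0.0
    for _ in range(steps):
        measured_rate = gyro_bias                          # true rate is zero
        estimate += (measured_rate - bias_estimate) * dt   # integrate corrected rate
        innovation = 0.0 - estimate                        # observer reading minus estimate
        estimate += gain * innovation                      # the usual fixed-gain nudge
        bias_estimate -= bias_gain * innovation            # persistent error adjusts the bias
    return estimate, innovation, bias_estimate

BIAS, GAIN, DT = 0.5, 0.05, 0.01   # deg/s, unitless, s (all made up)

# 1) No bias correction: the estimate settles short of the reference.
stuck_estimate, stuck_innovation, _ = run(BIAS, GAIN, DT, steps=2000)
# In steady state the innovation equals -bias*dt/gain, so the bias can be read back.
inferred_bias = -GAIN * stuck_innovation / DT

# 2) Feed the persistent innovation into a bias estimate.
fixed_estimate, _, learned_bias = run(BIAS, GAIN, DT, steps=2000, bias_gain=0.5)

print(stuck_estimate, inferred_bias, fixed_estimate, learned_bias)
```

With these made-up numbers the uncorrected estimate settles at (1 - gain) * bias * dt / gain away from the reference, while the corrected run drives both the offset and the bias error towards zero.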

You probably knew all this already. What I think we are not communicating on is how this is done. Kalman filtering can be applied to this problem, but you can do it without Kalman filtering, too. The difference is that a Kalman filter considers the postulated degree and nature of the various error sources (degree of noise, amount of drift, etc.) and changes the gain based on how long you have gone since the last update. If you have been without observers for a long time, it is reasonable to expect that the estimate has drifted off a long way, so when you get the next observer reading, you probably want to take a bigger jump towards it even though it is noisy. Once you have gotten lots of observer data, you might think your estimates are all pretty good, so you want to take only tiny jumps towards the observer position to mitigate the noise. Assuming you accurately know the noise and drift characteristics, this creates what is purported to be an "optimal" estimator, meaning that it drives the overall error to near zero in the "best possible way", i.e. it converges really fast and then rejects all the noise nicely.
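A one-state Kalman filter is enough to show the gain changing with the time since the last update. This is only a sketch under simple assumptions: a scalar state, a random-walk error model, and made-up noise variances `Q` and `R`. A real inertial system carries many states and full matrices.

```python
def predict(variance, process_noise):
    """Integrating the gyro with no reference: uncertainty only grows."""
    return variance + process_noise

def update(estimate, variance, reading, measurement_noise):
    """Blend in one observer reading; the gain depends on current uncertainty."""
    gain = variance / (variance + measurement_noise)
    estimate += gain * (reading - estimate)
    variance *= (1.0 - gain)
    return estimate, variance, gain

Q = 0.01   # made-up process noise variance added per step
R = 4.0    # made-up observer noise variance

estimate, variance = 0.0, 1.0

# Plenty of observer data: the gain shrinks and settles.
for _ in range(200):
    variance = predict(variance, Q)
    estimate, variance, settled_gain = update(estimate, variance, 0.0, R)

# Observer outage: 500 steps of integration only, so uncertainty builds up.
for _ in range(500):
    variance = predict(variance, Q)

# Observer comes back, reading 3.0 units away from the estimate.
before = estimate
estimate, variance, first_gain_back = update(estimate, variance, 3.0, R)
jump = estimate - before

# With regular readings again, the gain works its way back down.
later_gains = []
for _ in range(5):
    variance = predict(variance, Q)
    estimate, variance, gain = update(estimate, variance, 3.0, R)
    later_gains.append(gain)

print(settled_gain, first_gain_back, jump, later_gains)
```

The first gain after the outage is more than ten times the settled gain in this setup, so the estimate takes over half of the 3.0 unit discrepancy in a single step.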

But no one says you have to do it that way. Many, many systems use fixed gains that always nudge the estimate towards the observer by the same amount regardless of error sources, how long it has been, etc. You can set them to converge fast but let a lot of noise through, or to filter the noise nicely but take a long time to converge, or something in between, but the gain doesn't change itself based on time or lack of updates. In many cases where observer data is available at a fixed interval, the gains of the Kalman filter converge to fixed values and then it operates absolutely identically to a fixed-gain system. Some hacks even simulate their problem using a Kalman filter, see where the gains wind up, then use these as the fixed gains in their real system. That's because they can't do math but they can type things into computers.
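For the same one-state toy model (scalar random-walk error, made-up variances `q` and `r`, observer reading at every step), the converged gain can be found either way: by running the recursion until it stops changing, or by solving for the fixed point directly. The function names are illustrative only.

```python
import math

def steady_state_gain_by_iteration(q, r, steps=500):
    """Run the scalar Kalman variance recursion and return the final gain."""
    variance = 1.0
    gain = 0.0
    for _ in range(steps):
        variance += q                       # predict
        gain = variance / (variance + r)    # gain for this update
        variance *= (1.0 - gain)            # update
    return gain

def steady_state_gain_closed_form(q, r):
    """Fixed point of the same recursion, solved by hand."""
    # The predicted variance P satisfies P^2 - q*P - q*r = 0 in steady state.
    p = 0.5 * (q + math.sqrt(q * q + 4.0 * q * r))
    return p / (p + r)

iterated = steady_state_gain_by_iteration(0.01, 4.0)
closed = steady_state_gain_closed_form(0.01, 4.0)
print(iterated, closed)
```

Either number can then be dropped in as the fixed gain of a plain nudge-style update, which is the shortcut described above.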

These are perfectly valid systems and most of the time they work similarly. The difference is that Kalman filters have variable gains and other systems have fixed gains. A full-up Kalman filter is very computationally intense, requiring extensive matrix manipulation; fixed-gain systems are far less so, even when you include the kinematics. The other problem is that you never know, in closed form or in real life, what a Kalman filter is going to do, and its response varies intentionally over time. If you lose the observer for a while, when it comes back, the filter will treat the first reading it gets as nearly perfect, dump almost all of the error at once, and cause a big jump in the reference.

There are many erroneous papers written describing "Kalman Filters" that are partial implementations, or that just generically call all inertial reference systems that include absolute references "Kalman Filters".

My point was that in many cases where the axes are strongly coupled and constrained, not only are the extensive Kalman filter calculations not required, you also don't need or want the variable-gain features, because you would rather have controllable, smooth updates than unpredictable, jumpy ones. The strong constraints tend to cause the Kalman gains to vary in detrimental ways, while they actually help in some of the fixed-gain systems.

Brett