Obviously I don't know the calibration algorithm being used - just the principle - but in that context we can probably assume that the FC is expecting a fixed magnetic field in the earth frame of reference (which should be just the earth's magnetic field) and a fixed magnetic field in the IMU (aircraft) frame of reference that will differ between the magnetometer locations but be constant at each of them.
If that is the case (which is the ideal condition for a calibration), then, taking the measured magnetic field vector as a function of orientation at each magnetometer, the FC should be able to decompose it into a linear combination of the two fields in the two frames of reference and store the field due to the aircraft for subtraction from the measured field in flight. That would be a good calibration.
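To make that concrete: under this model each body-frame sample is m = Rᵀ·B_earth + b, where R is the aircraft's orientation and b is the fixed aircraft-frame field at that sensor, so every sample lies on a sphere of radius |B_earth| centered at b. One standard way to realize the decomposition is therefore a linear least-squares sphere fit. A minimal sketch in Python - the function name and numpy approach are my own illustration, not any particular FC's code:

```python
import numpy as np

def fit_hard_iron(samples):
    """Least-squares sphere fit to body-frame magnetometer samples.

    Model: each sample m satisfies |m - b|^2 = r^2, where b is the
    fixed aircraft-frame field at this sensor and r = |B_earth|.
    Expanding gives the linear system
        2*m.b + (r^2 - |b|^2) = |m|^2
    which we solve for b and the scalar c = r^2 - |b|^2.
    """
    m = np.asarray(samples, dtype=float)            # shape (N, 3)
    A = np.hstack([2.0 * m, np.ones((len(m), 1))])  # unknowns: [b, c]
    y = np.sum(m * m, axis=1)
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    b = x[:3]                                       # aircraft-frame offset
    r = np.sqrt(x[3] + b @ b)                       # |B_earth| estimate
    return b, r
```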
If there are other magnetic field components that are fixed in neither the aircraft nor the earth frame of reference - such as something magnetized worn by the person calibrating that moves during the process, or a locally generated field (from a nearby vehicle or speaker magnet, say) that varies enough with position that the vector at the magnetometers changes as the aircraft is rotated - then there is no unique linear combination of two fields in different FORs that accounts for the measured data, so that should be detectable as a bad calibration.
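That detectability falls straight out of the same fit: samples contaminated by a field fixed in neither frame no longer lie on a single sphere, so the radial residual blows up. Continuing the sketch above (the 5% threshold is an arbitrary illustration, not a value from any real firmware):

```python
def calibration_ok(samples, rel_tol=0.05):
    """Flag a bad calibration: if some field component is fixed in
    neither frame, the samples fail to lie on one sphere and the
    radial residuals grow.  rel_tol is purely illustrative."""
    b, r = fit_hard_iron(samples)
    radii = np.linalg.norm(np.asarray(samples, float) - b, axis=1)
    rms = np.sqrt(np.mean((radii - r) ** 2))
    return rms / r < rel_tol
```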
In other words, I think that your example above is likely a correct interpretation of the situation, and it is also consistent with @BudWalker's hypothesis that, in general, a calibration in a locally non-uniform distorted field will fail. The corollary is that a calibration in a locally uniform distorted field should succeed - and that would be fine, because the FC is not interested in characterizing the field in the earth FOR, only the one in the aircraft FOR.
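The corollary can be demonstrated with the same sketch: a constant but "wrong" field vector still produces a perfect sphere, so the fit succeeds, while a field that changes with position does not. All the numbers here (b_true, B_uniform, the wobble scale) are made up for illustration:

```python
def random_rotation(rng):
    """Random rotation matrix via QR of a Gaussian matrix (illustrative)."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # force a proper rotation (det = +1)
    return q

rng = np.random.default_rng(0)
b_true = np.array([120.0, -40.0, 75.0])      # hypothetical aircraft-frame offset
B_uniform = np.array([300.0, 50.0, -150.0])  # distorted but locally uniform field

good, bad = [], []
for _ in range(200):
    R = random_rotation(rng)
    good.append(R.T @ B_uniform + b_true)            # uniform distortion
    wobble = rng.normal(scale=60.0, size=3)          # field varies with position
    bad.append(R.T @ (B_uniform + wobble) + b_true)  # non-uniform distortion

print(calibration_ok(good))  # expected True: uniform distorted field still calibrates
print(calibration_ok(bad))   # expected False: non-uniform field is caught
```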