Yes, underexposing always adds noise, or more precisely it lowers the signal-to-noise ratio. The reason is that each photosite on the sensor has what's called a full well capacity (FWC) - at base ISO the ADC reports its maximum value when FWC has been reached, in other words when the photosite has been saturated with as many electrons as it can hold. Light arrives randomly, so the signal always carries shot noise that grows only with the square root of the signal: if you feed each photosite half the light it needs for an ideal exposure, the noise becomes a bigger fraction of what you captured, and it shows up as random grain once you brighten the image. Alternatively, when you raise the ISO you are essentially telling the ADC to report max code at a lower electron count, which again, as we all know, increases noise (higher ISO = more noise), because the image is built from fewer electrons. For a numerical example, let's assume for argument's sake that FWC on the Mavic Air's sensor is 100,000 electrons at ISO 100. At ISO 400, the ADC reports maximum code after capturing only 25,000 electrons, with amplification applied to stretch that smaller signal across the full output range. That is the basic physics behind it anyway.
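To put rough numbers on that, here is a minimal Python sketch. It assumes the made-up 100,000-electron FWC from the example above (not a measured Mavic Air figure) and counts only photon shot noise, ignoring read noise and everything else. The function names are just for illustration.

```python
import math

# Assumed values from the example above, not real Mavic Air specs.
FULL_WELL_ELECTRONS = 100_000
BASE_ISO = 100


def saturation_electrons(iso, fwc=FULL_WELL_ELECTRONS, base_iso=BASE_ISO):
    """Electron count that maps to maximum ADC code at a given ISO."""
    return fwc * base_iso / iso


def shot_noise_snr(electrons):
    """Shot noise is sqrt(N), so the best-case SNR is N / sqrt(N) = sqrt(N)."""
    return math.sqrt(electrons)


if __name__ == "__main__":
    for iso in (100, 200, 400):
        n = saturation_electrons(iso)
        print(f"ISO {iso}: max code at {n:,.0f} e-, "
              f"best-case SNR about {shot_noise_snr(n):.0f}:1")

    # Underexposing by one stop at base ISO halves the electrons collected.
    half = FULL_WELL_ELECTRONS / 2
    print(f"One stop under at ISO {BASE_ISO}: "
          f"SNR about {shot_noise_snr(half):.0f}:1")
```

With these assumed numbers the best-case SNR drops from about 316:1 at ISO 100 to about 158:1 at ISO 400. Two stops less light halves the SNR, which is why the shadows get grainy.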
If you used a normal ND filter (or no filter at all) and exposed for the foreground, you would run into the same problem in reverse (foreground properly exposed, but sky totally blown out). No matter what you do, the sensor can only capture one exposure at a time - when you shoot a scene with dramatically different brightness levels (like a sunset over a dark foreground) you have to pick one or the other. A standard ND filter would do nothing to help, because it darkens the whole frame evenly, but you can use what's called a graduated ND filter - half of the filter is darkened and the other half is clear, so one part of the frame gets less light than the other. You would put the transition on the horizon, so the filter darkens the sky and evens out the scene while you expose for the foreground. The problem with these filters on drones is that you can't adjust them in the air, so you would have to plan your entire flight around the filter and keep the horizon at the same place in the frame, unless you wanted very uneven footage.
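As a rough way to think about how strong a graduated ND you would need, here is a small Python sketch that works in stops. Every number in it is hypothetical (the sky-to-foreground ratio, the sensor's usable range, the filter strength), and it treats the filter as simply subtracting its strength from the scene contrast, which is a simplification.

```python
import math


def stops(brightness_ratio):
    """Convert a brightness ratio to photographic stops (one stop = 2x light)."""
    return math.log2(brightness_ratio)


def fits_in_one_exposure(scene_stops, sensor_stops, grad_nd_stops=0.0):
    """True if the scene contrast, after the grad ND darkens the sky,
    fits inside the sensor's usable range."""
    return scene_stops - grad_nd_stops <= sensor_stops


if __name__ == "__main__":
    scene = stops(4096)   # hypothetical: sky 4096x brighter than the foreground
    sensor = 10.0         # hypothetical usable range of the sensor, in stops
    for grad in (0.0, 3.0):
        ok = fits_in_one_exposure(scene, sensor, grad)
        print(f"{grad:.0f}-stop grad ND: scene {scene - grad:.0f} stops, "
              f"sensor {sensor:.0f} stops, fits: {ok}")
```

In this made-up case the scene spans 12 stops, so it only fits the assumed 10-stop sensor once a 3-stop graduated ND pulls the sky down.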