An ND filter has only one specific purpose: the control of shutter speed. It cannot control or change the total range between a bright highlight and a deep shadow (dynamic range). No filter can do that. But since drone cameras typically use shutter speed as the only means of exposure control (no iris), if you want to run a slower shutter speed, you have to cut back the total light with an ND filter.
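To put numbers on that, here is a rough sketch of the arithmetic. It assumes the usual ND naming, where the number is the light-reduction factor (ND8 passes 1/8 of the light, which is 3 stops). The function names and the 1/400 s example are just made up for illustration.

```python
import math

def nd_stops(nd_factor: float) -> float:
    """Stops of light cut by an ND filter with the given factor (ND8 -> 3 stops)."""
    return math.log2(nd_factor)

def shutter_with_nd(shutter_seconds: float, nd_factor: float) -> float:
    """Shutter time that gives the same exposure once the ND filter is fitted."""
    return shutter_seconds * nd_factor

# Hypothetical example: the scene exposes correctly at 1/400 s with no filter.
base = 1 / 400
print(nd_stops(8))                           # 3.0
print(round(1 / shutter_with_nd(base, 8)))   # 50, i.e. 1/50 s
```

The filter changes nothing about the contrast of the scene; it just shifts which shutter speed gives you the same exposure.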
The total dynamic range of an image cannot be directly controlled with a filter, but you can do something that helps you deal with it in post. Shoot your video using the d-log or cinelike profile, which provides a flatter, low-contrast result, often at greater bit depth (8-bit is standard for video, 10-bit is available in the d-log profile). This will help you capture a wider brightness range without crushing blacks or clipping whites, and then you'll have something you can adjust in post to better represent the real scene. Doing this involves imposing a nonlinear brightness curve on the image.
Image sensors produce data that has a linear relationship to brightness. But our eyes see things very nonlinearly. Doubling the light intensity doesn't double what we see as brightness. So, our task is to capture as much of the total dynamic range of a scene as possible, without crushing the shadows or blowing out the highlights. D-log and cinelike help you do that to a greater extent, but then the image has to be adjusted (graded) in post. The adjustments can be simple or complex, but all editors have this in some form, the simplest being a set of adjustment handles that let you control the darkest levels, a mid level, and the top level. Sliding the handles around bends the brightness curve, and applying a LUT does a lot of the work for you, assuming a LOT of things of course, not always correctly. A LUT is just a quick/dirty starting point, and is not actually necessary at all.
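If it helps to see what those three handles do, here is a minimal sketch of a generic levels adjustment on a normalized 0..1 pixel value. It is not any particular editor's implementation; the function name, parameters, and the power-law mid handle are assumptions for illustration.

```python
def levels(value, black=0.0, white=1.0, gamma=1.0):
    """Map a normalized pixel value (0..1) through black/mid/white handles.

    black, white: input levels that get stretched to 0 and 1.
    gamma: the mid handle; above 1 lifts shadows and midtones, below 1 darkens them.
    """
    x = (value - black) / (white - black)   # stretch the input range
    x = min(max(x, 0.0), 1.0)               # clip anything outside the handles
    return x ** (1.0 / gamma)               # bend the curve around the midtones

# Hypothetical flat footage that only occupies 0.2..0.8 of the range:
for v in (0.2, 0.35, 0.5, 0.8):
    print(v, round(levels(v, black=0.2, white=0.8, gamma=1.5), 3))
```

Moving the black and white handles stretches the flat image back out to full contrast, and the mid handle decides how much of that stretch goes to the shadows versus the highlights.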
Try shooting some video outside the "normal" setting, then try adjusting it in post. Your task then will be to represent the original scene while keeping the darkest parts of your image above 7.5 IRE and your brightest peaks below 100 IRE (most editors have a scope/meter for measuring that), while bending the brightness curve to pull detail out of shadows and bring highlights down to something realistic.
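For anyone who wants to check those limits outside of an editor's scope, here is a small sketch. It assumes 8-bit "video range" luma, where code 16 sits at 0 IRE and code 235 at 100 IRE; your footage or editor may use full range instead, so treat the defaults as assumptions. The function names and sample values are hypothetical.

```python
def luma_to_ire(y, black_code=16, white_code=235):
    """Convert an 8-bit video-range luma code value to IRE units."""
    return (y - black_code) / (white_code - black_code) * 100.0

def within_limits(luma_values, low_ire=7.5, high_ire=100.0):
    """True if every luma sample falls between the low and high IRE limits."""
    ires = [luma_to_ire(y) for y in luma_values]
    return min(ires) >= low_ire and max(ires) <= high_ire

# Hypothetical luma samples pulled from a graded frame:
print(within_limits([40, 128, 230]))   # True
print(within_limits([20, 128, 240]))   # False: too dark at 20, too bright at 240
```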
In digital imaging, most of the data produced deals with highlights, and very little deals with shadows. That's why it's better to slightly over-expose if your shadows are really dark: you'll then have more data (more digital levels, if you like) to work with in post without bringing up noise or banding. Highlights are already represented by most of the digital levels, if not all, so they don't suffer as many adjustment side-effects.
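You can see why with a bit of counting. In a linear encoding, each stop down from the top is half the light, so it gets half as many code values as the stop above it. The sketch below assumes a 12-bit linear sensor readout purely as an example; the function name and the bit depth are illustrative, not a spec for any particular camera.

```python
def levels_per_stop(bit_depth=12, stops=6):
    """Count the linear code values in each stop, brightest stop first."""
    top = 2 ** bit_depth
    counts = []
    for _ in range(stops):
        half = top // 2
        counts.append(top - half)   # code values between half and full brightness
        top = half                  # move down one stop
    return counts

print(levels_per_stop(12, 6))   # [2048, 1024, 512, 256, 128, 64]
```

The brightest stop alone takes half of all the levels, while a shadow six stops down has only a few dozen to describe it, which is why stretching deep shadows in post shows banding and noise so quickly.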