Thanks Brojon.
If this conversation continues with you, I will tell you a story about how unexpected results from a computer system are not always detrimental. The proverbial ghost in the machine.
I am still trying to keep it within the context of people in this hobby, not IT developers, so it's a bit layman in its tone.
Software and hardware stability is a significant issue in the IT industry. You can see how Microsoft struggles with security issues and releases patches monthly to fix security holes (bugs) in the code.
You are 100% correct. There is no such thing as bug free code.
Every time an enhancement is released new bugs appear as customers push the limits of the data interaction.
Many years ago I constructed an AI-based computer control system for the metal finishing industry. It was an education in microprocessor-based issues when you are constrained by both performance and memory.
The EEPROM used in most microcontroller systems is very slow, as is the CPU; it can't be clocked very fast or it would generate too much heat. You would normally use EEPROM only as a bootloader and load the actual code into faster memory. Not much chance of this when you are dealing with a miniature FC.
Hardware and Basic OS Architecture.
There are only three basic architectures you can implement with microprocessor technology. I don't know what is inside a DJI FC.
You either:
- Poll the hardware constantly. Least effective.
- Use a hardware clock to generate a regular interrupt. Moderately effective.
- Use interrupt-driven hardware input. This is the most efficient and effective, often integrated with a real-time clock interrupt as well.
I also don't know if they actually have an operating system as such. I would assume so, as it would allow them to develop FC systems and invest in a code base.
Here are the potential problems with the Hardware.
Polling. This consumes 100% of the CPU constantly, which means there is no headroom for anything unexpected. It basically has to check every incoming data stream even if no data is present. If it spends too much time on one task due to complexity in the data stream, it will fail to recognise events within the poll cycle.
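To make the polling cost concrete, here's a toy sketch in C (hypothetical channel setup, nothing to do with DJI's actual firmware): every poll cycle pays the full cost of checking each input channel, whether or not any data has arrived.

```c
#include <assert.h>

/* Toy polling loop: each pass checks every channel, even idle ones. */
#define NUM_CHANNELS 4

int polls_performed;                 /* total channel checks, i.e. CPU spent */
int data_ready[NUM_CHANNELS];        /* 0 = nothing pending on that channel */

/* One poll cycle: returns how many channels actually had data. */
int poll_once(void) {
    int handled = 0;
    for (int ch = 0; ch < NUM_CHANNELS; ch++) {
        polls_performed++;           /* cost is paid regardless of data */
        if (data_ready[ch]) {
            data_ready[ch] = 0;      /* consume the data */
            handled++;
        }
    }
    return handled;
}
```

Even with nothing pending, `polls_performed` climbs every cycle; that is the CPU headroom being burned for no work done.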
Hardware clock. This is much more effective than polling, but it requires that the OS is smart enough not to miss any hardware events. It too is susceptible to a data stream hogging CPU cycles.
Hardware interrupt. A hierarchical hardware interrupt scheme is the most efficient: via the interrupt levels it can prioritise high-priority data streams above lower ones. An example would be prioritising flight stability, i.e. reading gyro/accelerometer data, over writing the log to the SD card. This is the most effective and efficient, but it is completely at the mercy of how fast the data streams generate interrupts, and that is dependent on the interfaces to the auxiliary sensors. Again, I have no idea how DJI integrate the hardware and sensor technology. I suspect a lot of it is embedded in the main LSI (Large Scale Integration) chip.
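A fixed-priority scheme can be sketched as always servicing the lowest-numbered (highest-priority) pending source first. This is a host-side simulation with made-up interrupt sources, not DJI's actual interrupt map:

```c
#include <assert.h>

/* Hypothetical interrupt sources; lower number = higher priority. */
enum { IRQ_GYRO = 0, IRQ_RC_LINK = 1, IRQ_SD_LOG = 2, IRQ_COUNT = 3 };

/* Return the highest-priority pending source, or -1 if none.
   'pending' is a bitmask with bit i set when source i has raised. */
int next_irq(unsigned pending) {
    for (int i = 0; i < IRQ_COUNT; i++)
        if (pending & (1u << i))
            return i;
    return -1;
}
```

Even if the SD-card logger raised its interrupt first, the gyro gets serviced ahead of it, which is exactly the flight-stability-before-logging behaviour described above.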
But in most cases the hardware is pretty reliable. You can have some nuisance sensor noise, as an example, that could affect flight stability, but a hardware failure in the FC will result in an irrecoverable fall from the sky.
Software and Development Tools.
This is where the problems really surface.
When I created the AI system in the mid 80s, I wrote the OS, the interpreters and the compilers in assembler (machine code). This gives you the best control of the hardware, but it is very slow and cumbersome writing complex code this way.
With the advent of 3GL and 4GL languages, your use of the language needs to be precise, and you also suffer from bugs in the compiler itself.
Hence I always recommended "don't do anything out of the ordinary" in your coding practice; don't try to be too clever. It can exacerbate any errors in the compiler itself, as you are covering new ground for the compiler. In most cases this fell on deaf ears when I did code reviews. It's like telling a fighter pilot to fly slow. Doh.
The majority of code written today is in C. There are zillions of C programmers, as it has long been used to educate software developers in computer science classes at universities.
It's one of the worst languages for practical application due to its intense issues with pointer management.
The other advice I always gave to student coders was to never forget that counting starts at 0, not 1.
Many bugs are written because of this misconception. When you are dealing with tabular data in a buffer, you deal with a base and an index (pointer). The first entry in the table (base plus index) is at index 0.
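In C terms: a table with N entries occupies indices 0 through N-1, so a bounds check must reject N itself. A minimal sketch:

```c
#include <stddef.h>
#include <assert.h>

/* A table of 'count' entries is addressed as base + index,
   and the valid indices are 0 .. count-1. */
int index_in_bounds(size_t index, size_t count) {
    return index < count;   /* note: '<', not '<=' -- count itself is out */
}
```

Writing `<=` here is the classic off-by-one: it lets the code read one entry past the end of the table.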
When I was working with Microsoft (JDP lead) in the late 90s on Windows 2000, they used a third-party tool to scan all the Windows NT code for poor pointer-management practices. Thousands of bugs were eliminated in this process.
Yes, the Windows OS is written in C.
The other biggest issue is lazy coding. I mentioned it in an earlier post: type checking and data type declaration. Weak type checking really exposes issues when new functionality is implemented, and sloppy data type declarations bite when unexpected data occurs, like the large data swings from excessive vibration.
Here are the potential problems with the Software / Firmware.
There are two finite resources in the FC: CPU cycles and memory.
The OS manages the memory resource by allocating it in pools/buffers. (If anyone is interested, I can explain how the simplest of these pools are managed; the structure is called a linked list.) The compiled code itself also uses memory pools, invisible to the programmer, embedded in the assembler output of the compiler. CPUs only run machine code. Well, some, like Intel's i3 to i7 chips, actually run microcode to stay compatible with earlier x86 and AMD x64 machine code.
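Since I mentioned linked lists, here is a minimal sketch of a fixed-block memory pool run as a singly linked free list. Real firmware allocators add alignment, locking and diagnostics; the sizes and names here are my own invention.

```c
#include <stddef.h>
#include <assert.h>

/* Minimal fixed-size block pool managed as a singly linked free list. */
#define POOL_BLOCKS 8
#define BLOCK_SIZE  32

typedef union block {
    union block *next;              /* valid only while the block is free */
    unsigned char data[BLOCK_SIZE]; /* user payload once allocated */
} block_t;

static block_t  pool[POOL_BLOCKS];
static block_t *free_list;

/* Chain every block onto the free list. */
void pool_init(void) {
    for (int i = 0; i < POOL_BLOCKS - 1; i++)
        pool[i].next = &pool[i + 1];
    pool[POOL_BLOCKS - 1].next = NULL;
    free_list = &pool[0];
}

/* Pop the head of the free list; NULL when the pool is exhausted. */
void *pool_alloc(void) {
    if (!free_list)
        return NULL;
    block_t *b = free_list;
    free_list = b->next;
    return b;
}

/* Push a block back onto the head of the free list. */
void pool_free(void *p) {
    block_t *b = (block_t *)p;
    b->next = free_list;
    free_list = b;
}
```

Both allocation and free are just a pointer swap at the head of the list, which is why this scheme suits a CPU- and memory-starved FC.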
So here are some logic errors that crop up constantly with Type Checking.
The logic of the written code might deal with parameters that run from 1 to 10. A lazy programmer will define options from 1 to 9, and if it's not any of them then it must be 10. So option 10 becomes a "catch-all".
What happens if the parameter 11 is passed, or any number above 10? It's treated like a 10. If this happens in an FC, imagine what calculations it might do based on the wrong data.
Examples of this occur when new functionality is introduced, like pano shots. You would have logic that determines what style of photo needs to be taken. If a logic section of code is missed when the feature is added, unpredictable results may occur when selecting pano.
Proper data checking would ensure that the input data is within the bounds of the logic routine handling it. If an out-of-bounds parameter is passed, you would then execute exception handling.
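The two styles side by side, as a sketch (using the 1-to-10 parameter range from the example above; the function names are mine):

```c
#include <assert.h>

/* Lazy "catch-all": anything that isn't 1..9 silently becomes mode 10. */
int lazy_mode(int param) {
    if (param >= 1 && param <= 9)
        return param;
    return 10;                       /* 11, 99, -5 all land here */
}

/* Defensive version: out-of-range input is rejected explicitly,
   so the caller can run its exception handling instead of flying
   on garbage data. */
int checked_mode(int param, int *mode) {
    if (param < 1 || param > 10)
        return -1;                   /* out of bounds: refuse it */
    *mode = param;
    return 0;
}
```

The lazy version never fails, which is exactly the problem: bad input becomes a plausible-looking mode and the error surfaces somewhere else entirely.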
So here are some logic errors that crop up constantly with Data Type declarations.
I don't want to get boring here, but when you declare something ("data") to the compiler, you have to be specific about the physical size of what you are declaring.
If you want to define a pointer, as an example, you could define it as a long or a short integer. The size is important: you use lots of data definitions that consume memory, and memory is finite, so you declare the minimal size required.
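Modern C makes the declared size explicit with the fixed-width types from `<stdint.h>`, rather than relying on what `short` or `long` happens to mean on a particular CPU. A sketch (the record layout is made up, purely for illustration):

```c
#include <stdint.h>
#include <assert.h>

/* Each field's storage cost is spelled out in its type name. */
typedef struct {
    uint8_t  flags;      /* 1 byte, unsigned */
    int16_t  offset;     /* 2 bytes, signed  */
    uint32_t timestamp;  /* 4 bytes, unsigned */
} record_t;
```

On a memory-starved FC, choosing `int16_t` over `int32_t` for a thousand-entry table halves its footprint, which is why the minimal-size habit matters.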
If you can imagine a "word", that's 16 bits of storage. 16 bits gives you a range of numbers from 0 to 65,535 if it's defined as an unsigned (positive-only) integer. It can also be a signed number from -32,768 to +32,767.
Of the 16 bits in storage, the most significant bit (the one furthest left) is the sign bit. If it's 0, the number is positive; if it's 1, it's negative. Can you imagine, if declared incorrectly, what would happen if the logic incremented this data word until it reached 32,768 or above?
Suddenly it would become negative, and the pointer you are using would no longer point to your memory pool but to somebody else's. You are about to stomp all over random memory. In a lot of cases this would/could try to write to protected ring 0 operating system memory, which would cause the CPU to trap. In the Windows world that is the Blue Screen of Death.
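The sign flip can be demonstrated on the bit pattern directly. This sketch does the arithmetic through `uint16_t` so the wraparound is well defined in C; the conversion back to signed is what every common two's-complement CPU does:

```c
#include <stdint.h>
#include <assert.h>

/* Increment a 16-bit signed offset. At +32,767 the carry spills into
   bit 15 (the sign bit) and the value flips hugely negative. */
int16_t next_offset(int16_t offset) {
    uint16_t raw = (uint16_t)offset;   /* work on the raw bit pattern */
    raw = (uint16_t)(raw + 1u);        /* unsigned wrap is well defined */
    return (int16_t)raw;               /* wraps on two's-complement CPUs */
}
```

A pointer offset that silently jumps from +32,767 to -32,768 is exactly the "stomp somebody else's memory" scenario above.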
So these are a couple of ways that poor coding can create a bug that only surfaces when something excessive happens in the input data stream.
The particular case in point, and the original statement I made about excessive vibration, goes like this.
The FC reacts to input from the Gyros and Accelerometers.
The FC will react to this data. Depending on the FC architecture, it will read the parameter data, perhaps every 20 milliseconds (example only). In most cases it will be the same as it was last time, so the FC will basically ignore it. When it does get a change in input data, it will process it and determine what it needs to do to the motors to stabilise the model.
Under normal flight conditions this may consume only a small amount of CPU cycles and memory allocation, i.e. it doesn't have to do this very often relative to the time it spends sampling input data.
Now imagine that you are getting excessive vibration from your model. When the data is sampled, the FC now needs to process every single sample: it has to calculate a reaction to the vibration pulse and program a change in motor speeds.
This then starts to consume more and more CPU cycles and more and more memory.
It starts to go down logic paths that are not normally used.
i.e. it is entering code paths that may never have been executed before. If the coding is perfect, not a problem. If not, you will get an FC trap and a fall from the sky.
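The sampling behaviour above can be sketched like this (hypothetical deadband and counters, not DJI's actual code): unchanged samples return almost immediately, while every changed sample forces the expensive control path, so sustained vibration drives the work counter up on every tick.

```c
#include <stdlib.h>
#include <assert.h>

#define DEADBAND 3      /* ignore jitter smaller than this (made-up units) */

static int last_sample;
static int work_done;   /* counts expensive control-law recalculations */

/* Called once per sample tick (e.g. every 20 ms in the example above). */
void handle_sample(int sample) {
    if (abs(sample - last_sample) <= DEADBAND)
        return;                 /* unchanged: cheap, the common case */
    last_sample = sample;
    work_done++;                /* changed: run the expensive control path */
}
```

In calm flight almost every call takes the cheap early return; feed it a vibrating signal where every sample differs and `work_done` climbs on every tick, which is the CPU and memory pressure described above.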
I hope this dissertation helps clarify what I was talking about.
Cheers Brian
p.s. I have not checked this in detail for spelling or grammar errors. I may have to edit. Sometimes autocorrect (autocorrupt) can change the meaning of a sentence.
Point out anything that is questionable.