Yes, or at least another kind of interpolation (there is always some interpolation going on in an image sensor).
A 48MP quad-Bayer sensor does indeed have 48 million photodiodes, but the Bayer filter in front of them looks like the filter on a 12MP sensor.
The Bayer filter in front of a 48MP sensor would need one colour in front of each photodiode for it to be a "true" 48MP sensor. But a quad-Bayer filter has one colour covering four photodiodes, in a 2x2 pattern. This gives a good-quality 12MP photo, with four photodiodes capturing the same colour from the filter. A very good idea.
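The 48MP-to-12MP binning step is simple enough to sketch. This is my own toy code (assuming numpy and that each colour tile covers an aligned 2x2 block of raw values; the function name is made up), not anything from a real sensor pipeline:

```python
import numpy as np

def quad_bayer_bin(raw):
    """Average each 2x2 block of same-colour photodiodes (quad-Bayer binning).

    raw: 2D array of photodiode values from a quad-Bayer sensor, where each
    colour-filter tile covers a 2x2 block of photodiodes. Returns an array
    with half the width and height: one binned sample per colour tile
    (e.g. 48MP of photodiodes -> 12MP of samples).
    """
    h, w = raw.shape
    # split into 2x2 tiles, then average within each tile
    return raw.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# a 4x4 patch of raw values: four 2x2 same-colour tiles
patch = np.array([
    [10, 12, 20, 22],
    [14, 16, 24, 26],
    [30, 32, 40, 42],
    [34, 36, 44, 46],
], dtype=float)
print(quad_bayer_bin(patch))  # [[13. 23.]
                              #  [33. 43.]]
```

Each output value is the mean of one tile's four photodiodes, which is where the low-light benefit comes from: four small signals are combined into one less noisy sample.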
But to make a 48MP photo it has to do some interpolation, because each colour in the filter covers four photodiodes.
This is why we sometimes see colour artifacts or other strange colour patterns in 48MP photos from these sensors.
Not bad, but here's where you're going astray.
This is another installment of me droning on and on, but if you want to really understand this stuff, it'll be worth it.
A non-quad sensor still has a Bayer filter over the photodiodes, and each captures the light intensity of a single color, R, G, or B.
Each of these locations becomes an RGB pixel after the missing two channels are reconstructed, via an interpolation algorithm, from neighboring pixels that captured the missing channel. The simplest is nearest-neighbor averaging, which usually produces a decent result, but there are all manner of complications in the image that can defeat simple averaging and produce artifacts.
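To make the neighbor-averaging idea concrete, here is a deliberately naive sketch (my own toy code, assuming numpy and an RGGB mosaic layout; real demosaicing is far more sophisticated):

```python
import numpy as np

def demosaic_avg(mosaic):
    """Very naive demosaic of an RGGB Bayer mosaic by neighbor averaging.

    mosaic: 2D array, one intensity per photosite, tiled with the pattern
        R G
        G B
    Returns an HxWx3 RGB image. Each pixel keeps its directly captured
    channel; the two missing channels are reconstructed as the mean of the
    photosites in the surrounding 3x3 window that did capture them.
    """
    h, w = mosaic.shape
    rgb = np.zeros((h, w, 3))
    # channel index (0=R, 1=G, 2=B) captured at each photosite
    chan = np.empty((h, w), dtype=int)
    chan[0::2, 0::2] = 0
    chan[0::2, 1::2] = 1
    chan[1::2, 0::2] = 1
    chan[1::2, 1::2] = 2
    for y in range(h):
        for x in range(w):
            for c in range(3):
                if chan[y, x] == c:
                    rgb[y, x, c] = mosaic[y, x]  # captured directly
                else:
                    # average the neighbors that captured channel c
                    ys = slice(max(y - 1, 0), y + 2)
                    xs = slice(max(x - 1, 0), x + 2)
                    vals = mosaic[ys, xs][chan[ys, xs] == c]
                    rgb[y, x, c] = vals.mean()
    return rgb
```

On a flat, evenly lit patch this reconstructs the missing channels exactly; the artifacts the text mentions appear precisely where the scene is not flat, i.e. at edges and fine detail, where a neighbor's value is a poor stand-in for the missing sample.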
This is called demosaicing or debayering, and is a part of capturing images from all digital cameras (with some esoteric exceptions we'll ignore, as they're irrelevant to this discussion).
There are more sophisticated reconstruction algorithms that do a better job, some of the best being "content-aware", analyzing the content of the image and adjusting how it determines the 2 missing channels at each pixel.
I explain all this to make the following point: the recovered channels have an error range (error bars) that can be mathematically calculated. A non-quad image has errors at every pixel for the two reconstructed channels, just like a quad-Bayer capture does. The quad-Bayer errors are simply larger than those for the simple Bayer filter, and (this is critically important) that comparison holds for the same demosaicing algorithm.
So you can see where the problem is with the idea of a "true" 48MP image... what does that mean? Error-free isn't possible. Is there a particular error threshold you have in mind? I'm sure you see the problem.
The idea of a "true 48MP image" gets even more meaningless if you allow for different demosaicing algorithms. Suppose you apply a very sophisticated, compute-intensive demosaicing algorithm to the 48MP quad-Bayer capture, and the simplest nearest-neighbor algorithm to a capture from a theoretical camera that is identical except for having an ordinary Bayer filter instead of a quad.
The error bars for the missing channels can then be smaller for the quad image than for the non-quad one, yielding a result with higher resolution and higher color fidelity than the non-quad capture.
Which one is the "true" 48MP image?
When Sony introduced the quad-Bayer in 2018, demosaicing algorithms were all designed for the standard 2x2 Bayer pattern, so they didn't do the greatest job of minimizing errors on quad-Bayer captures. Hence the reputation quad-Bayer sensors acquired, and deservedly so.
Fast forward to 2023. A lot of R&D has improved quad-Bayer demosaicing, a lot. Computing power has advanced as well, making it possible to do much more than simple averaging in real time on GPUs, and even to some extent on-chip with some higher-end sensors. As we see with the new A3, quad-Bayer captures are getting nearly as good as simple Bayer captures, and even better when they can be demosaiced by a sophisticated algorithm.
If we define "true" to mean a pixel where all three color channels are captured directly, with no reconstruction, the closest would be a 48MP sensor with a regular Bayer filter capturing at 12MP: a 2x2 cluster of photodiodes represents a single pixel, but with the RGGB Bayer pattern over it. Red, green, and blue are then captured directly for that "pixel," and there is no demosaicing. 48MP captures from that sensor would still require demosaicing, but with the more typical error size for the reconstructed channels.
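That hypothetical direct-capture mode is also easy to sketch (again my own toy code for the RGGB-cluster layout just described, not a shipping sensor format):

```python
import numpy as np

def rggb_cluster_to_rgb(raw):
    """Bin a sensor whose 2x2 photodiode clusters carry a full RGGB pattern.

    Each 2x2 cluster [[R, G], [G, B]] becomes one RGB pixel directly:
    no channels are reconstructed, so no demosaicing error.
    (Hypothetical layout from the discussion above.)
    """
    h, w = raw.shape
    # axes become (cluster_y, cluster_x, within_y, within_x)
    tiles = raw.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3)
    r = tiles[:, :, 0, 0]
    g = (tiles[:, :, 0, 1] + tiles[:, :, 1, 0]) / 2  # average the two greens
    b = tiles[:, :, 1, 1]
    return np.stack([r, g, b], axis=-1)

# one RGGB cluster: R=1, the two greens are 2 and 3, B=4
cluster = np.array([[1.0, 2.0],
                    [3.0, 4.0]])
pixel = rggb_cluster_to_rgb(cluster)  # a single directly captured RGB pixel
```

Note the only arithmetic is averaging the two green photodiodes that the RGGB pattern provides; nothing is borrowed from neighboring pixels, which is exactly what makes the 12MP output "true" under this definition.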
Why use a quad-Bayer filter in the first place, then? Low-light performance. Sensor manufacturers have determined that this gain outweighs the cost of eliminating channel-reconstruction errors: reconstruction error can be addressed computationally, whereas data that simply isn't there in low light, due to the sensor's limited sensitivity and dynamic range, cannot.
The other reason, sadly, is resolution wars.