The sensor is a large array of photosites, each corresponding to a pixel in the final image. As photons strike a photosite during the exposure, it accumulates charge, which is read out as a voltage. After the exposure, the value at each site is read, and that set of values is the image.
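To make the accumulation idea concrete, here is a rough sketch in Python. It is a toy model, not how any real sensor or camera firmware works: the function name `expose`, the `photon_rates` grid, and the idea of slicing the exposure into discrete time steps are all assumptions made for illustration, and it ignores noise, saturation, and color filters.

```python
import random

def expose(photon_rates, steps, seed=0):
    """Toy model (hypothetical): accumulate photon hits per photosite.

    photon_rates[y][x] is the assumed probability that a photon strikes
    that photosite during one time slice of the exposure.
    """
    rng = random.Random(seed)
    height, width = len(photon_rates), len(photon_rates[0])
    charge = [[0] * width for _ in range(height)]
    for _ in range(steps):
        for y in range(height):
            for x in range(width):
                # A hit during this slice adds one unit to the site's total.
                if rng.random() < photon_rates[y][x]:
                    charge[y][x] += 1
    # "Readout": the grid of accumulated values is the image.
    return charge

# A 2x2 scene: one dark site, one fully lit site, two in between.
scene = [[0.0, 0.5],
         [1.0, 0.25]]
image = expose(scene, steps=1000)
```

In this sketch each value in `image` depends only on the total light that site received over the whole exposure, not on when during the exposure it arrived.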
You don't see cars because cars...