Our brain loves distractions and multi-tasking, and it gets bored quickly. When we read text or look at a photo, it engages us visually; a video (with audio) engages us even more. The bandwidth of our eyes is much larger than the bandwidth of our ears. When we are watching something, it uses more of that bandwidth and hence occupies more of our attention. Also, given the way our eyes work, we can focus on the most exciting part of the visual feed. Compared to that, audio underutilizes our brain’s bandwidth. Further, audio arrives as a one-dimensional stream at a fixed, linear speed, which does not match the way we process information. Contrast this forced linear listening with how non-linearly humans read.
And that’s why video games are even more engaging than video. They utilize the bandwidth even further by forcing us to think and act in the game.
Since audio underutilizes our brain’s bandwidth, it leaves spare bandwidth for distractions such as eating, driving, and exercising. No wonder most audio consumption is passive and happens as a secondary activity, as opposed to being a primary activity like reading or watching movies.
Could be that audio underutilizes brain bandwidth, I don’t know. But the activities you mention as pairing with audio (eating, driving, exercise) all require your eyesight. So I read your conclusion as: it is the eye that decides what we do, first and foremost.