Apple on Monday unveiled its long-awaited mixed reality headset, called “Vision Pro” – the tech giant’s first major new product category since the Apple Watch was unveiled in 2014. The device, which will retail for $3,499 when it launches in early 2024, is aimed at developers and content creators rather than average consumers. Sci-fi as it sounds, the headset could be the beginning of a new era not only for Apple but for the entire industry. Apple is calling the Vision Pro the world’s first spatial computer, but what does it do? We simplify the science behind the Vision Pro headset.
What is Apple’s Vision Pro?
To put it simply, Apple’s Vision Pro brings the digital into the real world by introducing a technological overlay onto your real-world surroundings. Once you strap on the headset, which is reminiscent of a pair of ski goggles, the Apple experience you may already be familiar with from iPhones or Mac computers is brought out into the world around you.
But it’s not really that simple. The Vision Pro follows the lead of many other Apple devices – there are a lot of complex technologies underpinning what seems like a simple user interface and experience.
“Creating our first spatial computer required invention across nearly every facet of the system. Through a tight integration of hardware and software, we designed a standalone spatial computer in a compact wearable form factor that is the most advanced personal electronics device ever,” said Mike Rockwell, Apple’s vice president of the Technology Development Group, in a press statement.
How does the headset work?
Before we get into how the headset does it, it would perhaps be prudent to understand what it does. The mixed reality headset uses built-in displays and a lens system to bring Apple’s new visionOS operating system into three dimensions. With Vision Pro, users can interact with the OS using their eyes, hands and voice. This should mean that users can interact with digital content as if it were actually present in the real world, according to Apple.
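For developers, the striking part is that this eye-hand-voice model maps onto ordinary app code. Below is a minimal sketch of what a visionOS app could look like using Apple’s SwiftUI framework; on the headset, looking at a button and pinching two fingers together is delivered to the app as a standard tap, so no gaze-specific code is needed here. The app and view names are our own placeholders.

```swift
import SwiftUI

@main
struct DemoApp: App {
    var body: some Scene {
        // visionOS presents this as a floating window placed in the
        // wearer's real surroundings.
        WindowGroup {
            ContentView()
        }
    }
}

struct ContentView: View {
    @State private var tapCount = 0

    var body: some View {
        VStack(spacing: 20) {
            Text("Taps: \(tapCount)")
                .font(.largeTitle)
            // On Vision Pro, the wearer "clicks" this by looking at it
            // and pinching; SwiftUI receives it as an ordinary tap.
            Button("Tap me") { tapCount += 1 }
        }
        .padding()
    }
}
```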
Promotional videos where the wearers’ eyes are visible may make it seem like the Vision Pro uses transparent glass with an overlay projected onto it, à la the now-defunct Google Glass, but that is not the case. The eyes are visible from the outside because an external display shows a live feed of the wearer’s eyes – a feature Apple calls EyeSight.
The Vision Pro will use a total of 23 sensors, including 12 cameras, five other sensors and six microphones, according to TechCrunch. It will use these sensors along with its new R1 chip, two internal displays (one for each eye) and a complex lens system to make users feel like they are looking directly at the real world, when in reality they are essentially seeing a “live feed” of their surroundings with a digital overlay on top.
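Apple has not published the internals of this pipeline, but conceptually each refresh works something like the sketch below: grab the newest camera frame and head pose, draw the virtual content, and composite one image per eye. All the types and functions here are hypothetical placeholders, not Apple APIs.

```swift
import Foundation

// Hypothetical placeholder types, not Apple APIs.
struct CameraFrame { let timestamp: TimeInterval }
struct HeadPose { let timestamp: TimeInterval }
struct DisplayImage {}

// Warp the camera frame to the latest head pose, then draw the
// virtual content (windows, apps) on top of it.
func composite(_ frame: CameraFrame, _ pose: HeadPose,
               overlay: DisplayImage) -> DisplayImage {
    DisplayImage()
}

// Every refresh: read the newest sensor data, then composite the
// overlay onto the camera feed once per eye, each from a slightly
// different viewpoint, fast enough that the wearer never notices.
func renderLoop(camera: () -> CameraFrame,
                tracker: () -> HeadPose,
                ui: () -> DisplayImage,
                present: (DisplayImage, DisplayImage) -> Void) {
    while true {
        let frame = camera()
        let pose = tracker()
        let overlay = ui()
        present(composite(frame, pose, overlay: overlay),
                composite(frame, pose, overlay: overlay))
    }
}
```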
The R1 chip has been designed to “eliminate lag” and prevent motion sickness – according to Apple, it streams new images to the displays within 12 milliseconds. Of course, the device also features the more conventional M2 chip for the rest of the computational processes, the ones that actually drive the apps you use on the device.
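A rough sense of scale: VR developers commonly cite around 20 milliseconds of motion-to-photon latency as the point beyond which users start to feel discomfort, so the R1’s 12-millisecond figure leaves headroom. A back-of-the-envelope check, with the 90 Hz refresh rate here being our assumption rather than a confirmed spec:

```swift
// Rough latency arithmetic. The 90 Hz refresh rate is an assumed
// figure for illustration, not a confirmed Vision Pro spec.
let refreshHz = 90.0
let frameTimeMs = 1000.0 / refreshHz  // ≈ 11.1 ms between refreshes
let r1LatencyMs = 12.0                // Apple's stated figure for the R1
let comfortLimitMs = 20.0             // commonly cited discomfort threshold
print("Frame time: \(frameTimeMs) ms; headroom: \(comfortLimitMs - r1LatencyMs) ms")
```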
Infrared cameras inside the headset will track your eyes so that the device can update the internal displays as your eyes move, replicating how your view of your surroundings would change with those movements.
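One well-known use of this kind of eye tracking is foveated rendering: spending full rendering detail only where the eye is actually pointed. Apple hasn’t detailed its pipeline, so the sketch below is a conceptual illustration, not the Vision Pro’s actual algorithm.

```swift
import simd

// Conceptual foveated-rendering heuristic: render at full detail
// near the tracked gaze point and progressively cheaper toward the
// periphery, where the eye can't resolve fine detail anyway.
// Coordinates are normalised screen positions (0...1).
func renderScale(pixel: SIMD2<Float>, gaze: SIMD2<Float>) -> Float {
    let d = simd_distance(pixel, gaze)
    if d < 0.15 { return 1.0 }   // fovea: full resolution
    if d < 0.40 { return 0.5 }   // mid-periphery: half resolution
    return 0.25                  // far periphery: quarter resolution
}
```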
There are also downward-firing exterior cameras on the headset. These will track your hands so that you can interact with visionOS using gestures. LiDAR sensors on the outside will track the positions of objects around the Vision Pro in real time.
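Apple hasn’t said exactly how its gesture recognition works, but the core idea of recognising a pinch from tracked fingertip positions can be sketched in a few lines. The Hand type below is a made-up placeholder, and the 1.5 cm threshold is a guess for illustration.

```swift
import simd

// Hypothetical hand model fed by the downward-firing cameras;
// an illustrative placeholder, not Apple's API.
struct Hand {
    var thumbTip: SIMD3<Float>   // position in metres
    var indexTip: SIMD3<Float>
}

// Register a pinch when thumb and index fingertips come within
// ~1.5 cm of each other (threshold chosen for illustration).
func isPinching(_ hand: Hand, threshold: Float = 0.015) -> Bool {
    simd_distance(hand.thumbTip, hand.indexTip) < threshold
}
```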
What’s the science behind the Vision Pro?
We live in a three-dimensional world and we see it in 3D, but did you know that each of our eyes can only sense things in two dimensions? The depth that we perceive is something our brains have learned to construct: they take the two slightly different images, one from each eye, and do their own processing to produce what we perceive as depth.
Presumably, the two displays in the Vision Pro take advantage of this processing by showing each eye a slightly different image, tricking the brain into thinking that it is seeing a three-dimensional scene. Once you trick the brain, you have tricked the person, and voila, the user is now seeing in 3D.
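In rendering terms, that trick amounts to drawing the scene twice from two virtual cameras spaced an interpupillary distance apart. The sketch below uses 63 mm, a typical adult average chosen purely for illustration; the actual headset adjusts to the wearer’s eyes.

```swift
import simd

// Two virtual cameras, one per eye, offset by half the
// interpupillary distance (IPD). Rendering the scene from both
// produces the slightly different images the brain fuses into depth.
let ipd: Float = 0.063  // 63 mm, a typical adult average (assumed)

func eyePositions(head: SIMD3<Float>,
                  rightAxis: SIMD3<Float>) -> (left: SIMD3<Float>, right: SIMD3<Float>) {
    let offset = rightAxis * (ipd / 2)
    return (head - offset, head + offset)
}
```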