Augmented vs Virtual Reality: Contrasting Technologies and Tools

While both augmented reality (AR) and virtual reality (VR) rely on computing power, they use very different technologies and tools to achieve their ends.

Ok. Time to get the boring stuff out of the way…

First, a quick note on the distinction between AR and VR.

VR is the complete replacement of the real world with an artificial world.  At this time, VR usually replaces only visual and auditory input, but VR fans and businesses are increasingly incorporating artificial stimuli for other senses, especially balance (motion detection), touch, and smell, to make ever more complete virtual realities.

AR, on the other hand, is the artificial, seamless, and dynamic integration of new content into, or removal of existing content from, perceptions of the real world.  AR is most commonly seen in sports broadcasts, such as the yellow first down line in American football games, national flags in the lanes of swimmers and runners in the Olympics, and advertisements on the wall behind home plate in baseball games.  These augmentations of reality are so convincing that most TV viewers do not realize that they are artificially inserted.

Now for a small surprise…

Considering these two descriptions, one might assume that VR is much harder to implement than AR.  After all, VR has to replace all that visual and auditory input.  In fact, however, AR requires much more sophisticated programming and consumes more computing horsepower.

It turns out that recognizing and tracking a range of objects in the real world requires powerful, very fast computing.  The vast majority of computing power in augmented reality is consumed in identifying and tracking reality.  The insertion and removal of content, on the other hand, is relatively simple (other than occlusion, which is hard and discussed later).

Seeing something is not the same as recognizing it…

Human vision can identify all kinds of complex objects, distinguishing objects with only small apparent differences.  It’s useful, for example, to be able to tell your spouse from everyone else of the same gender with the same color hair, skin, and eyes (and let’s face it, there are a lot of people out there who are the same in those respects).

Computer vision still has a hard time recognizing even a modest number of arbitrary, individual objects.  Specialized software exists for facial recognition and is used routinely by Facebook, iPhoto, and other applications, but to work reliably the camera angles and lighting have to be consistent.  Artificial intelligence (AI) that can generalize facial recognition into recognizing everything from sofas to automobiles, and distinguishing between different kinds of sofas and automobiles, is not yet generally available.
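To make that fragility concrete, here’s a minimal sketch of that kind of specialized detector, using OpenCV’s bundled Haar-cascade face model (the file names and tuning numbers are illustrative, not a recipe).  It does tolerably well on well-lit, frontal faces and falls apart quickly as lighting and camera angle drift:

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade (a classic, lightweight detector).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

frame = cv2.imread("street_scene.jpg")          # hypothetical input image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # the detector works on grayscale

# Detection quality swings widely with lighting and camera angle,
# which is exactly the fragility described above.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("street_scene_faces.jpg", frame)
```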

For now, therefore, AR must settle for recognizing a few clearly defined, pre-programmed objects: the lines on a football field, or the blank green box behind home plate in a baseball stadium, for example.
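That green box shows how much a pre-programmed target simplifies the job.  A rough sketch of the idea, with illustrative chroma-key thresholds and stand-in image files, might look something like this:

```python
import cv2
import numpy as np

frame = cv2.imread("ballpark_frame.jpg")   # hypothetical broadcast frame
ad = cv2.imread("sponsor_logo.jpg")        # hypothetical advertisement image

# Threshold for the bright green of the pre-painted box behind home plate.
# The HSV bounds here are illustrative; a real system would calibrate them per venue.
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (40, 80, 80), (80, 255, 255))

# Scale the ad to the frame and paste it only where the green mask is set.
ad_resized = cv2.resize(ad, (frame.shape[1], frame.shape[0]))
composite = np.where(mask[..., None] > 0, ad_resized, frame)

cv2.imwrite("ballpark_with_ad.jpg", composite)
```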

As a further complication, with augmented reality the developer has to allow for the nearly infinite array of variables that the “real world” could throw at the program: shifting light, varying camera angles, falling rain, a van partially obscuring a sign or the front of a building (making it unrecognizable to the AR application).

This is the point at which virtual reality programming becomes much easier than augmented reality.  In virtual reality, everything is known so all changes are understood.  If a van partially obscures the front of a building in a virtual world, the program knows about it because the van and the front of the building are both part of the virtual world created by the program.  If it starts to rain, or it snows, or a thick blanket of fog changes objects from clearly defined images into ghostly shadows, the program still knows what they are and how they will behave because the weather and the objects are part of the program.

Even the apparently random movements of a player-controlled object in a virtual world are completely known to the program.  The commands that produce the movement come from an external source (the human), but it is the program that applies those commands to the player-controlled object, so it always knows exactly how that object is changing or moving in the virtual world.
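A toy sketch of why that is so: the input comes from outside, but the program itself applies it, so the object’s position is always known exactly (everything below is purely illustrative):

```python
# Toy update loop: the input is external (the player), but the program itself
# applies the change, so the object's position is always known exactly.
position = [0.0, 0.0]

def apply_input(command: str, speed: float = 1.0) -> None:
    if command == "left":
        position[0] -= speed
    elif command == "right":
        position[0] += speed
    elif command == "forward":
        position[1] += speed

for command in ["forward", "forward", "left"]:   # commands from the human
    apply_input(command)
    print("engine knows the object is at", position)
```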

Believe it or not, it gets harder…

One of the big problems in augmented reality is determining whether a real-world object is in front of or behind an augmented-reality object.  For example, let’s assume you are watching, through a smartphone that handles the digital insertion, a digital zombie walking down a sidewalk in a crowded city.  Some of the (real) pedestrians will be behind the zombie and some will be in front of it.  Of course, the zombie must obscure the pedestrians behind it, and the pedestrians in front of the zombie must obscure the zombie.  This is known as occlusion: objects in front hiding (or partially hiding) objects in back.
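The compositing step itself is simple once you somehow have per-pixel depth for the real scene; obtaining that depth is precisely the hard part.  A minimal sketch, assuming the depth values already exist:

```python
import numpy as np

def composite_with_occlusion(camera_rgb, scene_depth, zombie_rgb, zombie_depth, zombie_mask):
    """Per-pixel occlusion: the zombie only wins where it is closer to the camera.

    scene_depth and zombie_depth are per-pixel distances; zombie_mask marks which
    pixels the rendered zombie actually covers.  Getting scene_depth for the real
    world is precisely the hard part described above.
    """
    zombie_in_front = zombie_mask & (zombie_depth < scene_depth)
    out = camera_rgb.copy()
    out[zombie_in_front] = zombie_rgb[zombie_in_front]
    return out

# Tiny synthetic example: a 2x2 frame where only the top-left pixel of the
# zombie is nearer to the camera than the real pedestrian standing there.
camera_rgb   = np.zeros((2, 2, 3), dtype=np.uint8)
zombie_rgb   = np.full((2, 2, 3), 200, dtype=np.uint8)
scene_depth  = np.array([[3.0, 1.0], [1.0, 1.0]])
zombie_depth = np.full((2, 2), 2.0)
zombie_mask  = np.ones((2, 2), dtype=bool)

print(composite_with_occlusion(camera_rgb, scene_depth, zombie_rgb, zombie_depth, zombie_mask))
```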

Right now, smartphones are far too dumb to handle occlusion, and, if the AR application chooses the wrong object to put in front, the results are visually very disturbing.  Therefore, AR apps on smartphones are learning the tracking piece, but have yet to make a serious attempt at occlusion, which means all of today’s AR digital insertions on smartphones and tablets float on top of the real world instead of integrating into it.

In the case of virtual reality, again, the program knows where each object is, as well as the viewing angle.  The graphic rendering involved in having one object disappear behind another and then reappear on the other side may not be easy, but the harder problem of working out which object is in front of which is already solved.

Another big problem for AR is camera movement.  Let’s again imagine that digital zombie walking down a crowded city street.  Assume we have solved the problem of making people behind him disappear behind him and people passing in front of him causing him to disappear behind them.  The zombie is staggering down the sidewalk.  Each step, his foot lands on the pavement and plants itself relatively firmly in place while he lurches forward with the other foot.

Now start to move the camera with him.  Remember, he’s being inserted into the image seen through a camera.  Unless something corrects for the motion, the zombie will simply move with the camera, which makes it look like his feet are sliding along the pavement.  Or floating above the pavement.

So we have a new problem.  We have to be able to track changes in the camera view.  Cameras typically pan, tilt, and zoom.  They can also roll sideways, rise higher, or drop lower.  As the camera moves, the software that is inserting the digital zombie must know how to lock that zombie’s foot onto the ground so he moves with the “real world” environment, not with the camera motion.  This means precisely tracking the changes in the viewing angle, direction, and distance of the real world environment in which the digital zombie is moving.
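The geometry of that tracking is well understood; the genuinely hard part is recovering the camera’s pose from real-world images in the first place.  Here is a small sketch of the projection step, with illustrative camera numbers, showing how a world-anchored point (the zombie’s planted foot) lands on different pixels as the camera pans while the anchor itself never moves:

```python
import numpy as np

def project(world_point, K, R, t):
    """Pinhole projection: world -> camera -> pixel coordinates."""
    cam = R @ world_point + t          # world point in camera coordinates
    u, v, w = K @ cam                  # perspective projection
    return np.array([u / w, v / w])    # pixel coordinates

# The zombie's planted foot, anchored once in world coordinates (metres).
foot_world = np.array([0.5, 0.0, 4.0])

# Illustrative intrinsics (focal length ~800 px, 640x480 image).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Frame 1: camera at the origin looking straight ahead.
print(project(foot_world, K, np.eye(3), np.zeros(3)))

# Frame 2: the camera pans a few degrees; the foot's pixel position shifts,
# but the world anchor itself never moves.
theta = np.deg2rad(5.0)
R_pan = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                  [ 0.0,           1.0, 0.0          ],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
print(project(foot_world, K, R_pan, np.zeros(3)))
```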

In addition, the AR application has to change the angle and lighting of the digital zombie: making the zombie smaller or bigger as the camera zooms in or out (or physically moves closer or farther away), and shifting from a front view to a back view as the camera moves from in front of the zombie to behind him.  All of this has to be worked out principally by analyzing the surrounding real-world environment, a process that, at this juncture, even very smart computers find hard and smartphones find incomprehensible.

And then there’s lighting: Imagine an airplane or a cloud passing over the zombie and the city street.  The sidewalk, pedestrians, and litter blowing in the breeze are all briefly in shadow.  What happens to the digitally inserted zombie?  It would look very strange indeed if the AR app didn’t appropriately change the lighting on the zombie.
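A very crude version of that adjustment might simply scale the zombie’s brightness to follow the camera frame’s overall luminance; real lighting estimation is far more involved (light direction, color temperature, cast shadows).  A rough sketch:

```python
import cv2
import numpy as np

def match_zombie_brightness(camera_frame, zombie_rgb, reference_brightness=120.0):
    """Crudely darken or brighten the rendered zombie to follow the scene.

    When a cloud shadow drops the frame's mean luminance, the zombie's pixels
    are scaled down by the same ratio.  The reference value is illustrative.
    """
    gray = cv2.cvtColor(camera_frame, cv2.COLOR_BGR2GRAY)
    scale = gray.mean() / reference_brightness
    adjusted = np.clip(zombie_rgb.astype(np.float32) * scale, 0, 255)
    return adjusted.astype(np.uint8)
```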

Keep in mind that the human eye/brain combination is trained and practiced at doing this routinely without effort.

Back to virtual worlds for a moment: If the zombie were in a virtual world, the software simply renders both the changes in the zombie viewing angle and the changes in the surrounding environment at the same time in the same way.

An interesting side note on one advantage that computers and AR have over biological vision: while it will take computers (especially mobile devices) decades to match the processing power of your average dog’s visual cortex, when it comes to augmenting reality, computers have access to information that dogs, let alone humans, will probably never access directly: GPS data.

Here’s a thought experiment: Blindfold a human being, load him into a car for an hour-long drive down winding roads, followed by an hour-long airplane ride, and then another car ride for a few hours.  Now take off the blindfold.  Unless the person has previously visited his new location and has a visual memory of it, he will have no idea where he is.

Now rewind back to the beginning of the experiment.  Put an iPhone in the pocket of that same blindfolded person and take him through the same confusing route.  Odds are that within a few seconds of taking the iPhone out of his pocket and turning it on, the iPhone will know, to within a few feet, where in the world the traveler has arrived, and can display it to him on a built-in mapping service.  It can even tell which way he is pointing the device.

AR applications often take advantage of GPS to identify where they are and what buildings, streets, and businesses are nearby.  However, GPS is not yet precise enough to plant a digital zombie’s feet on a real-world sidewalk and zoom, pan, tilt, or roll past him as he staggers and lurches towards his next victim.
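To see what GPS does and does not buy you, here is a rough sketch (with made-up coordinates) of the distance and compass bearing from a phone’s GPS fix to a nearby point of interest.  That is enough to float a label over a storefront; a few feet of error is hopeless for planting feet on a sidewalk:

```python
from math import radians, degrees, sin, cos, atan2, sqrt

def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance (metres) and initial bearing (degrees) between two GPS fixes."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)

    # Haversine distance.
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    dist = 2 * 6371000 * atan2(sqrt(a), sqrt(1 - a))

    # Initial bearing, clockwise from north.
    y = sin(dlmb) * cos(phi2)
    x = cos(phi1) * sin(phi2) - sin(phi1) * cos(phi2) * cos(dlmb)
    bearing = (degrees(atan2(y, x)) + 360) % 360
    return dist, bearing

# Illustrative fixes: the phone's GPS reading and a nearby storefront.
print(distance_and_bearing(40.7484, -73.9857, 40.7487, -73.9850))
```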

Even with the help of real-world GPS, AR is obviously harder computationally than VR.  True AR, properly done, has to recognize objects, appropriately occlude foreground and background objects, and keep digitally inserted objects tied to the real-world environment regardless of camera angle or movement.