Friday, November 12, 2010

Augmented Reality Technology

The main hardware components for augmented reality are: display, tracking, input devices, and computer. Combination of powerful CPU, camera, accelerometers, GPS and solid state compass are often present in modern smartphones, which make them prospective platforms for augmented reality.

There are three major display techniques for Augmented Reality:
Head Mounted Displays
Handheld Displays
Spatial Displays

Head Mounted Displays
A Head Mounted Display (HMD) places images of both the physical world and registered virtual graphical objects over the user's view of the world. The HMD's are either optical see-through or video see-through in nature. An optical see-through display employs half-silver mirror technology to allow views of physical world to pass through the lens and graphical overlay information to be reflected into the user's eyes. The HMD must be tracked with a six degree of freedom sensor.This tracking allows for the computing system to register the virtual information to the physical world. The main advantage of HMD AR is the immersive experience for the user. The graphical information is slaved to the view of the user. The most common products employed are as follows: MicroVision Nomad, Sony Glasstron, Vuzix AR and I/O Displays.

Handheld Displays
Handheld Augment Reality employs a small computing device with a display that fits in a user's hand. All handheld AR solutions to date have employed video see-through techniques to overlay the graphical information to the physical world. Initially handheld AR employed sensors such as digital compasses and GPS units for its six degree of freedom tracking sensors. This moved onto the use of fiducial marker systems such as the ARToolKit for tracking. Today vision systems such as SLAM or PTAM are being employed for tracking. Handheld display AR promises to be the first commercial success for AR technologies. The two main advantages of handheld AR is the portable nature of handheld devices and ubiquitous nature of camera phones.

Spatial Displays
Instead of the user wearing or carrying the display such as with head mounted displays or handheld devices; Spatial Augmented Reality (SAR) makes use of digital projectors to display graphical information onto physical objects. The key difference in SAR is that the display is separated from the users of the system. Because the displays are not associated with each user, SAR scales naturally up to groups of users, thus allowing for collocated collaboration between users. SAR has several advantages over traditional head mounted displays and handheld devices. The user is not required to carry equipment or wear the display over their eyes. This makes spatial AR a good candidate for collaborative work, as the users can see each other’s faces. A system can be used by multiple people at the same time without each having to wear a head mounted display. Spatial AR does not suffer from the limited display resolution of current head mounted displays and portable devices. A projector based display system can simply incorporate more projectors to expand the display area. Where portable devices have a small window into the world for drawing, a SAR system can display on any number of surfaces of an indoor setting at once. The tangible nature of SAR makes this an ideal technology to support design, as SAR supports both a graphical visualisation and passive haptic sensation for the end users. People are able to touch physical objects, and it is this process that provides the passive haptic sensation.

Modern mobile augmented reality systems use one or more of the following tracking technologies: digital cameras and/or other optical sensors, accelerometers, GPS, gyroscopes, solid state compasses, RFID, wireless sensors. Each of these technologies have different levels of accuracy and precision. Most important is the tracking of the pose and position of the user's head for the augmentation of the user's view. The user's hand(s) can be tracked or a handheld input device could be tracked to provide a 6DOF interaction technique. Stationary systems can employ 6DOF track systems such as Polhemus, ViCON, A.R.T, or Ascension.

Input devices
This is a current open research question. Some systems, such as the Tinmith system, employ pinch glove techniques. Another common technique is a wand with a button on it. In case of a smartphone, the phone itself could be used as 3D pointing device, with 3D position of the phone restored from the camera images.

Camera based systems require powerful CPU and considerable amount of RAM for processing camera images. Wearable computing systems employ a laptop in a backpack configuration. For stationary systems a traditional workstation with a powerful graphics card. Sound processing hardware could be included in augmented reality systems.

For consistent merging real-world images from camera and virtual 3D images, virtual images should be attached to real-world locations in visually realistic way. That means a real world coordinate system, independent from the camera, should be restored from camera images. That process is called Image registration and is part of Azuma's definition of Augmented Reality.
Augmented reality image registration uses different methods of computer vision, mostly related to video tracking. Many computer vision methods of augmented reality are inherited from similar visual odometry methods.
Usually those methods consist of two parts. First interest points, or fiduciary markers, or optical flow detected in the camera images. First stage can use Feature detection methods like Corner detection, Blob detection, Edge detection or thresholding and/or other image processing methods.
In the second stage, a real world coordinate system is restored from the data obtained in the first stage. Some methods assume objects with known 3D geometry(or fiduciary markers) present in the scene and make use of those data. In some of those cases all of the scene 3D structure should be precalculated beforehand. If not all of the scene is known beforehand SLAM technique could be used for mapping fiduciary markers/3D models relative positions. If no assumption about 3D geometry of the scene made structure from motion methods are used. Methods used in the second stage include projective(epipolar) geometry, bundle adjustment, rotation representation with exponential map, kalman and particle filters.