Augmented Reality with Google Tablets and Glasses

Internship Description

Wearable devices have attracted considerable attention from the research community. These devices offer a window into a user's day-to-day life. Many of them carry multi-modal sensors that can record and/or transmit sensory data, including video and audio. Coupling this source of data with network connectivity enables a wide range of augmented reality (AR) applications that enrich the user's experience and inform his/her decisions. For example, a wearable device backed by intelligent computer vision and machine learning methods can automatically infer information about the surrounding place and situation and relay it directly to the user. This gives the user more information on which to base a particular decision, e.g. whether or not to buy a product in a store, given reviews and competitor pricing found online. Moreover, these augmented capabilities could be very beneficial for people with sensory impairments: a visually impaired person wearing an AR device can be warned (through audio) of immediate obstacles in his/her way, while a hearing impaired person can be notified (through words on a display) that someone is calling out to him/her.


In this project, we aim to build an AR system based on the Google Glass and a Google Tablet, which will automatically acquire visual and audio data and transfer it to a central processing station for analysis. Information inferred from this data will be transferred back to the Glass, where it is conveyed to the user in visual or audio form. This is possible because the Glass supports both visual and audio sensors. One possible outcome of this project is the ability to project onto the Glass display automatically generated results of recognizing (i.e. labeling) and detecting (i.e. localizing) the objects in front of the user.
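The round trip described above can be sketched as a simple message exchange between the Glass and the central processing station. The wire format below (a 4-byte length prefix followed by a JSON payload) and all field names are illustrative assumptions, not part of any existing SDK:

```python
import json
import struct

# Hypothetical wire format for Glass <-> station traffic: a 4-byte big-endian
# length prefix followed by a UTF-8 JSON body. Frames travel Glass -> station;
# inferred meta-data travels back the same way.

def pack_message(payload: dict) -> bytes:
    """Serialize a message (e.g. a frame descriptor or inferred labels)."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack(">I", len(body)) + body

def unpack_message(data: bytes) -> dict:
    """Inverse of pack_message; checks the length prefix for truncation."""
    (length,) = struct.unpack(">I", data[:4])
    body = data[4:4 + length]
    assert len(body) == length, "truncated message"
    return json.loads(body.decode("utf-8"))

# Example: the station sends recognition results back to the Glass.
meta = {"objects": [{"label": "coffee cup", "bbox": [120, 80, 64, 64]}]}
wire = pack_message(meta)
assert unpack_message(wire) == meta
```

A real implementation would run this framing over a TCP or HTTP connection and carry compressed image data alongside the JSON meta-data; the sketch only fixes the envelope in which both directions of the exchange are wrapped.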




Deliverables

A software module based on the Google Glass SDK to acquire and transfer still images and videos from the Glass to a central processing station, and to transfer meta-data in the opposite direction.

An API for the central processing station to invoke automatic computer vision and machine learning algorithms on the received images and videos.

A software module based on the Google Glass SDK that conveys to the user the meta-data acquired from the central processing station on the Glass display.

A large-scale dataset of videos and still images captured with a Google Glass during day-to-day activities. The important objects and activities in these videos will be manually labeled and used for training as well as testing the overall system. This dataset will be made publicly available to the research community for future algorithm evaluation and comparison.

Faculty Name

Bernard Ghanem

Field of Study

Computer, Electrical and Mathematical Sciences and Engineering