Eye tracking interface
- 1 Students
- 2 Context
- 3 Addressed Problem
- 4 The approach
- 5 Experiment
- 6 Conclusion
- 7 References
With the development of augmented reality (AR) glasses such as Google Glass or Microsoft Hololens, new interaction needs have appeared, because these devices require mobility and hands-free operation.
So, to see why an Eye Tracking interface could be beneficial in such new devices, we will make a quick review of the existing interactions between the user and these AR devices (in particular Google Glass and Microsoft Hololens). We will also present some typical use cases associated with these devices.
Google Glass is based on an optical head-mounted display (OHMD).
With this device, two kinds of interaction are possible:
- with a touchpad located on the right arm of the glasses (five possible actions: swipe up/down, left/right, and tap)
- with voice commands
But these two interaction techniques have several drawbacks and limitations:
with the touchpad, the only way to choose one option among n is to swipe from one option to the next
- no pointing possible
- can be slow if there are many choices
- it can feel awkward to keep touching the side of the glasses in public
voice commands are not convenient at all in public
- ambient noise makes it hard for the system to recognize the command to perform
- it can feel awkward to speak aloud to one's glasses in public
Microsoft Hololens is an augmented reality HMD that integrates holograms into the user's field of view.
With this device, two kinds of interaction are also possible:
- with arm/hand gesture commands (a miniature Kinect-like sensor is built into the headset)
- with voice commands
This second device also has several drawbacks and limitations in its interaction:
gesture recognition technology is not yet mature enough to be efficient
- no precise pointing possible
- interacting with arm and hand gestures can be tiring, especially during long interactions
- it can also feel awkward in public
voice recognition suffers from the same problems as on Google Glass
Typical Use Cases
To describe the basic actions a user can perform with AR glasses, we will describe Bob's Saturday afternoon.
Bob is a 25-year-old man visiting his friend Julie in Grenoble for the first time; he arrives at the train station at 2 pm. He is wearing his new AR glasses and uses them to read a message from Julie giving the meeting point. He then navigates toward the destination, guided by instructions from the glasses. Seeing that he has to walk for 10 minutes, he decides to listen to music on the way. On the road he spots a wonderful place and, without stopping, takes a picture of it and posts it on Facebook. Once he arrives, he calls Julie to tell her he is there.
In this scenario, we can see that Bob uses his AR glasses for 6 main tasks:
- Look at his messages
- Navigate toward a destination
- Listen to music
- Take a picture
- Share the picture on Facebook
- Call a friend
Because of all the limitations and drawbacks of the existing interactions with augmented reality glasses, we wanted to bring users a new way of interacting with such devices. This new interaction must keep the advantages of AR glasses, such as the user's mobility and the fact that his hands stay free.
That is why we thought of Eye Tracking as a new interaction method:
- the user's hands stay free, since he interacts with his eyes
- mobility is preserved, as long as the eye tracker is built into the glasses
- the interaction is more discreet than the existing ones and better suited to public environments
- we had the intuition that eye tracking would outperform the existing interactions
So the main goal of the project was to study how AR-glasses users can benefit from Eye Tracking in their interaction with the device.
To achieve this objective, we proceeded as follows:
- we simulated an augmented reality device with integrated eye tracking
- we implemented an interface based on the characteristics of our tracker and a use case
- we compared our interaction with an existing interaction on that use case
As augmented reality glasses with integrated eye tracking are not available yet, we only had devices to simulate them, like the one we describe in the next part: The Eye Tribe.
Simulation of an Augmented Reality device
The Eye Tribe
The Eye Tribe is an eye tracking device for PCs that is placed under the monitor and reports the on-screen coordinates of the point the user is looking at.
Before use, it needs to be calibrated: the user performs a task in which he follows a moving circle on the screen with his eyes.
After that, the device is ready. It offers a user interface for modifying some options, an application for checking that the calibration is correct, and an API exposing the tracking data in many programming languages.
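For illustration, the tracker's data can be consumed from any language, because the Eye Tribe server exchanges JSON messages over a local TCP socket. Below is a minimal Python sketch of building a frame request and parsing a reply; the port number and field names follow our reading of the public API documentation and should be treated as assumptions.

```python
import json

# The Eye Tribe server speaks JSON over a local TCP socket (port 6555 in the
# public API docs; treat the exact field names below as assumptions).
FRAME_REQUEST = json.dumps(
    {"category": "tracker", "request": "get", "values": ["frame"]}
)

def parse_gaze(reply: str):
    """Extract the smoothed (x, y) gaze coordinates from a tracker reply."""
    frame = json.loads(reply)["values"]["frame"]
    return frame["avg"]["x"], frame["avg"]["y"]

# Example reply, shortened to the fields this sketch actually uses:
sample = '{"category":"tracker","values":{"frame":{"avg":{"x":512.3,"y":384.1}}}}'
print(parse_gaze(sample))  # (512.3, 384.1)
```

In the real API the reply carries more fields (per-eye data, timestamps, state flags); this sketch keeps only the averaged gaze point, which is what our interface consumes.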
When the system is calibrated, the eye tracking software calculates the user's eye gaze coordinates with an average accuracy of around 0.5 to 1° of visual angle. Assuming the user sits approximately 60 cm away from the screen/tracker, this accuracy corresponds to an on-screen average error of 0.5 to 1 cm according to the manufacturer.
So we had to take this precision into account for the design of our interface.
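The manufacturer's figure can be checked with simple trigonometry: the on-screen error is the viewing distance times the tangent of the angular accuracy.

```python
import math

def on_screen_error_cm(viewing_distance_cm: float, accuracy_deg: float) -> float:
    """On-screen gaze error for a given viewing distance and angular accuracy."""
    return viewing_distance_cm * math.tan(math.radians(accuracy_deg))

# At 60 cm, 0.5-1 degree of visual angle gives roughly 0.5-1 cm on screen:
print(round(on_screen_error_cm(60, 0.5), 2))  # 0.52
print(round(on_screen_error_cm(60, 1.0), 2))  # 1.05
```

In practice this means any selectable item in the interface should be drawn larger than about 1 cm at that viewing distance, or gaze jitter will make it unselectable.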
It is important to note that, in addition to using eye tracking to navigate through an augmented reality interface, we need a simple way to distinguish when the user wants to activate an item from when he is merely looking at it without intending to activate it.
We propose that the user perform a very simple gesture, like touching his thumb with his index finger, to activate an action. Our system therefore needs one extra device, which we did not have but which is easy to simulate (e.g. with a mouse click or a key press).
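To make the idea concrete, here is a small Python sketch (hypothetical names, not our Unity3D code) of the separation between looking and activating: gaze alone only determines which item is hovered, and an item is activated only when an explicit trigger event (standing in for the thumb-to-index gesture, or its mouse/keyboard simulation) arrives.

```python
def hovered_item(gaze_y: float, items: list, item_height: float):
    """Return the index of the list item under the gaze, or None if off-list."""
    index = int(gaze_y // item_height)
    return index if 0 <= index < len(items) else None

def handle_events(events, items, item_height):
    """Process (gaze_y, triggered) events; return the items actually activated.

    Looking at an item does nothing by itself; only gaze + trigger activates.
    """
    activated = []
    for gaze_y, triggered in events:
        index = hovered_item(gaze_y, items, item_height)
        if index is not None and triggered:
            activated.append(items[index])
    return activated

artists = ["Babylon", "Magenta", "Tool"]
# First event: gaze over item 1, no trigger -> nothing happens.
# Second event: gaze over item 2 with the trigger -> "Tool" is activated.
print(handle_events([(45.0, False), (75.0, True)], artists, 30.0))  # ['Tool']
```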
We decided to focus on a particular use case that we think is very common and useful: listening to a song contained in a large music library.
Consequently, we implemented an application in which the user navigates through an alphabetically ordered list of artists, with the support of eye tracking.
In order to simulate an augmented reality interface like the one on Google Glass or Microsoft Hololens, we used the game engine Unity3D and created a HUD over a basic 3D scene, the full-screen application standing in for the complete field of view of a user wearing augmented reality glasses; the HUD plays the role of the augmented reality interface.
With the Unity3D application and the TheEyeTribe eye tracking device, we can fully simulate an augmented reality interface with integrated eye tracking.
In our implementation, a small eye-shaped icon shows what the user is looking at.
We also highlight the currently selected item in red.
Here is a video showing the interface running: Fichier:EyeTrackFinal1.mp4
To experiment with and evaluate our eye tracking interface, we needed a means of comparing it with an existing interface. Since we did not have access to Google Glass or Hololens, we chose to implement a mockup of the Google Glass interface based on the little information we had about this device. The comparison is based on the same task as before.
Google Glass Interface Implementation
With Google Glass, when you want to listen to a specific song, there are two ways of doing it:
- by saying "OK Glass, listen to… 'something'", where "something" is an artist, album, song, or playlist
- by using the swipe gesture: selecting the "Listen" action in the home menu, then searching through the Artists, Albums, Songs, or Playlists and scrolling through the list to find the song you want
We decided to compare our implementation of the task against the "swipe only" method of Google Glass (that is, without using voice), because there are many situations where the user would rather stay quiet while selecting a specific song (for example, walking in a crowded street; sitting in a train, bus, or metro; sharing an elevator; running…).
So we implemented in Unity3D an application resembling the "Google Play Music" interface with a large but realistic number of artists (~130), and we added support for the "swipe" gesture (simulated with a laptop touchpad) to navigate as on Google Glass.
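To make explicit why search time grows with an artist's position in the list, here is a small Python sketch (hypothetical names, not our Unity3D code) of the swipe-only navigation: each swipe moves the selection by one item, so reaching the n-th artist from the top takes on the order of n swipes.

```python
def navigate(artists, swipes):
    """Apply a sequence of 'next'/'prev' swipes starting from the top of the
    list and return the finally selected artist (selection is clamped to
    the list bounds, as on a real linear menu)."""
    index = 0
    for swipe in swipes:
        if swipe == "next":
            index = min(index + 1, len(artists) - 1)
        elif swipe == "prev":
            index = max(index - 1, 0)
    return artists[index]

artists = sorted(["Tool", "Babylon", "Magenta"])  # alphabetical order
print(navigate(artists, ["next", "next"]))  # 'Tool'
```

With ~130 artists, an artist late in the alphabet can require dozens of swipes, whereas a gaze-based selection jumps directly; this is the asymmetry our experiment measures.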
Here is a video showing the Google Glass interface: Fichier:EyeTrackFinal2.mp4
We had 6 participants test our interfaces. The first presentation was done as a group, where we described the interfaces and explained the goal of our study. Then, for each participant, we followed this protocol:
- we let them interact "freely" with each interface for 1 minute
- we gave them 3 artists to search for (in a different order for each participant: Babylon, Magenta, Tool)
- for each artist searched and each interface, we measured the time the participant needed to find the artist from the starting menu
- we also collected their impressions of eye tracking
A video showing some of the participants doing the experiment is available: Fichier:EyeTrackXP.mp4
We compared the efficiency of the two interactions by measuring the time taken to select an artist, for example the one named "Tool", from the top menu. We also collected all the impressions and comments of the participants during and after their interaction.
We averaged the results for each task over the participants and present them in the following table:
And in the following graphs:
The first thing to notice is that the average times required to perform the three tasks with our eye tracking interface are very close to one another: searching for an artist with our interface takes almost constant time. For each task, the average time participants needed to find the artist is lower with our eye tracking interface than with the Google Glass interface. As expected, the average time needed to find the first artist of the alphabet is quite similar with both interfaces, because it sits at the beginning of the sliding list. But instead of growing linearly, the time difference is smaller for an artist at the end of the alphabet than for one in the middle. This non-linearity is easily explained: with the slider, it is easier to swipe quickly all the way to the end than to stop precisely in the middle of the alphabet.
So we can say that navigation performance is far better with our eye-tracking-based interaction than with the "simulated" Google Glass one. For this artist-selection task, it seems clear that eye tracking would be a beneficial addition to the existing methods in AR glasses.
Concerning the remarks from the users, we summarize them below, classified into positive and negative aspects.
First, what the users liked:
- at first contact with our interface, all participants had a "wow" effect: they were impressed that they could control an interface with their eyes
- it was really intuitive
- they liked the fact that they could use it while riding a bike, for example
The negative aspects:
- the tracker is not precise/stable enough: when they moved their head slightly, the on-screen location reported by the tracker was no longer the true location they were looking at
- for one participant, it was tiring for the eyes by the end of the experiment
The first thing we can say about the project is that it was hard to design an interface for a system (here Google Glass or Hololens) without being able to obtain and test that system. And it is even more complicated to compare its performance with an existing interface that we did not have either. So our results show that our Eye Tracking interface is better than our simulated Google Glass interface, but we cannot claim it is better than the real one. In addition, one of the key ideas of the interaction was that the eye tracker would be built into the glasses, so the problem participants met when they moved their head would no longer arise. Without the glasses, we also cannot know exactly in which part of the field of view the interface is really displayed, and this information could completely change the way we design the interface.
But despite all those complications, this project shows that it is possible to use eye tracking efficiently to control an interface. And we are quite confident that the imminent arrival of Augmented Reality and Virtual Reality glasses in our lives will also spread Eye Tracking control alongside the Gesture and Voice control included in these devices. A good example supporting our intuition is that the company which develops the Eye Tribe presented its solution for VR headsets at CES 2016 this month. And this is a quote from their website: "Eye Tracking will become standard functionality in VR Headsets [...] Eye Tracking also provides easier and intuitive control in combination with gestures and voice control."
So the future work is quite obvious: implement and test this kind of interface directly in an AR headset equipped with an eye tracker, or at least with an eye tracker fixed on glasses and a wall display. That is the only way to obtain relevant results concerning the use of eye tracking interfaces in Augmented Reality applications.