Gesture control for presentations

From Ensiwiki
Revision of 15 January 2020 at 09:09 by Chrzascj (discussion | contributions)




University professors use presentations every day to teach their subjects. In earlier times, the blackboard and chalk were used, which led to dirty hands and forced the professor into a fixed position close to the board. Whiteboards got rid of the first problem, but not of the limited positioning in the room. Beamer and PowerPoint presentations, in turn, are very comfortable for presenting lesson content, but the presenter has to stay close to the computer keyboard to be able to change slides or point at something on them. For this study, a gesture-controlled presentation system is implemented and tested.


The aim of this study is to create a system that interacts with humans easily. It should free the presenter from having to stand next to the computer or keyboard when presenting slides, and it should save time. Motion tracking should make it possible to change slides and to point at specific things in the presentation. We will test our system by counting the errors it makes and by asking the participants afterwards which system (the common one or our new one) they would prefer in daily usage.

To clarify our setup, our research question was: “Does gesture-controlled presentation free the presenter's position in the room, and does it save time?”

The dependent variables were the time and the errors, measured for every pointing and sliding action. The independent variable was the tool: either keyboard usage or gesture usage.

The minimum objective is that the system detects the movement of sliding a page.

The targeted users are university professors, students, and everybody who gives presentations in their daily life.

State of the Art

Academic Studies

In 1993, Baudel already stated that there are three key concepts with which gestures can easily be used to control objects in the real world. These concepts can also be used for our study: first, defining an activation zone; second, recognizing dynamic gestures; and third, using hand tension to structure the interaction around explicit user intention. A problem Baudel (1993) mentions is that recognizing gestures from video is not easy in the real world when multiple persons or hands are in the environment. As ours is a laboratory study, this problem is irrelevant, but it already hints at the problem we had when testing with the Optitrack (discussed in the Discussion part). The Charade study (Baudel & Beaudouin-Lafon, 1993), in which data gloves are used to control a computer presentation, gives very important information: to detect the movement, it uses wrist orientation, thumb bending, and index-finger bending. Furthermore, the study divides the errors into two cases, user errors and system errors. We will only focus on system errors, because users are already used to keyboard control for presentations.

Industrial Standards

It is common to use the keyboard to change slides, or a presentation remote ("PowerPointer") with a built-in laser pointer that connects to the computer through a USB dongle. Holding this remote is not that comfortable, with the result that it is often put down and forgotten somewhere in the room. As a consequence, the presenter has to search for the remote to be able to continue the presentation, which interrupts the flow for both listener and presenter.



At first, the idea was to use a data glove, because it would track the exact movements of the hand and fingers. The problem with the glove is that it is connected to the computer by a cable, which would not give more freedom of movement during presentations. Hence, the next idea was to use motion tracking with the Optitrack.

We used the Optitrack motion tracker, which is based on reflective markers. Up to 6 cameras can be positioned freely in the room; the signal is good as long as at least one camera receives the reflection of the markers. The markers are placed on the participant's right hand, on the wrist and the index finger. They are grouped as rigid bodies: three markers form one rigid body for the wrist, and three additional markers form a second rigid body for the finger. The markers are deliberately positioned somewhat irregularly (not in line), so that the likelihood of receiving a reflection signal is higher. Everything can be calibrated and optimized with Motive, the software that belongs to the Optitrack cameras. It is used to inspect how the data is collected, and it makes it possible to stream the data into other software.

To convert the data we used Unity, a development platform, which in our case translates the movement data into detected gestures.


The markers are positioned on the right hand, as shown in picture 1. The positions are chosen somewhat arbitrarily, but facing in different directions, so that tracking is possible from several positions in the room. Within each rigid body, the markers were spread as far apart as possible, because this gives Motive more information about the actual position of the created rigid bodies.

The Unity code is specialized for our case so that it detects movements along the x-axis as “forward” and “backward”. These two detected movements were given as output at the exact time the movement was made. We used this output from Unity to change (or not change) the slides by hand: Unity and PowerPoint were not connected, a limitation we were aware of, so a second person simulated the sliding by manually pressing the keyboard whenever Unity reported a gesture.
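The actual detection ran as a Unity script, which we do not reproduce here. As an illustration only, a minimal sketch of this kind of x-axis classification could look as follows (function name, window size, and threshold are invented, not taken from our implementation):

```python
# Hypothetical sketch of the sliding-gesture detection. The real system
# was a Unity script; all names and thresholds here are illustrative.

def detect_slide(x_positions, window=5, threshold=0.30):
    """Classify the last `window` wrist x-coordinates (metres) as a slide.

    Returns "forward", "backward", or None if no clear movement is seen.
    """
    if len(x_positions) < window:
        return None
    recent = x_positions[-window:]
    delta = recent[-1] - recent[0]  # net travel along the x-axis
    if delta > threshold:
        return "forward"
    if delta < -threshold:
        return "backward"
    return None
```

A real detector would also debounce repeated triggers, which is exactly where the unintentional-sliding errors discussed below come from.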

The pointing interaction had to be activated explicitly. This was done by making a straight upward gesture with the right hand along the y-axis (vertical); it was deactivated by the same movement in the inverse direction. When pointing was activated, a dot appeared on the screen and moved along with the hand.
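The activation logic is a simple two-state toggle. A hypothetical sketch (class name and threshold invented for illustration, not from our Unity code):

```python
# Hypothetical sketch of the pointing toggle: an upward stroke on the
# y-axis shows the on-screen dot, the inverse stroke hides it again.

class PointingToggle:
    def __init__(self, threshold=0.25):
        self.threshold = threshold  # metres of vertical travel required
        self.active = False

    def update(self, y_start, y_end):
        """Feed the start and end height of one detected vertical stroke."""
        delta = y_end - y_start
        if not self.active and delta > self.threshold:
            self.active = True   # upward stroke: show the pointer dot
        elif self.active and delta < -self.threshold:
            self.active = False  # downward stroke: hide the dot
        return self.active
```

Requiring opposite strokes for on and off mirrors Baudel's activation-zone idea: ordinary talking gestures rarely produce a clean full-height stroke in one direction.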


We asked 13 participants to perform a fake presentation for this study. To have baseline data, each participant gave this presentation twice. The first time, pages were changed by pressing a keyboard button and pointing was done with the mouse. We measured the time from the moment they started walking to the keyboard, through changing the slide, to returning to the initial presenting position. This position was about half a metre to 1 m away from the keyboard, and every participant could move during the fake presentation as they wanted. After that, the markers were attached to the right wrist and index finger, and a 30-second free talk about panda bears was given. During this time we counted wrong activations of the page sliding, meaning unintentional slide changes triggered by gestures made while talking. This was one error-focused trial block. Then a third block was started with the markers still on the right hand. The instructions were similar to the first block, but we demonstrated the movements for sliding and pointing once. Again we counted the errors of wrongly sliding a page, and also the errors where a gesture was not recognized by the system. The dependent variables were the time and the errors, measured for every pointing and sliding action; the independent variable was the tool, either keyboard usage or gesture usage. As we focus on intentional use, our interest lies specifically in first-time usage. We decided that three repetitions of each movement are enough to see whether it is intentional or not. After being shown the movements once, everybody understood the instructions and started the experiment.

In total we had 9 exercises, accompanied by spoken instructions from the instructors (us). Those instructions were:

  • “please slide a page forward” (3x)
  • “please slide a page back” (3x)
  • “please point at the circle/rectangle/triangle (either with the mouse or by activating the gesture-controlled pointing)” (3x)

After the presentation, the participants filled in a questionnaire based on the System Usability Scale (SUS), in which we asked them to rate the following statements about the usability of both systems:

  1. I think that I would like to use this system frequently.
  2. I found the system unnecessarily complex.
  3. I thought the system was easy to use.
  4. I think that I would need the support of a technical person to be able to use this system.
  5. I found the various functions in this system were well integrated.
  6. I thought there was too much inconsistency in this system.
  7. I would imagine that most people would learn to use this system very quickly.
  8. I found the system very cumbersome to use.
  9. I felt very confident using the system.
  10. I needed to learn a lot of things before I could get going with this system.
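Assuming the standard SUS scoring (each item answered 1–5; odd items score the response minus 1, even items score 5 minus the response, and the sum is scaled to 0–100), the questionnaire above can be evaluated like this:

```python
def sus_score(responses):
    """Standard System Usability Scale score (0-100) from the ten
    1-5 Likert responses, in the item order listed above."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        # odd items are positively worded, even items negatively worded
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5
```

For example, answering 5 to every positive item and 1 to every negative item yields the maximum score of 100, while answering 3 throughout yields 50.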



As a first result we can say that the movement detection worked with Unity and Motive. This means our minimum goal was achieved.


We could not count any errors for keyboard usage, so the error rate to compare against is 0 for the keyboard. This might be a result of the experimental setting with its clear instructions, and of the participants being focused on the exercise, which is not the case in real life. The data was collected in a within-subject design, and t-tests for dependent samples were run on the data set. The results are as follows.

We compared whether the number of errors differs between sliding forward and sliding backward. The mean error count for sliding forward (M=2.00) is not significantly different (p=0.89) from the mean error count for sliding backward (M=1.85); the effect size of d=-0.48 would conventionally count as a small-to-medium effect, but given the non-significant p-value we interpret this as no reliable difference. We distinguish two kinds of errors: first, a movement is not tracked at all, so that the person needs to repeat it; second, the system detects “back” instead of “forward”, or detects something when it should not detect anything at all.
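The analysis is a standard paired-samples t-test with Cohen's d_z as effect size. A minimal sketch of these two statistics (illustrative, not our original analysis script; the p-value is then read from a t-table with n-1 degrees of freedom):

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired-samples t statistic, Cohen's d_z, and degrees of freedom
    for two equally long lists of per-participant measurements
    (e.g. forward vs. backward error counts)."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)                 # standard deviation of differences
    d_z = mean(diffs) / sd            # effect size for dependent samples
    t = d_z * math.sqrt(len(diffs))   # t = mean(diffs) / (sd / sqrt(n))
    return t, d_z, len(diffs) - 1
```

With a within-subject design this test uses each participant as their own control, which is why the differences, not the raw scores, carry the variance.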


Additionally, we have the gestures tracked during the free talk, where the participants used their hands without any intention of sliding a page. These are the errors that would cause a big problem in a real setting, but for this prototype we measured them separately, because Unity was not connected to a PowerPoint presentation. The problem with these errors is that, on the one hand, they would lead to a very high error rate during a talk, and on the other hand they would suppress free body movement, because participants would stop moving that hand so the pages no longer slide.



The results from the time measurements are not significant either. For the difference between keyboard (M=5.06, SD=1.74) and gesture control (M=4.16, SD=2.27) for page sliding, there is a small effect of d=-0.44, but no significant difference (p=0.27).

For the difference between keyboard (M=7.51, SD=2.50) and gesture control (M=6.25, SD=4.01) for pointing, there is a small effect of d=-0.38, but no significant difference (p=0.17).


Finally, we looked at the time difference between forward (M=4.76, SD=4.45) and backward (M=3.55, SD=1.83) sliding. With p=0.37 in the t-test this is not a significant result either, though there is a small effect of d=-0.36.

Questionnaire for the usability

From the questionnaire we have results for the following subjective dimensions:

  • Intuitiveness
  • System
  • Freedom of movement

With the Wilcoxon test for ordinal-scaled data with dependent samples, we tested whether there is a significant difference in the answers to the questionnaire. As the diagram and the Wilcoxon test suggest, there is no significant difference between keyboard and gesture usage. Given that everybody has been used to the keyboard for years, this is a positive result: gesture control is not worse, but equal.
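The Wilcoxon signed-rank statistic itself is easy to state. A minimal sketch of how the statistic W is obtained from paired ordinal answers (illustrative only; we did not use this code, and in practice W is compared against a critical-value table for the remaining n):

```python
def wilcoxon_w(a, b):
    """Wilcoxon signed-rank statistic W and effective sample size for
    paired ordinal data (e.g. the same SUS item answered once for
    keyboard and once for gesture control). Zero differences are
    dropped; tied absolute differences receive average ranks."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and abs(diffs[order[j]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j + 1) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(w_plus, w_minus), len(diffs)
```

The test only uses the ranks of the differences, never their magnitudes, which is why it is appropriate for ordinal Likert answers where a t-test would not be.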


The results allow two interpretations. First, the system works, but there are still many errors to be solved. The challenge is to build a system that does not detect normal movements as page-sliding movements, yet is precise enough that a deliberate sliding movement is tracked on the first attempt. Even though the results are not yet significant, a mean difference of about one second is good for a prototype test and should be a good motivation to work on a more precise prototype. Second, the users find the system intuitive, which is also an important result.

In general, the system might be better than walking to the keyboard every time, but the camera tracking needs to be precise, and there are still many problems with other reflections in the room coming from lights, windows, and computers. For daily use, the motion would need to be tracked by a system that works similarly to the Optitrack but is not sensitive to other reflections.

To visualize the time again, we have a diagram that shows, for each participant, how many seconds were needed to complete the different page-sliding exercises. The same diagram exists for the pointing task. The two diagrams let us assume that the movements were intentional and uncomplicated, but that the system sometimes has problems with the detection.


Measured against real-world use, the prototype is not very good yet. With N=13 participants the results did not become significant, even though there was a visible difference in the mean times of the two tools. We also need to keep in mind that in a free talk the pages would slide very often while the user speaks freely.



With more time, we would have added a pointing mode that is activated by squeezing the fingers, instead of moving the whole arm up the y-axis, which was annoying for some users. Furthermore, we would have set up a presentation that works as a real presentation. Another idea would have been to find a different page-sliding movement that causes fewer errors from normal conversational gestures, either by offering customizable gestures or by parameterizing the detection based on an analysis of the gestures of multiple users.


  1. Murakami, K., & Taguchi, H. (1991). Gesture recognition using recurrent neural networks. Conference on Human Factors in Computing Systems, Proceedings, 237–242.
  2. Baudel, T., & Beaudouin-Lafon, M. (1993). Charade: Remote control of objects using free-hand gestures. Communications of the ACM, 36(7), 28–35.
  3. Kim, J. S., Jang, W., & Bien, Z. (1996). A dynamic gesture recognition system for the Korean Sign Language (KSL). IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 26(2), 354–359.
  4. Bhuiyan, M., & Picking, R. (2009). Gesture-controlled user interfaces, what have we done and what's next? Proceedings of the Fifth Collaborative Research Symposium on Security, E-Learning, Internet and Networking (SEIN 2009), Darmstadt, Germany, 25–29.
  5. Kumar, P., Verma, J., & Prasad, S. (2012). Hand Data Glove: A wearable real-time device for human-computer interaction. International Journal of Advanced Science and Technology, 43, 15–26.