Swipe for classification

From Ensiwiki

Multimodal picture classification



Photographers need tools to classify their pictures. Currently, the main solution adopted by professionals is to perform this kind of task in a WIMP environment: either a dedicated photo-management tool or simply a file explorer. Of course, the term "classify" can mean many things; for photographers, it can denote the wish to find a picture quickly. In that case, photographers generally use tags so that each picture carries several classification criteria, and they can then find any picture with a query. We will not address this case. Our main concern is to provide an alternative to the more classical approach to classification: moving a set of pictures into several folders. We know from experience, and from discussions with photographers, that this part of their workflow is considered tedious. Therefore, our main motivation is to design a new way of sorting pictures.


We focused on multimodal interaction, in particular the use of a smartphone, or a smartphone-like device, as a peripheral of the computer (a remote). From this idea, we set up a framework so that the interaction techniques we design are comparable:


  • The interaction must use a smartphone or a smartphone-like device as a remote.
  • The interaction must use targets in the form of screens that represent the folders into which the pictures will be sorted.
  • The pictures must appear one after the other on the smartphone until all the pictures are sorted.
  • When a picture is sorted, it should disappear from the smartphone and appear on the selected screen.
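The four rules above can be read as a small protocol between the phone and the screens. The following Python sketch models it under our own assumptions: the class `SortingSession`, its methods and the screen names are hypothetical and do not come from the prototype. Each interaction technique then only has to decide which screen name to pass to `send`.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SortingSession:
    """Hypothetical model of the framework: a queue of pictures shown on the
    phone and one list per target screen (folder)."""
    pending: deque
    screens: dict = field(default_factory=dict)

    def current(self):
        # The picture currently shown on the smartphone, or None when done.
        return self.pending[0] if self.pending else None

    def send(self, screen):
        # Remove the current picture from the phone and place it on the
        # chosen screen; the next picture then becomes the current one.
        if screen not in self.screens:
            raise KeyError(f"unknown screen: {screen}")
        picture = self.pending.popleft()
        self.screens[screen].append(picture)
        return picture

    def done(self):
        # The session ends when every picture has been sorted.
        return not self.pending


session = SortingSession(deque(["a.jpg", "b.jpg", "c.jpg"]),
                         {"left": [], "center": [], "right": []})
session.send("left")
session.send("right")
session.send("left")
print(session.screens)  # {'left': ['a.jpg', 'c.jpg'], 'center': [], 'right': ['b.jpg']}
print(session.done())   # True
```

Keeping the technique out of the session object is a design choice of this sketch: Buttons, Fixed Swipe and Oriented Swipe differ only in how they compute the screen name.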

Interaction techniques

We designed four ways of sorting pictures within this framework.


Fixed Swipe

The picture is sent to a screen by orienting the smartphone toward that screen and swiping in a fixed direction. We use the phone's sensors to determine its orientation and decide to which screen the picture should be sent.
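The screen choice in Fixed Swipe can be sketched as a nearest-bearing lookup. This is a hypothetical illustration, not the prototype's code: we assume the phone's orientation sensor yields a compass heading in degrees, that the bearing of each screen as seen from the user is known in advance, and that the fixed confirmation swipe is upward.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def fixed_swipe_target(heading, screens, swipe, confirm="up"):
    """Return the screen the phone points at, or None if the swipe is not
    the fixed confirmation gesture.

    heading: phone azimuth in degrees (assumed to come from the orientation
    sensor); screens: hypothetical mapping name -> bearing of that screen.
    """
    if swipe != confirm:
        return None
    # Pick the screen whose bearing is closest to where the phone points.
    return min(screens, key=lambda name: angular_distance(heading, screens[name]))


SCREENS = {"left": 300.0, "center": 0.0, "right": 60.0}
print(fixed_swipe_target(350.0, SCREENS, "up"))   # center
print(fixed_swipe_target(290.0, SCREENS, "up"))   # left
print(fixed_swipe_target(50.0, SCREENS, "left"))  # None
```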

[Figure: Fixed Swipe (Terrier alam 2017 06.png)]

Oriented Swipe

The picture is sent to a screen by swiping in the direction of that screen. This means that the direction of the swipe and the orientation of the device are combined.
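Oriented Swipe can be sketched the same way, again assuming a known bearing per screen and a phone heading in degrees. The swipe vector `(dx, dy)` in touch-screen pixels is turned into an angle relative to the top of the phone and added to the heading. The helper names and the angle conventions are our assumptions, not the prototype's.

```python
import math


def angular_distance(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def swipe_angle(dx, dy):
    """Swipe direction on the touch screen, in degrees clockwise from the
    top of the phone (the screen y axis points down)."""
    return math.degrees(math.atan2(dx, -dy)) % 360.0


def oriented_swipe_target(heading, dx, dy, screens):
    # Combine device orientation and swipe direction into one room bearing,
    # then pick the screen closest to that bearing.
    bearing = (heading + swipe_angle(dx, dy)) % 360.0
    return min(screens, key=lambda name: angular_distance(bearing, screens[name]))


SCREENS = {"left": 300.0, "center": 0.0, "right": 60.0}
print(oriented_swipe_target(0.0, 0, -100, SCREENS))    # center (swipe up)
print(oriented_swipe_target(0.0, 80, -50, SCREENS))    # right (swipe up-right)
print(oriented_swipe_target(60.0, -80, -50, SCREENS))  # center (phone turned right, swipe up-left)
```

Holding the phone toward a screen and always swiping up gives a swipe angle of 0, so the lookup reduces to the Fixed Swipe case; this is why Oriented Swipe can reproduce Fixed Swipe.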

[Figure: Oriented Swipe (Terrier alam 2017 07.png)]

Proxemic selection

Each screen has a sensitive area around it. The user must physically enter this area to be able to send a picture to that screen.
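A minimal sketch of the proxemic test, assuming a tracking system that reports the user's position in metres on the floor plane and a circular sensitive area of fixed radius around each screen. The function name, the coordinates and the 1 m radius are hypothetical; the page does not specify how the areas were defined.

```python
import math
from typing import Optional


def proxemic_target(user_xy, screens, radius=1.0) -> Optional[str]:
    """Return the screen whose sensitive area contains the user, if any.

    user_xy: assumed (x, y) user position in metres; screens: hypothetical
    mapping name -> (x, y) screen position; radius: assumed area size.
    """
    inside = [(math.dist(user_xy, pos), name)
              for name, pos in screens.items()
              if math.dist(user_xy, pos) <= radius]
    # If several areas overlap, pick the closest screen.
    return min(inside)[1] if inside else None


SCREENS = {"left": (0.0, 0.0), "right": (4.0, 0.0)}
print(proxemic_target((0.5, 0.3), SCREENS))  # left
print(proxemic_target((2.0, 0.0), SCREENS))  # None (outside every area)
```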

[Figure: Proxemic selection (Terrier alam 2017 05.png)]

Buttons

The smartphone displays a button for each screen. The user taps a button to send the current picture to the corresponding screen. This interaction serves as the baseline and corresponds to classical interaction methods such as WIMP.


We then compare this baseline, for the same task, with the alternative approaches (Fixed Swipe and Oriented Swipe) to see whether they outperform it. Due to time constraints, we do not consider the proxemic interaction in our study, although it could be an alternative technique for the picture classification task.

More generally, we aim to build a comparative analysis of different interaction techniques in our context and to reach a solution that can serve as a basis for further improving picture classification, making it more usable and better suited to users' needs.


For time reasons, we don't evaluate the Proxemic selection.

To properly compare these interactions, we designed an experiment and formulated the following hypotheses:

  • H1: Physical interaction is at least as good as WIMP interaction in terms of performance for this particular task.
  • H2: Users prefer using physical interaction rather than WIMP interaction for this particular task.
  • H3: Users make fewer errors while performing this task with physical interaction than with WIMP interaction.
  • H4: The user's computer skill level does not impact their performance while performing this task using physical interaction.


To conclude on each hypothesis, we define metrics for which we will gather data:

  • Performance: time needed to do the task.
  • Accuracy: number of errors made by the user.
  • Skill level: demographic information.
  • Satisfaction: SUS questionnaire after the experiment.


Because we want to test the interaction itself and to reduce noise, the participants are not told the purpose of the task they are doing; moreover, the task is highly abstracted. Each participant tries the three interaction techniques in a random order. They do so twice, once for training and once for the actual test. Because we want as little learning effect as possible in our records, the participants train on all three interactions before beginning the actual test.

A test is composed of twenty actions (twenty pictures to sort). Our prototype does not use pictures but colors: the task is to "send" the color currently displayed on the smartphone screen to the corresponding target. We did this to avoid data biased by the pictures themselves; with pictures, there could have been confusion, or a learning curve to remember them. Here we simply have three targets and three colors.
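The protocol described above (random technique order, a training pass on all three techniques, then the test pass, each run made of twenty color trials) can be sketched as follows. The color names, the seed and the choice of reusing the same random order for both phases are assumptions; the page does not state how the colors were drawn.

```python
import random

TECHNIQUES = ["Buttons", "Fixed Swipe", "Oriented Swipe"]
COLORS = ["red", "green", "blue"]  # hypothetical: one color per target
TRIALS = 20


def session_plan(seed):
    """Build one participant's plan: every technique in training, then
    every technique in the test phase, each run being 20 color trials."""
    rng = random.Random(seed)
    order = TECHNIQUES[:]
    rng.shuffle(order)  # random technique order, assumed identical for both phases
    plan = []
    for phase in ("training", "test"):
        for technique in order:
            # Each trial shows one color, drawn uniformly (an assumption).
            trials = [rng.choice(COLORS) for _ in range(TRIALS)]
            plan.append((phase, technique, trials))
    return plan


for phase, technique, trials in session_plan(seed=1):
    print(phase, technique, trials[:5])
```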

After this part, the context of picture sorting is explained to the participants, and they are asked to answer an SUS questionnaire for each of the three interaction techniques they used.

We initially intended to have at least 6 participants, but we could only recruit 4: 2 novices and 2 experts.



In the following charts, we plot the average and the standard deviation of the measured time for each interaction technique of the experiment. We used the standard deviation as one of our quantitative measures because it indicates how far individual measurements deviate from the mean: it tells how spread out the measurements are, whether they are concentrated around the mean or widely scattered. We also counted the number of errors made during the classification task with each technique, to assess accuracy alongside speed. An analysis of these graphs is also provided below.

[Figures: Merged speed comparison (Terrier-alam-hci-2017-01.jpg); Speed comparison of novices and experts (Terrier-alam-hci-2017-02.jpg)]

[Figures: Number of errors (Terrier-alam-hci-2017-03.jpg); Number of errors of novices and experts (Terrier-alam-hci-2017-04.jpg)]

The first graph, Merged speed comparison, shows the overall speed of all the participants. The time value is in milliseconds and is the time between two consecutive actions (two swipes or two taps). The second graph, Speed comparison of novices and experts, is similar to the first, but novices and experts are separated. The third graph, Number of errors, shows the total number of errors of all participants for each interaction. The fourth is similar, with novices and experts separated.
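Given a log of actions, the plotted quantities can be computed as in this sketch. The log format `(timestamp_ms, expected, chosen)` is hypothetical, and we assume the sample standard deviation; the page does not say which variant was used. With times measured between two actions, a run of twenty actions gives nineteen intervals unless the first one is timed from the start of the run.

```python
from statistics import mean, stdev


def summarize(log):
    """log: list of (timestamp_ms, expected, chosen) tuples, one per action,
    in chronological order (hypothetical format).

    Returns the mean and sample standard deviation of the time between two
    consecutive actions, plus the number of errors.
    """
    times = [t for t, _, _ in log]
    intervals = [b - a for a, b in zip(times, times[1:])]
    # An error is an action whose chosen target differs from the expected one.
    errors = sum(1 for _, expected, chosen in log if expected != chosen)
    return mean(intervals), stdev(intervals), errors


# Invented values for illustration only, not data from the experiment.
log = [(0, "red", "red"), (900, "blue", "blue"), (2000, "green", "blue"),
       (2950, "red", "red"), (4000, "blue", "blue")]
avg, sd, errors = summarize(log)
print(avg, round(sd, 1), errors)
```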


H1: Physical interaction is at least as good as WIMP interaction in terms of performance for this particular task

In terms of speed, the three interactions seem to be in the same range: around 1 second per action. In terms of accuracy, however, the Buttons interaction is clearly better, as far fewer errors were committed with it. The hypothesis states that physical interaction is at least as good as WIMP in terms of performance. Considering speed alone, the hypothesis seems valid, but we prefer to stay cautious, since the higher error rate suggests that participants may have traded accuracy for speed. We think the experiment should be rerun with a way of making participants slow down when they make too many mistakes.

For now, we cannot conclude.

H3: Users make fewer errors while performing this task with physical interaction than with WIMP interaction

The experiment showed that the participants made more errors with the physical interactions than with the buttons. After watching the participants perform the tasks, this seems obvious: whereas a button embodies a discrete (digital) value, a swipe gesture and an orientation are by nature analog, and therefore subject to imprecision and noise.

H3 is false.

H4: The user's computer skill level does not impact their performance while performing this task using physical interaction

Here, we have the same problem as with H1. Considering speed alone, novices and experts had similar results, but when considering accuracy, there is an obvious difference. The same answer applies: we must rerun the experiment with a way of slowing participants down when they make too many errors.

For now, we cannot conclude.


We used the System Usability Scale (SUS) questionnaire to measure the usability of the three techniques we studied. Right after the questionnaire, we also asked the participants for feedback on each technique, to find out which interaction (Buttons, Fixed Swipe or Oriented Swipe) they found most favorable. It is worth mentioning that, based on prior research, a SUS score above 68 is considered above average and anything below 68 below average.
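For reference, the standard SUS scoring rule can be written as a short function: each of the ten items is answered on a 1-5 scale, odd (positively worded) items contribute `answer - 1`, even (negatively worded) items contribute `5 - answer`, and the sum is multiplied by 2.5 to give a score from 0 to 100. The answers in the example are invented, not collected from our participants.

```python
def sus_score(responses):
    """Standard SUS scoring for one questionnaire: ten answers on a 1-5
    scale, in questionnaire order."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten answers between 1 and 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


answers = [4, 2, 4, 1, 5, 2, 4, 2, 4, 2]  # invented answers, not study data
score = sus_score(answers)
print(score, "above average" if score > 68 else "not above average")  # 80.0 above average
```

A per-technique score would then be the mean of the participants' individual scores for that technique.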

Below is a graphical representation of the usability scores of the three techniques:



H2: Users prefer using physical interaction rather than WIMP interaction for this particular task

From the figure above, we see that all three techniques exceed the average score. However, there are still noticeable differences among the techniques in terms of usability. In our experiment, most participants preferred Buttons over Fixed and Oriented Swipe; after Buttons, they preferred Fixed Swipe, and finally Oriented Swipe. These results are strikingly close to the number of errors made by the participants, so this aspect of the experiment may have driven their feelings about the interactions.

H2 is false.

General discussion

This work had somewhat expected results. The fact that buttons are safer than complex gestures is not new. Still, there are two unexpected outcomes:

Firstly, all interactions have a SUS score above 68, meaning that, overall, the participants did not dislike any of them.

Secondly, one of the novices actually used the Fixed Swipe interaction when performing the Oriented Swipe test. Since Oriented Swipe can be used to reproduce Fixed Swipe, we let the participant do it. When asked about this behavior, the participant said that it felt "more natural". This answer made us realize something we had not seen when designing the Fixed Swipe interaction: it is actually a two-handed interaction, as described in the research literature. The left hand selects the global direction by rotating the phone, and the right hand performs the precise gesture, the swipe that confirms the action.

Although using these physical interaction techniques to sort pictures seems, after all, not to be such a good idea, the link with two-handed interaction is interesting and may be worth further investigation.