Ever since Minority Report brought gesture-based interfaces into the public eye, there have been periodic demonstrations of their evolution in the real world. Here’s where MIT’s John Underkoffler, one of the consultants to the producers of Minority Report, has got to with his g-speak “spatial operating environment” (SOE):
Like most demonstrations of gesture-based and multi-touch interfaces, this one is high on wow factor but rather low on suggestions for how such a UI would be useful. That’s not necessarily a problem, of course – research is research. But it’s notable that whenever such interfaces are displayed, a large number of people seem convinced of their utility.
The primary advantage of gestures is that they are said to be “intuitive.” That word is enough to make anyone who has been involved in software design suspicious from the start. Look at the video and you will see that there are a number of hand, arm and finger gestures being used to manipulate screen objects. The word “intuitive” implies that everyone would know what (for example) a pointed finger meant, or two hands facing palm outwards, or a flat hand raised over a distance. Instead, I would guess that these would take considerable time to master. At least, if that were not the case, we would have videos of people using the systems for the first time and telling us all how easy it is. Of course, the same lack of intuition is true of the use of mice and other pointing devices such as trackballs (I have seen somebody attempting to use a mouse upside down, because mice have tails coming out behind them). With gestures, however, the word “intuitive” is at best disingenuous.
Another issue with gesture-based UIs is the amount of physical effort involved. Right now, if you stand up from your desk and start miming away in the manner of the people in the video above, you will start to tire in about 15 minutes if you mix large movements with smaller, more detailed ones (such as those needed for typing). You will probably not be able to raise your arms above shoulder height at all after about 30 minutes, and you will need to sit down after an hour or so.
Now, that could simply be because you are unfit. Standing, certainly, is better for your back. Perhaps we will develop the musculature in our arms to sustain long periods of gesturing in mid air, and perhaps voice control will also feature (although that is a mature technology that goes largely unused in the workplace, at least, for fairly obvious reasons). But again, the “on ramp” to this is not trivial.
In general though, I think the main problem with such interfaces is not that they are hard to use, but that they are so heavily dependent on the quality of the interactions they afford. For example, one good use of the technology could be collaborative work (on what, I don’t know – virtual Lego?). The success of that example would depend extremely heavily on how well the system responded to subtle variations in each user’s “standard” gestures. Making mistakes (which are liable to be large and random) on your own is one thing, but making them in the company of others is quite another: apologies, “help” that gets in the way, misinterpretations of your intentions, and so on could be rife. Either the system would have to be extremely tolerant of human inconsistency, or the users highly trained. I get the feeling it would have to be the latter.
Mind, I have never actually tried a proper gesture-based interface, so I am perhaps being unfair. My impression, though, is that they offer very little in terms of progress. I don’t doubt they will acquire a niche, in the same way as other impressive but hard-to-learn technologies have.