Empty Gesture
Ever since Minority Report brought gesture-based interfaces into the public eye, there have been periodic demonstrations of their evolution in the real world. Here’s where MIT’s John Underkoffler, one of the consultants used by the producers of Minority Report, has got to with his g-speak “spatial operating environment” (SOE):
Like most demonstrations of gesture-based and multi-touch interfaces, this one is high on wow factor but rather low on suggestions for how such a UI would actually be useful. That’s not necessarily a problem, of course – research is research. But it’s notable that whenever such interfaces are shown off, a large number of people seem convinced of their utility.
The primary advantage of gestures is that they are said to be “intuitive.” That word is enough to make anyone who has been involved in software design suspicious from the start. Look at the video and you will see a number of hand, arm and finger gestures being used to manipulate screen objects. The word “intuitive” implies that everyone would know what (for example) a pointed finger meant, or two hands facing palm outwards, or a flat hand raised over a distance. Instead, I would guess that these take considerable time to master. If that were not the case, we would have videos of people using the systems for the first time and telling us all how easy it is. Of course, the same lack of intuition is true of mice and other pointing devices such as trackballs (I have seen somebody attempting to use a mouse upside down, because mice have tails coming out behind them). With gestures, however, the word “intuitive” is at best disingenuous.
Another issue with gesture-based UIs is the amount of physical effort involved. Right now, if you stand up from your desk and start miming away in the manner of the people in the video above, you will start to tire within about 15 minutes if you mix large movements with smaller, more detailed ones (such as those needed for typing). You will probably not be able to raise your arms above shoulder height at all after about 30 minutes, and you will need to sit down after an hour or so.
Now, that could simply be because you are unfit. Standing, certainly, is better for your back. Perhaps we will develop the arm musculature to sustain long periods of gesturing in mid-air, and perhaps voice control will also feature (although that is a mature technology that, for fairly obvious reasons, goes largely unused in the workplace at least). But again, the “on ramp” to this is not trivial.
In general, though, I think the main problem with such interfaces is not that they are hard to use, but that they are so heavily dependent on the quality of the interactions afforded. For example, one good use of the technology could be collaborative work (on what, I don’t know – virtual Lego?). The success of that example would depend extremely heavily on how well the system responded to subtle variations in the users’ “standard” gestures. Making mistakes (which are liable to be large and random) on your own is one thing, but making them in the company of others is quite another: apologies, “help” that gets in the way, misinterpretations of your intentions, and so on could be rife. Either the system would have to be extremely tolerant of human inconsistency, or the users highly trained. I get the feeling it would have to be the latter.
Mind, I have never actually tried a proper gesture-based interface, so I am perhaps being unfair. My impression, though, is that they offer very little in the way of progress. I don’t doubt they will acquire a niche, in the same way as other impressive but hard-to-learn technologies have.
I think that is one of the most disingenuous ‘demonstrations’ I have ever seen. For instance, at 1:37 they make the bold assertion that “Direct manipulation provides intuitive, high-bandwidth access to information”. Even disregarding the sheer meaninglessness of that statement, the following example shows a man zooming around a large field of boxes with Chinese characters (?) in them. That’s not information, high-bandwidth or otherwise, and he’s not ‘accessing’ it. I have no idea what he’s doing, but it isn’t intuitive.
Without straying too far into what an ‘intuitive interface’ really is, to me ‘intuitiveness’ is closely related to how familiar the interactions are – how closely they resemble ones you already know. I have used applications on a Surface, and they can be extremely intuitive, because the actions are familiar.
Abstract gestures, however, are far from familiar or intuitive. In fact, many years ago I wrote a gesture-based UI for an application I was building precisely *because* I wanted the UI to be counter-intuitive. The reasons for that are rather too complicated to explain in this comment.
Been trying to think of scenarios in which gestures would in fact be better. The only one from my own experience I can think of so far is watching TV. You are sitting largely motionless most of the time, so it would be easy to come up with a small gesture vocabulary that would not conflict with anything. It would also mean you would never lose the remote.
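To make that concrete, here is a purely hypothetical sketch of what such a vocabulary might look like – the gesture names, commands and recogniser interface are all invented for illustration, not taken from any real system:

```python
# Hypothetical sketch: a small TV gesture vocabulary.
# No real tracking hardware or gesture library is assumed.

TV_GESTURES = {
    "palm_push": "pause",         # flat palm pushed toward the screen
    "swipe_left": "channel_down",
    "swipe_right": "channel_up",
    "raise_hand": "volume_up",
    "lower_hand": "volume_down",
}

def handle_gesture(name: str, confidence: float, threshold: float = 0.9):
    """Map a recognised gesture to a TV command.

    A high confidence threshold is what keeps ordinary sofa fidgeting
    from triggering commands - the 'would not conflict with anything'
    property the vocabulary needs.
    """
    if confidence < threshold:
        return None  # ignore anything that isn't a deliberate gesture
    return TV_GESTURES.get(name)

# Example: a recogniser (not shown) reports gestures with a confidence score.
print(handle_gesture("swipe_right", 0.95))  # -> "channel_up"
print(handle_gesture("swipe_right", 0.40))  # -> None (probably fidgeting)
```

The point being that with only a handful of deliberately coarse gestures, the recognition problem becomes far more tractable than in the free-form g-speak demo.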
Mind you, the above demo shows them wearing gloves, so that’s a bit of a damper for TV, I would think. It also (curiously) does not show them typing. TV watching *might* be a candidate for voice control, though.