Wednesday, 4 April 2018

Multi-Hand Gesture Recognition Algorithm

Gesture recognition is a vital tool for virtual reality.

Sure, we can get by with simple touch-to-use functionality, but having fast and accurate hand gesture recognition opens a huge number of potential directions for user interface development.

By contextualizing the gestures of a single hand against either its own previous gestures, or the concurrent gestures of another hand, we open up even more options. Most of the interfaces I have developed personally rely upon contextual gestures to expose the full extent of functionality.

But how do we actually recognize and distinguish hand gestures, and how do we do it in a timely fashion? Well, aside from training an AI to do it for us (more on that in a future post), there is another way to create fairly robust and fast gesture recognition, which works across a variety of devices - and it's quite easy!

If we measure the distance between each finger tip and the palm of the hand, as well as the relative angle of the palm's normal to the camera's viewport (or the user's head, if tracked), we can classify a large number of distinct gestures.


(The red dots are the nodes, and the green arrow is the normal of the palm - a vector pointing directly away from the palm.)

If you point your index finger, you will notice that the distance between the tip of the extended finger and your palm is much greater than the distance between any other fingertip and your palm. This is true regardless of which finger is extended, though the precise ratio can change - normally, the palm distance of the extended digit will be more than 1.5 times that of the other digits.
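As a rough illustration of that ratio check, here is a minimal Python sketch. It assumes the tracker supplies the palm centre and each fingertip as 3D coordinates; the function names, the finger labels and the choice to compare the longest distance against the second longest are my own assumptions for the example, not part of the released system.

```python
import math

# Rule-of-thumb ratio quoted above; treat it as a tunable assumption.
EXTENSION_RATIO = 1.5

def palm_distances(palm, fingertips):
    """Return {finger name: distance from that fingertip to the palm centre}."""
    return {name: math.dist(palm, tip) for name, tip in fingertips.items()}

def extended_finger(palm, fingertips, ratio=EXTENSION_RATIO):
    """Return the name of the single extended finger, or None if no finger stands out."""
    distances = palm_distances(palm, fingertips)
    # Rank fingers from farthest to nearest the palm.
    ranked = sorted(distances, key=distances.get, reverse=True)
    longest, runner_up = ranked[0], ranked[1]
    # Only one finger counts as "pointing" if it clearly beats every other digit.
    if distances[longest] > ratio * distances[runner_up]:
        return longest
    return None

# Example hand (metres, hypothetical values): index extended, other digits curled.
palm = (0.0, 0.0, 0.0)
pointing_hand = {
    "thumb": (0.04, 0.0, 0.0),
    "index": (0.0, 0.10, 0.0),
    "middle": (0.0, 0.045, 0.0),
    "ring": (0.0, 0.04, 0.0),
    "pinky": (0.0, 0.035, 0.0),
}
print(extended_finger(palm, pointing_hand))
```

Comparing against the runner-up rather than an average keeps the check strict: two extended fingers will not be mistaken for a single pointing finger.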

To detect a handshake, for example, we must also check the distance between the fingers and the head, to make sure that all digits are farther from the head than the palm is (i.e. the hand is being held straight). Furthermore, we need to check the angle of the palm against the angle of the head, and make sure that the palm is pointing inwards. For example, the palm of a right hand points toward the head's left when held out for a handshake.
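The two handshake checks could be sketched along these lines. This is only an illustration: it assumes the palm normal and the head's "right" direction arrive as unit vectors, and the `min_alignment` threshold of 0.7 is a made-up starting value rather than a tested one.

```python
import math

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def is_handshake(palm, palm_normal, fingertips, head, head_right,
                 is_right_hand, min_alignment=0.7):
    """Return True if the hand looks like it is held out for a handshake.

    palm_normal and head_right are assumed to be unit vectors.
    min_alignment is a hypothetical threshold on their dot product.
    """
    # Check 1: every fingertip must be farther from the head than the palm is.
    palm_to_head = math.dist(palm, head)
    if any(math.dist(tip, head) <= palm_to_head for tip in fingertips):
        return False
    # Check 2: the palm must face inwards. A right palm faces the head's left,
    # a left palm faces the head's right.
    if is_right_hand:
        inward = tuple(-c for c in head_right)
    else:
        inward = tuple(head_right)
    return dot(palm_normal, inward) >= min_alignment

# Hypothetical right hand held out in front of a head at the origin.
head = (0.0, 0.0, 0.0)
head_right = (1.0, 0.0, 0.0)
palm = (0.2, -0.3, 0.4)
tips = [(0.2, -0.25, 0.45), (0.2, -0.3, 0.5), (0.2, -0.31, 0.5),
        (0.2, -0.32, 0.5), (0.2, -0.33, 0.49)]
print(is_handshake(palm, (-1.0, 0.0, 0.0), tips, head, head_right, True))
```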

We can combine the two methods as well, to detect the specific orientation of a pointed finger - classifying whether the user is pointing up or down, or what direction their palm is facing as they point (which is useful for detecting some of the less polite, but still distinctive gestures).

Once you begin to identify gestures in terms of these sorts of relationships, classifying them (and creating functions to check for them) becomes quite manageable - and it doesn't rely on anything device-specific. As long as your hand tracking method can tell you the positions of the fingertips, and the position and rotation of the palm, you can apply this method of gesture recognition.
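To show how the distance and angle checks might be combined, here is a small sketch that labels a pointing finger as up, down or level, and labels the palm as facing toward or away from the user. It assumes a finger has already been identified as extended, that the world's up axis is +Y, and that all vectors are non-zero; the labels and the 0.7 threshold are placeholders of my own.

```python
import math

def normalize(v):
    """Scale a non-zero 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def pointing_direction(palm, fingertip, world_up=(0.0, 1.0, 0.0), threshold=0.7):
    """Classify the palm-to-fingertip direction as 'up', 'down' or 'level'."""
    finger = normalize(tuple(t - p for t, p in zip(fingertip, palm)))
    vertical = dot(finger, world_up)
    if vertical >= threshold:
        return "up"
    if vertical <= -threshold:
        return "down"
    return "level"

def palm_facing(palm, palm_normal, head, threshold=0.7):
    """Classify the palm normal relative to the direction from palm to head."""
    to_head = normalize(tuple(h - p for h, p in zip(head, palm)))
    alignment = dot(normalize(palm_normal), to_head)
    if alignment >= threshold:
        return "toward_user"
    if alignment <= -threshold:
        return "away_from_user"
    return "sideways"

def describe_point(palm, palm_normal, fingertip, head):
    """Combine both classifications into one (direction, facing) label pair."""
    return pointing_direction(palm, fingertip), palm_facing(palm, palm_normal, head)

# Hypothetical hand half a metre in front of the head, finger up, palm facing the user.
print(describe_point((0.0, 0.0, 0.5), (0.0, 0.0, -1.0), (0.0, 0.1, 0.5), (0.0, 0.0, 0.0)))
```

Because each check returns a plain label, new gestures can be described as combinations of labels rather than as new geometry code.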

The crudeness of this method also does it something of a favour - these checks can be performed very quickly and the system can be expanded or reduced as necessary, making it fairly optimizable.

As mentioned above, we can take this even further by introducing context from other gestures. In the example below, different parameters of the flock of green blobs are affected by different gesture configurations - an open, upward-facing left palm combined with a pointed right index finger opens the settings for the follow speed of the flock. By contrast, two inward-facing open palms manipulate flock cohesion, and two extended index fingers allow control of each blob's avoidance radius.
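One simple way to express this kind of two-hand context is a lookup table keyed on the pair of per-hand gestures. The sketch below mirrors the three combinations described above; the gesture labels and setting names are hypothetical strings chosen for the example.

```python
# Maps (left-hand gesture, right-hand gesture) to the flock setting it controls.
# All labels here are illustrative placeholders.
CONTEXT_ACTIONS = {
    ("open_palm_up", "point_index"): "follow_speed",
    ("open_palm_inward", "open_palm_inward"): "cohesion",
    ("point_index", "point_index"): "avoidance_radius",
}

def active_setting(left_gesture, right_gesture):
    """Return the setting selected by the current pair of gestures, or None."""
    return CONTEXT_ACTIONS.get((left_gesture, right_gesture))

print(active_setting("open_palm_up", "point_index"))
```

Since the key is an ordered pair, swapping hands selects a different entry (or none), which leaves room to give mirrored combinations their own meaning.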



And of course we can introduce gesture order as a requirement for accessing certain functionality. For example, we can require the user to make a finger gun, then lower the hammer to fire it.
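An ordered requirement like this can be sketched as a tiny state machine that is fed the current gesture label every frame. The class name, the gesture labels and the one-second timeout below are assumptions for illustration only.

```python
class SequenceDetector:
    """Reports when a list of gestures has been performed in order."""

    def __init__(self, steps, timeout=1.0):
        self.steps = tuple(steps)
        self.timeout = timeout      # max seconds allowed between steps (assumed value)
        self._index = 0             # position of the next expected step
        self._last_time = None      # time the sequence last made progress or was held

    def update(self, gesture, now):
        """Feed the current gesture and timestamp; return True when the sequence completes."""
        # Abandon a half-finished sequence if the user took too long.
        if self._index > 0 and now - self._last_time > self.timeout:
            self._index = 0
        if gesture == self.steps[self._index]:
            self._index += 1
            self._last_time = now
            if self._index == len(self.steps):
                self._index = 0
                return True
        elif self._index > 0 and gesture == self.steps[self._index - 1]:
            # Still holding the previous step: keep the sequence alive.
            self._last_time = now
        else:
            # Any unrelated gesture cancels the sequence.
            self._index = 0
        return False

# Usage: make a finger gun, hold it, then lower the hammer.
gun = SequenceDetector(["finger_gun", "hammer_down"])
frames = [("open_palm", 0.0), ("finger_gun", 0.1), ("finger_gun", 0.5), ("hammer_down", 0.6)]
fired = [gun.update(gesture, t) for gesture, t in frames]
print(fired)
```

Passing the timestamp in explicitly, rather than reading a clock inside the class, keeps the detector easy to test and independent of any particular engine's frame timing.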



Anyway, that's an overview of how the system works. I will be releasing it on the Asset Store once I've had a chance to do some more testing.