Carbon Copy

Six Gestures, Maximum Value: Typing in Mixed Reality


https://arxiv.org/pdf/2403.06998

Mixed reality devices face a key problem: how do we type efficiently in public? This matters for both privacy and usability.

I want to interact with computers faster. I use Dvorak, vim shortcuts, and voice dictation. Brain-computer interfaces would be ideal, but most people don't want brain implants, and they are a few years away from being commonplace.

Meta's Orion glasses offer a middle ground. They use an EMG wristband that reads arm signals when you make hand gestures; no invasive technology needed.

Voice dictation is fast (three times faster than phone typing) but often inappropriate in public. I don't want to announce my messages in coffee shops or share private thoughts on buses. Carrying a keyboard defeats the purpose of lightweight mixed reality glasses.

Meta's wristband approach works because it's discreet and requires no extra equipment. But it has a major limitation: it only detects six hand gestures.

Available Gestures

  1. Thumb up
  2. Thumb down
  3. Thumb left
  4. Thumb right
  5. Thumb tap
  6. Pinch

Six gestures severely limit text input. A standard keyboard has 80+ keys, and even smartphone keyboards have dozens of touch points. This raises the question: how can we type efficiently with so few options?
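
To put a number on the constraint, here is a rough back-of-the-envelope sketch. It assumes an alphabet of 28 symbols (26 letters plus SPACE and DELETE) and a fixed-length code where every symbol takes the same number of gestures; the function name is just illustrative.

```python
import math

GESTURES = 6
SYMBOLS = 26 + 2  # assumed alphabet: letters plus SPACE and DELETE

def min_gestures_per_symbol(symbols: int, gestures: int) -> int:
    """Smallest n such that gestures**n >= symbols (fixed-length code)."""
    n = 1
    while gestures ** n < symbols:
        n += 1
    return n

print(min_gestures_per_symbol(SYMBOLS, GESTURES))  # 2
print(math.log(SYMBOLS, GESTURES))                 # roughly 1.86
```

One gesture can only distinguish 6 symbols, while two can distinguish 36, so under these assumptions two gestures per character is the floor for any scheme that treats all characters equally.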

Here are methods to maximize efficiency with these limited inputs, considering speed, learnability, and practical daily use:

Method 1: Linear Selection

A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, SPACE, DELETE

How it Would Work:
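
A minimal sketch of one way this could work, under assumptions of my own choosing: thumb left/right move a cursor one step along the row of symbols, the row optionally wraps around at the ends, and thumb tap selects. The `linear_cost` helper is hypothetical and just counts how many gestures a piece of text would take.

```python
import string

# Layout from above: A-Z, then SPACE and DELETE.
SYMBOLS = list(string.ascii_uppercase) + ["SPACE", "DELETE"]

def linear_cost(text: str, wrap: bool = True) -> int:
    """Gestures needed to type `text`: cursor moves plus one tap per character.

    Assumes the cursor starts on A and stays on the last selected symbol.
    Only letters and spaces are supported; anything else raises ValueError.
    """
    pos = 0
    total = 0
    for ch in text.upper():
        target = SYMBOLS.index("SPACE" if ch == " " else ch)
        dist = abs(target - pos)
        if wrap:
            # Going the other way around the ring may be shorter.
            dist = min(dist, len(SYMBOLS) - dist)
        total += dist + 1  # moves + one tap to select
        pos = target
    return total

print(linear_cost("HI"))             # 10
print(linear_cost("Z"))              # 4 (wraps backwards from A)
print(linear_cost("Z", wrap=False))  # 26
```

Even with wrap-around, a single letter can cost up to 15 gestures (14 moves plus the tap), which is the main weakness of a one-dimensional layout.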

Thoughts:

Method 2: 2D Grid Linear Selection

This slightly improves on the previous method by adding a second dimension.

A, B, C, D, E
F, G, H, I, J
K, L, M, N, O
P, Q, R, S, T
U, V, W, X, Y
Z, SPACE, DEL

How it Would Work:
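
A minimal sketch of one possible scheme, assuming the four thumb directions move a cursor one cell at a time (no wrap-around) and thumb tap selects. The `grid_cost` helper is hypothetical; it counts gestures as Manhattan distance plus one tap per character.

```python
# Grid layout from above; the last row is shorter than the others.
GRID = [
    ["A", "B", "C", "D", "E"],
    ["F", "G", "H", "I", "J"],
    ["K", "L", "M", "N", "O"],
    ["P", "Q", "R", "S", "T"],
    ["U", "V", "W", "X", "Y"],
    ["Z", "SPACE", "DEL"],
]

# Map each symbol to its (row, column) position.
POS = {sym: (r, c) for r, row in enumerate(GRID) for c, sym in enumerate(row)}

def grid_cost(text: str) -> int:
    """Gestures needed to type `text`, starting with the cursor on A.

    Assumes the cursor stays on the last selected cell. Only letters and
    spaces are supported; anything else raises KeyError.
    """
    row, col = 0, 0
    total = 0
    for ch in text.upper():
        r, c = POS["SPACE" if ch == " " else ch]
        total += abs(r - row) + abs(c - col) + 1  # moves + one tap
        row, col = r, c
    return total

print(grid_cost("HI"))  # 6
print(grid_cost("Z"))   # 6
```

Under these assumptions the worst case between any two cells is 9 moves plus the tap (for example from E to Z).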

Method 3: 2D Grid with Axis Selection

This uses the same 2D layout of letters as before, but instead of navigating a cursor around the grid, you select the row and column of the desired character directly.

   [1][2][3][4][5]
[1] A, B, C, D, E
[2] F, G, H, I, J
[3] K, L, M, N, O
[4] P, Q, R, S, T
[5] U, V, W, X, Y
[6] Z, SPACE, DEL

How it Would Work:
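
A minimal sketch of one possible mapping, assuming the first gesture of each pair picks the row (six gestures for six rows) and the second picks the column (five of the six gestures, in the same order). The gesture names, their ordering, and the `decode` helper are all illustrative assumptions.

```python
# Assumed gesture order; index 0-5 doubles as the row or column number.
GESTURES = ["up", "down", "left", "right", "tap", "pinch"]

GRID = [
    ["A", "B", "C", "D", "E"],
    ["F", "G", "H", "I", "J"],
    ["K", "L", "M", "N", "O"],
    ["P", "Q", "R", "S", "T"],
    ["U", "V", "W", "X", "Y"],
    ["Z", "SPACE", "DEL"],
]

def decode(gestures):
    """Turn a flat list of gestures into text, two gestures per symbol."""
    out = []
    it = iter(gestures)
    # zip(it, it) pairs consecutive gestures; a trailing odd one is dropped.
    for row_gesture, col_gesture in zip(it, it):
        row = GRID[GESTURES.index(row_gesture)]
        col = GESTURES.index(col_gesture)
        if col >= len(row):
            continue  # pair not assigned to any symbol: ignore it
        symbol = row[col]
        if symbol == "DEL":
            if out:
                out.pop()
        elif symbol == "SPACE":
            out.append(" ")
        else:
            out.append(symbol)
    return "".join(out)

print(decode(["down", "left", "down", "right"]))  # HI
```

With this mapping every symbol costs exactly two gestures regardless of where it sits in the grid, and several gesture pairs are left unassigned.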

Thoughts:

Method 4: Triple Click (Like Old Phone Keyboards)

Remember texting on old Nokia phones? The concept is similar, but with thumb gestures instead of number keys.

Thumb up:    A, B, C
Thumb down:  D, E, F
Thumb left:  G, H, I
Thumb right: J, K, L
Thumb tap:   M, N, O
Pinch:       P, Q, R, S

Long Thumb up:    T, U, V
Long Thumb down:  W, X, Y, Z
Long Thumb left:  SPACE
Long Thumb right: DELETE
Long Thumb tap:   .?!
Long Pinch:       ,;:

How it Would Work:
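
A minimal sketch of the multi-tap idea, assuming that repeating a gesture cycles through its letters (one repeat for the first letter, two for the second, and so on) and that a short pause commits the letter, as on old phones. The `encode` and `press_count` helpers are hypothetical; DELETE (long thumb right) is left out because encoding text never needs it.

```python
# Gesture-to-letters table from above.
KEYS = {
    "up": "ABC",
    "down": "DEF",
    "left": "GHI",
    "right": "JKL",
    "tap": "MNO",
    "pinch": "PQRS",
    "long up": "TUV",
    "long down": "WXYZ",
    "long left": " ",
    "long tap": ".?!",
    "long pinch": ",;:",
}

# Map each character to (gesture, number of repeats).
LOOKUP = {
    ch: (gesture, i + 1)
    for gesture, chars in KEYS.items()
    for i, ch in enumerate(chars)
}

def encode(text: str):
    """Return the (gesture, repeats) pair for each character in `text`."""
    return [LOOKUP[ch] for ch in text.upper()]

def press_count(text: str) -> int:
    """Total number of gestures needed to type `text`."""
    return sum(repeats for _, repeats in encode(text))

print(encode("HI"))        # [('left', 2), ('left', 3)]
print(press_count("THE"))  # 5
```

Note that "HI" uses the same gesture for both letters, so the pause between them matters: without it, five thumb-lefts in a row would be ambiguous.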

Thoughts:

Method 5: Modal Layers

This builds on the grid methods above but adds a third dimension: think of it as stacking several keyboards on top of one another and switching between them with a gesture.

Mode 1: Common Letters
A, B, C, D, E
F, G, H, I, J
K, L, M, N, O
P, Q, R, S, T
U, V, W, X, Y
Z, SPACE, DEL

Mode 2: Numbers & Symbols
1, 2, 3, 4, 5
6, 7, 8, 9, 0
!, @, #, $, %
^, &, *, (, )
-, +, =, [, ]
{, }, <, >, /

Mode 3: Common Words
THE, AND, FOR
BUT, YOU, THAT
WITH, HAVE, THIS
FROM, THEY, WILL
WHAT, BEEN, WHEN
THERE, THEIR, YOUR

How it Would Work:
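
A minimal sketch of the layering idea, assuming one gesture cycles through the modes and that a cell is then picked by row and column using whichever selection scheme you prefer. The `LayeredKeyboard` class and its method names are hypothetical.

```python
# The three modes from above, in switching order.
LAYERS = [
    ("letters", [list("ABCDE"), list("FGHIJ"), list("KLMNO"),
                 list("PQRST"), list("UVWXY"), ["Z", "SPACE", "DEL"]]),
    ("symbols", [list("12345"), list("67890"), list("!@#$%"),
                 list("^&*()"), list("-+=[]"), list("{}<>/")]),
    ("words", [["THE", "AND", "FOR"], ["BUT", "YOU", "THAT"],
               ["WITH", "HAVE", "THIS"], ["FROM", "THEY", "WILL"],
               ["WHAT", "BEEN", "WHEN"], ["THERE", "THEIR", "YOUR"]]),
]

class LayeredKeyboard:
    def __init__(self):
        self.layer = 0    # index into LAYERS; starts on letters
        self.output = []  # one entry per selection

    def next_layer(self):
        """Assumed mode-switch gesture: cycle to the next layer."""
        self.layer = (self.layer + 1) % len(LAYERS)

    def select(self, row: int, col: int):
        """Select the cell at (row, col) in the current layer."""
        symbol = LAYERS[self.layer][1][row][col]
        if symbol == "DEL":
            if self.output:
                self.output.pop()  # removes the last selection, even a whole word
        elif symbol == "SPACE":
            self.output.append(" ")
        else:
            self.output.append(symbol)

    def text(self) -> str:
        return "".join(self.output)

kb = LayeredKeyboard()
kb.next_layer(); kb.next_layer()  # letters -> symbols -> words
kb.select(0, 0)                   # THE
kb.next_layer()                   # wraps back to letters
kb.select(5, 1)                   # SPACE
kb.select(0, 0)                   # A
print(kb.text())                  # THE A
```

In this sketch a whole word costs the same as a single letter once you are on the words layer, but every mode switch adds a gesture, so the layers only pay off when switches are rare.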

Thoughts:

Future Possibilities

As EMG technology improves, we might see: