Facial recognition isn’t the only scary thing bad actors and governments can use computer recognition for. What if an AI could watch a video of us tapping on our touchscreen phones and infer exactly what app we’re using and what we’re typing?
Modern computer vision techniques can give us the kind of technological superpowers typically only seen in the movies. We can load video into an AI system and tell it to zoom in on a low-resolution frame and, with a little training and some clever algorithms, we can make it “enhance” the image. It’s like magic, only far more accessible.
That might not sound very nefarious, but the same technology Tesla uses in its driver-assistance features could be adapted for myriad purposes. We use computer vision for everything from cancer detection to counting large numbers of objects in a photograph.
There’s nothing stopping a clever developer from training an AI system to infer text from keystrokes or finger movement. And that’s pretty scary. We’ll explain why in a moment.
First, it’s worth mentioning that computer vision has come a long way since 2017, when Google’s AI still made simple mistakes such as confusing a turtle with a rifle.
Today’s CV systems can make incredibly robust inferences with very small amounts of data. For example, researchers have demonstrated that computers can authenticate users with nothing but AI-based typing biometrics, and psychologists have developed automated stress detection systems using keystroke analysis.
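To make the typing biometrics idea a little more concrete, here’s a minimal sketch in Python. It is not how any of the research systems mentioned above actually work. It assumes we already have key press and release timestamps, and it compares two typing samples using two common timing features: how long each key is held (dwell time) and the gap between releasing one key and pressing the next (flight time). The function names, the sample data, and the 0.03-second threshold are all hypothetical choices made for illustration.

```python
from statistics import mean

def timing_features(events):
    """Turn (key, press_time, release_time) tuples, in seconds and ordered
    by press time, into a flat list of dwell and flight times."""
    # Dwell time: how long each key was held down.
    dwell = [release - press for _, press, release in events]
    # Flight time: gap between releasing one key and pressing the next.
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return dwell + flight

def distance(a, b):
    """Mean absolute difference between two equal-length feature lists."""
    return mean(abs(x - y) for x, y in zip(a, b))

def matches_profile(profile, sample, threshold=0.03):
    """Hypothetical check: accept the sample if its typing rhythm is within
    `threshold` seconds, on average, of the enrolled profile."""
    return distance(timing_features(profile), timing_features(sample)) <= threshold

# Made-up timings for someone typing "pas" three times.
profile = [("p", 0.00, 0.10), ("a", 0.18, 0.27), ("s", 0.35, 0.45)]
same_user = [("p", 0.00, 0.11), ("a", 0.19, 0.27), ("s", 0.36, 0.45)]
other_user = [("p", 0.00, 0.20), ("a", 0.40, 0.55), ("s", 0.80, 0.95)]

print(matches_profile(profile, same_user))
print(matches_profile(profile, other_user))
```

A real system would learn from many samples per person and use a trained model rather than a fixed threshold, but the takeaway is the same: the rhythm of our typing alone carries identifying information.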
Researchers are even training AI to mimic human typing so we can develop better tools to help us with spelling, grammar, and other communication techniques. The long and short of it is, we’re teaching AI systems to make inferences from our finger movements that most humans couldn’t.
It’s not much of a stretch to imagine the existence of a system capable of analyzing finger movement and interpreting it as text in much the same way lip-readers convert mouth movement into words.
We haven’t seen an AI product like this yet, but that doesn’t mean it’s not already out there.
So what’s the worst that could happen?
Not too long ago, before the internet was ubiquitous, “shoulder surfing” was among the biggest threats faced by people for whom computer security is a big deal. Basically, the easiest way to steal someone’s password is to watch them type it.
That’s why most password entry screens hide the password as you’re typing it – you never know who can see your screen.
Most humans don’t have the ability to determine exactly which keys you’re pressing or what numbers you’re tapping on the screen. Our fingers move surprisingly fast when we know what we’re doing and we’ve got pretty good hand-eye coordination.
But AI can be trained on these tiny movements. And almost anything is possible in the world of AI with enough data.
This means, theoretically speaking, it should be relatively simple for a developer with enough resources to train a model that could either run on an AI chip (such as those in many flagship smartphones) or connect to a cloud-based service.
In the case of the former, it would give just about anyone in the world the ability to “see” what other people are tapping and typing on their phones and keyboards.
Hypothetically speaking, this would enable bad actors to steal passwords, ATM PINs, and entire documents (assuming you typed them in view of a camera).
And, if we look at the latter idea, where we have a cloud-based infrastructure connected, we have to assume big tech and the government are involved. The idea of, say, Google or the NYPD gaining the ability to turn any camera into a keystroke detector seems horrific.