July 18, 2010

I’ve always hated predictive text in SMSs, but the prediction and autocorrection on iOS (the operating system for the iPad, iPhone, and iPod Touch) is, by and large, extremely good. I can type fat-fingered txts on my iPhone and it happily turns “Wdnt me tp gwt vresd frim Tesxo on thr way himr?” into “Want me to get bread from Tesxo on the way home?” OK, so it doesn’t know about TescOS[1], but the message is a lot more comprehensible in its version. And once you learn to trust its most predictable autocorrections, you can use them to save time by typing “youre” and “wouldnt”, safe in the knowledge that it will add the apostrophe, without you having to hit numeric-shift, find the apostrophe, type it, hit numeric-shift (the “123” key) again to turn it off, type the next letter, realise it’s already released numeric-shift so you’ve just typed a ‘5’ instead of a ‘t’ (I am sure numeric-shift was sticky in the earliest versions of iOS), delete it, type the ‘t’… I do find it really hard to mistype things deliberately, but it’s starting to get to the point where it’s a shortcut my fingers know how to take.

There are one or two places, though, where the autocorrection is wrong. The most aggravating one of these is “its”, which always gets autocorrected to “it’s”. Now, you do get a chance to say “no”: if you type “its”, a tooltip-type bubble pops up with the proposed autocorrection “it’s” and you can press the “[x]” to make it go away, to decline the edit. But it’s not easy (the “[x]” is about 4 pixels wide) and in order to do this you have to notice it in the first place.

[Image: iPhone autocorrection from "its" to "it's". Caption: It's got its apostrophes confused]

I’m used to it now, and I never type the words “its” or “it’s” without checking, but it’s still irksome. You could argue that by always using “it’s” iOS is actually just following modern usage, BUT YOU’D BE WRONG AND SO WOULD ALL THE PEOPLE WHO DON’T KNOW THE DIFFERENCE BETWEEN “ITS” AND “IT’S”.[2] Ahem.

So much for the actual wrongness. There are two other categories of annoying autocorrection (or absence of autocorrection):

  1. Oh, come on, you know what I meant
  2. Huh?

Category one includes things like failing to realise that “sine” is a mostly-off-by-one mistyping of “some”, and failing to realise that “fir” is a mistyping of “for”. These are real words, so I can’t really blame iOS for this, but… it just feels as though it somehow ought to be able to predict that “some” is more likely than “sine” in most people’s SMSs, and “for” is more likely than “fir”. (No, I haven’t checked, but I think a corpus of SMSs would back me up on this one.) If I were a mathematician or a lumberjack then I’d probably be really glad that the autocorrection wasn’t trampling all over my perfectly valid use of “sine” or “fir” … but I’m not. (Unfailingly changing “reading” to “Reading” is even more understandable, but still usually wrong.)
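To make that hunch a bit more concrete, here is a minimal sketch of the sort of thing I have in mind. It is emphatically not how iOS works (I have no idea how it works). The word counts are invented for illustration, the keyboard model only knows about keys that sit next to each other on the same row, and the names `neighbours`, `correct` and `ratio` are all my own assumptions.

```python
from itertools import product

# Assumption: a very crude QWERTY model where only same-row neighbours count.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Assumption: made-up counts standing in for a real corpus of SMSs.
COUNTS = {"some": 900, "sine": 3, "for": 5000, "fir": 2, "fur": 4}


def neighbours(ch):
    """Return the key itself plus the keys immediately left and right of it."""
    for row in ROWS:
        i = row.find(ch)
        if i != -1:
            return set(row[max(i - 1, 0):i + 2])
    return {ch}  # anything not on the letter rows is left alone


def correct(word, counts=COUNTS, ratio=50):
    """Pick the most frequent known word reachable by off-by-one keypresses.

    A word that is already known is only overridden when the alternative
    is at least `ratio` times more common.
    """
    candidates = {"".join(p) for p in product(*(neighbours(c) for c in word))}
    known = [w for w in candidates if w in counts]
    if not known:
        return word
    best = max(known, key=counts.get)
    if word in counts and counts[best] < ratio * counts[word]:
        return word
    return best


print(correct("sine"))  # "some" is two off-by-one slips away and far more common
print(correct("fir"))
print(correct("for"))
```

The `ratio` threshold is the bit that would keep the mathematicians and lumberjacks happy: lower their personal counts for “some” and “for” relative to “sine” and “fir”, and the correction stops firing.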

Category two is the autocorrections which are just baffling. Examples:

  • When I mistype “to” as “ti”, it changes it to “ti'”. That’s “ti” with an apostrophe after it. Is that even a word? Google suggests “Ti’ punch”. Is that really more commonly used than the word “to”? Is there some other meaning I’m missing?
  • Various mistypings of “want” seem to get changed to “Wang” (with a capital W). OK, it’s a common Chinese surname, but it’s quite unlikely to be the word I actually Wang…
  • Various mistypings of “gone” get changed to “Gond”. Google suggests that this is something to do with Dungeons & Dragons. I don’t even want to know.

In all these cases, though, I wouldn’t want to replace a guess that’s frequently wrong for me with a guess that’s frequently wrong for someone else: what I really want is an opportunity to configure this behaviour, to decide which autocorrections help me and which are far more likely to hinder. I want to be able to add rules like:

  • never autocorrect anything to “ti'” (because it’s not a word in my vocabulary!)
  • never add an apostrophe to “its” (because I know better than iOS here)
  • correct “fir” to “for” (because while I might want to say “fir”, that’ll be a far less frequent annoyance)
  • never autocapitalise things (because proper nouns are hard to guess, and I’d rather do them by hand than get annoyed by wrong guesses)
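As a sketch of what I mean, the rules above could be a small layer of personal overrides sitting between whatever the system suggests and what actually ends up in the message. Everything here is hypothetical: iOS exposes no such hook as far as I know, and `Rules`, `apply_rules` and the field names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rules:
    """Hypothetical per-user overrides for an autocorrect engine."""
    never_suggest: set = field(default_factory=set)   # outputs I never want
    never_change: set = field(default_factory=set)    # inputs to leave alone
    always: dict = field(default_factory=dict)        # my own corrections
    autocapitalise: bool = True


def apply_rules(typed, suggestion, rules):
    """Decide what to keep, given what I typed and what the system suggests."""
    if typed in rules.always:
        return rules.always[typed]
    if typed in rules.never_change:
        return typed
    if suggestion in rules.never_suggest:
        return typed
    # A suggestion that differs only in case is an autocapitalisation.
    if (not rules.autocapitalise
            and suggestion != typed
            and suggestion.lower() == typed.lower()):
        return typed
    return suggestion


my_rules = Rules(
    never_suggest={"ti'"},
    never_change={"its"},
    always={"fir": "for"},
    autocapitalise=False,
)

print(apply_rules("its", "it's", my_rules))
print(apply_rules("wouldnt", "wouldn't", my_rules))
```

My own corrections are checked first, so that a personal rule always beats the system's guess, and anything I have no rule for passes through untouched.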

I also want to be able to edit things out of its autocorrection dictionary altogether (if I have a sudden burning need to txt people about Wang of Gond, I’m sure I’ll be able to type it by hand) and add things in. The latter does seem to happen eventually (if I type “OYCS” it does now suggest “OUCS” — i.e. Oxford University Computing Services, where I work) but I want to be able to accelerate the process, because I do still actually know more about my life than it does. Just about.

Also, less practically, I want to understand how the software makes its predictions and corrections. In a former job, a long time ago, I spent quite a lot of time editing and processing electronic texts which had been keyed from photocopies of 16th-century texts, and once in an idle moment I started trying to write a script to pick out common keying errors automatically, particularly the substitution of “f” for a medial “s”. My efforts were painfully naive, but it was clear that there were some combinations of letters that never happened in English, and others which sometimes happened but were extremely unlikely, and so on. If anybody knows of any readable explanations of the algorithms behind autocorrection and predictive text (assuming they’re not all trade secrets?), please do point me to them!
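For the curious, here is roughly the shape of that naive script, reconstructed as a sketch rather than from the original (which is long gone). It assumes a trusted list of reference words, treats any pair of adjacent letters never seen in that list as suspicious, and then tries swapping medial “f” for “s” to see whether the suspicion goes away. The tiny `REFERENCE` list and the function names are made up for illustration; a real version would need a proper word list and probabilities rather than a yes/no test.

```python
from itertools import product

# Assumption: a tiny stand-in for a real list of trusted words.
REFERENCE = ["house", "must", "after", "soft", "first", "these"]


def bigrams(word):
    """Return the set of adjacent letter pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def build_model(reference_words):
    """Collect every letter pair that occurs in the reference words."""
    seen = set()
    for w in reference_words:
        seen |= bigrams(w)
    return seen


def suggest(word, seen):
    """Suggest a fix if swapping medial 'f' for 's' removes all unseen pairs."""
    if bigrams(word) <= seen:
        return None  # nothing unlikely about this word
    # Medial positions only: not the first or the last letter.
    medial = [i for i in range(1, len(word) - 1) if word[i] == "f"]
    for choice in product("fs", repeat=len(medial)):
        letters = list(word)
        for i, ch in zip(medial, choice):
            letters[i] = ch
        candidate = "".join(letters)
        if candidate != word and bigrams(candidate) <= seen:
            return candidate
    return None  # suspicious, but no f-to-s swap explains it


SEEN = build_model(REFERENCE)
for w in ["houfe", "muft", "firft", "after"]:
    print(w, "->", suggest(w, SEEN))
```

Words like “after”, where the “f” is genuine, are left alone because every letter pair in them already appears in the reference list.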

[1] very old in-joke, sorry
[2] This one’s STUPIDLY SIMPLE, people!!!