Wednesday, December 4, 2024

Who's Training Who? (The Perils Of Voice Recognition)

Is the machine serving us, or vice-versa?

A reader writes, suggesting I use voice recognition to dictate blog entries.  It is a good idea and I have used it in the past.  You can tell the voice recognition entries, as they tend to be verbose and long and sometimes have awkward phrasing.  This one is typed, and probably suffers from the same problems.

I tried one of the first voice recognition programs, Dragon NaturallySpeaking, back in the 1990s.  It was kind of a hot mess.  As I recall, you had to "train" it with sample sounds, and even then, well, the results varied from frustrating to amusing.  I gave up rather quickly.

Still, it was an amazing thing.  When I was a kid, my Math teacher had this "crazy" idea that 7th graders could learn computer programming - normally a special elective reserved only for the smartest High School Seniors.  Come to think of it, in 2nd grade, they taught us set theory and Boolean Algebra as part of the "New Math" curriculum.  I could have been Bill Gates!

Problem was, back then, software was considered "Liberal Arts" and a trivial pursuit.  Colleges didn't recognize it as "science" and neither did the Patent Office.  Maybe that is why so many tech Billionaires are college dropouts.  Not everything there is to learn is taught in school.  In fact, nothing new is learned by memorizing the old.  But I digress.

Back then, of course, it was the 1970s and we sat around all day long getting high and drinking beers (at age 15, act shocked) and when passing the bong would say things like, "Hey man, wouldn't it be cool if someday you could just talk to a computer and it would talk back to you?"  And a friend would reply, "Far out, man!  Maybe someday they'll make a computer small enough to fit in a suitcase!"  "No way, man!"  I remember having a vision, at age 16, while high, about artificial intelligence - something about language models or something, I forget.  It evaporated as quickly as it appeared.

We had no idea how prescient our marijuana-fueled fever-dreams would be, and how soon they would come true, and how timid our expectations were.  Today, a cell phone fits in the palm of your hand and does all these things and more - and stores a library of information as well.  I have over 10,000 songs stored on my cell phone, along with a library of a thousand books.  For a guy raised on 20MB hard drives, who had to solder in DIP chips of memory, one Kbyte at a time, well, it seems unreal.

And yeah, voice recognition has come a long way.  But it also hasn't.  One reason I am loath to use it - in addition to the verbosity problem - is the manual corrections that need to be made.  And these corrections are as painful (in my case, literally) as typing new text.  Speaking of which, I have about 20 minutes until the Tylenol wears off, so I'd better damn well get to the point.

Google Voice has problems with homophones.  They're, Their, and There are indistinguishable to voice recognition software, as are You're and Your.  Not long ago, posting something online and using the wrong your or you're would generate a litany of pedantic complaints along the lines of, "It's spelled you're, dumbass, buy a dictionary!"  But today, I see less and less of that, as everyone just assumes you are using voice recognition.  In automotive discussion groups, confusion of "breaks" for "brakes" isn't even commented on anymore.  Everyone knows what you meant.

Which raises an ugly point: Are we training the machines or are they training us?  Because I notice already that I tend to avoid contractions when using voice recognition so as to avoid the You're/Your or the They're/Their dichotomies.  So my voice recognition "writing" tends to be more formal, with "You are" or "They are" instead of their contracted counterparts (which save one ASCII character each!).

(I think also that people are accepting these alternative spellings and they may supplant the real deal in a decade or two.  The online brochure for your new SUV will describe the four-wheel anti-lock disc breaks and no one will bat an eye.  The Oxford English Dictionary will list "brakes" as an archaic spelling.)

But beyond that, I find myself talking differently as well.  Apparently, according to voice recognition, I have several speech impediments, so I have to pronounce words more carefully.  Funny thing, though, when it comes to punctuation, Google Voice recognizes period, comma, exclamation point, and question mark as actual punctuation, but for the life of it cannot understand "quote" or "quotation mark".  Moreover, sometimes it reverts to spelling out those words, instead, for no apparent reason.

I wonder if perhaps voice recognition is training us to talk in a certain, stilted, accent-free manner, to the point where, a decade from now, the sound of our very voices will be unrecognizable to the people of today.  We adapt to the machine, not vice-versa.  And if you need evidence of this, look no further than your smart phone, which you are likely hunched over as we speak (Are we speaking? Why is that a phrase?).  And consider how "Social Media" has changed how people actually think - and how it has swayed elections and overturned governments.  We are slaves to the machine.

Of course, it might not end up that way (or has it already?). A few lines of code could be inserted to use text context to determine whether you meant to say, "You're breaks are bad" or not.  Maybe "AI" will fix this, maybe not.   The results of Google searches using AI are laughable and inexcusably wrong - for no apparent reason, it seems, other than to mess with our minds.  Maybe that is the point.
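For the curious, here is roughly what I mean by "a few lines of code."  This is only a toy sketch in Python, not how Google Voice or any real product works.  The function name fix_homophones and both word lists are made up for illustration: it guesses "you're" when the next word looks like something that follows "you are", and it guesses "brakes" when the sentence mentions something automotive.

```python
import re

# Hand-picked words that often follow "you are" (an assumption, not a real lexicon).
YOU_ARE_FOLLOWERS = {"a", "an", "the", "not", "so", "very", "welcome", "right", "wrong", "sure"}

# Hand-picked words hinting that the topic is cars (also an assumption).
AUTOMOTIVE_CUES = {"car", "truck", "suv", "pedal", "disc", "rotor", "rotors", "pads", "wheel", "anti-lock"}

# Split text into word-like tokens and everything else (spaces, punctuation).
TOKEN = re.compile(r"[A-Za-z'-]+|[^A-Za-z'-]+")


def match_case(replacement: str, original: str) -> str:
    """Capitalize the replacement if the original word was capitalized."""
    return replacement.capitalize() if original[:1].isupper() else replacement


def fix_homophones(sentence: str) -> str:
    """Choose your/you're and breaks/brakes from the surrounding words."""
    tokens = TOKEN.findall(sentence)
    # Positions of word tokens; the rest is spacing or punctuation.
    word_positions = [i for i, t in enumerate(tokens) if t[0].isalpha()]
    lowered = {i: tokens[i].lower() for i in word_positions}
    topic_is_cars = any(w in AUTOMOTIVE_CUES for w in lowered.values())

    for n, i in enumerate(word_positions):
        word = lowered[i]
        if word in ("your", "you're"):
            # Look one word ahead to decide between possessive and contraction.
            following = lowered[word_positions[n + 1]] if n + 1 < len(word_positions) else ""
            is_you_are = following in YOU_ARE_FOLLOWERS or following.endswith("ing")
            tokens[i] = match_case("you're" if is_you_are else "your", tokens[i])
        elif word in ("breaks", "brakes"):
            tokens[i] = match_case("brakes" if topic_is_cars else "breaks", tokens[i])
    return "".join(tokens)


print(fix_homophones("You're breaks are bad on this truck"))
print(fix_homophones("Your going to need coffee breaks"))
```

It is easy to fool: "your ring" comes out as "you're ring" because of the "-ing" rule, and without a word like "truck" in the sentence it has no way to know which "breaks" you meant.  A real system would need far more context than one neighboring word.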

The Tylenol is wearing off and my error rate is skyrocketing, so I guess I am done for today.  Not verbose, though, eh?  Take that, Google Voice!