Thursday, November 17, 2011

Google. It Knows.

Fascinating article in the London Review of Books:

"It knows" by Daniel Soar
http://www.lrb.co.uk/v33/n19/daniel-soar/it-knows/print

This appears to be an article that digests several books about Google and attempts to divine Google's future plans. It's a fun read. Some highlights:

"Of Schmidt’s four technology juggernauts, Google has always been the most ambitious, and the most committed to getting everything possible onto the internet, its mission being ‘to organise the world’s information and make it universally accessible and useful’. Its ubiquitous search box has changed the way information can be got at to such an extent that ten years after most people first learned of its existence you wouldn’t think of trying to find out anything without typing it into Google first. Searching on Google is automatic, a reflex, just part of what we do. But an insufficiently thought-about fact is that in order to organise the world’s information Google first has to get hold of the stuff. And in the long run ‘the world’s information’ means much more than anyone would ever have imagined it could. It means, of course, the totality of the information contained on the World Wide Web, or the contents of more than a trillion webpages..."

"But all this is just the stuff that Google makes publicly searchable, or ‘universally accessible’. It’s only a small fraction of the information it actually possesses. I know that Google knows, because I’ve looked it up, that on 30 April 2011 at 4.33 p.m. I was at Willesden Junction station, travelling west. It knows where I was, as it knows where I am now, because like many millions of others I have an Android-powered smartphone with Google’s location service turned on. If you use the full range of its products, Google knows the identity of everyone you communicate with by email, instant messaging and phone, with a master list – accessible only by you, and by Google – of the people you contact most. If you use its products, Google knows the content of your emails and voicemail messages (a feature of Google Voice is that it transcribes messages and emails them to you, storing the text on Google servers indefinitely). If you find Google products compelling – and their promise of access-anywhere, conflagration and laptop-theft-proof document creation makes them quite compelling – Google knows the content of every document you write or spreadsheet you fiddle or presentation you construct. If as many Google-enabled robotic devices get installed as Google hopes, Google may soon know the contents of your fridge, your heart rate when you’re exercising, the weather outside your front door, the pattern of electricity use in your home."

FWIW, I do have one of those devices that reports (or used to, anyway) my electricity usage back to Google. And several Android phones, too, of course! :)

"Google knows or has sought to know, and may increasingly seek to know, your credit card numbers, your purchasing history, your date of birth, your medical history, your reading habits, your taste in music, your interest or otherwise (thanks to your searching habits) in the First Intifada or the career of Audrey Hepburn or flights to Mexico or interest-free loans, or whatever you idly speculate about at 3.45 on a Wednesday afternoon. Here’s something: if you have an Android phone, Google can guess your home address, since that’s where your phone tends to be at night. I don’t mean that in theory some rogue Google employee could hack into your phone to find out where you sleep; I mean that Google, as a system, explicitly deduces where you live and openly logs it as ‘home address’ in its location service, to put beside the ‘work address’ where you spend the majority of your daytime hours."

"Some people find all this frightening. ...the fear is that all the information about us it has hoovered up is used to create scarily exact user profiles which it then offers to advertisers, as the most complete picture of billions of individuals it’s currently possible to build. The fear seems be based on the assumption that if Google is gathering all this information then it must be doing so in order to sell it: it is a profit-making company, after all. ‘We are not Google’s customers,’ Siva Vaidhyanathan writes in The Googlisation of Everything. ‘We are its product. We – our fancies, fetishes, predilections and preferences – are what Google sells to advertisers.’"

"The reason is that Google is learning. The more data it gathers, the more it knows, the better it gets at what it does. Of course, the better it gets at what it does the more money it makes, and the more money it makes the more data it gathers and the better it gets at what it does – an example of the kind of win-win feedback loop Google specialises in – but what’s surprising is that there is no obvious end to the process. Thanks to what it has learned so far, Google is no longer the merely impressive search engine it was a decade ago."

"What every one of those signals is and how they are weighted is Google’s most precious trade secret, but the most useful signal of all is the least predictable: the behaviour of the person who types their query into the search box. A click on the third result counts as a vote that it ought to come higher. A ‘long click’ – when you select one of the results and don’t come back – is a stronger vote. To test a new version of its algorithm, Google releases it to a small subset of its users and measures its effectiveness through the pattern of their clicks: more happy surfers and it’s just got cleverer. We teach it while we think it’s teaching us. Levy tells the story of a new recruit with a long managerial background who asked Google’s senior vice-president of engineering, Alan Eustace, what systems Google had in place to improve its products. ‘He expected to hear about quality assurance teams and focus groups’ – the sort of set-up he was used to. ‘Instead Eustace explained that Google’s brain was like a baby’s, an omnivorous sponge that was always getting smarter from the information it soaked up.’ Like a baby, Google uses what it hears to learn about the workings of human language. The large number of people who search for ‘pictures of dogs’ and also ‘pictures of puppies’ tells Google that ‘puppy’ and ‘dog’ mean similar things, yet it also knows that people searching for ‘hot dogs’ get cross if they’re given instructions for ‘boiling puppies’. If Google misunderstands you, and delivers the wrong results, the fact that you’ll go back and rephrase your query, explaining what you mean, will help it get it right next time. Every search for information is itself a piece of information Google can learn from."

"By 2007, Google knew enough about the structure of queries to be able to release a US-only directory inquiry service called GOOG-411. You dialled 1-800-4664-411 and spoke your question to the robot operator, which parsed it and spoke you back the top eight results, while offering to connect your call. It was free, nifty and widely used, especially because – unprecedentedly for a company that had never spent much on marketing – Google chose to promote it on billboards across California and New York State. People thought it was weird that Google was paying to advertise a product it couldn’t possibly make money from, but by then Google had become known for doing weird and pleasing things."

"What was it getting with GOOG-411? It soon became clear that what it was getting were demands for pizza spoken in every accent in the continental United States, along with questions about plumbers in Detroit and countless variations on the pronunciations of ‘Schenectady’, ‘Okefenokee’ and ‘Boca Raton’. GOOG-411, a Google researcher later wrote, was a phoneme-gathering operation, a way of improving voice recognition technology through massive data collection."

"Three years later, the service was dropped, but by then Google had launched its Android operating system and had released into the wild an improved search-by-voice service that didn’t require a phone call. You tapped the little microphone icon on your phone’s screen – it was later extended to Blackberries and iPhones – and your speech was transmitted via the mobile internet to Google servers, where it was interpreted using the advanced techniques the GOOG-411 exercise had enabled. The baby had learned to talk. Now that Android phones are being activated at a rate of more than half a million a day,[4] Google suddenly has a vast and growing repository of spoken words, in every language on earth, and a much more powerful learning machine. If your phone mistranscribes what you say, you correct it by typing it in, and Google’s algorithms – once again – are taught how to get better still. It’s a frustratingly faultless learning loop."

"They also threaten to put whole industries out of business by being free. In 2009, Google updated its Maps application for Android to include free turn-by-turn navigation: on-screen and spoken directions to whatever destination you choose. The cost to Google was negligible, and the damage to existing businesses was enormous: companies like Garmin and TomTom had been getting large margins on hundred-pound satnav hardware, and then charging for monthly subscriptions. Not any more."