Walt Mossberg wrote an article about why Siri seems to be so dumb, comparing Apple’s “AI assistant” with competitors such as “Google Assistant” and “Amazon Alexa”.
Competition currently seems to focus more and more on artificial intelligence. Most of the technologies and algorithms that drive this current incarnation of an old movement are not particularly new. The difference is that companies are now trying to build end-user products from those technologies. Apple has historically shown great talent for turning former niche technologies into mass-market products. It can be difficult to recognize the key characteristics that separate a “cool demo prototype” from a solid product for everyone. Apple tends to reduce things, make them simpler, and strive to get right those few things that make up the core of a real product.
In comparison, Google is better at creating “cool demo prototypes”. Its developers and managers don’t have the same drive to make things simple enough for everyone. The one place where they did succeed in this way is the Google search engine.
The Long Tail
Mossberg was unhappy with the answers, or often the lack of answers, to some questions he asked Siri.
Siri has been unable to tell me the names of the major party candidates for president and vice president of the United States. Or when they were debating.
— Walt Mossberg, the Verge
Apple told him about what it calls “long tail questions”, and this question seems to be one of them. Mossberg wondered, though, how it differs from a question like “When was Abraham Lincoln born”, which Siri can answer. This clearly shows the difficulty that “artificial intelligence” products face today: the typical customer tends to see them as something like an intelligent being, so there is little understanding of why and how something works or doesn’t.
So what is a “long tail question”? The point is not really the structure or the content of the question itself; in the case of Siri it is its popularity. A big pile of questions makes up the bulk of queries to Siri, and those get optimized first and most. Then there are the many, many questions that are asked only a few times each: this is the long tail of the frequency distribution. The question about Abraham Lincoln was perhaps just one that was used often enough for demo purposes. Mossberg also mentioned that it is featured prominently on the Apple website, so it may work simply because it had to work for such demos.
Questions like this depend on big fact databases, something like a machine-processable version of Wikipedia. Another difference is that the Lincoln question is about older historical facts. It is likely that a knowledge database already exists that Apple can draw such information from. The questions about the 2016 US presidential candidates, by contrast, are about short-lived “news facts”. These are facts that are quite fresh and may be of a lot less interest soon after the event.
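The popularity argument can be made concrete with a small sketch. It assumes nothing about how Apple actually ranks queries: the query log, the counts and the 80% threshold are all invented for illustration, and `split_head_and_tail` is a hypothetical helper. It only shows how a few frequent questions end up in the “head” while many rare ones form the long tail.

```python
from collections import Counter


def split_head_and_tail(queries, head_share=0.8):
    """Split distinct queries into a popular head and a rare long tail.

    Queries are taken in order of falling frequency and put into the head
    until the head covers at least `head_share` of all traffic. Everything
    that is left over is the long tail.
    """
    counts = Counter(queries)
    total = sum(counts.values())
    head, tail = [], []
    covered = 0
    for query, count in counts.most_common():
        if covered / total < head_share:
            head.append(query)
            covered += count
        else:
            tail.append(query)
    return head, tail


# Invented query log, for illustration only (100 queries in total).
log = (
    ["what's the weather"] * 50
    + ["set a timer"] * 25
    + ["when was abraham lincoln born"] * 15
    + ["who is running for vice president"] * 4
    + ["when is the next debate"] * 3
    + ["how tall is mount elbrus"] * 3
)

head, tail = split_head_and_tail(log)
print("head:", head)
print("tail:", tail)
```

In this made-up log the Lincoln question lands in the head because it is asked often, while the candidate and debate questions land in the tail. Nothing about their wording or structure decides that, only how often they occur.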
What IS Siri actually?
But why do such things work better in “Google Assistant”/“Google Now”? To answer this question, one has to look at what “Siri” actually was and is. The task of Siri is to recognize a spoken sentence via voice recognition, transcribe it successfully, and then find a semantic pattern rule that captures what it was about. If one says:
“Write Sue-Allen a message using WhatsApp”
Siri recognizes a rule of the form:
MESSAGE (receiver=Sue-Allen) USING (WhatsApp) WITH CONTENT (?)
It can already fill in the receiver by searching its contacts database, and it can choose a Siri-enabled app. The sentence lacks the actual message content, which triggers Siri to ask for it.
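As a rough sketch of this idea, and certainly not of Siri’s real implementation, the following code maps two different phrasings onto the same MESSAGE rule, fills the slots it can, and reports which slot is still missing. The regular expressions, the `MessageIntent` class, the `parse` function and the contents of `CONTACTS` and `MESSAGING_APPS` are all assumptions made up for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the contacts database and the list of
# apps that can send messages.
CONTACTS = {"sue-allen": "Sue-Allen", "peter": "Peter"}
MESSAGING_APPS = {"whatsapp": "WhatsApp", "messages": "Messages"}


@dataclass
class MessageIntent:
    """MESSAGE (receiver=?) USING (?) WITH CONTENT (?)"""
    receiver: Optional[str] = None
    app: Optional[str] = None
    content: Optional[str] = None

    def missing_slots(self):
        return [name for name in ("receiver", "app", "content")
                if getattr(self, name) is None]


# Two different sentence forms that resolve to the same rule.
PATTERNS = [
    re.compile(r"^write (?P<receiver>[\w-]+) a message using (?P<app>\w+)$", re.I),
    re.compile(r"^tell (?P<receiver>[\w-]+) via (?P<app>\w+) that (?P<content>.+)$", re.I),
]


def parse(utterance):
    """Return a MessageIntent if a pattern matches, otherwise None."""
    text = utterance.strip().rstrip(".!?")
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            slots = match.groupdict()
            return MessageIntent(
                receiver=CONTACTS.get(slots["receiver"].lower()),
                app=MESSAGING_APPS.get(slots["app"].lower()),
                content=slots.get("content"),  # absent in the first pattern
            )
    return None


intent = parse("Write Sue-Allen a message using WhatsApp")
print(intent)
print("ask the user for:", intent.missing_slots())
```

A sentence that matches no pattern, such as a factual question, simply returns `None` here, which mirrors the point that pattern matching alone does not answer questions about the world.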
The problem Siri (the algorithm) actually solves is that quite different forms of natural sentences can resolve to such pattern queries. What Siri does not solve is looking up and knowing facts about the world, or doing real “intelligent reasoning”. When Apple says it continuously works on Siri to make it understand more, this means that it implements more kinds of semantic pattern rules and imports more sources of factual knowledge that these rules can use. This is an ongoing process, not much different from what needs to be done for map data and factual knowledge about places.
The Siri SDK, a new feature, is nothing more than public documentation of a small set of semantic conversation patterns that apps can register for. This way a new messenger app can easily integrate itself with Siri and thereby get a natural language user interface.
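To illustrate what registering for a conversation pattern could look like in principle, here is a minimal sketch. It is not the real Siri SDK API: `IntentRegistry`, its methods, the intent name “MESSAGE” and the app “NewMessenger” are all hypothetical names chosen for this example.

```python
class IntentRegistry:
    """Keeps track of which app handles which conversation pattern."""

    def __init__(self):
        # intent name -> {app name: handler function}
        self._handlers = {}

    def register(self, intent_name, app_name, handler):
        self._handlers.setdefault(intent_name, {})[app_name] = handler

    def apps_for(self, intent_name):
        return sorted(self._handlers.get(intent_name, {}))

    def dispatch(self, intent_name, app_name, **slots):
        try:
            handler = self._handlers[intent_name][app_name]
        except KeyError:
            raise LookupError(
                f"{app_name!r} has not registered for {intent_name!r}"
            ) from None
        return handler(**slots)


def new_messenger_send(receiver, content):
    # A real app would send the message; this one only describes it.
    return f"NewMessenger -> {receiver}: {content}"


registry = IntentRegistry()
registry.register("MESSAGE", "NewMessenger", new_messenger_send)

result = registry.dispatch(
    "MESSAGE", "NewMessenger", receiver="Sue-Allen", content="See you at eight"
)
print(result)
```

The app never has to parse natural language itself. It only receives the filled-in slots of a pattern that the assistant has already recognized.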
Google has something it calls the “Google Knowledge Graph”. This is just its version of a factual database, extended partly through direct human input and partly through automatic processes that grow it continuously. Google’s most comprehensive product (some say its only real one) is its search engine. This engine needs to understand, at least in part, what is written in the web pages it parses; it needs this to make Google search queries work naturally. This is where Google seems to have a bigger factual database and more processes for mining day-to-day news facts. If Apple wants to compete in this field, it needs to find or create similar sources of common facts.
Does Siri need an “Apple Knowledge Graph”?
So the big question is: does Apple really need to compete with Google in building up factual databases? This leads me back to the introduction, where I claimed that Apple has a talent for creating astonishing mass products, particularly by reducing what they do to the most useful minimum. Is it important for a digital assistant to answer every possible question like “When was Abraham Lincoln born?”, or is it more important to answer personal questions like “Do I have any appointments next weekend?” and to trigger actions like “Remind me to call Peter when I come home”? These kinds of natural language queries don’t profit much from globally data-mined factual knowledge; they need access to your most personal data.
This is what Google stressed as being new about its “Google Assistant” on the Pixel phones. It made clear that it wants to collect and pile up any private and personal data about you so that its data centers can use that data to act in ways that are useful to you. Here the competition between Apple and Google boils down to Apple wanting to provide the same thing while collecting as little data as possible and offering tools that can compute on those facts locally, all for the simple purpose of protecting your private and personal data. Google’s approach has always been to first collect all the data it can get, whether public, private or personal. Even legal issues often seem unimportant and are postponed, along the lines of “if nobody screams, it might have been OK to just take it.”
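A factual database of this kind can be pictured, in a very reduced form, as a store of (subject, relation, value) entries. The sketch below is only an illustration of that idea and says nothing about how Google’s or Apple’s systems are built; `FactStore` and the relation names `born_on` and `held_on` are made up for this example.

```python
class FactStore:
    """A toy fact database keyed by (subject, relation)."""

    def __init__(self):
        self._facts = {}

    def add(self, subject, relation, value):
        self._facts[(subject.lower(), relation)] = value

    def ask(self, subject, relation):
        # Returns None when the fact is simply not in the database.
        return self._facts.get((subject.lower(), relation))


store = FactStore()

# An old historical fact that a curated source can supply once.
store.add("Abraham Lincoln", "born_on", "1809-02-12")
lincoln = store.ask("abraham lincoln", "born_on")

# A fresh news fact is unknown until some process imports it.
before = store.ask("first 2016 presidential debate", "held_on")
store.add("First 2016 presidential debate", "held_on", "2016-09-26")
after = store.ask("first 2016 presidential debate", "held_on")

print(lincoln, before, after)
```

The lookup itself is trivial. The hard and ongoing work is keeping the store filled, which is exactly where continuous mining of news facts makes a difference.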
I don’t think that it really is important for Apple to concentrate on answering questions from common public knowledge. I think there are more important things Apple can do to make Siri better.
Apple needs to move away from Siri’s strictly voice-based input. Of course, spoken words are useful when it is difficult to type something: it’s a “hands-free user interface”. But natural language can be very useful in written form as well. The popular calendar app “Calendars 5” uses written natural language input to create appointments. This is often quicker than filling out forms and dialogs. Often, the easiest user interface is just a “natural language text field”. You can actually use Siri this way today by first saying “Hey” and then tapping and editing the query, but this is nothing more than a workaround.
Apple needs to address some user interface issues with Siri, particularly how it copes with mixed-language queries and proper names, and how it lets users correct errors and misunderstandings.
Apple’s operating systems already have another textual query engine: it is called “Spotlight” and is available nearly everywhere. This also means, though, that Apple has two very different query interfaces: the natural language “Siri” using spoken words and the much simpler Spotlight. Apple really needs to combine those two query interfaces into a single, all-encompassing one.
A natural language interface that handles written and spoken words, in mixed languages, on every Apple platform, with perfect integration of privacy-protected personal data sources: that’s what Siri needs.