
One of the big eye-openers here is not the use of OpenAI, but how dumb and limited Siri seems by comparison (especially its inability to grasp context). Apple, Amazon and (to a lesser extent) Google's voice assistants never evolved past simple intent and keyword matching, and this hack puts their product teams to shame…


ChatGPT is impressive but it makes a lot of mistakes. Siri can't afford that rate of errors for PR and legal reasons, so they need to use a technology that's less flexible but more reliable / safer. This is similar to self-driving cars: it's relatively easy to come up with a proof-of-concept but making it into a safe mainstream product is a different story.


Siri makes a lot of mistakes.

It keeps mishearing "a lamp" as "alarm", and goes into alarm setting mode even if the rest of the query makes no sense.

When told to turn off specific lights, it sometimes ignores the room qualifier and turns off all the lights in the whole house.

Many other queries end up "I can't find <what you said> in your Music library".

Siri is three regexes in a trench coat.
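Half-joking, but first-match intent dispatch really does produce exactly these failure modes. A made-up Python sketch (obviously not Apple's actual code):

```python
import re

# Hypothetical first-match intent table; not Siri's real implementation.
INTENTS = [
    (re.compile(r"\balarm\b"), "set_alarm"),       # checked first, so it wins
    (re.compile(r"\blamp\b|\blight"), "lights"),
    (re.compile(r"\bplay\b"), "music"),
]

def dispatch(utterance: str) -> str:
    for pattern, intent in INTENTS:
        if pattern.search(utterance.lower()):
            return intent
    # Everything unmatched falls through to music search:
    # "I can't find <what you said> in your Music library"
    return "fallback_music_search"

# A misheard "alarm" hijacks the request, even though the rest is nonsense:
print(dispatch("make the alarm brighter"))  # set_alarm
```

No step checks whether the matched intent is even compatible with the rest of the sentence, which is why "increase the brightness of the alarm" sails straight into alarm-setting mode.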


> It keeps mishearing "a lamp" as "alarm",

Doesn't this Shortcut use the same voice recognition? Doesn't seem like a problem GPT solves.


You've missed the "even if the rest of the query makes no sense." part.


ChatGPT will have the same problem, since it works on the text iOS gave it. It's not clear to me it would perform any better at reformulating your query.


1. I expect it to know it's not possible to increase brightness of an alarm and reject incongruent requests.

2. This particular implementation is limited to operating on an English text in its finalized form, but a different first-class LLM implementation of an AI assistant could work directly on some form of phonetic input. ChatGPT is pretty good at dealing with such ambiguities, e.g. it already understands questions written in IPA notation. It also understands Japanese written in romaji, which is a non-native phonetic spelling of a language with tons of homonyms.


> Siri can't afford that rate of errors for PR and legal reasons

Have we been using the same Siri? The only thing I trust it with is starting a timer. Everything else is literally a coin flip if it’ll actually understand me or mangle my request into something ludicrous.


But one timer and one timer only. Siri cannot start a second timer on your phone in 2023. It's ridiculous.


That’s not Siri, but the Clock app. Siri’s “start a timer for …” just calls out to the Clock app.


I don’t fully trust it even for setting a timer. When I say “set a timer for 50 minutes” (laundry), it ends up setting a 15-minute timer more than half the time.


My experience as well - I now just disable it on new phones. I think the only thing worse - or a close second - is the Apple TV interface.


IME Siri is much more limited than Alexa and Google Assistant. Have there been any lawsuits regarding those assistants? Or is Apple just being more conservative for other reasons?


Yeah IIRC they had an emphasis on disaster response and getting fixes pushed out across all 20+ languages.


You're conflating technology with functionality. Surely Siri's core tech can be improved.


Did I say that Siri's core tech cannot be improved?


Yeah, I use a Siri shortcut that lets me ask questions of Google Assistant. "Hey Siri, OK Google". The main downside is that this requires me to unlock my phone before it will proceed beyond this point. I usually am asking via an AirPod, when my phone is in my pocket.

But Apple is getting better, with Siri pulling up a relevant website and reading me the first bit, and offering to go further. I used this yesterday when I was driving and talking to my kid about biology; we looked up "cytoplasm" and various other technical terms, and it gave accurate definitions with sources noted. The only thing it failed to do was to tell us how many kingdoms of life there are (but from looking at Wikipedia it appears this question is not quite cut-and-dried).


I really want to agree, but those are different tasks. Apple and Amazon focus on the monetizable parts of NLP flows, which means optimizing for narrow use cases (play music, order me a book, etc.), while OpenAI can ship an impressive chatbot just to see how it plays out and generate extra PR.

Pretty sure Apple and Amazon are capable of improving their voice assistants; the question is whether they decide to invest in it. OpenAI is relatively young and "quicker" than large orgs, as it's a startup with top-notch talent.


Oh come on. I haven't tested it lately, but for something like five years, if you asked Google to "play Mozart next" it would say it can't find "Mozart next".

Even with the most common use cases it doesn't even try. One coder, one evening, simple pattern matching plus a few rules, and it could be improved so much. I simply don't understand how they can keep it so bad for so long. There were more examples like this; I don't really have a list since I gave up on it.


> One coder, one evening, simple pattern matching + a few rules and it could be improved so much. I simply don't understand how they can keep it so bad for so long.

Exactly what I've been thinking in the last 10 years. It was ludicrously bad, and not improving even on obvious things. They just sat on it.


I don't know what you're talking about. "play Mozart next" opens Spotify and selects Mozart for me. if anything, it's too eager to play music. Sometimes it thinks turning off the lights is a request to play a song.


Now try “play two Mozart songs, a Bruce Springsteen song, and then four Flaming Lips songs”, or ask it to play Dark Side of the Moon in alphabetical order.


If these were actual requests humans would make, I’m sure it would not be a difficult task to implement such functionality.

Your example requests are at best extreme outliers, and not good tests of smart home assistants.


I think the point is that you shouldn’t have to explicitly implement any of this stuff. It should “understand” basic commands that include sequences and counts.

It seems like ChatGPT could be a giant leap ahead of the current crop of home assistants.


I would be happy to be able to schedule lighting and media properly. Just the basics actually working would be great.


I think that's rather a case of hindsight, as we now realise and agree that Alexa, Siri, etc. won't evolve past this crude monetization scheme.

They began as very much what you describe, though: to generate PR and make their underlying ecosystems seem more attractive and advanced.


The question is what are the respective companies trying to get it to do.

Apple appears to be trying to make Siri a well-defined interface to specific apps that offer specific services.

Looking at https://developer.apple.com/documentation/sirikit you can see the specific intents and workflows that Siri can hook into. This makes it rather limited, but the things it can do, it does. To that end, Apple isn't trying to monetize Siri; it's trying to make it a hands-free interface to specific tasks that apps can do.
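Conceptually (a Python sketch of the idea, not the real SiriKit Swift API), each intent is a fixed verb plus typed slots, and that schema is exactly what bounds what Siri can do:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the intent model: a fixed verb plus typed slots.
# Requests outside a registered schema simply cannot be expressed.
@dataclass
class SetTimerIntent:
    duration_minutes: int

@dataclass
class PlayMediaIntent:
    title: str
    artist: Optional[str] = None

def handle(intent):
    """Dispatch to whatever app registered for this intent type."""
    if isinstance(intent, SetTimerIntent):
        return f"timer:{intent.duration_minutes}m"
    if isinstance(intent, PlayMediaIntent):
        return f"play:{intent.title}"
    raise ValueError("no handler registered for this intent")

print(handle(SetTimerIntent(duration_minutes=50)))  # timer:50m
```

"Play two Mozart songs and then four Flaming Lips songs" fails not because it's hard to parse, but because no such intent schema exists for it to land in.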

Amazon was trying to make Alexa a general tool (and they've reduced those capabilities over the years) running on AWS, with the additional goal of providing more monetization routes. Things like "A book by an author you follow was released, would you like to add it to your cart?" Personally, I never found the chat more than a curiosity, and even less so now that there is no knowledge engine backing the general-knowledge questions.


Not really. Amazon has already started working on "generalizable intelligence" [0] for Alexa, inspired by GPT-3 (as they say so themselves), and has released to production at least one model (viz. the Alexa Teacher Model) based on that effort: https://archive.is/gItZq / https://www.amazon.science/blog/scaling-multilingual-virtual...

[0] https://archive.is/UlCpM / https://www.amazon.science/blog/alexas-head-scientist-on-con...


Worth mentioning that article is from last June, but in December Amazon laid off 10,000 people, most of whom were in the Alexa division.

https://www.forbes.com/sites/qai/2022/12/06/amazon-stock-new...


> in December Amazon laid off 10,000 people, most of which were in the Alexa division

Maybe the new tech worked _really well_.


The article mentions that “API will cost around $0.014 per request.”
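For scale, a quick back-of-envelope at that price (the request volume here is my own assumption, not a figure from the article):

```python
# Rough monthly cost at the quoted $0.014 per request.
COST_PER_REQUEST = 0.014  # USD, quoted in the article
REQUESTS_PER_DAY = 30     # assumption: a fairly chatty household

monthly_cost = COST_PER_REQUEST * REQUESTS_PER_DAY * 30
print(f"~${monthly_cost:.2f}/month")  # ~$12.60/month
```

Cheap for a hobbyist, but a very different proposition for a vendor running it for a billion devices for free.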


I keep trying Siri every now and then, and about the only things that work reliably for me are (a) asking through my AirPods for it to phone someone, (b) setting a timer, and (c) asking what the piece of music I'm hearing is. But the answer to (c) is only on the screen for a short time, which isn't ideal if it's while I'm driving and can't write it down, and if I say "what was the music you just identified" or similar, it has no understanding whatsoever. Every time I try it I come up against something like that which feels really obvious. This ChatGPT version sounds pretty amazing in comparison.


If you download Shazam (owned by Apple), you can see all the past music that Siri has identified.


You can also add the Music Recognition/Shazam tile to Control Centre and long-press it to see the history with no additional apps, but it's very hidden. I wonder why Apple doesn't ship Shazam or some sort of UI for this by default.


Thank you! That was really bugging me.



