I find it quite funny how this blog post has a big "Ask ChatGPT" box at the bottom. So you might think you could ask a question about the contents of the blog post, so you type the text "summarise this blog post". And it opens a new chat window with the link to the blog post followed by "summarise this blog post". Only to be told "I can't access external URLs directly, but if you can paste the relevant text or describe the content you're interested in from the page, I can help you summarize it. Feel free to share!"
That's hilarious. Does OpenAI even know this doesn't work?
It looks like this doesn't work for users without accounts? It works when I'm logged in, but not logged out. I went ahead and reported it to the team. Thanks for letting us know!
SDET here. A year ago when AI came into play SDET/QA roles started disappearing. People were like oh ya anyone can write tests. Then with the recent fiascos about outages and what not, I am seeing the SDE roles are disappearing and SDET roles are going back up?! Apparently AI is good at writing applications but you still need someone to make sure it is doing the right things.
It’s not really good at writing the software either; it’s a moderate to decent productivity booster in an uneven, difficult-to-predict assortment of tasks. Companies are just starting to exit the “we’re still trying to figure this out” grace period. Expect more of that as soon as these chatbot companies have to start charging enough to pull in more money than they spend. I foresee some purpose-built models that are pretty lean being much more useful in the long run. It’s neat that the bot which can one-shot a simple CRUD website for you can also crank out Scrubs-based erotic fan fiction novellas by the dozen, but I don’t foresee that being a sustainable business model. Having good purpose-built tools is, in my opinion, better than some unwieldy tool that can do a whole bunch of shit I don’t need it to.
Interestingly, the first real productive use of AI that I found was writing the unit tests and integration tests for my applications. It was much better at thinking about corner cases than I was.
I picked up Claude today after being away and using only ChatGPT and Gemini for a while.
I was pretty impressed with how they’ve improved user experience. If I had to guess, I’d say Anthropic has better product people who put more attention to detail in these areas.
Many people buy two separate Claude Pro subscriptions, and that makes the limit a non-issue. It works surprisingly well when you tend to hit the 5-hour limit after a few hours, and hit the weekly limit after 4-5 days. $40 vs $100 is significant for a lot of people.
I hit the Pro limit in about 30 minutes, 1 hour max. And that's when I use only a single session, and when I don't use it extensively, i.e. it waits for my responses, and I read and really understand what it wants and what it does. That's still just 1-2 hours out of every 5.
You're probably having long sessions, i.e. repeated back-and-forth in one conversation. Also check if you pollute context with unneeded info. It can be a problem with large and/or not well structured codebases.
The last time I used Pro, it was for a brand new Python REST service of about 2000 lines, all of it generated during the session. So how do I tell Claude to use less context, when there was none at the beginning, just my prompt?
So you had generated 2000 lines in 30 minutes and ran out of tokens? What was your prompt?
I’d use a fast model to create a minimal scaffold like gemini fast.
I’d create strict specs using a separate Codex or Claude subscription to keep a generous remaining coding window, and would start implementation + some high level tests feature by feature. Running out in 60 minutes is harder if you validate work. Running out in two hours is also hard for me, as I take breaks. With two subs you should be fine for a solid workday on a well-designed and reviewed system. If you use CodeRabbit or a separate review tool and feed back the reviews, that again doesn’t burn tokens so fast unless it runs fully autonomously.
Thanks for the tip, didn’t think of using 2 subscriptions at the same company.
When I reach a limit, I switch to GLM 4.7 as part of a GLM Coding Lite subscription that was offered at the end of 2025 for $28/year. I also use it for compaction and the like to save tokens.
I'm using it via Copilot, and am now considering also trying Open Code (with a Copilot license). I don't know if it's as good as Claude Code, but it's pretty good. You get 100 Sonnet requests or 33 Opus requests per month in the subscription ($20 business plan) + some less powerful models have no limits (e.g. GPT 4.1), while an extra Sonnet request is $0.04 and an extra Opus request is $0.12, so another $20 buys 250 Sonnet requests + 83 Opus requests. This works better for me since I do not code all day, every single day. Also a request is a request, so it does not matter if it's just a plain edit task or an agent request, it costs the same.
Btw. I trust Microsoft / GitHub to not train on my data more (with the Business license) than I would trust Anthropic.
I agree! I recently migrated from ChatGPT to Claude and it is just superior in every way. It doesn't blather on and then ask me for clarification at the end. It's succinct and clarifies vital information before providing a solution.
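For anyone checking the overage math above: the "250 Sonnet + 83 Opus" figure only works out if the extra $20 is split evenly between the two models, which is my assumption about what was meant. A quick sketch using the per-request prices quoted in the comment (not an official price list; `requests_for` is just a name made up for this example):

```python
# Per-request overage prices as quoted in the comment above (assumed, not official).
SONNET_PRICE_USD = 0.04
OPUS_PRICE_USD = 0.12


def requests_for(budget_usd: float, price_usd: float) -> int:
    """Return how many whole requests a budget buys at a per-request price."""
    # Convert to integer cents first so float rounding cannot skew the division.
    budget_cents = round(budget_usd * 100)
    price_cents = round(price_usd * 100)
    return budget_cents // price_cents


# Assumption: the extra $20 is split evenly, $10 per model.
sonnet_split = requests_for(10, SONNET_PRICE_USD)
opus_split = requests_for(10, OPUS_PRICE_USD)

# For comparison: spending the whole $20 on a single model.
sonnet_only = requests_for(20, SONNET_PRICE_USD)
opus_only = requests_for(20, OPUS_PRICE_USD)

print(sonnet_split, opus_split, sonnet_only, opus_only)
```

So the same $20 could also go entirely to one model, if you mostly use one of them.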
Oh interesting. I've never used voice input on either so I can't comment, but understandable why you can't switch if it's disruptive to your workflow to do so.
I held off migrating from ChatGPT to Claude Code due to being a laggard that lived in the Eclipse world. I didn't believe what I was told, that I wouldn't be writing code any more. Pushed into action by recent PR gaslighting from OpenAI, I jumped to Claude Code and they were right - I barely venture into the IDE now and certainly don't need an integration.
I agree, but in general those chat apps have relatively bad user experiences for a multibillion-dollar B2C company. I used to have a lot of surprises and frustrations while using Claude Code / Desktop, and still encounter issues, but it's the best among the major LLM services.
It's funny cause, you know, fixing all those little nitty gritty things should be practically automatic with their own offerings... have your agent put in a lot of instrumentation... have it chase down bugs or dead-end user-journeys... have it go make the changes to fix it...
I've seen these tools work for this kinda stuff sometimes... you'd think nobody would be better at it than the creators of the tools.
I had something similar happen with skills today. A popup appeared saying, "hey, did you know ChatGPT has skills?" Clicking on it opened a new chat window, and after some thinking it said, "I tried to launch the built-in skills demo flow, but it isn’t available".
Following this process summarizes the blogpost for me. Perhaps the difference is I'm signed into my account so it can access external URLs or something of that nature?
This is infuriating. However, for those in this situation, know this: it works if the document or spreadsheet is in OneDrive. I just wish Copilot told you this instead of asking you to upload the doc.
This is such a stale take. In the past 3 years I’ve worked on multiple products with AI at their core, not as some add-on. Just because the corpo-land dullards[0] can’t execute on anything more complex than shoehorning a chatbot into their offerings doesn’t mean there aren’t plenty of people and companies doing far more interesting things.
[0] In this case, and with heavy irony, including OpenAI, although it sounds like most of this particular snafu is due to a bug.
>> This is such a stale take. In the past 3 years I’ve worked on multiple products with AI at their core, not as some add-on. Just because the corpo-land dullards[0] can’t execute on anything more complex than shoehorning a chatbot into their offerings doesn’t mean there aren’t plenty of people and companies doing far more interesting things.
I feel like this is just a disagreement of what "AI integration" means. You seem to agree that the trend they're describing exists, but it sounds like you're creating new products, not "integrating" it into existing ones.
Kinda reminds me of crypto. There are certainly very interesting things happening in the crypto space. But the most visible parts of the crypto universe are the stupid parts (buying PNGs for millions, for example)
But when I was in the crypto space in 2018, there were a lot of interesting things happening in the smart contract world (like proofs of concept of issuing NFTs as a digital "deed" to a physical asset like a house).
I don't think any of those novel ideas went anywhere, but it was a fun time to be experimenting.
Yeah, like most startups. I'd argue that a majority of AI startups now will go nowhere as well. That's just how new technology goes. Lots of shiny objects, lots of hype, and maybe 1%, if that, goes on to become a foundation of society.
Jury is still out on if crypto will become a foundation for society (if anything, it would be foundational for something boring and invisible like banking). I wouldn't bet on a startup doing that, but that's the only viable thing I can foresee crypto being useful for. But it doesn't mean that other applications can't be interesting and useless!
I mean, to be fair, both things can be technically true. There can be lots of interesting things being done, even while most can be low-effort garbage.
But this is just Sturgeon's Law (ninety percent of everything is crap), not an actually insightful addition to the discussion, and I very much agree it's a stale take.
This is not only OpenAI, but other models as well. Last week I added a "summarise with AI" block on a product blog page. I had seen it somewhere and felt like it’s a cool feature to have. Wrote a small shortcode in Hugo for the block and added it with various models.
It’s hit and miss: sometimes Claude says it cannot access your site, which is not true.
I think you might have hit on the issue - just the wrong way around. I would assume they’re using LLMs for testing, with no humans or maybe just one overworked human, and that is the problem.
As bad as Google Gemini telling me it couldn't search Google Flights or Google reverse image search for me. These companies really need to dogfood their own products first. Do they not realize how embarrassing it is when their flagship intelligence refuses to interop with their own services?
In Codex, I was prompted to try Codex Spark for a limited time. So for my next session, I gave it a shot.
It is much, much faster. However, on the task I gave it, it spun around in circles cycling through files and finally gave up, saying it ran out of tokens.
Major fail.
A different team "manages" the overall blog than the team who wrote that specific article. At one point, maybe it made sense, then something in the product changed, and the team that manages the blog never tested it again.
Or, people just stopped thinking about any sort of UX. These sorts of mistakes are all over the place, on literally all web properties; some UX flows just end with you at a page where nothing works sometimes. Everything is just perpetually "a bit broken" seemingly everywhere I go, not specific to OpenAI or even the internet.
> Or, people just stopped thinking about any sort of UX. These sorts of mistakes are all over the place, on literally all web properties; some UX flows just end with you at a page where nothing works sometimes.
It's almost like people are vibe coding their web apps or something.
If only there were some kind of way to automatically test user flows end to end. Perhaps the tests could be run periodically, or even run for each code change.
They're having service issues - ChatGPT on the web is broken for a lot of people. The app is working on Android - I'd assume that the rollout hit a hitch and the chatbox in the article would normally work.
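Sarcasm aside, even a crude smoke check run on a schedule or per commit would catch this class of breakage. A minimal sketch, assuming the broken flow can be detected by a known failure phrase in the response text; the marker strings and function names here are hypothetical, and a real end-to-end test would drive a browser through the flow rather than fetch a single URL:

```python
from typing import Callable
from urllib.request import urlopen

# Hypothetical failure phrases, lowercased; a real check would use whatever
# the broken flow actually prints.
FAILURE_MARKERS = (
    "can't access external urls",
    "isn't available",
)


def fetch(url: str) -> str:
    """Download a page body as text (default fetcher, needs network access)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def flow_is_healthy(url: str, get: Callable[[str], str] = fetch) -> bool:
    """Return True when the response contains none of the known failure phrases."""
    body = get(url).lower()
    return not any(marker in body for marker in FAILURE_MARKERS)
```

The fetcher is injectable so the check itself can be tested offline, e.g. `flow_is_healthy(url, get=lambda _: "canned response")`.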
Welcome to a big company where pretty much everyone has been working full steam for years, in order to take advantage of having a job at a company during a once-in-a-lifetime moment.