
This is going to be task-dependent, as well as limited by your (the implementer's) ability and comfort with structuring the task in solid multi-shot prompts that cover a large distribution of expected inputs, which will also help the model handle less common or edge-case inputs-- the ones that would most typically require human-level reasoning. It can be useful to supplement this with tool use for RAG lookup against a more extensive store of examples, or any time the full reference material isn't practical to dump into context. That requires thoughtful chunking.
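
For the RAG-lookup piece, a minimal sketch of what I mean, assuming an OpenAI-style embeddings client; the model name and example chunks are just placeholders:

    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(text: str) -> np.ndarray:
        # Any embedding model works; this assumes the OpenAI embeddings endpoint.
        resp = client.embeddings.create(model="text-embedding-3-small", input=text)
        return np.array(resp.data[0].embedding)

    # The chunked example store, embedded once up front.
    chunks = ["worked example 1 ...", "worked example 2 ...", "edge-case example ..."]
    chunk_vecs = np.stack([embed(c) for c in chunks])

    def lookup_examples(query: str, k: int = 3) -> list[str]:
        # Return the k chunks most similar to the incoming task (cosine similarity).
        q = embed(query)
        sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
        return [chunks[i] for i in np.argsort(sims)[::-1][:k]]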

It also requires testing. Don't think of it as a magic machine that should be able to do anything; think of it like a new employee who is smart enough, and has enough background knowledge, to do the task if given proper job documentation. Test whether few-shot or many-shot prompting works better: there's growing information about use cases where one or the other confers an advantage, but so much of this is task-dependent.
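
The testing doesn't need to be fancy either. A rough A/B harness as a sketch, assuming an OpenAI-style chat client and a small labeled test set; the model name, prompts, and cases are placeholders:

    from openai import OpenAI

    client = OpenAI()

    few_shot_prompt = "You do X. Examples:\n... (2-3 worked examples) ..."
    many_shot_prompt = "You do X. Examples:\n... (30+ worked examples) ..."
    cases = [("input 1", "expected 1"), ("input 2", "expected 2")]  # labeled test set

    def call_model(system_prompt: str, task: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; use whatever model you're evaluating
            messages=[{"role": "system", "content": system_prompt},
                      {"role": "user", "content": task}],
            temperature=0,
        )
        return resp.choices[0].message.content

    def score(system_prompt: str) -> float:
        # Fraction of test cases answered exactly right.
        hits = sum(call_model(system_prompt, task).strip() == expected
                   for task, expected in cases)
        return hits / len(cases)

    print("few-shot :", score(few_shot_prompt))
    print("many-shot:", score(many_shot_prompt))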

Consider your tolerance for errors and plan some escalation method: hallucinations occur in part because models "have to" give an answer. Make sure that any critical cases, where an error would be problematic, have some way for the model to bail out with "I don't know" for human review. The first layer of escalation doesn't even have to be a human; it could be a separate model, e.g. Opus instead of Sonnet, or the same model with a different setup prompt explicitly designed for handling certain cases without cluttering up the first one's context. Splitting things this way, if there's a logical break point, is also a great way to save on token cost: if you can send a 10k-token system prompt instead of a 50k one, and just choose which of five 10k prompts to use for different cases, you save 80% of upstream token $$.
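
A rough sketch of that routing/escalation split; the case labels, prompts, classify() heuristic, and model names are all stand-ins:

    import anthropic

    client = anthropic.Anthropic()

    # One small system prompt per case instead of a single 50k-token catch-all.
    PROMPTS = {
        "invoice": "You extract invoice fields... (plus ~10k tokens of examples)",
        "refund":  "You triage refund requests... (plus ~10k tokens of examples)",
    }
    BAIL = "\nIf you are not confident in your answer, reply exactly: I don't know."

    def classify(task: str) -> str:
        # Cheap routing heuristic; this could itself be a tiny model call.
        return "refund" if "refund" in task.lower() else "invoice"

    def run(model: str, task: str) -> str:
        resp = client.messages.create(
            model=model, max_tokens=1024,
            system=PROMPTS[classify(task)] + BAIL,
            messages=[{"role": "user", "content": task}],
        )
        return resp.content[0].text

    def handle(task: str) -> str:
        answer = run("claude-sonnet-4-20250514", task)    # cheaper model first
        if "i don't know" in answer.lower():
            answer = run("claude-opus-4-20250514", task)  # first escalation layer
        if "i don't know" in answer.lower():
            answer = "ESCALATE: needs human review"       # final escalation
        return answer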

Consider running the model deterministically: temperature 0, same seed. It makes any errors you encounter easier to trace and debug.
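
For example, with an OpenAI-style client (the seed parameter is best-effort determinism on that API, but combined with temperature 0 it makes runs far more repeatable):

    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": "Classify this ticket: ..."}],
        temperature=0,        # no sampling randomness
        seed=42,              # best-effort reproducibility across runs
    )
    print(resp.choices[0].message.content)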

Something to consider with respect to cost, though: many tasks that a SoTA model can do with very little or no scaffolding can be done with these cheaper models, and may not take much more scaffolding. If a SoTA model is giving reliable responses with zero-shot prompting, there's a decent chance you can save a ton of money with a flash model if you provide it one- or few-shot prompts. Open-weight models even more so.

My anecdotal experience is that open models like Google's Gemma and OpenAI's gpt-oss behave more like their paid counterparts than other open models do, making them reasonable candidates to try if you're getting good results from the paid models but they're perhaps overkill for the task.


Why does an agent tasked with email summarizing have access to anything else? There's plenty of difference between an agent and a background service or daemon, but at minimum it has to be given the same restrictions in scope they would be, or that an intern using your system for the same purpose would be. Developers need to bring the same zero-trust (ZTA) mindset to agent permissions that they would to building the other services and infrastructure they rely on.
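
A sketch of the kind of scoping I mean, with a per-agent tool allowlist enforced in the harness rather than trusted to the prompt (tool names and agents are made up):

    def read_inbox(limit: int = 20) -> list[str]:
        return ["example message"]   # stand-in for the real mail integration

    TOOLS = {"read_inbox": read_inbox}

    # Deny by default: each agent only gets the tools its task actually needs.
    AGENT_TOOLS = {
        "email_summarizer": {"read_inbox"},   # read-only scope, nothing else
    }

    def call_tool(agent: str, tool: str, **kwargs):
        if tool not in AGENT_TOOLS.get(agent, set()):
            raise PermissionError(f"{agent} may not call {tool}")
        return TOOLS[tool](**kwargs)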

“Move fast and break things.” It’s funny you even need to ask on hacker news of all places. ;)

Sure, it may be post-hoc chest-thumping theatrics, but he was also a tween during the fall of the Soviet Union and the end days of the Cold War, along with having close family of Jewish descent, since his mother was Jewish (Irish Catholic father). So there could be some baked-in, rather than later-acquired, antipathy, understandably, towards communism. Especially the Soviet variety, since its still-warm corpse was around in the '03/'04 era when Palantir was founded. At that time, Hanssen had just recently been arrested, poking those coals again. Heck, we're still living with the scars of all that and its fallout; if Lonsdale meant specifically former members/supporters of the CPSU and its shambling corpse, then the statement is a little less over the top.


I mean, I’m even more skeptical that Palantir or its customers were concerned about killing former members or supporters of the Soviet Union prior to 2009. The focus was probably the War on Terror and related crimes.

Alex Karp was describing himself as a socialist as recently as 2018.


If it's about the War on Terror and related crimes, why have they also (since their inception) gathered such vast troves of data and profiles on U.S. citizens?


War on Terror means watching all of the brown people, many of whom are citizens


I am talking about at the founding. Their mission has obviously shifted from then.


Hasn't Palantir been gathering and storing data on U.S. citizens since the earliest days of the company?


I know that in some cases, apparent bloat like this is related to needing to support so many potential devices and versions of the underlying OS. Google has to support, on iOS, roughly 6 years of devices and their variations, plus the OS variations on them. Each of these may require its own libraries compiled against it, either for optimal performance or because it's simply less practical to engineer non-breaking updates against new SDK and hardware versions in the same codebase without introducing complexity.

Apple, on the other hand, doesn't have to do this. They can integrate at lower levels and, even with all else being equal, can develop updates with perfect insight into the ecosystem and foresight on things to come.

Somewhat supporting this possible explanation is that, similar to Apple on iOS, Google's apps on Android are significantly smaller.


They’re making very direct, no-caveat statements though, using words like “now”, “immediately”, and “in production”. So if it’s a bluff, they haven’t left themselves much wiggle room with weasel words; honestly, I don’t see any wiggle room at all. And if that’s the case, then a failure to deliver would have to be more “wow, yeah, turns out we forgot a decimal point and messed up some imperial-metric confusion”. Or “my teenage kid found the blog unlocked and thought it would be an awesome no-press-is-bad-press PR stunt. Sorry.”


This, and apparently they don’t have a track record of producing bluffware. They already have some interesting know-how-heavy products, and they have fully delivered on them in the past.


It pretty much did join the workforce. Listen to the Fed chair, listen to related analysis: the unexpected overperformance of GDP isn’t directly attributed to AI, but it is very much in the “how did that happen?” conversation. And there’s plenty of softer, more anecdotal evidence on top of that to respond to the headline with “It did.” The fact that it has been gradual and subtle, as the very first agent tools reach production readiness, gain awareness in the public, and start being used? That really doesn’t seem at all unexpected as the path that “joining” would follow.


> the unexpected overperformance of GDP isn’t directly attributed to AI, but it is very much in the “how did that happen?” conversation.

We spent an amount of money on data centers that was so large that it managed to overcome a self-imposed kick in the nuts from tariffs and then some. The amount of money involved rivals the creation of the railroad system in the United States. Of course GDP overperformed in that scenario.

Where did AI tool use show up in the productivity numbers?


A productivity increase is what showed up in the numbers. AI is the partial attribution Chair Powell gave for the reason. Quotes from the December meeting:

"Reporter: ...do you believe that we’re experiencing a positive productivity shock, whether from AI or policy factors or whatever?"

"Powell: So, yeah, I mean, I never thought I would see a time when we had, you know, five, six years of 2 percent productivity growth. This is higher. You know, this is definitely higher. And it was—before, it could be attributed to AI. I think you—I also think if you look at what AI can do and if you use it in your personal life, as I imagine many of us have, you can see the prospects for productivity. I think it makes people who use it more productive. It may make other people have to find other jobs, though. So it could have productivity implications"

And:

"Reporter: If I could just follow up on the SEP. You have a whole lot of—big increase in the growth numbers, but not a big decline in the unemployment numbers. And is that an AI factor in there?"

"Powell: So it is—the implication is obviously higher productivity. And some of that may be AI."

He also hedges in places, hesitant to say "yes, that's the reason". I'm not sure anything in the data sets they use could directly capture it as the reason, so requiring some line item in the reports with a direct attribution is too high a bar for evidence. He could be wrong, it might not be AI, but I don't have any reason to think his sense of things is wrong either.


My understanding was that the growth came mainly from things like building data centers and buying chips. Boring old fashioned stuff.


Those were known factors. I'm referencing Powell, in the late-December meeting. There wasn't much substantive change in knowledge about how much building was going on since the prior September GDP release that would say "yes, after September we learned X, so that's why forecasts were lower than actual". In his address, and in his follow-up with questions from the press, Powell specifically talks about data centers and AI-driven productivity separately.

These aren't context-free data points you have to interpret; this is synthesized analysis being delivered in a report by the Fed chair, giving his and the Reserve's own interpretation. He could be wrong, but it is clear their belief is that AI is a likely factor, while there is also no certainty in that interpretation. They've upped their estimates for April though, and this advance estimate from December is about to be followed up with the revised, final numbers on Jan 23rd, so we'll find out a little more then, and a lot more in April.


This is why I’m not worried about an imminent AI bubble burst. The data centers will be built, the GPUs have already been ordered, etc. What I am worried about is what happens when, in 2-3 years’ time, the AI companies need to find paying customers to use those data centers. Then it might be time to rebalance into gold or something.


The energy isn't available and that is going to take much longer to build.


Grid energy is lacking for now, which is why the new builds have their own power supplies. It’s a good time to be in the solar/battery/turbine industries.


> Listen to the Fed chair, listen to related analysis: the unexpected overperformance of GDP isn’t directly attributed to AI, but it is very much in the “how did that happen?” conversation

I would very much like to read this if you have a link


Sure, here's the transcript. AI is a large theme throughout, but pages 7 and 24 have specifically relevant remarks about the better-than-expected number, productivity increases in relation to both this and AI, data centers, etc. Really, though, the whole thing is peppered with details related to AI, so it's worth reading in depth if you want the Fed's synthesized POV, and the press at these meetings ask incisive questions.

https://www.federalreserve.gov/mediacenter/files/FOMCprescon...


Thanks for the response!


They're just bluffing. It's bullshitting they get away with everywhere else so they think it's acceptable here.


I won't take offense, since I doubt you meant me in particular. It's tedious to have to go back over web history from weeks ago to cite my sources, like it's high school, every time I reference a specific, concrete source that people here can follow up on for themselves, but I did it because it's worth doing, given that your sentiment is also not completely wrong: there is lots and lots of BS being thrown around all over the place on AI. So here's the direct source, Chair Powell's report from December, and he's actually stating some things more strongly than I remembered:

AI is a large theme throughout, but pages 7 and 24 have specifically relevant remarks about the better-than-expected number, productivity increases in relation to both this and AI, data centers, etc.

https://www.federalreserve.gov/mediacenter/files/FOMCprescon...


> the unexpected overperformance of GDP isn’t directly attributed to AI, but it is very much in the “how did that happen?” conversation.

It's building data centers and buying servers and GPUs. It isn't directly attributed to AI because it isn't caused by the use of AI, but by the inflating of the AI bubble.


I don't know, I figure with a billion dollars Apple should be able to do much better at being awful than this. More proactive rather than accidental awfulness. Something that isn't just bad but capital intensive at the same time. Anyone can build a bad UX on a few menus, or a whole system incrementally over time. But to really lean in? Maybe commission famous artists with eye watering fees for each icon, truly over the top marketing campaigns, really get the cash-fired furnaces going. Really just go full-potlatch on things.


It also immediately eats 5% of the raw compute on my RTX 4080 Super. That's more than a dozen tabs and 3 active YouTube videos, running in each of Chrome, Firefox, and Edge, all combined: those sat at 3% before loading this page (which is 5% on its own), bringing the total up to 9%.

That explains my other comment, which speculated that the snow was the cause of my iPhone instantly overheating, followed by screen-dimming throttling.

Also: this is not a plea to stop putting snow/etc. on pages. I miss the days of such things on the earlier internet. I'd trade back janky plugins and Flash player crashes for the humanizing and personalized touch many sites had back then.


I was starting to wonder why my iPhone got crazy hot. I’m using reader mode and it appears to continue running the web page and animation in the background… crazy.


Something on this website made my iPhone (16 pro) immediately begin warming up, become hot to the touch, and invoke screen-dimming throttling. I verified by exiting it and reentering, during which the pattern abated and then recurred.

The snow? Something else?


You can turn the snow off in the toolbar.


Pretty sure Wayne Tech had the prototype of this sonar-vision translation layer of software all wrapped up way back in 2008, so it’s just a matter of productizing that, and since Pickle seems to deal in fiction already there’s good product synergy.

