Its worth reading this follow-up LKML post by Andres Freund (who works on Postgr...

aftbit · 2026-04-05T02:31:42 1775356302

>If this somehow does end up being a reproducible performance issue (I still suspect something more complicated is going on), I don't see how userspace could be expected to mitigate a substantial perf regression in 7.0 that can only be mitigated by a default-off non-trivial functionality also introduced in 7.0.

cr125rider · 2026-04-05T13:11:00 1775394660

They said the magic words to get Linus to start flipping tables. Never break userspace. Unusably slow is broken

anal_reactor · 2026-04-05T08:01:30 1775376090

> Maybe we should, but requiring the use of a new low level facility that was introduced in the 7.0 kernel, to address a regression that exists only in 7.0+, seems not great.

Completely right. This sounds like a communication failure. Maybe Linux maintainers should pick a few applications that have "priority support" and problems with these applications are also problems with Linux itself. Breaking Postgres is a serious regression.

Reminds me of a situation where Fedora couldn't be updated if you had Wine installed and one side of the argument was "user applications are user problem" while the other was "it's Wine, like come on".

falcor84 · 2026-04-05T09:54:01 1775382841

I for one liked the old and simple WE DO NOT BREAK USERSPACE attitude.

https://linuxreviews.org/WE_DO_NOT_BREAK_USERSPACE

gcr · 2026-04-05T11:19:23 1775387963

Performance regressions are different from ABI incompatibilities. If the kernel refused to do any work that slowed down any userspace program, the pace would go a lot slower.

shadowgovt · 2026-04-05T12:29:52 1775392192

Or be a lot uglier. See: Microsoft replacing its own API surfaces with binary-compatible representations to workaround companies like Adobe adding perf improvements like bypassing the kernel-provided kernel object constructors because it saved them a few cycles to just hard-code the objects they wanted and memcpy them into existence.

cogman10 · 2026-04-05T15:22:40 1775402560

Microsoft's whole "Let's just ship all the dlls" attitude is a big part of the reason a windows install is like 300GB now.

Eventually you'd expect that something has to give.

account42 · 2026-04-07T11:50:48 1775562648

Slow pace is appropriate for a mature kernel that the entire world relies on.

reisse · 2026-04-05T10:57:22 1775386642

Not sure it is true anymore. I've encountered few userspace breaks in io_uring, at least.

jeffbee · 2026-04-05T01:27:47 1775352467

Funny how "use hugepages" is right there on the table and 99% of users ignore it.

bombcar · 2026-04-05T01:44:31 1775353471

I’m absolutely flabbergasted by the performance left on the table; even by myself - just yesterday I learned Gentoo’s emerge can use git and be a billion times faster.

globular-toast · 2026-04-05T08:55:27 1775379327

The time spent by emerge is utterly dwarfed by the time spent to build the packages, so who cares? Maybe it's different if installing a binary system but don't think most people are doing that.

bombcar · 2026-04-05T16:54:30 1775408070

If you can emerge in 2.86s user you can do it right before you emerge world, meaning it's all "done in one interaction" (even if the actual emerge takes an hour - you don't have to look at it.

Whereas if emerge is taking 5-10 minutes, you have to remember to come back to it, or script it.

LtdJorge · 2026-04-05T13:12:07 1775394727

When using multiple overlays, emerge-webrsync is ungodly slower compared to git.

account42 · 2026-04-07T11:52:27 1775562747

That's really not universally true. Building can be parallelized on modern multi-code CPUs (minus configure), emerge cannot and portage is really really slow.

justinclift · 2026-04-05T02:22:13 1775355733

Note that it's just not a single post, and there's additional further information in following the full thread. :)

adrian_b · 2026-04-05T08:01:42 1775376102

Yes, and in the following messages the conclusion was that the regression is mitigated when using huge pages.

devchix · 2026-04-05T23:09:16 1775430556

This seems bad, Splunk advises you to turn off THP due its small read/write characteristics: https://help.splunk.com/en/splunk-enterprise/release-notes-a...

Bad because as of Splunk 10.x, Splunk bundles postgres to integrate with their SOAR platform. Parenthetically, this practice of bundling stuff with Splunk is making vuln remediation a real pain. Splunk bundles its own python, mongod, and now postgres, instead of doing dependency checking. They're going to have to keep doing it as long as they release a .tgz and not just an RPM. The most recent postgres vuln is not fixed in Splunk.

menaerus · 2026-04-06T09:12:01 1775466721

Huge pages and THP are not the same thing.

jeltz · 2026-04-05T08:41:34 1775378494

Which you always should use anyway if you can.

justinclift · 2026-04-05T15:47:33 1775404053

Hmmm, it's not always that clear cut.

For example, Redis officially advised people to disable it due to a latency impact:

https://redis.io/docs/latest/operate/oss_and_stack/managemen...

Pretty sure Redis even outputs a warning to the logs upon startup when it detects hugepages are enabled.

Note that I'm not a Redis expert, I just remember this from when I ran it as a dependency for other software I was using.

jeltz · 2026-04-06T01:57:28 1775440648

1) That is about transparent huge pages which is a different thing and 2) it is always clear cut for PostgreSQL. If you can you should always use huge pages (the non-transparent kind).

fabian2k · 2026-04-05T17:09:22 1775408962

That's transparent huge pages, which are also not the setting recommended for PostgreSQL.

gmokki · 2026-04-06T16:06:59 1775491619

Java can work with transparent hugepages (in addition to preallocated hugepages), but you just use +AlwaysPreTouch to map them in during the startup so that at runtime there won't be any delays or jitter. Redis should add a similar option

TacticalCoder · 2026-04-05T02:18:18 1775355498

AIUI in that thread they're saying "0.51x" the perf on a 96-core arm64 machine and they're also saying they cannot reproduce it on a 96-core amd64 machine.

So it's not going to affect everybody both running PostgreSQL and upgrading to the latest kernel. Conditions seems to be: arm64, shitloads of core, kernel 7.0, current version of PostgreSQL.

That is not going to be 100% of the installed PostgreSQL DBs out there in the wild when 7.0 lands in a few weeks.

torginus · 2026-04-05T07:39:47 1775374787

It's a huge issue of ARM based systems, that hardly anyone uses or tests things on them (in production).

Yes, Macs going ARM has been a huge boon, but I've also seen crazy regressions on AWS Graviton (compared to how its supposed to perform), on .NET (and node as well), which frankly I have no expertise or time digging into.

Which was the main reason we ultimately cancelled our migration.

I'm sure this is the same reason why its important to AWS.

p_l · 2026-04-05T10:29:51 1775384991

Macs are actually part of pain point with ARM64 Linux, because the Linux arm set er tend to use 64 kB pages while Mac supports only 4 and 16, and it causes non trivial bugs at times (funnily enough, I first encountered that in a database company...)

zamalek · 2026-04-05T05:02:49 1775365369

It was later reproduced on the same machine without huge pages enabled. PICNIC?

anarazel · 2026-04-05T05:20:43 1775366443

Yes, I did reproduce it (to a much smaller degree, but it's just a 48c/96t machine). But it's an absurd workload in an insane configuration. Not using huge pages hurts way more than the regression due to PREEMPT_LAZY does.

With what we know so far, I expect that there are just about no real world workloads that aren't already completely falling over that will be affected.

pgaddict · 2026-04-05T12:54:19 1775393659

So why does it happen only with hugepages? Is the extra overhead / TLB pressure enough to trigger the issue in some way? Of is it because the regular pages get swapped out (which hugepages can't be)?

anarazel · 2026-04-05T13:00:05 1775394005

I don't fully know, but I suspect it's just that due to the minor faults and tlb misses there is terrible contention with the spinlock, regardless of the PREEMPT_LAZY when using 4k pages (that easily reproducible). Which is then made worse by preempting more with the lock held.

MBCook · 2026-04-05T03:28:00 1775359680

So perhaps this is a regression specifically in the arm64 code, or said differently maybe it’s a performance bug that has been there for a long time but covered up by the scheduler part that was removed?

adrian_b · 2026-04-05T08:04:20 1775376260

The following messages concluded that using huge pages mitigates the regression, while not using huge pages reproduces it.

db48x · 2026-04-05T04:55:00 1775364900

Could be either of those, or something else entirely. Or even measurement error.

jeltz · 2026-04-05T09:08:35 1775380115

Turns out the amd machine had huge tables enabled and after disabling those the regression was there on and too. So arm vs amd was a red herring.

Of course not a nice regression but you should not run PostgreSQL on large servers without huge pages enabled so thud regression will only hurt people who have a bad configuration. That said I think these bad configurations are common out there, especially in containerized environments where the one running PostgreSQL may not have the ability to enable huge pages.

whizzter · 2026-04-05T09:21:43 1775380903

Still that huge a regression that affects multiple platforms doesn't sound too neat, did they narrow down the root cause?

db48x · 2026-04-05T17:07:00 1775408820

That should be obvious to anyone who read the initial message. The regression was caused by a configuration change that changed the default from PREEMPT_NONE to PREEMT_LAZY. If you don’t know what those options do, use the source. (<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...>)

db48x · 2026-04-05T10:07:19 1775383639

Yes, I had a good laugh at that. It might technically be a regression, but not one that most people will see in practice. Pretty weird that someone at Amazon is bothering to run those tests without hugepages.

scottlamb · 2026-04-05T14:24:11 1775399051

I doubt they explicitly said "I'll run without huge pages, which is an important AWS configuration". They probably just forgot a step. And "someone at Amazon" describes a lot of people; multiply your mental probability tables accordingly.

db48x · 2026-04-05T16:53:28 1775408008

The number of people at Amazon is pretty much irrelevant; the org is going to ensure that someone is keeping an eye on kernel performance, but also that the work isn’t duplicative.

Surely they would be testing the configuration(s) that they use in production? They’re not running RDS without hugepages turned on, right?

scottlamb · 2026-04-06T04:25:06 1775449506

> The number of people at Amazon is pretty much irrelevant; the org is going to ensure that someone is keeping an eye on kernel performance, but also that the work isn’t duplicative.

I'd guess they have dozens of people across say a Linux kernel team, a Graviton hardware integration team, an EC2 team, and a Amazon RDS for PostgreSQL team who might at one point or another run a benchmark like this. They probably coordinate to an extent, but not so much that only one person would ever run this test. So yes it is duplicative. And they're likely intending to test the configurations they use in production, yes, but people just make mistakes.

db48x · 2026-04-06T07:49:45 1775461785

True; to err is human. But it is weird that they didn’t just fire up a standard RDS instance of one or more sizes and test those. After all, it’s already automated; two clicks on the website gets you a standard configuration and a couple more get you a 96c graviton cpu. I just wonder how the mistake happened.

menaerus · 2026-04-06T09:23:05 1775467385

You're assuming that they ran the workload with huge-pages disabled unintentionally.

db48x · 2026-04-06T12:06:22 1775477182

No… I’m assuming that they didn’t use the same automation that creates RDS clusters for actual customers. No doubt that automation configures the EC2 nodes sanely, with hugepages turned on. Leaving them turned off in this benchmark could have been accidental, but some accident of that kind was bound to happen as soon as the tests use any kind of setup that is different from what customers actually get.

menaerus · 2026-04-06T14:41:21 1775486481

You're again assuming that having huge pages turned on always brings the net benefit, which it doesn't. I have at least one example where it didn't bring any observable benefit while at the same time it incurred extra code complexity, server administration overhead, and necessitated extra documentation.

scottlamb · 2026-04-07T15:28:43 1775575723

FYI: huge pages isn't just a system-wide toggle, but a variety of things you can do:

* explicit huge pages

* transparent huge pages system-wide default

* app-specific or even mapping-specific toggles

* various memory allocator settings to raise its effectiveness

It would be really surprising to me to see a workload for which it's optimal to not use huge pages anywhere on the system.

menaerus · 2026-04-08T05:59:20 1775627960

It is a system-wide toggle in a sense that it requires you to first enable huge-pages, and then set them up, even if you just want to use explicit huge pages from within your code only (madvise, mmap). I wasn't talking about the THP.

When you deploy software all around the globe and not only on your servers that you fully control this becomes problematic. Even in the latter case it is frowned upon by admins/teams if you can't prove the benefit.

Yes, there are workloads where huge-pages do not bring any measurable benefit, I don't understand why would that be questionable? Even if they don't bring the runtime performance down, which they could, extra work and complexity they incur is in a sense not optimal when compared to the baseline of not using huge-pages.

scottlamb · 2026-04-12T23:37:29 1776037049

> Yes, there are workloads where huge-pages do not bring any measurable benefit

I really doubt it, except of course workloads where you just use a trivial amount of memory to begin with. In systems I've seen, anywhere from 5% to 15% of the CPU time is spent waiting for TLB misses. It's obvious then that huge pages can be hugely beneficial if properly used; by definition they hugely relieve TLB pressure.

You can of course end up in situations where transparent TLB scanning is worse than nothing, but that's exactly why I pointed out there's a variety of ways to use huge pages.

menaerus · 2026-04-15T07:58:47 1776239927

You don't seem to understand the idea that CPU spending time on TLB misses and at the same time seeing no measureable effects in E2E performance because much larger bottleneck is elsewhere can be both valid simultaneously. In database kernels with large and unpredictable workloads, high IO and memory footprint, this is certainly easy to prove.

scottlamb · 2026-04-16T01:16:55 1776302215

I think you're moving the goalpost here. There's a measurement improvement to CPU usage. You're over-provisioned on CPU and don't care. Fine.

master_crab · 2026-04-05T02:28:40 1775356120

For production Postgres, i would assume it’s close to almost no effect?

If someone is running postgres in a serious backend environment, i doubt they are using Ubuntu or even touching 7.x for months (or years). It’ll be some flavor of Debian or Red Hat still on 6.x (maybe even 5?). Those same users won’t touch 7.x until there has been months of testing by distros.

crcastle · 2026-04-05T02:45:33 1775357133

Ubuntu is used in many serious backend environments. Heroku runs tens of thousands (if not more) instances of Ubuntu on its fleet. Or at least it did through the teens and early 2020s.

https://devcenter.heroku.com/articles/stack

nine_k · 2026-04-05T03:19:53 1775359193

Do they upgrade to the new LTS the day it is released?

sakjur · 2026-04-05T09:06:04 1775379964

Ubuntu's upgrade tools wait until the .1 release for LTSes, so your typical installation would wait at least half a year.

crcastle · 2026-04-05T03:29:18 1775359758

Not historically.

rvnx · 2026-04-05T04:15:56 1775362556

and they are right, this is because a lot of junior sysadmins believe that newer = better.

But the reality:

  a) may get irreversible upgrades (e.g. new underlying database structure) 
  b) permanent worse performance / regression (e.g. iOS 26)
  c) added instability
  d) new security issues (litellm)
  e) time wasted migrating / debugging
  f) may need rewrite of consumers / users of APIs / sys calls
  g) potential new IP or licensing issues

etc.

A couple of the few reasons to upgrade something is:

  a) new features provide genuine comfort or performance upgrade (or... some revert)
  b) there is an extremely critical security issue
  c) you do not care about stability because reverting is uneventful and production impact is nil (e.g. Claude Code)

but 99% of the time, if ain't broke, don't fix it.

https://en.wikipedia.org/wiki/2024_CrowdStrike-related_IT_ou...

miki123211 · 2026-04-05T06:32:33 1775370753

On the other hand, I suspect LLMs will dramatically decrease the window between a vulnerability being discovered and that vulnerability being exploited in the wild, especially for open-source projects.

Even if the vulnerability itself is discovered through other means than by an LLM, it's trivial to ask a SOTA model to "monitor all new commits to project X and decide which ones are likely patching an exploitable vulnerability, and then write a PoC." That's a lot easier than finding the vulnerable itself.

I won't be surprised if update windows (for open source networked services) shrink to ~10 minutes within a year or two. It's going to be a brutal world.

mr_toad · 2026-04-05T10:23:23 1775384603

Too often I see IT departments use this as an excuse to only upgrade when they absolutely have to, usually with little to no testing in advance, which leaves them constantly being back-footed by incompatibility issues.

The idea of advanced testing of new versions of software (that they’ll be forced to use eventually) never seems to occur, or they spend so much time fighting fires they never get around to it.

gjvc · 2026-04-05T08:59:58 1775379598

all fair points, on the other hand, as a general rule, isn't it important to stay on currently-supported versions of pieces of software that you run?

ymmv, but in my experience projects like postgresql which have been reliable, tend to continue to be so.

rixed · 2026-04-05T06:45:07 1775371507

There is serious as in "corporate-serious" and serious as in "engineer-serious".

zbentley · 2026-04-05T16:46:27 1775407587

I’ve seen more 5k+-core fleets running Ubuntu in prod than not, in my career. Industries include healthcare, US government, US government contractor, marketing, finance.

rixed · 2026-04-05T21:08:33 1775423313

In other words, those industries that used to run windows before ?

zbentley · 2026-04-06T15:00:34 1775487634

I'd say about 2/3 of the places I've worked started on Linux without a Windows precedent other than workstations. I can't speak for the experience of the founding staff, though; they might have preferred Ubuntu due to Windows experience--if so, I'm curious as to why/what those have to do with each other.

That said, Ubuntu in large production fleets isn't too bad. Sure, other distros are better, but Ubuntu's perfectly serviceable in that role. It needs talented SRE staff making sure automation, release engineering, monitoring, and de/provisioning behave well, but that's true of any you-run-the-underlying-VM large cloud deployment.

pmontra · 2026-04-05T05:42:42 1775367762

A customer of mine is running on Ubuntu 22.04 and the plan is to upgrade to 26.04 in Q1 2027. We'll have to add performance regression to the plan.

wongogue · 2026-04-05T08:14:59 1775376899

Are you running ARM servers?

fxtentacle · 2026-04-05T13:21:13 1775395273

.. which confirms all of my stereotypes. Looks like the AWS engineer who reported it used a m8g.24xlarge instance with 384 GB of RAM, but somehow didn't know or care to enable huge pages. And once enabling them, the performance regression disappears.

bushbaba · 2026-04-05T14:01:48 1775397708

Because such settings aren’t obvious to those not familiar with them. LLMs should make discoverability easier though

perrygeo · 2026-04-05T14:59:44 1775401184

Honest question: what's the value of running the benchmark and reporting a performance regression if the author is not familiar with basic operation of the software? I'd argue that not understanding those settings disqualifies you from making statements about it.

cogman10 · 2026-04-05T15:28:46 1775402926

The performance was reduced without a settings change. That is still a regression even if huge pages mitigates the problem.

I'd be curious to know if there's still a regression with hugepages turned on in older kernels.

If you are benchmarking something and the only changed variable between benchmarks is the kernel, that is useful information. Even if your environment isn't correctly setup.

justinclift · 2026-04-05T15:51:05 1775404265

Some software clearly wants hugepages disabled, so it's not always the slam dunk people seem to be making it out to be.

ie Redis:

https://redis.io/docs/latest/operate/oss_and_stack/managemen...

perrygeo · 2026-04-10T22:47:49 1775861269

Yet we're talking about postgres, specifically. The whole point is that benchmarks about postgres better know how to configure postgres or their conclusions be irrelevant at best. What does redis have to do with this discussion?