Hacker News
systemd 100% cpu hang? – Proxmox Support Forum (proxmox.com)
180 points by lionkor on March 25, 2023 | 141 comments


We just spent multiple hours in a team of multiple people debugging this issue. Systemd didn't work. We checked all disks, fstab, the recovery system, etc., and there was nothing clearly wrong.

Turns out something in Proxmox (maybe a service?) doesn't understand daylight saving time (Dublin).

The only way to know was to google "proxmox systemd 100% cpu" and find the post above.

Christ.

Edit: The fix, of course, is `ln -sf /usr/share/zoneinfo/Etc /etc/localtime`

Edit 2: Looks like that just unsets the timezone. It's too late for me to find the real fix, but you basically want to set the timezone to something else (like UTC).


Sorry this happened to you; these kinds of problems tend to be vague and difficult to patch up.

I've advocated for over a decade as a systems engineer not to set system time to anything other than UTC. Not all languages or applications compensate for DST very well, and depending on your application's relationship with time it's a non-trivial problem to solve. What is far more trivial to do is to ship logs to a log server and let the UI of the log server translate time for the viewer of the logs. Almost every time I have argued with executives, managers, and SWEs without SE experience and lost, something unexplainable and detrimental has happened around DST.

Edit:

For the uninitiated I'll try to paint a clearer picture. All system time in Linux is tracked in seconds past Jan 1, 1970. The problem occurs in the translation between UTC and local time, which is handled by a library/package/module in your application. If your application is time sensitive and not looking for a literal time traveling event then things can get real weird, real fast. If there is a bug in that library/package/module things will also get real weird, real fast.
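To make the split between tracked time and displayed time concrete, here is a minimal Python sketch. It assumes Python 3.9+ with the standard zoneinfo module and an installed tz database; the helper name `render` and the two epoch values are chosen purely for illustration and are not taken from the Proxmox report.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def render(epoch_seconds: int, zone_name: str) -> str:
    """Translate one absolute instant into the wall-clock time of a zone."""
    instant = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return instant.astimezone(ZoneInfo(zone_name)).isoformat()


# Two consecutive seconds around 2023-03-26 01:00:00 UTC,
# the moment Dublin's clocks moved forward.
BEFORE = 1679792399
AFTER = BEFORE + 1

if __name__ == "__main__":
    for stamp in (BEFORE, AFTER):
        # The epoch value advances by one second; the Dublin rendering skips an hour.
        print(stamp, render(stamp, "UTC"), render(stamp, "Europe/Dublin"))
```

The counter itself never jumps; only the translation layer does, which is where the "weird" behaviour described above would have to come from.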


> All system time in Linux is tracked in seconds past Jan 1, 1970.

> If your application is time sensitive and not looking for a literal time traveling event...

One of my least-favourite facts is that these statements are misleading. Since Linux uses UTC and not TAI, it no longer tracks an absolute measurement of seconds since the Unix epoch, as it takes “leap seconds” into account. As well, because of leap seconds you can absolutely have time-travelling events.

https://developers.redhat.com/blog/2015/06/01/five-different...


> Since Linux uses UTC and not TAI

Unix, not Linux. I don’t even think they can track TAI, but I definitely have never seen one which does; they all define Unix time as 86400 * (days since epoch) + (seconds since midnight).
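The formula in the comment above can be written out directly. This is a small Python sketch, not anything from a real libc; the function name `unix_time` is made up for illustration. It also shows why a leap second has no Unix timestamp of its own: 23:59:60 on 2016-12-31 (a day that did end with a leap second) lands on the same number as the following midnight.

```python
import calendar
from datetime import date

EPOCH = date(1970, 1, 1)


def unix_time(year, month, day, hour=0, minute=0, second=0):
    """86400 * days since the epoch + seconds since midnight (UTC)."""
    days = (date(year, month, day) - EPOCH).days
    return 86400 * days + hour * 3600 + minute * 60 + second


if __name__ == "__main__":
    # Agrees with the standard library for an ordinary instant.
    print(unix_time(2023, 3, 26, 1) == calendar.timegm((2023, 3, 26, 1, 0, 0)))
    # The leap second and the next midnight collapse onto one value.
    print(unix_time(2016, 12, 31, 23, 59, 60), unix_time(2017, 1, 1))
```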


On Linux the clock_gettime syscall can absolutely return TAI with CLOCK_TAI
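For anyone who wants to poke at this, a hedged Python sketch: `time.CLOCK_TAI` is only exposed on Linux with Python 3.9+, so the helper (whose name `tai_minus_realtime` is invented here) returns None anywhere it is unavailable. Note also that the kernel only knows the TAI offset if something such as an NTP daemon has set it, so a result of 0 is entirely possible.

```python
import time


def tai_minus_realtime():
    """Return CLOCK_TAI - CLOCK_REALTIME in whole seconds, or None if unsupported."""
    if not hasattr(time, "CLOCK_TAI"):
        return None  # not Linux, or Python older than 3.9
    try:
        tai = time.clock_gettime(time.CLOCK_TAI)
        realtime = time.clock_gettime(time.CLOCK_REALTIME)
    except OSError:
        return None  # the kernel or sandbox refused the clock id
    return round(tai - realtime)


if __name__ == "__main__":
    print("TAI - UTC offset known to the kernel:", tai_minus_realtime())
```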


Apps shouldn't use anything but UTC for their timekeeping; there is no reason to do anything in the timezone's time.

Setting the timezone should have no such effect.


You and I can agree on that in principle but there are a lot of reasons an application may read from system time. If the application reads the wrong system time source, then they get unexpected time travel. Even things like the RTC can experience time travel under certain conditions. My point was, using anything other than UTC is just unnecessary complexity and often enough ends in complicated problems.


If your app interacts with humans or machines in other time zones, tracking in a different time zone can be necessary. Though often people don’t want to specify time zones with their complexities but instead specify times at locations. But I agree, generally your service doesn’t need time zones. The exception is necessary though when time zones get ambiguous or change (e.g. daylight savings going away)


Are you unsetting your timezone? I hesitate to sound authoritative about this because I really don't know, and have not found the correct man page yet, but usually you symlink /etc/localtime to the tzfile you want as your local time. What happens when you symlink it to a directory, in your case /usr/share/zoneinfo/Etc?

My guess is that it effectively unsets it. In your case that probably does nothing (system time is UTC and Dublin is in UTC). What was it set to before?

edit: all I can find is localtime(5) and it only says that /etc/localtime is the local time zone file.


Yes, it basically makes it useless, as far as I can tell. I don't know what the proper fix is, because timedatectl was broken.


This bug seems familiar. But I'm a little surprised the fix for this wasn't already in the version of systemd chosen for the release.

March 2022 https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/19668...

March 2021 (Fedora) https://bugzilla.redhat.com/show_bug.cgi?id=1941335

systemd upstream fix https://github.com/systemd/systemd/pull/19075


servers should always be UTC


We share this planet with people that don't use UTC. That's just the way it is. If some computers are set to time zones other than UTC, then non-UTC time zones must be correctly handled, even "servers".

There are terminal servers, for example, which are basically workstations. There are servers configured by other people for local time, and we may need to configure our own servers into its network.

There is server software out there that uses the host timezone. Heck, huge vendors do this regularly. This happens in log files, in schedulers, in notification emails, and so on. People want to see their log timestamps in their local time, they want to schedule in local time, and they want their emails to show events in their local time.

Yes, optimally, all times should always be stored in UTC or UTC+offset formats, then displayed in local time only "at the end client". Generally, this is actually what happens. Windows, MacOS, UNIX, Linux, and Android all store the system clock as UTC and all system APIs use this.

So in some sense, we are all already using UTC on our servers. The time zone setting doesn't change the system clock to something other than UTC, it just tells user-mode software how the human users expect to see time formatted for display.

The problem is that server software especially is terrible at this. Just... bad. The worst offenders are text-based log files, which almost always get a formatted timestamp with an unspecified time zone. Could be UTC, could be the server's local time, could be Mars time, who knows?

In summary: don't blame system administrators for correctly configuring server settings. Blame the lazy software developers who can't be bothered to use ISO timestamps that specify time zones unambiguously, irrespective of the time zone setting.

Next thing you'll tell me to stop using space characters in file names...
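As an illustration of the "ISO timestamps that specify time zones" point above, here is a minimal Python logging sketch. The class name `IsoFormatter` is invented for this example; it relies only on the standard `logging.Formatter.formatTime` hook and stamps every line with the local time plus its UTC offset, so the line stays unambiguous whatever the server's zone is.

```python
import logging
from datetime import datetime, timezone


class IsoFormatter(logging.Formatter):
    """Render %(asctime)s as ISO 8601 local time with an explicit UTC offset."""

    def formatTime(self, record, datefmt=None):
        instant = datetime.fromtimestamp(record.created, tz=timezone.utc)
        # astimezone() with no argument converts to the system's local zone.
        return instant.astimezone().isoformat(timespec="seconds")


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(IsoFormatter("%(asctime)s %(levelname)s %(message)s"))
    log = logging.getLogger("demo")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("failed password attempt")
```

A line produced this way carries a suffix such as -04:00 or +00:00, so a reader never has to guess which zone the writer was in.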


> There are terminal servers, for example, which are basically workstations.

For a terminal server it is not uncommon for different users to connect from places with different time zones. It would make sense to have TZ in a user profile but keep UTC as the server's system setting.


Windows doesn't by default. Annoying when dual booting


Add the below as a .reg file and run it; it will tell Windows to treat the hardware clock as UTC. (Adding this in case anyone else gets annoyed with the clock mix-up when dual-booting.)

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation]
"RealTimeIsUniversal"=qword:00000001


bizarrely long response!


This thread is a good proxy for why commercial computing is so toxic; people can't even agree on the time of day.

> We share this planet with people that don't use UTC. That's just the way it is. If some computers are set to time zones other than UTC, then non-UTC time zones must be correctly handled, even "servers".

This is just patronizing hand-wringing; well done you.

For global systems, a single time convention is the basis of co-ordination. So many people on this thread seem to have a "pets not cattle" viewpoint; it is quite concerning.


They are UTC - the actual time on the system is measured in seconds since midnight Jan 1 1970 (https://en.wikipedia.org/wiki/Unix_time). This time is then converted to local time per the OS timezone setting. A lot of apps (e.g. databases) will also have their own time zone conversions based on context (e.g. per database connection settings).

As far as practical use of UTC vs local time: I am a developer that happens to work with customers directly. It is difficult enough to get customers to provide timestamps for issues they report in local time; I cannot imagine they would be able to convert to UTC.

Love the local time!


> This time is then converted to local time per the OS timezone setting.

What exactly do you mean by this? As in, what part of your software stack is doing this? IIRC if you set your OS to be in UTC then anything at the OS level will only speak UTC. Databases and your own software will do what they are told to do but systemd etc. will all speak UTC only. Even running date in the shell should give you a UTC timestamp.


Your system time is totally different from what userspace cares about. Go ahead, run "TZ=UTC date" (or a different timezone if you're using UTC already).

I'm not even sure what you mean by the OS speaking a certain timezone. Basic "time()" will return seconds from the epoch UTC regardless of your system timezone. Things that care about timezones will use relevant conversions.
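The same experiment as "TZ=UTC date" can be done from Python on a Unix system. This is a sketch under a few assumptions: `time.tzset()` is Unix-only, the zone names must exist in the installed tz database, and the helper name `wall_clock` is invented here. The epoch value stays fixed; only the TZ variable, and therefore the rendering, changes.

```python
import os
import time


def wall_clock(epoch_seconds, tz_name):
    """Format one epoch value as local time under a temporary TZ setting."""
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz_name
    time.tzset()  # Unix only: make libc re-read TZ
    try:
        return time.strftime("%Y-%m-%d %H:%M %z", time.localtime(epoch_seconds))
    finally:
        # Put the process back the way we found it.
        if saved is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = saved
        time.tzset()


if __name__ == "__main__":
    instant = 1679792400  # 2023-03-26 01:00:00 UTC
    for zone in ("UTC", "Europe/Dublin", "America/Toronto"):
        print(zone, wall_clock(instant, zone))
```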


Of course userspace may convert from what time() returns by spec to any given timezone. But it only does so if you instruct it like you did with TZ= or if it is programmed to do so somehow. I have not delved into systemd’s code but I can’t imagine it arbitrarily converts anything away from UTC time if /etc/localtime is set to UTC.


TZ doesn't do anything magic. It just tells libc you want your local timezone to be something different than the default. It affects output formatting in a way that the app mostly doesn't need to deal with if it only works with one timezone. I'm not even sure what claim you're making now, but the original msg is just wrong. If an app uses libc, it uses TZ setting - regardless if it's systemd or date or something else.

> but I can’t imagine it arbitrarily converts anything away from UTC time if /etc/localtime is set to UTC.

It really depends on which part of systemd you mean. There's lots of code in there and quite a few bits deal with timezones explicitly. (Since it sets it for users to begin with)

But for basic display it sure uses TZ and localtime as set by the user - and it can be different than the system default. https://github.com/systemd/systemd/blob/cccc14c5a88f000931c6...


You and I are saying the same thing in different ways, except the part about systemd. I didn’t realize it did anything with timezones because I keep thinking of it as just modern initd but I guess it goes beyond that. I didn’t realize that if you remove all mention of local timezones from /etc/ that a user could still break systemd by setting their own TZ.


I’ve seen junior sysadmins and developers repeating this for years also. Don’t know where it comes from.

Apart from this issue, I’ve never once seen an error due to the tz presentation settings on a server, and I have spent 20 years running many thousands of servers at a time. shrug.

Let them have to adjust in grafana/kibana if it helps them sleep I guess..


The OS tracks time in epoch but your application doesn't. Bugs like this happen because a library that does time conversion for your application has a bug or your application is time sensitive but doesn't know how to handle time travel.


Beat me by five minutes. 100% agree.


My challenge has been getting the user to mention their timezone when providing a local time. I can usually guess within a couple of hours, and then find logs that confirm it.


Local time is a benefit for users, as much as UTC might be beneficial for sysadmins and technical teams.

The only question is whether the implementation is user focused or developer prioritized.


It is entirely reasonable for a personal server to be set to the timezone it is physically located in.


I'm no expert, but if your software can't deal with timezones, be worried. So yes, absolutely, it should be perfectly okay to use your local timezone.


Timezones are a disaster. Once upon a time it was difficult and time consuming to make changes to them. Then the world began developing more and more "accurate" timekeeping devices. Atomic clocks were developed and began to be miniaturized. Then someone had the brilliant idea that incredibly accurate clocks could be used to determine location. GPS was born.

A few decades pass and we now all have highly accurate "clocks" that we carry around in our pockets that receive GPS signals and sync their idea of time to what the satellite thinks. Sounds great, right?

Except, humans don't actually use atomic clocks as the basis for their timekeeping. We look outside and expect the time on the clock to mostly match with the position of the sun in the sky.

Once leap seconds became entangled in them, the whole thing basically became a toxic pile of radioactive waste. Timezones are now being updated multiple times a year since the internet means we can update them at will. The whole thing is now a giant mess of toxic radioactive waste that can never be correct for any extended period of time.

We, the software industry, did this to ourselves. We built this, we have to live with it, and it's never going away.


computers are good at keeping track of things that humans aren't. tracking timezones is in your computer's wheelhouse, don't be scared of it.


I'm not scared of it. It's unnecessary excessive complexity that we have foisted onto ourselves. It has caused countless bugs and will most likely cause countless more before I die. Show me the software that copes with timezone changes at runtime.


Every phone when people travel?

It really isn’t that hard. What makes you pull your hair out is dealing with 3rd party code that handles it incorrectly. Do everything internal in UTC or always attach a timezone to dates, your choice.

When you cross the boundary, either ingesting naive dates or displaying them, read the system time zone again and convert.

I mean it’s kinda annoying but “naive” times is the time equivalent of storing arrays without their length. I can’t fathom why we even allow it.


humans having to convert times from UTC to local timezone will also result in countless hours wasted in troubleshooting.


That's what happens when you let a sect of Sun-worshipping astrologers redefine the minute to have up to 61 seconds...


Generally I agree (mine are in America/New_York), but I did find [0] and fix [1] a weird bug in Debian's cron that only surfaced when I moved. I was in America/Chicago, moved to America/New_York, and then changed TZ on the servers once they were set up. Afterwards, I noticed that a daily cronjob that was supposed to fire at 0845 was instead happening at 0945.

[0]: https://gist.github.com/stephanGarland/b7cdd963e0ac53ea42f8e...

[1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019716


Worked at a very big social media company and we set them all to Pacific since that’s where we were located. The servers themselves were physically located in different time zones. There’s no hard and fast rule what TZ’s servers need to be in IMHO.


It shouldn't actually matter. Hardware clocks should usually be in UTC, but the timezone should be "whatever makes sense."

Webapps should be timezone aware via the browser or user preferences, internal timezone should be whatever makes sense for logging et al.


Or the timezone the sole user is primarily located in.


this position is getting a lot of flak, but I agree since we don't live in magic world where all software can handle time zones well.

I strongly prefer everything in UTC, with changing the time zone being an issue for only the very top layer of the display/user interface/frontend/whatever, if absolutely needed. Generally, libraries designed for display have much more robust timezone handling than random daemons and backend software.


> servers should always be UTC

All of the servers I run are located in Toronto, Canada: why do I care about UTC? What would having all of them being UTC get me?

A few jobs ago I took care of servers that were in: California, Ontario, Quebec, Switzerland, Singapore. UTC made sense there.


One upside is that there is no, and there cannot be any, daylight savings time jumps, or any other offset changes, now or in the future.

Local time is governed by local authorities; UTC, much less so.


Why is that an upside?

I mean, clocks drift out of sync. Every few days (weeks?) I assume the NTP client adjusts the actual system time on your server, not just the time zone your server translates it to when you run “date”, which is what we are talking about here…

If it can handle that, it can handle a TZ with DST.


Clocks going backwards for a whole hour is massively more trouble than clocks compensating a few seconds of discrepancy.

Usually being a few seconds ahead is compensated by running the clock slower for some time. Slowing the clock to compensate a whole hour breaks many kinds of assumptions. But jumping the clock actually back an hour breaks even more assumptions that look entirely sane.

Better off without it.
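One of those sane-looking assumptions is that "end minus start" is never negative. A hedged Python sketch of the usual defence: measure durations with a monotonic clock rather than with wall-clock readings. The helper name `timed` is made up for illustration; `time.monotonic()` is a standard library call that is documented not to go backwards and not to be affected by system clock updates.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.monotonic()
    result = fn(*args)
    elapsed = time.monotonic() - start  # never negative, even if the wall clock is stepped
    return result, elapsed


if __name__ == "__main__":
    total, seconds = timed(sum, range(1_000_000))
    print(total, f"{seconds:.6f}s")
```

This does not help code that genuinely needs calendar time, but it removes one class of failure when the wall clock is stepped or local time repeats an hour.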


> why do I care about UTC?

Eliminates all the software bugs due to non-monotonically increasing time.

If you get bought by a company in the EU/UK or vice versa then it ultimately makes it easier to deal with time as you become part of a global company.


Clarity as to your server time no matter what time zone nonsense a government devises.


Using UTC when all of my servers are in one timezone actually decreases clarity, as I live in America/Toronto with my daily existence, and which is what my laptop and (mechanical) wristwatch† are set to. If I less(1) the logs I now have to do mental math as to "when" things happen because the hour number it says in the logs does not match my human reality.

If I look at something now (at Mar 25 21:09) the logs would say "Mar 26 01:09" which is not "now" in my mind.

† Though I've been toying with the idea of getting a GMT watch "for fun". However I'll probably go with a chronograph—more useful feature on a day-to-day basis. Not as many choices if I want to get both complications though.
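The mental math in the example above (a log line stamped "Mar 26 01:09" UTC versus 21:09 on Mar 25 in Toronto) can be handed to a few lines of Python. This is only a sketch: it assumes Python 3.9+ with zoneinfo, a log timestamp format chosen for illustration, and a made-up helper name `utc_log_to_local`.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def utc_log_to_local(stamp, zone="America/Toronto"):
    """Convert a 'YYYY-MM-DD HH:MM' UTC log timestamp to a local wall-clock string."""
    utc = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return utc.astimezone(ZoneInfo(zone)).strftime("%Y-%m-%d %H:%M %Z")


if __name__ == "__main__":
    print(utc_log_to_local("2023-03-26 01:09"))  # 2023-03-25 21:09 EDT
```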


I'm curious what your logs look like for 1am local time on Nov 6, 2022? I believe you set your clock back on that day, so 1am-2am would have happened twice.

Maybe your logs mention the timezone / UTC offset? Otherwise it would seem possible to get confused.
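The ambiguity being asked about is easy to demonstrate. A minimal Python sketch, assuming Python 3.9+ with zoneinfo and an installed tz database; the helper name `utc_candidates` is invented here. It uses the standard `fold` attribute to ask which UTC instants a naive Toronto wall-clock reading can correspond to.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

TORONTO = ZoneInfo("America/Toronto")


def utc_candidates(naive_local):
    """Return the distinct UTC instants obtained by reading a naive Toronto
    wall-clock time with fold=0 and fold=1 (two results mean it is ambiguous
    or falls in a skipped hour)."""
    instants = {
        naive_local.replace(tzinfo=TORONTO, fold=fold).astimezone(timezone.utc)
        for fold in (0, 1)
    }
    return sorted(instants)


if __name__ == "__main__":
    # 01:30 on Nov 6, 2022 happened twice in Toronto: once in EDT, once in EST.
    for instant in utc_candidates(datetime(2022, 11, 6, 1, 30)):
        print(instant.isoformat())
```

A log line that only says "01:30" on that date cannot be placed on the UTC timeline without the offset, which is the confusion the question is getting at.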


I have no idea because (a) I didn't experience any problems that needed debugging at that moment so I've never looked, and (b) I wasn't awake looking at the logs for fun at the time.

But I do sometimes have to look at the logs on some random Tuesday, mid-morning, when someone says they can't log into our SSH bastion hosts (but it worked yesterday). I can see there are failed password attempts at 09:59, 10:03, and 10:04 (versus 13:59, 14:03, and 14:04 in UTC, i.e., four hours ahead, which I would then have to do mental math to bring back to local time), and so it turns out they entered their password wrong a few times so fail2ban blocked their IP.

Wallclock time is a human construct created for human convenience. Having UTC when all my servers are in America/Toronto is not convenient for me.


> Having UTC when all my servers are in America/Toronto is not convenient for me.

Until you have to debug something that straddles DST changes.


> Until you have to debug something that straddles DST changes.

I will perhaps consider doing it at that point, but in ~20 years of being a sysadmin I have never needed to, so… ¯\_(ツ)_/¯

Therefore I optimize for the common case: looking at logs when events do not straddle DST changes.


Nonsense. You can still print the offset and even if you don't, you can always infer which hour you are in from context.


Or around any year that the daylight saving rules change.


We don’t watch logs for fun. If issues get reported in local time, using UTC logs simply shifts the burden onto an issue reporter (who may be clueless in this regard) and/or an investigator who now has to convert dates back and forth in their communication and probably while viewing, because let’s be honest, barely anyone will scrupulously convert all dates into UTC with an additional tool when it’s easy enough to subtract time in your head.

We could use both dates in logs, but that would eat an unreasonable amount of columns and confuse naive greps. We could also use non-standard time formats, e.g. <localtime8601>[<dst switch warning>] <unixtime>, but standard log handling tools aren’t built for it.



Maybe you aren't on top of keeping your timezone database up to date, maybe you have to deal with dates spanning 2005, maybe Ontario will finally abolish DST. Using UTC simplifies time handling with very little downside.


I lived through the changes of 2005 as a sysadmin:

* https://en.wikipedia.org/wiki/Energy_Policy_Act_of_2005#Chan...

From a Unix perspective (Solaris, Linux, IRIX), it was fine. And Ontario has provisionally passed getting rid of DST changes (sadly in favour of going to year-round DST), and IMHO it will be fine again if it is ever finalized:

* https://toronto.ctvnews.ca/ontario-passes-legislation-to-mak...


Sure. And we all survived Y2K. Point being that setting your servers to UTC is one less thing to worry about. I'd flip the argument on its head and posit that there's no compelling reason to use anything other than UTC.

So long as you keep your TZ database up-to-date local time is likely to be not-so-problematic. Once you have that one server in a closet that didn't get updated, all bets are off.


> Point being that setting your servers to UTC is one less thing to worry about.

It's only one less thing to worry about if you worry about it in the first place. I do not.

> I'd flip the argument on its head and posit that there's no compelling reason to use anything other than UTC.

There is a compelling, or at least useful, reason for me; from another reply I did:

Using UTC when all of my servers are in one timezone actually decreases clarity, as I live in America/Toronto with my daily existence, and which is what my laptop and (mechanical) wristwatch are set to. If I less(1) the logs I now have to do mental math as to "when" things happen because the hour number it says in the logs does not match my human reality.

If I look at something now (at Mar 25 21:09) the logs would say "Mar 26 01:09" which is not "now" in my mind.


Ontario already did, on condition Quebec and New York do too [0]

[0]: https://www.ola.org/en/legislative-business/bills/parliament...


pid 1 shouldn't choke because of a timezone issue.


Servers (and every other Unix/Linux box) ALWAYS use UTC. The time zone is only specified for the purpose of DISPLAYING the time in some locale where the system resides. The only things that should care about the time zone are those that generate output for human consumption (such as log entries), and this is usually automated.

For a system to not boot in a particular time zone is a serious bug, but you should not blame the problem on the time zone, you should blame the problem on whatever is broken in some critical piece of code that probably should not be using local time.


The server's clock should indeed be UTC, with the timezone translation done according to the user profile configuration.


upvoting because i fiercely agree with you. servers should be UTC.


Assuming servers should be UTC because timezones are hard, you could say servers should be airgapped because security is hard. Or that you should only type ASCII because Unicode is hard.

I mean, yes, you can do that, but don't pretend that a program not handling unicode is the user's fault for typing unicode (s/unicode/timezones, s/typing/using).

EDIT: I'm just saying that you can't blame the user for using the features that come with the system.


It’s all true but you will have a harder life if you give all your servers Unicode names and directories and set them to some obscure time zone that is 15 minutes off.


> […] set them to some obscure time zone that is 15 minutes off.

For anyone curious:

> On the flip side, there’s one border where traveling north or south puts you ahead or behind just 15 minutes. That would be the border between India (UTC +5.5) and Nepal (UTC +5.75).

* https://qz.com/357697/time-zone-deviants-part-i-the-stranges...

* https://www.worldtimeserver.com/learn/unusual-time-zones/


The inverse of this is your test systems should be set to Nepal and India and use Unicode names for everything that has a text field.

It’ll smoke out all sorts of fun.


(you appear to have triggered some downvote fairies by having a differing opinion, have an up to compensate a little! I disagree too as documented below but unless something truly offensive is said I'm of the opinion that downvoting is generally a bad response)

> you could say servers should be airgapped because security is hard

Many effectively do, and I wish more would do, as a default starting point. Maybe not physically air-gapped but firewall all incoming and outgoing then selectively allow packets through as specifically needed.

Translating to timezones: default to UTC and configure hosted software to display local timezones as needed. Only have the system in a different TZ if something won't work properly otherwise (including it can't be configured to display in something other than the system TZ) or is date sensitive and buggy in a way that it gets confused about yesterday/today/tomorrow when local midnight and UTC midnight differ.

Also if considering resources used by people in multiple timezones, you have to pick something and translate for others, you might as well pick UTC as some things assume it.

> Or that you should only type ASCII because Unicode is hard.

Again yes, by default, and always for things like hostnames, and adjust as needed or when you know Unicode is supported end-to-end (or that where it isn't, something just looks wrong and firefighting that is preferable to not sticking to ASCII in the first place). Though it helps from my point of view that the most obvious omission from ASCII in my locale is our currency symbol.

I say the same for spaces in filenames too: while you perhaps shouldn't have to avoid them, in practice you are setting yourself up for potential trouble later if you don't.

> EDIT: I'm just saying that you can't blame the user for using the features that come with the system.

I agree completely.

But, for instance, while I won't victim blame when someone accidentally leaves doors unlocked and has things stolen, I do make an effort to be damn sure my doors and windows are secure and recommend others do too. There is a difference between victim blaming and recommending preventative defaults because you know there is shit code out there.


It doesn't really matter if it's hard or not, you will simply just keep running into problems like this without such a policy.


Or use better written software that does work.


So, get rid of systemd over this? Seems impractical.


Don't you mean /usr/share/zoneinfo/Etc/UTC ? If I remember correctly, /usr/share/zoneinfo/Etc is a directory, and /etc/localtime should point to a zoneinfo file.
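A quick way to check whether a given /etc/localtime target is usable at all, sketched in Python. The helper name `looks_like_tzfile` is invented for illustration; the only facts it leans on are that compiled zoneinfo files begin with the four bytes "TZif" (see tzfile(5)) and that `Path.is_file()` follows symlinks, so a link pointing at a directory such as /usr/share/zoneinfo/Etc fails the check.

```python
from pathlib import Path


def looks_like_tzfile(path):
    """True if path resolves to a regular file starting with the TZif magic bytes."""
    p = Path(path)
    if not p.is_file():  # follows symlinks; directories and dangling links fail here
        return False
    with p.open("rb") as handle:
        return handle.read(4) == b"TZif"


if __name__ == "__main__":
    for candidate in ("/etc/localtime",
                      "/usr/share/zoneinfo/Etc",
                      "/usr/share/zoneinfo/Etc/UTC"):
        print(candidate, looks_like_tzfile(candidate))
```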


Thread is specific about linking to Etc. Also from the linked thread:

> I've noticed systemd (PID=1) is running at 100% CPU utilization. That's not normal. I ran this command:

    strace -f -p 1

> The process was reporting continuously attempt to access /etc/localtime file. That's also unusual.

> I managed to get the output I was receiving on the console while tracking systemd process.

    stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3522, ...}) = 0
    stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3522, ...}) = 0
    stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3522, ...}) = 0

Maybe if it's endlessly failing to stat the target, any valid target might help.


Strace is great but it only shows syscalls. The section of code calling the stat will have to be found.

stat returns 0 on successful completion.


Hm, not sure, already closed the ssh. But it did work for us.

The correct fix would be timedatectl, but as you can guess, that's broken together with systemd.


Almost all software bugs come from managing state. Which is why hoarding state management in PID 1 is a dumb idea. I said this when systemd was gaining traction and I'm still saying it now.

When Debian switched to systemd in the default was when I began to seriously use OpenBSD and I've never regretted the decision.


> Which is why hoarding state management in PID 1 is a dumb idea.

Because the Unix process API is unreliable and unsafe ( http://catern.com/process.html ), managing processes is difficult to impossible in a general way under Unix systems including Linux. Linux gained some features to help with this, but for the longest time, PID 1 was the only process that could manage its child processes in a way that wasn't fraught with race conditions -- meaning systemd or something like it was the only way to have safe, effective process management because it tracked all that state in PID 1.

Blame Unix being broken, not systemd, for the problems systemd solves.


Because the litany of shittily written bash scripts don’t hoard state, or what exactly are you saying?

The only thing I dislike about systemd is that it is written in C, but otherwise it is a single purpose solution to the surprisingly complex problem of managing services during boot and running. I fail to see how state is relevant here — if it were written in Haskell it would also have state, almost by definition it has to handle external state. No Monad would help you there, there is just correct and not correct software.


> single purpose solution

You can't be saying this with a straight face. systemd project subsumes functionality of

- device hotplug

- user session manager

- cron

- logging

- network management

- time zone management

- time sync

- cgroup manager

- deleting /tmp files

- DNS resolver

- UEFI boot manager

and also a dozen other things I've forgotten.


You do realize that there is a systemd project and a systemd program? Is the plasma5 shell bloated because under the KDE umbrella there are hundreds of other projects?


https://bugs.launchpad.net/bugs/1966800 looks like a potentially related Ubuntu bug, although the underlying bug is in systemd. The commit that fixed it was in 2021: https://github.com/systemd/systemd-stable/commit/a8b66ca9af8...

The Ubuntu bug is from 2022.

Curious that Proxmox is hit in 2023. Proxmox must happen to have a particular systemd timer event set up by default that triggers this.


Proxmox runs Debian under the hood. Proxmox 7.x is based on Debian 11.x:

* https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_7.0


This is just another one of those reasons I just can't use Debian responsibly.

Backport hacks always come back to bite you in the end when your distro is running unsupported software, and it feels like I see something at least annually about how Debian's "ensure everything is old so all bugs are predictable" approach ends up just causing pain.


It's the same for every distro. Millions of 5+ year old Ubuntu LTS boxes everywhere. End of life this April. And predictably people will just double down, and keep using an unsupported thing, because upgrades are scary/hard/etc.


Since this only seems to happen in Ireland, could it be because Ireland technically has summer time as its standard time, and winter time as negative DST? Not sure how that's implemented in tzdata, but it could very well be that some applications can't handle that.


Here's the tzdata: https://github.com/eggert/tz/blob/71faa2a55db2c9f21f4099b58c...

Looks like the negative DST has already caused problems in the past and applications that can't handle it (ICU & OpenJDK) have to build tzdata as per the rearguard section / ziguard.awk.
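To see what the Irish rules look like from an application's side, here is a small Python sketch. It assumes Python 3.9+ with zoneinfo and an installed tz database, and the helper name `describe` is made up. The UTC offsets are the stable part; what `dst()` reports for the winter months may differ depending on whether the installed tzdata was built from the main data or from the rearguard form mentioned above, so the sketch prints it rather than asserting anything about it.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

DUBLIN = ZoneInfo("Europe/Dublin")


def describe(year, month, day):
    """Return (abbreviation, UTC offset, DST component) for noon on a Dublin date."""
    local = datetime(year, month, day, 12, tzinfo=DUBLIN)
    return local.tzname(), local.utcoffset(), local.dst()


if __name__ == "__main__":
    for when in ((2023, 1, 15), (2023, 7, 15)):
        name, offset, dst = describe(*when)
        # A negative dst() in January would reflect the "winter time is the
        # DST variant" encoding; zero would suggest rearguard-style data.
        print(when, name, offset, dst)
```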


I have to say these time zone files are fascinating. Any place/time when a jurisdiction has changed the definition of time is in those files, along with a detailed history of why.

For example, California had an energy crisis soon after WWII, and changed when daylight saving time started/ended. There are time zone rules that track this history exactly, including a note about how clocks lost 6 minutes a day because PG&E changed AC from 60 to 59.5 Hz:

https://github.com/eggert/tz/blob/71faa2a55db2c9f21f4099b58c...


I am very surprised ICU can't handle it. ICU is the de facto standard of handling crazy localisation data


As far as I recall, Ireland is the only country to have its timezones set up that way.


I see 3 countries with negative DST in their RULE records:

* Eire after 1971: https://github.com/eggert/tz/blob/main/europe#L526

* Morocco after 2019: https://github.com/eggert/tz/blob/main/africa#L928

* Namibia between 1994 and 2017: https://github.com/eggert/tz/blob/main/africa#L1167


Plus Czechoslovakia 1946/47 and Chile since 2016, see https://en.wikipedia.org/wiki/Winter_time_%28clock_lag%29


If that resulted in a backwards time it wasn’t expecting...


Heh, database corruption expected in -1... -2... -3...


It should be easy enough to test -- set the system to Europe/London or Europe/Lisbon.


Yes I also run my servers on UTC.

But.

It's 2023. Setting a non-UTC timezone shouldn't break your system (or applications). Bugs happen and this is one of them, but "set your system to UTC" isn't an appropriate response here.


It's not a fix, it's a mitigation.

It's perfectly appropriate to provide a workaround before the problem has actually been resolved.


Bugs should indeed be fixed, but using anything but UTC on a server is swimming against the tide. Bugs get fixed in old software over time, but new software is written every day without consideration that time can jump around. And even old software is from time to time rewritten in new languages, with old knowledge / workarounds thrown out with the bathwater (old code). So I don't believe we will run out of bugs related to the DST switch anytime soon.


An updated systemd package that includes a fix [0] is in progress on getting rolled out, currently on the pvetest repository.

With that the following reproducer is fixed:

    TZ=Europe/Dublin faketime 2023-03-26 systemd-analyze calendar --iterations=5 'Sun *-*-* 01:00:00'

[0]: https://git.proxmox.com/?p=systemd.git;a=commit;h=e7eb2b5864... (just noticed that I wrote Ireland/Dublin instead of Europe/Dublin in the commit message)

[1]: https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_t...
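For anyone wondering why that calendar spec on that date is special, here is a small Python sketch. It does not reproduce the systemd bug itself. It only shows, under the EU rule that clocks go forward on the last Sunday of March at 01:00 UTC, that the wall-clock time 01:00:00 never occurs in Dublin on 2023-03-26. The `last_sunday` helper is a hypothetical name, and the fixed offsets stand in for real tzdata.

```python
import calendar
from datetime import datetime, timedelta, timezone


def last_sunday(year, month):
    """Return the day of the month of the last Sunday in the given month."""
    days_in_month = calendar.monthrange(year, month)[1]
    last_day = datetime(year, month, days_in_month)
    # weekday(): Monday is 0, Sunday is 6
    return days_in_month - (last_day.weekday() + 1) % 7


# EU-wide rule: clocks go forward on the last Sunday of March at 01:00 UTC.
day = last_sunday(2023, 3)
transition_utc = datetime(2023, 3, day, 1, 0, tzinfo=timezone.utc)

# Fixed offsets as stand-ins: Dublin is on UTC+0 before the jump, UTC+1 after.
before = timezone(timedelta(hours=0))
after = timezone(timedelta(hours=1))

wall_before = (transition_utc - timedelta(seconds=1)).astimezone(before)
wall_after = transition_utc.astimezone(after)

print(day)                                # 26
print(wall_before.strftime("%H:%M:%S"))   # 00:59:59
print(wall_after.strftime("%H:%M:%S"))    # 02:00:00
```

So a timer asking for Sunday 01:00:00 local time lands exactly in the skipped hour, which is where a "find the next matching time" loop has to be careful.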


How is this anything other than a systemd bug?

For servers I manage, I absolutely refuse to use an OS with systemd. There have been too many unpredictable cases.


Do you mind sharing some? I would need some real examples to show to the team.


The above, of course.

The DNS related issues seen (have they now been resolved?) where it takes over the DNS resolver functions and refuses to honor what you have put in e.g. /etc/resolv.conf ; a web search for "systemd breaks normal dns resolv.conf" will give you a lot of results ; unsure if this is now fully fixed.

Strange things with LXC based VMs moved/migrated between hosts, even if an offline move; I think in 1 case I just ended up creating a new VM from scratch; then rsync'ing everything over from the host node into the VM's directory.


I thought that having the network subsystem manage resolv.conf dynamically had been pretty standard since well before systemd came along.

e.g. Most DHCP clients have done that since... well, probably since DHCP was invented. See, for example:

https://wiki.debian.org/resolv.conf

If you want a static resolv.conf, but you're not using a completely static network config and have some kind of dynamic network management daemon, be that `dhclient`, or `NetworkManager`, or `systemd-resolved`, you've always needed to explicitly configure it to leave resolv.conf alone.

Not sure how this is a systemd issue?


The DNS issue here seems like you just wanted to use DNS like you had before despite the OS changing. That isn't really a bug, just a change in how the OS operates. If you don't want that, you can disable that feature.


Which services are enabled by default is up to each distribution. In your case I'm guessing it might be systemd-resolved or systemd-resolvconf, if not NetworkManager. Just disable what you don't want.

In terms of problematic default services, I'd look at GNOME. systemd is just the medium.


I guess it can't be all DST timezones, because here in PDT nothing broke during the transition two weeks ago.


Same here. I am adjacent to two vendor-supported Proxmox clusters (Motorola Emergency CallWorks 911 phone system) and there was no outage / issue last fall moving from EDT to EST, nor two weeks ago flipping back. We’re a 24x7 operation and would have heard about any outage (either planned to address the issue before it occurred, or triage during an unplanned outage).


Seems to be (as of now) Ireland?


I ran into an issue where I shut down a system with Proxmox before the start of EDT, shipped it to a co-worker, and when he powered it on after daylight saving time was in effect, the time was off by 1 hour for Proxmox and all LXC containers.

Fix was to re-symlink `/etc/localtime` to proper time zone.

Didn't want or care to dig into this deeper, but interesting to see that there are more issues about this.


I don't understand why someone would want their servers running on anything but UTC and only use time zones in the application itself.


Agree. In this case it was technical debt, unfortunately.


There is a similar problem with the Dublin timezone in Davx5 / iCal4j: https://github.com/bitfireAT/davx5-ose/discussions/265 (and no one cares)


System time should be utc. Your software should use a timezonedb or offset when displaying it to show you YOUR local time


Luckily, it did not happen to me. I manage the "timezone" by default.

For anyone interested, I manage around 20+ Proxmox machines with these Ansible roles:

https://github.com/liv-io/ansible-roles-debian

Example playbook:

https://github.com/liv-io/ansible-playbooks-example/blob/mas...


Hugged to death, archive: https://web.archive.org/web/20230325234412/https://forum.pro...

EDIT: Seems to be back.


100% expect this to do with mixing Unix timestamps and timezones. Constant recurring footgun in my career.


This should be the CEST Austria Daylight Savings change tonight. https://www.visitingvienna.com/visitorinfo/time/

Proxmox is based out of Vienna, Austria.

So a Timing glitch.


Is the extra "s" after "daylight saving" canon yet?


It's like advertising how to purchase Advanced Tickets to a concert or event.

Because who wants to be bothered with Novice or Intermediate Tickets?


Many places in the Southern Hemisphere have been on daylight savings time over the past few months, I'm wondering how come this bug did not surface earlier.


Just the usual reminder that systemd is crap and that it is terrible for us that it was force-shoved onto everyone in Debian and co.

The main issue was always this one: having side functions like dealing with timezones break your pid 1. In such a case it is almost impossible to resolve the problem from the system itself; you have to mount the drive with another OS or reinstall when possible.


im sure any init script has similar issues if (like systemd in this case?) not updated for a while


I love proxmox but there’s a reason it’s rarely used in enterprise production…


I think Proxmox adoption (both enterprise and SMB) has increased a lot in the last 6-12 months re VMware's new pricing changes.


References?


Ah, this is why I saw an update for `tzdata` this morning in my proxmox (?).


Why would you use daylight savings on a server, let alone a hypervisor?


i hate systemd it's always getting hung up on shutdown. "a stop job is running for blah blah" yea no. when i'm ready to shut down, it needs to shut down. immediately.


No one else does.

When I shut down I would like a shutdown signal to be sent to my database and then continue once it gets a successful response. No one wants that to just be killed because that would cause data loss.

When I shut down I want my filesystem synced so things mid-write don't experience data loss.

The problem is badly written scripts that keep sending a "one more min" response. But you could override that if you actually cared instead of just whining.


Also, systemd comes with a timeout, after which it kills the app anyways, so I think you can surely configure that 90s timeout to be something like 5s or so, which then is much faster.

But as you wrote, If someone really cares, they would fix it.

PS: But what usually gets me about these 90s waits whenever I get them is that the message does not say which thing (unit, etc) is the issue. THAT is something worth criticising.


To my understanding systemd does a proper job here, as it is supposed to, highlighting the issues with other subsystems, be it badly written scripts or a misordered shutdown sequence. Those parts were ignored by "traditional" init systems before.


Exactly my point.

If it is taking 5 mins for my db to respond with "success" then either fix the db or switch it.

That's not systemd's fault that the database is rubbish.


If your database loses data when killed it is not worth the bits it is made of.


If a transaction is still in flight and is not committed then you will lose it.

Just because you decided to kill the database while it was still in use, that's your fault.


why are your servers not UTC? sheesh.


We had a requirement from a Government agency to store the date/time in the system's local time (Ireland, in our case). All certs issued had to have the time it was issued, and they had to be local.


I hope a complaint has been filed about this policy.


haha, amazing! i’ll put my pitchfork away :)


All systems in common use, use UTC for the underlying representation.

The concept of setting the time zone is a high level library thing already and all these systems, their standard time functions simply provide a time since some fixed epoch.

I.e. see C time(), gettimeofday, clock_gettime, GetSystemTime. They’re all “UTC” based.

If a piece of server software is going out of its way to deal with local time APIs and blows up because of it - well it either legitimately needs to deal in local time, or it’s so poorly designed that simply keeping the system locale set to UTC isn’t somehow a magic fix. That’s just an adhoc assumption.
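A quick way to see the "underlying representation is UTC" point is the following minimal Python sketch. It uses fixed offsets from the standard library, not a real time zone database, and the variable names are just for illustration.

```python
from datetime import datetime, timedelta, timezone

# One instant, expressed as seconds since the Unix epoch.
instant = datetime(2023, 3, 26, 1, 0, tzinfo=timezone.utc)
epoch_seconds = int(instant.timestamp())

# Two renderings of that same instant: UTC and a fixed UTC+1 offset.
utc_view = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
plus_one_view = datetime.fromtimestamp(
    epoch_seconds, tz=timezone(timedelta(hours=1))
)

print(utc_view.isoformat())
print(plus_one_view.isoformat())

# The wall-clock fields differ, but both map back to the same epoch value.
print(int(utc_view.timestamp()) == int(plus_one_view.timestamp()))
```

The offset only changes how the number is displayed. Trouble starts when code goes from wall-clock fields back to an instant, because that direction is where gaps and repeated hours live.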



