jtwaleson on Aug 24, 2018 | on: Former Tesla Firmware Engineer Discusses the Syste...

248 days of uptime by any chance?

Edit: I saw the same on a fleet of thousands of JVMs, which very consistently hung at 100% CPU after 248 days. The closest thing to an explanation I ever got was that perhaps uptime is stored in hundredths of a second (why not ms???) in a signed 32-bit integer; see: https://ma.ttias.be/248-days/

In the end we solved it by restarting with a cronjob between 2am and 4am after 247 days...
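For reference, the arithmetic behind the 248-day figure can be sketched as follows. This assumes, as the comment speculates, a counter that ticks 100 times per second and lives in a signed 32-bit integer; the function names and the 247-day threshold check are illustrative, not taken from any JVM or kernel source.

```python
# Assumption: uptime is counted in hundredths of a second (100 ticks/s)
# and stored in a signed 32-bit integer, as speculated in the comment.
TICKS_PER_SECOND = 100
INT32_MAX = 2**31 - 1
SECONDS_PER_DAY = 86_400


def days_until_overflow(ticks_per_second: int = TICKS_PER_SECOND) -> float:
    """Days of uptime before a signed 32-bit tick counter wraps."""
    return (INT32_MAX + 1) / ticks_per_second / SECONDS_PER_DAY


def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def needs_restart(uptime_seconds: float, limit_days: int = 247) -> bool:
    """Hypothetical check a nightly cronjob could run before restarting."""
    return uptime_seconds >= limit_days * SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{days_until_overflow():.2f} days at 100 ticks/s")
    print(f"{days_until_overflow(1000):.2f} days at 1000 ticks/s")
    print("one tick past INT32_MAX:", wrap_int32(INT32_MAX + 1))
```

At 100 ticks per second the counter wraps after roughly 248.55 days. A millisecond counter of the same width would wrap after about 24.86 days, which may be one reason a coarser tick was chosen.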

Bender on Aug 25, 2018

One thing to look at is the sum of AnonPages if you have THP (transparent huge pages) enabled. That was enabled by default after CentOS 6.2. The usage itself isn't an issue, but there is a known memory leak in THP, and fragmentation can get wedged after a couple hundred days, depending on the usage characteristics of the server.
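A minimal sketch of how one might inspect this on a Linux host. It assumes the standard `/proc/meminfo` layout and the mainline sysfs path `/sys/kernel/mm/transparent_hugepage/enabled`; as far as I know, RHEL/CentOS 6 kernels expose the setting under `redhat_transparent_hugepage` instead, so the path may need adjusting. The helper names are hypothetical, and the script only reports values; it does not detect the leak itself.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Map /proc/meminfo field names to their numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def thp_mode(text: str) -> str:
    """Return the bracketed active mode from e.g. 'always madvise [never]'."""
    start, end = text.find("["), text.find("]")
    return text[start + 1:end] if 0 <= start < end else "unknown"


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    # Assumed mainline path; older RHEL/CentOS kernels may differ.
    thp = Path("/sys/kernel/mm/transparent_hugepage/enabled")
    if meminfo.is_file():
        info = parse_meminfo(meminfo.read_text())
        print("AnonPages (kB):    ", info.get("AnonPages"))
        print("AnonHugePages (kB):", info.get("AnonHugePages"))
    if thp.is_file():
        print("THP mode:", thp_mode(thp.read_text()))
```

Logging these values periodically and comparing them over weeks of uptime is one way to see whether anonymous memory keeps growing.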
