
My first compiler bug was in my first year at Google. I'd just introduced a new system for the animation that updates your position while driving in Google Maps. It was perfectly buttery smooth as planned, except on my manager's commute the next day, where it constantly lurched back and forth. The others on the team were convinced that it had to be something with my code, but I didn't think it could be, because my code had no conditional statements and should either always be right or always be wrong.

It kind of looked like it was being fed nonsense speed values, so I got the GPS log from my manager and checked - but no weird speed values, actually a remarkably clean GPS log. Replayed his GPS on my phone - worked perfectly fine, buttery smooth. Eventually it came out that it only happened on my manager's phone. Borrowing said phone and narrowing things down with printf, I showed that my core animation function was being called with the correct values (a, b, c, d) but was being run with the wrong ones (a, a, c, d). This is when my manager thought to mention that he was running the latest internal alpha preview of Android.

Searching Android's bug tracker for JIT bugs, I found that they had a known register aliasing bug. Honestly I have no idea how it ran well enough to get to my code in the first place. But I tagged my weird animation bug as related to that (they didn't really believe me) and ignored it until they fixed their thing, at which point it went away.



JIT bugs must be horrible to debug because they happen in full running programs and might not always result in crashes. .NET's RyuJIT had a bug a few years ago that caused certain calculations to go wrong under specific conditions.


I ran into several other JIT bugs working on Google Maps. Since I worked on location, my code was the first complex thing that ran under most flows, and rather prone to being the first thing to crash when the system behaved in a very dumb way. About once a year we'd get a spike of crash reports from some ancient version of Android where a third party had built a custom JIT to get the system to run on very cheap hardware. Then the crash would be something absurd, like a null pointer on a value that was never assigned null, or an ArrayIndexOutOfBounds on a statically sized array that was only ever accessed with constants.

Sometimes someone would actually try to solve these, but I preferred the times where someone just changed the ProGuard config and hoped it shuffled things around enough that it didn't trigger whatever bug the JIT had.


> spike of crash reports from some ancient version of Android

Quasi-related question: app installs/updates have been very very consistently SIGBUSing system_server with BUS_ADRALN on my ancient Galaxy Note 3 for about a year, but downgrading the Play Store app to the factory version (5.5.12 :D) makes the problem go away completely.

I've tentatively considered writing something up for issuetracker.google.com, but besides tons of logcats I'm not sure what info to provide, and I also wonder if the info would be considered useful due to the age of the device. Any advice on whether/how to proceed would be appreciated!


I think the biggest problem you're going to run into is that there are very few people assigned to maintaining old systems, and it's quite hard to get these issues brought to anyone's attention. The ones I mentioned only got fixed because they crept up to the top most common causes of crashes.

A bug report would be the first thing I'd want for something like this - in addition to logcat info it includes a bunch of process information, and apps can add custom data to the report. It'd also be nice to know anything to narrow down when the problem started - whatever bounds you can provide on when you were last absolutely certain it was up to date and working, plus your best guess. Someone's going to have to dig a Galaxy Note 3 out of a drawer somewhere, go into the archives of old Play Store versions, and try installing them until they find the version that broke it, so anything that makes that binary search process less terrible would be a start.
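
To make the binary search concrete, here is a minimal sketch in Python. It assumes the versions are ordered oldest to newest and that once a version is broken every later one is broken too. The name `first_bad_version`, the `is_broken` callback, and the placeholder version labels are all hypothetical; in practice `is_broken` would mean installing that version by hand and checking whether the crash shows up.

```python
def first_bad_version(versions, is_broken):
    """Binary search for the first version where is_broken(version) is True.

    Assumes `versions` is ordered oldest to newest and that every version
    after the first broken one is also broken. Returns None if no version
    is broken.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_broken(versions[mid]):
            hi = mid          # first bad version is at mid or earlier
        else:
            lo = mid + 1      # first bad version is after mid
    return versions[lo] if lo < len(versions) else None


if __name__ == "__main__":
    # Placeholder labels, not real Play Store version numbers.
    versions = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"]
    # Simulated check: pretend everything from v6 onwards crashes.
    print(first_bad_version(versions, lambda v: v in {"v6", "v7", "v8"}))
```

With N candidate versions this needs about log2(N) installs rather than N, which is why any tighter bound on "last known good" helps so much.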


Thanks for replying!

> there are very few people assigned to maintaining old systems, and it's quite hard to get these issues brought to anyone's attention. The ones I mentioned only got fixed because they crept up to the top most common causes of crashes.

Thanks for this info. It kinda matched my own intuition about what to expect, but I wasn't sure where/how I might find out for sure. It's nice to be able to calibrate how to optimize effort.

> A bug report would be the first thing I'd want for something like this - in addition to logcat info it includes a bunch of process information, and apps can add custom data to the report.

Ah, of course. That's straightforward to do; I can just watch logcat for system_server crashes then automatically take bug reports when they occur.

> It'd also be nice to know anything to narrow down when the problem started - whatever bounds you can provide on when you were last absolutely certain it was up to date and working, plus your best guess.

IIRC™, this has been happening from day 1 when I was given this phone (it was previously sitting unused in a drawer, yay).

> Someone's going to have to dig a Galaxy Note 3 out of a drawer somewhere, go into the archives of old Play Store versions, and try installing them until they find the version that broke it, so anything that makes that binary search process less terrible would be a start.

Oh meep. Of course.

I'd have no problem loaning the device (perhaps the carrier-specific image it's using is implicated), but that's probably a huge pile of overhead to deal with. I wonder if I can help out with the bisection process myself?

Hmm. Given that this is the Play Store and the device is (currently) not rooted, I couldn't do it the "russian" way with random APK sites even if I wanted to.

I'm very curious what the internal path is here - does it depend on rooting or is there a way to isolate devices and send them specific versions of apps?

(Now I'm imagining being given a (bespoke) HTTP endpoint to hit (just for this) to switch versions...)

Maybe the best path forward would just be to root this thing already. I'd be very surprised if rooting affected the situation.

Thanks very much for the thought about bisecting GPS though. Definitely hadn't thought of that myself at this point, and don't think I would have anytime soon.


We just used adb install. It might've involved installing a special version of Android, but I deliberately forgot about that process because it sucked. Either way, I'd be kind of surprised if Play wasn't a special case, no clue what they do. (I'm no longer at Google, can't even check.)

Good luck with your quest though!


> I'd be kind of surprised if Play wasn't a special case

It probably is, but there's also the

  Failure [INSTALL_FAILED_ALREADY_EXISTS]
brick wall which makes perfect sense, is not especially specific to the Play Store, and will likely require root (then some careful `mv`s) to squish. Yayyy.

> Good luck with your quest though!

Much appreciated :) thanks again for the insight/feedback!


I caught a JIT bug in .NET back when I worked at Microsoft. Unfortunately I don't remember the specifics, but it was doing something wrong when trying to optimize a bit of code using AVX instructions.

I was immediately sure it was a compiler bug when I realized I could make the code work correctly if I changed the order of variable declarations. It would happen on something like:

int a; int b; float c;

(again, that's only illustrative - too long ago for me to remember the specifics)

But not with:

int a; float c; int b;

Folks in my team didn't want to believe it at first :)

Edit: found it here - https://github.com/dotnet/runtime/issues/17395

Not quite as I described but close.


That not only shows it's a compiler bug (assuming the program had no undefined behavior), it gives a general technique for generating compiler tests. Take any program, randomly change the order of declarations in a way that doesn't alter the meaning of the program, and see if both versions have the same behavior on some set of inputs.

This is an example of "metamorphic testing".
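
A toy sketch of the idea in Python, under loud assumptions: the "program" here is just a few independent declarations plus a body, run with `exec`, and the names `DECLARATIONS`, `BODY`, `run_variant`, and `find_mismatches` are made up for illustration. A real compiler test would generate source in the target language and compile it. Because there are only three declarations, this tries every ordering instead of sampling random ones.

```python
import itertools

# Hypothetical toy program: independent declarations followed by a body.
# Reordering the declarations must not change what the body computes.
DECLARATIONS = ["a = 0", "b = 0", "c = 0.0"]
BODY = """
a = x + 1
b = a * 2
c = b / 4
result = (a, b, c)
"""


def run_variant(declarations, body, x):
    """Execute one ordering of the declarations plus the body; return `result`."""
    source = "\n".join(declarations) + "\n" + body
    env = {"x": x}
    exec(compile(source, "<variant>", "exec"), env)
    return env["result"]


def find_mismatches(declarations, body, inputs):
    """Compare every declaration ordering against the original ordering.

    Returns a list of (ordering, input, got, expected) tuples, one for each
    variant whose result differs from the original program's result.
    """
    mismatches = []
    for x in inputs:
        expected = run_variant(declarations, body, x)
        for ordering in itertools.permutations(declarations):
            got = run_variant(list(ordering), body, x)
            if got != expected:
                mismatches.append((ordering, x, got, expected))
    return mismatches


if __name__ == "__main__":
    # An empty list means no ordering disagreed with the original.
    print(find_mismatches(DECLARATIONS, BODY, inputs=range(5)))
```

Any non-empty result would point at the thing executing the program, not the program itself, since the variants are equivalent by construction.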


Mozilla engineer Yulia Startsev has a series of live-coding YouTube videos called "Compiler Compiler" about developing and debugging Firefox's SpiderMonkey JS interpreter and JIT:

https://www.youtube.com/results?search_query=Mozilla+Hacks+C...


Yep, JIT bugs are a bear.

We use LuaJIT pretty extensively and some codepaths trigger a crash-on-assert which we've confirmed is a JIT bug. But not consistently, the weather has to be just right. 95% of the time, randomly inserting `assert(true)` into the code un-breaks it.

We're using the beta, so this is an expected sort of thing. I'd love to be the guy who squashes it, but that's way outside my wheelhouse. I'm left crossing my fingers and hoping one of the updates makes it go away.

Globally disabling the JIT is a nice way to confirm that using LuaJIT was a good idea, though. Instant code starts taking seconds to complete.


Back when I worked on IBM's J9, the nightmare scenario was either a dropped memory fence, or the map keeping track of links between register / stack references and GC heap objects getting screwed up. Things could go on a very long time before those errors manifested visibly.



