If you have a data race, your program can have all sorts of inconsistent state in all sorts of objects -- even standard ones. In Java, I don't think you have anything like "undefined behavior", sometimes jokingly but meaningfully called catch-fire or launch-the-nukes semantics, but that doesn't prevent a data race from breaking program invariants in ways that don't immediately or ever generate exceptions.
There was a little optimization post a few days ago: https://news.ycombinator.com/item?id=36618344. Basic summary of the program: given a NUL-terminated buffer like "spppssss", return an int where each 's' adds 1 and each 'p' takes 1 away. Several of the alternate implementations in the HN comments involved calling strlen first, which has the nice effect of being highly vectorized; knowing the size, the compiler can better vectorize the actual logic loop, and a human can vectorize better still. If we imagine another piece of code that concurrently modifies the input buffer to write a NUL at index 0, it would change the length to 0. To be generous and to simplify the discussion, let's say the concurrent modification happens to become visible to our thread after strlen but before the following loop. Is there now a bug in this implementation?
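To make the scenario concrete, here is a rough Java sketch of the two-pass shape (measure, then loop). It is not the code from the linked post. The class name, the helpers, and the `betweenPasses` hook are all made up for illustration; the hook stands in for another thread's write becoming visible at exactly the generous point described above, so the outcome is repeatable instead of depending on a real race.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StrlenThenLoop {
    // Length of a NUL-terminated buffer, in the spirit of C's strlen.
    static int strlen(byte[] buf) {
        int n = 0;
        while (n < buf.length && buf[n] != 0) n++;
        return n;
    }

    // Two-pass version: measure first, then count over the measured length.
    // `betweenPasses` is a hypothetical hook simulating a concurrent write
    // that becomes visible after strlen but before the counting loop.
    static int count(byte[] buf, Runnable betweenPasses) {
        int len = strlen(buf);
        betweenPasses.run();
        int result = 0;
        for (int i = 0; i < len; i++) {
            if (buf[i] == 's') result++;
            else if (buf[i] == 'p') result--;
        }
        return result;
    }

    // Helper (illustrative): ASCII text plus one trailing 0 byte.
    static byte[] nulTerminated(String s) {
        byte[] text = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(text, text.length + 1);
    }

    public static void main(String[] args) {
        byte[] quiet = nulTerminated("spppssss");
        System.out.println(count(quiet, () -> {}));           // prints 2

        byte[] raced = nulTerminated("spppssss");
        System.out.println(count(raced, () -> raced[0] = 0)); // prints 1
    }
}
```

The second call returns 1, which matches neither the original buffer (2) nor the modified one read consistently (0, since its length is now 0). No exception is thrown and the function body has nothing to fix locally.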
If the point is that java.lang.String must ensure its pre-conditions and post-conditions are always maintained, because it's so fundamental to other security functions of the JVM, then there is no solution that allows for untrusted/buggy shared memory concurrency. Any suggested modifications to the constructor like "add a lock" or "copy the buffer first" will not work, and are missing the point entirely.
If the issue is the string intern-ing of such a dodgy String, then the best you can do is code intern() more defensively, to throw out detectably dodgy input. I could then understand saying intern() has a bug, but I don't think that was the point of the post, since there is no discussion of the intern() code.
That's a lot of talking around the issue and I think you're missing it.
We're talking about constants being corrupted in code which has no control or data flow relationship with the code doing the corrupting. Code which is totally bug free can exhibit impossible behavior.
That might not sound disturbing to you if you're talking about strlen(), because languages with unrestricted pointers are full of undefined behavior. But Java isn't like that. Java doesn't come with nasal demons.
I didn't say intern() has a bug.
The bug is Java having undefined behavior of the kind which breaks a language invariant as strong as string equality.
You're arguing that there isn't a bug because non-determinism comes from concurrency. Well, you can't have concurrency without non-determinism. That doesn't mean the language needs to stop working.
This is perhaps an unnecessary clarification, but there's plenty of non-determinism that's perfectly fine. Data races are non-deterministic and bad, but "non-determinism" itself is not a scare word.
I spent a good decade of my career writing C#, which is similar enough to Java for these discussions, and the strlen issue I talked about is not limited to C -- it is also an issue for a managed language like C# or Java. I think you would do well to discuss the problem, instead of dismissing it because I and the original post I referenced are tainted by the nasal demons of undefined behavior in C. Don't use that as an escape hatch from dealing with the general issue presented in the problem.
I know of no language in existence that guarantees, even in the face of data races, that either all data invariants are maintained or an exception is thrown. Note that Rust is not such a language: its goal is to make data races compilation errors, but unsafe code can introduce data races that do not generate compilation errors.
I would also suggest you offer even a sketch of a fix for the String(char[]) constructor if you think that is the source of the bug. I don't think one is possible. I'm not aware of any method to unilaterally atomically copy an unbounded char[], for example. Even if there is, I suggest you go look for other functions that take other kinds of mutable objects as parameters -- I doubt that for each and every case there is a method to unilaterally atomically deep copy those objects. Such copying is just one suggestion, feel free to resolve it any way you think works.
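As a rough illustration of why "copy the buffer first" is not atomic on its own, here is a small Java sketch. It says nothing about how the JDK actually copies arrays. `copyWithInterleaving`, `pauseAt`, and `writer` are hypothetical names, and the hook simulates another thread mutating the source partway through the copy so that the result is repeatable.

```java
import java.util.Arrays;

public class TornCopy {
    // Element-by-element defensive copy. `writer` is a hypothetical hook
    // standing in for another thread that mutates `src` once `pauseAt`
    // elements have already been copied.
    static char[] copyWithInterleaving(char[] src, int pauseAt, Runnable writer) {
        char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            if (i == pauseAt) writer.run();
            dst[i] = src[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        char[] shared = {'a', 'a', 'a', 'a'};
        char[] snapshot = copyWithInterleaving(shared, 2, () -> Arrays.fill(shared, 'b'));
        System.out.println(new String(snapshot)); // prints aabb
    }
}
```

The source array only ever held "aaaa" and then "bbbb", yet the copy holds "aabb". The copy is private afterwards, but its contents are a state the caller's buffer was never in, and the copier cannot prevent that without the writer's cooperation.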
If you don't want to sketch a solution, another experiment you can try is to use a non-thread-safe mapping type like TreeMap, and make structural modifications (add/remove elements, not just modify existing values) from multiple concurrent threads without any synchronization. Can you guarantee that an exception will be thrown in every case where invariants are broken?
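One possible harness for that experiment, as a sketch only: the class name, key range, thread counts, and the two invariants checked (iteration order and size() versus entries actually walked) are my own choices, and the output is non-deterministic by design. Workers are daemon threads and the final walk is bounded, on the assumption that a corrupted tree may cause an endless loop.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

public class TreeMapRaceExperiment {
    // Returns a short report; its contents vary from run to run.
    static String run(int threads, int opsPerThread, long timeoutMillis)
            throws InterruptedException {
        TreeMap<Integer, Integer> map = new TreeMap<>(); // deliberately unsynchronized
        AtomicInteger exceptions = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final long seed = t;
            workers[t] = new Thread(() -> {
                Random rnd = new Random(seed);
                for (int i = 0; i < opsPerThread; i++) {
                    int key = rnd.nextInt(1024);
                    try {
                        // Structural modifications only: insert or remove.
                        if (rnd.nextBoolean()) map.put(key, i);
                        else map.remove(key);
                    } catch (RuntimeException e) {
                        exceptions.incrementAndGet(); // the damage was at least noticed
                    }
                }
            });
            workers[t].setDaemon(true); // a stuck worker must not keep the JVM alive
            workers[t].start();
        }
        long deadline = System.currentTimeMillis() + timeoutMillis;
        for (Thread w : workers) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining > 0) w.join(remaining);
            if (w.isAlive()) {
                return "exceptions=" + exceptions.get() + ", a worker is stuck; checks skipped";
            }
        }
        // All workers finished: walk the map, bounded in case of a cycle.
        int walked = 0;
        boolean sorted = true;
        Integer previous = null;
        try {
            for (Map.Entry<Integer, Integer> e : map.entrySet()) {
                if (previous != null && previous >= e.getKey()) sorted = false;
                previous = e.getKey();
                if (++walked > 4096) break; // more entries than distinct keys exist
            }
        } catch (RuntimeException e) {
            exceptions.incrementAndGet();
        }
        return "exceptions=" + exceptions.get() + ", size()=" + map.size()
                + ", walked=" + walked + ", sorted=" + sorted;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 100_000, 5_000));
    }
}
```

A run that reports exceptions=0 together with a size() that disagrees with the walked count, or sorted=false, would be a case of broken invariants with no exception. A clean-looking run proves nothing either way.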
I think it would be prudent to do something in intern(), and one could arguably call this a bug in intern(). The string intern table is invisibly and pervasively read shared state, and relatively uncommonly mutated. It would make sense to take extra care when mutating the shared state to attempt to ensure no invalid strings are added.