
> UTF-8, for example, allows to encode the character "ä" as \xc3\xa4 OR \x61\xcc\x88. They look visually identical, yet fail any string comparison.

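A well-designed high-level programming language (see Perl6) compares strings by how they look, not by their binary representation [0]. Perl6 normalizes decoded text into graphemes (NFG), so canonically equivalent sequences end up as the same string.

For example, let's play with the Perl6 REPL: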

  > my $a = Blob.new(0xc3, 0xa4)
  Blob:0x<c3 a4>
  > my $b = Blob.new(0x61, 0xcc, 0x88)
  Blob:0x<61 cc 88>
  > $a eqv $b
  False
  > $a.decode() eqv $b.decode()
  True
  > $a.decode() eq $b.decode()
  True

By contrast, Python3 compares code points without normalizing them first, so the same test fails:

  >>> a = b'\xc3\xa4'
  >>> b = b'\x61\xcc\x88'
  >>> a == b
  False
  >>> a.decode() == b.decode()
  False
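
To be fair to Python3, the comparison does succeed once both strings are normalized explicitly. Here is a minimal sketch using the standard library's unicodedata module; the helper name canonically_equal is my own invention for illustration, and NFC is just one reasonable choice of normalization form (NFD would work equally well for this test):

```python
import unicodedata

a = b'\xc3\xa4'        # UTF-8 for U+00E4, the precomposed "ä"
b = b'\x61\xcc\x88'    # UTF-8 for "a" followed by U+0308 (combining diaeresis)


def canonically_equal(x: str, y: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)


# Plain == compares code points, so the two decoded strings differ.
raw_equal = a.decode() == b.decode()

# After normalization both strings are the single code point U+00E4.
normalized_equal = canonically_equal(a.decode(), b.decode())

print(raw_equal, normalized_equal)  # prints: False True
```

The difference from Perl6 is that Python3 leaves the normalization step to the caller instead of doing it when the bytes are decoded.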

Perl6 is definitely going to dominate the world one day.

[0] https://perl6advent.wordpress.com/2015/12/07/day-7-unicode-p...


