
It is; the tokenizer isn't reversible (and it adds spaces all over the place).

But I should be able to handle a lot of these in the regex that converts the output back into a more human-readable format (in the raw output there's a space before every punctuation mark, so I already remove those extraneous spaces before periods, commas, etc.).
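For anyone curious, the kind of cleanup described above could look roughly like this. It is only a minimal sketch, not the actual regex in use: the function name `detokenize` and the exact set of punctuation marks are assumptions made for illustration.

```python
import re

# Assumption: the tokenizer inserts whitespace before these marks.
# The real character set used by the author is not known.
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([.,;:!?])")


def detokenize(text: str) -> str:
    """Drop the whitespace that precedes a punctuation mark (hypothetical helper)."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", text)


print(detokenize("Hello , world . Is this readable ?"))
```

A rule like this is tuned for prose. Applied to code it can change spacing that was intentional (for example, `a ? b : c` becomes `a? b: c`), which is why code would need its own heuristics.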

I just haven't gotten around to adding any heuristics specifically for code, but adding a bit more post-processing is on my to-do list.



I updated my regexes to clean up some of the tokenizer noise last night, so much of the formatting in the code snippets should look a bit more natural now.



