I tend to agree with Wattenberg and Viégas that it's interesting to treat all words and punctuation equally, but it's certainly a matter of opinion and it would be simple enough to tokenise the input data differently.
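To illustrate what "tokenising differently" could mean here, this is a minimal sketch in Python. It is not the code behind the visualisation; the function names and the regular expressions are assumptions made up for the example. The first function keeps every punctuation mark as its own token, on equal footing with words, and the second drops punctuation entirely.

```python
import re

# Hypothetical helpers for illustration only; not taken from the word tree source.

def tokenise_all(text):
    """Treat words and punctuation equally: each punctuation mark is its own token."""
    # A word is a run of word characters, optionally with internal apostrophes
    # (so "don't" stays whole); any other non-space character is a single token.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

def tokenise_words_only(text):
    """Alternative: keep words and discard punctuation."""
    return re.findall(r"\w+(?:'\w+)*", text)

if __name__ == "__main__":
    sample = "Hello, world! Don't stop."
    print(tokenise_all(sample))
    print(tokenise_words_only(sample))
```

With the first tokeniser a comma or full stop becomes a node in the tree like any other word; with the second, branches run straight across sentence boundaries. Which is preferable is the matter of opinion mentioned above.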
Yes, I saw that reference after posting here. It makes sense. I also see that it picks up combinations of two and three words together, which is brilliant!
Recently I have dabbled in d3 and used your site for a lot of inspiration. I created something very similar to analyze text from web pages, but using bubbles.