Any way to shift the text processing to the client? I'd like to use the bookmarklet on some academic papers (many behind a paywall), and the few I've tried only seem to parse the abstract. I assume this is because the text processing is happening server-side, but I could be wrong.
Alternatively, could you release your backend code as well? I'd like to run this on larger corpora.
The client attempts to fetch URLs directly, but this only works if the target server sends the appropriate CORS headers (which it hardly ever does).
Otherwise, it falls back to using a proxy, which means the client only sees what the proxy sees. However, you can also paste raw text on the main page.
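For anyone curious what that direct-then-proxy behaviour might look like, here is a rough sketch. It is not the project's actual code: the function name, the `proxyBase` URL format, and the injected `fetcher` parameter are all assumptions made for illustration.

```typescript
// Minimal response shape; the browser's Response object satisfies it.
interface TextResponse {
  ok: boolean;
  text(): Promise<string>;
}

// Hypothetical injection point so the sketch does not depend on a global fetch.
type Fetcher = (url: string) => Promise<TextResponse>;

interface FetchResult {
  source: "direct" | "proxy";
  body: string;
}

// Try the target URL first; if that is blocked or fails, retry via the proxy.
// `proxyBase` is an assumed format, e.g. "https://proxy.example/?url=".
async function fetchWithProxyFallback(
  targetUrl: string,
  proxyBase: string,
  fetcher: Fetcher
): Promise<FetchResult> {
  try {
    const direct = await fetcher(targetUrl);
    if (direct.ok) {
      return { source: "direct", body: await direct.text() };
    }
  } catch {
    // In browsers a CORS-blocked request rejects, so we fall through to the proxy.
  }
  const proxied = await fetcher(proxyBase + encodeURIComponent(targetUrl));
  if (!proxied.ok) {
    throw new Error("proxy request failed for " + targetUrl);
  }
  return { source: "proxy", body: await proxied.text() };
}
```

In a browser you would pass something like `(u) => fetch(u)` as the `fetcher`. Either way, the proxy request carries none of your cookies or institutional login, which is consistent with paywalled papers coming back as abstract only.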
I could imagine modifying the bookmarklet so it lifts the text directly from the browser instead of just copying the URL. This would solve the proxy issue neatly and would also work for local-only or intranet sites, for which the proxy also fails.
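A minimal sketch of that idea, under the assumption that the bookmarklet runs in the page and reads the already-rendered text. The `PageLike` interface, the `liftText` name, and the `maxChars` cap are hypothetical, and how the payload would be handed to the service (form POST, `postMessage`, etc.) is left open.

```typescript
// The subset of the browser's `document` that the sketch needs.
interface PageLike {
  title: string;
  location: { href: string };
  body: { innerText: string };
  getSelection(): { toString(): string } | null;
}

interface Payload {
  url: string;
  title: string;
  text: string;
}

// Lift text from the rendered page: prefer the user's selection,
// otherwise take everything the browser is currently showing.
function liftText(page: PageLike, maxChars: number = 200000): Payload {
  const selected = page.getSelection()?.toString().trim() ?? "";
  const raw = selected.length > 0 ? selected : page.body.innerText;
  const text = raw
    .replace(/[ \t]+/g, " ")     // collapse runs of spaces and tabs
    .replace(/\n{3,}/g, "\n\n")  // keep at most one blank line between blocks
    .trim()
    .slice(0, maxChars);         // assumed cap to keep the payload manageable
  return { url: page.location.href, title: page.title, text };
}
```

In the bookmarklet itself this would be called as `liftText(document)`. Because the text comes from the logged-in browser session, it would include whatever the paywall or intranet actually shows you.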
Very elegant and useful project!