
I've always found something along these lines useful for PDFs:

  pdftotext -layout file.pdf - | grep -E ...
Good to see a Swiss Army knife utility for all sorts of files, though!


rga uses pdftotext (from poppler) internally for PDFs, but wraps it in parallelization and a very fast cache layer, since you usually want to run multiple queries per file :)
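To make the caching idea concrete, here is a rough shell sketch of "extract once, query many times". It is not how rga itself is implemented; `CACHE_DIR`, `extract_pdf`, and `cached_text` are hypothetical names made up for this illustration. It assumes `pdftotext` (poppler) and `sha256sum` (GNU coreutils) are installed, and it leaves out parallelization entirely.

```shell
#!/usr/bin/env bash
# Sketch only: illustrates caching extracted text, not rga's real internals.
# CACHE_DIR, extract_pdf and cached_text are hypothetical names.
CACHE_DIR="${CACHE_DIR:-${TMPDIR:-/tmp}/pdftext-cache}"

# Extract text from one PDF to stdout ("-" tells pdftotext to use stdout).
extract_pdf() {
  pdftotext -layout "$1" -
}

# Print the text of a PDF, running the extractor only on a cache miss.
cached_text() {
  local pdf="$1" key out
  mkdir -p "$CACHE_DIR"
  # Key the cache on file content, so a modified file is re-extracted.
  key=$(sha256sum "$pdf" | cut -d' ' -f1)
  out="$CACHE_DIR/$key.txt"
  if [ ! -f "$out" ]; then
    # Write to a temp file first so a failed extraction is never cached.
    extract_pdf "$pdf" > "$out.tmp" && mv "$out.tmp" "$out"
  fi
  cat "$out"
}
```

With this in place, `cached_text file.pdf | grep -E 'pattern'` pays the extraction cost on the first query only; later queries against the same file just read the cached text.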



