Put together a document text extraction server using Apache Tika (with ~30 lines...

replicantrose on Feb 6, 2024 | on: An underrated alternative to Unstructured/Nougat f...

I put together a document text extraction server using Apache Tika (about 30 lines of code). The extracted text can be vectorized for retrieval-augmented generation or used to build LLM training datasets. Much credit to the tika-python project for providing the Python bindings!
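For anyone wondering what a server like this might look like, here is a rough sketch. It is not the code from the linked project, just a guess at the general shape: it assumes tika-python's `parser.from_buffer` for extraction and the standard library's `http.server` for the HTTP part. The names `tika_extract`, `build_response`, `make_handler`, and `serve`, and the default port 8000, are made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def tika_extract(data: bytes) -> str:
    """Extract plain text from raw document bytes via tika-python."""
    # Imported lazily so this module loads even where tika is not installed.
    # Assumption: `pip install tika` plus a Java runtime for the Tika server.
    from tika import parser

    parsed = parser.from_buffer(data)
    # "content" can be None when Tika finds no text in the document.
    return (parsed.get("content") or "").strip()


def build_response(body: bytes, extract=tika_extract):
    """Return (HTTP status, JSON bytes) for one uploaded document."""
    try:
        status, payload = 200, {"text": extract(body)}
    except Exception as exc:  # report extraction failures to the client
        status, payload = 500, {"error": str(exc)}
    return status, json.dumps(payload).encode("utf-8")


def make_handler(extract=tika_extract):
    """Build a handler class that answers POST requests with extracted text."""

    class ExtractHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # The request body is the raw document (PDF, DOCX, ...).
            length = int(self.headers.get("Content-Length", 0))
            status, out = build_response(self.rfile.read(length), extract)
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(out)))
            self.end_headers()
            self.wfile.write(out)

    return ExtractHandler


def serve(host="127.0.0.1", port=8000, extract=tika_extract):
    """Run the extraction server until interrupted."""
    HTTPServer((host, port), make_handler(extract)).serve_forever()
```

Calling `serve()` and then sending a file with something like `curl --data-binary @paper.pdf http://127.0.0.1:8000/` should return a JSON object with a `text` field, which can then be chunked and embedded for RAG or written out as training data. The extractor is passed in as a parameter so it can be swapped or stubbed without a running Tika instance.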