Fast text and metadata extraction from documents using Apache Tika compiled to native code
pip install iscc-tika
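
After installing, one way to confirm that the package is visible to the current interpreter is to query its distribution metadata. The sketch below uses only the standard library module `importlib.metadata`. The helper name `installed_version` is hypothetical and not part of iscc-tika, and nothing here exercises the library's own extraction API.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not an iscc-tika function.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    found = installed_version("iscc-tika")
    print(f"iscc-tika {found}" if found else "iscc-tika is not installed")
```

If this reports that the package is not installed, the `pip install` command above was probably run against a different Python environment than the one executing the check.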