What Is It?
PDFMiner is a suite of programs for analyzing text data from PDF documents. It includes a PDF parser, a PDF renderer (only text rendering is supported for now), and a couple of handy tools for extracting text. Unlike other PDF-related tools, it lets you obtain the exact location of text on a page, as well as other layout information such as font size or font name, which can be useful for analyzing the document.
Features:
- Written entirely in Python (version 2.5 or newer).
- Supports up to PDF-1.7 specification.
- Supports non-ASCII languages and vertical writing scripts.
- Supports various font types (Type1, TrueType, Type3, and CID).
- Supports basic encryption (RC4).
- Supports PDF to HTML conversion.
- Supports outline (TOC) extraction.
- Supports tagged contents extraction.
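
To make the point about text location and font information concrete, here is a minimal sketch of reading each text line's position, font name, and font size. It assumes the maintained fork pdfminer.six (Python 3) and its `pdfminer.high_level.extract_pages` helper, not the original Python 2 release described above, whose API differs. The names `TextSpan`, `iter_text_spans`, and `format_span` are made up for this illustration and are not part of the library.

```python
import sys
from collections import namedtuple

# Illustrative record type (not part of PDFMiner): one line of text
# together with its bounding box and the font of its first character.
TextSpan = namedtuple("TextSpan", "page text bbox font size")


def iter_text_spans(path):
    """Yield a TextSpan for every text line found in the PDF at `path`."""
    # Assumption: pdfminer.six is installed. It is imported here so the
    # rest of the module still loads without it.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page_no, page in enumerate(extract_pages(path), start=1):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue  # skip figures, curves, images, etc.
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                chars = [c for c in line if isinstance(c, LTChar)]
                if not chars:
                    continue
                first = chars[0]
                # bbox is (x0, y0, x1, y1) in PDF points, origin at bottom-left.
                yield TextSpan(page_no, line.get_text().strip(), line.bbox,
                               first.fontname, first.size)


def format_span(span):
    """Render a TextSpan as a single human-readable line."""
    x0, y0, _x1, _y1 = span.bbox
    return "p%d (%.1f, %.1f) %s %.1fpt: %s" % (
        span.page, x0, y0, span.font, span.size, span.text)


if __name__ == "__main__" and len(sys.argv) > 1:
    for span in iter_text_spans(sys.argv[1]):
        print(format_span(span))
```

Saved as a script and run with a PDF path as its only argument, this prints one line per text line in the document. Only the first character's font is reported, so lines that mix fonts are simplified.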