Memo
[ocrmypdf/OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF)
ocrmypdf/OCRmyPDF
An offline OCR tool for images and PDFs, written in Python. It is a cross-platform CLI tool.
Main features
- Generates a searchable PDF/A file from a regular PDF
- Places OCR text accurately below the image to ease copy / paste
- Keeps the exact resolution of the original embedded images
- When possible, inserts OCR information as a "lossless" operation without disrupting any other content
- Optimizes PDF images, often producing files smaller than the input file
- If requested, deskews and/or cleans the image before performing OCR
- Validates input and output files
- Distributes work across all available CPU cores
- Uses Tesseract OCR engine to recognize more than 100 languages
- Keeps your private data private
- Scales properly to handle files with thousands of pages
- Battle-tested on millions of PDFs
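
Several of the features above (language selection, deskewing, cleaning, multi-core processing) are switched on through command-line options. The following is a minimal sketch of driving the CLI from Python, assuming the `ocrmypdf` executable is installed and on `PATH`. The helper names `build_ocrmypdf_command` and `run_ocr` are hypothetical and written for this note; they are not part of OCRmyPDF. The options used (`--language`, `--deskew`, `--clean`, `--jobs`) are standard OCRmyPDF CLI options, and `--clean` additionally needs the external `unpaper` program.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: Sequence[str] = ("eng",),
    deskew: bool = False,
    clean: bool = False,
    jobs: Optional[int] = None,
) -> List[str]:
    """Assemble an ocrmypdf argument list (hypothetical helper, not part of OCRmyPDF)."""
    # Multiple Tesseract languages are joined with "+", e.g. "eng+deu".
    cmd = ["ocrmypdf", "--language", "+".join(languages)]
    if deskew:
        cmd.append("--deskew")  # straighten skewed pages before OCR
    if clean:
        cmd.append("--clean")  # clean pages before OCR (needs unpaper)
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]  # cap the number of parallel workers
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(input_pdf: str, output_pdf: str, **options) -> int:
    """Run ocrmypdf and return its exit code (hypothetical helper)."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf executable not found on PATH")
    cmd = build_ocrmypdf_command(input_pdf, output_pdf, **options)
    return subprocess.run(cmd).returncode
```

For example, `run_ocr("scan.pdf", "scan_ocr.pdf", languages=["eng", "deu"], deskew=True)` would invoke the tool on a hypothetical `scan.pdf`. Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.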