Ravenport: python-pyocr

python-pyocr

Port variant	v12
Summary	Wrapper for OCR engines (Tesseract, etc) (3.12)
Package version	0.8.5
Homepage	No known homepage
Keywords	python
Maintainer	Python Automaton
License	Not yet specified
Other variants	v13
Ravenports	Buildsheet | History
Ravensource	Port Directory | History
Last modified	15 NOV 2024, 16:08:50 UTC
Port created	02 FEB 2018, 15:29:04 UTC

Subpackage Descriptions

single

# PyOCR

PyOCR is an optical character recognition (OCR) tool wrapper for Python. That is, it helps you use various OCR tools from a Python program.

It has been tested only on GNU/Linux systems. It should also work on similar systems (*BSD, etc). It may or may not work on Windows, MacOSX, etc.

## Supported OCR tools

* Libtesseract (Python bindings for the C API)
* Tesseract (wrapper: fork + exec)
* Cuneiform (wrapper: fork + exec)

## Features

* Supports all the image formats supported by [Pillow], including jpeg, png, gif, bmp, tiff and others
* Various output types: text only, bounding boxes, etc.
* Orientation detection (Tesseract and libtesseract only)
* Can focus on digits only (Tesseract and libtesseract only)
* Can save and reload boxes in hOCR format
* PDF generation (libtesseract only)

## Limitations

* hOCR: Only a subset of the specification is supported. For instance, pages and paragraph positions are not stored.

## Installation

```sh
sudo pip3 install pyocr  # Python 3.X
```

or the manual way:

```sh
mkdir -p ~/git ; cd ~/git
git clone https://gitlab.gnome.org/World/OpenPaperwork/pyocr.git
cd pyocr
make install  # will run 'python ./setup.py install'
```

## Usage

### Initialization

```Python
from PIL import Image
import sys

import pyocr
import pyocr.builders

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
# The tools are returned in the recommended order of usage
tool = tools[0]
print("Will use tool '%s'" % (tool.get_name()))
# Ex: Will use tool 'libtesseract'

langs = tool.get_available_languages()
print("Available languages: %s" % ", ".join(langs))
lang = langs[0]
print("Will use lang '%s'" % (lang))
# Ex: Will use lang 'fra'
# Note that languages are NOT sorted in any way. Please refer
# to the system locale settings for the default language
# to use.
```

### Image to text

```Python
txt = tool.image_to_string(
    Image.open('test.png'),
    lang=lang,
    builder=pyocr.builders.TextBuilder()
)
# txt is a Python string

word_boxes = tool.image_to_string(
    Image.open('test.png'),
    lang="eng",
    builder=pyocr.builders.WordBoxBuilder()
)
# list of box objects. For each box object:
#   box.content is the word in the box
#   box.position is its position on the page (in pixels)
#
# Beware that some OCR tools (Tesseract for instance)
# may return empty boxes

line_and_word_boxes = tool.image_to_string(
    Image.open('test.png'), lang="fra",
    builder=pyocr.builders.LineBoxBuilder()
)
```
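
The image-to-text notes above warn that some OCR tools may return empty boxes. A minimal sketch of one way to filter them out follows. It assumes only the two attributes the README describes (`box.content` and `box.position`). The `Box` namedtuple, the `drop_empty_boxes` helper, and the sample coordinates are hypothetical stand-ins for illustration, not part of PyOCR.

```python
from collections import namedtuple

# Hypothetical stand-in for the box objects returned with WordBoxBuilder;
# only the two attributes described in the README are modelled here.
Box = namedtuple("Box", ["content", "position"])


def drop_empty_boxes(boxes):
    """Return only the boxes whose content holds non-whitespace text."""
    return [box for box in boxes if box.content and box.content.strip()]


# Sample data made up for illustration; positions are pixel corner pairs.
sample = [
    Box("Hello", ((10, 10), (60, 30))),
    Box("", ((65, 10), (70, 30))),
    Box("   ", ((75, 10), (80, 30))),
    Box("world", ((85, 10), (140, 30))),
]

kept = drop_empty_boxes(sample)
print([box.content for box in kept])
```

In a real program the same helper could presumably be applied to the `word_boxes` list, since it relies on nothing beyond `box.content`.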

Configuration Switches (platform-specific settings discarded)

PY312	ON	Build using Python 3.12
PY313	OFF	Build using Python 3.13

Package Dependencies by Type

Build (only)	python312:dev:std python-pip:single:v12 autoselect-python:single:std
Build and Runtime	python312:primary:std
Runtime (only)	tesseract:tools:std python-Pillow:single:v12

Download groups

main	mirror://PYPIWHL/91/1c/3ef6485732685ad9c938c9ebb3fad0570f58bdc54e1242cdfa40040a630e

Distribution File Information

3a534eee5ac6ce681159d67528b8953f0f2a7a5aad8733373ea7b3e8ac13a0b4 40003 python-src/pyocr-0.8.5-py3-none-any.whl
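
A downloaded distribution file can be checked against the entry above. The sketch below assumes the 64-character hex string is a SHA-256 digest and that `40003` is the file size in bytes; that reading is an assumption about this listing's format. `verify_distfile` is a hypothetical helper name, not a Ravenports tool.

```python
import hashlib
import os


def verify_distfile(path, expected_sha256, expected_size):
    """Return True if the file matches both the expected size and SHA-256 digest."""
    # Cheap check first: a size mismatch means the digest cannot match either.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()


# Example call using the values listed above (assumes the wheel is in the
# current directory):
# verify_distfile(
#     "pyocr-0.8.5-py3-none-any.whl",
#     "3a534eee5ac6ce681159d67528b8953f0f2a7a5aad8733373ea7b3e8ac13a0b4",
#     40003,
# )
```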

Ports that require python-pyocr:v12

No other ports depend on this one.