Tesseract Device

The PymordialTesseractDevice (src/pymordialblue/devices/tesseract_device.py) implements the OCR capability using the local Tesseract binary.

Setup

The device attempts to resolve the Tesseract executable in three ways:

1. System Path: Uses tesseract from the system PATH.
2. Configured Path: Uses the path specified in extract_strategy.tesseract.tesseract_cmd in configs.yaml.
3. Bundled Binary: Checks for a bundled binary in bin/tesseract/tesseract.exe (common for portable distributions).
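The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the device's actual code: the function name resolve_tesseract_cmd and its parameters are hypothetical, and the order in which the candidates are tried is assumed from the order in which they are listed.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_tesseract_cmd(
    configured_cmd: Optional[str] = None,
    bundled_cmd: Path = Path("bin/tesseract/tesseract.exe"),
) -> Optional[str]:
    """Return the first usable Tesseract executable, or None if none is found.

    Hypothetical helper: the name, parameters, and try-order are assumptions.
    """
    # 1. System Path: look for "tesseract" on the system PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path

    # 2. Configured Path: the value read from
    #    extract_strategy.tesseract.tesseract_cmd in configs.yaml.
    if configured_cmd and Path(configured_cmd).is_file():
        return configured_cmd

    # 3. Bundled Binary: shipped alongside a portable distribution.
    if bundled_cmd.is_file():
        return str(bundled_cmd)

    return None
```

With pytesseract, a resolved path is typically applied by assigning it to pytesseract.pytesseract.tesseract_cmd before running OCR.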

Text Extraction

The primary method is extract_text. It accepts an image (a file path, raw bytes, or a NumPy array) and an optional extract strategy that controls preprocessing.

adb_stream_frame = controller.get_frame()
text = controller.ui.read_text(adb_stream_frame)

Text Finding

find_text(search_text, image) attempts to return the (x, y) center coordinates of the matching text. It uses pytesseract.image_to_data to get bounding boxes for all detected words and searches for the keyword.
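As an illustration of the bounding-box search, the sketch below computes the center of a matching word from a dictionary shaped like the output of pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT). The function name find_text_center is hypothetical, and matching a single whole word case-insensitively is an assumption; the device's actual matching rules may differ.

```python
from typing import Dict, List, Optional, Tuple


def find_text_center(
    search_text: str, data: Dict[str, List]
) -> Optional[Tuple[int, int]]:
    """Return the (x, y) center of the first word matching search_text, or None.

    `data` is expected to have the "text", "left", "top", "width", and
    "height" lists that pytesseract.image_to_data returns with Output.DICT.
    Hypothetical helper: single-word, case-insensitive matching is assumed.
    """
    target = search_text.strip().lower()
    if not target:
        # Empty entries in data["text"] describe layout blocks, not words.
        return None

    for i, word in enumerate(data["text"]):
        if word.strip().lower() == target:
            # Center of the word's bounding box.
            x = data["left"][i] + data["width"][i] // 2
            y = data["top"][i] + data["height"][i] // 2
            return (x, y)
    return None


# Hand-written sample standing in for real OCR output.
sample = {
    "text": ["", "Start", "Game"],
    "left": [0, 10, 80],
    "top": [0, 20, 20],
    "width": [200, 60, 50],
    "height": [100, 30, 30],
}
print(find_text_center("game", sample))  # (105, 35)
```

Multi-word phrases would need the words on a line to be joined before matching, which this sketch does not attempt.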

Preprocessing Integration

Tesseract accuracy depends heavily on image quality, so the device integrates with PymordialExtractStrategy to preprocess images before OCR. When you call extract_text, you can pass a strategy:

from pymordialblue.utils.extract_strategies import DefaultExtractStrategy

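# Capture a frame from the controller, as in the Text Extraction example
frame = controller.get_frame()

# Use a specific strategy that denoises and thresholds the image
my_strategy = DefaultExtractStrategy()
text = controller.ui.read_text(frame, strategy=my_strategy)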

See Extract Strategies for more details.