Accessible Thai Document Conversion System
Convert Thai and multilingual documents into accessible formats for blind and visually impaired users. This form supports individual uploads, institutional batch processing, and planned API integration.
Output targets include accessible HTML, DOCX, TXT, EPUB, MP3 audio, DAISY, Braille (BRF), and tagged PDF conforming to PDF/UA.
Conversion Options
Input: PDF, DOCX, TXT, JPG, PNG, TIFF, ZIP
Audio: MP3 and DAISY-ready structure
Braille: BRF output option
Batch: Upload or watch-folder processing
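The options above imply a routing step that checks each upload before any converter runs. The sketch below uses a hypothetical `plan_job` helper and route names of our own choosing; it only illustrates how accepted input types and requested output targets might be validated, not how the system actually dispatches work.

```python
from pathlib import Path

# Hypothetical first processing stage for each accepted input type.
INPUT_ROUTES = {
    ".pdf": "pdf",          # may still need OCR if it is a scan
    ".docx": "docx",        # already contains structured text
    ".txt": "text",
    ".jpg": "image_ocr",
    ".jpeg": "image_ocr",
    ".png": "image_ocr",
    ".tif": "image_ocr",
    ".tiff": "image_ocr",
    ".zip": "batch",        # unpacked and routed file by file
}

# Output targets listed on the form ("pdf" meaning tagged PDF/UA).
OUTPUT_FORMATS = {"html", "docx", "txt", "epub", "mp3", "daisy", "brf", "pdf"}


def plan_job(filename: str, outputs: list[str]) -> dict:
    """Validate one upload and return its route and requested outputs."""
    suffix = Path(filename).suffix.lower()
    if suffix not in INPUT_ROUTES:
        raise ValueError(f"unsupported input type: {filename!r}")
    requested = {fmt.lower() for fmt in outputs}
    unknown = requested - OUTPUT_FORMATS
    if unknown:
        raise ValueError(f"unsupported output formats: {sorted(unknown)}")
    return {"file": filename, "route": INPUT_ROUTES[suffix], "outputs": sorted(requested)}
```

Rejecting unsupported files at this point means a bad upload fails immediately instead of partway through a batch.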
Minimum Open-Source Configuration
The demonstration system is designed around proven open-source technologies. These components form the minimum technical foundation required to process Thai documents, manage batch conversion, and generate accessible output formats.
OCRmyPDF
Adds a searchable OCR text layer to scanned PDF files, using Tesseract as its OCR engine.
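A minimal sketch of how a Python worker might invoke OCRmyPDF on one scanned file. The helper name `ocrmypdf_command` is hypothetical; `-l`, `--deskew` and `--skip-text` are standard OCRmyPDF options, and Thai recognition assumes the Tesseract `tha` language data is installed on the server.

```python
def ocrmypdf_command(src: str, dst: str, languages=("tha", "eng")) -> list[str]:
    """Build an OCRmyPDF command line for one scanned PDF."""
    return [
        "ocrmypdf",
        "-l", "+".join(languages),  # Tesseract language codes joined with '+'
        "--deskew",                 # straighten skewed pages before OCR
        "--skip-text",              # leave pages that already contain text unchanged
        src,
        dst,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; building it separately keeps the options easy to log and test.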
Tesseract OCR
Performs optical character recognition with Thai and English language support.
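Because Thai recognition depends on the `tha` language data being installed, a worker could check for it before accepting OCR jobs. This sketch parses the output of `tesseract --list-langs`; the helper names are hypothetical, and the parser skips the header line by its "List of available languages" prefix since the rest of that line (such as the data directory) differs between installations.

```python
import shutil
import subprocess


def parse_tesseract_langs(output: str) -> set[str]:
    """Return the language codes listed by `tesseract --list-langs`."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip the header line ("List of available languages ...") and blanks.
        if line and not line.startswith("List of available languages"):
            langs.add(line)
    return langs


def thai_ocr_available() -> bool:
    """True if the tesseract binary is installed with Thai and English data."""
    if shutil.which("tesseract") is None:
        return False
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True)
    # Read both streams in case a build prints the list to stderr.
    return {"tha", "eng"} <= parse_tesseract_langs(result.stdout + result.stderr)
```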
PaddleOCR
Optional alternative OCR engine that may improve recognition of Thai text and complex layouts.
Poppler
Converts PDF pages into images for OCR and preprocessing.
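Poppler's `pdftoppm` renders each PDF page as a separate image for OCR. The sketch assumes 300 dpi PNG output (`-r`, `-png`) and uses hypothetical helper names; it also orders the resulting files by the page number at the end of each name, which works regardless of how `pdftoppm` pads those numbers.

```python
import re


def pdftoppm_command(pdf: str, out_prefix: str, dpi: int = 300) -> list[str]:
    """Build a pdftoppm call that renders every page of `pdf` as a PNG."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf, out_prefix]


def page_number(path: str) -> int:
    """Read the page number pdftoppm appends to an image name, e.g. 'scan-07.png'."""
    match = re.search(r"-(\d+)\.png$", path)
    if match is None:
        raise ValueError(f"not a pdftoppm page image: {path}")
    return int(match.group(1))


def ordered_pages(paths: list[str]) -> list[str]:
    """Sort page images numerically, so page 10 follows page 9."""
    return sorted(paths, key=page_number)
```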
ImageMagick
Prepares and optimizes scanned images before OCR processing.
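One possible ImageMagick cleanup pass before OCR, sketched as a command builder. The chosen operations (grayscale conversion, `-deskew 40%`, `-normalize`) are an assumed starting point rather than tuned settings, and the helper name is hypothetical. ImageMagick 7 provides the `magick` command; systems still on ImageMagick 6 use `convert` instead.

```python
def imagemagick_prep_command(src: str, dst: str, binary: str = "magick") -> list[str]:
    """Build an ImageMagick command that cleans up one scanned page for OCR."""
    return [
        binary,                 # "magick" (ImageMagick 7) or "convert" (ImageMagick 6)
        src,
        "-colorspace", "Gray",  # discard colour information
        "-deskew", "40%",       # straighten slightly rotated scans
        "-normalize",           # stretch contrast to the full intensity range
        dst,
    ]
```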
Python
Main processing language for document workflows, automation, and AI integration.
FastAPI
Provides REST API access for external systems and institutional integration.
Celery
Handles background jobs and large-scale batch processing.
Redis
Acts as the message broker that queues conversion jobs for Celery workers, including jobs scheduled for retry.
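The retry behaviour that Celery and Redis provide can be illustrated without either service. The sketch below is an in-process stand-in built on Python's `queue` module, not Celery's API: a failed job goes back on the queue until a hypothetical `max_retries` limit is exceeded, and is then marked as failed.

```python
import queue
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Job:
    job_id: str
    source_file: str
    attempts: int = 0  # failed attempts so far


def run_worker(jobs: "queue.Queue[Job]", handler: Callable[[Job], None],
               max_retries: int = 3) -> Dict[str, str]:
    """Process queued jobs, re-queueing failures until max_retries is exceeded.

    Returns the final status of each job: "done" or "failed".
    """
    status: Dict[str, str] = {}
    while not jobs.empty():
        job = jobs.get()
        try:
            handler(job)
            status[job.job_id] = "done"
        except Exception:
            job.attempts += 1
            if job.attempts <= max_retries:
                jobs.put(job)  # retry after the jobs already waiting
            else:
                status[job.job_id] = "failed"
    return status
```

Re-queueing at the back lets other documents proceed while a problem file waits for its next attempt; in the real deployment the queue would live in Redis so it survives worker restarts.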
Watchdog
Monitors incoming directories and triggers automatic batch processing.
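Watchdog reacts to file-system events; the same idea can be shown with a simple polling loop from the standard library. This is a stand-in, not Watchdog's API, and `new_files`, `poll` and the `submit` callback are hypothetical names. Each accepted file is submitted once per process.

```python
import time
from pathlib import Path
from typing import Callable

# Extensions accepted by the conversion form.
ACCEPTED = {".pdf", ".docx", ".txt", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".zip"}


def new_files(folder: Path, seen: set) -> list:
    """Return accepted files in `folder` not seen before, recording them in `seen`."""
    found = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in ACCEPTED and path not in seen:
            seen.add(path)
            found.append(path)
    return found


def poll(folder: Path, submit: Callable[[Path], None], interval: float = 5.0) -> None:
    """Check `folder` every `interval` seconds and submit each new file once."""
    seen: set = set()
    while True:
        for path in new_files(folder, seen):
            submit(path)
        time.sleep(interval)
```

Whichever watcher is used, files still being copied into the folder should not be picked up early; one common approach is for senders to write under a temporary name and rename the file when it is complete.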
Pandoc
Converts structured content into HTML, DOCX, EPUB, and plain text outputs.
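A sketch of one Pandoc call per output target. `html5`, `docx`, `epub3` and `plain` are Pandoc writer names; the mapping, the helper name and the default `lang=th` metadata are assumptions. Setting `lang` is worthwhile for accessibility because Pandoc carries it into the output (for example the `lang` attribute of a standalone HTML document), which screen readers use to choose a voice.

```python
from typing import Optional

# Pandoc writer for each output target (assumed mapping).
PANDOC_WRITERS = {"html": "html5", "docx": "docx", "epub": "epub3", "txt": "plain"}


def pandoc_command(src: str, dst: str, target: str, lang: str = "th",
                   title: Optional[str] = None) -> list[str]:
    """Build a Pandoc command converting `src` into the given output target."""
    cmd = ["pandoc", src, "-t", PANDOC_WRITERS[target], "--standalone", "--toc",
           "-M", f"lang={lang}", "-o", dst]
    if title:
        cmd += ["-M", f"title={title}"]  # document title shown by readers
    return cmd
```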
Calibre
Supports eBook conversion and EPUB handling.
Piper TTS
Generates offline text-to-speech audio output where Thai voice models are available.
eSpeak NG
Provides lightweight speech synthesis and fallback audio generation.
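A sketch of the fallback described above: use Piper when a Thai voice model file is present, otherwise use eSpeak NG. The model path and the eSpeak NG voice code `th` are assumptions to check against the voices actually installed, and the helper name is hypothetical. `--model` and `--output_file` (Piper, which reads text on stdin) and `-v`, `-f` and `-w` (eSpeak NG) are the tools' command-line options. Both write WAV files, so MP3 output would need a separate encoding step.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def tts_command(text_file: str, wav_out: str, piper_model: Optional[str],
                espeak_voice: str = "th") -> Tuple[List[str], bool]:
    """Pick Piper if a voice model file exists, else fall back to eSpeak NG.

    Returns (argv, reads_stdin): Piper expects the text on stdin,
    while eSpeak NG reads it from the file given with -f.
    """
    if piper_model and Path(piper_model).is_file():
        return ["piper", "--model", piper_model, "--output_file", wav_out], True
    return ["espeak-ng", "-v", espeak_voice, "-f", text_file, "-w", wav_out], False
```

When `reads_stdin` is true, the caller opens the text file and passes it as `stdin` to `subprocess.run`.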
Liblouis
Translates text into Braille using language-specific translation tables, for output as digital Braille files such as BRF.
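Liblouis performs the translation itself (for example through its `lou_translate` tool); the resulting ASCII Braille still has to be laid out as pages. This sketch assumes common BRF conventions of 40 cells per line, 25 lines per page, CR LF line endings and a form feed between pages; these should be checked against the target embosser or reading device. The function name is hypothetical.

```python
import textwrap


def paginate_brf(braille_ascii: str, cells_per_line: int = 40,
                 lines_per_page: int = 25) -> str:
    """Lay out translated ASCII Braille as BRF text: wrapped lines,
    CR LF line endings and a form feed between pages."""
    lines = []
    for paragraph in braille_ascii.splitlines():
        # '-' is a Braille cell here, so it is not treated as a break point.
        wrapped = textwrap.wrap(paragraph, width=cells_per_line,
                                break_on_hyphens=False)
        lines.extend(wrapped or [""])  # keep blank lines
    pages = ["\r\n".join(lines[i:i + lines_per_page])
             for i in range(0, len(lines), lines_per_page)]
    return "\f".join(pages)
```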
DAISY Pipeline
Supports generation of accessible DAISY-style structured audio publications.
veraPDF
Validates accessible PDF output against PDF/UA requirements.
Ace by DAISY
Checks EPUB publications for accessibility and reports the issues it finds.
MySQL / MariaDB
Stores metadata, job status, user requests, audit records, and output references.
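A sketch of the kind of job-tracking table the database might hold. It uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; the table, column and helper names are hypothetical, and a MySQL/MariaDB version would use that server's types (for example an `AUTO_INCREMENT` key and `DATETIME` columns).

```python
import sqlite3
from typing import Optional

# Hypothetical job table; SQLite syntax, adapt types for MySQL/MariaDB.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversion_jobs (
    id          INTEGER PRIMARY KEY,
    source_file TEXT NOT NULL,
    output_fmt  TEXT NOT NULL,
    status      TEXT NOT NULL DEFAULT 'queued'
                CHECK (status IN ('queued', 'running', 'done', 'failed')),
    output_path TEXT,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def create_job(conn: sqlite3.Connection, source_file: str, output_fmt: str) -> int:
    """Record a new queued job and return its id."""
    cur = conn.execute(
        "INSERT INTO conversion_jobs (source_file, output_fmt) VALUES (?, ?)",
        (source_file, output_fmt),
    )
    conn.commit()
    return cur.lastrowid


def set_status(conn: sqlite3.Connection, job_id: int, status: str,
               output_path: Optional[str] = None) -> None:
    """Update a job's status and, once finished, where its output was written."""
    conn.execute(
        "UPDATE conversion_jobs SET status = ?, output_path = ? WHERE id = ?",
        (status, output_path, job_id),
    )
    conn.commit()
```

The `CHECK` constraint keeps job states to a fixed set, so a typo in a worker fails loudly instead of leaving a job in an unknown state.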
Debian Linux
Provides a stable and secure server operating environment for deployment.
Recommended Initial Server Configuration
For the initial demonstration and launch phase, the system should run on a dedicated server with sufficient storage and processing capacity for OCR, audio generation, and batch conversion workloads.