5 Commits

Author SHA1 Message Date
Aaron Roberts
04bbbebd5a Remove Freeform and Find from UI. Allow Description to be added to Reviewed job 2026-06-29 13:09:01 +01:00
Aaron Roberts
fd747e6c23 Add job tracking with PostgreSQL, image storage, and review workflow
- Add PostgreSQL service to docker-compose with health check and postgres_data volume
- Mount ./ocr_images as bind volume for persistent image storage
- Add backend/database.py with schema init and get_db() context manager
- Add 5 new API endpoints: POST /api/jobs, GET /api/jobs (search), GET /api/jobs/{id},
  GET /api/jobs/{id}/image, PUT /api/jobs/{id}/review
- Jobs are saved with author/book/chapter/page metadata, auto UUID, and submitted_at timestamp
- Jobs start as 'unreviewed'; review captures edited text, reviewer name, and reviewed_at
- Add MetadataForm.jsx (author/book/chapter/page inputs) to the New Job panel
- Add JobsPanel.jsx with search/filter, paginated list, and detail pane with review form
- Add "Commit Job" button to ResultPanel (plain_ocr mode only) with success/error feedback
- Add "New Job" / "Browse Jobs" navigation to the app header

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-09 16:48:12 +01:00
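
The job lifecycle described in the commit above (auto UUID and submitted_at on creation, an initial 'unreviewed' status, and a review step that records edited text, reviewer name, and reviewed_at) can be sketched as a small in-memory model. This is only an illustration: the `Job` class, its field names, the `review` method, and the 'reviewed' status value are hypothetical and are not taken from backend/database.py, which stores jobs in PostgreSQL.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


@dataclass
class Job:
    """Hypothetical in-memory stand-in for a stored OCR job."""

    # Metadata supplied when the job is committed.
    author: str
    book: str
    chapter: str
    page: int
    text: str

    # Generated automatically on creation.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    submitted_at: datetime = field(default_factory=_now)

    # Review state; every job starts as 'unreviewed'.
    status: str = "unreviewed"
    reviewed_text: Optional[str] = None
    reviewer: Optional[str] = None
    reviewed_at: Optional[datetime] = None

    def review(self, edited_text: str, reviewer: str) -> None:
        """Record the review details and mark the job as reviewed."""
        self.reviewed_text = edited_text
        self.reviewer = reviewer
        self.reviewed_at = _now()
        self.status = "reviewed"  # assumed name for the post-review status
```

Under these assumptions, `Job(author="A", book="B", chapter="1", page=3, text="raw")` starts with `status == "unreviewed"`, and calling `job.review("fixed", "Sam")` fills in the three review fields in one step.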
Claude
e578276d3e Add PDF processing and multi-format document conversion
Features added:
- PDF to image conversion with configurable DPI
- Multi-page PDF processing with OCR
- Export to Markdown, HTML, DOCX, and JSON formats
- Automatic image extraction from PDFs
- Formula and formatting preservation
- Real-time progress tracking for multi-page documents

Backend changes:
- New /api/process-pdf endpoint for PDF processing
- pdf_utils.py: PDF conversion and image extraction utilities
- format_converter.py: Document format conversion (MD, HTML, DOCX)
- Updated dependencies: PyMuPDF, img2pdf, python-docx, markdown

Frontend changes:
- File type toggle (Image OCR / PDF Processing)
- PDFProcessor component with format selection
- Updated ImageUpload to support both images and PDFs
- Progress bars for multi-page processing
- Download options for converted documents

Documentation:
- Updated README with PDF processing features
- Added API documentation for /api/process-pdf endpoint
- Added format conversion examples
2025-11-15 14:25:09 +00:00
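
To illustrate what "configurable DPI" means for the PDF to image conversion in the commit above: PDF page sizes are measured in points (1/72 inch), so the chosen DPI determines the scale factor and the pixel size of each rendered page. The sketch below shows only that arithmetic. The function names are hypothetical and this is not the code in pdf_utils.py; a rendering library such as PyMuPDF would take a scale like this as a `Matrix(zoom, zoom)`.

```python
from typing import Tuple

PDF_POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def zoom_for_dpi(dpi: int) -> float:
    """Scale factor that renders a page at the requested DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / PDF_POINTS_PER_INCH


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page of the given point size at the given DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    width_px = round(width_pt * dpi / PDF_POINTS_PER_INCH)
    height_px = round(height_pt * dpi / PDF_POINTS_PER_INCH)
    return width_px, height_px


# An A4 page (595 x 842 points) rendered at 144 DPI is twice as large in pixels.
print(rendered_size(595, 842, 144))  # (1190, 1684)
```

Doubling the DPI quadruples the pixel count per page, which is worth keeping in mind for multi-page documents where every page is passed to OCR.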
Ray Dumasia
3efc4da7ff Add .env.example for setting ports; fix upload limit and bounding box; allow dismissing the previous image; expect HTML output instead of Markdown; update README with NVIDIA driver/container instructions 2025-10-21 21:35:17 +01:00
Ray Dumasia
aec04f6eb4 Initial commit 2025-10-21 01:32:09 +01:00