ATR Teaching Resource

Open-Source Models for Kraken / eScriptorium

A curated overview of publicly available HTR models for medieval and early modern manuscripts, organized by language and script family.

Overview

The Kraken/eScriptorium ecosystem has produced a growing library of publicly available HTR models. Most are distributed via Zenodo or GitHub as .mlmodel files (a CoreML-based container that Kraken uses to serialize its PyTorch-trained networks) and can be imported directly into eScriptorium or applied from the Kraken command line.

The landscape has shifted markedly since 2024: the field has moved from small, project-specific models toward broad, transferable base models — chiefly CATMuS Medieval and TRIDIS v2 — which are now used as starting points for community fine-tuning.

How to use these models in eScriptorium: download the .mlmodel file from Zenodo, then upload it via Models → Import model in eScriptorium. For command-line use, run a segmentation step before recognition, for example: kraken -i image.jpg output.txt segment -bl ocr -m model.mlmodel
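The command-line call above can also be assembled from a script. The following is a minimal sketch, not an official wrapper: it assumes Kraken is installed and on the PATH, that the model expects baseline segmentation (hence `segment -bl` chained before `ocr`), and the function name and file names are illustrative.

```python
import shlex


def build_kraken_command(image, output, model):
    """Return the argument list for a segment-then-recognize Kraken run.

    Assumption: the recognition model was trained on baseline
    segmentation, so `segment -bl` is chained before `ocr -m <model>`.
    """
    return [
        "kraken",
        "-i", str(image), str(output),  # input image and output file
        "segment", "-bl",               # baseline layout analysis
        "ocr", "-m", str(model),        # recognition with the chosen model
    ]


cmd = build_kraken_command("image.jpg", "output.txt", "model.mlmodel")
print(shlex.join(cmd))

# To actually run it (requires a local Kraken installation):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.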

Model Comparison Table

Model | Script type | Language(s) | Reported accuracy | Year
--- | --- | --- | --- | ---
CATMuS Medieval 1.6.0 | Mixed Latin scripts | Fr, La, Es, It, … | CER < 5% (optimal conditions) | 2025
TRIDIS v2 | Textualis, Cursiva | La, Fr, Es | CER ~0.11–0.15 (external sets) | 2024
Generic CREMMA 1.0.1 | Mixed, 8th–15th c. | La, Old French | Not reported | 2023
Bicerin 1.1.0 | Mixed medieval | Old French | 95.30% accuracy | 2022
FROC / model_froc | Praegothica, Textualis | Old Fr, Old Oc | CER 7.83% (test) | 2018
BiblIA_01 | Hebrew scripts (mixed) | He, Aram | Not reported | 2021
MiDRASH Geniza 01 | Geniza fragments | He, Aram, Judeo-Arabic | Not reported | 2025
Meleagre-NFD-finetuned | Byzantine Greek minuscule | Greek | 91.05% accuracy | 2024
GreekHTR / greekmix_01 | Greek minuscule, 9th–12th c. | Greek | Not reported (preview) | 2025
OICEN Combined 0.1 | Old Icelandic, mixed | Old Icelandic/Norse | CER ~2.6% (char. accuracy 97.4%) | 2025
Bifrost 0.1 | Old Norse manuscripts | Old Norse | Not reported | 2025
textualis_anna_jaeck | Textualis, single hand | Middle High German | CER 4.88% | 2025/26
bastarda_jos_von_pfullendorf | Bastarda, single hand | Middle High German | CER 3.35% | 2025/26
cursive_johannes_jaeck | Cursive, single hand | Middle High German | CER 4.29% | 2025/26
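The table mixes two kinds of figures: character error rate (CER) and character accuracy. The sketch below shows how CER is commonly computed, assuming the standard definition (Levenshtein edit distance divided by the length of the reference transcription). Individual projects may normalize or filter text differently before scoring, so figures in different rows are not strictly comparable. The function names and the sample strings are illustrative.

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (classic dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance over reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return levenshtein(ref, hyp) / len(ref)


reference = "in principio erat verbum"
hypothesis = "in principio erat uerbum"  # one substituted character

rate = cer(reference, hypothesis)
print(f"CER: {rate:.2%}")
print(f"Character accuracy (1 - CER): {1 - rate:.2%}")
```

Under this definition, a character accuracy of 97.41% corresponds to a CER of about 2.59%, which is how the OICEN row relates its two figures.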

Generic and Multi-Language Base Models

These models cover broad domains and are the recommended starting point for new projects.

CATMuS Medieval 1.6.0

The current state-of-the-art generic base model for medieval Latin-script manuscripts. Released in 2025 as part of the Consistent Approaches to Transcribing ManuScripts (CATMuS) initiative.

Zenodo: 10.5281/zenodo.15030337 | License: CC BY 4.0 | Size: 16.4 MB | Languages: Old/Middle French · Latin · Old Spanish · Italian | CER < 5% (optimal)

  • Training data: > 160,000 lines, > 5 million characters, > 200 manuscripts/incunabula, 10 languages
  • Transcription policy: Strictly graphematic; abbreviations not expanded; Unicode NFD
  • Time range: 8th–16th century
  • Best use: Fine-tuning starting point for any Western Latin-script manuscript; general transcription of French, Latin, or Iberian medieval material

Limitations: No single canonical benchmark metric; high corpus heterogeneity; may generalize poorly to very idiosyncratic hands without fine-tuning.
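Because the transcription policy uses Unicode NFD, model output will contain decomposed characters (base letter plus combining mark). Comparing it against ground truth stored in NFC can then produce spurious mismatches. A small sketch using Python's standard `unicodedata` module; the sample word and the helper name `to_nfd` are illustrative.

```python
import unicodedata


def to_nfd(text):
    """Normalize text to Unicode NFD (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


nfc = "p\u00e8re"   # 'père' with a precomposed e-grave (NFC form)
nfd = to_nfd(nfc)   # 'e' followed by U+0300 COMBINING GRAVE ACCENT

print(len(nfc), len(nfd))   # the NFD form is one code point longer
print(nfc == nfd)           # a naive comparison fails
print(to_nfd(nfc) == nfd)   # comparison after normalizing both sides
```

Normalizing both the reference and the prediction to the same form before computing error rates avoids counting such differences as recognition errors.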

TRIDIS v1 / v2

TRIDIS v2

A semi-diplomatic model optimized for documentary manuscripts — charters, registers, feudal books, and administrative records.

Zenodo v1: 10.5281/zenodo.10788591 | Zenodo v2: 10.5281/zenodo.13862096 | Languages: Latin · Old French · Old Spanish | v1: CER ~0.11–0.15 on external sets

  • Training data: v1: 1,855 pages, 120k lines; v2: adds 115k lines from Königsfelden, Monumenta Luxemburgensia, and others
  • Transcription policy: Semi-diplomatic
  • Time range: 11th–16th century; emphasis on late medieval documentary material
  • Best use: Charters, cartularies, account books, and administrative records in Latin/Old French/Old Spanish

Limitations: Less suited to literary manuscripts or projects requiring strictly graphematic output.

Generic CREMMA 1.0.1

Generic CREMMA for Medieval Manuscripts 1.0.1

A broad Latin/Old French model from the CREMMA project (Corpus et Reconnaissance d’Écritures Médiévales Manuscrites).

Zenodo: 10.5281/zenodo.7631619 | License: CC BY 4.0 | Size: 22.8 MB | Languages: Latin · Old French

  • Training data: 7 sub-corpora, 45,885 lines, 1,357,646 characters, 76 manuscripts (8th–15th c.)
  • Transcription policy: Guideline-based (CREMMA/SegmOnto)
  • Best use: Broad Latin/French manuscripts when CATMuS is not yet available or not suitable

French and Occitan Models

Bicerin 1.0/1.1 (CREMMA Medieval)

Bicerin 1.1.0

The mature CREMMA medieval model with a strong focus on Old French manuscripts from the 12th–15th century.

Zenodo: 10.5281/zenodo.6669553 | License: CC BY 4.0 / CC BY-SA 2.0 (check per artifact) | Language: Old French | Accuracy: 95.30%

  • Training data: 22,662 lines in 16 manuscripts
  • Note: License inconsistency between Zenodo record and GitHub — verify before reuse.

FROC / model_froc

FROC-MSS

A transparent, well-documented model for Anglo-Norman praegothica and gothic textualis in Old French and Old Occitan.

GitHub: FROC-MSS | License: CC BY 4.0 | Languages: Old French · Old Occitan | CER 7.83% (test)

  • Training data: 3,636 lines from 4 manuscripts; 80/10/10 split; allographic transcription; NFD
  • Time range: 12th–13th century
  • Best use: Anglo-Norman or Occitan documentary and literary hands; methodologically valuable as a fully transparent baseline.
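The 80/10/10 split mentioned in the training data can be reproduced roughly as follows. This is a sketch under stated assumptions: a line-level random split with a fixed seed. The procedure actually used for FROC (for instance, whether lines were split per page or per manuscript) is not described here, and the function name and placeholder line identifiers are illustrative.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items reproducibly and cut them into train/val/test parts.

    Integer arithmetic is used for the cut points; any remainder
    from rounding down ends up in the test part.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 80 // 100
    n_val = n * 10 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# 3,636 placeholder line identifiers, matching the reported corpus size
lines = [f"line_{i:04d}" for i in range(3636)]
train, val, test = split_80_10_10(lines)
print(len(train), len(val), len(test))  # 2908 363 365
```

For single-manuscript or few-manuscript corpora, a random line-level split tends to give optimistic test scores, because test lines come from the same hands and pages as the training lines.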

Hebrew and Geniza Models

BiblIA Family

BiblIA_01 + Ashkenazi_01 / Italian_01 / Sephardi_01

Four complementary models for medieval Hebrew manuscripts: one general model and three script-specialized variants for Ashkenazi, Italian, and Sephardi book hands.

BiblIA_01: 10.5281/zenodo.5468286 | Ashkenazi_01: 10.5281/zenodo.5468478 | Italian_01: 10.5281/zenodo.5468573 | Sephardi_01: 10.5281/zenodo.5468665 | License: CC BY-SA 4.0 (check sofer_mahir repo for NC variant) | Languages: Hebrew · Aramaic | Size: ~16 MB each

  • Training data: 202 images from BnF and Vatican medieval Hebrew manuscripts
  • Transcription: ALTO 4.2 XML with Unicode; editorial markup for additions, deletions, abbreviations
  • Best use: Primary open resource for medieval Hebrew manuscript recognition; covers the three main regional book-script traditions.

The sofer_mahir repository also includes two segmentation models for Hebrew manuscripts (regions + margins/paratext).

MiDRASH Geniza 01

A dedicated model for Cairo Geniza fragments, covering documentary and literary texts in multiple Jewish languages.

Zenodo: 10.5281/zenodo.18732245 | License: CC BY-NC-SA 4.0 | Size: 23 MB | Languages: Hebrew · Judeo-Arabic · Jewish Babylonian Aramaic

  • Training data: Fine-tuned on documentary and literary Geniza texts; exact size not reported
  • Transcription: MiDRASH guidelines; abbreviations not expanded; NFKD normalization
  • Released: December 2025
  • Best use: The clearest public model for mixed Geniza fragments in the Kraken/eScriptorium ecosystem.
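NFKD goes further than NFD: besides splitting precomposed characters, it replaces compatibility characters with their plain equivalents. A sketch with Python's standard `unicodedata` module, using the Hebrew alef-lamed ligature (U+FB4F) purely as an illustration; whether such characters occur in the MiDRASH ground truth is not stated here, and the helper name `codepoints` is hypothetical.

```python
import unicodedata


def codepoints(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


ligature = "\ufb4f"  # HEBREW LIGATURE ALEF LAMED (a compatibility character)

nfd = unicodedata.normalize("NFD", ligature)
nfkd = unicodedata.normalize("NFKD", ligature)

print("NFD: ", codepoints(nfd))    # ligature is left untouched
print("NFKD:", codepoints(nfkd))   # ligature becomes alef + lamed
```

In practice this means that text prepared for fine-tuning or evaluation with this model should be passed through the same NFKD normalization, otherwise characters the model never emits may remain in the reference.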

Greek Models

Meleagre-NFD-finetuned

HTR Model Palatinus graecus 23 (Meleagre-NFD-finetuned)

A narrow specialist model for one specific Byzantine Greek manuscript — Codex Palatinus graecus 23 (Palatine Anthology), 10th century.

Zenodo: 10.5281/zenodo.10932751 | License: CC BY 4.0 | Language: Ancient Greek | Accuracy: 91.05%

  • Training data: 70 pages of Cod. Pal. gr. 23; NFD normalized
  • Best use: Edition work on this specific codex or closely related Byzantine book hands.

GreekHTR / greekmix_01

Greek Handwritten Text Recognition Model (9th–12th c.)

A preview model for Byzantine Greek minuscule from the 9th to 12th century, trained on manuscripts from the Vatican Library and the Patristic Text Archive.

Zenodo: 10.5281/zenodo.15838142 | License: CC BY-SA 4.0 | Size: 20 MB | Language: Ancient Greek

  • Status: Preview; dataset not yet released
  • Best use: Experimental baseline for patristic/Byzantine minuscule; treat as a starting point for fine-tuning rather than a production model.

Old Norse / Old Icelandic Models

OICEN-HTR Bundle

OICEN Combined 0.1

A bundle of fine-tuned models for Old Icelandic and Old Norse manuscripts, all built on CATMuS Medieval 1.6.0 as base.

Zenodo: 10.5281/zenodo.15389282 | License: CC BY-SA 4.0 | Languages: Old Icelandic / Old Norse | Combined: 97.41% char. / 91.42% word accuracy

Individual models:

  • AlexS v1.0: Alexanders saga (AM 519 a 4to); 98.81% char. accuracy
  • MB v0.3: Möðruvallabók (AM 132 fol); 99.01% char. accuracy (likely overfitted)
  • CodWorm v1.0: Codex Wormianus (AM 242 fol); 99.10% char. accuracy
  • Combined v0.1: all three corpora merged; 97.41% char. accuracy

Transcription: Menota-based facsimile annotations; eScriptorium used for text-to-line alignment.

The MB v0.3 result is an instructive example of how quickly a hand-specific model overfits when the training data is very homogeneous.

Bifrost 0.1

A proof-of-concept FAIR-oriented release for Old Norse manuscripts, built on CATMuS Medieval.

HAL: hal-05088317 | License: CC BY 4.0 | Language: Old Norse

  • Training data: Selected leaves from 2 manuscripts; test on 4 manuscripts
  • Status: Paper published November 2025 (HAL); standalone Zenodo model record not yet deposited as of May 2026.
  • Best use: Community release demonstrating the CATMuS fine-tuning workflow for Old Norse; not yet benchmarked at production scale.

Middle High German Models (Inzigkofen)

Inzigkofen Manuscript Models (15th century)

Three single-hand models for 15th-century Gothic scripts from the manuscripts of the Augustinian canonesses in Inzigkofen (Staatsbibliothek zu Berlin).

Zenodo: no public record yet | License: CC BY 4.0 | Language: Middle High German

Model | Script | Images | Base model | CER
--- | --- | --- | --- | ---
textualis_anna_jaeck | Textualis | 49 | TRIDIS v2 | 4.88%
bastarda_jos_von_pfullendorf | Bastarda | 24 | CATMuS Medieval | 3.35%
cursive_johannes_jaeck | Cursive | 38 | CATMuS Medieval | 4.29%

A textbook example of successful single-hand fine-tuning with very small training sets (24–49 images).
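Fine-tuning of this kind is typically run with Kraken's `ketos train` command, loading a base model and continuing training on new ground truth. The sketch below only assembles such a command and does not run it. The flags shown (`-i` to load a base model, `--resize union` to extend the output alphabet, `-f xml` for ALTO/PAGE XML ground truth, `-o` for the output prefix) are assumptions based on Kraken 4.x/5.x usage and should be checked against `ketos train --help` for the installed version. All file names are hypothetical, and this is not necessarily how the Inzigkofen models were trained.

```python
import shlex


def build_finetune_command(base_model, ground_truth_files, output_prefix):
    """Assemble a `ketos train` call that fine-tunes an existing model.

    Assumed flags (verify with `ketos train --help`):
      -i        base model to load
      --resize  how to adapt the output layer to new characters
      -f xml    ground truth given as ALTO/PAGE XML files
      -o        prefix for the model files that are written
    """
    if not ground_truth_files:
        raise ValueError("at least one ground-truth file is required")
    return [
        "ketos", "train",
        "-i", str(base_model),
        "--resize", "union",
        "-f", "xml",
        "-o", str(output_prefix),
        *[str(path) for path in ground_truth_files],
    ]


# Hypothetical file names for a single-hand fine-tuning run
pages = [f"page_{n:03d}.xml" for n in range(1, 4)]
cmd = build_finetune_command("catmus-medieval.mlmodel", pages, "single_hand")
print(shlex.join(cmd))
```

With training sets as small as those in the table, holding back a few pages for evaluation matters more than usual, since a model can memorize two dozen images quickly.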


Where to Find More Models

  • HTR-United catalogue: https://htr-united.github.io/ — searchable index of community models and datasets
  • Zenodo OCR/HTR community: https://zenodo.org/communities/ocr_models/
  • GitHub: HTR-United org: https://github.com/HTR-United
  • Hugging Face: Some models are mirrored at https://huggingface.co/ (search for kraken or htr)
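
The Zenodo DOIs listed on this page can be turned into links programmatically. The sketch below assumes the current Zenodo URL patterns (`https://zenodo.org/records/<id>` for the landing page and `https://zenodo.org/api/records/<id>` for the JSON record); these are conventions of the Zenodo service rather than anything defined by this page, and the function name is illustrative.

```python
import re

# Zenodo DOIs have the form 10.5281/zenodo.<numeric record id>
ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.(\d+)$")


def zenodo_links(doi):
    """Derive resolver, landing-page and API URLs from a Zenodo DOI."""
    doi = doi.strip()
    match = ZENODO_DOI.match(doi)
    if match is None:
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    record_id = match.group(1)
    return {
        "doi_url": f"https://doi.org/{doi}",
        "record_url": f"https://zenodo.org/records/{record_id}",
        "api_url": f"https://zenodo.org/api/records/{record_id}",
    }


links = zenodo_links("10.5281/zenodo.15030337")  # CATMuS Medieval 1.6.0
for name, url in links.items():
    print(f"{name}: {url}")
```

The DOI resolver URL is the most stable of the three and is the one to cite; the other two are convenient for browsing and scripted downloads.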

Reuse

CC BY 4.0