Description

This result was created with the ToC extraction software of the University of Wuerzburg. Existing ToC extraction algorithms rely on either formal or functional properties of the ToC; our implementation combines the strengths of both classes. It merges formal information retrieved during the optical character recognition (OCR) process with functional information obtained from text analysis, and it works on the output of standard OCR software. Several analysis modules assess ToC link candidates and weight them according to different heuristics. Because these modules work completely independently, new components can easily be added to further increase reliability, and a tailored subset of modules can be used to account for specific properties of the underlying book. Furthermore, the implementation makes smart assumptions during ToC reconstruction to minimize computational effort. As a result, the approach is not only highly accurate but also very efficient in terms of speed.
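To make the modular design more concrete, here is a minimal Python sketch of independent analysis modules that each score a ToC link candidate, with a weighted combination choosing the best candidate. Everything in it is an assumption for illustration: the `LinkCandidate` fields, the three modules (`title_similarity`, `page_agreement`, `font_emphasis`), their weights, and the weighted-average combination are hypothetical and are not taken from the Wuerzburg software, whose actual heuristics are not described here.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple


@dataclass
class LinkCandidate:
    """A possible link from one ToC entry to one line in the book body.
    Hypothetical fields standing in for data that OCR output could supply."""
    toc_title: str      # title text as printed in the ToC
    toc_page: int       # page number printed next to the title in the ToC
    heading_text: str   # text of the candidate line in the body
    heading_page: int   # page number recognized on the candidate's page
    font_ratio: float   # candidate font size divided by body font size


# A (hypothetical) analysis module maps a candidate to a score in [0, 1].
# Modules do not depend on each other, so any subset can be combined.
Module = Callable[[LinkCandidate], float]


def title_similarity(c: LinkCandidate) -> float:
    """Functional evidence: textual similarity of ToC title and candidate line."""
    return SequenceMatcher(None, c.toc_title.lower(), c.heading_text.lower()).ratio()


def page_agreement(c: LinkCandidate) -> float:
    """Page evidence: 1.0 on an exact match, 0.5 one page off, 0.0 from three pages off."""
    return max(0.0, 1.0 - abs(c.toc_page - c.heading_page) / 2)


def font_emphasis(c: LinkCandidate) -> float:
    """Formal evidence: larger-than-body type suggests a heading (saturates at 1.5x)."""
    return min(1.0, max(0.0, (c.font_ratio - 1.0) / 0.5))


def combined_score(c: LinkCandidate, modules: Dict[str, Tuple[Module, float]]) -> float:
    """Weighted average of the module scores; weights need not sum to 1."""
    total_weight = sum(weight for _, weight in modules.values())
    return sum(module(c) * weight for module, weight in modules.values()) / total_weight


def best_link(candidates: List[LinkCandidate],
              modules: Dict[str, Tuple[Module, float]]) -> LinkCandidate:
    """Pick the highest-scoring candidate for one ToC entry."""
    return max(candidates, key=lambda c: combined_score(c, modules))


# Hypothetical module set and weights, chosen only for this example.
MODULES: Dict[str, Tuple[Module, float]] = {
    "title": (title_similarity, 2.0),
    "page": (page_agreement, 1.0),
    "font": (font_emphasis, 1.0),
}

candidates = [
    # Running header repeating the chapter title on a later page, in body-size type.
    LinkCandidate("3 Results", 41, "3 Results", 44, 1.0),
    # The actual chapter heading: right page, larger type, slightly different OCR text.
    LinkCandidate("3 Results", 41, "3. Results", 41, 1.6),
]

best = best_link(candidates, MODULES)
print(best.heading_page, round(combined_score(best, MODULES), 2))  # prints: 41 0.97
```

With only the title module, the running header on page 44 would win because its text matches exactly; adding the page and font modules moves the decision to the real heading. Passing a smaller dictionary to `best_link` is how this sketch would mimic using a tailored subset of modules.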
                            Precision   Recall   F-Measure
Titles                         31.89%   35.42%      31.65%
Levels                         22.81%   26.82%      23.29%
Links                          27.06%   29.56%      26.95%
Complete entries               19.16%   21.97%      19.61%
Entries disregarding depth     27.06%   29.56%      26.95%
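For Titles, Links, and Entries disregarding depth, the reported F-measure is lower than both precision and recall. A single harmonic mean F = 2PR / (P + R) of the listed values cannot produce that, because it always lies between P and R. One plausible reading, assumed here and not stated in the source, is that precision, recall, and F-measure were computed per book and then averaged separately. The following Python sketch uses made-up per-book values (not the table's data) to show how such averaging can push the mean F below both mean P and mean R.

```python
from statistics import mean


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 if both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-book (precision, recall) pairs, NOT taken from the table above.
per_book = [(0.9, 0.1), (0.1, 0.9)]

avg_p = mean(p for p, _ in per_book)
avg_r = mean(r for _, r in per_book)
avg_f = mean(f_measure(p, r) for p, r in per_book)  # average of per-book F values

print(f"avg P = {avg_p:.2f}, avg R = {avg_r:.2f}")            # avg P = 0.50, avg R = 0.50
print(f"avg of per-book F = {avg_f:.2f}")                     # avg of per-book F = 0.18
print(f"F of averaged P and R = {f_measure(avg_p, avg_r):.2f}")  # F of averaged P and R = 0.50
```

Each hypothetical book is strong on only one of the two measures, so its own F is low, while the averaged precision and recall both look moderate.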