Structure Extraction competition @ ICDAR - Training Data

Training Data

In 2009 and 2011, the ground truth was built collaboratively by the participants during the competition. Annotators were rewarded with exclusive access to the ground truth until the following round. In 2013, the ground truth was built by an external service provider and made freely available as soon as the competition ended. The ground truth for all rounds of the competition is now freely available, so you can start running your experiments right away!

You can freely download the 100-book subset of the 2009 ground truth set.

You may also download the full 2009 ground truth, which contains 527 ToCs (tables of contents) annotated jointly by the participants. This data set is described in the IJDAR paper cited below.

You may also download the full 2011 ground truth, which contains 513 ToCs annotated jointly by the participants. This data set is described in the ICDAR 2011 paper that gives an overview of the competition (see our references page).

You may also download the full 2013 ground truth, which contains 967 ToCs annotated by an external service provider, which ensured more consistent annotation. This data set is described in the ICDAR 2013 paper that gives an overview of the competition (see our references page).

All archives include evaluation software: a title-based evaluation tool (Windows .exe) and a link-based evaluation tool (Python script).

Description

The general competition methodology is described in the following 2010 IJDAR paper:

Antoine Doucet, Gabriella Kazai, Bodin Dresevic, Aleksandar Uzelac, Bogdan Radakovic and Nikola Todic, "Setting up a Competition Framework for the Evaluation of Structure Extraction from OCR-ed Books", in International Journal of Document Analysis and Recognition (IJDAR), special issue on "Performance Evaluation of Document Analysis and Recognition Algorithms", 22 pages, 2010.

Contact - Registration

Antoine Doucet: "antoine DOT doucet AT unicaen DOT fr"