Description |
In the first step, each page in the book is assigned a logical page number. Then all TOC pages are detected, and TOC structure extraction is performed on each of them. After all words on the TOC pages are labeled, the words are grouped into entries. Each entry is given a depth level according to the entry clustering. Each entry is also assigned a link by searching for the entry title on the target page; this is done even if the entry has no page number, and if it has one, the logical page numbers are used. |
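
The link-assignment step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `logical_to_physical` mapping, the plain-text `page_texts` list, and the substring-matching rule are all assumptions made for illustration. The sketch first tries the page given by the entry's page number (translated through the logical page numbers) and otherwise scans the pages after the TOC for the entry title.

```python
import re


def normalize(text):
    """Lowercase and collapse non-alphanumeric runs (hypothetical matching rule)."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def resolve_link(title, printed_page, logical_to_physical, page_texts, toc_end):
    """Return the physical page index an entry links to, or None if not found.

    printed_page        -- page number printed in the TOC entry, or None
    logical_to_physical -- assumed mapping: logical page number -> page index
    page_texts          -- assumed list of page texts, indexed by physical page
    toc_end             -- index of the last TOC page
    """
    key = normalize(title)

    # If the entry has a page number, use the logical numbering to pick a candidate.
    candidate = None
    if printed_page is not None:
        candidate = logical_to_physical.get(printed_page)
    if candidate is not None and key in normalize(page_texts[candidate]):
        return candidate

    # Otherwise (or if the title is not on the candidate page),
    # search for the title on the pages that follow the TOC.
    for idx in range(toc_end + 1, len(page_texts)):
        if key and key in normalize(page_texts[idx]):
            return idx

    # Fall back to the page-number candidate, which may be None.
    return candidate


# Toy book: page 0 is the cover, page 1 is the TOC.
page_texts = [
    "Cover",
    "Contents\nIntroduction 1\nMethods 3",
    "Introduction\nThis book...",
    "more text",
    "METHODS\nWe describe...",
]
logical_to_physical = {1: 2, 2: 3, 3: 4}

with_number = resolve_link("Introduction", 1, logical_to_physical, page_texts, 1)
without_number = resolve_link("Methods", None, logical_to_physical, page_texts, 1)
```

In this toy example the first entry is resolved through its page number and the second through the title search alone; a real system would likely need fuzzier matching to cope with OCR errors.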