Extraction of key fields from administrative documents
Newspaper article separation
Text similarity and word embedding
Document analysis
Document stream segmentation
Natural language processing
Morphological analysis of Arabic dialects
Natural language processing of Tunisian dialect using tools and resources of standard Arabic
Contribution of diacritisation to the morphosyntactic analysis of standard Arabic
POS tagging of standard Arabic
NLP resources
Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers [download]
Benchmark for the evaluation of named entity recognition over ancient documents [download]
Benchmark for the evaluation of entity linking over ancient documents [download]
Publications
Journal papers
ACM Computing Surveys 2023 [PDF] [BibTex]
Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Mateo Romanello and Antoine Doucet Named Entity Recognition and Classification on Historical Documents: A Survey
JNLE 2022 [PDF] [BibTex] Ahmed Hamdi, Elvys Linhares Pontes, Nicolas Sidère, Mickaël Coustaty and Antoine Doucet In-depth analysis of the impact of OCR errors on named entity recognition and linking
Natural Language Engineering Journal
IJDL 2021 [PDF] [BibTex]
Elvys Linhares Pontes, Luis Adrian Cabrera-Diego, Jose G. Moreno, Emanuela Boros, Ahmed Hamdi, Nicolas Sidère, Mickaël Coustaty and Antoine Doucet MELHISSA: A Multilingual Entity Linking Architecture for Historical Press Articles
International Journal of Digital Libraries
International conferences
ECIR 2023 (Core A) [PDF] [BibTex]
Štěpán Šimsa, Milan Šulc, Michal Uřičář, Yash Patel, Ahmed Hamdi, Matěj Kocián, Matyáš Skalický, Jiří Matas, Antoine Doucet, Mickaël Coustaty, Dimosthenis Karatzas DocILE Benchmark for Document Information Localization and Extraction
The 45th European Conference on Information Retrieval
ECIR 2023 (Core A) [PDF] [BibTex]
Štěpán Šimsa, Milan Šulc, Matyáš Skalický, Yash Patel, Ahmed Hamdi DocILE 2023 Teaser: Document Information Localization and Extraction
The 45th European Conference on Information Retrieval
ICCCI 2022 (Core B) [PDF] [BibTex]
Sana Hamdi, Ahmed Hamdi, Sadok Ben Yahia BERT and Word Embedding for Interest Mining of Instagram Users
International Conference on Computational Collective Intelligence
SIGIR 2021 (Core A*) [PDF] [BibTex] Ahmed Hamdi, Elvys Linhares Pontes, Emanuela Boros, Tuyet Hai Nguyen Thi, Hackl Günter, Jose G. Moreno and Antoine Doucet A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers
The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
ICDAR 2021 (Core A) [PDF] [BibTex] Ahmed Hamdi, Elodie Carel, Aurélie Joseph, Mickaël Coustaty and Antoine Doucet Information extraction from invoices
The 16th International Conference on Document Analysis and Recognition
TPDL 2020 (Core B) [PDF] [BibTex] [Slides] Ahmed Hamdi, Axel Jean-Caurant, Nicolas Sidère, Mickaël Coustaty and Antoine Doucet
Assessing and Minimising the Impact of OCR Quality on Named Entity Recognition
Digital Libraries for Open Knowledge - 24th International Conference on Theory and Practice of Digital Libraries. Springer, 2020, pp. 87–101. DOI: 10.1007/978-3-030-54956-5_7. https://doi.org/10.1007/978-3-030-54956-5_7
CoNLL 2020 (Core A) [PDF] [BibTex]
Emanuela Boros, Ahmed Hamdi, Elvys Linhares Pontes, Luis Adrian Cabrera-Diego, Jose G. Moreno, Nicolas Sidère and Antoine Doucet Alleviating Digitization Errors in Named Entity Recognition for Historical Documents
Proceedings of the 24th Conference on Computational Natural Language Learning, pp. 431–441.
https://www.aclweb.org/anthology/2020.conll-1.35
ICADL 2020 (Core A) [PDF] [BibTex] [Slides]
Vinh-Nam Huynh, Ahmed Hamdi and Antoine Doucet When to Use OCR Post-correction for Named Entity Recognition?
Digital Libraries at Times of Massive Societal Transition - 22nd International Conference on Asia-Pacific Digital Libraries. Lecture Notes in Computer Science 12504 (2020), pp. 33-42. DOI: 10.1007/978-3-030-64452-9_3. https://doi.org/10.1007/978-3-030-64452-9_3
ICADL 2020 (Core A) [PDF] [BibTex]
Elvys Linhares Pontes, Luis Adrian Cabrera-Diego, Jose G. Moreno, Emanuela Boros, Ahmed Hamdi, Nicolas Sidère, Mickaël Coustaty and Antoine Doucet Entity Linking for Historical Documents: Challenges and Solutions
Digital Libraries at Times of Massive Societal Transition - 22nd International Conference on Asia-Pacific Digital Libraries. Lecture Notes in Computer Science 12504 (2020), pp. 215-231. DOI: 10.1007/978-3-030-64452-9_19. https://doi.org/10.1007/978-3-030-64452-9_19
JCDL 2019 (Core A*) [PDF] [BibTex] [Poster] Ahmed Hamdi, Axel Jean-Caurant, Nicolas Sidère, Mickaël Coustaty and Antoine Doucet An Analysis of the Performance of Named Entity Recognition over OCRed Documents
19th ACM/IEEE Joint Conference on Digital Libraries, pp. 333-334. DOI: 10.1109/JCDL.2019.00057.
https://doi.org/10.1109/JCDL.2019.00057
ICADL 2019 (Core A) [PDF] [BibTex]
Elvys Linhares Pontes, Ahmed Hamdi, Nicolas Sidère and Antoine Doucet Impact of OCR Quality on Named Entity Linking
Digital Libraries at the Crossroads of Digital Information for the Future - 21st International Conference on Asia-Pacific Digital Libraries. Lecture Notes in Computer Science 11853 (2019), pp. 102-11. DOI: 10.1007/978-3-030-34058-2_11. https://doi.org/10.1007/978-3-030-34058-2_11
DAS 2018 (Core B) [PDF] [BibTex] Ahmed Hamdi, Mickaël Coustaty, Aurélie Joseph, Vincent Poulain D'Andecy, Antoine Doucet and Jean-Marc Ogier Feature Selection for Document Flow Segmentation
13th IAPR International Workshop on Document Analysis Systems. IEEE Computer Society, pp. 245–250. DOI: 10.1109/DAS.2018.66. https://doi.org/10.1109/DAS.2018.66
MT SUMMIT 2013 (Core B) [PDF] [BibTex] Ahmed Hamdi, Rahma Boujelbane, Nizar Habash and Alexis Nasr The Effects of Factorizing Root and Pattern Mapping in Bidirectional Tunisian - Standard Arabic Machine Translation
MT Summit 2013. https://hal.archives-ouvertes.fr/hal-00908761
International workshops
CEUR@CLEF 2020 [PDF] [BibTex]
Emanuela Boros, Elvys Linhares Pontes, Luis Adrian Cabrera-Diego,
Ahmed Hamdi, Jose G. Moreno, Nicolas Sidère and Antoine Doucet Robust Named Entity Recognition and Linking on Historical Multilingual Documents
Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, 2020. http://ceur-ws.org/Vol-2696/paper_171.pdf
WML@ICDAR 2017 [PDF] [BibTex] Ahmed Hamdi, Joris Voerman, Mickaël Coustaty, Aurélie Joseph, Vincent Poulain D'Andecy and Jean-Marc Ogier Machine Learning vs Deterministic Rule-Based System for Document Stream Segmentation
First Workshop of Machine Learning, 14th IAPR International Conference on Document Analysis and Recognition, WML@ICDAR 2017. IEEE, pp. 77-82. DOI: 10.1109/ICDAR.2017.332. https://doi.org/10.1109/ICDAR.2017.332
WANLP@ACL 2015 [PDF] [BibTex] Ahmed Hamdi, Alexis Nasr, Nizar Habash and Nuria Gala POS-tagging of Tunisian Dialect Using Standard Arabic Resources and Tools
Proceedings of the 2nd Workshop on Arabic Natural Language Processing, ANLP@ACL 2015. Association for Computational Linguistics. DOI: 10.18653/v1/W15-3207. https://doi.org/10.18653/v1/W15-3207
VarDial@COLING 2014 [PDF] [BibTex] Ahmed Hamdi, Nuria Gala and Alexis Nasr Automatically building a Tunisian Lexicon for Deverbal Nouns
Proceedings of the 1st Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects, VarDial@COLING 2014. Association for Computational Linguistics and Dublin City University, pp. 95-102. DOI: 10.3115/v1/W14-5311.
https://doi.org/10.3115/v1/W14-5311
Non-peer reviewed papers
arXiv preprint 2023 [PDF] [BibTex]
Štěpán Šimsa, Milan Šulc, Michal Uřičář, Yash Patel, Ahmed Hamdi, Matěj Kocián, Matyáš Skalický, Jiří Matas, Antoine Doucet, Mickaël Coustaty, Dimosthenis Karatzas DocILE Benchmark for Document Information Localization and Extraction
arXiv preprint arXiv:2302.05658
arXiv preprint 2023 [PDF] [BibTex]
Štěpán Šimsa, Milan Šulc, Matyáš Skalický, Yash Patel, Ahmed Hamdi DocILE 2023 Teaser: Document Information Localization and Extraction
arXiv preprint arXiv:2301.12394
arXiv preprint 2021 [PDF] [BibTex]
Maud Ehrmann, Ahmed Hamdi, Elvys Linhares Pontes, Mateo Romanello and Antoine Doucet Named Entity Recognition and Classification on Historical Documents: A Survey
arXiv preprint arXiv:2109.11406 (a shorter version will soon be published in ACM Computing Surveys)
Project deliverables
[PDF] [BibTex] Ahmed Hamdi, Elvys Linhares Pontes and Antoine Doucet Named Entity Recognition and Linking on historical newspapers.
NewsEye 2020
[PDF] [BibTex] Ahmed Hamdi, Thi Tuyet Haï Nguyen and Antoine Doucet Stance detection on historical newspapers. NewsEye 2020
Thesis dissertation
[PDF] [BibTex] Ahmed Hamdi Traitement automatique du dialecte tunisien à l'aide d'outils et de ressources de l'arabe standard : application à l'étiquetage morphosyntaxique
Aix-Marseille University
National conferences
CORIA 2021 [PDF] [BibTex]
Emanuela Boros, Ahmed Hamdi, Elvys Linhares Pontes, Luis Adrian Cabrera-Diego, Jose G. Moreno, Nicolas Sidère and Antoine Doucet Atténuer les erreurs de numérisation dans la reconnaissance d’entités nommées pour les documents historiques
Conférence francophone en Recherche d’Information et Application
TALN 2013 [PDF] [BibTex] Ahmed Hamdi, Rahma Boujelbane, Nizar Habash and Alexis Nasr Un système de traduction de verbes entre arabe standard et arabe dialectal par analyse morphologique profonde
Traitement Automatique des Langues Naturelles, The Association for Computational Linguistics, pp. 395-406. https://www.aclweb.org/anthology/F13-1029
RECITAL 2012 [PDF] [BibTex] Ahmed Hamdi Apport de la diacritisation de l'analyse morphosyntaxique de l'arabe
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012. ATALA/AFCP, pp. 247-254. https://www.aclweb.org/anthology/F12-3019