Wajdi Zaghouani

Research Interests

Annotation, Arabic NLP, Corpus creation, Named entity extraction, Lexicon creation, Text mining, Electronic dictionaries, PropBank, Machine translation, Text categorisation, Information retrieval, Computational linguistics, TreeBank.

Recent projects

- Arabic Spelling Correction
- Quranic Arabic PropBank
- Arabic Pilot Propbank
- Arabic Treebank annotation
- Arabic Part of speech annotation
- Fairuz: The Arabic heritage historical dictionary
- The department of Education AlKitaab

 


University Education

2006-2009

Master of linguistics, University of Montreal, Quebec, Canada  

1999-2002

Bachelor's degree in computational linguistics, University of Quebec at Montreal, Quebec, Canada

1996 - 1999

D.U.E.L in French language and literature, University of Kairouan, Raqqada, Tunisia

 

Selected Publications

 

Books and Book Chapters

 

1) Wajdi Zaghouani. 2011.  Le repérage automatique des entités nommées dans la langue arabe.  Book published by Les Éditions universitaires européennes. 156p. ISBN  9786131565953, Germany.

2)  J. VÉRONIS, O. HAMON, C. AYACHE, R. BELMOUHOUB, O. KRAIF, D. LAURENT, T.M.H. NGUYEN, N. SEMMAR, F. STUCK, W. ZAGHOUANI. (2008).  La campagne d'évaluation ARCADE II. In Chaudiron, S. & Choukri, K. (Eds.) L'évaluation des technologies de traitement de la langue (pp 47-69). Paris:  Hermes Science Publications, IC2 Cognition Collection.  ISBN 978-2-7462-1992-2.

 

Journals and Conferences

 

1)    Ossama Obeid, Wajdi Zaghouani, Behrang Mohit, Nizar Habash, Kemal Oflazer and Nadi Tomeh. A Web-based Annotation Framework For Large-Scale Text Correction. In Proceedings of IJCNLP’2013,  Nagoya, Japan.

 

2)    Abdelaati Hawwari, Mohsen Rashwan and Wajdi Zaghouani. 2013. A Lexical Semantic Resource for Quranic Morphological Patterns. The International conference for the development of Quranic studies. http://www.quranicconferences.com/ . Riyadh, KSA. 16-20 February 2013.

 

3)    Wajdi Zaghouani. Arabic Natural Language Processing and the Future. In proceedings of the CECTAL’13, Montreal, Canada. Sept 26th 2013.

 

4)    Hawwari, A.; Zaghouani, W.; O'Gorman, T.; Badran, A.; Diab, M., "Building a lexical semantic resource for Arabic morphological Patterns," Communications, Signal Processing, and their Applications (ICCSPA), 2013 1st International Conference on , vol., no., pp.1,6, 12-14 Feb. 2013. https://www2.aus.edu/conferences/iccspa/

 

5)    Wajdi Zaghouani. 2012. RENAR: A Rule-Based Arabic Named Entity Recognition System. ACM Trans. Asian Lang. Inf. Process. 11(1): 2 (2012).

 

6)    Wajdi Zaghouani, Abdelati Hawwari and Mona Diab. 2012. A Pilot PropBank Annotation for Quranic Arabic. In Proceedings of the first workshop on Computational Linguistics for Literature, NAACL-HLT 2012, Montreal, Canada.

 

7)    Mohammed Maamouri, Wajdi Zaghouani, Violetta Cavalli-Sforza, Dave Graff and Mike Ciul. 2012. Developing ARET: An NLP-based Educational Tool Set for Arabic Reading Enhancement. In Proceedings of The 7th Workshop on Innovative Use of NLP for Building Educational Applications, NAACL-HLT 2012, Montreal, Canada.

 

8)    Wajdi Zaghouani. 2012. Vers la création d'un corpus annoté sémantiquement pour la langue arabe. To be presented at the Informatique Cognitive 2012 (IC’2012) conference, Montreal, Canada, 6-7 June 2012.

 

9)    Wajdi Zaghouani. 2012. Étude sur la composition des noms de personnes dans la langue arabe. In proceedings of the 25th Journées de linguistique de Laval. 9-11 March 2011, Laval University, Québec, Canada.

 

10)  Wajdi Zaghouani. 2011. Le développement d'un corpus annoté sémantiquement pour la langue arabe. ACFAS 2011, May 9-13 2011, Sherbrooke, QC, Canada.

 

11)  Wajdi Zaghouani. 2011. RENAR : un système de repérage automatique des entités nommées pour la langue arabe. In Traitement automatique des langues : analyses et applications workshop, ACFAS 2011, May 13 2011, Sherbrooke, QC, Canada.

 

12)  Wajdi Zaghouani, Mona Diab, Aous Mansouri, Sameer Pradhan and Martha Palmer. 2010. The Revised Arabic PropBank. In proceedings of the 4th Linguistic Annotation Workshop, ACL, held in Uppsala. July 15-16 2010.

 

13) Eric Atwell, Kais Dukes, Abdul-Baquee Sharaf, Nizar Habash, Bill Louw, Bayan Abu Shawar, Tony McEnery, Wajdi Zaghouani, Mahmoud El-Haj. 2010. Understanding the Quran: a new Grand Challenge for Computer Science and Artificial Intelligence. In Grand Challenges in Computing Research for 2010 and beyond, part of the ACM-BCS Visions of Computer Science conference. 13-16 April 2010, Edinburgh University.

 

14) Wajdi Zaghouani, Ralf Steinberger and Bruno Pouliquen. 2010. A resource-light Arabic Named Entity Recognition system. Georgetown University Round Table 2010. Arabic Language and Linguistics, March 12-14 2010.

 

15) Mohamed Maamouri, Ann Bies, Seth Kulick, Wajdi Zaghouani, Dave Graff and Mike Ciul. 2010. From Speech to Trees: Applying Treebank Annotation to Arabic Broadcast News. In Proceedings of LREC 2010, Valletta, Malta, May 17-23, 2010.

 

16) Wajdi Zaghouani, Bruno Pouliquen, Mohamed Ebrahim and Ralf Steinberger. 2010. Adapting a resource-light highly multilingual Named Entity Recognition system to Arabic. In Proceedings of LREC 2010, Valletta, Malta, May 17-23, 2010.

 

17) Wajdi Zaghouani. 2009. Le repérage automatique des entités nommées dans la langue arabe : vers la création d'un système à base de règles. Master dissertation, University of Montreal. Under the supervision of Dr. Patrick Drouin and Dr. Richard Kittredge.

 

18) Mona Diab, Aous Mansouri, Martha Palmer, Olga Babko-Malaya, Wajdi Zaghouani, Ann Bies, Mohammed Maamouri. A Pilot Arabic Propbank. LREC 2008, Marrakech, Morocco, May 28-30, 2008.

 

19) Bruno Pouliquen, Marco Kimler, Ralf Steinberger, Camelia Ignat, Tamara Oellinger, Ken Blackler, Flavio Fuart, Wajdi Zaghouani, Anna Widiger, Ann-Charlotte Forslund, Clive Best (2006). Geocoding multilingual texts: Recognition, Disambiguation and Visualisation. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006), pp. 53-58. Genoa, Italy, 24-26 May 2006.

 

20) Yun-Chuang Chiao, Olivier Kraif, Dominique Laurent, Thi Minh Huyen Nguyen, Nasredine Semmar, François Stuck, Jean Véronis, Wajdi Zaghouani (2006). Evaluation of multilingual text alignment systems: the ARCADE II project. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006). Genoa, Italy, 24-26 May 2006.

 

21) Pouliquen Bruno, Ralf Steinberger, Camelia Ignat, Irina Temnikova, Anna Widiger, Wajdi Zaghouani & Jan Žižka (2005). Multilingual person name recognition and transliteration. Journal CORELA - Cognition, Représentation, Langage. Numéros spéciaux, Le traitement lexicographique des noms propres. Available online at: http://edel.univ-poitiers.fr/corela/document.php?id=490. ISSN 1638-5748.

 

22) Pouliquen Bruno, Ralf Steinberger, Camelia Ignat, Irina Temnikova, Wajdi Zaghouani & Jan Žižka. “Detection of person names and their translations in multilingual news”, Colloque Traitement lexicographique des noms propres, Tours, 24 March 2005.

 

23) Zaghouani Wajdi. “Cognitive sciences & linguistics”, Teluq-UQAM, 23 September 2003.

 

24) Zaghouani Wajdi. “AUTO-ÉVAL: an automatic text evaluation system”, In proceedings of the CESLA colloquium, Montréal, UQAM, 2002.

 

Conference Presentations

 

1)    Wajdi Zaghouani. Arabic Natural Language Processing and the Future. In proceedings of the CECTAL’13, Montreal, Canada. Sept 26th 2013.

 

2)    Wajdi Zaghouani, Ossama Obeid,  Behrang Mohit and Kemal Oflazer.  2013. The Qalb Project: Building Resources and Systems for the automatic Correction of Arabic Text. Best Poster at the 2013 Meetings of the Mind, Carnegie Mellon University, Doha, Qatar. http://issuu.com/carnegiemellonqatar/docs/mom_digest_2013_print/1

 

3)     Wajdi Zaghouani. 2013. "RENAR: A Rule-Based Arabic Named Entity Recognition System".   Oral presentation at the Carnegie Mellon University Natural Language Processing Weekly Round Table.

 

4)    Wajdi Zaghouani. 2013. Building a Lexical Semantic Resource for Arabic Morphological Patterns. Oral presentation at the Arabic Natural Language Processing session in the International Conference on Communications, Signal Processing and their Applications (ICCSPA’13). Sharjah, UAE.

 

5)    Wajdi Zaghouani and Abdelaati Hawwari. La construction d'une ressource lexicale pour la langue arabe. May 9th 2013.  ACFAS Colloquium, Laval University, Quebec. http://www.acfas.ca/evenements/congres/programme/81/300/305/d

6)    Wajdi Zaghouani. 2013. The recent advance of PropBank Annotation for Arabic. Oral presentation at the UQAM - AECSL Round Table, Montreal, Canada.

 

7)    Wajdi Zaghouani. 2013. "RENAR: A Rule-Based Arabic Named Entity Recognition System".   Oral presentation at the Carnegie Mellon University NLP Round Table.

 

8)    Wajdi Zaghouani. 2012. Developing ARET: An NLP-based Educational Tool Set for Arabic Reading Enhancement. Oral presentation at the 7th Workshop on Innovative Use of NLP for Building Educational Applications, NAACL-HLT 2012, Montreal, Canada.

 

9)    Wajdi Zaghouani. 2012. Vers la création d'un corpus annoté sémantiquement pour la langue arabe. Oral presentation at the Informatique Cognitive 2012 (IC’2012) conference, Montreal, Canada, 6-7 June 2012. https://sites.google.com/site/informatiquecognitive2012/programme

 

10) Wajdi Zaghouani, Abdelati Hawwari and Mona Diab. 2012. A Pilot PropBank Annotation for Quranic Arabic. Oral presentation at  the first workshop on Computational Linguistics for Literature, NAACL-HLT 2012, Montreal, Canada.

 

11) Wajdi Zaghouani. 2011. La construction d'un corpus annoté sémantiquement pour la langue arabe. Oral presentation at ACFAS 2011, May 9-13 2011, Sherbrooke, QC, Canada.

 

12) Wajdi Zaghouani. 2011. RENAR : un système de repérage automatique des entités nommées pour la langue arabe.  Oral presentation at Traitement automatique des langues : analyses et applications workshop, ACFAS 2011, May 13 2011, Sherbrooke, QC, Canada.

 

13) Wajdi Zaghouani. 2011. Étude sur la composition des noms de personnes dans la langue arabe. Oral presentation at the 25th Journées de linguistique de Laval. 9-11 March 2011, Laval University, Québec, Canada.

 

14) Wajdi Zaghouani, Ralf Steinberger and Bruno Pouliquen. 2010. A resource-light Arabic Named Entity Recognition system. Georgetown University Round Table 2010. Arabic Language and Linguistics, March 12-14 2010.

 

15) Pouliquen Bruno, Ralf Steinberger, Camelia Ignat, Irina Temnikova, Wajdi Zaghouani & Jan Žižka. “Detection of person names and their translations in multilingual news”, Colloque Traitement lexicographique des noms propres, Tours, 24 March 2005.

 

16)  Wajdi Zaghouani. “Cognitive sciences & linguistics”, Teluq-UQAM, 23 September 2003.

 

 

17) Zaghouani Wajdi. “AUTO-ÉVAL: an automatic text evaluation system”, Oral presentation at the CESLA colloquium, Montréal, UQAM, 2002.

 

Published Resources

 

1)    Arabic Treebank Part 1 Version 4.1. LDC Catalog, LDC2010T13. 2010. Mohamed Maamouri, Ann Bies, Seth Kulick, Fatma Gaddeche, Wigdan Mekki, Sondos Krouna, Basma Bouziri, Wajdi Zaghouani. Linguistic Data Consortium.

 

2)    Arabic Treebank: Part 2 v 3.1. LDC2011T09. ISBN 1-58563-590-1. 2011. Mohamed Maamouri, Ann Bies, Seth Kulick, Fatma Gaddeche, Wigdan Mekki, Sondos Krouna, Basma Bouziri, Wajdi Zaghouani.  Linguistic Data Consortium

 

3)    Arabic Treebank: Part 3 v 3.2. LDC Catalog No. LDC2010T08. 2010. Mohamed Maamouri, Ann Bies, Seth Kulick, Sondos Krouna, Fatma Gaddeche, Wajdi Zaghouani. Linguistic Data Consortium

 

4)    Arabic Treebank: Part 8 v 1.1. LDC Catalog No. LDC2010E11*. 2010. Mohamed Maamouri, Ann Bies, Seth Kulick, Wajdi Zaghouani. Linguistic Data Consortium

 

5)    Arabic Treebank: Part 9 v 1.0. LDC Catalog No. LDC2010E19*. 2010. Mohamed Maamouri, Ann Bies, Seth Kulick, Wajdi Zaghouani. Linguistic Data Consortium

 

6)    Arabic Treebank: Part 10 v 1.0. LDC Catalog No. LDC2010E22*. 2010. Mohamed Maamouri, Ann Bies, Seth Kulick, Fatma Gaddeche, Sondos Krouna, Wajdi Zaghouani. Linguistic Data Consortium

 

7)    Arabic Treebank Part 13 V1.0, CatalogID: LDC2011E18*. Mohamed Maamouri, Ann Bies, Seth Kulick, Sondos Krouna, Dalila Tabassi, Michael Ciul, Wajdi Zaghouani. Linguistic Data Consortium

 

8)    Arabic Treebank Part 14 V2.0 - CatalogID: LDC2011E52*. Mohamed Maamouri, Ann Bies, Seth Kulick, Sondos Krouna, Dalila Tabassi, Michael Ciul, Wajdi Zaghouani. Linguistic Data Consortium

 

9)    Arabic Treebank Part 15 V2.0 - CatalogID: LDC2012E10*. Mohamed Maamouri, Ann Bies, Seth Kulick, Sondos Krouna, Dalila Tabassi, Michael Ciul, Wajdi Zaghouani. Linguistic Data Consortium

 

10) Arabic Treebank Part 16 V2.0 - CatalogID: LDC2011E116*. Mohamed Maamouri, Ann Bies, Seth Kulick, Sondos Krouna, Dalila Tabassi, Michael Ciul, Wajdi Zaghouani. Linguistic Data Consortium.

Reports and unpublished manuscripts

 

1)    Wajdi Zaghouani, Nizar Habash, Behrang Mohit. The Qatar Arabic Language Bank Annotation Guidelines (QALB). To be published. 2014.

2)    ARABIC PROPBANK ANNOTATION GUIDELINES. 2006. Olga Babko-Malaya, Aous Mansouri, Wajdi Zaghouani. May 2006.

3)    Zaghouani Wajdi. “Evaluation of Arabic-English machine translation systems”, unpublished manuscript, 2004.

4)    Sylvie Guillemin-Lanne and Wajdi Zaghouani. 2008. Arabic Named entity recognition. Technical report, Temis, France.

5)    Wajdi Zaghouani and Sylvie Guillemin-Lanne. 2008. The Arabic LUXID Text Mining Skill Cartridge. Technical report, Temis, France.


 

Honors / Awards

1)    Best Poster Award, Post-graduate category. Meetings of the minds, Carnegie Mellon University. The QALB Project: Building Resources and Systems for the Automatic Correction of Arabic Text.

                http://issuu.com/carnegiemellonqatar/docs/mom_digest_2013_print/1

2)    2011-2014. Joseph-Armand Bombardier Canada Graduate Scholarships Program Doctoral Scholarships (http://www.sshrc-crsh.gc.ca/results-resultats/2011/cgs_docs_2011.pdf)

3)    Best poster award at the Cognitive Informatics Conference. 2012. The University of Quebec in Montreal (UQAM), 5-6 June 2012. Montreal, Canada

4)    ATALA student travel grant to participate in the TALN 2010 conference in Montreal.


Conference / Workshop Program Committees

 

1)    Chair of the Arabic Natural Language Processing session at the 1st International Conference on Communications, Signal Processing and their Applications (ICCSPA’13), Sharjah, UAE.

2)    President and Chair of the CEC-TAL 2014 Conference, Tunisia

3)    Member of the RANLP’2013 Program committee, Sofia, Bulgaria.

4)    President and Chair of the CEC-TAL 2013 Conference, Montreal, Canada

 

5)    Program committee member of the 2012 LREC workshop on religious text 

 

6)    Member of the RANLP’2011 Program committee

 

International Journals / Grant Reviewing

 

1) National Science Foundation (NSF) research proposal expert reviewer.

2) Review of articles in the Journal of Natural Language Engineering

3) Review of articles in Language Resources and Evaluation journal.

Citation Metrics

 

 Citations 157; h-index 7; i10-index 4. Google Scholar (last updated Jun 14, 2013).

 

Language Skills

Standard Arabic:

Native language

Tunisian Arabic:

Native level

Arabic Dialects:

North African, Egyptian, Levantine and Gulf Arabic: fair level

English:

Good level

French:

Near Native level

Italian:

Good level


Employment

Sep. 2012 to present: Research Associate, Carnegie Mellon University (Qatar). Tasks include:

- Guidelines Creation for Qatar Arabic Language Bank Project.

- Managing a team of 10 annotators

- Corpus Creation

- Progress reports

- Acting as lead annotator and annotation manager (workflow management)

 

Feb. 2006 – Dec. 2011: Visiting Scholar, Programmer Analyst and Annotation team manager at the Linguistic Data Consortium <http://www.ldc.upenn.edu>. Tasks included:

 

-          Helping the project manager in the NSF grants (proposals, post award tracking…)

-          Recruiting and managing annotators for projects such as the Iraqi Pronunciation dictionary.

-          Remotely managing Treebank annotators in various locations, including Philadelphia, New York, Paris and Tunisia.

-          Pronunciation dictionary development and quality control.

-          Annotators training for the PropBank project

-          Pilot Arabic Semantic role labeling over the Arabic Treebank (Propbank Project).

-          Rules creation for an English / Arabic word alignment project

-          Improvement of the Buckwalter Arabic Morphological analyzer.

-          Arabic Treebank annotation tools maintenance.

-          Arabic Treebank quality control tools maintenance.

-          Assigning annotation work to various annotators.

-          Supervising the lexicon building for an Iraqi Arabic dialect dictionary.

-          Workflow management of the Arabic Treebank annotation effort.

-          Writing technical reports.

 

Jan. 2009 – Aug. 2010: Independent consultant for the University of Colorado with Dr. Martha Palmer (http://clear.colorado.edu/start/index.html). All work done remotely. Tasks: Arabic lexicon building and annotation to cover all Arabic verbs in the Penn Treebank (more details in the listed publications).

 

Dec. 2007 – Jul. 2008: Independent Consultant for the French NLP company TEMIS, based in Paris (www.temis.com).

 

Working remotely from Montreal / Philadelphia. Tasks:

 

-          Creation of regular-expression rules for a rule-based information extraction system (LUXID Text Mining Skill Cartridge).

-          Grammar rules creation for French and Arabic to extract Named entities.

-          Quality control of the named entities extraction tool.

-          Evaluation of the resulting system during a NIST evaluation workshop.

-          Integration and testing of an Arabic POS tagger to improve the performance of the Arabic skills cartridge.

-          Building a large lexicon of people, locations and organizations for the Arabic language.

 

Mar. 2005 – Jan. 2006: Research Intern at the Language Technology group, JRC (Joint Research Center of the European Commission in Italy) <http://langtech.jrc.it/>. Tasks:

 

-       Grammars, patterns and rules creation for the Arabic EMM module (named entities, quotations).

-       Arabic transliteration mapping rules creation.

-       Semi-automatic named entities list building.

-       Helped improve the existing named entity database.

-       Various other tasks related to EMM news explorer, EMM news brief

-       Arabic / French corpus collection from newswire.

-       Improved the Medisys system by extending the terminology and the rules for the Arabic version.

-       Building a multilingual terrorism terminology lexicon (Arabic and Farsi).

 

Sep. 2004 – Mar. 2005: Teaching assistant at the University of Montreal language lab. Main task: helping students of French as a second language improve their French pronunciation.

 

Sep. 2002 – Jun. 2004: Computational linguist at NSTEIN technologies (http://nstein.com/) in Montreal; the company is now part of OPENTEXT Inc.

Tasks:

-       Adaptation of a machine-learning and rule-based part-of-speech tagger to the Arabic language.

-       Rules creation for GPHIN Global Public Health Intelligence Network (English, French, Italian, Arabic).

-       Assessment and Improvement of the Sakhr Machine translation SDK included with the GPHIN system.

-       Building an Arabic lexical database from various sources.

-       Arabic-English cross-language information retrieval system improvement.

-       Rules creation for the Arabic module of the Nstein concept extractor (simple and complex concepts), which extracts people, locations and organizations.

-       Rules creation and lexicons building for the Nfinder module for the extraction of named entities.

-       Adapting the Ncategorizer (text categorization module) to Arabic using a clustering technique.

 

Jan. 2000 – Dec. 2001: Design of the UQAM (University of Quebec at Montreal) journal website. Tasks included website editing, French proofreading and writing news reports in French.

 

Feb. 2000 – Dec. 2001: Student researcher for the French verb lemmatization project at the ATO lab (University of Quebec at Montreal).
