A Domain-Based Approach to Extract Arabic Person Names Using N-Grams and Simple Rules

Abstract: Named Entity Recognition (NER) is an important task in many human language technologies, including information extraction, Natural Language Processing (NLP) and Machine Translation. It is widely regarded as a challenging task for the Arabic language. Most existing studies deal only with names found in Modern Standard Arabic (MSA) sources such as news. In this study, we aim to build a name list, or gazetteer, for Classical Arabic, which represents an important part of a living Arabic literature and culture. To achieve this goal, we propose a new approach for extracting Arabic Person Names (APNs). The approach constitutes a new model for extracting named entities from unstructured Arabic text without the need for Part of Speech (POS) tagging or morphological analysis. It is based on formulating a model for a specific domain. For this study, we use an authentic text from the literature of Islamic-Arabic studies, namely the "Hadith", which comprises the sayings of the Prophet Mohammad (Peace Be Upon Him, PBUH). We use NLP and text mining techniques to extract and build an accurate standard list of classical APNs. We also built a standard evaluation list of classical names in order to evaluate our approach. The results show a precision of around 84%.
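To make the idea of combining word n-grams with simple rules concrete, the following is a minimal sketch and not the method used in the study. It assumes transliterated tokens, a hypothetical set of cue words (`CUES`) that tend to precede a narrator's name in a Hadith chain, and a hypothetical connector set (`CONNECTORS`) that may appear inside a name but not at its edges. The function names `word_ngrams`, `extract_candidates` and `precision` are illustrative only. Precision is computed as the share of extracted names that also appear in a reference list.

```python
from typing import List, Set, Tuple

# Hypothetical cue and connector words (transliterated); not taken from the paper.
CUES = {"haddathana", "an"}   # words assumed to introduce a person name
CONNECTORS = {"bin"}          # words assumed to link parts of one name


def word_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous word n-grams of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_candidates(tokens: List[str], max_len: int = 3) -> Set[str]:
    """Collect name candidates: a cue word followed by 1..max_len tokens.

    A candidate is rejected if it contains another cue word or if it
    starts or ends with a connector.
    """
    candidates: Set[str] = set()
    for n in range(2, max_len + 2):  # cue word plus 1..max_len name tokens
        for gram in word_ngrams(tokens, n):
            head, rest = gram[0], gram[1:]
            if head not in CUES:
                continue
            if any(tok in CUES for tok in rest):
                continue
            if rest[0] in CONNECTORS or rest[-1] in CONNECTORS:
                continue
            candidates.add(" ".join(rest))
    return candidates


def precision(extracted: Set[str], reference: Set[str]) -> float:
    """Fraction of extracted names that appear in the reference list."""
    if not extracted:
        return 0.0
    return len(extracted & reference) / len(extracted)


if __name__ == "__main__":
    tokens = "haddathana Malik an Nafi an Abdullah bin Umar".split()
    reference = {"Malik", "Nafi", "Abdullah bin Umar"}  # toy evaluation list
    found = extract_candidates(tokens)
    print(sorted(found))
    print(precision(found, reference))
```

In this toy run the partial name "Abdullah" is extracted alongside the full name "Abdullah bin Umar", so it counts against precision. A real system would need further rules, for example preferring the longest match.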

Asian Journal of Information Technology

Mohammad Alhawarat

How to cite this article

Mohammad Alhawarat, 2015. A Domain-Based Approach to Extract Arabic Person Names Using N-Grams and Simple Rules. Asian Journal of Information Technology, 14: 287-293.