Asian Journal of Information Technology

Year: 2015

Volume: 14

Issue: 8

Page No. 287 - 293

DOI: 10.36478/ajit.2015.287.293

A Domain-Based Approach to Extract Arabic Person Names Using N-Grams and Simple Rules

Abstract: Named Entity Recognition (NER) is considered an important task in many human language technologies, including information extraction, Natural Language Processing (NLP) and machine translation. It is believed to be a challenging task for the Arabic language. Most existing research deals only with names found in Modern Standard Arabic (MSA) sources such as news. In this study, we aim to build a gazetteer of Classical Arabic names, which form an important part of a lively Arabic literature and culture. To achieve this goal, we propose a new approach for extracting Arabic Person Names (APNs). This approach constitutes a new model for extracting named entities from unstructured Arabic text without the need for Part of Speech (POS) tagging or morphological analysis. The proposed approach is based on a model built for a specific domain. For this study, we use an authentic text from the literature of Islamic-Arabic studies, namely the "Hadith". This domain relates to the sayings of the Prophet Mohammad, Peace Be Upon Him (PBUH). To achieve the aims of this study, we use NLP and text mining techniques to extract and build an accurate standard list of classical APNs. We also built a standard evaluation list of classical names in order to evaluate our approach. The results show a precision of around 84%.
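To make the general idea concrete, the following is a minimal sketch of n-gram candidate generation filtered by a simple rule, followed by a precision check against a reference list. It is not the method of the paper: the abstract does not list the actual rules, so the connector word, the cue words, the trigram window and all function names below are illustrative assumptions.

```python
# Hypothetical cue words; the abstract does not specify the real rules.
NAME_CONNECTOR = "بن"          # "son of", assumed here to link two name parts
CUE_WORDS = {"حدثنا", "عن"}     # assumed narration cues that are not names


def word_ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_names(tokens):
    """Keep trigrams of the form 'X connector Y' where X and Y are not cue words."""
    names = set()
    for first, middle, last in word_ngrams(tokens, 3):
        if middle != NAME_CONNECTOR:
            continue
        if {first, last} & (CUE_WORDS | {NAME_CONNECTOR}):
            continue
        names.add(" ".join((first, middle, last)))
    return names


def precision(extracted, reference):
    """Fraction of extracted names that also appear in the reference list."""
    if not extracted:
        return 0.0
    return len(extracted & reference) / len(extracted)


if __name__ == "__main__":
    # Toy whitespace-tokenized sentence, not taken from the paper's corpus.
    tokens = "حدثنا مالك بن أنس عن نافع".split()
    found = candidate_names(tokens)
    print(found, precision(found, {"مالك بن أنس"}))
```

A rule of this kind needs neither POS tagging nor morphological analysis, which is the property the abstract emphasizes. A real system would need more rules and text normalization than this sketch shows.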

How to cite this article:

Mohammad Alhawarat, 2015. A Domain-Based Approach to Extract Arabic Person Names Using N-Grams and Simple Rules. Asian Journal of Information Technology, 14: 287-293.

DOI: 10.36478/ajit.2015.287.293

URL: https://medwelljournals.com/abstract/?doi=ajit.2015.287.293
