The Arabic Corpus, compiled by Dr. Mourad Abbas (http://sites.google.com/site/mouradabbas9/corpora), comprises two corpora. The Khaleej-2004 corpus contains 5690 documents divided into 4 topics (categories). The Watan-2004 corpus contains 20291 documents organized into 6 topics (categories). Researchers who use these two corpora should cite the two main references:
(1) For Watan-2004 corpus
-------------------------
M. Abbas, K. Smaili, D. Berkani (2011) Evaluation of Topic Identification Methods on Arabic Corpora, Journal of Digital Information Management, vol. 9, no. 5, pp. 185-192.
(2) For Khaleej-2004 corpus
---------------------------
M. Abbas, K. Smaili (2005) Comparison of Topic Identification Methods for Arabic Language, RANLP05: Recent Advances in Natural Language Processing, pp. 14-17, 21-23 September 2005, Borovets, Bulgaria.
More useful references to check:
-------------------------------
https://sites.google.com/site/mouradabbas9/corpora
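As a quick sanity check after downloading, the document and topic counts quoted above can be compared against what is on disk. The following is a minimal sketch only: it assumes each corpus has been extracted to a folder with one subdirectory per topic and one file per document, which this page does not confirm. The function name `count_documents_by_topic` and the example path are hypothetical.

```python
import sys
from collections import Counter
from pathlib import Path


def count_documents_by_topic(corpus_root):
    """Count the files in each immediate subdirectory of corpus_root.

    ASSUMPTION (not confirmed by the corpus page): one subdirectory per
    topic, one file per document. Files placed directly in corpus_root
    are ignored.
    """
    counts = Counter()
    for topic_dir in sorted(Path(corpus_root).iterdir()):
        if topic_dir.is_dir():
            # Count only regular files; nested folders are not descended into.
            counts[topic_dir.name] = sum(
                1 for entry in topic_dir.iterdir() if entry.is_file()
            )
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python count_topics.py /path/to/Khaleej-2004  (hypothetical path)
    per_topic = count_documents_by_topic(sys.argv[1])
    for topic, n_docs in per_topic.items():
        print(f"{topic}: {n_docs}")
    print(f"topics: {len(per_topic)}, documents: {sum(per_topic.values())}")
```

If the layout assumption holds, Khaleej-2004 should report 4 topics and 5690 documents in total, and Watan-2004 should report 6 topics and 20291 documents.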
Website | https://arabiccorpus.sourceforge.io |
Tags | Machine Learning, Machine Translation |
License | GNU General Public License version 2.0 (GPLv2) |
Platform | Linux, Windows |