Enhanced Arabic Human-Machine Dialogue Using a Two-Level Dynamic Programming Algorithm
Abstract
This paper presents a prototype human-machine dialogue system designed for Arabic, addressing the growing need for voice-based interaction in under-resourced linguistic contexts. Arabic poses particular challenges for automatic speech recognition (ASR) and natural language processing (NLP): phonetic complexity, the frequent omission of diacritical marks in written text, and the scarcity of annotated speech corpora. These factors have significantly impeded the development of robust Arabic voice interfaces. The proposed system lets Arabic-speaking users make banking-related queries through voice commands on a smartphone interface. It combines two complementary feature extraction techniques, Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP), with a two-level dynamic programming algorithm that iteratively aligns acoustic feature vectors using Euclidean distance. To improve computational efficiency, phonemes are grouped into semantic classes, which reduces the search space. The knowledge base is organized into three core semantic categories (verbs, nouns, and digits), allowing concise, structured queries for account information, user identification, and confirmation tasks. A dedicated speech dataset was recorded from 20 native Arabic speakers (10 male, 10 female), who contributed spoken queries for both training and evaluation. The dataset was randomly partitioned into non-overlapping training (70%) and testing (30%) subsets to preserve the integrity of the evaluation. Experimental results show a sentence comprehension accuracy of 92.28% and a response generation accuracy of 91%, suggesting that the system is robust within this domain and has potential for real-world deployment. This work offers a scalable framework for Arabic ASR and a foundation for future applications in robotics, customer service, and industrial voice interfaces.
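The abstract does not give the details of the alignment procedure, so the following is only a minimal sketch of how a two-level dynamic programming match with Euclidean distance is commonly formulated for connected-word recognition. The first level scores a word template against a candidate segment of the utterance by dynamic time warping; the second level chooses the word boundaries and labels that minimize the total cost. The function names, the toy one-dimensional feature vectors (standing in for MFCC or PLP frames), and the template labels are all assumptions made for illustration, and practical refinements such as slope constraints, path-length normalization, and search-space pruning by semantic class are omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template, segment):
    """Level 1: accumulated DTW cost of aligning one template to one segment."""
    n, m = len(template), len(segment)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(template[i - 1], segment[j - 1])
            # Allowed moves: advance in the template, the segment, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def two_level_dp(frames, templates):
    """Level 2: choose word boundaries and labels with minimal total cost."""
    total_frames = len(frames)
    inf = float("inf")
    best = [inf] * (total_frames + 1)   # best[e]: cheapest parse of frames[:e]
    back = [None] * (total_frames + 1)  # back[e]: (start, word) of the last word
    best[0] = 0.0
    for end in range(1, total_frames + 1):
        for start in range(end):
            segment = frames[start:end]
            for word, template in templates.items():
                total = best[start] + dtw_distance(template, segment)
                if total < best[end]:
                    best[end] = total
                    back[end] = (start, word)
    # Backtrack from the last frame to recover the word sequence.
    words = []
    end = total_frames
    while end > 0:
        start, word = back[end]
        words.append(word)
        end = start
    return list(reversed(words)), best[total_frames]


# Hypothetical templates and a time-stretched utterance (1-D toy features).
templates = {"one": [[0.0], [1.0]], "two": [[5.0], [6.0]]}
utterance = [[0.0], [0.0], [1.0], [5.0], [6.0], [6.0]]
words, cost = two_level_dp(utterance, templates)
print(words, cost)
```

In this toy case the utterance is a stretched concatenation of the two templates, so the search recovers the sequence "one", "two" at zero cost. This brute-force form recomputes the first level for every candidate segment; restricting which templates are tried at each position, for example by class, is one way to shrink that search.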