Improving Persian Dependency-Based Parser Using Deep Learning

Document Type : Computer Networking-Amin Hosseini

Authors

1 Department of Computer Eng., Faculty of Electrical & Computer Eng., The University of Kashan, Kashan, Iran.

2 Department of Computer Eng., Faculty of Electrical & Computer Eng., University of Kashan, Kashan, Iran.

Abstract

One of the central problems in computational linguistics is grammar and, consequently, the analysis of syntactic structure through parsing. A syntactic parser analyzes the relationships between words and extracts the syntactic structure of a sentence. Dependency-based parsing is well suited to free-word-order and morphologically rich languages such as Persian. A data-driven dependency parser makes its decisions based on a wide range of features; besides suffering from problems such as sparsity and the curse of dimensionality, it requires careful feature selection and parameter tuning. The aim of this study is to obtain high performance with minimal feature engineering for dependency parsing of Persian sentences. To this end, the features required by the Maximum Spanning Tree Parser (MSTParser) are extracted with a Bidirectional Long Short-Term Memory (Bi-LSTM) network, and the edges of the dependency graph are scored using these features. Experiments are conducted on the Persian Dependency Treebank (PerDT) and the Uppsala Persian Dependency Treebank (UPDT). The results indicate that the new features improve the performance of the dependency parser for Persian. The achieved unlabeled attachment scores for PerDT and UPDT are 90.53% and 87.02%, respectively.
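To make the graph-based setting concrete, the following is a minimal sketch of how a maximum spanning tree decoder turns edge scores into a dependency tree, and how the unlabeled attachment score (UAS) is computed. It assumes the Chu-Liu/Edmonds algorithm, a standard decoder for non-projective graph-based parsing; the abstract does not state which decoder the authors used. The score matrix is hand-written for illustration (in the study the scores would come from the Bi-LSTM), and the names `find_cycle`, `chu_liu_edmonds`, and `uas` are hypothetical.

```python
def find_cycle(heads):
    """Return the nodes of one cycle in a head assignment, or None."""
    for start in range(1, len(heads)):
        path, node = [], start
        while node != 0 and node not in path:
            path.append(node)
            node = heads[node]
        if node != 0:  # we came back to a node already on the path
            return path[path.index(node):]
    return None


def chu_liu_edmonds(scores):
    """scores[h][d] is the score of arc h -> d; node 0 is the artificial root.

    Returns heads, where heads[d] is the head of word d and heads[0] == -1.
    Diagonal entries and arcs into the root are never read.
    """
    n = len(scores)
    heads = [-1] * n
    for d in range(1, n):  # greedy step: best incoming arc for every word
        heads[d] = max((h for h in range(n) if h != d),
                       key=lambda h: scores[h][d])
    cycle = find_cycle(heads)
    if cycle is None:
        return heads

    # Contract the cycle into one new node c and solve the smaller graph.
    in_cycle = set(cycle)
    rest = [v for v in range(n) if v not in in_cycle]  # rest[0] is the root
    c = len(rest)
    new_scores = [[float("-inf")] * (c + 1) for _ in range(c + 1)]
    enter, leave = {}, {}
    for i, u in enumerate(rest):
        for j, v in enumerate(rest):
            if i != j:
                new_scores[i][j] = scores[u][v]
        # Arc u -> cycle: gain from replacing one cycle arc by an arc from u.
        best_v = max(cycle, key=lambda v: scores[u][v] - scores[heads[v]][v])
        enter[i] = best_v
        new_scores[i][c] = scores[u][best_v] - scores[heads[best_v]][best_v]
        if u != 0:  # arc cycle -> u: best head for u inside the cycle
            best_h = max(cycle, key=lambda h: scores[h][u])
            leave[i] = best_h
            new_scores[c][i] = scores[best_h][u]

    new_heads = chu_liu_edmonds(new_scores)

    # Expand: map the contracted solution back to the original nodes.
    for i, u in enumerate(rest):
        if u != 0:
            h = new_heads[i]
            heads[u] = leave[i] if h == c else rest[h]
    h = new_heads[c]
    heads[enter[h]] = rest[h]  # break the cycle at the entry point
    return heads


def uas(predicted, gold):
    """Fraction of words (root excluded) whose predicted head is correct."""
    pairs = list(zip(predicted[1:], gold[1:]))
    return sum(p == g for p, g in pairs) / len(pairs)


# Illustrative scores for a three-word sentence (row = head, column = dependent).
scores = [
    [0, 6, 1, 1],
    [0, 0, 11, 4],
    [0, 10, 0, 5],
    [0, 9, 8, 0],
]
predicted = chu_liu_edmonds(scores)
print(predicted)
print(uas(predicted, [-1, 0, 1, 1]))
```

In this example the greedy choice of the best head for each word creates a cycle between words 1 and 2, which the contraction step resolves. The sketch does not enforce a single child of the root, which some treebanks require.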


