Knowledge-Based Semantic Information Indexing and Management Framework: Integration of Structured Knowledge and Information Management Systems

Document Type : Semantic Technology-Kahani

Authors

Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran.

Abstract

One of the most challenging aspects of developing information systems is processing and managing large volumes of information. One way to address this problem is to implement efficient data indexing and classification systems. Because large volumes of the generated data consist of unstructured text, frameworks for text processing, management and indexing can play an important role in providing users with accurate information according to their preferences. This paper introduces a novel method for semantic information processing, management and indexing. The main goals of this study are to integrate the structured knowledge of ontologies and Knowledge Bases (KBs) into the core components of the method, to enrich the contents of documents, to build a multi-level semantic network representation of textual resources, to introduce a hybrid weighting schema (salient score), and finally to propose a hybrid method of semantic similarity computation. The structured knowledge of ontologies and KBs is integrated into all aspects of the proposed method. The obtained results indicate the accuracy and optimal performance of the proposed framework. They suggest that knowledge-based models lead to higher performance and accuracy in identifying and classifying documents according to user preferences, whereas learning-based models cannot yield satisfactory results unless they are provided with a sufficient amount of training data. The results also demonstrate that the complete integration of ontologies and KBs in information systems can contribute significantly to a better representation of documents and to superior functionality of information processing, management and indexing systems.

Keywords

