Natural Language Processing Techniques for Representing Cultural Concepts in AI-Based Lexicography: A Case Study on English and Uzbek

Authors

  • Yusupova Mushtariy Baxtiyor qizi, Doctoral (PhD) Student, Karshi State University

DOI:

https://doi.org/10.51699/ijllal.v4i3.546

Keywords:

Artificial intelligence, natural language processing, cultural concepts, AI lexicography, English language, Uzbek language, semantic analysis, named entity recognition

Abstract

Artificial intelligence (AI) and Natural Language Processing (NLP) have transformed lexicographic practices by enabling scalable and automated language analysis. While these advancements allow for the efficient processing of linguistic data, they often fall short in accurately capturing and representing culturally embedded concepts, particularly in underrepresented languages like Uzbek. Despite progress in semantic models and named entity recognition, there remains a significant lack of cultural sensitivity in AI-based lexicographic systems, primarily due to limited annotated corpora and challenges in modeling context-specific meanings. This study examines how NLP techniques—specifically tokenization, word embedding, and named entity recognition—function in representing cultural concepts in English and Uzbek, and evaluates their strengths and limitations. Findings show that while English-language models handle idiomatic and cultural terms with moderate success, models for Uzbek exhibit considerable deficiencies due to morphological complexity and corpus scarcity. Both languages face issues with accurately capturing idiomatic expressions and culturally loaded entities, leading to semantic distortion in automated outputs. The paper introduces a comparative framework grounded in semantic theory and cultural linguistics, providing practical examples of misrepresentation and highlighting the need for culturally annotated corpora and cross-cultural NLP modeling. To achieve culturally competent AI lexicography, interdisciplinary collaboration is essential. Future systems must integrate domain-specific resources, cultural annotations, and linguistic diversity to ensure that AI technologies do not reduce language to mechanistic processing but preserve its cultural and emotional richness.
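To make the tokenization issue raised in the abstract concrete, the short Python sketch below shows how a general-purpose multilingual subword tokenizer segments a culturally loaded Uzbek term compared with its English counterpart. This is an illustration added here, not material from the study itself: it assumes the Hugging Face transformers library and the publicly available bert-base-multilingual-cased tokenizer, and the word pair (English "hospitality" vs. Uzbek "mehmondo'stlik") is a hypothetical example of the kind of cultural key word the paper discusses.

```python
# Illustrative sketch (not from the paper): how a multilingual WordPiece
# tokenizer treats a culturally loaded Uzbek term versus an English one.
# Assumes the "transformers" package is installed and the checkpoint can
# be downloaded; the example words are chosen purely for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

examples = {
    "English": "hospitality",
    "Uzbek": "mehmondo'stlik",  # Uzbek cultural key word for 'hospitality'
}

for language, word in examples.items():
    pieces = tokenizer.tokenize(word)
    # A long run of '##'-prefixed fragments means the term is not stored as
    # a single vocabulary unit, so its culture-specific meaning is scattered
    # across subwords instead of being anchored to one lexical entry.
    print(f"{language:7s} {word!r} -> {pieces}")
```

Comparing the number and shape of the resulting subword pieces gives a quick, reproducible way to see why low-resource, morphologically rich languages such as Uzbek are more exposed to the semantic distortion described above.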

Published

2025-04-30

How to Cite

Mushtariy Baxtiyor qizi, Y. (2025). Natural Language Processing Techniques for Representing Cultural Concepts in AI-Based Lexicography: A Case Study on English and Uzbek. International Journal of Language Learning and Applied Linguistics, 4(3), 36–40. https://doi.org/10.51699/ijllal.v4i3.546

Section

Articles
