Estonian language support in open-source large generative language models

Sirts, Kairit
Added: Apr 23, 2025
P176 computer science

Abstract

The project aims to add Estonian language support to selected open-source foundation language models, on which artificial intelligence applications that understand Estonian could later be built. Currently, Estonian is supported by OpenAI's proprietary GPT models, which are paid and require uploading data to OpenAI's servers. Several open-source models also exist, but they do not currently support Estonian. The project uses two training approaches, full-parameter training and parameter-efficient training, to add Estonian support to the foundation models. The models will then be fine-tuned on Estonian-language instruction data and human rating data to improve their conversational ability. The project contributes to advancing language technology support for Estonian and to the survival of the Estonian language in the digital age.
