Estonian language support in open-source large generative language models
Abstract
The project's goal is to add Estonian language support to selected open-source foundation language models, which could later serve as the basis for developing artificial intelligence applications that understand Estonian. Currently, Estonian is supported by OpenAI's proprietary GPT models, which are paid services and require uploading data to OpenAI's servers. Several open-source models also exist, but they do not currently support Estonian. The project uses two training methods, full-parameter training and parameter-efficient training, to add Estonian support to the foundation models. The models will also be fine-tuned on Estonian instruction data and human rating data to improve their conversational ability. The project contributes to advancing language technology support for Estonian and to the survival of the Estonian language in the digital age.