Wednesday, May 1, 2024
Singapore to launch Southeast Asian AI-Language model

In recognition of the significance of inclusive Generative AI models, AI Singapore is partnering with Amazon Web Services (AWS) to develop the first Large Language Model family in the region trained specifically for Southeast Asian languages and cultures.

Singapore has created SEA-LION (Southeast Asian Languages in One Network), a Southeast Asian language model, to represent the region more accurately than ChatGPT does. Users in the region have tried large models such as Meta's Llama 2 and Mistral AI's models, but when prompted in Southeast Asian languages these often generate nonsensical text in English. SEA-LION, part of a Singaporean government initiative, is trained on Southeast Asian languages and cultures to address this issue.

Leslie Teo from AI Singapore highlights that SEA-LION, trained on 11 Southeast Asian languages including Vietnamese, Thai, and Bahasa Indonesia, provides a cost-effective and efficient solution for businesses, governments, and academics in the region. He emphasizes that the initiative aims to complement existing efforts rather than compete with them, and to improve representation for Southeast Asia. While acknowledging that the initiative is not flawless, Teo sees it as a step toward addressing biases present in US-centric large language models (LLMs).

Nuurrianti Jalli, an assistant professor in the School of Communications at Oklahoma State University, suggests that these models can enable local populations to engage more fairly in the global AI economy, which is currently dominated by large technology companies. Researchers also note that multilingual language models can accurately infer semantic and grammatical relationships between languages with varying levels of linguistic resources.

Such models find applications in various fields, including translation, customer service chatbots, and content moderation on social media platforms. These platforms often struggle to identify hate speech in languages with limited linguistic resources, such as Burmese or Amharic. SEA-LION stands out because 13% of its training data comes from Southeast Asian languages, a higher percentage than in other major LLMs. The training data also includes over 9% Chinese text and about 63% English, according to Teo.

However, digital experts have raised a significant concern about countries and regions developing their own LLMs. They worry that such initiatives could unintentionally reinforce existing online narratives, especially in countries with authoritarian regimes, strict media censorship, or weak civil societies.

SEA-LION is set to be accessible on Amazon SageMaker JumpStart this month. This platform offers pre-trained, publicly available models to assist customers worldwide in getting started with machine learning.
