Singapore to launch Southeast Asian AI-Language model

In recognition of the significance of inclusive Generative AI models, AI Singapore is partnering with Amazon Web Services (AWS) to develop the first Large Language Model family in the region trained specifically for Southeast Asian languages and cultures.

Singapore has created SEA-LION (Southeast Asian Languages in One Network), a Southeast Asian language model, to represent the region more accurately than ChatGPT does. Large models such as Llama 2 and Mistral AI have been tried in the region, but they often generate nonsensical text in English when prompted in local languages. SEA-LION, part of a Singaporean government initiative, is trained on Southeast Asian languages and cultures to address this issue.

Leslie Teo of AI Singapore says that SEA-LION, trained on 11 Southeast Asian languages including Vietnamese, Thai, and Bahasa Indonesia, provides a cost-effective and efficient option for businesses, governments, and academics in the region. He emphasizes that the initiative aims to complement existing efforts rather than compete with them, and to improve representation for Southeast Asia. While acknowledging that the initiative is not flawless, Teo sees it as a step toward addressing biases present in American large language models (LLMs).

Nuurrianti Jalli, an assistant professor in the School of Communications at Oklahoma State University, suggests that these models can enable local populations to engage more fairly in the global AI economy, which is currently dominated by large technology companies. Researchers also note that multilingual language models can accurately infer semantic and grammatical relationships between languages with varying levels of linguistic resources.

Such models have applications in various fields, including translation, customer service chatbots, and content moderation on social media platforms. These platforms often struggle to identify hate speech in languages with limited linguistic resources, such as Burmese or Amharic. SEA-LION stands out in that 13% of its training data is in Southeast Asian languages, a higher share than in other major LLMs. The training data also includes more than 9% Chinese text and about 63% English, according to Teo.

However, digital experts have raised a significant concern about the development of LLMs by individual countries and regions. They worry that such initiatives could unintentionally reinforce existing online narratives, especially in countries with authoritarian regimes, strict media censorship, or weak civil societies.

SEA-LION is set to be accessible on Amazon SageMaker JumpStart this month. This platform offers pre-trained, publicly available models to assist customers worldwide in getting started with machine learning.
