Singapore to launch Southeast Asian AI language model

Recognising the importance of inclusive generative AI models, AI Singapore is partnering with Amazon Web Services (AWS) to develop the region's first family of large language models trained specifically for Southeast Asian languages and cultures.

Singapore has created SEA-LION (Southeast Asian Languages in One Network), a Southeast Asian language model intended to represent the region more accurately than ChatGPT does. People in the region have tried large models such as Llama 2 and those from Mistral AI, but these often respond to prompts in Southeast Asian languages with nonsensical text in English. SEA-LION, part of a Singaporean government initiative, is trained on Southeast Asian languages and cultures to address this issue.

Leslie Teo of AI Singapore says that SEA-LION, trained on 11 Southeast Asian languages including Vietnamese, Thai, and Bahasa Indonesia, offers a cost-effective and efficient option for businesses, governments, and academics in the region. He emphasizes that the initiative aims to complement existing efforts rather than compete with them, and to improve representation for Southeast Asia. While acknowledging that the initiative is not flawless, Teo sees it as a step toward addressing the biases of US-centric large language models (LLMs).

Nuurrianti Jalli, an assistant professor in the School of Communications at Oklahoma State University, suggests that these models can enable local populations to engage more fairly in the global AI economy, which is currently dominated by large technology companies. Researchers also note that multilingual language models can infer semantic and grammatical relationships between languages that have very different amounts of linguistic resources available.

Such models have applications in many fields, including translation, customer service chatbots, and content moderation on social media platforms, which often struggle to identify hate speech in languages with limited linguistic resources, such as Burmese or Amharic. According to Teo, 13% of SEA-LION's training data is in Southeast Asian languages, a higher share than in other major LLMs. The training data also includes more than 9% Chinese text and about 63% English.

However, digital experts have raised a significant concern about countries and regions developing their own LLMs: such initiatives could unintentionally reinforce existing online narratives, especially in countries with authoritarian regimes, strict media censorship, or weak civil societies.

SEA-LION is set to be accessible on Amazon SageMaker JumpStart this month. This platform offers pre-trained, publicly available models to assist customers worldwide in getting started with machine learning.
