Combining natural language and programming may enhance LLMs


Science & Technology (Commonwealth Union) – Researchers from MIT and other institutions have introduced a novel technique that empowers large language models to tackle tasks involving natural language, mathematics, data analysis, and symbolic reasoning by generating programs.

This method, termed natural language embedded programs (NLEPs), involves prompting a language model to create and run a Python program that addresses a user’s query, and then to present the solution in natural language.

Their research demonstrated that NLEPs enhance the accuracy of large language models across a diverse array of reasoning tasks. Additionally, the approach is versatile, allowing a single NLEP prompt to be applied to multiple tasks.

According to the researchers, NLEPs also boost transparency: users can review the generated program to understand the model’s reasoning process and correct the program if the model produces an incorrect answer.

“We want AI to perform complex reasoning in a way that is transparent and trustworthy. There is still a long way to go, but we have shown that combining the capabilities of programming and natural language in large language models is a very good potential first step toward a future where people can fully understand and trust what is going on inside their AI model,” explained Hongyin Luo PhD ’22, an MIT postdoc and co-lead author of a paper on NLEPs.

The researchers pointed out that many popular large language models operate by predicting the next word or token given some natural language input. While models like GPT-4 can write programs, they embed those programs within natural language, which can lead to errors in the program’s reasoning or results.

The MIT researchers took a different approach with NLEPs: they prompt the model to generate a step-by-step program entirely in Python code, embedding the necessary natural language within the program itself.

An NLEP follows a four-step problem-solving template. Initially, the model calls the necessary packages or functions required to solve the task. In the second step, it imports natural language representations of the knowledge needed for the task (such as a list of U.S. presidents’ birthdays). In the third step, the model implements a function to calculate the answer. Finally, in the fourth step, the model outputs the result as a line of natural language, potentially including an automatic data visualization if it is required.
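As an illustration, a four-step NLEP might look like the sketch below. The birthday task echoes the researchers’ example, but this program is a hypothetical reconstruction, not output from the paper.

```python
# A minimal, hypothetical sketch of the four-step NLEP template;
# the data here is a short illustrative sample, not the paper's.

# Step 1: load the packages needed for the task.
from datetime import date

# Step 2: embed natural-language knowledge as structured data
# (an abbreviated list of U.S. presidents' birthdays).
BIRTHDAYS = {
    "George Washington": date(1732, 2, 22),
    "Thomas Jefferson": date(1743, 4, 13),
    "John F. Kennedy": date(1917, 5, 29),
    "George H. W. Bush": date(1924, 6, 12),
}

# Step 3: implement a function that computes the answer.
def presidents_born_in(month: int) -> list[str]:
    return [name for name, bday in BIRTHDAYS.items() if bday.month == month]

# Step 4: print the result as a line of natural language.
born_in_june = presidents_born_in(6)
print(f"Presidents in this list born in June: {', '.join(born_in_june) or 'none'}.")
```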

“It is like a digital calculator that always gives you the correct computation result as long as the program is correct,” said Luo.

The user can directly inspect the code and fix any errors without needing to rerun the entire model, making troubleshooting more straightforward.

This approach is also more efficient than some alternatives: if a user has several similar questions, they can keep one core program and simply change its variables, without repeatedly running the model.
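Continuing the hypothetical sketch above, a follow-up question only requires changing a variable and re-running the program, with no new model call:

```python
# Reuse the generated program: swap the month variable, skip the model.
print(f"Born in February: {', '.join(presidents_born_in(2)) or 'none'}.")
print(f"Born in April: {', '.join(presidents_born_in(4)) or 'none'}.")
```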

To prompt the model to generate an NLEP, the researchers give it an overall instruction to write a Python program, two NLEP examples (one involving mathematics and one involving natural language), and a test question.
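A hypothetical sketch of how such a prompt could be assembled is shown below; the wording and the two worked examples are illustrative assumptions, not the authors’ actual prompt.

```python
# Assemble an NLEP-style prompt: one general instruction, two worked
# examples, and the new test question. All text here is illustrative.

INSTRUCTION = (
    "Write a step-by-step Python program that solves the question, "
    "then print the answer as a line of natural language."
)

MATH_EXAMPLE = '''Question: What is 17% of 240?
rate, total = 0.17, 240          # encode the known quantities
answer = rate * total            # compute the result
print(f"17% of 240 is {answer}.")'''

LANGUAGE_EXAMPLE = '''Question: Is "level" a palindrome?
word = "level"
is_pal = word == word[::-1]      # a palindrome reads the same reversed
print(f"'{word}' {'is' if is_pal else 'is not'} a palindrome.")'''

def build_nlep_prompt(test_question: str) -> str:
    # Instruction, then the math and language examples, then the question.
    return "\n\n".join(
        [INSTRUCTION, MATH_EXAMPLE, LANGUAGE_EXAMPLE, f"Question: {test_question}"]
    )

print(build_nlep_prompt("Which U.S. presidents were born in June?"))
```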

Luo noted that, in general, people doing few-shot prompting must craft a unique prompt for each task. The team, by contrast, found a single prompt that works across many tasks, because the prompt does not teach the LLM to solve one specific problem; it teaches the model how to tackle many problems by writing a program.

“Having language models reason with code unlocks many opportunities for tool use, output validation, more structured understanding into model’s capabilities and way of thinking, and more,” explained Leonid Karlinsky, principal scientist for the MIT-IBM Watson AI Lab.

The study revealed that NLEPs achieved over 90 percent accuracy when prompting GPT-4 to perform various symbolic reasoning tasks, such as tracking shuffled objects or playing the game of 24, as well as instruction-following and text classification tasks. The researchers found that NLEPs outperformed task-specific prompting methods by 30 percent, and also outperformed open-source LLMs.

Furthermore, the researchers indicated that NLEPs could enhance data privacy: because the generated programs run locally, sensitive user data need not be sent to companies like OpenAI or Google for processing by a model.
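To give a sense of what such a program looks like, here is a hypothetical NLEP-style solver for the game of 24, in which four numbers must be combined with +, -, *, and / to reach 24. It is a sketch of the idea, not code from the study.

```python
# Hypothetical NLEP-style program for the game of 24.
from itertools import permutations, product

def solve_24(nums):
    ops = "+-*/"
    # The common parenthesizations of four operands and three operators.
    templates = [
        "(({0}{4}{1}){5}{2}){6}{3}",
        "({0}{4}{1}){5}({2}{6}{3})",
        "({0}{4}({1}{5}{2})){6}{3}",
        "{0}{4}(({1}{5}{2}){6}{3})",
        "{0}{4}({1}{5}({2}{6}{3}))",
    ]
    # Try every ordering of the numbers and every operator combination.
    for a, b, c, d in permutations(nums):
        for o1, o2, o3 in product(ops, repeat=3):
            for t in templates:
                expr = t.format(a, b, c, d, o1, o2, o3)
                try:
                    if abs(eval(expr) - 24) < 1e-9:
                        return expr
                except ZeroDivisionError:
                    continue
    return None

print(solve_24([4, 7, 8, 8]))  # finds e.g. (7-(8/8))*4, which equals 24
```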
