Combining natural language and programming may enhance LLMs


Science & Technology (Commonwealth Union) – Researchers from MIT and other institutions have introduced a novel technique that empowers large language models to tackle tasks involving natural language, mathematics, data analysis, and symbolic reasoning by generating programs.

This method, termed natural language embedded programs (NLEPs), involves prompting a language model to create and run a Python program that addresses a user’s query, and then to present the solution in natural language.

Their research demonstrated that NLEPs enhance the accuracy of large language models across a diverse array of reasoning tasks. Additionally, the approach is versatile, allowing a single NLEP prompt to be applied to multiple tasks.

According to the researchers, NLEPs also boost transparency: users can review the generated program to understand the model’s reasoning process and correct the program if the model produces an incorrect answer.

“We want AI to perform complex reasoning in a way that is transparent and trustworthy. There is still a long way to go, but we have shown that combining the capabilities of programming and natural language in large language models is a very good potential first step toward a future where people can fully understand and trust what is going on inside their AI model,” explained Hongyin Luo PhD ’22, an MIT postdoc and co-lead author of a paper on NLEPs.

The researchers pointed out that many popular large language models operate by predicting the next word or token given some natural language input. While models like GPT-4 can write programs, they embed those programs within natural language, which can lead to errors in the program’s reasoning or results.

The MIT researchers took a different approach with NLEPs: they prompt the model to generate a step-by-step program entirely in Python code, embedding the necessary natural language within the program itself.

An NLEP follows a four-step problem-solving template. Initially, the model calls the necessary packages or functions required to solve the task. In the second step, it imports natural language representations of the knowledge needed for the task (such as a list of U.S. presidents’ birthdays). In the third step, the model implements a function to calculate the answer. Finally, in the fourth step, the model outputs the result as a line of natural language, potentially including an automatic data visualization if it is required.
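As an illustration, a four-step NLEP might look like the sketch below. The birthday task echoes the researchers’ example, but this program is a hypothetical reconstruction, not output from the paper.

```python
# A minimal, hypothetical sketch of the four-step NLEP template;
# the data here is a short illustrative sample, not the paper's.

# Step 1: load the packages needed for the task.
from datetime import date

# Step 2: embed natural-language knowledge as structured data
# (an abbreviated list of U.S. presidents' birthdays).
BIRTHDAYS = {
    "George Washington": date(1732, 2, 22),
    "Thomas Jefferson": date(1743, 4, 13),
    "John F. Kennedy": date(1917, 5, 29),
    "George H. W. Bush": date(1924, 6, 12),
}

# Step 3: implement a function that computes the answer.
def presidents_born_in(month: int) -> list[str]:
    return [name for name, bday in BIRTHDAYS.items() if bday.month == month]

# Step 4: print the result as a line of natural language.
born_in_june = presidents_born_in(6)
print(f"Presidents in this list born in June: {', '.join(born_in_june) or 'none'}.")
```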

“It is like a digital calculator that always gives you the correct computation result as long as the program is correct,” said Luo.

The user can directly inspect the code and fix any errors without needing to rerun the entire model, making troubleshooting more straightforward.

This approach is also more efficient than some alternatives: if a user has several similar questions, they can keep one core program and simply change its variables, without repeatedly running the model.
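Continuing the hypothetical sketch above, a follow-up question only requires changing a variable and re-running the program, with no new model call:

```python
# Reuse the generated program: swap the month variable, skip the model.
print(f"Born in February: {', '.join(presidents_born_in(2)) or 'none'}.")
print(f"Born in April: {', '.join(presidents_born_in(4)) or 'none'}.")
```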

To prompt the model to generate an NLEP, the researchers give it an overall instruction to write a Python program, two NLEP examples (one involving mathematics and one involving natural language), and a test question.
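A hypothetical sketch of how such a prompt could be assembled is shown below; the wording and the two worked examples are illustrative assumptions, not the authors’ actual prompt.

```python
# Assemble an NLEP-style prompt: one general instruction, two worked
# examples, and the new test question. All text here is illustrative.

INSTRUCTION = (
    "Write a step-by-step Python program that solves the question, "
    "then print the answer as a line of natural language."
)

MATH_EXAMPLE = '''Question: What is 17% of 240?
rate, total = 0.17, 240          # encode the known quantities
answer = rate * total            # compute the result
print(f"17% of 240 is {answer}.")'''

LANGUAGE_EXAMPLE = '''Question: Is "level" a palindrome?
word = "level"
is_pal = word == word[::-1]      # a palindrome reads the same reversed
print(f"'{word}' {'is' if is_pal else 'is not'} a palindrome.")'''

def build_nlep_prompt(test_question: str) -> str:
    # Instruction, then the math and language examples, then the question.
    return "\n\n".join(
        [INSTRUCTION, MATH_EXAMPLE, LANGUAGE_EXAMPLE, f"Question: {test_question}"]
    )

print(build_nlep_prompt("Which U.S. presidents were born in June?"))
```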

Luo noted that, in general, people doing few-shot prompting must craft a unique prompt for each task. The team, by contrast, found a single prompt that works across many tasks, because the prompt does not teach the LLM to solve one specific problem; it teaches the model how to tackle many problems by writing a program.

“Having language models reason with code unlocks many opportunities for tool use, output validation, more structured understanding into model’s capabilities and way of thinking, and more,” explained Leonid Karlinsky, principal scientist for the MIT-IBM Watson AI Lab.

The study revealed that NLEPs achieved over 90 percent accuracy when prompting GPT-4 to perform various symbolic reasoning tasks, such as tracking shuffled objects or playing the game of 24, as well as instruction-following and text classification tasks. The researchers found that NLEPs outperformed task-specific prompting methods by 30 percent, and also outperformed open-source LLMs.

Furthermore, the researchers indicated that NLEPs could enhance data privacy: because the generated programs run locally, sensitive user data need not be sent to companies like OpenAI or Google for processing by a model.
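To give a sense of what such a program looks like, here is a hypothetical NLEP-style solver for the game of 24, in which four numbers must be combined with +, -, *, and / to reach 24. It is a sketch of the idea, not code from the study.

```python
# Hypothetical NLEP-style program for the game of 24.
from itertools import permutations, product

def solve_24(nums):
    ops = "+-*/"
    # The common parenthesizations of four operands and three operators.
    templates = [
        "(({0}{4}{1}){5}{2}){6}{3}",
        "({0}{4}{1}){5}({2}{6}{3})",
        "({0}{4}({1}{5}{2})){6}{3}",
        "{0}{4}(({1}{5}{2}){6}{3})",
        "{0}{4}({1}{5}({2}{6}{3}))",
    ]
    # Try every ordering of the numbers and every operator combination.
    for a, b, c, d in permutations(nums):
        for o1, o2, o3 in product(ops, repeat=3):
            for t in templates:
                expr = t.format(a, b, c, d, o1, o2, o3)
                try:
                    if abs(eval(expr) - 24) < 1e-9:
                        return expr
                except ZeroDivisionError:
                    continue
    return None

print(solve_24([4, 7, 8, 8]))  # finds e.g. (7-(8/8))*4, which equals 24
```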
