Even as the battle between Google and Microsoft over the future of Internet search intensifies, WhatsApp could soon become a major search tool for information on key government schemes for India’s estimated 150 million farmers. The effort is powered by ChatGPT, the AI chatbot that has become a global sensation, and by an ambitious national-level program that aims to build, through crowdsourcing, vast datasets of voice samples in several Indian languages.
Bhashini, a small team at the Ministry of Electronics and IT (MeitY), is currently building a WhatsApp-based chatbot that relies on information generated by ChatGPT to answer queries. Because people, especially farmers in rural areas, may not always want to type out their questions, the chatbot also accepts voice notes: a user can simply record a query and receive a voice-based response generated by ChatGPT.
According to a senior government official, a prototype of this bot was shown to Microsoft CEO Satya Nadella, who mentioned it earlier this year at the World Economic Forum in Davos. The Indian Express has also seen a demo of the chatbot in action, in which it seamlessly responds to a query, made through a voice note, about the details of the PM Awas Yojana.
The chatbot, still in testing, is being developed with India’s rural and agrarian population in mind (the sections of society that depend most on government schemes and subsidies) and with the many languages they speak. In that context, it becomes important to build a language model that can accurately identify and understand the local languages spoken by the country’s rural population, said another senior government official associated with the project.
While ChatGPT’s responses have so far impressed many with their ability to answer complex queries in fascinating and eloquent ways, building a national digital public platform for Indian languages will be key to the success of the WhatsApp chatbot the Bhashini team is building. To build such a language model, the official said, it is essential to have large datasets of the various local languages spoken in India on which the model can be trained.
This is where an initiative called Bhasha Daan comes in, he explained. The ambitious project aims to crowdsource voice datasets in multiple Indian languages. On the project’s website, people can contribute in three key ways: by recording themselves reading out a piece of text in an Indian language, by typing out a sentence that is played to them, and by translating text from one language into another.
“A majority of the people who will use this chatbot will not know English. So, for their voice inputs to work on the chatbot, it is important that we train our language processing models in as many Indian languages as possible. We have a decent-sized repository of voices in many Indian languages that people of the country have contributed through the Bhasha Daan portal. We also have a vast database of all the languages that Doordarshan telecasts in. So we have built the language model on the chatbot using these datasets,” the second official explained.
In the test phase, the model supports 12 languages, including English, Hindi, Tamil, Telugu, Marathi, Bengali, Kannada, Odia, and Assamese. This means that if a user sends a voice note to the chatbot in any of these languages, the chatbot will be able to respond to it.
India has a stark digital divide despite rising Internet connectivity in rural areas, and the official said that the choice of WhatsApp as the delivery platform was a deliberate one.
“WhatsApp has more than 500 million users, and even those with relatively low digital literacy know their way around the app,” he added.
There are, however, some limitations for now. In its testing phase, the chatbot can only respond to simple queries, such as those about government schemes. This is primarily due to a limitation of ChatGPT itself: it cannot access real-time information from the Internet. ChatGPT’s language model was trained on a vast dataset to generate text in response to an input, and that dataset currently only includes information up to 2021.
However, that could soon change. On Wednesday, Microsoft announced a new version of its search engine Bing, powered by an upgraded version of the AI technology that underpins ChatGPT. Microsoft called it the “Prometheus Model” and said it was more powerful than GPT-3.5, the OpenAI language model behind ChatGPT, and better able to answer search queries with up-to-date information and annotated answers.

The first official said that once ChatGPT can search the Internet and return real-time results, the scope of the WhatsApp chatbot could go far beyond what is currently being tested. “People will not only be able to get information about various government schemes in a concise manner, but also inquire if they are eligible for some of those schemes,” the official said.
While both officials remained non-committal about a public release of the chatbot, they said its demo had impressed Nadella. It is worth noting, however, that Microsoft has invested a reported $10 billion in OpenAI, the company that developed ChatGPT.
“A demo I saw was a rural Indian farmer trying to access some government programs. He just expressed a complex thought in speech in one of the local languages that got translated and interpreted by a bot, and a response came back saying, ‘go to a portal and here is how you will access the program’. He said, ‘I’m not going to go to the portal, I want you to do this for me’. The bot completed it, and the reason it was able to complete it was that a developer building it had taken GPT and trained it over all of the Government of India’s documents and then scaffolded it with the speech recognition software,” Nadella had said earlier this year.