Every two weeks, a minority language dies. Regrettably, humanity doesn’t just lose the unique evolution of that language when it fades; the unique meanings and forms of expression encoded within the language die as well. Language is the medium for cultural heritage in all of its forms — from medicinal knowledge to geography to communities’ humor, love and memory. In short, the loss of a language is an erasure of centuries of rich tradition. 1
While language extinction isn’t a new phenomenon, language loss is occurring at a breakneck pace in the twenty-first century. At least half of the estimated 7,000 languages spoken across the world today are likely to disappear within the century. 2 But recent technological breakthroughs are offering new hope to affected communities. Applications of natural language processing (NLP) and computational linguistics may offer a pathway to save endangered languages from extinction.
AI Pirika is one particular application aimed at reviving the critically endangered Ainu language spoken by the Ainu people, the indigenous inhabitants of Hokkaido in northeastern Japan and a string of islands to the north of Japan, called “Ainu Moshiri,” or “Land of the Ainu.”
Known mostly for their tradition of catching and raising bear cubs as members of their families, the Ainu possess an ancient spiritual relationship with the natural world. These communities believe in the spirit of the sacred bear, worshipping and caring for bears in hopes of yielding good fortune for their families. From the vibrant geometric designs of their clothing to the mouth tattooing women undergo to signify their coming-of-age, traditional Ainu culture is significantly different from Japanese culture. In fact, their indigenous language is, much like the people themselves, of unknown origins. Ainu possesses no relationship to any other language, making it one of only 129 language isolates in the world. 3
In April 2019, researchers launched AI Pirika to preserve the unique language isolate of the Ainu people along with the vibrant cultural heritage imbued within the language. The project is a five-year collaboration between the Society for Academic Research of Ainu Culture (SARAC), AI expert Professor Kenji Araki of Hokkaido University, and other technical and Ainu collaborators. Dubbed “Pirika,” which means “pretty girl” in Ainu, the virtual agent will be a hybrid between a virtual chatbot and a speech recognition engine.
Tomoyuki Hanaoka, MD, PhD of SARAC found his passion for the project through the linguistic story of his colleague, Tokuhei, an Ainu man in his 70s who knows Ainu words but cannot speak the language fluently. Tokuhei was born in Kushiro, in the eastern part of Hokkaido, to an Ainu mother and a Wajin, or ethnically Japanese, father. Because neither his grandparents nor his mother spoke the Ainu language before him, Tokuhei was not exposed to conversation in his ancestral tongue. The cycle was reinforced by his grandparents’ desire to assimilate into Wajin culture.
Tokuhei’s fraught linguistic story is a microcosm of the broader story of Ainu assimilation during the Meiji era. When Japan colonized Hokkaido in the 1850s, the Ainu were decimated by genocide and disease, dispossessed of their traditional land, and forcibly relocated to the barren, mountainous area at the island’s center, where traditional subsistence living was nearly impossible. Even today, they need explicit permission from authorities to fish on their traditional land. 4
Laws outlawing Ainu customs pressured the remaining Ainu people to assimilate under the banner of Japan’s mythological homogeneity. Japanese settlers banned the Ainu language and required Ainu children to attend Japanese schools. Not only did the assimilation policy force Ainu to use the dominant languages and customs of the Wajin, but it also resulted in significant education gaps, socioeconomic disparity, and rampant bigotry. To this day, Ainu often conceal their identity when seeking jobs or marriage to avoid discrimination. Ainu culture was consigned to display as an exotic tourist attraction or an object of anthropological research. Adding insult to injury, Japan’s illusory ethnocultural homogeneity has left the Ainu struggle invisible to the global eye, along with the struggles of Japan’s many other ethnic minorities. 5 Only two native Ainu speakers remain as a result of this systematic cultural erasure. 6 With only one surviving dialect, the Ainu language faces tenuous odds on the road to preservation.
Against these odds, however, the Ainu have prevailed against Japan’s oppressive forces through recent activism on an international scale. In a matter of decades, the Ainu have shed their dependence on government and engaged in political mobilization to win national legislation promoting and protecting their culture. From launching periodicals proclaiming their indigeneity to founding the Hokkaido Ainu Association, the Ainu built collective solidarity during Japan’s postwar period. Spurred by the global indigenous rights movement that gathered pace during the human rights era of the 1970s, Ainu activists allied with indigenous communities abroad. After the United Nations General Assembly adopted the United Nations Declaration on the Rights of Indigenous Peoples (UNDRIP), which Japan voted in favor of, they leveraged international pressure to further their domestic aims. After calls for recognition “as an indigenous people… with their own unique language, religion, and culture” immediately following the adoption of the UNDRIP and a strategically timed Indigenous Peoples’ Summit to precede the 34th G8 summit scheduled for Hokkaido, the Ainu successfully elicited the Japanese government’s 2008 declaration recognizing the Ainu as an indigenous people. 7 Beyond the hard-won recognition of their indigeneity, Ainu continue to campaign for their ancestors’ right to fish, for the repatriation of ancestral remains, and for the freedom to perform rituals necessitating land access. 8
AI Pirika is thus part of a larger movement to revive Ainu culture. In particular, “[t]he goal of AI Pirika is to revive the Ainu speaker,” said Dr. Hanaoka. “Unfortunately, the Ainu native speaker is almost extinct. We are working with the feeling of reviving the dead.” Dr. Hanaoka and the AI Pirika team hope that the system will survive as an Ainu speaker in the future, contributing to Ainu language education programs and activism and to other applications of NLP to the thousands of minority languages spoken worldwide.
Unlike other NLP applications, AI Pirika features a spoken element absent from existing AI systems for language preservation, which have largely taken the form of chatbots such as “Reobot,” a Facebook Messenger chatbot created to preserve the Indigenous language of New Zealand. 9 “[W]e aim to develop a voice dialogue system at the same level as Native, considering that the Ainu language originally has no written language,” said Dr. Hanaoka. The plan to enable both text and voice communication will make the system more accessible to surviving speakers of the exclusively oral language, and may inspire spoken-language features in other systems.
Mainstream NLP applications also rely heavily on written dialogue data. Because obtaining a sufficient amount of conversation transcripts, literature, and other written documentation in the endangered Ainu language is impossible, AI Pirika’s creators have taken a creative approach to encoding the language. To combat the data deficit, Professor Kenji Araki of Hokkaido University developed a Spoken Dialogue Method Using the Inductive Learning Method Based on Genetic Algorithm with Sexual Selection (SeGA-ILSD) for AI Pirika, which allows the system itself to generate dialogue data through the selective mating and mutation operations of a genetic algorithm. “Selective mating is a method of generating different sentences by dividing and connecting two different sentences starting from a certain point, and mutation is a method of replacing randomly selected words at random positions with a certain probability,” explained Professor Araki. By snowballing existing data, the algorithm grows the language resource base until it is sufficient for native-level conversation.
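The two operations Professor Araki describes can be illustrated with a minimal Python sketch. It is not the SeGA-ILSD implementation. The function names, the choice of a separate cut point for each parent sentence, and the placeholder English sentences (standing in for real Ainu dialogue data) are all assumptions made for illustration.

```python
import random


def selective_mating(sentence_a, sentence_b, point_a, point_b):
    """Cut each sentence at a word index and swap the tails.

    Hypothetical helper: using one cut point per parent is an assumption.
    """
    words_a = sentence_a.split()
    words_b = sentence_b.split()
    child_1 = " ".join(words_a[:point_a] + words_b[point_b:])
    child_2 = " ".join(words_b[:point_b] + words_a[point_a:])
    return child_1, child_2


def mutate(sentence, vocabulary, rate, rng):
    """Replace each word with a random vocabulary word with probability `rate`."""
    mutated = [
        rng.choice(vocabulary) if rng.random() < rate else word
        for word in sentence.split()
    ]
    return " ".join(mutated)


# Placeholder English sentences stand in for real Ainu dialogue data.
parent_a = "the bear walks in the forest"
parent_b = "a child sings by the river"

child_1, child_2 = selective_mating(parent_a, parent_b, point_a=2, point_b=3)
print(child_1)  # the bear by the river
print(child_2)  # a child sings walks in the forest

# Mutation is random, so the result depends on the seed and the rate.
rng = random.Random(0)
print(mutate(child_1, ["bear", "river", "forest", "child"], rate=0.2, rng=rng))
```

The second child shows why generated sentences need filtering: recombination readily produces ungrammatical output alongside usable sentences.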
Granted, the dialogue data automatically generated by SeGA-ILSD is bound to contain errors. But the system is programmed to refine its data pool through a user-led feedback process. “If the system makes a mistake, the accuracy will be improved by the selection process of the genetic algorithm based on the mistake pointed out by the user who is using the system,” said Professor Araki. The selection process then improves the accuracy of the entire system by lowering the priority of the rules used to generate incorrect sentences.
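A feedback loop of this kind might look like the following sketch, written under stated assumptions. The `Rule` class, the multiplicative penalty of 0.5, and the keep-the-top-N selection step are hypothetical choices. The source says only that rules behind incorrect sentences have their priority lowered.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A dialogue rule with a priority score (hypothetical structure)."""
    text: str
    priority: float = 1.0


def penalize(rules, used_indices, penalty=0.5):
    """Lower the priority of every rule that helped produce a flagged sentence."""
    for index in used_indices:
        rules[index].priority *= penalty


def select(rules, keep):
    """Keep the `keep` highest-priority rules for the next generation."""
    return sorted(rules, key=lambda rule: rule.priority, reverse=True)[:keep]


rules = [Rule("rule A"), Rule("rule B"), Rule("rule C")]

# A user reports that a sentence generated with rule B was wrong.
penalize(rules, used_indices=[1])

survivors = select(rules, keep=2)
print([rule.text for rule in survivors])  # ['rule A', 'rule C']
```

Repeated reports push a faulty rule further down the ranking, so it is used less often and eventually drops out of the pool.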
But the genetic algorithm alone does not suffice for native-level accuracy. To tune the system further, Professor Araki incorporated sexual selection theory into AI Pirika. Under this approach, each rule is assigned a sex, and a female rule prefers a male rule. The purpose of this preference is to facilitate selective mating between rules and generate rules that include many common parts. Through this algorithmic process, the words in the rules evolve to possess commonalities, and it is through these small commonalities that AI Pirika learns new and superior rules from examples. “By using such a mechanism, we aim to realize a highly accurate dialogue system even in languages with few language resources such as Ainu language,” said Professor Araki.
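One way to picture mate preference between rules is the sketch below. The source does not specify how a female rule ranks male rules, so scoring candidates by the number of shared words is an assumption, chosen because the stated purpose is to produce rules with many common parts. `SexedRule`, `shared_words`, and `choose_mate` are hypothetical names, and the English sentences are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SexedRule:
    """A rule tagged with a sex (hypothetical structure)."""
    text: str
    sex: str  # "female" or "male"


def shared_words(rule_a, rule_b):
    """Return the set of words that appear in both rules."""
    return set(rule_a.text.split()) & set(rule_b.text.split())


def choose_mate(female, candidates):
    """Pick the male rule sharing the most words with the female rule.

    Assumption: word overlap stands in for the preference measure.
    """
    males = [rule for rule in candidates if rule.sex == "male"]
    return max(males, key=lambda male: len(shared_words(female, male)))


female = SexedRule("where is the river", "female")
pool = [
    SexedRule("where is the mountain", "male"),
    SexedRule("the bear sleeps", "male"),
    SexedRule("where is the river bank", "female"),  # ignored: not male
]

mate = choose_mate(female, pool)
print(mate.text)                           # where is the mountain
print(sorted(shared_words(female, mate)))  # ['is', 'the', 'where']
```

Here the shared words "where is the" form the common part, while "river" and "mountain" mark the position where the two rules differ.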
Beyond functionality, AI Pirika will be an open access tool available to the public, broadening its impact as a tool for cultural revival. “We want many people to use AI Pirika,” said Dr. Hanaoka. “We plan to open it to the public online for this purpose.” Lowering the barrier to access preservational tools like AI Pirika will increase its reach to prospective Ainu language learners.
Though AI Pirika is an effective tool to preserve the vanishing Ainu language, there are ethical implications of academic research involving indigenous communities — even in the name of preservation — that merit consideration.
For one, academic research on indigenous communities can be a detriment to healing from intergenerational trauma. Ainu already have a long painful history with anthropological exploits launched on their ancestral remains post-colonization. Some 1,653 Ainu remains are held at Japanese universities to this day, and Ainu rights advocates have demanded the return of these remains to their lineal descendants. In 2009, the Japanese government proposed to transfer all unidentified Ainu remains from universities to a memorial hall to be built in Shiraoi, Hokkaido by 2020. This plan was met with disdain by Ainu advocates who were not involved in this decision. 10
Technological documentation and research on the Ainu language may thus risk perpetuating this cycle of exploitative academic research if ethical standards are not defined and upheld. In particular, language research can be exploitative if indigenous people lack agency in the oral history preservation process. As stated in Article 31 of the Declaration on the Rights of Indigenous Peoples, “Indigenous peoples have the right to maintain, control, protect and develop their cultural heritage, traditional knowledge, and traditional cultural expressions, as well as the manifestations of their sciences, technologies, and cultures, including . . . oral traditions.” 11
The creators of AI Pirika are striving to make contributions toward ending this cycle by adopting a more mindful approach to language preservation. Recognizing the diverse views Ainu have toward the application, AI Pirika’s technical collaborators are working to integrate emotion and spirituality, specifically the Ainu’s unique bear-worshipping animism, into the system: “[Some Ainu] think that AI cannot reproduce the spirituality of the Ainu and show a negative attitude toward our project,” said Dr. Hanaoka.
“On the other hand, some Ainu people expect the Ainu language to be protected by current technology. We think they have a strong desire to retain the Ainu language by any method. We will respect the former feeling and include the spirituality of the Ainu in the AI Pirika system.”
Seeking and incorporating community preferences is a step towards participatory engagement, but co-creation and partnership go further still. “We spend a lot of effort looking for collaborators,” said Dr. Hanaoka, who recruits technical collaborators to program the system and others to translate and record the Ainu language.
Given the dwindling native Ainu speaker population, however, finding native Ainu partners for the project has posed a challenge. “As mentioned above, there are almost no native speakers in Ainu. Therefore, it is not easy to make a collaborator,” said Dr. Hanaoka. “I asked an Ainu woman in her 50s, who had heard the Ainu language in her childhood, to cooperate in recording and started the project. However, due to the health reasons of the collaborator, the partnership ended.” For now, the team is working with those familiar with the Ainu language who can supervise the linguistic side of the project while conducting outreach for native Ainu collaborators, particularly young Ainu eager to learn the language.
Alongside concerns about participatory engagement are concerns that virtual preservation is an intrinsically limited form of preservation: a bandage slapped onto a broader issue of cultural erasure due to systematic oppression that merits direct reparations. On the issue, Dr. Hanaoka said, “[Since] the Ainu native speaker is almost extinct[,] I think it is more realistic to leave an environment where the Ainu language can be spoken by AI.” While the Japanese government must make strides toward reparations, any effort to retain the Ainu language is leagues better than the imminent and total extinction it faces.
As for revitalizing the oral Ainu tradition via education, AI Pirika holds promise. “I believe that talking with AI Pirika will increase the chances of using Ainu language, which will increase the number of Ainu speakers,” said Dr. Hanaoka. Steps must be taken, however, to bridge the tech divide and ensure equal access to technology for language minorities, as a lack of access may defeat the purpose of virtual preservation for those it hopes to serve most. In all preservation efforts, an agentic and collaborative approach is also crucial to upholding the indigenous rights outlined in the UNDRIP. Commitment to these practices raises hope of imbuing technological preservation efforts with the indelible spirit of the Ainu.