No.037 - The Tale of Genji to computer science

The Tale of Genji to computer science


An exhibition about theThe Tale of Genji opened in March 2019 at New York’s Metropolitan Museum of Art. This is the first exhibition in North America to feature Genji-inspired works of art ranging from the 11th century, when the legendary novel was written, to the present.

The Tale of Genji has not only inspired artists. A researcher who became interested in Japanese literature thanks to Murasaki Shikibu’s masterpiece is engaged in a project that brings the fields of literature and computer science together by developing a system for computerized recognition of cursive Japanese script. We asked Tarin Clanuwat, a project researcher at the ROIS-DS Center for Open Data in the Humanities, how her encounter with The Tale of Genji led to her current project utilizing computer science.



Q: What first inspired you to do research on The Tale of Genji?

I came across a Thai translation of Asaki Yume Mishi (a manga adaptation of The Tale of Genji by Waki Yamato). I was already interested in Japanese culture, particularly the ancient culture of the Heian period (794–1185), and had studied Japanese since I was in elementary school. But when I went to libraries in Bangkok to learn more about the Heian period, they had virtually no works of classical Japanese literature on hand. That’s when I discovered there was a manga based on The Tale of Genji.

Then my dream of studying in Japan came true when I received a Japanese Government (MEXT) Scholarship and entered graduate school at Waseda University. From the outset my plan was to study The Tale of Genji, but to be honest, I knew nothing about scholarship on the work.



Q: Scholars studying Genji have to be able to read the old cursive handwritten script known as kuzushi-ji, which few contemporary Japanese readers are familiar with. How did you learn it?

I took a class in kuzushi-ji as part of my master’s program, but I couldn’t read it at all and failed the test. But I desperately wanted to learn it somehow, so I studied it by a rather unusual method. Figuring that if I learned how to write kuzushi-ji I would naturally learn to read it, I enrolled in a calligraphy class.

In practice, learning to read the script while practicing writing it helped me improve quickly. If you learn how to write kuzushi-ji, then you naturally focus on the brushstrokes when you read it as well. Because you are aware of how it was written, you internalize a sense of how to read it the way it was written.



Q: What sort of Genji research did you pursue in graduate school?

I studied how Genji scholars during the Kamakura (1185–1333) and Nanboku-cho (1336–1392) periods interpreted the text, which we can see from the commentaries they added to it. There are many types of Genji commentaries from this era, explaining the meaning of words, poems, historical references and so on. Deciphering those texts gave me a deeper understanding of Heian culture.



Q: Currently you are using artificial intelligence technology to develop a computerized kuzushi-ji recognition system. What led you to do research in the field of computer science?

All the time I was in graduate school, I worked on rewriting the Genji commentaries in modern script. This is hard work that seems to take forever when the book is a long one. I thought that I could use my study time more efficiently if I were able to use a computer for part of that process. Then, when I was writing my doctoral dissertation, I found myself wishing I could do full-text searches through the manuscripts I was reading. But given the huge volume of extant literature, there are far too few people available to carry out the task of converting all the texts. I decided I wanted to develop a text-conversion system to solve that problem, so I started to research machine learning for kuzushi-ji recognition.



Q: How does this new text-conversion system differ from previous efforts to develop automatic recognition of kuzushi-ji?

The automatic character recognition process typically consists of four steps. First, you digitize the document. Next you analyze the layout of the image—which areas are background, which are text, which are pictures. Then you divide the text area into its elements—paragraphs, lines, characters. Finally, you identify each individual character. But that method doesn’t work well for kuzushi-ji because the script is cursive and the characters run together.

To solve this problem, we adopted a method that does not explicitly divide up the text. This method was originally proposed for cell image analysis in the biomedical field. It occurred to us that the overlap of characters in classical texts resembles the overlap of cells in a specimen. When we actually applied this method, we found that it recognized kuzushi-ji with greater accuracy than previous methods. In the future we hope to make this text-conversion system available for anyone to use.



Tarin says that the use of computers to improve efficiency in converting kuzushi-ji will benefit not only scholars, but anyone who wishes to read classical texts but finds it hard to decipher cursive handwritten scripts.

Tarin’s current research perfectly dovetails her interest in programming and the love of the Japanese language she has nurtured since her childhood days. We look forward to what further “tales” may unfold in the course of her future research.

(Interview: Ayumi Koso)



After only three years of study, Tarin achieved fourth-dan ranking as a kana (Japanese syllabary) calligrapher.



Results (in red) of conversion of the beginning of The Tale of Genji to modern kana characters
by the text-conversion system developed by Tarin and her research group.




Tarin Clanuwat has been a project researcher at the ROIS-DS Center for Open Data in the Humanities since April 2018. She completed the doctoral program in Japanese Studies of the Waseda University Graduate School of Letters, Arts and Sciences in March 2018. She received the Best Paper Award from the IPSJ SIG Computers and the Humanities Symposium for her research on automatic recognition of cursive Japanese script.