Workshop Report


On October 14, 1995, jus, with the co-sponsorship of the Wnn Consortium, held a "Kana-Kanji Conversion System Workshop." There have not been many workshops on this kind of theme so far. This was also probably the first time most of us had an opportunity for discussion with the people who actually develop free and product versions of kana-kanji conversion systems. Under these circumstances, the presenters were somewhat worried about whether the event would succeed, but looking back, the workshop appears to have been highly worthwhile.

The workshop was divided into a morning and an afternoon session, with the morning centering on lectures and the afternoon focusing on panel discussions.

First of all, Mr. Ryotaro Waga (Omron Software) gave an activity report from the Wnn Consortium. The Wnn Consortium was established in 1990 with the purpose of widely disseminating the kana-kanji conversion system Wnn. Wnn itself was developed jointly by Kyoto University, Omron and Astec beginning in 1985. Since Version 2 was released as free software in 1987, it has been included in the X Window System's contrib, has been joined by the Chinese conversion system cWnn and the Korean conversion system kWnn, and has come to be the standard among free kana-kanji conversion systems. In that sense, the Wnn Consortium's goal has now been mostly accomplished. Recently, Omron has released an enhanced product, Wnn6, and the role of the Consortium will most likely come to an end in the near future. The Wnn Consortium would like to continue to maintain the free Wnn4, but to avoid having the Consortium itself incur costs, this will be done on a volunteer basis.

Next, Mr. Norihisa Doi of Keio University gave a talk entitled "Computer Language and Character Sets" in which he spoke about various programming languages and their handling of multi-byte kanji. The languages covered were FORTRAN, COBOL, BASIC, PASCAL, Ada, C, Mumps, C++, Lisp, Prolog and APL. He discussed how character-set extensions have been established for each language under ISO, and looked at how far each language remains restricted to ASCII characters.

In the final morning topic, the Vice Editor of the Nikkei PC Magazine, Mr. Nishiyama, spoke about Japanese input systems from the viewpoint of a PC user. PC Japanese input systems (called FEPs, or front-end processors, after their architecture) have been improving in terms of conversion efficiency. With the spread of hard disks and high-speed CPUs, dictionary files are gradually becoming enormous, and fast mechanisms such as AI conversion have become commonplace. In addition, GUI-based settings panels for Windows and Macintosh are becoming common, and features outside the original kana-kanji conversion function (for example, automatic correction of input mistakes) are being adopted. In the future, cooperation with language dictionaries will play a larger role, and dictionaries rich in accumulated know-how should be explored for uses such as writing-revision support and OCR. He left us with the words that "since users aren't careful about minute variations, there is no awareness of fixed qualitative analysis of conversion efficiency."

In the afternoon, panelists involved in the development of kana-kanji conversion systems held discussions on several themes.

After that, the audience was included in the discussion. The panelists were as follows:

Mr. Seiji Kuwari (Omron): Developer of the kana-kanji system "Wnn"
Mr. Hiroshi Nagaoka (Omron): Developer of the kana-kanji system "Wnn"
Mr. Akira Kon (NEC): Developer of the kana-kanji system "kanna"
Mr. Yasuo Koyama (AI Software): Developer of the kana-kanji system "WX"
Mr. Makoto Ishisone (SRA): Developer of the Japanese input system for the X Window System "kinput2"

In the first afternoon session, the theme was "Japanese input systems' architecture revision," which looked at Japanese input systems from the architectural side. The four panelists, Mr. Kuwari, Mr. Ishisone, Mr. Kon and Mr. Koyama, presented Wnn, kinput2, kanna and WX as examples of systems in actual use.

The discussion focused on the interoperability of modules when multiple systems are used together, and on the exchange of dictionaries and learning information. Since each system has its own strengths, users would like to be able to combine the "tastiest" parts of each. Indeed, Wnn's dictionary server and kanna's common interface architecture, among others, were seen as distinctive features of the individual systems, and each system appeared to be in good health. In reality, however, a major issue was pointed out: even when exchanging learning information for dictionaries, for example, the part-of-speech names frequently do not match and the grammatical analyses are not always compatible, so carrying over the handling of attached words is difficult. From the audience came the unique idea of separating romaji-to-kana conversion from kana-kanji conversion and treating it as an independent process.
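The audience's idea of treating romaji-to-kana conversion as a stage independent of kana-kanji conversion can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the table, the toy dictionary and all function names are hypothetical and are not taken from Wnn, kanna, WX or kinput2.

```python
# Hypothetical sketch: two independent stages joined only by a kana string.
# None of these names or tables come from any of the systems in the report.

# Tiny romaji-to-hiragana table; a real table would be far larger.
ROMAJI_TABLE = {"ka": "か", "n": "ん", "ji": "じ", "na": "な"}

# Toy kana-to-kanji dictionary: reading -> candidates in priority order.
TOY_DICTIONARY = {
    "かんじ": ["漢字", "感じ", "幹事"],
    "かな": ["仮名"],
}


def romaji_to_kana(text):
    """Stage 1: greedy longest-match romaji-to-kana conversion."""
    out = []
    i = 0
    while i < len(text):
        for length in (3, 2, 1):
            chunk = text[i:i + length]
            if chunk in ROMAJI_TABLE:
                out.append(ROMAJI_TABLE[chunk])
                i += len(chunk)
                break
        else:
            # Unknown character: pass it through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def kana_to_kanji(kana):
    """Stage 2: look up candidates; fall back to the kana itself."""
    return TOY_DICTIONARY.get(kana, [kana])


def convert(romaji):
    """Compose the two stages; either one could be swapped out."""
    return kana_to_kanji(romaji_to_kana(romaji))
```

Because the only thing passed between the stages is a kana string, a different romaji table or a different conversion engine could in principle be substituted without touching the other stage.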

The second afternoon session was on "progress in kana-kanji conversion efficiency" with various methodologies for correct conversion being discussed.

The term "conversion efficiency" is hard to define, but it seems to have two meanings.

Getting the correct conversion on the first try (efficiency of the conversion engine alone).
The turnaround needed to correct a mistaken result and finally reach the correct conversion (overall efficiency including the user interface).

In this session, the former was discussed as "conversion accuracy" and the latter as "conversion efficiency."
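The two meanings above could be turned into numbers along the following lines. This is a hedged sketch, not a measure proposed at the workshop: the function names, the sample data, and the choice to count candidate-selection steps as the cost of a correction are all assumptions made for illustration.

```python
# Hypothetical sketch of the two notions discussed in the session.
# Each result is (candidates returned by the engine, expected conversion).


def first_try_accuracy(results):
    """'Conversion accuracy': share of inputs whose top candidate is correct."""
    hits = sum(1 for cands, expected in results if cands and cands[0] == expected)
    return hits / len(results)


def mean_correction_steps(results):
    """'Conversion efficiency': average number of extra selection steps.

    Assumption: stepping to the k-th candidate costs k steps (0 for the
    first), and a miss costs one step per candidate that was offered.
    """
    total = 0
    for cands, expected in results:
        if expected in cands:
            total += cands.index(expected)
        else:
            total += len(cands)
    return total / len(results)


# Made-up sample data, not measurements of any real system.
sample = [
    (["漢字", "感じ", "幹事"], "漢字"),  # correct on the first try
    (["感じ", "漢字"], "漢字"),          # correct after one step
    (["仮名"], "仮名"),                  # correct on the first try
    (["機関", "期間"], "帰還"),          # not offered at all
]
```

On this made-up sample, `first_try_accuracy(sample)` gives 0.5 and `mean_correction_steps(sample)` gives 0.75. The point is only that the two quantities can diverge: an engine with modest first-try accuracy may still be efficient if the right candidate is usually one step away.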

First, Mr. Koyama, Mr. Kon and Mr. Nagaoka spoke about how kana-kanji conversion is built in WX, kanna and Wnn. Each system has its own distinctive conversion method, candidate-narrowing techniques, etc., and the following items that were introduced were extremely interesting.

Analysis of character punctuation
Word classification system
Prioritization of conversion candidates based on character space
Dictionary performance
Progress in conversion accuracy based on learning

Various technologies are intricately intertwined. This is one reason why no standard method for improving conversion efficiency has emerged. On the other hand, a great many options appear to be available, and each system has its own original methodology as the foundation for further progress.

In the discussion which included the audience, a variety of topics arose. One of them was an appraisal method for measuring accuracy and efficiency.

Evaluation methods are not yet well established, and it was pointed out that, in practice, showing which conversion method is better currently has to rely on subjective impressions. Surprisingly, there is at present no common scale for comparing different conversion systems.

Aside from this, a wide range of other topics came up, including input methods for character punctuation, conversion systemization for language grammar, successive conversion comparisons for automatic conversions, arrangement of punctuation learning, conditions for obtaining current grammar analysis for each system (this question was answered by each of the participating kana-kanji conversion system developers), methods for dictionary learning of finished sentences, desirability of AI conversion, languages aside from Japanese, weaknesses in colloquial language, conversion systems for reasons in mistaken input, and T-code.

Although we were severely limited by having only one day, the appearance of this many topics led to extremely intense discussion, so it seems that a workshop on this theme was strongly desired.

The earliest days of kana-kanji conversion system development have already passed. It would be premature to call the field history, however; with improvements still accumulating, it seems to remain a technically "hot" topic.

Finally, we would like to express our thanks to all of the visiting speakers and panelists, to the committee members who undertook to put this on, and to all of the attendees who joined in this workshop.