Japanese Spam Analysis (or Artificially Intelligent Teaching by Statistics)

by Javantea
Original Analysis: Sept 25, 2008
Updated Analysis: Aug 9 - Sept 8, 2009, Feb 15, 2010
Published: Feb 15, 2010

Japanese AI version 0.3 [sig]
Japanese AI version 0.1 [sig]


Over a year ago I released the concept and initial analysis of the Japanese AI project here. Since then I have used the results off and on for translation, learning, and other projects. Not long after the original release, I wrote a generic version of this project, AltSci Language AI, which uses Twitter as its data source and the Google Translate Language API to translate conversations on the fly. It became obvious that this type of language software would be quite useful, so I made a few quick user interface improvements to Japanese AI so that I could release the full results.

Learner's View || Memorizer's View


To view both of these pages, I recommend the Sazanami font from the eFont open source project. It lets you take advantage of the larger font size, as seen below.

Sazanami font Japanese AI sample

The Memorizer's View was designed to fit on a little more than two pages. The table is meant to be read from top to bottom first, then left to right, which may be confusing at first. If you find yourself confused, check the order of the Learner's View and make sure that you read the Memorizer's View in the same order.
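The top-to-bottom, then left-to-right reading order can be sketched in a few lines of Python. This is only an illustration of the layout idea, not the code that generated the Memorizer's View; the function names and the row count of 3 are assumptions made up for the example.

```python
def column_major_rows(items, num_rows):
    """Lay out items so that reading down each column gives the original order.

    num_rows is a hypothetical table height; the real page may use another value.
    Row r holds items r, r + num_rows, r + 2 * num_rows, and so on.
    """
    return [items[r::num_rows] for r in range(num_rows)]


def reading_order(rows):
    """Read a table top to bottom first, then left to right."""
    num_cols = max(len(row) for row in rows)
    return [row[c] for c in range(num_cols) for row in rows if c < len(row)]


# Placeholder entries standing in for kanji sorted by frequency.
entries = list("ABCDEFG")
table = column_major_rows(entries, 3)
for row in table:
    print(" ".join(row))
```

Reading the printed table down the first column, then the second, and so on recovers the same order as the Learner's View list, which is the check suggested above.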

As with the original analysis, the number on the left-hand side of the Learner's View is the number of times that kanji was used across all spams analyzed. The order in which you learn these matters: the list is sorted so that the most commonly used kanji are at the top.
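For readers curious how such a count and ordering can be produced, here is a minimal sketch in Python. It is not the Japanese AI source code. The function names, the sample messages, and the choice to treat only the CJK Unified Ideographs block (U+4E00 to U+9FFF) as kanji are all assumptions made for the example.

```python
from collections import Counter


def is_kanji(ch):
    # Assumption: only the basic CJK Unified Ideographs block counts as kanji.
    return "\u4e00" <= ch <= "\u9fff"


def kanji_frequencies(messages):
    """Count every kanji across all messages, most common first."""
    counts = Counter()
    for text in messages:
        counts.update(ch for ch in text if is_kanji(ch))
    return counts.most_common()


# Made-up sample text, not taken from the analyzed spam.
sample_messages = ["無料で登録", "無料のメール", "登録は簡単"]
ranked = kanji_frequencies(sample_messages)
for kanji, count in ranked:
    print(count, kanji)
```

Kana and punctuation fall outside the assumed range and are skipped, so only kanji appear in the ranked list, with the count on the left as in the Learner's View.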

The Learner's View was designed to be printed. Though it is many pages long, you can print just the first 10 or 20 pages and still get good use out of it. One way to use the Learner's View is to cover the English side and then try to recall the kana and meaning of each kanji. If stumped, uncover the kana and see if you can guess the meaning.

All the translations in the Learner's View came from a front-end to KANJIDIC. They are Copyright (C) 2002 James William Breen. His license allows free use with acknowledgment, which I feel has been met.

If you wish to discuss this project, please contact me at jvoss@altsci.com or use the comment section below.

