Chinese Character Cloud

November 2011–March 2012 / Independent work

This poster grew out of an experiment in presenting a very large and complex data set: the frequency distribution of characters in the Chinese language. The source data, Chih-Hao Tsai's Frequency and Stroke Counts of Chinese Characters (available here), come from an analysis of a 171,882,493-character corpus conducted by researchers in Taiwan. The corpus consists of all Usenet newsgroup activity in Big5-encoded traditional characters during 1993–1994. I scaled each character by its frequency and arranged the characters in descending order of frequency, spiraling outward from the center. The color of each character represents the number of strokes required to write it, as indicated by the legend.
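For readers curious how a layout like this could be computed, here is a minimal Python sketch. It is not the method used for the poster. The function name `spiral_layout`, the square-root size scaling, the `max_size` and `min_size` parameters, and the spiral step rule are all assumptions made for illustration, and coloring by stroke count is left out.

```python
import math

def spiral_layout(freqs, max_size=72.0, min_size=4.0):
    """Return (char, size, x, y) tuples, most frequent first, spiraling outward.

    freqs maps each character to its corpus count. All parameters and the
    scaling rule are illustrative assumptions, not the poster's settings.
    """
    # Rank characters from most to least frequent.
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return []
    top = ranked[0][1]
    placed = []
    r, theta = 0.0, 0.0  # polar position of the next glyph; start at the center
    for char, count in ranked:
        # Assumed scaling: glyph area proportional to frequency, so the
        # side length goes with the square root, clamped at min_size.
        size = max(min_size, max_size * math.sqrt(count / top))
        placed.append((char, size, r * math.cos(theta), r * math.sin(theta)))
        # Advance roughly one glyph width along the arc, and let the radius
        # grow by about one glyph height per full turn.
        dtheta = size / max(r, size)
        theta += dtheta
        r += size * dtheta / (2 * math.pi)
    return placed
```

Calling `spiral_layout({"的": 100, "是": 50, "一": 25})` (made-up counts, not values from the corpus) puts 的 at the center and the other two characters at increasing radii. A real layout would also need collision handling, which this sketch ignores.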
