Page "Character encodings in HTML" ¶ 14
from Wikipedia

Some Related Sentences

CJK and where
One possible rationale is the desire to limit the size of the full Unicode character set, where CJK characters as represented by discrete ideograms may approach or exceed 100,000 (while those required for ordinary literacy in any language are probably under 3,000).
The glossary at Unicode.org defines “Z-variant” as “Two CJK unified ideographs with identical semantics and unifiable shapes,” where “unifiable” is taken in the sense of Han unification.

CJK and there
It does not make calculating the displayed width of a string easier except in limited cases, since even with a “fixed width” font there may be more than one code point per character position (combining marks) or more than one character position per code point (for example CJK ideographs).
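
The point about combining marks and CJK ideographs can be illustrated with a rough sketch in Python. The helper name `display_width` is hypothetical, and its rule (zero cells for combining marks, two cells for East Asian Wide or Fullwidth code points, one cell otherwise) is a simplifying assumption rather than a complete width algorithm; it ignores control characters, zero-width joiners, and ambiguous-width characters.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of fixed-width cells needed to show text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # A combining mark shares the cell of the preceding character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            # Wide and Fullwidth code points occupy two cells.
            width += 2
        else:
            width += 1
    return width


# Three strings whose cell count differs from their code point count.
for sample in ("abc", "e\u0301", "\u6f22\u5b57"):
    print(repr(sample), len(sample), display_width(sample))
```

Under these assumptions, "e" followed by U+0301 is two code points in one cell, while the two ideographs U+6F22 U+5B57 are two code points in four cells.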

CJK and are
In Chinese, Japanese, and Korean (CJK) fonts, these characters are rendered at the same width as CJK ideographs, rather than at half the width.
East Asian rules of typography, for example, require CJK fonts to always be monospaced at least as far as the main characters for writing words (i.e. not punctuation) are concerned.
Unicode discourages their use for mathematics and in Western texts because they are canonically equivalent to the CJK code points U+300x and thus likely to render as double-width symbols.
They are useful for East Asian fixed-width CJK typography, because they are equal in proportion to one Chinese character.
In East Asia, graphics tablets, known as "pen tablets", are widely used in conjunction with input-method editor software (IMEs) to write Chinese, Japanese, and Korean (CJK) characters.
Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible.
Unicode was later extended to 21 bits, allowing many more CJK characters (75,960 are assigned, with room for more).
The national character code standards existing in CJK languages are considerably more involved, given the technological limitations under which they evolved, and so the official CJK participants in Han unification may well have been amenable to reform.
Most well-known code pages, excluding those for the CJK languages and Vietnamese, fit all their code points into 8 bits and do not involve anything more than mapping each code point to a single character; furthermore, techniques such as combining characters, complex scripts, etc., are not involved.
Glyphs for CJK ideographs are reworked to look more like Arial Unicode MS, while sub-glyphs for these characters are repositioned and rescaled.
More code points are now associated with characters because of updates to Unicode, especially the addition of CJK Unified Ideographs Extension B.
The Unicode philosophy of code point allocation for CJK languages is organized along three “axes.” The X-axis represents differences in semantics; for example, the Latin capital A (U+0041 A) and the Greek capital alpha (U+0391 Α) are represented by two distinct code points in Unicode, and might be termed “X-variants” (though this term is not common).
Nevertheless, two single-byte, fixed-width code pages (874 for Thai and 1258 for Vietnamese) and four multibyte CJK code pages (932, 936, 949, 950) are used as both OEM and ANSI code pages.
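
The mutual incompatibility of the CJK encodings mentioned in this group can be seen with the four multibyte code pages named above. The following is a minimal sketch, assuming Python's bundled codecs `cp932`, `cp936`, `cp949`, and `cp950` stand in for those code pages; the helper name `encode_everywhere` and the choice of sample character are illustrative only.

```python
# Python codec names assumed to correspond to code pages 932, 936, 949, 950.
CODE_PAGES = ("cp932", "cp936", "cp949", "cp950")


def encode_everywhere(ch: str) -> dict:
    """Encode one character in each of the four CJK code pages."""
    return {cp: ch.encode(cp) for cp in CODE_PAGES}


# U+6F22 is a Han character present in all four character sets.
encoded = encode_everywhere("\u6f22")
for cp, raw in encoded.items():
    # The same character round-trips in every code page,
    # but the byte values are not shared between them.
    print(cp, raw.hex(), raw.decode(cp))

# Reading bytes with the wrong code page does not recover the character.
mismatch = encoded["cp932"].decode("cp950", errors="replace")
print(repr(mismatch))
```

Each code page stores the character in two bytes, yet the byte values differ, which is why text must carry (or be accompanied by) a declaration of its encoding.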

CJK and several
The IRG has contributed several blocks of characters to Unicode/UCS, including the CJK Unified Ideographs, CJK Compatibility Ideographs (and Supplement), and CJK Unified Ideographs Extensions A, B, C, and D. CJK Unified Ideographs Extension E is in preparation, as of Unicode 6.0.

CJK and different
The need to support more writing systems for different languages, including the CJK family of East Asian scripts, required support for a far larger number of characters and demanded a systematic approach to character encoding rather than the previous ad hoc approaches.
The Unicode standard has also attempted to create a unified CJK character set that can represent Chinese (Hanzi) as well as the Japanese (Kanji) and Korean (Hanja) derivatives of this script through the Han unification process, which does not discriminate by language or region for rendering Chinese characters, as long as the different typographic traditions have not resulted in major differences in what the character looks like; see Image:Xin-jiu-zixing.png for examples of characters whose appearance recently underwent only minor changes in Mainland China.

CJK and multi-byte
The primary goal of the dvipdfmx project is to support multi-byte character encodings and CJK character sets for East Asian languages.

CJK and encodings
This is a compatibility character encoded for roundtrip compatibility with legacy CJK encodings (which included it to conform to layout in square ideographic character cells) and vertical layout.
CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana and hangul.
CJK character encodings include:
These code pages represent DBCS character encodings for various CJK languages.
In computing, Chinese character encodings can be used to represent text written in the CJK languages (Chinese, Japanese, Korean) and (rarely) obsolete Vietnamese, all of which use Chinese characters.
In traditional CJK encodings characters usually took either a single byte (known as halfwidth) or two bytes (known as fullwidth).
They exist in Unicode because it was deemed useful to be able to “round-trip” documents between Unicode and other CJK encodings such as Big5 and CCCII.
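
The halfwidth/fullwidth distinction described above can be checked against one traditional encoding. This is a small sketch, assuming Python's `cp932` codec (a Shift JIS variant) as the legacy encoding; the helper name `cp932_length` is hypothetical, and other CJK encodings may assign byte lengths differently.

```python
import unicodedata


def cp932_length(ch: str) -> int:
    """Number of bytes one character occupies in code page 932."""
    return len(ch.encode("cp932"))


samples = [
    "A",        # ASCII letter
    "\uff71",   # halfwidth katakana A
    "\uff21",   # fullwidth Latin capital A
    "\u30a2",   # ordinary (wide) katakana A
]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        cp932_length(ch),                   # bytes in the legacy encoding
        unicodedata.east_asian_width(ch),   # Unicode width class
        unicodedata.normalize("NFKC", ch),  # compatibility-folded form
    )
```

The halfwidth and fullwidth forms survive as separate Unicode code points so that such text can be converted to Unicode and back without loss; NFKC normalization folds them onto their ordinary counterparts when that distinction is not wanted.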

CJK and use
It is similar to Unicode but does not use Unicode's Han unification process: each character from each CJK character set is encoded separately, including archaic and historical equivalents of modern characters.

CJK and is
CJK is a collective term for Chinese, Japanese, and Korean, which is used in the field of software and communications internationalization.
Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the so-called CJK languages into a single set of unified characters.
Because modern Vietnamese is no longer written with Chinese characters at all, it is sometimes left out of this grouping, in which case the area is just called CJK.
Although originally coined for CJK (Chinese, Japanese and Korean) computing, the term is now sometimes used generically to refer to a program to support the input of any language.
The Smart Common Input Method platform (SCIM) is an input method (IM) platform containing support for more than thirty languages (CJK and many European languages) for POSIX-style operating systems including Linux and BSD.
It is up to line breaking rules in CJK.

CJK and also
Fonts covering the CJK characters usually include not only the script small ℓ but also four precomposed characters: ㎕, ㎖, ㎗, and ㎘ (U+3395 to U+3398) for the microlitre, millilitre, decilitre, and kilolitre.
The CID-keyed font format was also designed to solve the problems in the OCF/Type 0 fonts, addressing the complex Asian-language (CJK) encoding and very large character set issues.
OpenOffice.org version also supported Windows ME/2000 for Asian/CJK versions, generic Linux 2.2.13 with glibc2 2.1.3, Solaris 7 SPARC (8 for Asian version).
East Asian orthographies (languages using CJK characters) also tend to delimit syllables (in the case of Chinese characters) or morae (in the case of kana) rather than full words.

CJK and .
However, the standard makes no provision for the scripts of East Asian languages (CJK), as their ideographic writing systems require many thousands of code points.
The increased pin count permitted superior print quality, which was necessary to print legible CJK characters and so to succeed in Asian markets.
* CJK Compatibility excerpt from The Unicode Standard, Version 4.1.
Traditionally, none of the CJK languages were written with spaces: modern Chinese and Japanese (except when written with little or no kanji) still are not, but modern Korean uses spaces.
The term CJKV means CJK plus Vietnamese, which constitute the main East Asian languages.
