Warning
This module is EXPERIMENTAL. Function names and the API are subject to change. The current conversion filter supports Japanese only.
In many languages, not all characters can be represented by a single byte, so multi-byte character encodings are used to represent their large character sets. mbstring was developed to handle Japanese characters; however, many mbstring functions can also handle character encodings other than Japanese ones.
A multi-byte character encoding represents a single character with a sequence of consecutive bytes. Some character encodings use shift (escape) sequences to start and end multi-byte character strings. Therefore, a multi-byte string may be corrupted when it is split and/or counted, unless a method that is safe for multi-byte encodings is used. mbstring provides multi-byte-safe string functions and other utility functions, such as encoding conversion functions.
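To make the splitting and counting problem concrete, here is a small Python sketch (Python is used only for illustration; it is not part of mbstring). It compares cutting a byte string at an arbitrary byte offset with cutting it by character count, and shows how bytes taken out of an ISO-2022-JP string lose the escape sequence that gives them their meaning. The helper names `naive_truncate` and `safe_truncate` are hypothetical.

```python
def naive_truncate(data: bytes, n: int) -> bytes:
    # Cuts at an arbitrary byte offset: NOT multi-byte safe.
    return data[:n]


def safe_truncate(data: bytes, n: int, encoding: str) -> bytes:
    # Decode to characters, keep the first n characters, then re-encode.
    return data.decode(encoding)[:n].encode(encoding)


text = "日本語テキスト"       # 7 characters
utf8 = text.encode("utf-8")   # 3 bytes per character here, 21 bytes total
print(len(text), len(utf8))   # 7 21

# Cutting after 4 bytes leaves a dangling lead byte of the second character,
# which decodes to the U+FFFD replacement character.
print(naive_truncate(utf8, 4).decode("utf-8", errors="replace"))
print(safe_truncate(utf8, 2, "utf-8").decode("utf-8"))  # 日本

# ISO-2022-JP: ESC $ B switches to two-byte JIS, ESC ( B switches back to ASCII.
jis = "日本".encode("iso2022_jp")
# Without the leading escape sequence, the two bytes of 日 read as ASCII "F|".
print(jis[3:5].decode("iso2022_jp"))  # F|
```

The safe version works on decoded characters, so it can never stop in the middle of a character or drop a shift sequence that later bytes depend on.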
Most Japanese characters need more than one byte each. In addition, several character encodings are used in Japanese environments: EUC-JP, Shift_JIS and ISO-2022-JP. As Unicode becomes more widespread, UTF-8 is also used. When developing web applications for a Japanese environment, it is important to choose the appropriate character encoding for each purpose: HTTP input/output, RDBMS and e-mail.
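As a rough illustration of how much these encodings differ, the following Python sketch encodes the same three-character string in each encoding named above and reports the byte counts. It uses Python's codec names (`euc_jp`, `shift_jis`, `iso2022_jp`, `utf-8`), which are not mbstring's encoding names; the function `encoded_sizes` is a hypothetical helper.

```python
def encoded_sizes(text: str) -> dict:
    # Map each codec name to the number of bytes needed for `text`.
    codecs_to_try = ("euc_jp", "shift_jis", "iso2022_jp", "utf-8")
    return {codec: len(text.encode(codec)) for codec in codecs_to_try}


for codec, size in encoded_sizes("日本語").items():
    print(f"{codec:10} {size:2} bytes")
```

For these three characters, EUC-JP and Shift_JIS use 2 bytes each (6 bytes), UTF-8 uses 3 bytes each (9 bytes), and ISO-2022-JP needs 6 bytes of character data plus a 3-byte escape sequence on each side (12 bytes), so byte length alone says little about character count.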
Storage for a single character can be up to four bytes.
A multi-byte character is usually twice as wide as a single-byte character. Wide characters are called "zen-kaku" (full width) and narrow characters are called "han-kaku" (half width). "Zen-kaku" characters are usually fixed-width.
Some character encodings define shift sequences for entering and exiting multi-byte character strings.
A database may allocate storage for characters that differs from the size used in PHP, even if the same character encoding is used (for example, PostgreSQL).
E-mail is expected to use ISO-2022-JP.
"i-mode" web sites are expected to use Shift_JIS.
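Full-width ("zen-kaku") and half-width ("han-kaku") characters can be told apart with the Unicode East Asian Width property. The sketch below uses Python's `unicodedata.east_asian_width` and counts Wide (W) and Fullwidth (F) characters as two columns and everything else as one. This is a simplified rule chosen for illustration (ambiguous-width characters, category A, are treated as narrow), not mbstring's own width calculation; `display_width` is a hypothetical helper.

```python
import unicodedata


def display_width(text: str) -> int:
    # Zen-kaku (W/F) characters take 2 columns; han-kaku and others take 1.
    # Simplification: ambiguous-width ("A") characters count as 1 column.
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


for s in ("ABC", "ＡＢＣ", "アイウ", "ｱｲｳ"):
    print(s, len(s), display_width(s))
```

All four strings have three characters, but the full-width Latin letters and the ordinary katakana occupy six columns, while the ASCII letters and the half-width katakana occupy three.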
The following character encodings are supported by this PHP extension: UCS-4, UCS-4BE, UCS-4LE, UCS-2, UCS-2BE, UCS-2LE, UTF-32, UTF-32BE, UTF-32LE, UTF-16, UTF-16BE, UTF-16LE, UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, ISO-2022-JP (JIS), ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-13, ISO-8859-14, ISO-8859-15.
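Converting between encodings such as those listed above amounts to decoding from the source encoding and re-encoding into the target. A minimal Python sketch, assuming a small hand-written table that maps a few mbstring-style encoding names to Python codec names. The table `MB_TO_PY` and the function `convert_encoding` are hypothetical, its argument order is an arbitrary choice, and `SJIS-win` is only approximated by Python's `cp932`.

```python
# Hypothetical mapping from mbstring-style names to Python codec names.
MB_TO_PY = {
    "ASCII": "ascii",
    "UTF-8": "utf-8",
    "EUC-JP": "euc_jp",
    "SJIS": "shift_jis",
    "SJIS-win": "cp932",  # approximation only
    "ISO-2022-JP": "iso2022_jp",
    "JIS": "iso2022_jp",
    "ISO-8859-1": "latin-1",
}


def convert_encoding(data: bytes, to_enc: str, from_enc: str) -> bytes:
    # Decode strictly from the source encoding, then encode into the target.
    text = data.decode(MB_TO_PY[from_enc])
    return text.encode(MB_TO_PY[to_enc])


euc = "日本語".encode("euc_jp")
sjis = convert_encoding(euc, "SJIS", "EUC-JP")
print(sjis.decode("shift_jis"))  # 日本語
```

Strict decoding means malformed input raises `UnicodeDecodeError` instead of silently producing garbage; a real converter would also need a policy for characters the target encoding cannot represent.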
mbstring.internal_encoding defines the default internal character encoding.
mbstring.http_input defines the default HTTP input character encoding.
mbstring.http_output defines the default HTTP output character encoding.
mbstring.detect_order defines the default character encoding detection order.
mbstring.substitute_character defines the character to substitute for invalid character codes.
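What the last two settings control can be illustrated in Python. This sketch is not mbstring's algorithm: `detect_encoding` naively tries each encoding in a configured order and returns the first one that decodes the input without error, and a custom codec error handler, registered under the hypothetical name `"mb_substitute"`, replaces each unencodable character with a configured substitute. `DETECT_ORDER`, `SUBSTITUTE_CHARACTER` and `PY_CODEC` are hypothetical stand-ins for the ini settings.

```python
import codecs

# Hypothetical settings mirroring mbstring.detect_order and
# mbstring.substitute_character.
DETECT_ORDER = ["ASCII", "UTF-8", "EUC-JP", "SJIS"]
SUBSTITUTE_CHARACTER = "?"

PY_CODEC = {"ASCII": "ascii", "UTF-8": "utf-8",
            "EUC-JP": "euc_jp", "SJIS": "shift_jis"}


def detect_encoding(data: bytes, order=DETECT_ORDER):
    # Return the first encoding in `order` that decodes `data` strictly.
    for name in order:
        try:
            data.decode(PY_CODEC[name])
        except UnicodeDecodeError:
            continue
        return name
    return None  # no candidate matched


def substitute(exc):
    # Replace every character that cannot be encoded with the substitute.
    if isinstance(exc, UnicodeEncodeError):
        return SUBSTITUTE_CHARACTER * (exc.end - exc.start), exc.end
    raise exc


codecs.register_error("mb_substitute", substitute)

print(detect_encoding("日本語".encode("euc_jp")))     # EUC-JP
print(detect_encoding("日本語".encode("shift_jis")))  # SJIS
print("Aü日本".encode("latin-1", errors="mb_substitute"))  # b'A\xfc??'
```

The order matters because several encodings can decode the same bytes without error; the first successful candidate wins, so narrower encodings such as ASCII are listed before broader ones.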