Unicode is a universal character encoding standard that Java uses to represent text. Because Java programs can handle characters from many languages and scripts in the same way on every platform, Unicode makes Java well suited to global applications.
What is Unicode?
Unicode assigns a unique number, called a code point, to every character it encodes, including letters, digits, symbols, and special characters. As of Unicode 13.0 it defines over 143,000 characters across more than 150 modern and historic scripts, and later versions add more.
- Code Point Range: Unicode code points range from `U+0000` to `U+10FFFF`.
- Examples:
  - `'A'`: `U+0041`
  - `'अ'`: `U+0905`
  - `'你'`: `U+4F60`
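The mapping from characters to code points can be checked from Java itself. The following is a minimal sketch: the class name `CodePointDemo` and the helper `toCodePointLabel` are illustrative names, not part of any library, and the non-ASCII literals assume the source file is compiled as UTF-8.

```java
class CodePointDemo {
    // Formats the first code point of the text in the usual U+ notation,
    // padded to at least four hexadecimal digits.
    static String toCodePointLabel(String text) {
        int codePoint = text.codePointAt(0);
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        String[] samples = {"A", "अ", "你"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + toCodePointLabel(sample));
        }
    }
}
```

`String.codePointAt` returns an `int` rather than a `char`, which is why it can also report code points above `U+FFFF`.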
Why Unicode in Java?
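Before Unicode, different systems used different encoding standards, such as ASCII or ISO-8859. These had clear limits for non-English text: ASCII defines only 128 characters, and each ISO-8859 variant covers only one group of languages, so text in different encodings could not be mixed reliably. Unicode solves these problems by providing one consistent encoding system for all scripts.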
Characteristics of Unicode in Java
- Default Character Encoding:
  - Java uses Unicode to represent the `char` data type and `String` objects.
  - Each `char` is 16 bits (2 bytes) in Java and holds one UTF-16 code unit.
- Wide Character Support:
  - Java can handle characters from multiple languages, symbols, and emojis.
- Compatibility:
  - Because `char` and `String` use the same Unicode representation on every platform, Java programs handle text in memory consistently wherever they run.
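The 16-bit size of `char` has a practical consequence: `String.length()` counts UTF-16 code units, not code points. The sketch below illustrates this with standard `Character` and `String` methods; the class `CharSizeDemo` and its two helper methods are hypothetical names chosen for this example.

```java
class CharSizeDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnitCount(String text) {
        return text.length();
    }

    // Number of Unicode code points in the text.
    static int codePointCount(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        System.out.println("Bits per char:  " + Character.SIZE);
        System.out.println("Bytes per char: " + Character.BYTES);

        String letter = "A";
        // U+1F600 lies outside the Basic Multilingual Plane.
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println("Letter: " + codeUnitCount(letter) + " code unit(s), "
                + codePointCount(letter) + " code point(s)");
        System.out.println("Emoji:  " + codeUnitCount(emoji) + " code unit(s), "
                + codePointCount(emoji) + " code point(s)");
    }
}
```

For the letter both counts are 1, while the emoji occupies two `char` values but is still a single code point.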
How Java Implements Unicode
- Using the `char` Data Type:
  - The `char` type in Java is a 16-bit value that holds one UTF-16 code unit, which is enough for any character in the Basic Multilingual Plane.
  - Example:

        public class UnicodeExample {
            public static void main(String[] args) {
                char letter = 'A';      // Unicode: U+0041
                char hindiChar = 'अ';   // Unicode: U+0905
                System.out.println("Letter: " + letter);
                System.out.println("Hindi Character: " + hindiChar);
            }
        }
- Using Unicode Escapes:
  - Unicode characters can also be written as escape sequences of the form `\uXXXX`, where `XXXX` is exactly four hexadecimal digits giving a UTF-16 code unit.
  - Example:

        public class UnicodeEscapeExample {
            public static void main(String[] args) {
                char letter = '\u0041'; // Unicode for 'A'
                char smiley = '\u263A'; // Unicode for ☺
                System.out.println("Letter: " + letter);
                System.out.println("Smiley: " + smiley);
            }
        }
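A `\uXXXX` escape is translated by the compiler before the literal is parsed, so an escape and the character it names are interchangeable in source code. Because an escape has only four hexadecimal digits, a code point above `U+FFFF` needs either two escapes or a conversion from its numeric value. The sketch below shows both, assuming a hypothetical class `EscapeEquivalenceDemo` with an illustrative helper `fromCodePoint`.

```java
class EscapeEquivalenceDemo {
    // Builds a String from any code point, including those above U+FFFF
    // that a single four-digit escape cannot express.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // The escape and the plain literal denote the same char value.
        System.out.println('\u0041' == 'A');

        // A BMP code point matches its single escape.
        System.out.println(fromCodePoint(0x263A).equals("\u263A"));

        // A supplementary code point matches a pair of escapes.
        System.out.println(fromCodePoint(0x1F600).equals("\uD83D\uDE00"));
    }
}
```

All three lines print `true`. Note that Unicode escapes are processed everywhere in a source file, including inside comments, so a malformed escape in a comment is still a compile error.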
Unicode Encoding Schemes
Java primarily uses the UTF-16 encoding scheme, which:
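- Encodes every character in the Basic Multilingual Plane (U+0000 to U+FFFF), which covers most characters in common use, in a single 16-bit code unit.
- Encodes supplementary characters (above U+FFFF) as a surrogate pair: two 16-bit code units, or 4 bytes.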
Other Unicode encoding schemes include:
- UTF-8: Variable-length encoding (1–4 bytes) and backward compatible with ASCII.
- UTF-32: Fixed-length encoding (4 bytes for all characters).
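The size differences between the three schemes can be measured directly. This is a rough sketch, not a benchmark: `EncodingSizeDemo` and its methods are illustrative names. UTF-8 and UTF-16 sizes come from `String.getBytes` with the standard charsets, and the UTF-32 size is computed arithmetically as 4 bytes per code point rather than by calling an encoder.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizeDemo {
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF_16BE is used so that no byte-order mark is added to the count.
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-32 is fixed-length, so the size follows from the code point count.
    static int utf32Bytes(String text) {
        return 4 * text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",              // U+0041, ASCII range
            "\u0905",         // U+0905, Devanagari
            "\u4F60",         // U+4F60, CJK
            "\uD83D\uDE00"    // U+1F600, supplementary
        };
        for (String sample : samples) {
            System.out.printf("U+%04X: UTF-8=%d, UTF-16=%d, UTF-32=%d bytes%n",
                    sample.codePointAt(0),
                    utf8Bytes(sample), utf16Bytes(sample), utf32Bytes(sample));
        }
    }
}
```

The ASCII letter takes 1 byte in UTF-8 but 2 in UTF-16, the Devanagari and CJK characters take 3 bytes in UTF-8 and 2 in UTF-16, and the emoji takes 4 bytes in every scheme.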
Advantages of Unicode in Java
- Global Language Support:
  - Enables applications to support multiple languages in a single program.
  - Example: A Java application can process text in English, Hindi, Chinese, and Arabic simultaneously.
- Platform Independence:
  - Java’s Unicode support ensures consistent character representation across different platforms.
- Ease of Use:
  - Built-in support for Unicode in `char` and `String` simplifies handling characters.
Examples of Unicode Usage
Example 1: Printing Unicode Characters
    public class UnicodePrintExample {
        public static void main(String[] args) {
            System.out.println("English: Hello");
            System.out.println("Hindi: \u0928\u092E\u0938\u094D\u0924\u0947");
            System.out.println("Chinese: \u4F60\u597D");
            System.out.println("Smiley: \u263A");
        }
    }
Output:
    English: Hello
    Hindi: नमस्ते
    Chinese: 你好
    Smiley: ☺
Example 2: Supplementary Characters
    public class SupplementaryExample {
        public static void main(String[] args) {
            // Surrogate pair for 😀 (U+1F600)
            String emoji = "\uD83D\uDE00";
            System.out.println("Emoji: " + emoji);
        }
    }
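Since a supplementary character spans two `char` values, code that walks a string by `char` index sees the two surrogates separately. Iterating by code point avoids this. The following sketch contrasts the two approaches using standard `String` and `Character` methods; the class `CodePointIterationDemo` and the helper `codePointsOf` are illustrative names only.

```java
class CodePointIterationDemo {
    // Returns the code points of the text, joining each surrogate pair
    // into a single int value.
    static int[] codePointsOf(String text) {
        return text.codePoints().toArray();
    }

    public static void main(String[] args) {
        String text = "Hi \uD83D\uDE00";

        // Walking by char index exposes the two halves of the surrogate pair.
        for (int i = 0; i < text.length(); i++) {
            char unit = text.charAt(i);
            System.out.printf("char %d: 0x%04X surrogate=%b%n",
                    i, (int) unit, Character.isSurrogate(unit));
        }

        // Walking by code point yields the emoji as one value.
        for (int codePoint : codePointsOf(text)) {
            System.out.printf("U+%04X uses %d char(s)%n",
                    codePoint, Character.charCount(codePoint));
        }
    }
}
```

The first loop runs five times and the second four times, because the emoji is two `char` values but one code point.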
Key Points
- UTF-16 Encoding: Java’s `char` and `String` are based on UTF-16.
- Unicode Escapes: Use the `\uXXXX` format for specifying characters in code.
- Global Compatibility: Unicode allows Java applications to handle text in any language.
- Memory Efficiency: UTF-16 is variable-length, so characters in the Basic Multilingual Plane take one 16-bit code unit and only supplementary characters take two.