Unicode System in Java

The Unicode System is a universal character encoding standard that Java uses to represent characters. Because of it, Java programs can handle characters from many languages and scripts in the same way on every platform, which makes Java suitable for global applications.

What is Unicode?

Unicode assigns a unique number (called a code point) to each character it defines, including letters, digits, symbols, and special characters. As of Unicode 13.0, it defines over 143,000 characters across more than 150 modern and historic scripts, and later versions add more.

Code Point Range: Unicode code points range from U+0000 to U+10FFFF.
Examples:
- 'A': U+0041
- 'अ': U+0905
- '你': U+4F60
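
The code points listed above can be checked from Java itself. The following is a minimal sketch: the class name CodePointDemo and the helper toUPlus are illustrative names, not part of any library. The non-ASCII characters are written as escapes so the result does not depend on the source file encoding.

```java
class CodePointDemo {
    // Formats the first code point of s in the conventional U+XXXX notation.
    static String toUPlus(String s) {
        int codePoint = s.codePointAt(0);
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(toUPlus("A"));       // U+0041
        System.out.println(toUPlus("\u0905"));  // U+0905 (Devanagari letter A)
        System.out.println(toUPlus("\u4F60"));  // U+4F60 (Chinese "you")
    }
}
```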

Why Unicode in Java?

Before Unicode, different systems used different encoding standards, such as ASCII or ISO-8859. These systems had limitations, particularly when representing non-English characters. Unicode solves these problems by providing a consistent encoding system.

Characteristics of Unicode in Java

Default Character Encoding:
- Java uses Unicode to represent values of the char data type and the contents of String objects.
- Each char is 16 bits (2 bytes) and holds one UTF-16 code unit.
Wide Character Support:
- Java can handle characters from multiple languages, symbols, and emojis.
Compatibility:
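- Java represents text the same way in memory on every platform, so string handling gives consistent results. Reading or writing text still requires choosing a character encoding such as UTF-8.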
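
The 16-bit size of char can be confirmed with constants from the standard Character class. This is a small sketch; the class name CharSizeDemo and its helper methods are illustrative, and Character.BYTES assumes Java 8 or later.

```java
class CharSizeDemo {
    // Width of a char in bits, as reported by the standard library.
    static int bitsPerChar() {
        return Character.SIZE;
    }

    // Width of a char in bytes (Character.BYTES exists since Java 8).
    static int bytesPerChar() {
        return Character.BYTES;
    }

    // Largest value a single char can hold, widened to int.
    static int maxCharValue() {
        return Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("Bits per char: " + bitsPerChar());        // 16
        System.out.println("Bytes per char: " + bytesPerChar());      // 2
        System.out.println("Largest char value: " + maxCharValue());  // 65535
        System.out.println("Numeric value of 'A': " + (int) 'A');     // 65
    }
}
```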

How Java Implements Unicode

Using the char Data Type:
- The char type in Java holds a single 16-bit UTF-16 code unit, which is enough for any character in the Basic Multilingual Plane (U+0000 to U+FFFF).
- Example:

  public class UnicodeExample {
      public static void main(String[] args) {
          char letter = 'A';      // Unicode: U+0041
          char hindiChar = 'अ';   // Unicode: U+0905

          System.out.println("Letter: " + letter);
          System.out.println("Hindi Character: " + hindiChar);
      }
  }

Using Unicode Escapes:
- Unicode characters can also be written as escape sequences of the form \uXXXX, where XXXX is exactly four hexadecimal digits giving a UTF-16 code unit. For characters in the Basic Multilingual Plane, this value is the same as the code point.
- Example:

  public class UnicodeEscapeExample {
      public static void main(String[] args) {
          char letter = '\u0041'; // Unicode escape for 'A'
          char smiley = '\u263A'; // Unicode escape for ☺

          System.out.println("Letter: " + letter);
          System.out.println("Smiley: " + smiley);
      }
  }

Unicode Encoding Schemes

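Java uses the UTF-16 encoding scheme as the basis of char and String. UTF-16:

Encodes every character in the Basic Multilingual Plane (U+0000 to U+FFFF) as one 16-bit code unit.
Encodes supplementary characters (U+10000 to U+10FFFF) as a surrogate pair: two 16-bit code units, 4 bytes in total.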

Other Unicode encoding schemes include:

UTF-8: Variable-length encoding (1–4 bytes) and backward compatible with ASCII.
UTF-32: Fixed-length encoding (4 bytes for all characters).
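
To make the size differences concrete, the sketch below encodes three sample strings and counts the bytes. The class and method names are illustrative. UTF-16BE is used so that no byte order mark is added to the count, and the UTF-32 size is computed as 4 bytes per code point rather than by calling a UTF-32 charset, since that charset is not among the ones every Java platform is required to provide.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizeDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as big-endian UTF-16 (no byte order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // UTF-32 always uses 4 bytes per code point, so the size is computed directly.
    static int utf32Length(String s) {
        return 4 * s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'A' (U+0041), Devanagari A (U+0905), grinning face (U+1F600)
        String[] samples = { "A", "\u0905", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8=%d, UTF-16=%d, UTF-32=%d bytes%n",
                    s.codePointAt(0), utf8Length(s), utf16Length(s), utf32Length(s));
        }
        // Prints:
        // U+0041: UTF-8=1, UTF-16=2, UTF-32=4 bytes
        // U+0905: UTF-8=3, UTF-16=2, UTF-32=4 bytes
        // U+1F600: UTF-8=4, UTF-16=4, UTF-32=4 bytes
    }
}
```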

Advantages of Unicode in Java

Global Language Support:
- Enables applications to support multiple languages in a single program.
- Example: A Java application can process text in English, Hindi, Chinese, and Arabic simultaneously.
Platform Independence:
- Java’s Unicode support ensures consistent character representation across different platforms.
Ease of Use:
- Built-in support for Unicode in char and String simplifies handling characters.

Examples of Unicode Usage

Example 1: Printing Unicode Characters

public class UnicodePrintExample {
    public static void main(String[] args) {
        System.out.println("English: Hello");
        System.out.println("Hindi: \u0928\u092E\u0938\u094D\u0924\u0947");
        System.out.println("Chinese: \u4F60\u597D");
        System.out.println("Smiley: \u263A");
    }
}

Output:

English: Hello
Hindi: नमस्ते
Chinese: 你好
Smiley: ☺
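
Whether this output appears as shown depends on the encoding that the console and System.out use; on a mismatched setup, non-ASCII characters may print as question marks. One possible workaround, sketched below, is to wrap System.out in a PrintStream with an explicit UTF-8 charset. The sketch assumes Java 10 or later (for the PrintStream constructor that takes a Charset) and a terminal that can display UTF-8. The class name Utf8OutputDemo and the helper encodeAsUtf8 are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class Utf8OutputDemo {
    // Sends text through a UTF-8 PrintStream and returns the raw bytes written.
    static byte[] encodeAsUtf8(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true, StandardCharsets.UTF_8);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Wrap standard output so characters are always encoded as UTF-8.
        PrintStream out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        out.println("Hindi: \u0928\u092E\u0938\u094D\u0924\u0947");
        out.println("Chinese: \u4F60\u597D");
        out.println("Smiley: \u263A");

        // The smiley (U+263A) takes three bytes in UTF-8: E2 98 BA.
        out.println("Smiley byte count: " + encodeAsUtf8("\u263A").length);
    }
}
```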

Example 2: Supplementary Characters

public class SupplementaryExample {
    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00"; // Surrogate pair for 😀 (U+1F600)
        System.out.println("Emoji: " + emoji);
    }
}
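
Because a supplementary character occupies two char values, length() counts code units rather than characters. The sketch below shows the difference using standard String and Character methods; the class name SupplementaryDemo and its helper methods are illustrative.

```java
class SupplementaryDemo {
    // Builds a String from a single code point; supplementary code points yield two chars.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Number of UTF-16 code units (what String.length() reports).
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String emoji = fromCodePoint(0x1F600);

        System.out.println("Equals the escaped pair: " + emoji.equals("\uD83D\uDE00")); // true
        System.out.println("Code units: " + codeUnits(emoji));    // 2
        System.out.println("Code points: " + codePoints(emoji));  // 1
        System.out.println("High surrogate first: "
                + Character.isHighSurrogate(emoji.charAt(0)));    // true
        System.out.println(String.format("Code point: U+%X", emoji.codePointAt(0))); // U+1F600
    }
}
```

When iterating over text that may contain such characters, codePoints() or codePointAt() is usually a safer choice than charAt(), which can return half of a surrogate pair.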

Key Points

UTF-16 Encoding: Java’s char and String are based on UTF-16.
Unicode Escapes: Use the \uXXXX format (four hexadecimal digits) to specify characters in source code.
Global Compatibility: Unicode allows Java applications to handle text in any language.
Memory Efficiency: UTF-16 is a variable-length encoding. Characters in the Basic Multilingual Plane take one 16-bit code unit, and only supplementary characters need two.