Unicode

Unicode is a universal character encoding standard. It defines the way individual characters are represented in text files, web pages, and other types of documents.

Unlike ASCII, which was designed to represent only basic English characters, Unicode was designed to support characters from all languages around the world. The standard ASCII character set supports only 128 characters, while Unicode can support more than one million characters (1,114,112 code points). While ASCII uses only one byte per character, Unicode encodings may use up to four bytes per character.
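
As a quick illustration (not part of the original definition), the short Python snippet below uses the built-in ord() function to show that a basic English letter falls within the 128-character ASCII range, while characters from other writing systems have Unicode code points far beyond it:

    # Compare code points using Python's built-in ord() function
    print(ord("A"))   # 65    -- fits within the 128-character ASCII range
    print(ord("é"))   # 233   -- outside basic ASCII, but within Unicode
    print(ord("漢"))  # 28450 -- a CJK character, far beyond ASCII's range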

There are several Unicode encoding forms, though UTF-8 and UTF-16 are the most common. UTF-8 has become the standard character encoding used on the Web and is also the default encoding used by many software programs. While UTF-8 supports up to four bytes per character, it would be inefficient to use four bytes to represent frequently used characters. Therefore, UTF-8 uses only one byte to represent common English characters. European (Latin), Hebrew, and Arabic characters are represented with two bytes, while three bytes are used for Chinese, Japanese, Korean, and other Asian characters. Additional Unicode characters can be represented with four bytes.
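
As a simple sketch of this variable-width behavior (again, an illustration rather than part of the definition), the following Python code encodes characters from several scripts in UTF-8 and prints how many bytes each one requires:

    # Show how many bytes UTF-8 uses for characters from different scripts
    for ch in ["A", "é", "ש", "漢", "😀"]:
        print(ch, len(ch.encode("utf-8")), "byte(s)")
    # "A" -> 1 byte, "é" and "ש" -> 2 bytes, "漢" -> 3 bytes, "😀" -> 4 bytes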

Updated: April 20, 2012

Cite this definition:

https://techterms.com/definition/unicode

TechTerms - The Tech Terms Computer Dictionary

This page contains a technical definition of Unicode. It explains in computing terminology what Unicode means and is one of many computing terms in the TechTerms dictionary.

All definitions on the TechTerms website are written to be technically accurate but also easy to understand. If you find this Unicode definition to be helpful, you can reference it using the citation links above. If you think a term should be updated or added to the TechTerms dictionary, please email TechTerms!