This page has not been looked at yet. My apologies for any issues this may cause.
3.2 : Unicode
Unicode, which shares its character set with the Universal Coded Character Set (ISO/IEC 10646), is an extension of ASCII. The first 128 codepoints are the same as ASCII. The next 1,113,984 codepoints are literally anything else. What do I mean by "literally anything else"? What are these more than 1 million codepoints?
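If you want to check those numbers yourself, here is a minimal Python sketch. Python is my choice for illustration, and the helper names `beyond_ascii` and `ascii_matches_unicode` are made up for this example. It assumes the Unicode codepoint range runs from 0 through 0x10FFFF, which is where the 1,113,984 figure comes from.

```python
# Unicode codepoints run from U+0000 through U+10FFFF inclusive.
TOTAL_CODE_POINTS = 0x10FFFF + 1   # 1,114,112
ASCII_CODE_POINTS = 128            # U+0000 through U+007F


def beyond_ascii() -> int:
    """Number of codepoints that are not part of ASCII."""
    return TOTAL_CODE_POINTS - ASCII_CODE_POINTS


def ascii_matches_unicode() -> bool:
    """Check that every ASCII byte decodes to the codepoint with the same number."""
    return all(
        ord(bytes([n]).decode("ascii")) == n for n in range(ASCII_CODE_POINTS)
    )


if __name__ == "__main__":
    print(ord("A"))                 # 65, the same number ASCII uses for 'A'
    print(chr(65))                  # A
    print(beyond_ascii())           # 1113984
    print(ascii_matches_unicode())  # True
```

`ord` goes from a character to its codepoint number, and `chr` goes the other way.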
The "Literally Anything Else"
I feel rude calling this section "literally anything else". It makes it seem pointless, even though there are many important codepoints in Unicode. These include accented letters and the characters of non-Latin alphabets. From a communication standpoint, this is very important. Unicode also added mathematical symbols and Greek letters, so we can better express mathematical expressions. Computers could always compute these expressions. Now we can type the formulas as well. I use Unicode in future sections on boolean algebra. Below, I included tables with a few characters that are important.
Mathematical Characters | ||
---|---|---|
Name | UTF-16 Code | Character |
Plus Sign | 0x002B | + |
Multiplication | 0x00D7 | × |
Division | 0x00F7 | ÷ |
Equals Sign | 0x003D | = |
Accented Characters | ||
---|---|---|
Name | UTF-16 Code | Character |
a acute | 0x00E1 | á |
E grave | 0x00C8 | È |
c cedilla | 0x00E7 | ç |
n tilde | 0x00F1 | ñ |
Boolean Characters | ||
---|---|---|
Name | UTF-16 Code | Character |
Not | 0x00AC | ¬ |
And | 0x2227 | ∧ |
Or | 0x2228 | ∨ |
Xor | 0x22BB | ⊻ |
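You don't have to take my word for the codes in these tables. Here is a small Python sketch that formats a character's codepoint the same way the tables do. The function name `code_point_label` is just something I made up for this example.

```python
def code_point_label(ch: str) -> str:
    """Format one character's codepoint the way the tables do, e.g. 0x00D7."""
    return f"0x{ord(ch):04X}"


if __name__ == "__main__":
    # A few characters taken from the tables above.
    for ch in "×÷ñ∧⊻":
        print(ch, code_point_label(ch))
```

Paste in any character you like and you get its row for the table.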
Clearly, Unicode contains many important characters. That doesn't mean ALL Unicode characters are important. With over a million codepoints, there is a whole lot that is unimportant and useless. Just because something is useless doesn't mean that it's not fun. Now, I'm not a Trekkie (sorry), but the Klingon alphabet, or as named by Marc Okrand in The Klingon Dictionary, pIqaD, has codepoints too. To be fair, they are not officially part of Unicode: the proposal to add Klingon was rejected, so the letters live in the Private Use Area, at codepoints assigned by the ConScript Unicode Registry. That means that if you don't have a font that supports pIqaD installed, the Klingon letters won't display properly. Hopefully, since you're probably a computer science major, you're a massive nerd. Hopefully, being a massive nerd, you're prepared for this situation. Along with the highly important Klingon language, Unicode also contains chess pieces. Don't worry, there are white and black chess pieces. Long story short, Unicode is very inclusive.
pIqaD | ||
---|---|---|
Name | UTF-16 Code | Character |
A | 0xF8D0 | |
B | 0xF8D1 | |
CH | 0xF8D2 | |
D | 0xF8D3 | |
Chess | ||
---|---|---|
Name | UTF-16 Code | Character |
Black King | 0x265A | ♚ |
White King | 0x2654 | ♔ |
Black Pawn | 0x265F | ♟ |
White Pawn | 0x2659 | ♙ |
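If you're curious what your computer actually knows about one of these codepoints, Python's standard `unicodedata` module can tell you. This is only a sketch, and `describe` is a name I invented for it. The category `So` means "Symbol, other" and `Co` means "private use", which is why the pIqaD codepoints need a special font before they show up as anything.

```python
import unicodedata


def describe(code_point: int) -> tuple[str, str]:
    """Return (general category, official name or '<unnamed>') for a codepoint."""
    ch = chr(code_point)
    # Private-use codepoints have no official name, so supply a default
    # instead of letting unicodedata.name raise ValueError.
    return unicodedata.category(ch), unicodedata.name(ch, "<unnamed>")


if __name__ == "__main__":
    for cp in (0x265A, 0x2654, 0xF8D0):
        category, name = describe(cp)
        print(f"0x{cp:04X} {category} {name}")
```

The two chess kings come back with real names, while 0xF8D0 comes back as category `Co` with no name at all.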
UTF Encoding
So if you were paying attention to the tables, you noticed that one of the columns is "UTF-16 Code". If you were really paying attention, you noticed I didn't explain what that was. Well now, I'm gonna explain that to you! Aren't I a sweetheart? Anyway. Encodings. Unicode is composed of a whole lotta codepoints, and each one has a number. We need some way of storing those numbers as bits. UTF stands for Unicode Transformation Format. The number that follows is the size in bits of one code unit: UTF-8 works in 8-bit units, UTF-16 in 16-bit units, and UTF-32 in 32-bit units. A codepoint that doesn't fit in one unit is spread across several, so UTF-8 uses one to four units per codepoint and UTF-16 uses one or two. Every codepoint in the tables above fits in a single 16-bit unit, which is why its UTF-16 code is the same as its codepoint number.
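To see how much room a character really takes up, here is a rough Python sketch that counts the bytes each encoding needs. The function name `encoded_sizes` is made up for this example, and I use the big-endian variants (`utf-16-be`, `utf-32-be`) only so that Python doesn't add a byte-order mark to the count. Notice that a character can take more than one unit.

```python
def encoded_sizes(ch: str) -> dict[str, int]:
    """Bytes needed to store one character in each encoding (no byte-order mark)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


if __name__ == "__main__":
    # An ASCII letter, a Greek letter, a chess piece, and an emoji.
    for ch in ("A", "Δ", "♚", "😀"):
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The letter A needs 1 byte in UTF-8, the chess king needs 3, and the emoji needs 4 bytes in every one of the three encodings, because in UTF-16 it takes two 16-bit units.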
In HTML
If you are on a computer, I want you to look down at your keyboard. Count how many characters you can type. I counted 26 capital letters, 26 lowercase letters, 10 numbers, and 32 symbols on my laptop. That's 94 characters I can type out. Now, that sure is a problem if I wanna print out the other 1,114,018 codepoints. One solution is a very large keyboard. Now, that's not exactly practical.

This entire textbook I've written is in HTML. How did I display the characters in the tables above? HTML has Unicode compatibility. The format is: & # NUMBER ; or & KEYWORD ; So Δ is & # 9 1 6 ; or & Delta ; There actually wouldn't be any spaces. I only added them so that the browser wouldn't turn the entity into the character. Below is a link to a whole bunch of HTML entities. I also made a little tool for you to try them out on your own.
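If you'd rather try entities out in code, Python's standard `html` module understands the same two formats. This is a minimal sketch, not how the tool on this page works, and the helper names `decode_entity` and `numeric_entity` are ones I made up.

```python
import html


def decode_entity(body: str) -> str:
    """Turn an entity body such as '#916' or 'Delta' into its character."""
    return html.unescape(f"&{body};")


def numeric_entity(ch: str) -> str:
    """Build the numeric HTML entity for a single character."""
    return f"&#{ord(ch)};"


if __name__ == "__main__":
    print(decode_entity("#916"))   # Δ
    print(decode_entity("Delta"))  # Δ
    print(numeric_entity("Δ"))     # &#916;
```

The number in a numeric entity is just the codepoint written in decimal, so 916 here is the same codepoint as 0x0394.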
HTML Entity Tester
Enter a #Number or Keyword for an HTML entity. Make sure to include the '#' for numbers.
More Unicode
Since there are over 1 million Unicode codepoints, I'm not going to write them all out. However! Other people have already done that for me! Here are a few websites that I used for reference.
[Wikipedia : List of Unicode Characters](https://en.wikipedia.org/wiki/List_of_Unicode_characters)