
3.2 : Unicode


Unicode is an extension of ASCII. (Its ISO counterpart, which shares the same codepoints, is the Universal Coded Character Set, ISO/IEC 10646.) The first 128 codepoints are the same as ASCII. The next 1,113,984 codepoints, not all of which have been assigned a character yet, are literally anything else. What do I mean by "literally anything else"? What are these over 1 million codepoints?
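
Here is a minimal Python 3 sketch that double-checks the arithmetic above. It assumes only the standard fact that Unicode codepoints run from 0 through 0x10FFFF. It also shows the ASCII compatibility: text made of the first 128 codepoints encodes to exactly the same bytes in ASCII and in UTF-8.

```python
# The Unicode codespace runs from U+0000 through U+10FFFF.
TOTAL_CODEPOINTS = 0x10FFFF + 1   # 1,114,112
ASCII_CODEPOINTS = 128            # U+0000 through U+007F

beyond_ascii = TOTAL_CODEPOINTS - ASCII_CODEPOINTS
print(f"{TOTAL_CODEPOINTS:,} total, {beyond_ascii:,} beyond ASCII")
# 1,114,112 total, 1,113,984 beyond ASCII

# ASCII compatibility: the first 128 codepoints keep their ASCII numbers,
# and UTF-8 stores them as the very same single bytes.
text = "Hello, ASCII!"
assert all(ord(ch) < ASCII_CODEPOINTS for ch in text)
assert text.encode("ascii") == text.encode("utf-8")
print(ord("A"))  # 65, in ASCII and in Unicode
```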


The "Literally Anything Else"

I feel rude calling this section "literally anything else". It makes it seem pointless, even though there are many important codepoints in Unicode, including accented letters and the special characters of non-Latin alphabets. From a communication standpoint, this is very important. Unicode also added mathematical symbols and Greek letters, so we can better express mathematical expressions. Computers could always compute these expressions; now we can type the formulas, too. I use Unicode characters in later sections on Boolean algebra. Below, I included tables with a few important characters.

Mathematical Characters

| Name | UTF-16 Code | Character |
|---|---|---|
| Plus Sign | 0x002B | + |
| Multiplication | 0x00D7 | × |
| Division | 0x00F7 | ÷ |
| Equals Sign | 0x003D | = |

Accented Characters

| Name | UTF-16 Code | Character |
|---|---|---|
| a acute | 0x00E1 | á |
| E grave | 0x00C8 | È |
| c cedilla | 0x00E7 | ç |
| n tilde | 0x00F1 | ñ |

Boolean Characters

| Name | UTF-16 Code | Character |
|---|---|---|
| Not | 0x00AC | ¬ |
| And | 0x2227 | ∧ |
| Or | 0x2228 | ∨ |
| Xor | 0x22BB | ⊻ |

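
Python's standard `unicodedata` module is a handy way to double-check tables like these. This is a minimal sketch, assuming Python 3. It turns each code into a character with `chr` and prints the character's official Unicode name. (For the accented table: á is U+00E1, while U+00E0 is à.)

```python
import unicodedata

# Codes for the characters in the three tables above. All of them are at or
# below U+FFFF, so each UTF-16 code is a single unit equal to the codepoint.
TABLE_CODES = [
    0x002B, 0x00D7, 0x00F7, 0x003D,  # mathematical characters
    0x00E1, 0x00C8, 0x00E7, 0x00F1,  # accented characters
    0x00AC, 0x2227, 0x2228, 0x22BB,  # Boolean characters
]

for code in TABLE_CODES:
    ch = chr(code)  # codepoint number -> character
    # unicodedata.name returns the official Unicode character name
    print(f"0x{code:04X}  {ch}  {unicodedata.name(ch)}")
```

For example, the line for 0x22BB prints its official name, XOR, and the line for 0x2227 prints LOGICAL AND.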

Clearly, Unicode contains many important characters. That doesn't mean ALL Unicode characters are important. With over a million codepoints, a whole lot of them are unimportant or useless. Just because something is useless doesn't mean it's not fun, though. Now, I'm not a Trekkie (sorry), but the Klingon alphabet, or as named by Marc Okrand in The Klingon Dictionary, pIqaD, has codepoints too. Strictly speaking, Klingon isn't part of official Unicode. Its proposal was rejected, so the ConScript Unicode Registry assigns it codepoints in Unicode's Private Use Area (U+F8D0 through U+F8FF), a range Unicode sets aside for unofficial characters like these. If you don't have a pIqaD font installed, the Klingon letters won't display properly. Since you're probably a computer science major, you're hopefully a massive nerd, and being a massive nerd, you're hopefully prepared for this situation. Along with the highly important Klingon alphabet, Unicode also contains chess pieces. Don't worry, there are white and black chess pieces. Long story short, Unicode is very inclusive.

pIqaD

| Name | UTF-16 Code | Character |
|---|---|---|
| A | 0xF8D0 |  |
| B | 0xF8D1 |  |
| CH | 0xF8D2 |  |
| D | 0xF8D3 |  |

Chess

| Name | UTF-16 Code | Character |
|---|---|---|
| Black King | 0x265A | ♚ |
| White King | 0x2654 | ♔ |
| Black Pawn | 0x265F | ♟ |
| White Pawn | 0x2659 | ♙ |

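
The two tables above differ in a way you can see by asking for character names. This is a minimal sketch, assuming Python 3 and its standard `unicodedata` module. The chess pieces are official Unicode characters with names. The pIqaD codes sit in the Private Use Area, so Unicode gives them a category but no name and no official meaning.

```python
import unicodedata

# Chess pieces are standard, named Unicode characters.
for code in (0x265A, 0x2654, 0x265F, 0x2659):
    ch = chr(code)
    print(f"0x{code:04X}  {ch}  {unicodedata.name(ch)}")

# pIqaD lives in the Private Use Area: the codepoints are valid, but
# Unicode itself assigns them no name.
for code in (0xF8D0, 0xF8D1, 0xF8D2, 0xF8D3):
    ch = chr(code)
    # category "Co" means "Other, private use"; name() falls back to the default
    print(f"0x{code:04X}  category={unicodedata.category(ch)}  "
          f"name={unicodedata.name(ch, '<none>')}")
```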


UTF Encoding

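So if you were paying attention to the tables, you noticed that one of the columns is "UTF-16 Code". If you were really paying attention, you noticed I didn't explain what that was. Well now, I'm gonna explain that to you! Aren't I a sweetheart? Anyway. Encodings. Unicode gives every character a number, its codepoint, usually written like U+00D7. But a computer stores bits, not abstract numbers, so we need some way of turning each codepoint into bits. That's what an encoding does. UTF stands for Unicode Transformation Format, and the number that follows is the size in bits of one code unit, the basic building block of the encoding. UTF-32 uses a single 32-bit unit for every codepoint. UTF-16 uses one 16-bit unit for codepoints up to U+FFFF and a pair of units (a "surrogate pair") for everything above that. UTF-8 uses one to four 8-bit units: ASCII characters take one byte, and everything else takes two, three, or four. So the number doesn't limit how many characters an encoding can hold; all three can represent every Unicode codepoint. Every character in the tables above is at or below U+FFFF, so its UTF-16 code is a single unit equal to its codepoint.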
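
The number after "UTF" is the size of one code unit. UTF-8 and UTF-16 then use as many units as a character needs. This is a minimal Python 3 sketch to make that concrete. It encodes a few characters from this section and counts the bytes. The sample includes U+1F600, an emoji picked only because it lies above U+FFFF. The sketch then splits codepoints into UTF-16 code units by hand. `utf16_code_units` is a hypothetical helper written for illustration. It does no input validation, so it would happily accept the surrogate range U+D800 through U+DFFF, which real text must not contain.

```python
# A few characters from this section, plus U+1F600 (an emoji) because it
# lies above U+FFFF.
SAMPLES = ["A", "\u00e7", "\u2227", "\u265a", "\U0001F600"]

for ch in SAMPLES:
    # "-be" codecs skip the byte order mark Python would otherwise prepend
    sizes = {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}
    print(f"U+{ord(ch):04X}  bytes: {sizes}")


def utf16_code_units(codepoint: int) -> list:
    """Split one codepoint into UTF-16 code units: one unit, or a surrogate pair."""
    if codepoint <= 0xFFFF:
        return [codepoint]             # up to U+FFFF the unit *is* the codepoint
    offset = codepoint - 0x10000       # leaves a 20-bit number
    high = 0xD800 + (offset >> 10)     # high surrogate carries the top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # low surrogate carries the bottom 10 bits
    return [high, low]


print([hex(u) for u in utf16_code_units(0x2227)])   # ['0x2227']
print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```

For "A" the sizes come out as 1, 2, and 4 bytes. For U+1F600 they are 4, 4, and 4: four UTF-8 units, two UTF-16 units, and one UTF-32 unit.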


In HTML

If you are on a computer, I want you to look down at your keyboard and count how many characters you can type. On my laptop, I counted 26 capital letters, 26 lowercase letters, 10 digits, and 32 symbols. That's 94 characters I can type out. That sure is a problem if I wanna print out the other 1,114,018 codepoints. One solution is a very large keyboard, but that's not exactly practical.

This entire textbook is written in HTML, so how did I display the characters in the tables above? HTML supports Unicode through character references. The format is `&#NUMBER;` for a decimal codepoint, `&#xNUMBER;` for a hexadecimal one, or `&KEYWORD;` for a named entity. So Δ (U+0394) can be written as `&#916;`, `&#x394;`, or `&Delta;`. Below is a link to a whole bunch of HTML entities.

FreeFormatter
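
If you'd like to try entities out on your own, here is a minimal Python 3 sketch using only the standard library. `html.unescape` decodes character references the way a browser would. `to_numeric_reference` is a hypothetical helper, written for illustration, that goes the other way and builds a decimal reference from a character.

```python
import html

# Decode a few character references the way a browser would.
for ref in ("&#916;", "&#x394;", "&Delta;", "&not;", "&#8743;"):
    print(f"{ref:10} -> {html.unescape(ref)}")


def to_numeric_reference(ch: str) -> str:
    """Build a decimal numeric character reference for a single character."""
    return f"&#{ord(ch)};"


print(to_numeric_reference("\u0394"))  # &#916;
```

Here `&#8743;` is just the decimal form of 0x2227, the "And" symbol from the Boolean table.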


More Unicode

Since there are over 1 million Unicode codepoints, I'm not going to write them all out. However! Other people have already done that for me! Here are a few websites that I used for reference.

Compart : Unicode Library

[Wikipedia : List of Unicode Characters](https://en.wikipedia.org/wiki/List_of_Unicode_characters)