Wednesday, December 25, 2013

C++ Tutorial #8: More Fun with Numbers

As the title says.  Now we're gonna get into more theoretical math stuff.

The Average Computer’s Definition of “Integer”
  • A whole number (no fractions, decimals, or other nonsense)
  • Can be negative, positive, or 0
  • In math, integers include every whole number that follows the above two rules, no matter how large or small.  However, numbers are infinite, so a computer can't store every single one of them.  There has to be a limit.  On most modern computers, a C++ integer can hold values from -2,147,483,648 through positive 2,147,483,647.  (Your programs probably won't be working with anything larger than that.  If you plan to, you don't need a fancier machine, just a bigger data type, and you're probably getting a bit ahead of yourself.)
Any number that does not adhere to the criteria and stay within the range will cause problems: a fraction stored in an integer loses everything after the decimal point, and a calculation that goes past the range gives unpredictable results. (Okay, you can get around this by using a data type other than the integer, but we'll save those for later.)

Something You Need to Know About Numbers vs. Text
One of the reasons we swear at our devices so much when they don't work is that we often overestimate their abilities.  Computers are really just big calculators, and they have their limits, one of which is that they really only know how to deal with numbers.  At its core, a computer has no idea what letters or other special characters are.

Wait a minute...if computers don't recognize non-numeric characters, how are they able to use variables like string to read text?

This is where ASCII comes in.  ASCII (which is pronounced "ass-key"; I'm serious) stands for American Standard Code for Information Interchange.  It's a code that assigns every character a number.  Even characters that stand for digits get their own codes, and those codes are not the same as the numbers they stand for.  For example, the text character "5", which stands for the number 5, is assigned an ASCII value of 53 (the digit characters "0" through "9" run from 48 through 57).  Non-numeric characters have numeric values as well.  Uppercase "A" has an ASCII value of 65, a forward slash "/" is 47, and a space " " is 32.

So, when you add text characters like "A", "/", "5", or a space, the computer doesn't read them as text the way we do.  Instead, it's told, "Okay, these are text characters with values of 65, 47, 53, and 32.  Go look at the ASCII chart to see what those are and work from there."  We programmers type characters into our source code, but we specify them as characters so that the computer knows to look up their ASCII values.  Again, the computer speaks a much different language than we do, and ASCII is what allows us to communicate with it in terms that are familiar to us.

Here's a link to a printable ASCII chart.  It might be a good idea to keep it handy for reference:

http://www.asciitable.com/index/asciifull.gif


Yeah, I know there's a lot to it.  There are actually many different numeric/character codes that computers use, and a few are listed here.  ASCII values are listed in the very first column (the Dec column).  The one highlighted in red (Char) refers to the character equivalents of ASCII codes.  The other two columns (Hx and Oct) are numeric systems that I'll discuss later.

A Different Character Code

Something important to note about ASCII is that it's the American Standard Code.  The name is a bit misleading, though: ASCII is used on computers all over the world.  What it lacks is characters outside of basic English (such as letters with an umlaut: ë, ü, etc).  There are other character codes that fill those gaps, and they all work the same way.  The most important one is Unicode, an international code whose first 128 values match ASCII exactly, but being an American I'm mostly familiar with ASCII, so that's what I'm going to work with.  If you're from another country but you use a keyboard that supports English characters, you should be able to follow along with it.  If not, however, don't worry.  It's the concept of computers and character codes that's important.

A Lesson in Numbers and Numeric Code

There are different systems of numbers that can be used when working with computers.  Humankind runs on the decimal system, which uses 10 as the basic number from which we work (which is why it's often called the "Base-10" system).  Really, all numeric systems are the same, but you're obviously initially going to be comfortable only with the one you were raised with.  Thus, we'll revisit that one before going into some of the more unfamiliar ones.

"Decimal" (Dec): The "human" number system
The most familiar set of numbers are those that follow the Base-10 decimal system.  As you probably have learned in math, human beings read and manipulate numbers through the Base-10 system.  This means that our numeric system has a total of 10 digits—0 through 9—that can be used in various combinations to create numbers.  It makes sense that man would gravitate towards a system with 10 digits—all you have to do is hold up your hands to find out why 10 is the optimal counting number for our species!  : )

Binary (Bin): Base-2
Binary is a numeric system just like the decimal system, but instead of Base-10 it's Base-2.  (Get it?  "Bi-" = 2?)  Base-10 has 10 digits to work with, but Base-2 only has 2 digits.  If you've ever seen binary code, you know those digits are 0 and 1.  So, while decimal numbers are formed by various combinations of the digits 0-9, binary numbers can only be formed from combinations of 0 and 1 (which is why you often see really long binary numbers).

Octal (Oct): Base-8
Of all the numeric systems, octal is used the least.  However, you might as well become familiar with the name in case it ever comes up.  As the name suggests, Octal is Base-8 and uses the digits 0 through 7, which gets really confusing, because it has so many digits in common with Base-10 but not all of them.  If you were to count in octal, you'd count 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, etc.  You can actually follow octal on the ASCII chart I provided.

(Because of this habit of skipping numbers with the digits 8 or 9 in them, discrepancies between octal and decimal numbers add up quickly.  For example, the octal number 112, while it looks like a number in our good old familiar Base-10 system, actually means 74 in Base-10.  Like I said, confusing.)

Having said all this, octal itself is seldom used.  I explained it in-depth because the concept of counting in strange ways is very much present in the next system.

Hexadecimal (hx or hex): Base-16
Here's a weird one that is used often—not necessarily in an introductory C++ tutorial, but in more advanced programming.  If you're planning to go into technology as a field, get comfortable with hex.  Hexadecimal is a Base-16 numeric system, which might confuse you.  After all, the only digits that exist to mankind are 0 through 9.  If man only created 10 digits, how do we work with a numeric system that supposedly has 16?

The answer is we use letters.  Base-16 uses digits 0 through 9 the same way Base-10 does, but it adds the letters A-F on the end (so, in hex, A is a digit that stands for 10, B stands for 11, etc.).  To create numbers, it uses a combination of 0-9 and A-F.  What's really weird about Base-16 is that, since it includes all Base-10 numbers, hex and decimal numbers can look very similar.  In fact, the digits 0-9 represent the same numbers in both hex and decimal (so 0 in decimal is 0 in hex, and 9 in decimal is 9 in hex).

After 9, however, things get strange.

Base-10 (decimal):    Base-16 (hex):
10                    A
11                    B
12                    C
13                    D
14                    E
15                    F
16                    10
17                    11
...

And the list goes on.  As you can see, in hex, you go from 0 to 9, then A to F, and then repeat the process, but this time with a 1 in front.  So what we know as the number 16 is actually 10 in hex.  Hex numbers continue from 10 to 19, then 1A to 1F, then 20 to 29, then 2A to 2F, and so forth.  This relationship with the Base-10 system is also shown in the ASCII chart I provided.


Again, the only system you need to be familiar with in introductory programming is the decimal system.  Everything you need to know about Base-2 you already know—it's used in machine code, the only language a computer understands, blah blah blah.  I'm just telling you about these other systems because, seeing your interest in the inner workings of a computer, you're probably going to become far more involved in tech, and someday you'll be working with other numeric systems very closely.  Not in this tutorial, but someday.

Here's Something That's Relevant Right Now: Other Numeric Variable Data Types

Back to the C++ realm.  You're already familiar with the data type integer, but there are many others.  All groups have their benefits and drawbacks, so it's up to you to decide when and where you want to use them.

One last important thing to note: because computers are limited calculators, some data types can only be accurate for so many digits.  Afterwards, they may not store a number correctly.  Integer types don't have this problem (they're exact for every value inside their range), but the decimal-capable types described below do.  For example, the float is accurate to about 7 digits.  So if you make a calculation that has a 7-digit answer, such as 5,294,863, the computer will store the number 5,294,863 and work with it without making any errors.  However, if your number has more digits (5,294,863,700), the float keeps only the leading digits and rounds the rest to the nearest value it can store (so you could end up with something like 5,294,863,872).  You probably won't be working with very big numbers in the beginning of your programming career, but if you do, keep in mind how accurate you want your calculations to be when choosing data types as some are more accurate than others.


The integer (int) is a whole number, negative or positive.  The benefits of using integers are that they take up little memory (typically 4 bytes per value), are exact for every value in their range, and have a pretty good range that the computer can handle (-2,147,483,648 through positive 2,147,483,647).  However, if you're looking for decimals or a bigger range, you'll need to use something else.

The short integer (short) is an integer that takes up 2 bytes and has a range of -32,768 through 32,767.  If your computer is low on memory, use this one instead of the regular int.

The unsigned integer (unsigned) takes up 4 bytes but has a greater positive range than the regular int.  Basically, it means the unsigned nixes the negative numbers to increase range of positive numbers from 0 through 4,294,967,295 without taking up any more memory space than the regular int.

The float (float) takes up 4 bytes and allows you to use decimals.  It can be used to represent numbers between -3.4 * 10^38 (-3.4 times 10 to the 38th power, a hugely negative number!) and 3.4 * 10^38 (a number of equal proportions in the opposite direction!).  It's accurate to about 7 digits, counting the digits on both sides of the decimal point.

The double (double) is like float, except it takes up 8 bytes and can represent numbers between *takes deep breath*: roughly -1.8 * 10^308 and 1.8 * 10^308.  It's also accurate up to about 15 digits.  Holy cow.


All right.  I think we've talked enough about numbers.  Print out that ASCII chart and jot down some notes on the different data types.  You don't need to memorize everything now (you'll begin to remember these guys automatically as you use them in programs).  It's a good idea to get used to using as little memory as possible in a program while still choosing data types with enough accuracy and range for the values you'll be working with.  Granted, a lot of modern computers have enough memory to handle whatever kind of data type you throw at them.  But it adds up, and if you end up using a lot of memory unnecessarily you could slow down your program, especially if you're trying to run it on an already slow or low-memory system.

Conservation vs. usability.  Always a balancing act, made even more entertaining with math.