Compilers and IDEs

Before you can start coding, you need to set up your environment.
Because Windows is so widely used, I’ll use it in all my examples; however, you can program on any operating system. The most basic tools you require are a text editor and a compiler.

What is a compiler?

Great question. A computer doesn’t understand commands such as:

   printf("Hello World"); //this prints the text Hello World onto the screen

A computer can only interpret machine language, i.e. 0s and 1s. Different compilers work differently. GCC, a popular C compiler, first turns your C code into assembly language, which is then translated into machine language. The reason you don’t write programs directly in assembly is that it takes much longer: you’re working at a very low level, providing the CPU with only basic instructions. Moreover, the code is not exactly intuitive to read and understand, as you will see in a moment.

Let’s take a look at some basic assembly.
Every processor has a fixed set of instructions built into it, and each instruction is identified by a pattern of bits.
Let’s design a very simple 8-bit example.
A bit is the smallest unit of data in a computer. It’s a single binary digit and can be a 0 or a 1.
Each input will be composed of 8 bits.

Each 8-bit input will, therefore, be some combination of zeros and ones. We call this an encoding. On its own, this means nothing. An input could look like “00101111” and we wouldn’t know what to do with it. What makes an encoding useful is a rule agreed upon between the designers of the CPU and the programmers who use it.

Designing the CPU, we can say, “Okay, the first 2 bits of the input represent the instruction. So when a programmer gives us any input, we understand the first 2 (leftmost) bits to specify the instruction. The next 3 bits will specify a destination address, and the last 3 will be a source address.”

Here is how we can interpret this encoding.

encoding: II XXX YYY

Above is the format in which input is received. That’s our encoding.

The first 2 bits are the instruction. Then we have 3 bits representing our destination and finally our source.

Let’s have a closer look at the bits themselves…

Remember everything is either a 1 or a 0.
How do we tell the processor what to do with only 1s and 0s?

0: 00
1: 01
2: 10
3: 11

So for this simple example, the instruction field only has 2 bits, hence there are a total of 2^2 = 4 possible instructions we can have.
Each of the above 2-bit binary numbers is a different instruction.

Let’s see what possible types of instructions exist in assembly.

mov
This is the move instruction. “mov” takes the contents of one address and moves it into another address.
Recall our encoding had the format:
II XXX YYY
XXX and YYY are addresses.
YYY is the source, and XXX is the destination.
Observe that XXX and YYY, unlike the II field, are each encoded in 3 bits.
This means that our addresses go from 0 to 7.

In binary, that would be:

0: 000
1: 001
2: 010
3: 011
4: 100
5: 101
6: 110
7: 111

Hence each of the above values can be an address.
This is a very primitive example and does not exactly describe how the CPU in your computer works; however, it should give you a basic understanding of it.
mov is used like so:

mov r2,r4

where r2 is, for example, address 010 and r4 is address 100.
We take the contents of address r4 and move them into r2.

We write it as “mov r2,r4” to make it easier to read. But what the CPU really sees is: 00010100

movi
This instruction is similar to mov; however, the YYY field is not a source address but a constant value. We use it when we wish to write a specific number to the destination address.

Add & Sub
These 2 commands add and subtract contents of addresses.

add r1,r2

This will add the contents of r1 and r2, storing the result in r1. Similarly, sub r1,r2 would subtract the contents of r2 from r1 and store the result in r1.

You might be asking, where exactly are these instructions coming from? That’s a great question. They are fetched from RAM.

Hence, getting back to the topic at hand, a computer only understands these basic encoded instructions: 00, 01, etc.
A compiler takes your C code, parses it and makes sure it’s free of syntax errors, then turns it into assembly, e.g. (mov r2,r4). Finally, the assembly code is turned into machine language, e.g. (00 010 100).

Here is a bit of code that does a similar job to the “printf” command, showing the text “Hello, world!” (in a message box rather than on the console), only instead of C it’s written in x86 assembly language for Windows.
Note that the instructions used here are different from our simple example, as we designed an 8-bit CPU, whereas this listing targets 32-bit x86 and most modern chips are 64-bit.

.486p
         .model  flat,STDCALL
include  win32.inc

extrn            MessageBoxA:PROC
extrn            ExitProcess:PROC

.data

HelloWorld db "Hello, world!",0
msgTitle db "Hello world program",0

.code
Start:
         push    MB_ICONQUESTION + MB_APPLMODAL + MB_OK
         push    offset msgTitle
         push    offset HelloWorld
         push    0
         call    MessageBoxA

         push 0
         call ExitProcess
ends
end Start

As you can see, it’s incredibly complicated and difficult to read. Not to mention that in order to do the exact same task on a computer with a different CPU, such as an ARM chip, which now powers the majority of mobile devices, the assembly code will be different. A program written in standard C, on the other hand, is written the same way regardless of the platform it’s compiled for.
This is a rather complicated topic and deserves a page of its own, perhaps a later one specifically on hardware.

Thus, recapping compilers: a compiler is a tool that takes your code and converts it into machine language. The first stage is preprocessing, which will be discussed in further detail later when we start to write programs. Next is compiling, which checks the code for syntax errors and translates it into assembly. The assembly is then turned into machine code; this step is called assembling, and it produces object files. In the final stage, known as linking, these object files are linked together to give you an executable program.

Programming using a text editor will work just fine; however, it’s very limited when it comes to debugging code and checking for errors, and this gets more difficult as programs get larger and more complex.
To make your life easier, you can use an Integrated Development Environment (IDE).
This is a program that makes programming much easier, as it automatically checks your syntax as you type and offers powerful tools for debugging.
Debugging is an essential part of computer programming, as programs don’t always work on the first build. Even when it is free of syntax errors, your program won’t necessarily produce the expected results, as logic errors may be present. Debugging is the process of finding and correcting the logic errors in your code.

For those on Windows, some available IDEs are:
Visual Studio
Code::Blocks
NetBeans
Eclipse
(Eclipse takes some work to set up on Windows)

For Mac OS X, Xcode is recommended; it’s free on the App Store.

On Linux, all of the IDEs listed for Windows are available and work fine, except Visual Studio.

I myself use Visual Studio; however, when writing code in the examples, I just write down a block of code. You can copy this into your IDE or text editor and compile it.

Next Page: Types, and memory 
