Programming languages - a short primer

The following paragraphs give you a short tour of the vast field of programming languages.

I will start with the lowest level of computer programming and work my way up to the high levels where most software is created today, roughly following the different technical levels and the historical development of programming languages:

Numerical ghost in the machine: machine code

Computer programming is hardly possible without the use of programming languages. Even on the lowest level, inside the computer chips, programs are presented to the central processing unit (CPU) in a programming language.

On this lowest level of computer systems, the central processing units work with a succession of numerical values - this is called "machine code".

This machine code language is basically a stream of numbers. While it can be seen as a stream of zeroes and ones (binary code, representing the electrical “on”/“off” status of the CPU’s transistors), it is usually written and displayed in hexadecimal values when a programmer actually works on this low level.

The binary numbers are converted into hexadecimal numbers, which use base 16 (in contrast to the everyday base 10 or the base 2 of binary code). This is also known as "hex code".
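As a small illustration of this conversion, here is a sketch in Python (chosen only for readability; the helper name "to_hex_byte" is made up for this example). It shows the byte AD, which is the first value of the machine code example in this primer, in binary, hex and decimal notation. Each hex digit stands for exactly four binary digits.

```python
def to_hex_byte(bits: str) -> str:
    """Convert a string of eight binary digits into a two-digit hex string."""
    value = int(bits, 2)          # interpret the text as a base-2 number
    return format(value, "02X")   # write it as two upper-case base-16 digits


print(to_hex_byte("10101101"))    # AD
print(int("AD", 16))              # 173 in the everyday base 10
```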

Hexadecimal machine code looks like this:

AD  A0  03
6D  A1  03
8D  A2  03
00

This example is a small machine code routine written for the old 8-bit CPU of the Apple II computers, the MOS 6502.

The structure of these numbers, combined with the way the CPU works, makes the CPU interpret some of the numbers as calls to electrical functions which are built into the chip as hardware. By computing a lot of numbers very fast, modern computers and their hardware components can perform all the different higher level functions. This includes displaying and printing the letter to your aunt, playing a song or showing all of the photos of your recent vacation in a nice slideshow.

Low-level machine code is also known as a first-generation programming language.

Assembly line

Today this most basic form of programming is hardly used at all, at least not in personal computers. It is fast and does not need any translation for the computer in order to understand it, but it is far from being easily readable for human beings.

When maximum speed is required, programmers usually only go as low as the next level, which is assembly language.

Assembly language combines numbers in hex code with simple, abbreviated English command words. These are the same commands which you can find in technical manuals for the different CPU types.

The machine code sample from above is a little bit more readable in assembler:

LDA $03A0
ADC $03A1
STA $03A2
BRK

Translated into normal English the routine would look more like this:

  • Load accumulator with the value from memory location (hex) 03A0.
  • Add with carry: add the value from memory location (hex) 03A1, plus the carry flag, to the accumulator.
  • Store new accumulator value in memory location (hex) 03A2.
  • Break (stop here).
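The four steps above can be imitated in a few lines of Python. This is only a toy sketch, not a real 6502 emulator: the function name "run_routine" is made up, memory is modelled as a simple dictionary, and the carry flag is assumed to be cleared before the routine starts.

```python
def run_routine(memory: dict[int, int]) -> dict[int, int]:
    """Imitate the four steps of the routine on a toy memory (address -> byte)."""
    carry = 0                                     # assumption: carry flag starts cleared
    accumulator = memory[0x03A0]                  # LDA: load accumulator
    total = accumulator + memory[0x03A1] + carry  # ADC: add with carry
    accumulator = total & 0xFF                    # the accumulator holds only 8 bits
    carry = 1 if total > 0xFF else 0              # an overflow ends up in the carry flag
    memory[0x03A2] = accumulator                  # STA: store accumulator
    return memory                                 # BRK: stop here


result = run_routine({0x03A0: 20, 0x03A1: 22, 0x03A2: 0})
print(result[0x03A2])                             # 42
```

Because the accumulator is only 8 bits wide, adding 200 and 100 would store 44 (300 minus 256) and leave the carry flag set.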

The assembler commands, known as “mnemonics”, are abbreviations for their natural language counterparts. Once you know the meaning behind the abbreviations, assembler programs become easy to read (more or less).

To fully understand the programs however, you need a deep and far-reaching knowledge about the hardware the program is written for.

The command words represent the actual functions which have been built into the CPU by its manufacturer. They are directly equivalent to the machine code commands. The assembler commands (and the machine code equivalents) are usually listed in the documentation for the chips themselves or in the relevant developer documentation.
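The one-to-one relationship between machine code and assembler commands can be sketched as a lookup table. The following Python example is a toy disassembler that knows only the four 6502 instructions used in this primer; the names "OPCODES" and "disassemble" are made up for the illustration, and a real tool would cover the full instruction set listed in the chip documentation.

```python
# Opcode table for just the four instructions used in this primer:
# opcode byte -> (mnemonic, number of operand bytes that follow).
OPCODES = {
    0xAD: ("LDA", 2),
    0x6D: ("ADC", 2),
    0x8D: ("STA", 2),
    0x00: ("BRK", 0),
}


def disassemble(code: bytes) -> list[str]:
    """Turn machine code bytes into assembler lines (toy version)."""
    lines = []
    position = 0
    while position < len(code):
        mnemonic, operand_size = OPCODES[code[position]]
        operand = code[position + 1 : position + 1 + operand_size]
        position += 1 + operand_size
        if operand_size == 2:
            # The 6502 stores addresses low byte first, so A0 03 means $03A0.
            address = operand[0] | (operand[1] << 8)
            lines.append(f"{mnemonic} ${address:04X}")
        else:
            lines.append(mnemonic)
    return lines


program = bytes.fromhex("AD A0 03 6D A1 03 8D A2 03 00")
print("\n".join(disassemble(program)))
```

Run on the hex bytes of the machine code sample, this prints one assembler command per line, which corresponds to the assembler version of the routine.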

While most of the assembler commands are easy to understand for someone familiar with the English language, some variations look very much alike. They only differ in small ways, e.g. the way parameters are listed. On the hardware level these variations control how the data is actually processed, in which order etc.

Apart from the different variations for the different assembler commands, there is another factor which makes it difficult to write long and complex programs directly in assembler.

The commands all provide very basic, limited functionality. If you want to create a complex program out of these small parts, this is like building a complete house out of matchsticks. It can be done, but the number of parts you need and the amount of time it takes to put them together are really huge.

There is also a different form of assembler programming. This is known as “macro-assembler”. Macro-assembler is a program where you can write assembler routines and group them into subroutines and functions. These can then be referenced more easily which allows for longer, more complex programs. The macro-assembler generates a normal assembler/machine code program which will run directly on the target hardware.

Today assembler code is mostly used for direct hardware programming or for small functions within a larger program where speed is most essential.

Assembler code is also known as second generation language.

Rising to management level: macro assembler

While macro-assembler makes things a little bit easier when it comes to writing complex programs, the assembler routines are still long if you want to have rich functionality in your program. Complex functionality, like displaying a graphical window on your computer screen, requires a relatively long block of assembler code.

It is also quite difficult to find errors in your program, especially logical errors.

This is where higher programming languages come into play. These kinds of programming languages are more like real languages and less oriented towards the technical side of programming only.

Complex functions are usually turned into single commands, or can be realized with very few lines of code. In comparison to natural languages this kind of programming language is very easy in its syntax and (grammar) rules. This makes them a lot easier to learn than natural languages.

Just like natural languages, programming languages have evolved over time. There are several variations today, with more languages being developed all the time.

These programming languages can be divided into different generations (currently third- or fourth-generation languages mostly) as well as different types of languages.

Get me a translator: interpreted and compiled languages

Higher programming languages are either interpreted by an interpreter program or compiled with the help of compiler and linker programs.

An interpreter just takes the list of instructions which make up your program and acts upon them step by step.

In contrast to this a compiler translates the program into machine code language. With the help of a linker program standard functionality from standard function libraries can be added if necessary.

In the end this produces a runnable program (also known as “executable”) which can be used independently of your development environment.

In contrast to this, an interpreted program can only run inside the interpreter program. So you always need a second program to run the program you created.
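To make the idea of an interpreter more concrete, here is a minimal sketch in Python. The three-command toy language (PUSH, ADD, PRINT) and the function name "interpret" are invented for this example and do not belong to any real language. The function simply takes the list of instructions and acts upon them step by step.

```python
def interpret(program: list[str]) -> list[int]:
    """Act on each instruction of a made-up toy language, one step at a time."""
    stack: list[int] = []
    output: list[int] = []
    for line in program:
        command, *arguments = line.split()
        if command == "PUSH":        # put a number on the stack
            stack.append(int(arguments[0]))
        elif command == "ADD":       # replace the top two numbers by their sum
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif command == "PRINT":     # show the number on top of the stack
            output.append(stack[-1])
            print(stack[-1])
        else:
            raise ValueError(f"unknown command: {command}")
    return output


interpret(["PUSH 20", "PUSH 22", "ADD", "PRINT"])   # prints 42
```

The toy program is just a list of text lines. It can only do something while the interpret function is available to run it, which mirrors the point made above about interpreted programs.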

Call me Babylon - high level programming languages

Different categories for programming languages were introduced over time in order to get a better overview of all the programming languages.

If you look up the term “programming language” in Wikipedia you can find lists of known programming languages (for example an alphabetical list). This list is quite long, even though the different BASIC dialects have been moved to a separate list. No wonder then that some kind of grouping mechanism is sorely needed.

I don’t want to expand upon the actual history of programming languages. You can find this information easily in Wikipedia.

For now, it is more important to know that out of all these languages, only a very small subset is used outside of scientific spaces for real-world applications.

Different variations of BASIC – the “Beginner's All-purpose Symbolic Instruction Code” (obviously this was named by techno-geeks :-)) – are commonly used to teach programming.

The “C” programming language and several languages derived from it (like C++, C# and Objective-C) can be found in the development of operating systems, application programs and tools.

Java is also used more and more often in current software development, especially when it comes to writing programs for the (world wide) web which have to be independent of the hardware they run on.

Apart from this you can also still find older programs written in Cobol or Fortran in a lot of corporations.

As far as interpreted languages go, JavaScript has lately been under siege by newer, often more specialized scripting languages. Often used for web applications, these kinds of languages seem to be evolving at a greater pace than the compiled languages, at least right now. Among others, these languages include Python, Ruby and Lua.

The beginner's choice: BASIC

As a person new to computer programming, in spite of all the existing and upcoming programming languages, there is really only one place to start, namely the BASIC language.

Some people may argue that you could also start with Logo, but this language is more aimed at kids. It doesn’t offer the flexibility of a modern BASIC. With BASIC you can not only learn how to program but also create complex programs which can compete with other professional software.

The modern versions of BASIC (BASIC dialects) still implement the syntactic and logical rules of the classic ANSI standard for BASIC programming languages.

With the addition of several principles from object-oriented programming and functionality provided by modern operating systems, BASIC programs can be extremely flexible while still being relatively easy to understand (the Visual Basic family is one example).

Object-oriented programming seems to be a big word. But in fact it makes things easier because it is more akin to handling real-life things.

Where programming without objects uses variables and data types separate from the functions which are used to work with them, object-oriented programming combines these into related object classes.

So instead of defining a graphical window and separate functions to open, resize, refresh and close it, the functions become part of the object definition. Once you have defined a new window object, you can use its open, resize, refresh and close functions without having to program them separately.
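The window example can be sketched as a small class. The code is written in Python rather than BASIC only to keep all examples in this primer in one language, and the class is a toy: it draws nothing and only keeps track of its own state. All names are chosen for this illustration.

```python
class Window:
    """A toy window object: the data and the functions that work on it belong together."""

    def __init__(self, title: str, width: int, height: int) -> None:
        self.title = title
        self.width = width
        self.height = height
        self.is_open = False

    def open(self) -> None:
        self.is_open = True

    def resize(self, width: int, height: int) -> None:
        self.width, self.height = width, height

    def refresh(self) -> str:
        # A real window would redraw itself; this sketch only describes its state.
        state = "open" if self.is_open else "closed"
        return f"{self.title}: {self.width}x{self.height}, {state}"

    def close(self) -> None:
        self.is_open = False


letter = Window("Letter to my aunt", 640, 480)
letter.open()
letter.resize(800, 600)
print(letter.refresh())   # Letter to my aunt: 800x600, open
```

Every further window created from this class brings its own open, resize, refresh and close functions along, so none of them has to be programmed again.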

Where to go from there - advanced choices

Once you have mastered the BASIC language and you can create complex programs with it, you might want to take a look at what other languages have to offer in comparison.

Java is a good thing to look at here. While it doesn’t provide the fastest programs, the different mechanisms incorporated in it make the programs safe and stable. In addition to this, Java is also designed to be independent of the hardware platform it runs on. This makes it easy to develop programs which run on different kinds of computer systems, mobile phones etc.

If you are thinking of becoming a programming professional of any kind, then the modern versions of the “C” language cannot be ignored. For several years now the object-oriented version of C known as “C++” (cplusplus, the increment to C) has been the language of choice for professional software developers in any field of application. At the time of writing however (April 2008), Microsoft’s newer “C#” (C sharp) is slowly but steadily replacing C++ as the developers' choice.

In spite of the differences between C, C++ and C# the languages have a lot in common. Due to a lack of C# compilers and tools for operating systems outside of Microsoft Windows – like Linux, all the Unix variants on server systems or Apple's Mac OS etc. – I would still advise anyone with an interest in the “C” language to learn “C++”.

Far out - programming the future

So far, most of the programming languages which are used today belong to the vast group of imperative programming languages. This is due to the fact that in each of these languages the programs are made up of a list of orders which the computer simply has to follow.

But there are also programming languages which operate in different ways. The biggest groups here are made up of the logic and functional programming languages. The “Prolog” language is a typical member of the group of logic programming languages. “ML” would be an example of a functional programming language.

Programming in these languages requires a different way of modelling problems and possible solutions. Instead of listing orders, the programmer describes the available data and the known rules concerning the problem in an abstract, logical/mathematical kind of way.

This approach makes it possible to create programs which can more easily learn by themselves (within given limits). When such a program is run, the computer will analyse the data and the rules and try to combine these into new rules and data. These in turn will either present the solution to the original problem or offer something new which is a step towards a solution.

These languages also allow certain side effects for several parts of the abstract descriptions. With these side effects such programs can display or print data, allow for user input etc. just like a program in an imperative programming language. So on the outside – the side of the program’s user – it does not look too different from “imperative” programs.
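The idea of describing data and rules, and letting the computer combine them into new data, can be imitated in a few lines. Python is not a logic programming language, so the following is only a sketch of the principle; in Prolog the same thing would be written as a handful of facts and one rule. The function name and the family names are invented for this example.

```python
def derive_grandparents(parents: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Apply one rule: if A is a parent of B and B is a parent of C,
    then A is a grandparent of C."""
    return {
        (a, c)
        for (a, b) in parents
        for (b2, c) in parents
        if b == b2
    }


# Known data: (parent, child) pairs.
parent_facts = {("Anna", "Ben"), ("Ben", "Cleo"), ("Ben", "Dan")}
print(sorted(derive_grandparents(parent_facts)))
# [('Anna', 'Cleo'), ('Anna', 'Dan')]
```

Nobody told the program who the grandparents are. It derived this new data from the known data and the rule.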

Once the development of the modern, object-oriented variations of “classic” programming languages like C reaches a point where further refinement no longer seems possible or sensible, it is very likely that logic or functional programming languages will become more important.

We might see a development similar to the current crop of imperative programming languages. Or new languages might combine logical and functional programming with imperative programming, combining the strengths of both types of programming language. This is already happening with the new "C++11" standard for example.
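Many current languages already allow both styles side by side. As a rough sketch, the following Python example computes the same result twice: once as a list of orders which change a variable step by step, and once as a combination of functions. The function names and the doubling of each price are made up for the illustration.

```python
from functools import reduce


def doubled_total_imperative(prices: list[int]) -> int:
    """Imperative style: orders that change a variable step by step."""
    total = 0
    for price in prices:
        total = total + price * 2
    return total


def doubled_total_functional(prices: list[int]) -> int:
    """Functional style: the result is described as a combination of functions,
    without changing any variable along the way."""
    return reduce(lambda acc, price: acc + price * 2, prices, 0)


print(doubled_total_imperative([3, 5, 8]))   # 32
print(doubled_total_functional([3, 5, 8]))   # 32
```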

Back to the script - scripting languages

So far I have only written about compiled or interpreted programming languages for stand-alone programs.

For smaller programs and actions there are also special scripting languages. Some of these can also be used to create fairly complex programs just like those you would write in BASIC or C++.

Usually though, languages such as Python, Ruby, Lua, shell script and others are used to write scripts which realise special add-on functionality in a limited environment.

For example a shell script can be used to regularly create compressed archives of certain folders and move these into a backup. Another example might be a Lua script which controls the behaviour of a computer-controlled opponent in a strategy game.
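The backup example could look roughly like this. The sketch uses Python instead of a shell script to stay consistent with the other examples in this primer; the function name "back_up_folder" is made up, and it relies only on the make_archive function from Python's standard shutil module.

```python
import shutil
from pathlib import Path


def back_up_folder(folder: Path, backup_dir: Path) -> Path:
    """Create a compressed .zip archive of `folder` inside `backup_dir`."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" to the base name and returns the archive path.
    archive = shutil.make_archive(
        base_name=str(backup_dir / folder.name),
        format="zip",
        root_dir=folder,
    )
    return Path(archive)
```

A call such as back_up_folder(Path("Documents"), Path("Backups")) (both folder names are placeholders) would create Backups/Documents.zip. Running the script regularly is left to a scheduler of the operating system.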

Due to the more limited scope of most scripting languages, these are easier to learn.

They are also used more and more often in combination with more conventional programming languages like C++.

Current professional game development is a good example for this. The “engine” of a game usually requires a fast program and very direct access to the hardware and operating system functionality. This is usually done with C or C++ code.

Adding (interpreted) scripts in a simpler language to control the behaviour of objects and characters in a game level makes it possible to change these things and optimize them independently of the main program. This can be done by level and game designers while the programmers concentrate on getting the technical details to work properly.
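The separation between engine and script can be sketched as follows. In a real game the engine would typically be written in C or C++ and the script in a language like Lua; here both sides are Python purely to keep the example short. The script text, the function name "choose_action" and the rules inside it are all invented for this illustration.

```python
# The "script": plain text that a level designer could edit without touching
# the engine. The function name choose_action is an assumption of this sketch.
OPPONENT_SCRIPT = """
def choose_action(own_health, enemy_distance):
    if own_health < 30:
        return "retreat"
    if enemy_distance <= 1:
        return "attack"
    return "advance"
"""


def load_behaviour(script_text: str):
    """Engine side: run the script text and hand back its choose_action function."""
    namespace: dict = {}
    exec(script_text, namespace)     # interpret the script at run time
    return namespace["choose_action"]


choose_action = load_behaviour(OPPONENT_SCRIPT)
print(choose_action(own_health=80, enemy_distance=1))   # attack
```

Changing the opponent's behaviour only requires editing the script text. The engine function load_behaviour stays exactly as it is.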

Script languages are also used a lot in web applications. JavaScript is the common standard for this and is built into web browsers alongside HTML5. Other script languages require an add-on for the browser, like the Adobe Flash plug-in for the Flash "ActionScript" language.

Closing comments

As you may have realized by now, there is a lot more to computer programming than most people would think.

But this should not stop you from learning how to program. There is no need to study all the different aspects of programming and programming languages just to write a few programs yourself.

You can expand your knowledge in this area step by step over time. Or you can simply concentrate on learning one programming language and use it as well as possible.

It’s all up to you.