programming concepts: July 2013

Sunday, 7 July 2013

compiler

A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language, usually a high-level language) into another computer language (the target language, often machine language in a binary form known as object code). The most common reason for transforming source code is to create an executable program.

The compiler scans the entire program first and then translates it into machine code, which the computer's processor executes to perform the corresponding tasks.

A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis (Syntax-directed translation), code generation, and code optimization.

Structure of a compiler

Any large piece of software is easier to understand and implement if it is divided into well-defined modules.

[Figure: structure of a compiler]

1. Lexical analysis:- It is the process of breaking down the source files into keywords, constants, identifiers, operators, and other simple tokens. A token is the smallest piece of text that the language defines.
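To make the idea of a token concrete, here is a minimal sketch of a lexer in C. It is an illustration only: the names `Token`, `TokenKind`, `next_token`, and `count_tokens` are hypothetical, keywords are treated as ordinary identifiers, and every non-alphanumeric character is treated as a one-character operator, which a real C lexer would not do.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token categories for this sketch; a real lexer has many more. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_OPERATOR, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* first character of the token in the input */
    size_t length;      /* number of characters in the token */
} Token;

/* Read one token starting at *cursor and advance *cursor past it. */
Token next_token(const char **cursor)
{
    const char *p = *cursor;
    Token tok;

    while (isspace((unsigned char)*p))   /* skip whitespace between tokens */
        p++;

    tok.start = p;
    if (*p == '\0') {
        tok.kind = TOK_END;              /* end of input */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        tok.kind = TOK_IDENT;            /* identifier (or keyword) */
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;           /* simplified: digits and dots */
        while (isdigit((unsigned char)*p) || *p == '.')
            p++;
    } else {
        tok.kind = TOK_OPERATOR;         /* any other single character */
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}

/* Count the tokens in a string, not including the end marker. */
size_t count_tokens(const char *source)
{
    size_t n = 0;
    while (next_token(&source).kind != TOK_END)
        n++;
    return n;
}
```

With these simplified rules, the text `float y = 5 + 3.0;` splits into seven tokens: `float`, `y`, `=`, `5`, `+`, `3.0`, and `;`.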

2. Syntactical analysis:- It is the process of combining the tokens into well-formed expressions, statements, and programs. Each language has specific rules about the structure of a program, called the grammar or syntax. Just like English grammar, it specifies how things may be put together. In English, a simple sentence is: subject, verb, object.
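As a rough illustration of checking text against a grammar, the sketch below recognizes simple arithmetic expressions with a recursive-descent parser. The grammar in the comment and all function names are assumptions made for this example; it is far smaller than the grammar of any real language, and it only answers "well-formed or not" without building a syntax tree.

```c
#include <ctype.h>

/* Hypothetical grammar for this sketch:
 *   expr   := term   { ('+' | '-') term }
 *   term   := factor { ('*' | '/') factor }
 *   factor := NUMBER | '(' expr ')'
 */

static const char *skip_spaces(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

static int parse_expr(const char **p);

/* factor := NUMBER | '(' expr ')' */
static int parse_factor(const char **p)
{
    *p = skip_spaces(*p);
    if (isdigit((unsigned char)**p)) {
        while (isdigit((unsigned char)**p))
            (*p)++;
        return 1;
    }
    if (**p == '(') {
        (*p)++;
        if (!parse_expr(p))
            return 0;
        *p = skip_spaces(*p);
        if (**p != ')')
            return 0;               /* missing closing parenthesis */
        (*p)++;
        return 1;
    }
    return 0;                       /* neither a number nor '(' */
}

/* term := factor { ('*' | '/') factor } */
static int parse_term(const char **p)
{
    if (!parse_factor(p))
        return 0;
    for (;;) {
        *p = skip_spaces(*p);
        if (**p != '*' && **p != '/')
            return 1;
        (*p)++;
        if (!parse_factor(p))
            return 0;
    }
}

/* expr := term { ('+' | '-') term } */
static int parse_expr(const char **p)
{
    if (!parse_term(p))
        return 0;
    for (;;) {
        *p = skip_spaces(*p);
        if (**p != '+' && **p != '-')
            return 1;
        (*p)++;
        if (!parse_term(p))
            return 0;
    }
}

/* Returns 1 if the whole string is a well-formed expression, 0 otherwise. */
int is_well_formed(const char *text)
{
    const char *p = text;
    if (!parse_expr(&p))
        return 0;
    p = skip_spaces(p);
    return *p == '\0';              /* reject trailing unparsed text */
}
```

Each grammar rule maps to one function, which is why this style is a common first choice for hand-written parsers. Under this grammar, `1 + 2 * (3 - 4)` is accepted, while `1 + * 2` and `(1 + 2` are rejected.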

3. Semantic analysis:- It is the process of examining the types and values of the statements used to make sure they make sense. During semantic analysis, the types, values, and other required information about statements are recorded, checked, and transformed as appropriate.

For example, in the C/C++ line:

float x = "This is red";

semantic analysis would reveal that the types do not match and cannot be made to match, so the statement would be rejected and an error reported.

By contrast, in the statement:

float y = 5 + 3.0;

semantic analysis would reveal that 5 is an integer and 3.0 is a double, and also that the rules of the language allow 5 to be converted to a double. The 5 would therefore be converted to the double 5.0 and the addition performed. Then the compiler would recognize y as a float, perform another conversion from the double 8.0 to a float, and process the assignment.
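The two conversions described here can be observed in a small C sketch. The function names are hypothetical, and the `sizeof` check is only a hint at the expression's type: it assumes a platform where `float` and `double` have different sizes, which is typical but not guaranteed.

```c
#include <stddef.h>

/* The int 5 is converted to 5.0 before the addition, so the sum is a double.
 * Comparing sizes is a hint (not proof) of the type: on typical platforms
 * a double is larger than a float. */
int sum_is_double_sized(void)
{
    return sizeof(5 + 3.0) == sizeof(double);
}

/* The double result 8.0 is converted to float when it is stored in y. */
float assign_to_float(void)
{
    float y = 5 + 3.0;
    return y;
}
```

Because 8.0 is exactly representable as a float, the second conversion loses nothing here; with other values, narrowing a double to a float may round the result.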

4. Intermediate code generation:- Depending on the compiler, this step may be skipped, and the program may instead be translated directly into the target language (usually machine object code). If this step is implemented, the compiler designers also design a machine-independent language of their own that is close to machine language and easily translated into machine language for any number of different computers.

The purpose of this step is to allow the compiler writers to support different target computers and different languages with a minimum of effort.

The part of the compiler which deals with processing the source files, analyzing the language and generating the intermediate code is called the front end, while the process of optimizing and converting the intermediate code into the target language is called the back end.

5. Code optimization:- In this process the generated code is analyzed and improved for efficiency. The compiler analyzes the intermediate code to see if improvements can be made that couldn't be made earlier. For example, some languages, such as Pascal, do not allow pointer arithmetic, while all machine languages work with addresses directly. When accessing arrays, it is often more efficient to step through them with pointers, so the code optimizer may detect this case and use pointers internally.
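The array example can be sketched in C, where both forms are expressible in source code. The two hypothetical functions below compute the same sum; the second is roughly the shape an optimizer might produce internally from the first. Whether a given compiler actually performs this rewrite, and whether it is faster on a given machine, is not something this sketch can promise.

```c
#include <stddef.h>

/* Indexed form: each access conceptually computes the address values + i. */
long sum_indexed(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}

/* Pointer form: the address is advanced directly, with no index variable. */
long sum_pointer(const int *values, size_t count)
{
    long total = 0;
    const int *end = values + count;   /* one past the last element */
    for (const int *p = values; p != end; p++)
        total += *p;
    return total;
}
```

The important property is that the rewrite preserves behavior: for any array and count, both functions return the same total.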

6. Code generation:- Finally, after the intermediate code has been generated and optimized, the compiler generates code for the specific target language. Almost always this is machine code for a particular target machine.

Also, it is usually not the final machine code but object code, which contains all the instructions but in which not all of the final memory addresses have been determined.

A subsequent program, called a linker, is used to combine several different object code files into the final executable program.

The C Programming Language

C is a general-purpose programming language initially developed by Dennis Ritchie between 1969 and 1973 at AT&T Bell Labs. Like most imperative languages in the ALGOL tradition, C has facilities for structured programming and allows lexical (static) variable scope and recursion, while a static type system prevents many unintended operations. Its design provides constructs that map efficiently to typical machine instructions, and therefore it has found lasting use in applications that had formerly been coded in assembly language, most notably system software like the Unix operating system.

C is an imperative (procedural) language. It was designed to be compiled using a relatively straightforward compiler, to provide low-level access to memory, to provide language constructs that map efficiently to machine instructions, and to require minimal run-time support. C was therefore useful for many applications that had formerly been coded in assembly language, such as in system programming.

In C, all executable code is contained within subroutines, which are called "functions" (although not in the strict sense of functional programming). Function parameters are always passed by value. Pass-by-reference is simulated in C by explicitly passing pointer values. C program source text is free-format, using the semicolon as a statement terminator and curly braces for grouping blocks of statements.
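The difference between passing by value and simulating pass-by-reference with pointers can be seen in a short sketch. The function names `swap_by_value` and `swap_by_pointer` are made up for this illustration.

```c
/* Receives copies of the arguments; the caller's variables are unchanged. */
void swap_by_value(int a, int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}

/* Receives addresses; writing through the pointers changes the caller's
 * variables. The pointers themselves are still passed by value. */
void swap_by_pointer(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Given `int x = 1, y = 2;`, the call `swap_by_value(x, y)` leaves both variables as they were, while `swap_by_pointer(&x, &y)` exchanges them.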

The C language also exhibits the following characteristics:

1. There is a small, fixed number of keywords, including a full set of flow of control primitives: for, if/else, while, switch, and do/while. There is one namespace, and user-defined names are not distinguished from keywords by any kind of sigil.

2. There is a large number of arithmetic and logical operators, such as +, +=, ++, &, and ~.

3. More than one assignment may be performed in a single statement.

4. Typing is static, but weakly enforced: all data has a type, but implicit conversions can be performed; for instance, characters can be used as integers.

5. Declaration syntax mimics usage context. C has no "define" keyword; instead, a statement beginning with the name of a type is taken as a declaration. There is no "function" keyword; instead, a function is indicated by the parentheses of an argument list.

6. User-defined (typedef) and compound types are possible.

7. Heterogeneous aggregate data types (struct) allow related data elements to be accessed and assigned as a unit.

8. Array indexing is a secondary notion, defined in terms of pointer arithmetic. Unlike structs, arrays are not first-class objects; they cannot be assigned or compared using single built-in operators. There is no "array" keyword, in use or definition; instead, square brackets indicate arrays syntactically, e.g. month[11].

9. Enumerated types are possible with the enum keyword. They are not tagged, and are freely interconvertible with integers.

10. Strings are not a separate data type, but are conventionally implemented as null-terminated arrays of characters.

11. Low-level access to computer memory is possible by converting machine addresses to typed pointers.

12. Procedures (subroutines not returning values) are a special case of function, with an untyped return type void.

13. Functions may not be defined within the lexical scope of other functions.

14. Function and data pointers permit ad hoc run-time polymorphism.

15. A preprocessor performs macro definition, source code file inclusion, and conditional compilation.

16. There is a basic form of modularity: files can be compiled separately and linked together, with control over which functions and data objects are visible to other files via static and extern attributes.

17. Complex functionality such as I/O, string manipulation, and mathematical functions is consistently delegated to library routines.

18. C does not include some features found in newer high-level languages, including object orientation and garbage collection.
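A few of the characteristics listed above (items 7, 9, 10, and 14) can be shown together in one short sketch. All type and function names here are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Item 7: a struct groups related values and can be assigned as a unit. */
struct point { int x; int y; };

struct point copy_point(struct point original)
{
    struct point copy = original;   /* whole-struct assignment */
    return copy;
}

/* Item 9: enum values convert freely to and from integers. */
enum color { RED, GREEN, BLUE };

int next_color_number(enum color c)
{
    return c + 1;                   /* enum used in integer arithmetic */
}

/* Item 10: a string is an array of characters ending in '\0'. */
size_t manual_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Item 14: a function pointer selects behavior at run time. */
typedef int (*binary_op)(int, int);

int add(int a, int b) { return a + b; }
int multiply(int a, int b) { return a * b; }

int apply(binary_op op, int a, int b)
{
    return op(a, b);
}
```

For example, `apply(add, 2, 3)` and `apply(multiply, 2, 3)` run different code through the same call site, which is the ad hoc polymorphism that item 14 refers to.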