A compiler is a computer program (or set of programs) that transforms source code written in a high-level programming language (the source language) into another computer language (the target language), which is often machine language in a binary form known as object code. The most common reason for transforming source code is to create an executable program. The compiler reads the entire program first and then translates it into machine code, which the computer's processor later executes to perform the corresponding tasks.
A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis (syntax-directed translation), code generation, and code optimization.
Like any large piece of software, a compiler is easier to understand and implement if it is divided into well-defined modules.
1. Lexical analysis:- It is the process of breaking down the source files into keywords, constants, identifiers, operators and other simple tokens. A token is the smallest piece of text that the language defines.
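To make the idea of a token concrete, here is a minimal sketch of a hand-written lexer in C. The token categories, the three-entry keyword table, and the names `Token`, `is_keyword`, and `next_token` are assumptions made for this illustration and are not taken from any particular compiler. Real lexers also handle comments, string literals, and multi-character operators.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token categories, chosen for this sketch only. */
typedef enum {
    TOK_KEYWORD, TOK_IDENTIFIER, TOK_CONSTANT, TOK_OPERATOR, TOK_END
} TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* longer lexemes are split; acceptable for a sketch */
} Token;

/* A tiny keyword table; a real language defines many more. */
int is_keyword(const char *s) {
    static const char *const keywords[] = { "float", "int", "return" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(s, keywords[i]) == 0) return 1;
    return 0;
}

/* Reads one token starting at *src and advances *src past it. */
Token next_token(const char **src) {
    Token t = { TOK_END, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;   /* skip whitespace */

    if (*p == '\0') {
        /* end of input: t already describes TOK_END */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* keyword or identifier: letters, digits, underscores */
        while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof t.text - 1)
            t.text[n++] = *p++;
        t.text[n] = '\0';
        t.kind = is_keyword(t.text) ? TOK_KEYWORD : TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        /* numeric constant: digits and decimal points, not validated further */
        while ((isdigit((unsigned char)*p) || *p == '.') && n < sizeof t.text - 1)
            t.text[n++] = *p++;
        t.text[n] = '\0';
        t.kind = TOK_CONSTANT;
    } else {
        /* anything else is treated as a one-character operator */
        t.text[0] = *p++;
        t.text[1] = '\0';
        t.kind = TOK_OPERATOR;
    }
    *src = p;
    return t;
}
```

Calling `next_token` repeatedly on the text `float y = 5 + 3.0;` yields a keyword, an identifier, an operator, a constant, an operator, a constant, and a final operator for the semicolon, followed by `TOK_END`.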
2. Syntactic analysis:- It is the process of combining the tokens into well-formed expressions, statements, and programs. Each language has specific rules about the structure of a program, called the grammar or syntax. Just like English grammar, it specifies how things may be put together. In English, a simple sentence is: subject, verb, object.
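As an illustration of how grammar rules decide what is well formed, the following sketch is a small recursive-descent recognizer in C. The three-rule arithmetic grammar in the comment and all function names are assumptions invented for this example. It only answers yes or no, whereas a real parser would also build a tree for the later phases.

```c
#include <ctype.h>

/* Grammar assumed for this sketch (not taken from any real language):
 *   expr   -> term   { ('+' | '-') term }
 *   term   -> factor { ('*' | '/') factor }
 *   factor -> NUMBER | '(' expr ')'
 * Each function returns 1 if its rule matches at *p, and advances *p. */

int parse_expr(const char **p);

void skip_spaces(const char **p) {
    while (isspace((unsigned char)**p)) (*p)++;
}

int parse_factor(const char **p) {
    skip_spaces(p);
    if (isdigit((unsigned char)**p)) {          /* NUMBER */
        while (isdigit((unsigned char)**p)) (*p)++;
        return 1;
    }
    if (**p == '(') {                           /* '(' expr ')' */
        (*p)++;
        if (!parse_expr(p)) return 0;
        skip_spaces(p);
        if (**p != ')') return 0;
        (*p)++;
        return 1;
    }
    return 0;                                   /* nothing fits the rule */
}

int parse_term(const char **p) {
    if (!parse_factor(p)) return 0;
    for (;;) {
        skip_spaces(p);
        if (**p != '*' && **p != '/') return 1;
        (*p)++;
        if (!parse_factor(p)) return 0;
    }
}

int parse_expr(const char **p) {
    if (!parse_term(p)) return 0;
    for (;;) {
        skip_spaces(p);
        if (**p != '+' && **p != '-') return 1;
        (*p)++;
        if (!parse_term(p)) return 0;
    }
}

/* Returns 1 if the whole text is one well-formed expression. */
int is_well_formed(const char *text) {
    const char *p = text;
    if (!parse_expr(&p)) return 0;
    skip_spaces(&p);
    return *p == '\0';
}
```

Under this grammar, `1 + 2 * (3 - 4)` is accepted, while `1 + * 2` and `(1 + 2` are rejected, much as a sentence missing its verb is rejected in English.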
3. Semantic analysis:- It is the process of examining the types and values of the statements used to make sure they make sense. During semantic analysis, the types, values, and other required information about statements are recorded, checked, and transformed as appropriate.
For C/C++ in the line:
float x = "This is red";
The semantic analysis would reveal that the types do not match and cannot be made to match, so the statement would be rejected and an error reported.
While in the statement:
float y = 5 + 3.0;
The semantic analysis would reveal that 5 is an integer and 3.0 is a double, and also that the rules of the language allow 5 to be converted to a double. The 5 would therefore be converted to a double and the addition performed. Then the compiler would recognize y as a float, perform another conversion from the double 8.0 to a float, and process the assignment.
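The two cases above can be sketched as a pair of type rules in C. This is a deliberately simplified model: the `Type` enum and the functions `add_result_type` and `can_initialize` are hypothetical names for this example, and the rules ignore much of what the real C conversion rules cover (integer promotions, pointers, qualifiers, and so on).

```c
/* Hypothetical type tags. The order INT < FLOAT < DOUBLE is relied on
 * below to pick the "wider" arithmetic type. */
typedef enum { TYPE_INT, TYPE_FLOAT, TYPE_DOUBLE, TYPE_STRING, TYPE_ERROR } Type;

int is_arithmetic(Type t) {
    return t == TYPE_INT || t == TYPE_FLOAT || t == TYPE_DOUBLE;
}

/* Type of a + b: both operands are converted to the wider arithmetic
 * type; anything else is reported as an error. */
Type add_result_type(Type a, Type b) {
    if (!is_arithmetic(a) || !is_arithmetic(b)) return TYPE_ERROR;
    return a > b ? a : b;
}

/* Returns 1 if a value of type `value` may initialize a variable of
 * type `target`, possibly after an implicit conversion. */
int can_initialize(Type target, Type value) {
    if (target == TYPE_ERROR || value == TYPE_ERROR) return 0;
    if (target == value) return 1;
    return is_arithmetic(target) && is_arithmetic(value);
}
```

With these rules, `add_result_type(TYPE_INT, TYPE_DOUBLE)` gives `TYPE_DOUBLE` and `can_initialize(TYPE_FLOAT, TYPE_DOUBLE)` succeeds, which mirrors the accepted statement. `can_initialize(TYPE_FLOAT, TYPE_STRING)` fails, which mirrors the rejected one.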
4. Intermediate code generation:- Depending on the compiler, this step may be skipped and the program translated directly into the target language (usually machine object code). If this step is implemented, the compiler designers also design a machine-independent language of their own that is close to machine language and easily translated into machine language for any number of different computers.
The purpose of this step is to allow the compiler writers to support different target computers and different languages with a minimum of effort.
The part of the compiler which deals with processing the source files, analyzing the language and generating the intermediate code is called the front end, while the part which optimizes the intermediate code and converts it into the target language is called the back end.
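One common style of intermediate language is three-address code, in which each instruction applies one operator to at most two operands and stores the result in a temporary. The sketch below, in C, walks an expression tree and emits such instructions. The `Node` and `IrBuffer` structures, the `gen` function, and the `t0`, `t1` naming of temporaries are all assumptions for this illustration, not the intermediate language of any specific compiler.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree node: a leaf holds a name or constant,
 * an inner node holds an operator and two children. */
typedef struct Node {
    char op;                          /* '+', '*', ... or 0 for a leaf */
    const char *leaf;                 /* operand text when op == 0 */
    const struct Node *left, *right;  /* children when op != 0 */
} Node;

typedef struct {
    char text[256];   /* emitted instructions, one per line */
    int next_temp;    /* number of the next unused temporary */
} IrBuffer;

/* Emits code for `n` into `out` and writes the name that holds the
 * value of `n` into `result`. */
void gen(const Node *n, IrBuffer *out, char *result, size_t result_size) {
    char lhs[16], rhs[16];
    size_t used;

    if (n->op == 0) {                 /* leaf: its value is its own name */
        snprintf(result, result_size, "%s", n->leaf);
        return;
    }
    gen(n->left, out, lhs, sizeof lhs);    /* operands first ... */
    gen(n->right, out, rhs, sizeof rhs);
    snprintf(result, result_size, "t%d", out->next_temp++);
    used = strlen(out->text);              /* ... then the operation itself */
    snprintf(out->text + used, sizeof out->text - used,
             "%s = %s %c %s\n", result, lhs, n->op, rhs);
}
```

For the tree of `a + b * c` this emits `t0 = b * c` followed by `t1 = a + t0`. Because these instructions name no registers or machine-specific operations, a back end for any target can translate them, which is the point of keeping the intermediate language machine independent.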
5. Code optimization:- In this process the generated code is analyzed and improved for efficiency. The compiler checks whether improvements can be made to the intermediate code that could not be made earlier. For example, some languages, like Pascal, do not allow pointer arithmetic, while all machine languages do. When stepping through an array, it is often more efficient to advance a pointer than to recompute each element's address from its index, so the code optimizer may detect this case and internally use pointers.
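The array case can be illustrated in C, where both forms can be written by hand. This is only a sketch of the idea: the function names are hypothetical, and whether a given compiler performs this rewrite, or gains anything from it on a given machine, depends on the compiler and the target.

```c
#include <stddef.h>

/* As a programmer might write it: each access a[i] conceptually
 * computes the address base + i * sizeof(int). */
long sum_indexed(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* What an optimizer may use internally: a pointer that is advanced by
 * one element per iteration, so no multiplication is needed. */
long sum_pointer(const int *a, size_t n) {
    long total = 0;
    for (const int *p = a, *end = a + n; p != end; p++)
        total += *p;
    return total;
}
```

Both functions return the same result for the same input. The optimizer's job is to make this kind of substitution without changing what the program computes.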
6. Code generation:- Finally, after the intermediate code has been generated and optimized, the compiler generates code for the specific target language. Almost always this is machine code for a particular target machine.
Also, it is usually not the final machine code but object code, which contains all the instructions but in which not all of the final memory addresses have been determined.
A subsequent program, called a linker, is used to combine several different object code files into the final executable program.