Compiler: Difference between revisions
imported>Pat Palmer No edit summary |
imported>Pat Palmer (switching order of 2nd and 3rd paragraphs) |
||
Line 1: | Line 1: | ||
In [[computer science]], a '''compiler''' is a [[translation system]] for a [[computer]] [[programming language]]. For example, a compiler might translate a human readable program, called [[source code]], into [[machine code]]. Compilers are used for [[formal language|formal languages]] rather than [[natural language|natural languages]]. | In [[computer science]], a '''compiler''' is a [[translation system]] for a [[computer]] [[programming language]]. For example, a compiler might translate a human readable program, called [[source code]], into [[machine code]]. Compilers are used for [[formal language|formal languages]] rather than [[natural language|natural languages]]. | ||
The input to a compiler is a file (or files) containing a program in a source language. The source file is likely to be a human-readable [[programming language]], though it could be any unambiguous representation of an [[algorithm]], such as a [[flow chart]] or [[finite state machine]]. The output of a compiler is a different file containing code in a target language, often a low-level [[machine language]], though it could just as well be another high-level language. | |||
Most modern programming languages perform compilation in two stages, first from the source language to an intermediate language (typically an assembler), and second from the intermediate language to machine code. In so-called ''managed'' programming languages such as [[Java]] and [[C sharp|C#]], the second compilation is postponed until right before the program needs to execute, in which case it is called "just-in-time" compilation. | Most modern programming languages perform compilation in two stages, first from the source language to an intermediate language (typically an assembler), and second from the intermediate language to machine code. In so-called ''managed'' programming languages such as [[Java]] and [[C sharp|C#]], the second compilation is postponed until right before the program needs to execute, in which case it is called "just-in-time" compilation. | ||
The actions which a compiler must perform usually include the following: | The actions which a compiler must perform usually include the following: |
Revision as of 02:08, 9 May 2007
In computer science, a compiler is a translation system for a computer programming language. For example, a compiler might translate a human readable program, called source code, into machine code. Compilers are used for formal languages rather than natural languages.
The input to a compiler is a file (or files) containing a program in a source language. The source file is likely to be a human-readable programming language, though it could be any unambiguous representation of an algorithm, such as a flow chart or finite state machine. The output of a compiler is a different file containing code in a target language, often a low-level machine language, though it could just as well be another high-level language.
Most modern programming languages perform compilation in two stages, first from the source language to an intermediate language (typically an assembler), and second from the intermediate language to machine code. In so-called managed programming languages such as Java and C#, the second compilation is postponed until right before the program needs to execute, in which case it is called "just-in-time" compilation.
The actions which a compiler must perform usually include the following:
- Lexical Analysis or Scanning, in which the input characters are recognized by a set of regular expressions and output as a sequence of tokens.
- Syntactical Analysis or Parsing, in which the input tokens are recognized by a set of pushdown automatons and output a sequence of semantic actions.
- Semantic Analysis, in which each semantic action builds an internal or intermediate representation of the source program, and context sensitive errors (any error that cannot be discriminated by a context-free language) are detected.
- Optimization, in which the compiler attempts to replace computationally expensive portions of the program with less expensive versions, provided that no substitution affects the operation of the program.
- Code Generation, in which the intermediate language is translated piece at a time to the target language.
- Peephole Optimization, a final optimization pass in which analyses the output code over a small region (the peephole), searching for very localized optimizations.
In actuality, there may be multiple optimization stages scattered throughout this process. Additionally, most modern compilers repeatedly translate the language from an intermediate representation to a simpler intermediate representation in order to accomodate a wide swath of optimizations that operate on different levels of detail.
Lexical Analysis
During lexical analysis, a set of regular expressions translate the input sequence (generally characters) into an output sequence (called tokens). One popular tool to simplify the creation of lexical analyzers is a software package called lex.
Readers accustomed to programming may benefit from a few examples of errors that can be detected during this phase. A lexical analyzer could detect errors in a single token, for instance a number that has the letter 'y' in it, or a string with a missing end quote.
Syntactic Analysis
During syntactic analysis, an input sequence of tokens is matched against a set of gramatical constructs called productions. As each production is matched, a semantic action routine is called. The role of each semantic action is to build an intermediate representation of the input program, such as a list of variables and functions, and a sequence of instructions comprising each function.
Readers accustomed to programming may benefit from a few examples of errors that can be detected during this phase. A Syntactic analyzer could detect a syntactic error, such as a missing semicolon or curly brace. A syntactic analyzer cannot detect the use of an undeclared variable. This is because the declaration of a variable before its use is a context sensitive langauge requirement, though syntactic analyzers are generally context-free language recognizers.
Semantic Analysis
During semantic analysis, a compiler builds and examines an intermediate representation of the source program and checks it for consistency.
Readers accustomed to programming may benefit from a few examples of errors that can be detected during this phase. A semantic analyzer could detect errors, such as undeclared variables or functions.
Optimization
During optimization, a compiler attempts to alter its internal representation of the input program as to improve code speed, size, or many other code characteristics.