What is Optimization in Compiler design? (Part One)

4 min readJan 20, 2021

First of all, let’s introduce some basic ideas around optimization independently.

Optimization problem is a general term in various fields of science. Basically, it is a problem of finding the best solution from all feasible solutions. For a given problem, we have some condition (constraints) that the solution must satisfy. The act of satisfying these constraints yields a feasible region. Furthermore, various mathematical methods were offered to model and solve according to nature of the problem. The goal of optimization methods is to find optimal or near-optimal solution with low computational effort (computation time and memory). While the effort to achieve the best out of a problem sometimes grows polynomially with the problem size, it is usually exponential which means methods we use couldn’t be applied when the problem size grows. If you are interested in the topic of optimization, I recommend you to read Design of Modern Heuristics, Chapter 3 (Optimization Methods).

For a programmer it is more convenient to write code that is easy to maintain and read. Some redundancies will show up while using high-level languages, which are side effects to the run-time of the program. For example, a programmer might repeat some calculations and let the compiler make efficiency decisions (i.e., elimination) after identification. Letting compiler decide, we can have best of both worlds. Code efficiency and maintainability.

In order to improve execution efficiency, compiler uses a couple of techniques that preserve semantic of the original program. This is called compiler optimization.

So, what is the need to do a compiler optimization? Well, if we naively translate each independent high-level language construct to machine code, this will introduce a substantial run-time overhead. Good news is that it is possible to run our program faster with theoretical transformations discussed in this post. The compiler cannot understand algorithms and their implementations in our code. Thus, it can’t replace the whole algorithm with substantially different and more efficient algorithm. In contrast, it only knows how to apply relatively low-level semantic transformation.

Now we see compiler consists of multiple stages from Lexical analysis to Code generation. Generally, there is two type of optimization. Machine-independent code optimization, as its name suggests, refers to performing optimization independently of the target machine for which we are generating code block. On the other hand, Machine-dependent code optimization requires knowledge of the target machine.

We need to get to know some subjects before we get deeper. Basic blocks, flow graphs and next-use information.

Basic blocks

Basic blocks are useful for discussing code optimization and generation. They are maximal sequences of consecutive three-address instructions with the properties discussed below:

· There are no jumps into the middle of the block.

· Except at the last instruction in the block, branching and halting is not allowed in the body of the block.

We should identify some instructions, named as leader, with the following rules:

· The first instruction in the intermediate code is leader.

· The target of jump or branch instructions is also a leader.

· The instructions after jump or branch are leaders.

Basic block representation of a code (creating a 10x10 unitary matrix, 8 byte for each array element)

Flow graph

This graph shows the flow of code so we can use it later for compiler optimization purposes. In order to populate the graph, we should assign each node a basic block. The edges will then represent the links between the blocks caused by jumps and branches. There are also two other nodes called start and end which is the start of the control flow and its end, respectively. There is always one link between start node and the first block. However, end node might receive multiple links from the blocks.

Next-use information

For generating a good code, it is necessary to know whether a variable will be used in future or not. If a variable will not be referenced later in code, it is the wasteful to keep it in the computer memory, say processor registers, so to make more rooms for other variables. A variable is live at statement i, if it is assigned to a value. Assuming the value is use in statement j and didn’t change between statement i and statement j, it is said to be used in statement j. This information is stored by the compiler and is essential to apply optimization algorithms and techniques.

Summary

In this part, compiler optimization and fundamental concepts were introduced. We will explain methods and techniques that compiler can use to optimize code in next parts.

References

Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, 2006. Compilers: Principles, Techniques, and Tools. 2nd ed. Pearson