SN Systems - Quick wins for speedy links

ARTICLE BY: Binutils Team
POSTED: Mar 03, 2016

TAGS: De-duplication, Dead-code stripping, Linker, LTO, Toolchain

Linking is the final stage when building your C/C++ program code. As a developer, you must wait for the link to complete before you can run or debug your program. Any delay in the process slows development and can become frustrating.

This article aims to ease the pain by offering suggestions on how to keep link times under control. In summary:

love your linker
supercharge your workstation
only use expensive options when needed
reduce input and output sizes

Love your linker

You don’t really need to love your linker, but it is worth appreciating what it does. At a high level, the linker reads in object files, lays out their code and data, resolves addresses, and then copies the result into an output file. This operation is performed in three phases:

Scan: the linker inspects the object files, collecting information on the code and data sections they contain, and the relocations which will need applying when these sections are moved into position in the final output.
Layout: information collected during the scan is used to calculate how to structure the output file.
Output: the object files are revisited, this time to copy their code and data, apply relocations, and transfer them to the correct position in the output file.
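
The three phases can be sketched with a toy model. This is only an illustration, not how any real linker is implemented: the Section class, the 4-byte little-endian relocation slot and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """One code or data section in a toy object file (hypothetical model)."""
    name: str
    data: bytes
    # Each relocation is (offset within this section, name of target section).
    relocations: list = field(default_factory=list)


def scan(sections):
    """Scan: collect sizes and relocations without copying any contents."""
    return [(s.name, len(s.data), list(s.relocations)) for s in sections]


def layout(scanned):
    """Layout: assign each section its offset in the output image."""
    offsets, cursor = {}, 0
    for name, size, _relocations in scanned:
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


def output(sections, offsets, total_size):
    """Output: copy each section into place, then apply its relocations."""
    image = bytearray(total_size)
    for s in sections:
        base = offsets[s.name]
        image[base:base + len(s.data)] = s.data
        for rel_offset, target in s.relocations:
            # Patch a 4-byte little-endian slot with the target's final offset.
            where = base + rel_offset
            image[where:where + 4] = offsets[target].to_bytes(4, "little")
    return bytes(image)


def link(sections):
    """Run the three phases in order and return the output image."""
    scanned = scan(sections)
    offsets, total_size = layout(scanned)
    return output(sections, offsets, total_size)
```

Note that the inputs are visited twice, once in scan and once in output, which is why the linker's work is dominated by reading and writing.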

The linker is primarily engaged in reading and writing files. Consequently, it spends most of its time in the scan (roughly 30%) and output (roughly 50%) phases.

The linker is designed for correctness and efficiency. Each of the three phases should scale proportionately to the size of the inputs. There is a fundamental speed limit: fully linking a set of inputs can be no quicker than concatenating the files together.
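
One way to get a feel for this speed limit on your own machine is to time a plain concatenation of the link inputs. The sketch below assumes you can list the input files yourself. The function name and the chunk size are choices made for the example, and the timing it reports depends on whether the files are already in the operating system's file cache.

```python
import shutil
import time


def concatenate(input_paths, output_path, chunk_size=1024 * 1024):
    """Copy every input into one output file; return (bytes written, seconds)."""
    start = time.perf_counter()
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out, chunk_size)
        total = out.tell()
    return total, time.perf_counter() - start
```

If a full link takes many times longer than this concatenation, the difference is a rough indication of how much time is being spent on work other than moving bytes.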

Supercharge your workstation

File I/O is the bottleneck for large links. It is not uncommon for code to occupy several GB on disk. The primary recommendation for linking this volume of data is to invest in suitable hardware. Specifically, use an SSD for fast file system access, ensure the file system has adequate free space, and keep inputs and outputs local (e.g. not on a network share). You should install as much RAM as you can to maximise the operating system file cache and keep hard page faults to a minimum. Increasing the number of CPU cores may not have much effect on reducing link times because file I/O rather than processing is the bottleneck.

It is also worth using the operating system administration tools to check what else is accessing the file system. For example, if you try to link while checking files out of version control, the link will slow down.

Warm and cold links

If the same link is run twice, using exactly the same options and inputs, the second run often takes a fraction of the time of the first run. This is because the operating system uses RAM to cache the files accessed on the first link, reducing physical disk access times on any subsequent link. We refer to the first link as the cold link, and any subsequent, faster links as warm links. Having more system memory installed increases the likelihood that input files will be present in memory on the subsequent links.
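
A simple way to observe the difference is to run the same link command several times and compare the elapsed times. The sketch below is a minimal example: the function name is made up for illustration, and the command you pass in stands for whatever invokes your link. The first run is only truly cold if the inputs are not already cached, for example straight after a reboot.

```python
import subprocess
import time


def time_runs(command, runs=2):
    """Run the same command several times; return the elapsed seconds of each."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises if the command fails
        timings.append(time.perf_counter() - start)
    return timings
```

Pass the command as a list of arguments, such as the linker invocation taken from your build log. The first entry in the result approximates the cold link and the later entries the warm links.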

A slow link is quicker than no link

It’s infuriating to get to the end of a lengthy build only for it to fail due to the linker being unable to find a symbol it requires. What’s worse, the inputs all need to be scanned before the linker can issue this error. To fix the error, the inputs will need to be corrected, and the subsequent rebuild will require linking again.

The only advice to offer here is to pause before rebuilding! Are you aware of any reason why the build might fail? Have you added all the input files and defined any new functions or data your program requires?
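
To see why the error can only be reported once every input has been scanned, consider this simplified model of symbol resolution. It assumes each object file can be summarised as the set of symbols it defines and the set it references. The function name and the file names in the usage note are hypothetical.

```python
def unresolved_symbols(objects):
    """Return the sorted symbols that are referenced but never defined.

    `objects` maps a hypothetical object-file name to a pair of sets:
    (symbols it defines, symbols it references).
    """
    defined, referenced = set(), set()
    for defines, references in objects.values():
        defined |= defines
        referenced |= references
    # Only after visiting every input do we know what is really missing.
    return sorted(referenced - defined)
```

With main.o referencing draw and update, and draw.o defining only draw, the function reports update as unresolved. The result cannot be known any earlier, because a later input might still have supplied the definition.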

Only use expensive options when needed

Some link options will increase link times. They include:

Dead-code stripping and de-duplication: stripping out unused functions and data, and eliminating duplicate symbols can reduce the loadable size of the final linked executable, but requires the linker to perform a significant amount of extra work.
LTO: Link Time Optimisation (LTO) is a powerful optimisation feature. When this feature is enabled, the compiler generates object files in an intermediate format. When these intermediate objects are linked, they get combined and optimised as a whole. This process is CPU and memory intensive.

Unlike the other options listed above, the use of LTO can be phased in. If most of the gains can be realised by compiling only a fraction of the inputs with LTO enabled, then the link time will be less affected than if every input were compiled that way.
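
As a rough illustration of the extra work behind dead-code stripping and de-duplication, the sketch below models the first as a reachability walk over symbol references and the second as grouping symbols by a hash of their contents. Both are simplifications under stated assumptions. The function names and data shapes are invented for the example, and a real linker also has to take relocations, alignment and other attributes into account before treating two symbols as duplicates.

```python
import hashlib


def strip_dead(references, roots):
    """Return the set of symbols reachable from `roots`.

    `references` maps each symbol to the symbols it refers to.
    Anything not in the returned set could be stripped.
    """
    live, pending = set(), list(roots)
    while pending:
        symbol = pending.pop()
        if symbol in live:
            continue
        live.add(symbol)
        pending.extend(references.get(symbol, ()))
    return live


def deduplicate(contents):
    """Map each symbol to the first symbol seen with identical contents."""
    first_seen, canonical = {}, {}
    for symbol, data in contents.items():
        digest = hashlib.sha256(data).digest()
        canonical[symbol] = first_seen.setdefault(digest, symbol)
    return canonical
```

Both functions have to consider every symbol in every input, which gives an idea of why these options add to the link time.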

Reduce input and output sizes

Remove any unused code from your inputs to reduce the input size. The less data the linker has to scan and copy, the less time the link takes.

Good luck in speeding up your builds.