Linux

Let us get the objective of our mission clear before we spring into action. I plan to reiterate a little bit of rudimentary information on the operations of compiler and linker and their mutual interaction. Following that, we will ask ourselves a couple of questions to practice thinking out-of-the-box and present the obvious answers. Since, we are ‘Truly Tech’, we will perform two quick experiments to justify our hypothesis; surprisingly fail to prove it; take a closer look, i.e. ‘inspect’ what is going on under the hood; redo the experiments with a subtle change and arrive at the expected result. The whole exercise is to show that the test environment of the trivial-most experiment can be quite tricky to set-up.

The compiler (e.g. gcc) compiles a source code (e.g. C file) to an object file (e.g. ELF REL (Relocatable)). Any function/method that is ‘not’ internal or not supplied by the source code is left ‘unresolved’ by the compiler. It’s the linker’s (e.g. ld) responsibility to turn that object file into a fully-functional binary executable (e.g. ELF EXEC (Executable)) resolving all the references to external method calls by linking to either static (e.g. .a archive) or dynamic libraries (.so). Those libraries are either system provided (GNU libc or glibc) or third-party ones (uClibc, libELF). As a practical example, when ld links an object file having a reference to  printf()  method, most likely it will dynamically link to glibc (default linking mode for ld is dynamic) unless a static linking (-static) is requested. When the operating system begins executing a program, the application loader transparently loads the dynamic libraries the application requires and maps those into application’s address space. All the methods a particular library offers/exports are thus made available to the program at run-time. How does the application loader know where to load a library from? It looks in a set of standard locations following a predetermined search order.

This is the platform I will be using throughout.

Question# 1: Let’s say, we have access to the source code of a C program that makes use of pow() method from libm. Is it possible to supply an alternate implementation of the same using a shared library so that our version of pow()overrides the system-provided one? In other words, if we write our own pow(), pack it in a shared library and furnish it to the compiler/linker from the command line during compilation/linking, how are these tools going to behave? Too loosely speaking, our task is to perform a code-injection while we have access to the source code.

As we can see that there are a couple of implementations of the same method (linker treats method names as ‘symbols’) to choose from; of course, the linker will either bring it to the notice of the user or take a decision itself. Can there be any third possibility? No, apparently. Either play ‘dumb and ask’ or be ‘smart and silent’ by falling-back to some default behavior. Shortly it will turn out that our favorite linker ld quietly swallows the fact. Let’s get our hands dirty with experiment# 1.

Here’s the driver program that makes pow() call.

Our implementation of a lazy pow(). It’s declaration mimics the ‘real’ one. But, no matter whatever the arguments are, it always return a b-o-r-i-n-g 10.000000.

Let’s compile (1) the main program, the shared library (2, 3), dynamically link (4) those together and execute (5) the program. To be on the safer side, we will use -O0 to disable all gcc optimizations and -Wall to display all compiler warnings, if any.

What did just take place? ld is silent about duplicate implementations of pow()  method, one appearing in libm, the system library and the other one being present in libpow, the shared library we created. Moreover, ld is ignoring our version of the method while the system-provided one is taking precedence over. Had it been the other way round, the result would be 10.000000 instead of 1024.000000 (= 2^10).

Question# 2: Let’s say, we don’t have access to the source code of a C program that makes use of pow() method from libm linked dynamically. The only artifact in our hand is the compiled executable. Is it possible to supply an alternate implementation of the same using a shared library so that our version of pow()overrides the system-provided one? In other words, if we write our own pow(), pack it in a shared library and ‘somehow’ preload the library before even the execution begins, how is Linux application loader going to behave? Loosely speaking, our task is to perform a code-injection while we don’t have access to the source code.

Linux provides us with a nifty hack, the LD_PRELOAD environment variable, to preload a dynamic library (.so) before even the application loader attempts to resolve the dynamic references . It serves as a handy ‘backdoor’ to intercept dynamically linked method calls. Let’s hijack the pow call.

What did take place now? We observe that LD_PRELOAD does not work with the  pow() call. How come? Our fresh knowledge about LD_PRELOAD says exactly the opposite. Is there anything special about  libm? If the trick works for srand() as shown example, what’s so special about pow()? We are confused now! To probe into the issue, the first step we take is to dissect the object file.

Holy shit! Where is the pow symbol? Wasn’t it supposed to be there? Of course, it was. The type marker “U” beside printf suggests that the symbol is undefined. Quite reasonably, it seems that the compiler has resolved the reference to the pow() method statically from the math library. But…wait. Resolving reference isn’t really the responsibility of the compiler, but the linker. At least, we can expect the pow symbol to be retained as an unresolved one in the object code, if not in the final executable. gcc never fails to surprise us.

To be sure, let’s peek into the relocation section of the object file to find any entry corresponding to pow call. As expected, such a call is missing.

As the last resort, we will disassemble the executable itself to peek into it. The most common tool for that is objdump. In fact,  objdump -M intel -d pow will generate the disassembly of all the executable sections in the binary in Intel assembly format. However, it will include what we need and a l…o…t more. We are interested in the disassembly of main function only. Here’s a nifty shortcut for you.

Can you see something suspicious at  0x0000000000400535 <+8>? It may not be immediately apparent unless you are familiar with IEEE 754 floating point representation, the template to represent a single or double precision floating point number inside computer’s memory. Even if you don’t know what it is, an online converter should tell you that  0x4090000000000000 in IEEE 754 format is equivalent to the floating point number  1024.000000. The instruction  movabs rax,0x4090000000000000 moves the constant to the register rax.

We have been lied so far. A quick googling will bring out numerous post suggesting the use of gcc -O0 to stop all compiler optimizations. GNAT user guide confuses us even more.

‘-O0’

          No optimization (the default); generates unoptimized code but has the fastest compilation time.

          Note that many other compilers do fairly extensive optimization even if  ‘no optimization’ is specified. With gcc, it is very unusual to use -O0 for production if execution time is of any concern, since -O0 really does mean no optimization at all. This difference between gcc and other compilers should be kept in mind when doing performance comparisons.

Why on earth does it comment on gcc’s behavior? gcc, in its present form and shape, falls exactly in the category of “many other compilers” which does extensive optimizations even in the absence of any optimization flags or at gcc -O0. The clue to the discrepancy above hides in these couple of benign statements from gcc doc’s.

Most optimizations are only enabled if an -O level is set on the command line. Otherwise they are disabled, even if individual optimization flags are specified.

Depending on the target and how GCC was configured, a slightly different set of optimizations may be enabled at each -O level than those listed here. You can invoke GCC with ‘-Q –help=optimizers’ to find out the exact set of optimizations that are enabled at each level. See Overall Options, for examples.

Look carefully. The salient point is, “Most optimizations are only enabled…”, most but not all. To rephrase, there are optimizations those are enabled even at gcc -O0. To see what all are the optimization enabled at gcc -O0, we run the following:

Voila! A plethora of optimizers are enabled by default. Let’s try disabling all of them.

For the sake of brevity, though I am not explicitly showing here, take my word that the upshot of repeating of the entire procedure outlined above with the optimizers turned off remains the same. gcc inevitably performs constant folding,  no matter what your optimization level/options are. There seems no way to alter this behavior.

Constant folding requires compiler to know operands apriori. What if we change one of the constant arguments to pow a variable to be supplied by the user?

Recompile the driver program as well as the library.

Unlike the earlier run, now the linker really searches for libpow.so during linking. We are happy to see that the output is as expected. Though both -lpow and -lm switches are passed to the linker, the linker is prioritizing the user supplied library over the system one. Is it so? Are we sure? Let’s swap the order of the libraries and see what happens.

Math library takes over here. To conclude, the behavior of ld is to scan the libraries supplied on the command line from left to right with decreasing priorities.

To check whether our LD_PRELOAD solution works:

Lastly, we may not be able to prevent gcc from folding constants, but it provides -fno-builtin- switch as a means to disable built-in function(s) during compilation. We can exploit this feature to make the original versions of our program work without introducing an user variable.

In either of the case above, both nm and readelf outputs corroborates our expection, too.