Jordan Savant # Software Engineer

C++ Compiling and Linking

The Process

$ g++ -o prog1 prog1.cpp
  1. Preprocessor

    • Copies contents from included header files into the source code file being compiled
    • Replaces symbolic constants defined with #define with their values
    • Use -E option to stop after preprocessing: g++ -E prog1.cpp -o prog1.ii
  2. Compiled into Assembly

    • Expanded source code from preprocessor is compiled into the assembly language for the platform
    • Use -S option to stop after compiling: g++ -S prog1.cpp will save to prog1.s
  3. Assembly Code into Object Code

    • Assembly language source code is assembled into object code (binary machine code)
    • Use -c option to stop after assembly: g++ -c prog1.cpp will save to prog1.o
  4. Object Code Linked

    • The generated object code is linked with other object files and with the libraries that provide any library functions it uses
    • Executable is produced

Basic Example

Here is a basic C++ program. It clearly does not do much.

File: prog.cpp

int sum(int a, int b) {
    return a + b;
}

int main(int argc, char* argv[]) {
    int c = sum(1, 2);
    return 0;
}

Preprocess only:

$ g++ -E prog.cpp -o prog.ii
# 1 "prog.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "prog.cpp"
int sum(int a, int b) {
    return a + b;
}

int main(int argc, char* argv[]) {
    int c = sum(1, 2);
    return 0;
}

To Assembly:

$ g++ -S prog.cpp -o prog.s
    .file   "prog.cpp"
    .text
    .globl  _Z3sumii
    .type   _Z3sumii, @function
_Z3sumii:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    %edi, -4(%rbp)
    movl    %esi, -8(%rbp)
    movl    -4(%rbp), %edx
    movl    -8(%rbp), %eax
    addl    %edx, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   _Z3sumii, .-_Z3sumii
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movl    %edi, -20(%rbp)
    movq    %rsi, -32(%rbp)
    movl    $2, %esi
    movl    $1, %edi
    call    _Z3sumii
    movl    %eax, -4(%rbp)
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609"
    .section    .note.GNU-stack,"",@progbits

To Object Code:

Viewed with nm, which lists the symbols in an object file.

$ g++ -c prog.cpp -o prog.o
$ nm prog.o
0000000000000014 T main
0000000000000000 T _Z3sumii

C++ "mangles" function symbol names, encoding each function's parameter types into its symbol. This is how it accomplishes overloading (having two functions with the same name but different parameters): each overload ends up with a distinct symbol.

_Z3sumii is the mangled name: "3sum" gives the length and name of the function, and the "ii" portion corresponds to the int, int parameters defined. Note that main is not mangled.

We can see the de-mangled versions by passing the -C option to nm.

$ nm -C prog.o
0000000000000014 T main
0000000000000000 T sum(int, int)

To Link and Run:

Linked and run (note that this simple example has no other object files of its own to link and prints no output):

$ g++ prog.cpp -o a.out
$ ./a.out # outputs nothing but does not break so the executable is working
$ nm -C a.out
0000000000601030 B __bss_start
0000000000601030 b completed.7594
0000000000601020 D __data_start
0000000000601020 W data_start
0000000000400410 t deregister_tm_clones
0000000000400490 t __do_global_dtors_aux
0000000000600e18 t __do_global_dtors_aux_fini_array_entry
0000000000601028 D __dso_handle
0000000000600e28 d _DYNAMIC
0000000000601030 D _edata
0000000000601038 B _end
0000000000400594 T _fini
00000000004004b0 t frame_dummy
0000000000600e10 t __frame_dummy_init_array_entry
00000000004006f0 r __FRAME_END__
0000000000601000 d _GLOBAL_OFFSET_TABLE_
                 w __gmon_start__
00000000004005a4 r __GNU_EH_FRAME_HDR
0000000000400390 T _init
0000000000600e18 t __init_array_end
0000000000600e10 t __init_array_start
00000000004005a0 R _IO_stdin_used
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
0000000000600e20 d __JCR_END__
0000000000600e20 d __JCR_LIST__
                 w _Jv_RegisterClasses
0000000000400590 T __libc_csu_fini
0000000000400520 T __libc_csu_init
                 U __libc_start_main@@GLIBC_2.2.5
00000000004004ea T main
0000000000400450 t register_tm_clones
00000000004003e0 T _start
0000000000601030 D __TMC_END__
00000000004004d6 T sum(int, int)