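Introduction
“To do a good job, one must first sharpen one's tools.”
This article introduces the ELF file format and the tools for reading ELF files. Being familiar with ELF files helps a great deal when exploring dynamic linking.
Environment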
FROM debian:buster

LABEL maintainer="837940593@qq.com"

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update
RUN apt-get install -y build-essential bear make gcc g++ gdb

RUN mkdir /root/glibc
WORKDIR /root/glibc
RUN apt-get install -y wget
RUN apt-get install -y gawk bison texinfo gettext
RUN wget http://ftp.gnu.org/gnu/libc/glibc-2.28.tar.gz
RUN tar -xzvf glibc-2.28.tar.gz
RUN mkdir build
WORKDIR /root/glibc/build
RUN ../glibc-2.28/configure CFLAGS="-O1 -ggdb -w" --with-tls --enable-add-ons=nptl --prefix="$PWD/install"
RUN bear make -j8
RUN make install -j8

RUN apt-get install -y bsdmainutils
RUN apt-get install -y python3 python3-pip
RUN pip3 install lief

CMD /bin/bash
4.19.76-linuxkit
VERSION_ID="10"
ID=debian
gcc (Debian 8.3.0-6) 8.3.0
ldd (Debian GLIBC 2.28-10) 2.28
A small example
extern void foo();

int main() {
    foo();
}
Understanding ELF files
Overview of the tools
Dumping the binary
0000000 00010102464c457f
0000010

0000000 464c457f 00010102
0000010

0000000 457f 464c 0102 0001
0000010

0000000 7f 45 4c 46 02 01 01 00
0000010

0000000  7f  45  4c  46  02  01  01  00
        177   E   L   F 002 001 001  \0
0000010
Reading strings:
350+0 records in
350+0 records out
350 bytes copied, 0.000862803 s, 406 kB/s
crtstuff.c
deregister_tm_clones
Searching for strings:
319 __gmon_start__
328 _ITM_deregisterTMCloneTable
Here x means the offset of each string is printed in hexadecimal.
Dumping assembly code
0000000000001020 <.plt>:
    1020: ff 35 e2 2f 00 00     pushq  0x2fe2(%rip)
    1026: ff 25 e4 2f 00 00     jmpq   *0x2fe4(%rip)
    102c: 0f 1f 40 00           nopl   0x0(%rax)

00000000000011b0 <__libc_csu_fini>:
    11b0: c3                    retq

0000000000004028 <__dso_handle>:
    4028: 28 40 00 00 00 00 00 00   (@......
Dumping metadata
Parsing specific sections
Dynamic section at offset 0x2dd8 contains 28 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libfoo.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Symbol table '.dynsym' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
--
Symbol table '.symtab' contains 69 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND

Symbol table '.dynsym' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND

Relocation section '.rela.dyn' at offset 0x4c8 contains 8 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000003dc8  000000000008 R_X86_64_RELATIVE                    1130
--
Relocation section '.rela.plt' at offset 0x588 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000004018  000400000007 R_X86_64_JUMP_SLO 0000000000000000 _Z3foov + 0

String dump of section '.strtab':
  [     1]  crtstuff.c
  [     c]  deregister_tm_clones
  [    21]  __do_global_dtors_aux
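Overview of the ELF file
The File Header and the Program Headers are usually at the beginning of an ELF file, and the Section Headers are usually at the end. Their exact offsets are recorded in the File Header.
Below, we use readelf to view the metadata directly and od to look at each header as raw bytes. Note that od prints offsets in octal by default, so in the byte-by-byte tables 0000020 means byte 16 and 0000100 means byte 64.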
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1050
  Start of program headers:          64 (bytes into file)
  Start of section headers:          14680 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         11
  Size of section headers:           64 (bytes)
  Number of section headers:         30
  Section header string table index: 29
For the meaning of each File Header field, see Wikipedia.
The ELF header is 52 or 64 bytes long for 32-bit and 64-bit binaries respectively.
0000000
7f 45 4c 46
ELF (magic number).
02
1 is 32-bit format, 2 is 64-bit format.
01
1 is little-endian, 2 is big-endian.
01
Set to 1 for the original and current version of ELF.
00
Target OS ABI; 0 is System V. It is often set to 0 regardless of the target platform.
00
Further specifies the ABI version.
00 00 00 00 00 00 00
Padding, should be filled with zeros.
0000020
03 00
Identifies object file type, 0x3 is ET_DYN.
3e 00
Specifies instruction set architecture, 0x3e is amd64.
01 00 00 00
Set to 1 for the original version of ELF.
50 10 00 00 00 00 00 00
Entry point address.
0000040
40 00 00 00 00 00 00 00
The start of the program header table. 0x40 = 64.
58 39 00 00 00 00 00 00
The start of the section header table. 0x3958 = 14680.
0000060
00 00 00 00
Interpretation depends on the target architecture.
40 00
Size of file header.
38 00
Size of a program header table entry.
0b 00
Number of entries in the program header table.
40 00
Size of a section header table entry.
1e 00
Number of entries in the section header table.
1d 00
Index of the section header table entry that contains the section names.
0000100
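The File Header helps the loader:
decide whether the file can be loaded at all: 32-bit or 64-bit, endianness, ABI version, and so on;
decide how to load the file: the location and size of the Program Headers and Section Headers, how to find section names, the entry point address, and so on.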
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
                 0x0000000000000033 0x0000000000000033  R      0x1
      [Requesting program interpreter: /root/glibc/build/install/lib/ld-linux-x86-64.so.2]
  ...
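#include <stdint.h>

typedef uint32_t Elf64_Word;
typedef uint64_t Elf64_Xword;
typedef uint64_t Elf64_Off;
typedef uint64_t Elf64_Addr;

typedef struct {
    Elf64_Word  p_type;    /* Segment type */
    Elf64_Word  p_flags;   /* Segment flags */
    Elf64_Off   p_offset;  /* Segment file offset */
    Elf64_Addr  p_vaddr;   /* Segment virtual address */
    Elf64_Addr  p_paddr;   /* Segment physical address */
    Elf64_Xword p_filesz;  /* Segment size in file */
    Elf64_Xword p_memsz;   /* Segment size in memory */
    Elf64_Xword p_align;   /* Segment alignment */
} Elf64_Phdr;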
0000170
00000003
Segment type, 0x3 is PT_INTERP.
00000004
Segment flags, 0x4 is PF_R (readable).
00000000000002a8
Segment file offset.
0000210
00000000000002a8
Segment virtual address.
00000000000002a8
Segment physical address.
0000230
0000000000000033
Segment size in file.
0000000000000033
Segment size in memory.
0000250
0000000000000001
Segment alignment.
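The most confusing field is Segment physical address. Two articles discuss it: What is a section and why do we need it, and 写一个工具来了解ELF文件(二) (Writing a tool to understand ELF files, part 2). According to both, the segment physical address has no use on modern operating systems, and GCC usually sets it to the same value as the segment virtual address.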
0001250  2f 72 6f 6f 74 2f 67 6c 69 62 63 2f 62 75 69 6c
          /  r  o  o  t  /  g  l  i  b  c  /  b  u  i  l
0001270  64 2f 69 6e 73 74 61 6c 6c 2f 6c 69 62 2f 6c 64
          d  /  i  n  s  t  a  l  l  /  l  i  b  /  l  d
0001310  2d 6c 69 6e 75 78 2d 78 38 36 2d 36 34 2e 73 6f
          -  l  i  n  u  x  -  x  8  6  -  6  4  .  s  o
0001330  2e 32 00
          .  2 \0
0001333
Following the Program Header, reading 0x33 bytes starting at file offset 0x2a8 gives the interpreter's path in the file system, including its terminating NUL.
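The most important job of the Program Headers is to tell the loader how to load the ELF file. Note that a segment's offset in the file is not necessarily equal to its address in memory. The cause may be alignment, or an earlier segment whose size in the file differs from its size in memory. For example: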
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  ...
  LOAD           0x0000000000002dc8 0x0000000000003dc8 0x0000000000003dc8
                 0x0000000000000268 0x0000000000000270  RW     0x1000
This LOAD segment starts at offset 0x2dc8 in the file but at address 0x3dc8 in memory; the two are not equal.
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  ...
  [13] .plt.got          PROGBITS         0000000000001040  00001040
       0000000000000008  0000000000000008  AX       0     0     8
  ...
  [28] .strtab           STRTAB           0000000000000000  00003650
       00000000000001fa  0000000000000000           0     0     1
  ...
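#include <stdint.h>

typedef uint32_t Elf64_Word;
typedef uint64_t Elf64_Xword;
typedef uint64_t Elf64_Off;
typedef uint64_t Elf64_Addr;

typedef struct {
    Elf64_Word  sh_name;       /* Section name (string table index) */
    Elf64_Word  sh_type;       /* Section type */
    Elf64_Xword sh_flags;      /* Section flags */
    Elf64_Addr  sh_addr;       /* Section virtual addr at execution */
    Elf64_Off   sh_offset;     /* Section file offset */
    Elf64_Xword sh_size;       /* Section size in bytes */
    Elf64_Word  sh_link;       /* Link to another section */
    Elf64_Word  sh_info;       /* Additional section information */
    Elf64_Xword sh_addralign;  /* Section alignment */
    Elf64_Xword sh_entsize;    /* Entry size if section holds table */
} Elf64_Shdr;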
0036230
00000094
Section name (string table index).
00000001
Section type, SHT_PROGBITS.
0000000000000006
Section flags, SHF_ALLOC | SHF_EXECINSTR.
0036250
0000000000001040
Section virtual address.
0000000000001040
Section file offset.
0036270
0000000000000008
Section size in bytes.
00000000
Link to another section.
00000000
Additional section information.
0036310
0000000000000008
Section alignment.
0000000000000008
Entry size if section holds table.
According to man elf, the meaning of sh_link and sh_info depends on the section type.
.plt & .got.plt / .plt.got & .got
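Debugging tip: first use info proc mappings to get the start address, then use watch *(unsigned long long*)(<start_addr> + <addr>). This catches the call stack at the moment a particular address is modified.
.plt.got & .got, like .plt & .got.plt, are a group of sections used for relocation.
The difference:
.plt.got & .got have no lazy binding; the dynamic linker performs the relocation directly, at load time;
.plt & .got.plt use lazy binding; the relocation is triggered the first time the function is called.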
.plt & .got.plt
0000000000001020 <.plt>:
    1020: ff 35 e2 2f 00 00     pushq  0x2fe2(%rip)
    1026: ff 25 e4 2f 00 00     jmpq   *0x2fe4(%rip)
    102c: 0f 1f 40 00           nopl   0x0(%rax)

0000000000001030 <_Z3foov@plt>:
    1030: ff 25 e2 2f 00 00     jmpq   *0x2fe2(%rip)
    1036: 68 00 00 00 00        pushq  $0x0
    103b: e9 e0 ff ff ff        jmpq   1020 <.plt>

  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [23] .got.plt          PROGBITS         0000000000004000  00003000
       0000000000000020  0000000000000008  WA       0     0     8

0030030 0000000000001036
0030040

Relocation section '.rela.plt' at offset 0x590 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000004018  000400000007 R_X86_64_JUMP_SLO 0000000000000000 _Z3foov + 0
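The .got.plt entry (virtual address 0x1036 + 0x2fe2 = 0x4018) changes twice:
From 0x1036 to start address + 0x1036:
performed by the runtime linker, driven by the .rela.plt (R_X86_64_JUMP_SLOT) entry, with the call stack dl_main -> _dl_relocate_object -> elf_dynamic_do_Rela;
a same-file relocation that only adds the start address and needs no symbol lookup, so it is fast;
From start address + 0x1036 to the address of foo:
triggered from user code the first time foo is called, with the call stack main -> _dl_runtime_resolve_xsavec -> _dl_fixup;
a cross-file relocation that requires a symbol lookup, so it is slow.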
.plt.got & .got
0000000000001040 <__cxa_finalize@plt>:
    1040: ff 25 b2 2f 00 00     jmpq   *0x2fb2(%rip)
    1046: 66 90                 xchg   %ax,%ax

  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [22] .got              PROGBITS         0000000000003fd8  00002fd8
       0000000000000028  0000000000000008  WA       0     0     8

0027770 0000000000000000
0030000

  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000003ff8  000600000006 R_X86_64_GLOB_DAT 0000000000000000 __cxa_finalize@GLIBC_2.2.5 + 0
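The .got entry (virtual address 0x1046 + 0x2fb2 = 0x3ff8) changes only once, from 0x0 to the address of __cxa_finalize:
performed by the runtime linker, driven by the .rela.dyn (R_X86_64_GLOB_DAT) entry, with the call stack dl_main -> _dl_relocate_object -> elf_dynamic_do_Rela -> elf_machine_rela;
a cross-file relocation that requires a symbol lookup, so it is slow.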
.rela.dyn & .rela.plt
Take foo.cpp as an example:
#include <iostream>

namespace {
int var = 1;
void bar() { std::cout << "bar" << std::endl; }
}

void foo() {}
.rela.dyn / .rela.plt
According to Linux Foundation Referenced Specifications: Additional Special Sections:
the .rela.plt section works with the .plt section to perform cross-file relocation;
.rela.dyn handles all other relocations.
How .rela.dyn & .rela.plt relate to .symtab
Every undefined symbol in .symtab can be found in the relocation sections:
__cxa_atexit@@GLIBC_2.2.5
__cxa_finalize@@GLIBC_2.2.5
__gmon_start__
_ITM_deregisterTMCloneTable
_ITM_registerTMCloneTable
_Jv_RegisterClasses
_ZNSolsEPFRSoS_E@@GLIBCXX_3.4
_ZNSt8ios_base4InitC1Ev@@GLIBCXX_3.4
_ZNSt8ios_base4InitD1Ev@@GLIBCXX_3.4
_ZSt4cout@@GLIBCXX_3.4
_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_@@GLIBCXX_3.4
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@@GLIBCXX_3.4
__cxa_atexit@GLIBC_2.2.5
__cxa_finalize@GLIBC_2.2.5
__gmon_start__
_ITM_deregisterTMCloneTable
_ITM_registerTMCloneTable
_Jv_RegisterClasses
_ZNSolsEPFRSoS_E@GLIBCXX_3.4
_ZNSt8ios_base4InitC1Ev@GLIBCXX_3.4
_ZNSt8ios_base4InitD1Ev@GLIBCXX_3.4
_ZSt4cout@GLIBCXX_3.4
_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_@GLIBCXX_3.4
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@GLIBCXX_3.4
References
ELF (except .plt, .got.plt, etc.):
.plt, .got.plt, etc.:
.rela.dyn and .rela.plt: