AFL源码分析(一)

百家 作者:Chamd5安全团队 2022-11-19 08:24:34


a.alf-gcc.c

1.find_as

这个函数的功能是获取使用的汇编器。首先获取环境变量AFL_PATH,如果这个环境变量存在的话,接着把他和/as拼接,并判断次路径下的as文件是否存在。如果存在,就使得as_path = afl_path = getenv("AFL_PATH")。如果不存在就通过第二种方式尝试获取其路径。首先判断是否存在/,并把最后一个/之后的路径清空,之后为其前面的路径分配空间,并与/afl-as拼接后判断这个文件是否存在,如果存在,则使得as_path = dir = ck_strdup(argv0)。如果这两种方式都不能找到相应路径,即会爆出异常。

2.edit_params

这个函数的主要功能是对编译所用到的参数进行编辑。先为cc_params分配一大块内存空间,然后尝试获取argv[0]的最后一个/的位置,如果存在就把它后面的内容设为name,否则name=argv[0]。之后判断我们预期的编译是不是afl-clang模式,如果是的话就设置clang_mode = 1,设置环境变量CLANG_ENV_VAR为 1,并添加相应的编译参数。如果不是clang模式,则判断name是否等于afl-g++,afl-gcj等选项,并添加相应的参数。接着从argv[1]开始遍历编译选项,会跳过-B -integrated-as -pipe这三个选项,因为edit_params会自动添加这三个编译选项。最后cc_params[cc_par_cnt] = NULL标志结束对选项的编辑。

3.main

int?main(int?argc,?char**?argv)?{

??if?(isatty(2)?&&?!getenv("AFL_QUIET"))?{

????SAYF(cCYA?"afl-cc?"?cBRI?VERSION?cRST?"?by?<lcamtuf@google.com>\n");

??}?else?be_quiet?=?1;

??if?(argc?<?2)?{

????SAYF("\n"
?????????"This?is?a?helper?application?for?afl-fuzz.?It?serves?as?a?drop-in?replacement\n"
?????????"for?gcc?or?clang,?letting?you?recompile?third-party?code?with?the?required\n"
?????????"runtime?instrumentation.?A?common?use?pattern?would?be?one?of?the?following:\n\n"

?????????"??CC=%s/afl-gcc?./configure\n"
?????????"??CXX=%s/afl-g++?./configure\n\n"

?????????"You?can?specify?custom?next-stage?toolchain?via?AFL_CC,?AFL_CXX,?and?AFL_AS.\n"
?????????"Setting?AFL_HARDEN?enables?hardening?optimizations?in?the?compiled?code.\n\n",
?????????BIN_PATH,?BIN_PATH);

????exit(1);

??}

??find_as(argv[0]);

??edit_params(argc,?argv);

??execvp(cc_params[0],?(char**)cc_params);

??FATAL("Oops,?failed?to?execute?'%s'?-?check?your?PATH",?cc_params[0]);

??return?0;

}

总体来说就是先调用find_as(argv[0])获取使用的汇编器,再调用edit_params(argc, argv)对编译选项进行编辑,再通过execvp去进行编译。总的来说alf-gcc是对gcc或clang的一个wrapper。而其中强制加上的-B as_path实际上是给其指定汇编器,也就是我们下面会提到的afl-as。实际的插桩也就是在afl-as里进行插桩的。

b.afl-as

1.edit_params

这个函数的主要功能是编辑汇编器所用到的参数。首先获取环境变量TMPDIRAFL_AS。接着根据是否是clang模式并且afl_as是否为空,去判断是否要重新获取afl_as的值,直到其不为空。接着获取tmp_dir的值,直到其不为空。下面就是给as_params分配一大块空间,并开始对参数进行编辑。首先先设置as_params[0],也即汇编器,一般来说这里都是as。接着从argv[1]遍历到argv[argc-1],看是否存在--64,如果存在--64就使得use_64bit = 1。如果定义了__APPLE__,那么如果存在-arch x86_64就使得use_64bit = 1。并且其会忽略-q或者-Q选项。其余选项参数都会依此加到as_params[as_par_cnt++]中。如果定义了__APPLE__,接下来会判断是否是clang模式,如果是那么添加-c -x assembler的选项。紧接着把argv[argc - 1]赋给?input_file,即最后一个参数的值为input_file的值。下面会判断input_file,是否与--version相等,如果相等,标志just_version=1,可能是代表查询版本。如果不等那么将input_file与?tmp_dir、/var/tmp/、/tmp/进行比较,如果都不相同,则设置pass_thru = 1。并通过格式化字符串设置modified_file = tmp_dir/.afl-getpid()-(u32)time(NULL).s。最后设置as_params[as_par_cnt++] = modified_file,并结束对as_params的编辑。

2.add_instrumentation

这个函数就是进行插桩的关键函数了。首先判断文件是否存在并且可读,不满足就抛出异常。然后打开modified_file里的临时文件,获得其句柄outfd,再通过句柄拿到文件对应的指针。

??if?(input_file)?{?????????????????????????????????????????????????????????//?判断文件是否存在并可读

????inf?=?fopen(input_file,?"r");
????if?(!inf)?PFATAL("Unable?to?read?'%s'",?input_file);

??}?else?inf?=?stdin;???????????????????????????????????????????????????????//?文件不存在,则标准输入作为?input_file

??outfd?=?open(modified_file,?O_WRONLY?|?O_EXCL?|?O_CREAT,?0600);???????????//?打开这个临时文件

??if?(outfd?<?0)?PFATAL("Unable?to?write?to?'%s'",?modified_file);

??outf?=?fdopen(outfd,?"w");

??if?(!outf)?PFATAL("fdopen()?failed");??

接下来就是插桩的关键部分了。

??while?(fgets(line,?MAX_LINE,?inf))?{??????????????????????????????????????//?逐行从inf读取文件到line数组里

????/*?In?some?cases,?we?want?to?defer?writing?the?instrumentation?trampoline
???????until?after?all?the?labels,?macros,?comments,?etc.?If?we're?in?this
???????mode,?and?if?the?line?starts?with?a?tab?followed?by?a?character,?dump
???????the?trampoline?now.?*/


????if?(!pass_thru?&&?!skip_intel?&&?!skip_app?&&?!skip_csect?&&?instr_ok?&&?//?判断是否满足插桩条件
????????instrument_next?&&?line[0]?==?'\t'?&&?isalpha(line[1]))?{

??????fprintf(outf,?use_64bit???trampoline_fmt_64?:?trampoline_fmt_32,???????//?根据use_64bit插入相应的插桩代码
??????????????R(MAP_SIZE));

??????instrument_next?=?0;
??????ins_lines++;

????}

首先是一个大while循环,通过fgets逐行从inf读取文件到line数组里,最多MAX_LINE也即8192字节。并且通过几个标记的值来判断是否要插入相应的代码。并且根据use_64bit的值来确定插入的是trampoline_fmt_64还是trampoline_fmt_32

????fputs(line,?outf);???????????????????????????????????????????????????????//?把?line?写到?modified_file?里

????if?(pass_thru)?continue;

????/*?All?right,?this?is?where?the?actual?fun?begins.?For?one,?we?only?want?to???????//?通过注释可以知道我们只对.text进行插桩
???????instrument?the?.text?section.?So,?let's?keep?track?of?that?in?processed????????//?通过?instr_ok?来标记是否在?.text?段
???????files?-?and?let's?set?instr_ok?accordingly.?*/


????if?(line[0]?==?'\t'?&&?line[1]?==?'.')?{

??????/*?OpenBSD?puts?jump?tables?directly?inline?with?the?code,?which?is
?????????a?bit?annoying.?They?use?a?specific?format?of?p2align?directives
?????????around?them,?so?we?use?that?as?a?signal.?*/


??????if?(!clang_mode?&&?instr_ok?&&?!strncmp(line?+?2,?"p2align?",?8)?&&
??????????isdigit(line[10])?&&?line[11]?==?'\n')?skip_next_label?=?1;

??????if?(!strncmp(line?+?2,?"text\n",?5)?||????????????????????????????????//?如果?line?的值为?\t.text\n
??????????!strncmp(line?+?2,?"section\t.text",?13)?||???????????????????????//?或?\t.section\t.text
??????????!strncmp(line?+?2,?"section\t__TEXT,__text",?21)?||???????????????//?或?\t.section\t__TEXT,__text
??????????!strncmp(line?+?2,?"section?__TEXT,__text",?21))?{????????????????//?或?\t.section?__TEXT,__text
????????instr_ok?=?1;???????????????????????????????????????????????????????//?设置?instr_ok?=?1,并跳转到开头读取下一行内容
????????continue;?
??????}

??????if?(!strncmp(line?+?2,?"section\t",?8)?||?????????????????????????????//?如果?line?的值为?\t.section\t
??????????!strncmp(line?+?2,?"section?",?8)?||??????????????????????????????//?或?\t.section
??????????!strncmp(line?+?2,?"bss\n",?4)?||?????????????????????????????????//?或?\tbss\n
??????????!strncmp(line?+?2,?"data\n",?5))?{????????????????????????????????//?或?\tdata\n
????????instr_ok?=?0;???????????????????????????????????????????????????????//?设置?instr_ok?=?0,并跳转到开头读取下一行内容
????????continue;
??????}

????}

我们会把line里的值写道outf(即modified_file)里。根据官方给的注释可以知道我们只期望对text段进行插桩,并且通过设置instr_ok来标记是否是text段。如果line+2匹配到\t.text\n、\t.section\t.text等就设置instr_ok=1,如果line+2匹配到\t.section\t、\t.section等就设置instr_ok=0。并跳过下面的代码,直接跳到循环的开头读取下一行的内容。

????????????????????????????????????????????????????????????????????????????//?接下来设置一些其他的标志
????/*?Detect?off-flavor?assembly?(rare,?happens?in?gdb).?When?this?is
???????encountered,?we?set?skip_csect?until?the?opposite?directive?is
???????seen,?and?we?do?not?instrument.?*/


????if?(strstr(line,?".code"))?{????????????????????????????????????????????//?判断?off-flavor

??????if?(strstr(line,?".code32"))?skip_csect?=?use_64bit;
??????if?(strstr(line,?".code64"))?skip_csect?=?!use_64bit;

????}

????/*?Detect?syntax?changes,?as?could?happen?with?hand-written?assembly.
???????Skip?Intel?blocks,?resume?instrumentation?when?back?to?AT&T.?*/


????if?(strstr(line,?".intel_syntax"))?skip_intel?=?1;??????????????????????//?跳过?Intel汇编的插桩
????if?(strstr(line,?".att_syntax"))?skip_intel?=?0;

????/*?Detect?and?skip?ad-hoc?__asm__?blocks,?likewise?skipping?them.?*/

????if?(line[0]?==?'#'?||?line[1]?==?'#')?{?????????????????????????????????//?跳过?ad-hoc?__asm__(内联汇编)?的插桩

??????if?(strstr(line,?"#APP"))?skip_app?=?1;
??????if?(strstr(line,?"#NO_APP"))?skip_app?=?0;

????}

在往下就是设置一些其他的标志来判断是否跳过插桩。主要是跳过与设置架构不同的架构的汇编,跳过Intel汇编,跳过内联汇编的插桩。

????/*?If?we're?in?the?right?mood?for?instrumenting,?check?for?function
???????names?or?conditional?labels.?This?is?a?bit?messy,?but?in?essence,
???????we?want?to?catch:

?????????^main:??????-?function?entry?point?(always?instrumented)
?????????^.L0:???????-?GCC?branch?label
?????????^.LBB0_0:???-?clang?branch?label?(but?only?in?clang?mode)
?????????^\tjnz?foo??-?conditional?branches

???????...but?not:

?????????^#?BB#0:????-?clang?comments
?????????^?#?BB#0:???-?ditto
?????????^.Ltmp0:????-?clang?non-branch?labels
?????????^.LC0???????-?GCC?non-branch?labels
?????????^.LBB0_0:???-?ditto?(when?in?GCC?mode)
?????????^\tjmp?foo??-?non-conditional?jumps

???????Additionally,?clang?and?GCC?on?MacOS?X?follow?a?different?convention
???????with?no?leading?dots?on?labels,?hence?the?weird?maze?of?#ifdefs
???????later?on.

?????*/


????if?(skip_intel?||?skip_app?||?skip_csect?||?!instr_ok?||
????????line[0]?==?'#'?||?line[0]?==?'?')?continue;

????/*?Conditional?branch?instruction?(jnz,?etc).?We?append?the?instrumentation
???????right?after?the?branch?(to?instrument?the?not-taken?path)?and?at?the
???????branch?destination?label?(handled?later?on).?*/


????if?(line[0]?==?'\t')?{

??????if?(line[1]?==?'j'?&&?line[2]?!=?'m'?&&?R(100)?<?inst_ratio)?{

????????fprintf(outf,?use_64bit???trampoline_fmt_64?:?trampoline_fmt_32,??????????//?通过?use_64bit,判断写入trampoline_fmt_64还是trampoline_fmt_32
????????????????R(MAP_SIZE));

????????ins_lines++;

??????}

??????continue;

????}

????/*?Label?of?some?sort.?This?may?be?a?branch?destination,?but?we?need?to
???????tread?carefully?and?account?for?several?different?formatting
???????conventions.?*/


#ifdef?__APPLE__

????/*?Apple:?L<whatever><digit>:?*/

????if?((colon_pos?=?strstr(line,?":")))?{

??????if?(line[0]?==?'L'?&&?isdigit(*(colon_pos?-?1)))?{

#else

????/*?Everybody?else:?.L<whatever>:?*/

????if?(strstr(line,?":"))?{?????????????????????????????????????????????????????//?检查?line?里是否有?:

??????if?(line[0]?==?'.')?{??????????????????????????????????????????????????????//?判断?line?是否以?.?开始

#endif?/*?__APPLE__?*/

????????/*?.L0:?or?LBB0_0:?style?jump?destination?*/

#ifdef?__APPLE__

????????/*?Apple:?L<num>?/?LBB<num>?*/

????????if?((isdigit(line[1])?||?(clang_mode?&&?!strncmp(line,?"LBB",?3)))
????????????&&?R(100)?<?inst_ratio)?{

#else

????????/*?Apple:?.L<num>?/?.LBB<num>?*/

????????if?((isdigit(line[2])?||?(clang_mode?&&?!strncmp(line?+?1,?"LBB",?3)))???//?如果?line[2]?是数字,或者在?clang?模式下,line?=?.LBB
????????????&&?R(100)?<?inst_ratio)?{

#endif?/*?__APPLE__?*/

??????????/*?An?optimization?is?possible?here?by?adding?the?code?only?if?the
?????????????label?is?mentioned?in?the?code?in?contexts?other?than?call?/?jmp.
?????????????That?said,?this?complicates?the?code?by?requiring?two-pass
?????????????processing?(messy?with?stdin),?and?results?in?a?speed?gain
?????????????typically?under?10%,?because?compilers?are?generally?pretty?good
?????????????about?not?generating?spurious?intra-function?jumps.

?????????????We?use?deferred?output?chiefly?to?avoid?disrupting
?????????????.Lfunc_begin0-style?exception?handling?calculations?(a?problem?on
?????????????MacOS?X).?*/


??????????if?(!skip_next_label)?instrument_next?=?1;?else?skip_next_label?=?0;?//?如果?skip_next_label?==?0

????????}

??????}?else?{?????????????????????????????????????????????????????????????????//?否则就是函数(function),给?function?直接设置?instrument_next?=?1

????????/*?Function?label?(always?instrumented,?deferred?mode).?*/

????????instrument_next?=?1;
????
??????}

????}

??}

接下来是对其他的标志进行设置,可以从注释中看出我们想对main、.L0、.LBB0_0(clang mode)、\tjnz foo或者function等地方设置instrument_next = 1。其他部分看我对源码加的注释。

循环结束后,接下来如果ins_lines不为空,那么通过use_64bit,判断向 outf 里写入main_payload_64还是main_payload_32。并且关闭两个文件。

3.main

主函数就比较简单了。首先获取环境变量AFL_INST_RATIO,并检测其是否合法(在0-100之间)。通过当前时间和进程号来获取并设置srandom的随机种子。获取环境变量AS_LOOP_ENV_VAR,如果存在就抛出异常。

调用edit_params设置相关参数。获取环境变量AFL_USE_ASANAFL_USE_MSAN,如果有一个存在就设置?sanitizer = 1,inst_ratio /= 3,这是因为在进行ASAN的编译时,AFL无法识别出ASAN特定的分支,导致插入很多无意义的桩代码,所以直接暴力地将插桩概率/3。最后fork出一个子进程,执行?execvp(as_params[0], (char**)as_params)

有注释的源码也放在了我的girhub项目里:https://github.com/fxc233/my-afl-interpret

end


招新小广告

ChaMd5?Venom?招收大佬入圈

新成立组IOT+工控+样本分析?长期招新

欢迎联系admin@chamd5.org


关注公众号:拾黑(shiheibook)了解更多

[广告]赞助链接:

四季很好,只要有你,文娱排行榜:https://www.yaopaiming.com/
让资讯触达的更精准有趣:https://www.0xu.cn/

公众号 关注网络尖刀微信公众号
随时掌握互联网精彩
赞助链接