LDC LLVM profiling instrumentation
Latest revision as of 13:11, 19 July 2017
This page collects information about profile instrumentation in LDC and documents how it is implemented.
Profile-Guided Optimization (PGO) status in LDC
PGO is available in LDC from version 1.1.0. PGO requires LLVM 3.7 or newer.
See https://johanengelen.github.io/ldc/2016/07/15/Profile-Guided-Optimization-with-LDC.html
Many thanks to Xinliang David Li for his help with profile-rt and for his bug fixes in LLVM and profile-rt; without him there would be no PGO on Windows and Linux!
How to use?
To use PGO, use the following command sequence (same as Clang):
- Compile with instrumentation turned on:
  ldc2 -fprofile-instr-generate test.d -of=test-instr
- Run the executable:
  ./test-instr
  This generates a "default.profraw" file.
- Run the ldc-profdata tool:
  ldc-profdata merge default.profraw -output test.profdata
- Compile again, now using the profile data:
  ldc2 -fprofile-instr-use=test.profdata test.d -of=test
You should test whether the PGO executable is actually faster than one built without PGO. (Remember that test-instr contains instrumentation code and will run significantly slower than a normally compiled version!)
Implementation status
- PGO is available with LLVM 3.7 or newer.
- Instrumented code is generated with -fprofile-instr-generate=<optional filename> (the filename sets where the raw profile data is written), and the profile data can be used with -fprofile-instr-use=<filename>
- D statements that still have only partial instrumentation and branch weight metadata (i.e. the TODO list):
  - try-finally (and thus also scope statements)
- pragma(LDC_profile_instr, <bool>) gives fine-grained control over whether profile instrumentation code is generated. This pragma behaves like pragma(inline).
Ideas for improvements
- use profile counters to also emit coverage data for llvm-cov (look at how clang emits coverage maps)
- a command-line option to turn instrumentation off by default (opt-in per function, instead of the current opt-out)
- CTFE availability of execution counts
- expose IR-level instrumentation http://reviews.llvm.org/D15829
LLVM InstrProf
[describe what and how LLVM supports profile instrumentation]
Slides about PGO in LLVM (from 2013): http://llvm.org/devmtg/2013-11/slides/Carruth-PGO.pdf
In the second compile pass, branch weights and function entry counts are added to the IR as metadata: http://llvm.org/docs/doxygen/html/MDBuilder_8h_source.html
LLVM intrinsics
These intrinsics were introduced with LLVM commit r223672, and Clang was modified to use them in Clang commit r223683.
LLVM features two intrinsics for instrumentation:
- llvm.instrprof.increment (http://llvm.org/docs/LangRef.html#llvm-instrprof-increment-intrinsic), available from LLVM 3.6
- llvm.instrprof.value.profile (http://llvm.org/docs/LangRef.html#llvm-instrprof-value-profile-intrinsic)
The "value.profile" intrinsic is very new and will not be used for now. "instrprof.increment" is used to increment control-flow counters (function entry, if-statement branches, loops, etc.).
Unfortunately, these intrinsics do not show up in the LLVM IR output by LDC, because the InstrProf pass, which always runs, lowers them to standard LLVM IR.
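To make the lowering concrete, here is a minimal C sketch of what each `llvm.instrprof.increment(name, hash, num_counters, index)` call turns into, assuming the simple non-atomic load/add/store sequence visible in the Clang 3.7 IR example further down this page. `counters_main` and `instrprof_increment_lowered` are hypothetical names chosen for illustration; they are not part of LLVM or compiler-rt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-function counter array, analogous to
   @__llvm_profile_counters_main in the Clang 3.7 example. */
uint64_t counters_main[2];

/* Hypothetical stand-in for the code the InstrProf pass emits in place of
   llvm.instrprof.increment(name, hash, num_counters, index).
   The name and hash are not needed for the update itself; the lowered
   form is a plain (non-atomic) load, add 1, store on counters[index]. */
void instrprof_increment_lowered(uint64_t *counters, uint32_t num_counters,
                                 uint32_t index) {
    assert(index < num_counters);
    uint64_t pgocount = counters[index]; /* load */
    counters[index] = pgocount + 1;      /* add and store */
}
```

For example, calling `instrprof_increment_lowered(counters_main, 2, 0)` on function entry and `instrprof_increment_lowered(counters_main, 2, 1)` when the "then" branch is taken reproduces the two counters of `@__llvm_profile_counters_main`.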
LLVM runtime calls
The InstrProf pass adds several external symbols to IR, for example:
@__llvm_profile_runtime = external global i32 ; [#uses = 1]
define linkonce_odr hidden i32 @__llvm_profile_runtime_user() #2 {
  %1 = load i32, i32* @__llvm_profile_runtime ; [#uses = 1]
  ret i32 %1
}
define internal void @__llvm_profile_init() unnamed_addr #2 {
  call void @__llvm_profile_override_default_filename(i8* getelementptr inbounds ([16 x i8], [16 x i8]* @0, i32 0, i32 0))
  ret void
}
declare void @__llvm_profile_override_default_filename(i8*)
These symbols must be provided by LDC's runtime. They could be ported from LLVM's compiler-rt/lib/profile, but I chose to copy LLVM's compiler-rt code instead.
The IR generated by LLVM depends on the target triple. On some platforms, LLVM generates extra calls so that the runtime can find where the profile data (function names, hashes, counters) is located in the binary; on other platforms (Apple, for example), it relies on linker magic.
The compiler-rt code also takes care of writing the raw profile data file after execution.
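The division of labour between compiler and runtime can be sketched in C as follows. This is a hypothetical, heavily simplified stand-in for compiler-rt's profile runtime, assuming a registration-based platform: instrumented code registers one record per function at startup, and an `atexit` handler writes all counters when the program ends. The names `prof_record`, `prof_register`, `prof_write` and `prof_init`, the output file name and the text output format are my own assumptions; the real runtime writes a binary raw profile (default.profraw).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified per-function profile record. */
typedef struct {
    const char *name;
    uint64_t hash;
    uint32_t num_counters;
    uint64_t *counters;
} prof_record;

#define MAX_RECORDS 64
static const prof_record *registry[MAX_RECORDS];
static size_t num_records;

/* Hypothetical registration call, standing in for the extra calls LLVM
   emits on platforms where the runtime cannot locate the data by itself. */
void prof_register(const prof_record *rec) {
    assert(num_records < MAX_RECORDS && "too many instrumented functions");
    if (num_records < MAX_RECORDS)
        registry[num_records++] = rec;
}

/* Writes one text line per function: name, hash, then every counter.
   This is NOT the real .profraw format. */
void prof_write(FILE *out) {
    for (size_t i = 0; i < num_records; ++i) {
        const prof_record *r = registry[i];
        fprintf(out, "%s %" PRIu64, r->name, r->hash);
        for (uint32_t c = 0; c < r->num_counters; ++c)
            fprintf(out, " %" PRIu64, r->counters[c]);
        fputc('\n', out);
    }
}

/* Hypothetical output file name, written after main returns. */
static void prof_write_at_exit(void) {
    FILE *f = fopen("default.prof.txt", "w");
    if (f) {
        prof_write(f);
        fclose(f);
    }
}

/* Call once at program start so the profile is dumped at exit. */
void prof_init(void) { atexit(prof_write_at_exit); }
```

Registering the `main` record from the Clang example (hash 10, two counters) and running the program once without arguments would produce the line `main 10 1 0`.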
Example with Clang 3.7
The sequence used to build a program with PGO using clang is:
clang -fprofile-instr-generate pgo.c -o pgo
./pgo
llvm-profdata merge -output=pgo.profdata default.profraw
clang -fprofile-instr-use=pgo.profdata -S -emit-llvm pgo.c -o pgo0.ll
./pgo 1
llvm-profdata merge -output=pgo.profdata default.profraw
clang -fprofile-instr-use=pgo.profdata -S -emit-llvm pgo.c -o pgo1.ll
It might be nice for LDC not to need the llvm-profdata tool. On the other hand, when the profiling data is in a standard LLVM format, all of LLVM's tools can be used to interpret it.
pgo.c
int main(int argc, const char *argv[]) {
  if (argc > 1)
    return 0;
  else
    return 1;
}
LLVM IR with clang -fprofile-instr-generate
Unimportant IR has been stripped.
@__llvm_profile_name_main = private constant [4 x i8] c"main", section "__DATA,__llvm_prf_names", align 1
@__llvm_profile_counters_main = private global [2 x i64] zeroinitializer, section "__DATA,__llvm_prf_cnts", align 8
@__llvm_profile_data_main = private constant { i32, i32, i64, i8*, i64* } { i32 4, i32 2, i64 10, i8* getelementptr inbounds ([4 x i8], [4 x i8]* @__llvm_profile_name_main, i32 0, i32 0), i64* getelementptr inbounds ([2 x i64], [2 x i64]* @__llvm_profile_counters_main, i32 0, i32 0) }, section "__DATA,__llvm_prf_data", align 8
@__llvm_profile_runtime = external global i32

; Function Attrs: nounwind ssp uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
entry:
  %retval = alloca i32, align 4
  %argc.addr = alloca i32, align 4
  %argv.addr = alloca i8**, align 8
  store i32 0, i32* %retval
  store i32 %argc, i32* %argc.addr, align 4
  store i8** %argv, i8*** %argv.addr, align 8
  %pgocount = load i64, i64* getelementptr inbounds ([2 x i64], [2 x i64]* @__llvm_profile_counters_main, i64 0, i64 0)
  %0 = add i64 %pgocount, 1
  store i64 %0, i64* getelementptr inbounds ([2 x i64], [2 x i64]* @__llvm_profile_counters_main, i64 0, i64 0)
  %1 = load i32, i32* %argc.addr, align 4
  %cmp = icmp sgt i32 %1, 1
  br i1 %cmp, label %if.then, label %if.else
if.then:                                  ; preds = %entry
  %pgocount1 = load i64, i64* getelementptr inbounds ([2 x i64], [2 x i64]* @__llvm_profile_counters_main, i64 0, i64 1)
  %2 = add i64 %pgocount1, 1
  store i64 %2, i64* getelementptr inbounds ([2 x i64], [2 x i64]* @__llvm_profile_counters_main, i64 0, i64 1)
  store i32 0, i32* %retval
  br label %return
if.else:                                  ; preds = %entry
  store i32 1, i32* %retval
  br label %return
return:                                   ; preds = %if.else, %if.then
  %3 = load i32, i32* %retval
  ret i32 %3
}

; Function Attrs: nounwind
declare void @llvm.instrprof.increment(i8*, i64, i32, i32) #1

; Function Attrs: noinline
define linkonce_odr hidden i32 @__llvm_profile_runtime_user() #2 {
  %1 = load i32, i32* @__llvm_profile_runtime
  ret i32 %1
}
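In C terms, the instrumented `main` above roughly corresponds to the following hand-written sketch. The struct mirrors the IR type `{ i32, i32, i64, i8*, i64* }` of `@__llvm_profile_data_main`, but its field names are assumptions, and `instrumented_main` and `derived_else_count` are hypothetical names (the rename avoids clashing with a real `main`). The condition follows the IR (`icmp sgt i32 %1, 1`), i.e. `argc > 1`.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors { i32, i32, i64, i8*, i64* }; field names are assumptions. */
typedef struct {
    uint32_t name_size;    /* i32 4: length of "main" */
    uint32_t num_counters; /* i32 2: entry counter + "then" counter */
    uint64_t func_hash;    /* i64 10: function hash computed by Clang */
    const char *name;      /* -> @__llvm_profile_name_main */
    uint64_t *counters;    /* -> @__llvm_profile_counters_main */
} profile_data;

/* [4 x i8] c"main": four bytes, no terminating NUL. */
static const char profile_name_main[4] = {'m', 'a', 'i', 'n'};
static uint64_t profile_counters_main[2];

const profile_data profile_data_main = {
    4, 2, 10, profile_name_main, profile_counters_main
};

/* Hand-instrumented equivalent of pgo.c's main. */
int instrumented_main(int argc) {
    profile_counters_main[0] += 1; /* counter 0: function entry */
    if (argc > 1) {
        profile_counters_main[1] += 1; /* counter 1: "then" branch */
        return 0;
    }
    return 1;
}

/* The "else" branch has no counter of its own; its count is derived. */
uint64_t derived_else_count(const profile_data *d) {
    assert(d->num_counters == 2 && d->counters[1] <= d->counters[0]);
    return d->counters[0] - d->counters[1];
}
```

This is why two counters suffice for a function with three basic blocks: the `else` count is the entry count minus the `then` count.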
pgo0.ll, LLVM IR after clang -fprofile-instr-use
; Function Attrs: inlinehint nounwind ssp uwtable
define i32 @main(i32 %argc, i8** %argv) #0 !prof !2 {
entry:
  %retval = alloca i32, align 4
  %argc.addr = alloca i32, align 4
  %argv.addr = alloca i8**, align 8
  store i32 0, i32* %retval
  store i32 %argc, i32* %argc.addr, align 4
  store i8** %argv, i8*** %argv.addr, align 8
  %0 = load i32, i32* %argc.addr, align 4
  %cmp = icmp sgt i32 %0, 1
  br i1 %cmp, label %if.then, label %if.else, !prof !3
if.then:                                  ; preds = %entry
  store i32 0, i32* %retval
  br label %return
if.else:                                  ; preds = %entry
  store i32 1, i32* %retval
  br label %return
return:                                   ; preds = %if.else, %if.then
  %1 = load i32, i32* %retval
  ret i32 %1
}

!2 = !{!"function_entry_count", i64 1}
!3 = !{!"branch_weights", i32 1, i32 2}
pgo1.ll, LLVM IR after clang -fprofile-instr-use
The file is identical to pgo0.ll except the last line:
!3 = !{!"branch_weights", i32 2, i32 1}
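The numbers in the `!prof` metadata follow from the two counters. The sketch below is modeled on how I understand Clang's CodeGenPGO to compute branch weights: the `else` count is the entry count minus the `then` count, counts are scaled down only if they would not fit in 32 bits, and 1 is added to every weight so that no weight is zero. The function and type names are hypothetical. With the counters from `./pgo` (entry 1, then 0) it yields `branch_weights 1, 2`, matching pgo0.ll; with the counters from `./pgo 1` (entry 1, then 1) it yields `2, 1`, matching pgo1.ll. `function_entry_count` is simply the raw entry counter.

```c
#include <assert.h>
#include <stdint.h>

/* Divisor that makes the largest count fit in 32 bits (1 if it already fits). */
static uint64_t weight_scale(uint64_t max_weight) {
    return max_weight < UINT32_MAX ? 1 : max_weight / UINT32_MAX + 1;
}

/* Scale a raw count and add 1 so that a never-taken branch gets weight 1. */
static uint32_t scale_weight(uint64_t weight, uint64_t scale) {
    assert(scale != 0);
    uint64_t scaled = weight / scale + 1;
    assert(scaled <= UINT32_MAX);
    return (uint32_t)scaled;
}

typedef struct {
    uint32_t then_weight;
    uint32_t else_weight;
} branch_weights;

/* entry_count and then_count are the two counters of the if-statement's function. */
branch_weights if_branch_weights(uint64_t entry_count, uint64_t then_count) {
    assert(then_count <= entry_count);
    uint64_t else_count = entry_count - then_count;
    uint64_t max = then_count > else_count ? then_count : else_count;
    uint64_t scale = weight_scale(max);
    branch_weights w = { scale_weight(then_count, scale),
                         scale_weight(else_count, scale) };
    return w;
}
```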