Workflow
Source Code Generation
Generates new benchmarks, with new parameter calls, from pre-existing benchmarks. Each generated benchmark corresponds to one input knob ID, contains calls to the online profiler, and prints the total time and energy usage at the end of the run. Source code: llvm/projects/hpvm-tensor-rt/code_autogenerators/source_code_autogenerator.py
Usage:
python source_code_autogenerator.py <table file> <original filenames file> [per_tensor]
- table file: file containing the table of parameters to be changed (see "Table" section for more info)
- original filenames file: file containing newline-separated names of the files to generate code from. A simple example:
../dnn_sources/src/half/profiling/alexnet2_cifar10_half_profiling.cc
../dnn_sources/src/half/profiling/alexnet_cifar10_half_profiling.cc
../dnn_sources/src/half/profiling/mobilenet_depthwise_half_profiling.cc
../dnn_sources/src/half/profiling/mobilenet_shallow_depthwise_half_profiling.cc
../dnn_sources/src/half/profiling/resnet18_cifar10_half_profiling.cc
- per_tensor is an optional parameter. If "per_tensor" is included, the code autogenerator inserts profile calls around each tensor operation, which is usually what is desired, and outputs a list of all tensor calls with their corresponding times/energies (a sketch follows this usage block). If "per_tensor" is not included, the code autogenerator inserts profile calls only at the beginning and end of the entire benchmark.

python source_code_autogenerator.py clean
- Deletes all autogenerated files, excluding the autogenerated CMakeLists.txt file
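For intuition, here is a rough sketch of what the per_tensor rewrite does. This is not the autogenerator's actual implementation, and startProfiling/stopProfiling are hypothetical stand-ins for the online profiler's real entry points, which this page does not name:

import re

# Match a line whose statement is a call to a tensor* routine.
TENSOR_CALL = re.compile(r'^(\s*)(?:\w+\s*=\s*)?tensor\w+\(.*\);\s*$')

def insert_per_tensor_profiling(source_lines):
    out = []
    for line in source_lines:
        match = TENSOR_CALL.match(line)
        if match:
            indent = match.group(1)
            # Wrap the tensor operation so its time/energy is
            # recorded separately from every other operation.
            out.append(indent + 'startProfiling();')
            out.append(line.rstrip('\n'))
            out.append(indent + 'stopProfiling();')
        else:
            out.append(line.rstrip('\n'))
    return out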
Table
Format: approx_type,knob_id additional_param1,additional_param2,... 0 old_function_name new_function_name
- Note that the parameters in the table file are ADDITIONAL parameters to be added to the function calls.
- The current approx_types supported are: fp32 (copies the source code over and doesn't modify it), fp16 (converts all fp32 calls to fp16 calls by replacing tensor with tensorHalf), perf (knob ids 20 - 29), and samp (knob ids 31 - 36). Adding additional approximation types requires changing the source code. Example table entries:
samp,31 1,1,2,0 1.88 tensorHalfConvolution tensorConvApproxHalf
samp,32 1,1,2,1 1.88 tensorHalfConvolution tensorConvApproxHalf
samp,33 1,1,4,0 1.88 tensorHalfConvolution tensorConvApproxHalf
samp,34 1,1,4,1 1.88 tensorHalfConvolution tensorConvApproxHalf
samp,35 1,1,4,2 1.88 tensorHalfConvolution tensorConvApproxHalf
samp,36 1,1,4,3 1.88 tensorHalfConvolution tensorConvApproxHalf
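For reference, a single table line can be split into its fields roughly as follows. This is a minimal sketch: the field labels are mine, and I am assuming the numeric third column (0 in the format line, 1.88 in the samp entries above) is a per-knob constant such as an expected speedup:

def parse_knob_line(line):
    # Example line:
    #   samp,31 1,1,2,0 1.88 tensorHalfConvolution tensorConvApproxHalf
    type_and_id, params, number, old_fn, new_fn = line.split()
    approx_type, knob_id = type_and_id.split(',')
    return {
        'approx_type': approx_type,              # fp32, fp16, perf, or samp
        'knob_id': int(knob_id),
        'additional_params': params.split(','),  # ADDITIONAL args appended to the call
        'numeric_field': float(number),          # assumed: expected speedup
        'old_function_name': old_fn,             # call to be replaced
        'new_function_name': new_fn,             # replacement call
    }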
Output
For each file inputted (in the original filenames file), the code autogenerator creates a directory called <original_source_name>_autogenerated_knobs in the same directory as source_code_autogenerator.py. This directory contains files named <original_source_name>_<id>.txt, where the id corresponds to the knob id. Note: the code autogenerator handles local include paths by converting them to global paths, as sketched below.
Example usage:
python source_code_autogenerator.py knob_config_fp16_knobs_31_36.txt filenames_fp16_remainder.txt per_tensor
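A minimal sketch of what that include-path conversion could look like (hypothetical; the real script may do this differently):

import os
import re

# Local includes in the original source would break once the generated
# copy lives in a new directory, so rewrite them as absolute paths.
LOCAL_INCLUDE = re.compile(r'#include\s+"(.+)"')

def globalize_includes(source_text, original_source_dir):
    def to_global(match):
        abs_path = os.path.abspath(
            os.path.join(original_source_dir, match.group(1)))
        return '#include "%s"' % abs_path
    return LOCAL_INCLUDE.sub(to_global, source_text)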
CMakeLists.txt File Generation
Generates a CMakeLists.txt file for all generated files in a specific directory, based on a hardcoded CMakeLists.txt file path. Source code: llvm/projects/hpvm-tensor-rt/code_autogenerators/cmakelists_generator.py
Input
We can pass an arbitrarily long list of the names of the directories containing generated files.
- Ex: alexnet_cifar10_autogenerated_knobs mobilenet_cifar10_autogenerated_knobs
If no parameters are given, this code generator creates the CMakeLists.txt file for all generated files (all directories ending with "autogenerated_knobs") in the current directory. This second approach works for generating a CMakeLists.txt file for the generated sources described in the previous section.
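A rough sketch of that discovery-and-generation flow, assuming one executable target per generated source. The target naming, the tensor_runtime link library, and the overall structure are my own illustration, not the real script:

import glob
import os
import sys

def generate_cmakelists(dirs=None):
    if not dirs:
        # No arguments: pick up every *autogenerated_knobs directory.
        dirs = glob.glob('*autogenerated_knobs')
    lines = []
    for d in dirs:
        for src in sorted(glob.glob(os.path.join(d, '*.cc'))):
            target = os.path.splitext(os.path.basename(src))[0]
            lines.append('add_executable(%s %s)' % (target, src))
            # tensor_runtime stands in for whatever libraries the real
            # hardcoded CMakeLists.txt template links against.
            lines.append('target_link_libraries(%s tensor_runtime)' % target)
    return '\n'.join(lines)

if __name__ == '__main__':
    print(generate_cmakelists(sys.argv[1:]))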
Running Benchmarks
After generating all required files (see previous steps) and building the generated benchmarks, we can either run the binaries manually or use an automator I created for convenience. Source code: llvm/projects/hpvm-tensor-rt/code_autogenerators/benchmark_testing_automater.py
Usage:
python online_benchmark_testing_automator.py <builds_dir> <outputs_file_name> [per_tensor]
- builds_dir refers to the directory the binaries are in
- outputs_file_name refers to the name of the file containing all the profiling info (the output file)
- per_tensor MUST be set if the benchmarks were generated using the per_tensor parameter; it is needed to correctly parse the raw output of the binaries.
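A minimal sketch of what the automator's main loop could look like (assumed behavior; the real script's binary discovery and output format may differ):

import os
import subprocess
import sys

def run_benchmarks(builds_dir, outputs_file_name):
    # Run every executable in builds_dir and collect its stdout
    # (the raw profiling output) into a single outputs file.
    with open(outputs_file_name, 'w') as out:
        for name in sorted(os.listdir(builds_dir)):
            path = os.path.join(builds_dir, name)
            if not (os.path.isfile(path) and os.access(path, os.X_OK)):
                continue  # skip non-executables (e.g. build artifacts)
            out.write('** Benchmark: %s\n' % name)
            result = subprocess.run([path], capture_output=True, text=True)
            out.write(result.stdout)

if __name__ == '__main__':
    run_benchmarks(sys.argv[1], sys.argv[2])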
Generating the Table
After running all the benchmarks and getting the raw profiling data (which the runtime generates), we generate a table that behaves like a massive cache storing all the profiling data. We do this to avoid having to run every benchmark over again; instead, we can simply simulate the benchmarks using the table data. Source code: llvm/projects/soc_simulator/src/table_generator.py
IMPORTANT: THIS FILE SHOULD ONLY GENERATE THE TABLE BASED ON THE EXISTING PROFILE DATA FILES. IT SHOULD NOT RERUN THE BINARIES!!!
Input
Usage: python table_generator.py <benchmark_name> <binaries_dir> <soc_ops file> <num_itrs> <profiler_bin_path>
- benchmark_name: the name of the benchmark (ex: alexnet2); used to generate the output (see "Output" section)
- binaries_dir: the directory containing the binaries that were run. IMPORTANT: this path must be the same as the path to the directory containing all the binaries we generated and ran in the earlier step; the table generator reads all profiling files generated by the runtime and organizes them into a table.
- soc_ops file: ~/soc_simulator/<benchmark_name>_cifar10/<benchmark_name>_ops.txt
- num_itrs: the number of iterations to run the binaries for. This no longer matters because we are not running the binaries in this step.
- profiler_bin_path: path to the offline profiler. This also no longer matters because we are not running the binaries in this step.
Output
The table generator creates a directory called <benchmark_name>_results and a file within that directory called <benchmark_name>_tensors.txt, which contains the table. The table is outputted in the following format:
** LayerName NumOpsInLayer Col1Name Col2Name ...
OpName Col1Val Col2Val ...
For example:
** Conv1 1 h2f_time h2f_energy fp32_time fp32_energy f2h_time f2h_energy fp16_perf_time fp16_perf_energy fp16_time fp16_energy
Conv1 51.8808 97.2844 319.582 601.966 12.81 18.758 388.092 650.649 340.037 590.664
Example usage:
python ../../soc_simulator/src/table_generator.py lenet_keras/ 10 ~/awesome_profiler/pp
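A minimal sketch of reading the generated table back into a Python dict, assuming the header line lists the column names as in the example above (the field handling is my own):

def load_table(path):
    # Maps (layer, op) -> {column name: value}.
    table = {}
    columns = []
    layer = None
    with open(path) as f:
        for line in f:
            tokens = line.split()
            if not tokens:
                continue
            if tokens[0] == '**':
                # Header: layer name, op count, then the column names.
                layer, columns = tokens[1], tokens[3:]
            else:
                op, values = tokens[0], [float(v) for v in tokens[1:]]
                table[(layer, op)] = dict(zip(columns, values))
    return table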
SOC Simulator
Instead of rerunning each benchmark, we simulate the benchmark runs on the GPU/PROMISE, depending on the inputted autotuner file. Source code: llvm/projects/soc_simulator/src/driver_new_config_fp16_repl.py
Input
python driver_new_config_fp16_repl.py <layer_info> <tensors_info> <configurations> <results file>
- layer_info: contains info on the benchmark's layers (should be in ~/soc_simulator)
- tensors_info: the table file generated in the previous step
- configurations: the file outputted from the autotuner
- results file: the name of the results file
Output
A copy of the inputted autotuner config file is created. For each configuration, the simulator computes the relative speedup and energy reduction compared to the baseline (the fp32 or fp16 baseline, depending on the config). Then, we replace the autotuner's estimated speedup and energy reduction (the first line of each configuration) with the real speedup and energy reduction. IMPORTANT NOTE: THE FIRST CONFIGURATION IN THE INPUTTED CONFIGURATIONS FILE MUST BE FOR THE FP32 BASELINE -- the SoC simulator assumes that the first configuration refers to the fp32 baseline version, so all speedups/energy reductions will be off if the first configuration is an actual approximation config. A sketch of this baseline-relative computation appears after the example usage below.
Example usage:
python driver_new_config_fp16_repl.py ~/soc_simulator/alexnet2_cifar10/alexnet2_layers.txt ~/sd_card/HPVMApprox/tensor_tables/alexnet2_results/alexnet2_tensors.txt ~/Gitlab/hpvm/llvm/test/VISC/DNN_Benchmarks/benchmarks/alexnet2/data/autotuner_data/tuner_confs_batch220.txt ~/Gitlab/hpvm/llvm/test/VISC/DNN_Benchmarks/benchmarks/alexnet2/data/soc_data/tuner_confs_batch220.txt
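A minimal sketch of the baseline-relative arithmetic described above (assumed structure; the real driver also does per-layer table lookups and rewrites the config file in place):

def simulate_config(layer_times, layer_energies):
    # Each layer's simulated time/energy comes from the tensors table,
    # selected by the knob this configuration picked for that layer.
    return sum(layer_times), sum(layer_energies)

def relative_to_baseline(config_totals):
    # config_totals: list of (time, energy) pairs; the FIRST entry MUST
    # be the fp32 baseline, mirroring the requirement stated above.
    base_time, base_energy = config_totals[0]
    return [(base_time / t, base_energy / e) for t, e in config_totals]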