SPEC2017 Testing Guide v1.0

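Quick Start and Installation

  1. Download SPEC2017
    https://pan.baidu.com/s/1kMoMJ5Ufg5oZql4HjyacAg&pwd=5thr

  2. Upload

    Upload cpu2017-1.0.5.iso via sftp to a directory on the test machine.

  3. Create the installation directory
    mkdir -p /home/speccpu2017

  4. Mount the image
    mount -o loop cpu2017-1.0.5.iso /mnt/

  5. Install from the image into the home directory
    cd /mnt/ && ./install.sh
    When prompted, enter /home/speccpu2017 as the destination, then answer yes.

  6. Configure SPEC2017 (the .cfg file is the configuration file; a manually built LLVM is the compiler under test here, see the appendix)

    cd /home/speccpu2017/ && source shrc && cd config && cp Example-clang-llvm-linux-x86.cfg x86-llvm.cfg

  7. Adjust the parameters
    vim x86-llvm.cfg
    Review every line marked EDIT (the changes are shown in the appendix). Some values can be read from lscpu. Make sure the OpenMP and Fortran settings are configured.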
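Step 7 takes the core, thread and NUMA node counts from lscpu. The following is a minimal sketch of deriving the three %define values used in the config file. The function name spec_sizes_from_lscpu is hypothetical, and the sketch assumes the English lscpu field labels "CPU(s)", "Core(s) per socket", "Socket(s)" and "NUMA node(s)".

```shell
#!/bin/bash
# Hypothetical helper: read lscpu text on stdin and print the three %define
# lines for the config file. Assumes English lscpu field labels.
spec_sizes_from_lscpu() {
  awk -F: '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    { key = trim($1); val = trim($2) }
    key == "CPU(s)"             { threads = val }
    key == "Core(s) per socket" { cps     = val }
    key == "Socket(s)"          { sockets = val }
    key == "NUMA node(s)"       { nodes   = val }
    END {
      printf "%%define  cpucores       %d\n", cps * sockets
      printf "%%define  cputhreads     %d\n", threads
      printf "%%define  numanodes      %d\n", nodes
    }'
}

# Typical use on the test machine (LC_ALL=C forces English labels):
#   LC_ALL=C lscpu | spec_sizes_from_lscpu
```

The printed lines can be compared against the "EDIT to define system sizes" block of the config before pasting them in.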

Basic Usage

Dependencies

  • numactl, numactl-libs, perl, gcc, gfortran
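A quick way to confirm the command-line tools from this list are on PATH before building is sketched below. The function name missing_tools is hypothetical. numactl-libs is a library package rather than a command, so it is not covered by this check and has to be verified through the package manager.

```shell
#!/bin/bash
# Print every command from the argument list that is not found on PATH.
# Returns non-zero if anything is missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Typical use before building:
#   missing_tools numactl perl gcc gfortran || echo "install the tools listed above"
```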

Source Fixes for Newer Compilers

Some of the benchmark sources are outdated and must be patched:

  • ${SPEC2017_source}/benchspec/CPU/510.parest_r/src/source/base/parameter_handler.cc
    Line 750 should read: AssertThrow ((s.c_str()[0] != '\0') || (*endptr == '\0'),

  • ${SPEC2017_source}/benchspec/CPU/527.cam4_r/src/mpi.c

    In vim, run :%s/FORT_NAME/int FORT_NAME/g
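The vim substitution for mpi.c can also be scripted. The sketch below applies the same FORT_NAME replacement with sed and keeps a backup. The function name patch_fort_name and the .orig backup suffix are hypothetical choices, not part of SPEC.

```shell
#!/bin/bash
# Hypothetical helper: apply the FORT_NAME substitution to one file, keeping
# the original as <file>.orig. Skips the edit if the backup already exists,
# so a second invocation does not produce "int int FORT_NAME".
patch_fort_name() {
  local file=$1
  if [ -e "$file.orig" ]; then
    echo "already patched: $file" >&2
    return 0
  fi
  cp -- "$file" "$file.orig" &&
    sed 's/FORT_NAME/int FORT_NAME/g' "$file.orig" > "$file"
}

# Typical use:
#   patch_fort_name "${SPEC2017_source}/benchspec/CPU/527.cam4_r/src/mpi.c"
```

Keeping the .orig copy makes it easy to diff the patched file against the shipped source.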

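  1. Build a chosen set of benchmarks
    runcpu --config=x86-llvm.cfg --action=build all
    The target may be all, or any of fprate, fpspeed, intrate, intspeed.

  2. Run a performance test
    runcpu --config=x86-llvm.cfg -s specspeed_int_base

  3. Build and run all benchmarks
    runcpu --config=x86-llvm-yz-basic.cfg --size=ref --tune=base --action=run all

  4. Build and run a single benchmark
    runcpu --config=x86-llvm-yz-basic.cfg --tune=base 628
    The number 628 may be replaced with the full benchmark name.

  5. Full unattended base run in the background
    nohup time -p runcpu --tune=base --rebuild --config=x86-llvm-yz-basic.cfg all >log 2>&1 &

    * Note: base is the baseline tuning and peak is the aggressive tuning; peak builds currently still hit compile bugs.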
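The build and run steps above can be chained so that a failed build stops the sequence before anything is timed. This is a minimal sketch that uses only runcpu options already shown in this guide. The function name spec_build_then_run and the RUNCPU override variable are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: build first, then run, so that a failed build stops
# the sequence before any timing starts. RUNCPU may be overridden (e.g. with
# "echo runcpu") to preview the commands without executing them.
spec_build_then_run() {
  local config=$1 tune=$2 target=$3
  local runcpu=${RUNCPU:-runcpu}
  # $runcpu is intentionally unquoted so "echo runcpu" splits into two words.
  $runcpu --config="$config" --tune="$tune" --action=build "$target" &&
    $runcpu --config="$config" --tune="$tune" --size=ref --action=run "$target"
}

# Preview only:
#   RUNCPU="echo runcpu" spec_build_then_run x86-llvm.cfg base intrate
```

The real invocation assumes shrc has been sourced in the current shell, as in step 6 of the installation.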

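Advanced Usage

Combining with perf (note: check the Perl version, since runcpu is written in Perl)

perf stat -e instructions,<other events> runcpu --config=x86-llvm-yz.cfg --size=test --iterations=1 --fake --tune=base 619.lbm_s

This method also counts the build phase in the measurement (building beforehand with --action=build, as above, mitigates this).

The script below is recommended for measuring each benchmark separately under perf.

How the commands were found: the output of "runcpu --config=x86-llvm-yz-basic.cfg --size=ref --tune=base --action=run all > 1" (redirected to a file named 1) contains the run commands.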

#!/bin/bash
# Remove all existing results and build artifacts
rm -f /data/speccpu2017/result/*
cd /data/speccpu2017 && rm -f nohup.out perf_result.txt clean.log build.log
# find ./benchspec/CPU -type d -name "run_base*mytest-m64.*" -exec rm -rf {} +
runcpu --action=clean --tune=base --config=x86-llvm-yz-basic.cfg all > /data/speccpu2017/clean.log 2>&1
# Build all benchmarks
runcpu --action=build --tune=base --rebuild --config=x86-llvm-yz-basic.cfg all > /data/speccpu2017/build.log 2>&1
# 557.xz_r has a checksum issue and cannot be extracted and run on its own

output_file="/data/speccpu2017/perf_result.txt"

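# Run a benchmark under perf stat, pinned to one CPU and NUMA node with numactl.
# "0<&-" is placed first so stdin is closed by default while a later "< file"
# inside $command (e.g. 503.bwaves_r) still takes effect; redirections are
# applied left to right, so a trailing "0<&-" would close that input again.
# Note: the config in Appendix B maps CPU 39 to NUMA node 1; adjust -m and
# --physcpubind to match the topology of the machine.
run_with_perf_numactl() {
  local benchmark_name=$1
  local command=$2
  echo "Running $benchmark_name..." >> "$output_file"
  perf stat -e cycles,instructions --append --output="$output_file" numactl -m 0 --physcpubind=39 bash -c "0<&- $command > /dev/null 2>> /data/speccpu2017/error.log"
  echo "" >> "$output_file"
}

# Run a benchmark under perf stat without numactl pinning.
run_with_perf_direct() {
  local benchmark_name=$1
  local command=$2
  echo "Running $benchmark_name..." >> "$output_file"
  perf stat -e cycles,instructions --append --output="$output_file" bash -c "0<&- $command > /dev/null 2>> /dev/null"
  echo "" >> "$output_file"
}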

######### SPECrate2017 Integer #########
# 500.perlbench_r 
cd /data/speccpu2017/benchspec/CPU/500.perlbench_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "500.perlbench_r" "../run_base_refrate_mytest-m64.0000/perlbench_r_base.mytest-m64 -I./lib checkspam.pl 2500 5 25 11 150 1 1 1 1"

# 502.gcc_r
cd /data/speccpu2017/benchspec/CPU/502.gcc_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "502.gcc_r" "../run_base_refrate_mytest-m64.0000/cpugcc_r_base.mytest-m64 gcc-pp.c -O3 -finline-limit=0 -fif-conversion -fif-conversion2 -o gcc-pp.opts-O3_-finline-limit_0_-fif-conversion_-fif-conversion2.s"

# 505.mcf_r
cd /data/speccpu2017/benchspec/CPU/505.mcf_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "505.mcf_r" "../run_base_refrate_mytest-m64.0000/mcf_r_base.mytest-m64 inp.in"

# 520.omnetpp_r
cd /data/speccpu2017/benchspec/CPU/520.omnetpp_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "520.omnetpp_r" "../run_base_refrate_mytest-m64.0000/omnetpp_r_base.mytest-m64 -c General -r 0"

# 523.xalancbmk_r
cd /data/speccpu2017/benchspec/CPU/523.xalancbmk_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "523.xalancbmk_r" "../run_base_refrate_mytest-m64.0000/cpuxalan_r_base.mytest-m64 -v t5.xml xalanc.xsl"

# 525.x264_r
cd /data/speccpu2017/benchspec/CPU/525.x264_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "525.x264_r" "../run_base_refrate_mytest-m64.0000/x264_r_base.mytest-m64 --pass 1 --stats x264_stats.log --bitrate 1000 --frames 1000 -o BuckBunny_New.264 BuckBunny.yuv 1280x720"

# 531.deepsjeng_r
cd /data/speccpu2017/benchspec/CPU/531.deepsjeng_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "531.deepsjeng_r" "../run_base_refrate_mytest-m64.0000/deepsjeng_r_base.mytest-m64 ref.txt"

# 541.leela_r
cd /data/speccpu2017/benchspec/CPU/541.leela_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "541.leela_r" "../run_base_refrate_mytest-m64.0000/leela_r_base.mytest-m64 ref.sgf"

# 548.exchange2_r
cd /data/speccpu2017/benchspec/CPU/548.exchange2_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "548.exchange2_r" "../run_base_refrate_mytest-m64.0000/exchange2_r_base.mytest-m64 6"

######### SPECrate2017 Floating Point #########
# 503.bwaves_r
cd /data/speccpu2017/benchspec/CPU/503.bwaves_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "503.bwaves_r (bwaves_1)" "../run_base_refrate_mytest-m64.0000/bwaves_r_base.mytest-m64 bwaves_1 < bwaves_1.in"
run_with_perf_numactl "503.bwaves_r (bwaves_2)" "../run_base_refrate_mytest-m64.0000/bwaves_r_base.mytest-m64 bwaves_2 < bwaves_2.in"
run_with_perf_numactl "503.bwaves_r (bwaves_3)" "../run_base_refrate_mytest-m64.0000/bwaves_r_base.mytest-m64 bwaves_3 < bwaves_3.in"
run_with_perf_numactl "503.bwaves_r (bwaves_4)" "../run_base_refrate_mytest-m64.0000/bwaves_r_base.mytest-m64 bwaves_4 < bwaves_4.in"

# 507.cactuBSSN_r
cd /data/speccpu2017/benchspec/CPU/507.cactuBSSN_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "507.cactuBSSN_r" "../run_base_refrate_mytest-m64.0000/cactusBSSN_r_base.mytest-m64 spec_ref.par"

# 508.namd_r
cd /data/speccpu2017/benchspec/CPU/508.namd_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "508.namd_r" "../run_base_refrate_mytest-m64.0000/namd_r_base.mytest-m64 --input apoa1.input --output apoa1.ref.output --iterations 65"

# 510.parest_r
cd /data/speccpu2017/benchspec/CPU/510.parest_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "510.parest_r" "../run_base_refrate_mytest-m64.0000/parest_r_base.mytest-m64 ref.prm"

# 511.povray_r
cd /data/speccpu2017/benchspec/CPU/511.povray_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "511.povray_r" "../run_base_refrate_mytest-m64.0000/povray_r_base.mytest-m64 SPEC-benchmark-ref.ini"

# 519.lbm_r
cd /data/speccpu2017/benchspec/CPU/519.lbm_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "519.lbm_r" "../run_base_refrate_mytest-m64.0000/lbm_r_base.mytest-m64 3000 reference.dat 0 0 100_100_130_ldc.of"

# 521.wrf_r
cd /data/speccpu2017/benchspec/CPU/521.wrf_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "521.wrf_r" "../run_base_refrate_mytest-m64.0000/wrf_r_base.mytest-m64"

# 526.blender_r
cd /data/speccpu2017/benchspec/CPU/526.blender_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "526.blender_r" "../run_base_refrate_mytest-m64.0000/blender_r_base.mytest-m64 sh3_no_char.blend --render-output sh3_no_char_ --threads 1 -b -F RAWTGA -s 849 -e 849 -a"

# 527.cam4_r
cd /data/speccpu2017/benchspec/CPU/527.cam4_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "527.cam4_r" "../run_base_refrate_mytest-m64.0000/cam4_r_base.mytest-m64"

# 538.imagick_r
cd /data/speccpu2017/benchspec/CPU/538.imagick_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "538.imagick_r" "../run_base_refrate_mytest-m64.0000/imagick_r_base.mytest-m64 -limit disk 0 refrate_input.tga -edge 41 -resample 181% -emboss 31 -colorspace YUV -mean-shift 19x19+15% -resize 30% refrate_output.tga"

# 544.nab_r
cd /data/speccpu2017/benchspec/CPU/544.nab_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "544.nab_r" "../run_base_refrate_mytest-m64.0000/nab_r_base.mytest-m64 1am0 1122214447 122"

# 549.fotonik3d_r
cd /data/speccpu2017/benchspec/CPU/549.fotonik3d_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "549.fotonik3d_r" "../run_base_refrate_mytest-m64.0000/fotonik3d_r_base.mytest-m64"

# 554.roms_r
cd /data/speccpu2017/benchspec/CPU/554.roms_r/run/run_base_refrate_mytest-m64.0000
run_with_perf_numactl "554.roms_r" "../run_base_refrate_mytest-m64.0000/roms_r_base.mytest-m64 < ocean_benchmark2.in.x"

######### SPECspeed2017 Integer #########
# 600.perlbench_s
cd /data/speccpu2017/benchspec/CPU/600.perlbench_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "600.perlbench_s (checkspam)" "../run_base_refspeed_mytest-m64.0000/perlbench_s_base.mytest-m64 -I./lib checkspam.pl 2500 5 25 11 150 1 1 1 1"
run_with_perf_direct "600.perlbench_s (diffmail)" "../run_base_refspeed_mytest-m64.0000/perlbench_s_base.mytest-m64 -I./lib diffmail.pl 4 800 10 17 19 300"
run_with_perf_direct "600.perlbench_s (splitmail)" "../run_base_refspeed_mytest-m64.0000/perlbench_s_base.mytest-m64 -I./lib splitmail.pl 6400 12 26 16 100 0"

# 602.gcc_s
cd /data/speccpu2017/benchspec/CPU/602.gcc_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "602.gcc_s (opts-O5_-fipa-pta)" "../run_base_refspeed_mytest-m64.0000/sgcc_base.mytest-m64 gcc-pp.c -O5 -fipa-pta -o gcc-pp.opts-O5_-fipa-pta.s"
run_with_perf_direct "602.gcc_s (opts-O5_-finline-limit_1000_-fselective-scheduling_-fselective-scheduling2)" "../run_base_refspeed_mytest-m64.0000/sgcc_base.mytest-m64 gcc-pp.c -O5 -finline-limit=1000 -fselective-scheduling -fselective-scheduling2 -o gcc-pp.opts-O5_-finline-limit_1000_-fselective-scheduling_-fselective-scheduling2.s"
run_with_perf_direct "602.gcc_s (opts-O5_-finline-limit_24000_-fgcse_-fgcse-las_-fgcse-lm_-fgcse-sm)" "../run_base_refspeed_mytest-m64.0000/sgcc_base.mytest-m64 gcc-pp.c -O5 -finline-limit=24000 -fgcse -fgcse-las -fgcse-lm -fgcse-sm -o gcc-pp.opts-O5_-finline-limit_24000_-fgcse_-fgcse-las_-fgcse-lm_-fgcse-sm.s"

# 605.mcf_s
cd /data/speccpu2017/benchspec/CPU/605.mcf_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "605.mcf_s" "../run_base_refspeed_mytest-m64.0000/mcf_s_base.mytest-m64 inp.in"

# 620.omnetpp_s
cd /data/speccpu2017/benchspec/CPU/620.omnetpp_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "620.omnetpp_s" "../run_base_refspeed_mytest-m64.0000/omnetpp_s_base.mytest-m64 -c General -r 0"

# 623.xalancbmk_s
cd /data/speccpu2017/benchspec/CPU/623.xalancbmk_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "623.xalancbmk_s" "../run_base_refspeed_mytest-m64.0000/xalancbmk_s_base.mytest-m64 -v t5.xml xalanc.xsl"

# 625.x264_s
cd /data/speccpu2017/benchspec/CPU/625.x264_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "625.x264_s (pass 1)" "../run_base_refspeed_mytest-m64.0000/x264_s_base.mytest-m64 --pass 1 --stats x264_stats.log --bitrate 1000 --frames 1000 -o BuckBunny_New.264 BuckBunny.yuv 1280x720"
run_with_perf_direct "625.x264_s (pass 2)" "../run_base_refspeed_mytest-m64.0000/x264_s_base.mytest-m64 --pass 2 --stats x264_stats.log --bitrate 1000 --dumpyuv 200 --frames 1000 -o BuckBunny_New.264 BuckBunny.yuv 1280x720"
run_with_perf_direct "625.x264_s (seek)" "../run_base_refspeed_mytest-m64.0000/x264_s_base.mytest-m64 --seek 500 --dumpyuv 200 --frames 1250 -o BuckBunny_New.264 BuckBunny.yuv 1280x720"

# 631.deepsjeng_s
cd /data/speccpu2017/benchspec/CPU/631.deepsjeng_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "631.deepsjeng_s" "../run_base_refspeed_mytest-m64.0000/deepsjeng_s_base.mytest-m64 ref.txt"

# 641.leela_s
cd /data/speccpu2017/benchspec/CPU/641.leela_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "641.leela_s" "../run_base_refspeed_mytest-m64.0000/leela_s_base.mytest-m64 ref.sgf"

# 648.exchange2_s
cd /data/speccpu2017/benchspec/CPU/648.exchange2_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "648.exchange2_s" "../run_base_refspeed_mytest-m64.0000/exchange2_s_base.mytest-m64 6"

######### SPECspeed2017 Floating Point #########
# 603.bwaves_s
cd /data/speccpu2017/benchspec/CPU/603.bwaves_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "603.bwaves_s (bwaves_1)" "../run_base_refspeed_mytest-m64.0000/speed_bwaves_base.mytest-m64 bwaves_1 < bwaves_1.in"
run_with_perf_direct "603.bwaves_s (bwaves_2)" "../run_base_refspeed_mytest-m64.0000/speed_bwaves_base.mytest-m64 bwaves_2 < bwaves_2.in"

# 607.cactuBSSN_s
cd /data/speccpu2017/benchspec/CPU/607.cactuBSSN_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "607.cactuBSSN_s" "../run_base_refspeed_mytest-m64.0000/cactuBSSN_s_base.mytest-m64 spec_ref.par"

# 619.lbm_s
cd /data/speccpu2017/benchspec/CPU/619.lbm_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "619.lbm_s" "../run_base_refspeed_mytest-m64.0000/lbm_s_base.mytest-m64 2000 reference.dat 0 0 200_200_260_ldc.of"

# 621.wrf_s
cd /data/speccpu2017/benchspec/CPU/621.wrf_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "621.wrf_s" "../run_base_refspeed_mytest-m64.0000/wrf_s_base.mytest-m64"

# 627.cam4_s
cd /data/speccpu2017/benchspec/CPU/627.cam4_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "627.cam4_s" "../run_base_refspeed_mytest-m64.0000/cam4_s_base.mytest-m64"

# 628.pop2_s
cd /data/speccpu2017/benchspec/CPU/628.pop2_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "628.pop2_s" "../run_base_refspeed_mytest-m64.0000/speed_pop2_base.mytest-m64"

# 638.imagick_s
cd /data/speccpu2017/benchspec/CPU/638.imagick_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "638.imagick_s" "../run_base_refspeed_mytest-m64.0000/imagick_s_base.mytest-m64 -limit disk 0 refspeed_input.tga -resize 817% -rotate -2.76 -shave 540x375 -alpha remove -auto-level -contrast-stretch 1x1% -colorspace Lab -channel R -equalize +channel -colorspace sRGB -define histogram:unique-colors=false -adaptive-blur 0x5 -despeckle -auto-gamma -adaptive-sharpen 55 -enhance -brightness-contrast 10x10 -resize 30% refspeed_output.tga"

# 644.nab_s
cd /data/speccpu2017/benchspec/CPU/644.nab_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "644.nab_s" "../run_base_refspeed_mytest-m64.0000/nab_s_base.mytest-m64 3j1n 20140317 220"

# 649.fotonik3d_s
cd /data/speccpu2017/benchspec/CPU/649.fotonik3d_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "649.fotonik3d_s" "../run_base_refspeed_mytest-m64.0000/fotonik3d_s_base.mytest-m64"

# 654.roms_s
cd /data/speccpu2017/benchspec/CPU/654.roms_s/run/run_base_refspeed_mytest-m64.0000
run_with_perf_direct "654.roms_s" "../run_base_refspeed_mytest-m64.0000/sroms_base.mytest-m64 < ocean_benchmark3.in"
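Once the script has finished, the perf log can be reduced to one instructions-per-cycle figure per benchmark. The sketch below assumes the default perf stat text layout, with plain event names ("cycles", "instructions") in the second column and a "seconds time elapsed" line closing each measurement. The function name ipc_from_perf_log is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: summarize a perf stat log written by the script above.
# Prints "<benchmark><TAB><instructions per cycle>" for each measured command.
ipc_from_perf_log() {
  awk '
    /^Running / {                      # label line written by the script
      name = $0
      sub(/^Running /, "", name)
      sub(/\.\.\.$/, "", name)
    }
    $2 == "cycles"       { gsub(/,/, "", $1); cycles = $1 + 0 }
    $2 == "instructions" { gsub(/,/, "", $1); insns  = $1 + 0 }
    /seconds time elapsed/ {           # end of one perf stat report
      if (cycles > 0) printf "%s\t%.2f\n", name, insns / cycles
      cycles = 0; insns = 0
    }' "$1"
}

# Typical use:
#   ipc_from_perf_log /data/speccpu2017/perf_result.txt
```

If perf reports the events with a modifier suffix or as not counted, the corresponding benchmark is simply left out of the summary.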

Script that runs 619.lbm_s under perf and averages the results (the number of runs is passed as an argument)

#!/bin/bash
# Reference compile commands for 619.lbm_s:
#/home/yangz//llvm/build/bin/clang -m64 -c -o lbm.o -DSPEC -DNDEBUG -DLARGE_WORKLOAD   -O2 -mavx -mllvm --misched-topdown          -DSPEC_OPENMP -fopenmp -Wno-return-type -DUSE_OPENMP -I /usr/lib/gcc/x86_64-redhat-linux/8/include/    -DSPEC_LP64  lbm.c
#/home/yangz//llvm/build/bin/clang -m64 -c -o main.o -DSPEC -DNDEBUG -DLARGE_WORKLOAD   -O2 -mavx -mllvm --misched-topdown          -DSPEC_OPENMP -fopenmp -Wno-return-type -DUSE_OPENMP -I /usr/lib/gcc/x86_64-redhat-linux/8/include/    -DSPEC_LP64  main.c

# Check arguments
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <number of runs>"
    exit 1
fi

# Argument: number of runs
NUM_RUNS=$1

# Enter the SPEC CPU run directory; stop if it does not exist
cd /home/speccpu2017/benchspec/CPU/619.lbm_s/run/run_base_test_mytest-m64.0000 || exit 1

# Initialize the accumulators
total_instructions=0
total_cycles=0
total_time=0

# Write the header line
echo "Run Number, Instructions, CPU Cycles, Time Elapsed (seconds)" > perf_results.txt

# Run perf stat repeatedly and accumulate the results
for (( i=1; i<=NUM_RUNS; i++ ))
do
    perf stat -e instructions,cpu-cycles ./lbm_s_base.mytest-m64 20 reference.dat 0 1 200_200_260_ldc.of 0<&- > lbm.out 2>perf_output.tmp

    # Extract the instruction count, CPU cycle count and elapsed time
    instructions=$(grep 'instructions' perf_output.tmp | awk '{print $1}' | tr -d ',')
    cpu_cycles=$(grep 'cpu-cycles' perf_output.tmp | awk '{print $1}' | tr -d ',')
    time_elapsed=$(grep 'seconds time elapsed' perf_output.tmp | awk '{print $1}')

    # Accumulate
    total_instructions=$(echo "$total_instructions + $instructions" | bc)
    total_cycles=$(echo "$total_cycles + $cpu_cycles" | bc)
    total_time=$(echo "$total_time + $time_elapsed" | bc)

    # Append the result of this run to the file
    echo "$i, $instructions, $cpu_cycles, $time_elapsed" >> perf_results.txt
done

# Compute the averages
avg_instructions=$(echo "scale=0; $total_instructions / $NUM_RUNS" | bc)
avg_cycles=$(echo "scale=0; $total_cycles / $NUM_RUNS" | bc)
avg_time=$(echo "scale=2; $total_time / $NUM_RUNS" | bc)

# Append the averages to the file
echo "Average, $avg_instructions, $avg_cycles, $avg_time" >> perf_results.txt

# Remove the temporary file
rm -f perf_output.tmp

# Print the final results
cat perf_results.txt
#/home/yangz//llvm/build/bin/clang -m64       -O2 -mavx -mllvm --misched-topdown   -z muldefs    -DSPEC_OPENMP -fopenmp -Wno-return-type -DUSE_OPENMP -I /usr/lib/gcc/x86_64-redhat-linux/8/include/  lbm.o main.o             -lm     -fopenmp=libomp -L/usr/lib/gcc/x86_64-redhat-linux/8/include/ -lomp    -o lbm_s
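An average alone hides run-to-run noise. The sketch below reads the perf_results.txt file produced by the script above and reports the mean and the sample standard deviation of the elapsed-time column. The function name time_spread is hypothetical, and the column layout is assumed to be exactly the one written by that script.

```shell
#!/bin/bash
# Hypothetical helper: report mean and sample standard deviation of the
# elapsed-time column (4th field) in perf_results.txt.
time_spread() {
  awk -F', *' '
    $1 ~ /^[0-9]+$/ {            # data rows only; skips header and "Average"
      n++
      sum   += $4
      sumsq += $4 * $4
    }
    END {
      if (n < 2) { print "need at least 2 runs" > "/dev/stderr"; exit 1 }
      mean = sum / n
      var  = (sumsq - n * mean * mean) / (n - 1)
      if (var < 0) var = 0       # guard against floating-point rounding
      printf "runs=%d mean=%.3f stddev=%.3f\n", n, mean, sqrt(var)
    }' "$1"
}

# Typical use, from the run directory:
#   time_spread perf_results.txt
```

A standard deviation that is large relative to the mean suggests the machine was not quiet during the runs, or that more runs are needed.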

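Toolchain Usage

runcpu option                  Description
--reportable                   Produce a reportable SPEC result
--fakereport                   Generate a report without actually building or running anything
--threads N                    Set the number of threads used at run time
--tune=TUNE[,TUNE…]            Select the tuning level(s), e.g. base or peak; affects optimization and run configuration
--output_format=FORMAT[,…]     Set the output format(s), e.g. html, pdf, text
--debug LEVEL                  Set the verbosity of debugging output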

Appendix A: References

Appendix B: Sample Configuration File

#------------------------------------------------------------------------------
# SPEC CPU2017 config file for: LLVM / Linux / AMD64
#------------------------------------------------------------------------------
# 
# Usage: (1) Copy this to a new name
#             cd $SPEC/config
#             cp Example-x.cfg myname.cfg
#        (2) Change items that are marked 'EDIT' (search for it)
# 
# SPEC tested this config file with:
#    Compiler version(s):     LLVM/3.9.0
#    Operating system(s):     Linux
#    Hardware:                AMD64
#
# If your system differs, this config file might not work.
# You might find a better config file at http://www.spec.org/cpu2017/results
#
# Compiler issues: Contact your compiler vendor, not SPEC.
# For SPEC help:   http://www.spec.org/cpu2017/Docs/techsupport.html
#------------------------------------------------------------------------------


#--------- Label --------------------------------------------------------------
# Arbitrary string to tag binaries (no spaces allowed)
#                  Two Suggestions: # (1) EDIT this label as you try new ideas.
%define label mytest                # (2)      Use a label meaningful to *you*.


#--------- Preprocessor -------------------------------------------------------
%ifndef %{bits}                # EDIT to control 32 or 64 bit compilation.  Or, 
%   define  bits        64     #      you can set it on the command line using:
%endif                         #      'runcpu --define bits=nn'

%ifndef %{build_ncpus}         # EDIT to adjust number of simultaneous compiles.
%   define  build_ncpus 36      #      Or, you can set it on the command line: 
%endif                         #      'runcpu --define build_ncpus=nn'

# Don't change this part.
%define  os          LINUX
%if %{bits} == 64
%   define model        -m64   
%elif %{bits} == 32
%   define model        -m32   
%else
%   error Please define number of bits - see instructions in config file
%endif
%if %{label} =~ m/ /
%   error Your label "%{label}" contains spaces.  Please try underscores instead.
%endif
%if %{label} !~ m/^[a-zA-Z0-9._-]+$/
%   error Illegal character in label "%{label}".  Please use only alphanumerics, underscore, hyphen, and period.
%endif


#--------- Global Settings ----------------------------------------------------
# For info, see:
#            https://www.spec.org/cpu2017/Docs/config.html#fieldname   
#   Example: https://www.spec.org/cpu2017/Docs/config.html#tune

#backup_config          = 0                     # Uncomment for cleaner config/ directory
flagsurl01              = $[top]/config/flags/gcc.xml
flagsurl02              = $[top]/config/flags/clang.xml
ignore_errors           = 1
iterations              = 1
label                   = %{label}-m%{bits}
line_width              = 1020
log_line_width          = 1020
makeflags               = --jobs=%{build_ncpus}
mean_anyway             = 1
output_format           = txt,html,cfg,pdf,csv

preenv 			= $ENV{'PERF_COMMAND'} = "perf stat -e cycles,instructions,cache-references,cache-misses,branches,branch-misses -o /data/myrepo/perf_output.txt"
tune                    = base,peak

#--------- How Many CPUs? -----------------------------------------------------
# Both SPECrate and SPECspeed can test multiple chips / cores / hw threads
#    - For SPECrate,  you set the number of copies.
#    - For SPECspeed, you set the number of threads. 
# See: https://www.spec.org/cpu2017/Docs/system-requirements.html#MultipleCPUs
#
#    q. How many should I set?  
#    a. Unknown, you will have to try it and see!
#
# To get you started:
#
#     copies - This config file sets 1 copy per core (after you set the 
#              'cpucores' variable, just below).
#              Please be sure you have enough memory; if you do not, you might 
#              need to run a smaller number of copies.  See:
#              https://www.spec.org/cpu2017/Docs/system-requirements.html#memory
#
#     threads - This config file sets a starting point.  You can try adjusting it.
#               Higher thread counts are much more likely to be useful for
#               fpspeed than for intspeed.
#
#
# To do so, please adjust these; also adjust the 'numactl' lines, below

                               # EDIT to define system sizes 
%define  cpucores       20      #         number of physical cores
%define  cputhreads     40      #         number of logical cores
%define  numanodes      2      #         number of NUMA nodes for affinity

intrate,fprate:
   copies                  = 1 #%{cpucores}      
intspeed,fpspeed:
   threads                 = 1 #%{cputhreads}   

#-------- CPU binding for rate -----------------------------------------------
# When you run multiple copies for SPECrate mode, performance
# is improved if you bind the copies to specific processors.  EDIT the numactl stuff below.

intrate,fprate:
submit       = echo "$command" > run.sh ; $BIND bash run.sh

# Affinity settings:                     EDIT this section
# Please adjust these values for your 
# particular system as these settings are 
# for a 20-core, 40-thread system with two NUMA nodes.
# NUMA node 0
bind0   = numactl -m 0 --physcpubind=0
bind1   = numactl -m 0 --physcpubind=1
bind2   = numactl -m 0 --physcpubind=2
bind3   = numactl -m 0 --physcpubind=3
bind4   = numactl -m 0 --physcpubind=4
bind5   = numactl -m 0 --physcpubind=5
bind6   = numactl -m 0 --physcpubind=6
bind7   = numactl -m 0 --physcpubind=7
bind8   = numactl -m 0 --physcpubind=8
bind9   = numactl -m 0 --physcpubind=9
bind10  = numactl -m 0 --physcpubind=20
bind11  = numactl -m 0 --physcpubind=21
bind12  = numactl -m 0 --physcpubind=22
bind13  = numactl -m 0 --physcpubind=23
bind14  = numactl -m 0 --physcpubind=24
bind15  = numactl -m 0 --physcpubind=25
bind16  = numactl -m 0 --physcpubind=26
bind17  = numactl -m 0 --physcpubind=27
bind18  = numactl -m 0 --physcpubind=28
bind19  = numactl -m 0 --physcpubind=29

# NUMA node 1
bind20  = numactl -m 1 --physcpubind=10
bind21  = numactl -m 1 --physcpubind=11
bind22  = numactl -m 1 --physcpubind=12
bind23  = numactl -m 1 --physcpubind=13
bind24  = numactl -m 1 --physcpubind=14
bind25  = numactl -m 1 --physcpubind=15
bind26  = numactl -m 1 --physcpubind=16
bind27  = numactl -m 1 --physcpubind=17
bind28  = numactl -m 1 --physcpubind=18
bind29  = numactl -m 1 --physcpubind=19
bind30  = numactl -m 1 --physcpubind=30
bind31  = numactl -m 1 --physcpubind=31
bind32  = numactl -m 1 --physcpubind=32
bind33  = numactl -m 1 --physcpubind=33
bind34  = numactl -m 1 --physcpubind=34
bind35  = numactl -m 1 --physcpubind=35
bind36  = numactl -m 1 --physcpubind=36
bind37  = numactl -m 1 --physcpubind=37
bind38  = numactl -m 1 --physcpubind=38
bind39  = numactl -m 1 --physcpubind=39
#------- Compilers ------------------------------------------------------------
default:
#                                      EDIT paths to LLVM and libraries:
    BASE_DIR           = /data/yangz
    # LLVM_PATH specifies the directory path containing required LLVM files and
    # potentially multiple LLVM versions.
    LLVM_PATH          = $[BASE_DIR]/llvm-project
    # LLVM_ROOT_PATH specifies the directory path to the LLVM version to be
    # used. EDIT: Change build to the appropriate directory name.
    LLVM_ROOT_PATH     = $[LLVM_PATH]/build
    LLVM_BIN_PATH      = $[LLVM_ROOT_PATH]/bin
    LLVM_LIB_PATH      = $[LLVM_ROOT_PATH]/lib
    LLVM_INCLUDE_PATH  = $[LLVM_ROOT_PATH]/include
    DRAGONEGG_PATH     = #$[LLVM_PATH]/dragonegg
    DRAGONEGG_SPECS    = #$[DRAGONEGG_PATH]/integrated-as.specs
    # DragonEgg version 3.5.2 requires GCC version 4.8.2.
    # EDIT LLVM_GCC_DIR to reflect the GCC path.
    LLVM_GCC_DIR       = /home/yangz/llvm/build/bin
    GFORTRAN_DIR       = /usr/bin
    # Specify Intel OpenMP library path.
    OPENMP_DIR         = /usr/lib/gcc/x86_64-redhat-linux/8/include/
   
    preENV_PATH             = $[LLVM_BIN_PATH]:%{ENV_PATH}

    CC                  = $(LLVM_BIN_PATH)/clang %{model}
    CXX                 = $(LLVM_BIN_PATH)/clang++ %{model}
    FORTRAN_COMP        = $(GFORTRAN_DIR)/gfortran
    FC                  = $(FORTRAN_COMP) %{model}
    CLD                 = $(LLVM_BIN_PATH)/clang %{model}
    FLD                 = $(FORTRAN_COMP) %{model}
    # How to say "Show me your version, please"
    CC_VERSION_OPTION   = -v
    CXX_VERSION_OPTION  = -v
    FC_VERSION_OPTION   = -v

default:
%if %{bits} == 64
   sw_base_ptrsize = 64-bit
   sw_peak_ptrsize = 64-bit
%else
   sw_base_ptrsize = 32-bit
   sw_peak_ptrsize = 32-bit
%endif

intrate,intspeed:                     # 502.gcc_r and 602.gcc_s may need the 
%if %{bits} == 32                     # flags from this section.  For 'base',
    EXTRA_COPTIMIZE = -fgnu89-inline  # all benchmarks must use the same 
%else                                 # options, so we add them to all of 
    LDCFLAGS        = -z muldefs     # integer rate and integer speed.  See:
%endif                                # www.spec.org/cpu2017/Docs/benchmarks/502.gcc_r.html

#--------- Portability --------------------------------------------------------
default:# data model applies to all benchmarks
%if %{bits} == 32
    # Strongly recommended because at run-time, operations using modern file 
    # systems may fail spectacularly and frequently (or, worse, quietly and 
    # randomly) if a program does not accommodate 64-bit metadata.
    EXTRA_PORTABILITY = -D_FILE_OFFSET_BITS=64
%else
    EXTRA_PORTABILITY = -DSPEC_LP64
%endif

# Benchmark-specific portability (ordered by last 2 digits of bmark number)

500.perlbench_r,600.perlbench_s:  #lang='C'
%if %{bits} == 32
%   define suffix IA32
%else
%   define suffix X64
%endif
   PORTABILITY    = -DSPEC_%{os}_%{suffix} 

521.wrf_r,621.wrf_s:  #lang='F,C'
   CPORTABILITY  = -DSPEC_CASE_FLAG 
   FPORTABILITY  = -fconvert=big-endian 

523.xalancbmk_r,623.xalancbmk_s:  #lang='CXX'
   PORTABILITY   = -DSPEC_%{os}

526.blender_r:  #lang='CXX,C'
    CPORTABILITY = -funsigned-char 
    CXXPORTABILITY = -D__BOOL_DEFINED 

527.cam4_r,627.cam4_s:  #lang='F,C'
   PORTABILITY   = -DSPEC_CASE_FLAG

628.pop2_s:  #lang='F,C'
    CPORTABILITY = -DSPEC_CASE_FLAG
    FPORTABILITY = -fconvert=big-endian 

#--------  Baseline Tuning Flags ----------------------------------------------
default=base:
    strict_rundir_verify = 0
    COPTIMIZE     = -O2 -mavx -mllvm --regalloc=basic
    CXXOPTIMIZE   = -O2 -mavx -std=c++14
    FOPTIMIZE     = -O2 -mavx -funroll-loops 
    #EXTRA_FFLAGS  = -fplugin=$(DRAGONEGG_PATH)/dragonegg.so -Wno-register  -std=c++14
    EXTRA_FFLAGS  = -Wno-register  -std=c++14
    EXTRA_FLIBS   = -L/usr/lib64 -lgfortran -lm 
    LDOPTIMIZE    = -z muldefs -no-pie
intrate,fprate:
    preENV_LIBRARY_PATH     = $[LLVM_LIB_PATH]
    preENV_LD_LIBRARY_PATH  = $[LLVM_LIB_PATH]
   #preENV_LIBRARY_PATH     = $[LLVM_LIB_PATH]:%{ENV_LIBRARY_PATH}
   #preENV_LD_LIBRARY_PATH  = $[LLVM_LIB_PATH]:%{ENV_LD_LIBRARY_PATH}

#
# Speed (OpenMP and Autopar allowed)
#
%if %{bits} == 32
   intspeed,fpspeed:
   #
   # Many of the speed benchmarks (6nn.benchmark_s) do not fit in 32 bits
   # If you wish to run SPECint2017_speed or SPECfp2017_speed, please use
   #
   #     runcpu --define bits=64
   #
   fail_build = 1
%else
   intspeed,fpspeed:
       OPENMP_LIB_PATH          = $[OPENMP_DIR]
       EXTRA_OPTIMIZE           = -DSPEC_OPENMP -fopenmp -Wno-return-type -DUSE_OPENMP -I $(OPENMP_DIR)
       EXTRA_LIBS               = -fopenmp -L$(OPENMP_LIB_PATH) -lomp
       EXTRA_FLIBS              = -fopenmp -L/usr/lib64 -lgfortran -lm
       preENV_LIBRARY_PATH      = $[LLVM_LIB_PATH]:$[OPENMP_LIB_PATH]
       preENV_LD_LIBRARY_PATH   = $[LLVM_LIB_PATH]:$[OPENMP_LIB_PATH]
      #preENV_LIBRARY_PATH      = $[LLVM_LIB_PATH]:$[OPENMP_LIB_PATH]:%{ENV_LIBRARY_PATH}
      #preENV_LD_LIBRARY_PATH   = $[LLVM_LIB_PATH]:$[OPENMP_LIB_PATH]:%{ENV_LD_LIBRARY_PATH}
       preENV_OMP_THREAD_LIMIT  = %{cputhreads}
       preENV_OMP_STACKSIZE     = 128M
       preENV_GOMP_CPU_AFFINITY = 0-%{cputhreads}
%endif

#--------  Peak Tuning Flags ----------------------------------------------
default=peak:
    strict_rundir_verify = 0	
    COPTIMIZE     = -Ofast -mavx
    CXXOPTIMIZE   = -O3 -mavx -std=c++14
    EXTRA_FLIBS   = -lgfortran -lm
    FOPTIMIZE     = -Ofast -mavx -funroll-loops -fno-stack-arrays
    #EXTRA_FFLAGS  = -fplugin=$(DRAGONEGG_PATH)/dragonegg.so -Wno-register  -std=c++14
    EXTRA_FFLAGS  =  -Wno-register  -std=c++14

502.gcc_r,602.gcc_s=peak:  #lang='C'                        
    LDOPTIMIZE    = -z muldefs    

521.wrf_r,621.wrf_s=peak:  #lang='F,C'                      
    COPTIMIZE     = -O3 -freciprocal-math -ffp-contract=fast -mavx
    EXTRA_FLIBS   = -lgfortran -lm
    FOPTIMIZE     = -O3 -freciprocal-math -ffp-contract=fast -mavx -funroll-loops
    #EXTRA_FFLAGS  = -fplugin=$(DRAGONEGG_PATH)/dragonegg.so

523.xalancbmk_r,623.xalancbmk_s=peak:  #lang='CXX'
    CXXOPTIMIZE   = -O3 -mavx

#------------------------------------------------------------------------------
# Tester and System Descriptions - EDIT all sections below this point              
#------------------------------------------------------------------------------
#   For info about any field, see
#             https://www.spec.org/cpu2017/Docs/config.html#fieldname
#   Example:  https://www.spec.org/cpu2017/Docs/config.html#hw_memory
#-------------------------------------------------------------------------------

#--------- EDIT to match your version -----------------------------------------
default:
   sw_compiler001   = C/C++: Version 17.0.6 of Clang, the
   sw_compiler002   = LLVM Compiler Infrastructure 
   sw_compiler003   = Fortran: Version 8.5.0 of GCC, the
   sw_compiler004   = GNU Compiler Collection   
   sw_compiler005   = DragonEgg: Version 3.5.2, the
   sw_compiler006   = LLVM Compiler Infrastructure
#--------- EDIT info about you ------------------------------------------------
# To understand the difference between hw_vendor/sponsor/tester, see:
#     https://www.spec.org/cpu2017/Docs/config.html#test_sponsor
intrate,intspeed,fprate,fpspeed: # Important: keep this line
   hw_vendor          = YLQH
   tester             = My Corporation
   test_sponsor       = My Corporation
   license_num        = nnn (Your SPEC license number)
#  prepared_by        = # Ima Pseudonym                       # Whatever you like: is never output


#--------- EDIT system availability dates -------------------------------------
intrate,intspeed,fprate,fpspeed: # Important: keep this line
                        # Example                             # Brief info about field
   hw_avail           = # Nov-2099                            # Date of LAST hardware component to ship
   sw_avail           = # Nov-2099                            # Date of LAST software component to ship

#--------- EDIT system information --------------------------------------------
intrate,intspeed,fprate,fpspeed: # Important: keep this line
                        # Example                             # Brief info about field
 # hw_cpu_name        = # Intel Xeon E9-9999 v9               # chip name
   hw_cpu_nominal_mhz = # 9999                                # Nominal chip frequency, in MHz
   hw_cpu_max_mhz     = # 9999                                # Max chip frequency, in MHz
 # hw_disk            = # 9 x 9 TB SATA III 9999 RPM          # Size, type, other perf-relevant info
   hw_model           = # TurboBlaster 3000                   # system model name
 # hw_nchips          = # 99                                  # number chips enabled
   hw_ncores          = # 9999                                # number cores enabled
   hw_ncpuorder       = # 1-9 chips                           # Ordering options
   hw_nthreadspercore = # 9                                   # number threads enabled per core
   hw_other           = # TurboNUMA Router 10 Gb              # Other perf-relevant hw, or "None"

#  hw_memory001       = # 999 GB (99 x 9 GB 2Rx4 PC4-2133P-R, # The 'PCn-etc' is from the JEDEC 
#  hw_memory002       = # running at 1600 MHz)                # label on the DIMM.

   hw_pcache          = # 99 KB I + 99 KB D on chip per core  # Primary cache size, type, location
   hw_scache          = # 99 KB I+D on chip per 9 cores       # Second cache or "None"
   hw_tcache          = # 9 MB I+D on chip per chip           # Third  cache or "None"
   hw_ocache          = # 9 GB I+D off chip per system board  # Other cache or "None"

   fw_bios            = # American Megatrends 39030100 02/29/2016 # Firmware information
 # sw_file            = # ext99                               # File system
 # sw_os001           = # Linux Sailboat                      # Operating system
 # sw_os002           = # Distribution 7.2 SP1                # and version
   sw_other           = # TurboHeap Library V8.1              # Other perf-relevant sw, or "None"
 # sw_state           = # Run level 99                        # Software state.

# Note: Some commented-out fields above are automatically set to preliminary 
# values by sysinfo
#       https://www.spec.org/cpu2017/Docs/config.html#sysinfo
# Uncomment lines for which you already know a better answer than sysinfo 
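
The hardware fields above can be cross-checked against `lscpu` output. The following is a minimal Python sketch, not part of SPEC CPU2017: the helper names `parse_lscpu`, `suggest_hw_fields`, and `read_lscpu` are hypothetical, it assumes `lscpu` prints English field names (hence `LC_ALL=C`), and it assumes every core on every socket is enabled. Treat the values as suggestions to review before copying them into the cfg file.

```python
import os
import subprocess


def parse_lscpu(text):
    """Turn 'Key: value' lines of lscpu output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def suggest_hw_fields(text):
    """Suggest values for some hw_* config fields (assumes all cores are enabled)."""
    f = parse_lscpu(text)
    sockets = int(f["Socket(s)"])
    cores_per_socket = int(f["Core(s) per socket"])
    suggestion = {
        "hw_cpu_name": f["Model name"],
        "hw_nchips": sockets,
        "hw_ncores": sockets * cores_per_socket,
        "hw_nthreadspercore": int(f["Thread(s) per core"]),
    }
    # "CPU max MHz" is not reported on every system, so it is optional here.
    if "CPU max MHz" in f:
        suggestion["hw_cpu_max_mhz"] = round(float(f["CPU max MHz"]))
    return suggestion


def read_lscpu():
    """Run lscpu with a C locale so the field names stay in English."""
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(["lscpu"], capture_output=True, text=True,
                            check=True, env=env)
    return result.stdout


# Hypothetical sample text, used instead of a live lscpu call.
SAMPLE = """\
Model name:          Example CPU
Socket(s):           2
Core(s) per socket:  8
Thread(s) per core:  2
CPU max MHz:         3500.0000
"""

if __name__ == "__main__":
    for name, value in suggest_hw_fields(SAMPLE).items():
        print(f"   {name:<18} = {value}")
```

On a real machine, pass `read_lscpu()` instead of `SAMPLE`. Fields such as `hw_memory` and the cache descriptions still have to be filled in by hand.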

Appendix C: Q&A

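How long does a run take?

Metric                         Config tested   Individual benchmarks   Full run (reportable)
SPECrate 2017 Integer          1 copy          6 to 10 minutes         2.5 hours
SPECrate 2017 Floating Point   1 copy          5 to 36 minutes         4.8 hours
SPECspeed 2017 Integer         4 threads       6 to 15 minutes         3.1 hours
SPECspeed 2017 Floating Point  16 threads      6 to 75 minutes         4.7 hours

This is one arbitrary example measured on a 2016 system, with 2 iterations, base only, no peak. Compile time is not included.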

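Overview of the benchmark suites

Short tag  Suite                           Contents                       Metrics
intspeed   SPECspeed® 2017 Integer         10 integer benchmarks          SPECspeed®2017_int_base
                                                                          SPECspeed®2017_int_peak
                                                                          SPECspeed®2017_int_energy_base
                                                                          SPECspeed®2017_int_energy_peak
fpspeed    SPECspeed® 2017 Floating Point  10 floating point benchmarks   SPECspeed®2017_fp_base
                                                                          SPECspeed®2017_fp_peak
                                                                          SPECspeed®2017_fp_energy_base
                                                                          SPECspeed®2017_fp_energy_peak
intrate    SPECrate® 2017 Integer          10 integer benchmarks          SPECrate®2017_int_base
                                                                          SPECrate®2017_int_peak
                                                                          SPECrate®2017_int_energy_base
                                                                          SPECrate®2017_int_energy_peak
fprate     SPECrate® 2017 Floating Point   13 floating point benchmarks   SPECrate®2017_fp_base
                                                                          SPECrate®2017_fp_peak
                                                                          SPECrate®2017_fp_energy_base
                                                                          SPECrate®2017_fp_energy_peak

How many copies run, and what does a higher score mean?

  • intspeed, fpspeed: SPECspeed suites always run one copy of each benchmark. A higher score means less time is needed.
  • intrate, fprate: SPECrate suites run multiple concurrent copies of each benchmark; the tester chooses how many. A higher score means more throughput (work per unit of time).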
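
The sixteen metric names in the overview follow one regular pattern: suite family, then `int` or `fp`, an optional `energy` part, and the tuning level. The Python sketch below spells that pattern out. The function `metric_name` and the table `SUITES` are hypothetical helpers for illustration, not part of the SPEC toolset, and the ® sign is left out of the generated names.

```python
# Short tag -> (suite family, benchmark kind), as listed in the overview.
SUITES = {
    "intspeed": ("SPECspeed", "int"),
    "fpspeed": ("SPECspeed", "fp"),
    "intrate": ("SPECrate", "int"),
    "fprate": ("SPECrate", "fp"),
}


def metric_name(tag, tune="base", energy=False):
    """Build a metric name such as SPECspeed2017_int_base from a short tag."""
    if tune not in ("base", "peak"):
        raise ValueError(f"unknown tuning level: {tune}")
    family, kind = SUITES[tag]
    middle = "_energy" if energy else ""
    return f"{family}2017_{kind}{middle}_{tune}"


if __name__ == "__main__":
    for tag in SUITES:
        for energy in (False, True):
            for tune in ("base", "peak"):
                print(tag, metric_name(tag, tune, energy))
```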

Benchmark overview

==SPECrate®2017 Integer==  ==SPECspeed®2017 Integer==  Language[1]      KLOC[2]  Application area
500.perlbench_r            600.perlbench_s             C                362      Perl interpreter
502.gcc_r                  602.gcc_s                   C                1,304    GNU C compiler
505.mcf_r                  605.mcf_s                   C                3        Route planning
520.omnetpp_r              620.omnetpp_s               C++              134      Discrete Event simulation - computer network
523.xalancbmk_r            623.xalancbmk_s             C++              520      XML to HTML conversion via XSLT
525.x264_r                 625.x264_s                  C                96       Video compression
531.deepsjeng_r            631.deepsjeng_s             C++              10       Artificial Intelligence: alpha-beta tree search (Chess)
541.leela_r                641.leela_s                 C++              21       Artificial Intelligence: Monte Carlo tree search (Go)
548.exchange2_r            648.exchange2_s             Fortran          1        Artificial Intelligence: recursive solution generator (Sudoku)
557.xz_r                   657.xz_s                    C                33       General data compression

==SPECrate®2017 FP==       ==SPECspeed®2017 FP==       Language         KLOC     Application area
503.bwaves_r               603.bwaves_s                Fortran          1        Explosion modeling
507.cactuBSSN_r            607.cactuBSSN_s             C++, C, Fortran  257      Physics: relativity
508.namd_r                 -                           C++              8        Molecular dynamics
510.parest_r               -                           C++              427      Biomedical imaging: optical tomography with finite elements
511.povray_r               -                           C++, C           170      Ray tracing
519.lbm_r                  619.lbm_s                   C                1        Fluid dynamics
521.wrf_r                  621.wrf_s                   Fortran, C       991      Weather forecasting
526.blender_r              -                           C++, C           1,577    3D rendering and animation
527.cam4_r                 627.cam4_s                  Fortran, C       407      Atmosphere modeling
-                          628.pop2_s                  Fortran, C       338      Wide-scale ocean modeling (climate level)
538.imagick_r              638.imagick_s               C                259      Image manipulation
544.nab_r                  644.nab_s                   C                24       Molecular dynamics
549.fotonik3d_r            649.fotonik3d_s             Fortran          14       Computational Electromagnetics
554.roms_r                 654.roms_s                  Fortran          210      Regional ocean modeling

  1. For multi-language benchmarks, the first language listed determines the library and link options.
  2. KLOC = line count of the source files used in a build (including comments and whitespace) / 1000, i.e. a measure of code size.

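How do 5nn.benchmark and 6nn.benchmark in the source tree differ?

  1. 5nn.benchmark_r is the SPECrate version;
    6nn.benchmark_s is the SPECspeed version.
  2. Differences include workload sizes, compile flags, and run rules.
  3. In detail:
    ① Workloads usually differ. For SPECrate you may choose 32-bit or 64-bit compilation; SPECspeed typically needs 64-bit (-m64).
    ② OpenMP directives are never used when building SPECrate benchmarks;
    they are optional for all SPECspeed 2017 floating point benchmarks and for one SPECspeed 2017 integer benchmark, 657.xz_s.
    ③ ==For SPECrate, compiler parallelization is forbidden, both OpenMP and compiler auto-parallelization.==
    ④ Other: some benchmark pairs have different source code enabled by compile flags.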
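
The naming convention can be checked mechanically. Below is a minimal Python sketch; `parse_benchmark` and `counterpart` are hypothetical helpers, not SPEC tools. It relies on the pattern visible in the benchmark table, where a 6nn `_s` benchmark carries the number of its 5nn `_r` partner plus 100, and it hard-codes the benchmarks that the table lists without a partner.

```python
import re

# Benchmarks that appear in only one suite, taken from the benchmark table.
RATE_ONLY = {"508.namd_r", "510.parest_r", "511.povray_r", "526.blender_r"}
SPEED_ONLY = {"628.pop2_s"}

_NAME = re.compile(r"^(\d{3})\.(\w+)_([rs])$")


def parse_benchmark(name):
    """Split e.g. '519.lbm_r' into (519, 'lbm', 'SPECrate')."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a CPU2017 benchmark name: {name}")
    number = int(match.group(1))
    suite = "SPECrate" if match.group(3) == "r" else "SPECspeed"
    return number, match.group(2), suite


def counterpart(name):
    """Return the partner benchmark in the other suite, or None if there is none."""
    if name in RATE_ONLY or name in SPEED_ONLY:
        return None
    number, short, suite = parse_benchmark(name)
    if suite == "SPECrate":
        return f"{number + 100}.{short}_s"
    return f"{number - 100}.{short}_r"


if __name__ == "__main__":
    for bench in ("519.lbm_r", "657.xz_s", "508.namd_r"):
        print(bench, "->", counterpart(bench))
```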

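What SPECspeed and SPECrate measure

  • Time (SPECspeed): for example, the seconds needed to complete a workload.
  • Throughput (SPECrate): work completed per unit of time, for example jobs per hour. [See the official website for details]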
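
The two kinds of measurement turn into scores in slightly different ways. The Python sketch below reflects SPEC's published definitions as summarized here: each benchmark's ratio compares the reference machine's time with the measured time, SPECrate additionally multiplies by the number of copies, and the suite score is the geometric mean of the per-benchmark ratios. The function names and all timings are hypothetical; consult the official documentation for the exact reporting rules.

```python
import math


def speed_ratio(ref_seconds, run_seconds):
    """SPECspeed ratio: reference time divided by measured time."""
    return ref_seconds / run_seconds


def rate_ratio(copies, ref_seconds, run_seconds):
    """SPECrate ratio: the speed-style ratio scaled by the number of copies."""
    return copies * ref_seconds / run_seconds


def overall_score(ratios):
    """Geometric mean of per-benchmark ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    print(speed_ratio(1000, 250))
    print(rate_ratio(4, 1000, 500))
    print(overall_score([speed_ratio(1000, 250), speed_ratio(900, 300)]))
```

A geometric mean keeps one unusually fast or slow benchmark from dominating the suite score the way an arithmetic mean would.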

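Should I use speed or rate?

  • A single user running a variety of general desktop programs may be interested in SPECspeed2017_int_base.
  • A group of scientists running custom modeling programs may be interested in SPECrate2017_fp_peak.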

What SPEC2017 improves over SPEC2006

[See the official website]
