ARM CHStone Benchmark Results

Last Author
bains

ARM Benchmark Results

Initial Results

These results were obtained with the L1 instruction cache and branch prediction enabled. The L2 cache, MMU, and L1 data cache were all disabled.

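| Benchmark | MIPS SW Cycles | MIPS SW Freq. (MHz) | MIPS SW Time (us) | ARM No Cache -O0 Cycles | ARM No Cache -O0 Freq. (MHz) | ARM No Cache -O0 Time (us) | ARM No Cache -O3 Cycles | ARM No Cache -O3 Freq. (MHz) | ARM No Cache -O3 Time (us) | ARM No Cache LLVM Cycles | ARM No Cache LLVM Freq. (MHz) | ARM No Cache LLVM Time (us) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| adpcm | 193607 | 74.26 | 2607 | 7017469 | 800 | 8772 | 1893766 | 800 | 2367 | 2046954 | 800 | 2559 |
| aes | 73777 | 74.26 | 993 | 1655516 | 800 | 2069 | 780486 | 800 | 976 | 411924 | 800 | 515 |
| blowfish | 954563 | 74.26 | 12854 | 23927868 | 800 | 29910 | 13667297 | 800 | 17084 | 13515078 | 800 | 16894 |
| dfadd | 16496 | 74.26 | 222 | 307817 | 800 | 385 | 91531 | 800 | 114 | 47603 | 800 | 60 |
| dfdiv | 71507 | 74.26 | 963 | 263314 | 800 | 329 | 91800 | 800 | 115 | | 800 | |
| dfmul | 6796 | 74.26 | 92 | 127975 | 800 | 160 | 27054 | 800 | 34 | 20119 | 800 | 25 |
| dfsin | 2993369 | 74.26 | 40309 | 10375208 | 800 | 12969 | 2792847 | 800 | 3491 | | 800 | |
| gsm | 39108 | 74.26 | 527 | 1068140 | 800 | 1335 | 254839 | 800 | 319 | 274252 | 800 | 343 |
| jpeg | 29802639 | 74.26 | 401328 | 116196808 | 800 | 145246 | 37729565 | 800 | 47162 | 36918091 | 800 | 46148 |
| mips | 43384 | 74.26 | 584 | 847272 | 800 | 1059 | 225037 | 800 | 281 | 216350 | 800 | 270 |
| motion | 36753 | 74.26 | 495 | 1229726 | 800 | 1537 | 90479 | 800 | 113 | 309910 | 800 | 9599 |
| sha | 1209523 | 74.26 | 16288 | 41578084 | 800 | 51973 | 6796182 | 800 | 8495 | 7678854 | 800 | 9599 |
| Geomean | 173331.98 | 74.26 | 2335.02 | 2715618.02 | 800 | 3394.52 | 712316.75 | 800 | 890.40 | 750768.45 | 800 | 1293.67 |
| Ratio | 1 | 1 | 1 | 15.67 | 10.77 | 1.45 | 4.11 | 10.77 | 0.38 | 4.33 | 10.77 | 0.55 |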
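
The Time (us) column appears to be derived from the other two columns: a cycle count divided by a clock frequency in MHz gives microseconds. A minimal sketch of that conversion follows; the function name time_us is an illustrative assumption and does not come from the benchmark harness.

```c
/* Execution time in microseconds.
 * cycles   : measured cycle count
 * freq_mhz : clock frequency in MHz (74.26 for MIPS, 800 for ARM in the table)
 * Dividing cycles by MHz yields microseconds directly. */
double time_us(double cycles, double freq_mhz)
{
    return cycles / freq_mhz;
}
```

For example, time_us(193607, 74.26) gives about 2607 us, which matches the MIPS adpcm row.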

Results with MMU and L2 Cache Enabled

The following results were obtained after enabling the MMU and L2 cache.
So far the best results are with all caches and branch prediction enabled.
Several optimizations have been tested, but few of them produce noticeable improvements.

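ARM configuration: L1 instruction and data caches, L2 cache, MMU, and branch prediction enabled.

| Benchmark | MIPS SW Cycles | MIPS SW Freq. (MHz) | MIPS SW Time (us) | ARM Cycles | ARM Freq. (MHz) | ARM Time (us) |
|---|---|---|---|---|---|---|
| chstone/adpcm | 193607 | 74.26 | 2607 | 150968 | 800 | 189 |
| chstone/aes | 73777 | 74.26 | 993 | 236462 | 800 | 296 |
| chstone/blowfish | 954563 | 74.26 | 12854 | 1645745 | 800 | 2057 |
| chstone/dfadd | 16496 | 74.26 | 222 | 34420 | 800 | 43 |
| chstone/dfdiv | 71507 | 74.26 | 963 | 47052 | 800 | 59 |
| chstone/dfmul | 6796 | 74.26 | 92 | 13193 | 800 | 16 |
| chstone/dfsin | 2993369 | 74.26 | 40309 | 1418380 | 800 | 1773 |
| chstone/gsm | 39108 | 74.26 | 527 | 144940 | 800 | 181 |
| chstone/jpeg | 29802639 | 74.26 | 401328 | 10146576 | 800 | 12683 |
| chstone/mips | 43384 | 74.26 | 584 | 55917 | 800 | 70 |
| chstone/motion | 36753 | 74.26 | 495 | 5519 | 800 | 7 |
| chstone/sha | 1209523 | 74.26 | 16288 | 2360086 | 800 | 2950 |
| dhrystone | 28855 | 74.26 | 389 | 76182 | 800 | 95 |
| mandelbrot | 45868987 | 74.26 | 617681 | 44271778 | 800 | 55340 |
| Geomean | 227146.49 | 74.26 | 3060.05 | 259945.76 | 800 | 324.93 |
| Ratio | 1 | 1 | 1 | 1.14 | 10.77 | 0.11 |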
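
The Ratio row can be read as a decomposition: the ARM needs 1.14 times as many cycles as the MIPS on geomean, but runs at 10.77 times the clock, so its time ratio is about 1.14 / 10.77, or roughly 0.11. The sketch below spells out that arithmetic; the function names are illustrative assumptions, not part of any benchmark tooling.

```c
/* Time ratio implied by a cycle-count ratio and a clock-frequency ratio.
 * Both ratios are taken relative to the same baseline (MIPS in the table). */
double time_ratio(double cycle_ratio, double freq_ratio)
{
    return cycle_ratio / freq_ratio;
}

/* Speedup over the baseline is the inverse of the time ratio. */
double speedup_from_time_ratio(double ratio)
{
    return 1.0 / ratio;
}
```

With the values from the table, time_ratio(1.14, 10.77) is about 0.106, which corresponds to a speedup of roughly 9x over the MIPS baseline.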

More detailed results can be found here: arm_vs_mips.pdf

Summary of Benchmark Results

Benchmarking led to the following observations:

  • Branch prediction is very important
  • L1 instruction cache is very important (especially for compute-limited benchmarks)
  • L1 data cache provides modest improvements
  • L1 data prefetch provides modest improvements
  • L2 cache is very important (especially for memory bandwidth-limited benchmarks)
  • MMU is very important because it allows caches to be used to their full potential
  • Normal memory should be marked as cacheable, inner and outer write-back in translation table entries
  • Memory should be marked as non-shareable in translation table entries
  • L2 cache controller read, write, and hold delays should be set to their minimums
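
The translation table recommendations above can be sketched as a first-level entry builder. This is a minimal illustration under stated assumptions, not the configuration used for these results: it assumes the ARMv7-A short-descriptor format with 1 MB sections, TEX remap disabled, domain 0, and full read/write access. It also picks the write-allocate flavour of write-back, which the list above does not specify. All macro and function names are hypothetical.

```c
#include <stdint.h>

/* Bit fields of an ARMv7-A short-descriptor first-level section entry. */
#define SECTION_TYPE    (0x2u)                          /* bits [1:0] = 0b10: 1 MB section */
#define SECTION_B       (1u << 2)                       /* B (bufferable) bit */
#define SECTION_C       (1u << 3)                       /* C (cacheable) bit */
#define SECTION_AP_RW   (0x3u << 10)                    /* AP[1:0] = 0b11 with AP[2] = 0: full access */
#define SECTION_TEX(x)  (((uint32_t)(x) & 0x7u) << 12)  /* TEX[2:0] */
#define SECTION_S       (1u << 16)                      /* S (shareable) bit, left clear below */

/* Build a section entry mapping a 1 MB region as Normal memory,
 * inner and outer write-back write-allocate, non-shareable.
 * Assumes TEX remap is disabled, so TEX = 0b001, C = 1, B = 1 selects that type. */
uint32_t normal_wb_section(uint32_t phys_addr)
{
    return (phys_addr & 0xFFF00000u)   /* section base address, bits [31:20] */
         | SECTION_TEX(0x1)
         | SECTION_C
         | SECTION_B
         | SECTION_AP_RW
         | SECTION_TYPE;               /* SECTION_S is not set: non-shareable */
}
```

For a section based at 0x00100000 this yields 0x00101C0E. A flat mapping would write one such entry per 1 MB of normal memory into the first-level table, and use a different encoding for device regions.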

Details on setting up the caches on the ARM Cortex-A9 MPCore can be found here: Using ARM Caches