SmartApprox: Learning-based Configuration of Approximate Memories for Energy-efficient Execution

Presented at the 12th International Green and Sustainable Computing Conference (IGSC 2021)

Abstract

Approximate memories reduce power and increase energy efficiency at the expense of errors in stored data. Many applications can tolerate these errors, up to a point, with negligible impact on the quality of results. Uncontrolled errors in memory may, however, lead to crashes or broken outputs. Error rates are determined by fabrication and operation parameters, and error tolerance depends on algorithms, implementation, and inputs. An ideal configuration features parameters for approximate memory that minimize energy while allowing applications to produce acceptable results. This work introduces SmartApprox, a framework that configures approximation levels based on features of applications. In SmartApprox, a training phase executes a set of applications under different approximation settings, building a knowledge base that correlates application features (e.g., types of instructions and cache efficiency) with suitable approximate memory configurations. At runtime, features of new applications are sampled and approximation knobs are adjusted to match the predicted error tolerance, according to existing knowledge and the current error scenario, consistent with hardware characterization. In this work, we list and discuss sets of features that influence approximation results and measure their impact on the error tolerance of applications. We evaluate SmartApprox on different voltage-scaled DRAM scenarios using a knowledge base of 26 applications, wherein energy savings of 36% are possible with acceptable output. An evaluation using a combined energy and quality metric shows that SmartApprox scores 97% of an exhaustive search for ideal configurations, with significantly lower effort and without application-specific quality evaluation.
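To make the workflow concrete, the sketch below shows one way the runtime step could work, assuming a simple nearest-neighbor lookup over application feature vectors. The feature names, values, and approximation levels are hypothetical; the actual SmartApprox feature set and predictor are described in the paper and may differ.

```python
import numpy as np

# Illustrative knowledge base built during a training phase: each entry pairs a
# (hypothetical) feature vector, e.g., instruction-mix fractions plus a
# cache-efficiency figure, with the most aggressive approximation level that
# still produced acceptable output for that application.
knowledge_base = [
    (np.array([0.42, 0.31, 0.27, 0.88]), 3),
    (np.array([0.10, 0.55, 0.35, 0.61]), 1),
]

def predict_level(features):
    """Return the approximation level of the most similar known application (1-NN)."""
    distances = [np.linalg.norm(features - feat) for feat, _ in knowledge_base]
    return knowledge_base[int(np.argmin(distances))][1]

# Runtime: sample the new application's features and set the approximation knob.
knob = predict_level(np.array([0.40, 0.33, 0.27, 0.85]))
```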

Content

Presentation

Applications

The experimental evaluation of SmartApprox was performed with 26 applications in the knowledge base.

The following table contains the complete list of applications, their source, quality metric used, input workload, execution time with AxPIKE, and total instructions executed.

| application | benchmark | quality metric | input | exec. time (s) | total instructions |
|---|---|---|---|---|---|
| 2mm | Polybench [5] | MAPE | 32x32 double matrices | 46 | 8.62E+07 |
| atax | Polybench [5] | FEE | 500x500 double matrices | 43 | 6.72E+07 |
| blackscholes | AxBench [1] | MAPE | 1,000 entries | 45 | 8.20E+07 |
| bunzip2 | CBench [3] | SSI | compressed 256x256 image | 43 | 5.52E+07 |
| bzip2 | CBench [3] | SSI | 256x256 image | 109 | 1.11E+08 |
| correlation | Polybench [5] | FEE | 64x64 long int matrices | 26 | 4.77E+07 |
| covariance | Polybench [5] | FEE | 32x32 double matrices | 61 | 9.09E+07 |
| dijkstra | MiBench [4] | FEE | complete graph of 60 vertices | 18 | 3.37E+07 |
| fannkuch-redux | Benchmarks Game [2] | MAPE | N=9 | 64 | 1.29E+08 |
| fasta | Benchmarks Game [2] | FEE | N=50,000 | 10 | 2.15E+07 |
| fft | MiBench [4] | MAPE | Max waves 8, size 1024 | 82 | 1.56E+08 |
| floyd-warshall | Polybench [5] | FEE | complete graph of 64 vertices | 68 | 1.30E+08 |
| inversek2j | AxBench [1] | MAPE | 1,000 coordinates | 59 | 9.82E+07 |
| jacobi-2d-imper | Polybench [5] | FEE | Arrays of 32 double values | 36 | 6.39E+07 |
| jmeint | AxBench [1] | FEE | 1,000 coordinates | 62 | 9.77E+07 |
| jpeg | AxBench [1] | SSI | 512x512 image | 127 | 2.30E+08 |
| k-nucleotide | Benchmarks Game [2] | MAPE | 2 sequences of 50,000 nucleotides | 121 | 9.79E+07 |
| mandelbrot | Benchmarks Game [2] | SSI | N=500 | 28 | 7.49E+07 |
| mm | | MAPE | 100x100 int matrices | 42 | 8.01E+07 |
| nbody | Benchmarks Game [2] | MAPE | N=100,000 | 35 | 6.48E+07 |
| pi | | MAPE | N=1,000,000 | 17 | 3.53E+07 |
| qsort | MiBench [4] | FEE | 10,000 strings | 36 | 6.68E+07 |
| reg_detect | Polybench [5] | MAPE | Iter=10,000, Length=32, MaxGrid=6 | 41 | 7.78E+07 |
| reverse-complement | Benchmarks Game [2] | FEE | 2 sequences of 50,000 nucleotides | 30 | 5.67E+07 |
| sobel | | SSI | 256x256 image | 124 | 1.54E+08 |
| spectralnorm | Benchmarks Game [2] | MAPE | N=500 | 48 | 1.26E+08 |

Benchmark repositories:

[1] AxBench: http://axbench.org/

[2] Benchmarks Game: https://benchmarksgame-team.pages.debian.net/benchmarksgame/

[3] CBench: http://ctuning.org/cbench

[4] MiBench: http://vhosts.eecs.umich.edu/mibench/

[5] Polybench: https://web.cse.ohio-state.edu/~pouchet.2/software/polybench/

The source code of the applications is available at https://github.com/VArchC/apps

Quality Metrics

Quality metrics quantify how much an approximate output differs from the accurate output. Thus, the quality metric results from a comparison between the outputs of accurate and approximate executions of the same code with the same input. The quality metrics express how much the approximate output data array (F) deviates from the accurate output data array (A). Below, we show the quality metrics used for each application; an illustrative implementation sketch follows the definitions.

  • Fraction of Equal Elements (FEE)

FEE = (1/n) Σ_{i=1}^{n} [A_i = F_i]

where:
n: the number of elements in the data array;
A_i: the i-th element in the accurate array;
F_i: the i-th element in the approximate array;
[A_i = F_i]: 1 if the elements are equal, 0 otherwise.
  • Mean Absolute Percentage Error (MAPE)

MAPE = (100/n) Σ_{i=1}^{n} |(A_i - F_i) / A_i|

where:
n: the number of elements in the data array;
A_i: the i-th element in the accurate array;
F_i: the i-th element in the approximate array.

  • Structural Similarity Index (SSI)

SSI = ((2 μ_A μ_F + c_1)(2 σ_{AF} + c_2)) / ((μ_A^2 + μ_F^2 + c_1)(σ_A^2 + σ_F^2 + c_2))

where:
μ_x: the average of the array x;
σ_x^2: the variance of the array x;
σ_{xy}: the covariance of x and y;
c_1 = (k_1 L)^2, c_2 = (k_2 L)^2: two variables to stabilize the division with a weak denominator;
L: the dynamic range of the pixel values (typically 2^{#bits per pixel} - 1);
k_1 = 0.01 and k_2 = 0.03.
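As a rough illustration, the sketch below computes the three metrics with NumPy, following the definitions above. It assumes element-wise comparable arrays, that the accurate array has no zero elements for MAPE, and a single-window (global) computation for SSI; the exact implementation used in the experiments may differ.

```python
import numpy as np

def fee(A, F):
    """Fraction of Equal Elements: share of positions where A and F match exactly."""
    A, F = np.asarray(A), np.asarray(F)
    return np.mean(A == F)

def mape(A, F):
    """Mean Absolute Percentage Error; assumes A has no zero elements."""
    A, F = np.asarray(A, dtype=float), np.asarray(F, dtype=float)
    return 100.0 * np.mean(np.abs((A - F) / A))

def ssi(A, F, bits_per_pixel=8, k1=0.01, k2=0.03):
    """Structural Similarity Index computed globally over the two images."""
    A, F = np.asarray(A, dtype=float), np.asarray(F, dtype=float)
    L = 2 ** bits_per_pixel - 1                  # dynamic range of pixel values
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2        # stabilizers for weak denominators
    mu_a, mu_f = A.mean(), F.mean()
    var_a, var_f = A.var(), F.var()
    cov_af = np.mean((A - mu_a) * (F - mu_f))    # covariance of A and F
    return ((2 * mu_a * mu_f + c1) * (2 * cov_af + c2)) / \
           ((mu_a ** 2 + mu_f ** 2 + c1) * (var_a + var_f + c2))
```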

Cite us

@INPROCEEDINGS{SmartApprox-IGSC2021,
  author={Jo\~ao {Fabr\'icio Filho} and Isa\'ias Felzmann and Lucas Wanner},
  booktitle={12th International Green and Sustainable Computing Conference (IGSC)}, 
  title={{SmartApprox: learning-based configuration of approximate memories for energy-efficient execution}}, 
  year={2021}
}
