\begin{abstract}
%
Heterogeneous computing is widely used in the System-on-Chip (SoC) processors
that power modern mobile devices in order to
reduce power consumption through specialization.
However, programming such systems can be extremely complex: a single
SoC combines multiple
parallelism models, instruction sets, and memory hierarchies, and different
SoCs use \emph{different combinations} of these features.
We propose \NAME{}, a new Virtual Instruction Set Architecture (ISA) that aims to
address both functional portability and performance portability across
mobile heterogeneous SoCs by capturing the wide range of
parallelism models expected to be available on future SoCs.
Our virtual ISA design uses only two parallelism models to achieve this goal:
\emph{a hierarchical dataflow graph with side effects} and
\emph{parametric vector instructions}.
\NAME{} is more general than existing virtual ISAs that focus heavily on GPUs,
such as PTX, HSAIL, and SPIR; for example, it can capture both the streaming
pipelined parallelism and the general dataflow parallelism found in many custom and
semi-custom (programmable) accelerators.
We present a compilation strategy to generate code for a diverse range
of target hardware components from the common virtual ISA.
As a first prototype, we have implemented backends for
GPUs using NVIDIA's PTX,
for vector hardware using Intel's AVX, and
for host code running on x86 processors.
Experimental results show that code generated for vector hardware and GPUs
from a single virtual ISA representation achieves
performance within about a factor of two of separately hand-tuned code,
and much closer in most cases.
We further demonstrate qualitatively, using a realistic example,
that our virtual ISA abstractions are also well suited to capturing pipelined and
streaming parallelism.
%
\end{abstract}