Commit 599b8a6d authored by Vikram Adve's avatar Vikram Adve
Revise to have less motivation, get to the point a bit quicker,

say explicitly how we are better than existing virtual ISAs, and
say our compiler prototype (not our compilation strategy) is preliminary.
parent 0cfc4b4b
\begin{abstract}
%
Heterogeneous computing is widely used in the System-on-chip (SoC) processors
that power modern mobile devices in order to
reduce power consumption through specialization.
However, programming such systems can be extremely complex because a single
SoC combines multiple
parallelism models, instruction sets, and memory hierarchies, and different
SoCs use \emph{different combinations} of these features.
We propose a new Virtual Instruction Set Architecture (ISA) that aims to
address both functional portability and performance portability across
mobile heterogeneous SoCs by capturing the wide range of different
parallelism models expected to be available on future SoCs.
Our virtual ISA design uses only two parallelism models to achieve this goal:
\emph{a hierarchical dataflow graph with side effects} and
\emph{parametric vector instructions}.
Our virtual ISA is more general than existing ones, such as PTX, HSAIL and
SPIR, which focus heavily on GPUs; for example, it can capture both streaming
pipelined parallelism and the general dataflow parallelism found in many custom
and semi-custom (programmable) accelerators.
We present a compilation strategy to generate code for a diverse range
of target hardware components from the common virtual ISA.
As a first prototype, we have implemented backends for
GPUs using NVIDIA's PTX,
for vector hardware using Intel's AVX, and
for host code running on x86 processors.
Experimental results show that code generated for vector hardware and GPUs
from a single virtual ISA representation achieves
performance within about 2x of separately hand-tuned code,
and much closer in most cases.
We further demonstrate qualitatively using a realistic example
that our virtual ISA abstractions are also suited for capturing pipelining and
streaming parallelism.
%
\end{abstract}