Hercules' Design
Hercules is a compiler targeting heterogeneous devices. The key goals of Hercules are listed below:

Generate optimized, memory-efficient, and parallel code for devices containing CPUs, GPUs, and other processing elements.
Explore language design for programming heterogeneous systems in a performant, expressive, and safe manner.
Expose detailed configuration of code generation and scheduling through a novel scheduling language.
Design an intermediate representation that allows for fine-grained control of what code is executed on what device in a system.
Develop a runtime system capable of dynamically scheduling generated code fragments on a heterogeneous machine.


Front-end Language Design
TODO: @aaronjc4

Scheduling Language Design
TODO: @aaronjc4

Compiler Design
The Hercules compiler is split into the following components:

Hercules IR
The IR of the Hercules compiler is similar to the sea of nodes IR presented in "A Simple Graph-Based Intermediate Representation", with a few differences.

There is a more expressive type system than the original sea of nodes IR, including struct, array, and enum types.
There are dynamic constants, which are constants provided dynamically to the runtime system. Unlike input-dependent values, they can be used to specify array types.
There is no single global store. The closest analogs are individual values with an array type, which support dynamic indexed read and write operations.
There are no I/O operations or other side effects.
There is no recursion.
The implementation of Hercules IR does not follow the original object-oriented design.
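
To make these differences concrete, here is a minimal sketch of how such an IR could be represented in Rust. It is not the actual Hercules IR definition: every name in it (`NodeId`, `TypeId`, `DynamicConstant`, `Type`, `Node`, `Function`, `build_example`) is a hypothetical chosen for illustration. The sketch assumes that nodes and types live in flat vectors and refer to each other by index rather than through object pointers, that an array type carries a dynamic constant as its length, and that an array write yields a new array value instead of mutating a global store.

```rust
/// Nodes and types are referenced by index into flat vectors
/// (an assumed, non-object-oriented layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(pub usize);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TypeId(pub usize);

/// A dynamic constant: either a literal or the n-th dynamic constant
/// parameter, whose value is supplied to the runtime system.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum DynamicConstant {
    Literal(usize),
    Parameter(usize),
}

/// Types include struct, enum, and array types. Array lengths are
/// dynamic constants, never input-dependent values.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Type {
    Unsigned64,
    Float32,
    Struct(Vec<TypeId>),
    Enum(Vec<TypeId>),
    Array(TypeId, DynamicConstant),
}

/// A few illustrative node kinds. There is no store node: `WriteArray`
/// produces a new array value.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Node {
    Parameter { index: usize, ty: TypeId },
    Add { left: NodeId, right: NodeId },
    ReadArray { array: NodeId, index: NodeId },
    WriteArray { array: NodeId, index: NodeId, data: NodeId },
    Return { value: NodeId },
}

#[derive(Default)]
pub struct Function {
    pub types: Vec<Type>,
    pub nodes: Vec<Node>,
}

impl Function {
    pub fn add_type(&mut self, ty: Type) -> TypeId {
        self.types.push(ty);
        TypeId(self.types.len() - 1)
    }

    pub fn add_node(&mut self, node: Node) -> NodeId {
        self.nodes.push(node);
        NodeId(self.nodes.len() - 1)
    }
}

/// Builds a function that doubles element `i` of an array whose length
/// is dynamic constant parameter 0, and returns the updated array.
pub fn build_example() -> Function {
    let mut f = Function::default();
    let f32_ty = f.add_type(Type::Float32);
    let idx_ty = f.add_type(Type::Unsigned64);
    let arr_ty = f.add_type(Type::Array(f32_ty, DynamicConstant::Parameter(0)));

    let arr = f.add_node(Node::Parameter { index: 0, ty: arr_ty });
    let i = f.add_node(Node::Parameter { index: 1, ty: idx_ty });
    let elem = f.add_node(Node::ReadArray { array: arr, index: i });
    let doubled = f.add_node(Node::Add { left: elem, right: elem });
    let updated = f.add_node(Node::WriteArray { array: arr, index: i, data: doubled });
    f.add_node(Node::Return { value: updated });
    f
}
```

Because `WriteArray` returns a value and nothing refers to an address, a representation along these lines leaves the question of where the array actually lives entirely to the compiler.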

A key design consideration of Hercules IR is the absence of a concept of memory. A downside of this approach is that any language targeting Hercules IR must also be very restrictive regarding memory. In practice, this means tightly controlling or eliminating first-class references. The upside is that the compiler has complete freedom to lay out data however it likes in memory when performing code generation. This includes deciding which data resides in which address spaces, which is a necessary ability for a compiler striving to have fine-grained control over what operations are computed on what devices.

Optimizations
TODO: @rarbore2

Partitioning
TODO: @rarbore2

Code Generation
Hercules uses LLVM for generating CPU and GPU code. Memory is "introduced" into the program representation at this stage. Operations in a function are separated into basic blocks. The data layout of each value is decided, and its memory is either allocated on the stack or designated as separately allocated and passed into functions as necessary. Code may be generated for several estimated values of the dynamic constants.
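
As a rough illustration of the layout decision described above, the following sketch computes the size of a value for one estimate of the dynamic constants and picks a placement accordingly. It is not Hercules' actual code generator. The names `Layout`, `Placement`, `size_in_bytes`, and `place`, the `stack_limit` threshold, and the packed struct layout that ignores alignment are all assumptions made for the example.

```rust
/// A simplified description of how a value is laid out (hypothetical).
#[derive(Clone, Debug)]
pub enum Layout {
    Scalar { bytes: usize },
    /// Fields are packed back to back; alignment is ignored for brevity.
    Struct(Vec<Layout>),
    /// The length is the `length_param`-th dynamic constant.
    Array { element: Box<Layout>, length_param: usize },
}

#[derive(Debug, PartialEq, Eq)]
pub enum Placement {
    Stack,
    SeparatelyAllocated,
}

/// Size of a value for one concrete estimate of the dynamic constants.
/// Returns `None` if a dynamic constant is missing or the size overflows.
pub fn size_in_bytes(layout: &Layout, dyn_consts: &[usize]) -> Option<usize> {
    match layout {
        Layout::Scalar { bytes } => Some(*bytes),
        Layout::Struct(fields) => fields.iter().try_fold(0usize, |acc, field| {
            acc.checked_add(size_in_bytes(field, dyn_consts)?)
        }),
        Layout::Array { element, length_param } => {
            let length = *dyn_consts.get(*length_param)?;
            size_in_bytes(element, dyn_consts)?.checked_mul(length)
        }
    }
}

/// Small values with a known size go on the stack; everything else is
/// designated as separately allocated and passed into the function.
pub fn place(layout: &Layout, dyn_consts: &[usize], stack_limit: usize) -> Placement {
    match size_in_bytes(layout, dyn_consts) {
        Some(size) if size <= stack_limit => Placement::Stack,
        _ => Placement::SeparatelyAllocated,
    }
}

/// An array of `{ 4-byte, 8-byte }` structs whose length is dynamic constant 0.
pub fn example_layout() -> Layout {
    Layout::Array {
        element: Box::new(Layout::Struct(vec![
            Layout::Scalar { bytes: 4 },
            Layout::Scalar { bytes: 8 },
        ])),
        length_param: 0,
    }
}
```

Under this sketch, the same value can receive different placements for different estimates of a dynamic constant, which is one reason a code generator might emit more than one version of a function.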

Runtime System
The runtime system is responsible for dynamically executing code generated by Hercules. It exposes a Rust API for executing Hercules code. It takes care of memory allocation, synchronization, and scheduling.