Misc. improvements for GPU testing
- Draw functions with per-device colors in the Graphviz visualization.
- Fix bugs in GCM and loop nesting to allow forks to get to codegen.
- Add GPU schedules for dot and matmul tests that contain forks.
- The GPU schedules are currently commented out in the build scripts. To use them with --features=cuda, switch which schedule_in_src line is commented out in both build scripts.