GPU codegen issue with cooperative_groups and sync
This is on my branch matmul_paper_schedules, which currently just contains a few small tweaks to the fork opts. Those tweaks are unrelated to this bug, but they are needed to get matmul scheduled the way I want.
When I build the Juno matmul with a schedule that creates one thread for each element of the output matrix (n/32 * l/32 blocks, with 32 * 32 threads per block), nvcc on miranda fails with the following errors:
/tmp/.tmppAVmvH/matmul.cu(89): error: class "cooperative_groups::__v1::thread_block_tile<1024U, void>" has no member "sync"
/tmp/.tmppAVmvH/matmul.cu(108): error: class "cooperative_groups::__v1::thread_block_tile<1024U, void>" has no member "sync"
/tmp/.tmppAVmvH/matmul.cu(127): error: class "cooperative_groups::__v1::thread_block_tile<1024U, void>" has no member "sync"
The nvcc version on miranda is 11.5.119, if that's relevant.
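
For context, here is a minimal sketch of the launch shape described above, using a whole-block barrier that does not go through a 1024-wide tile. It is a hypothetical kernel, not the generated matmul.cu: the names matmul_sketch, grid_for, and launch_matmul_sketch are made up for illustration, and it assumes n and l are multiples of 32. My understanding, which I have not verified against this toolchain, is that in CUDA 11.x tiles wider than 32 threads are only available through the experimental cooperative_groups interface, which might explain the missing sync member.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

constexpr int TILE = 32;  // 32 * 32 = 1024 threads per block

// Hypothetical kernel (not the generated code).
// Computes C (n x l) = A (n x m) * B (m x l), all row-major.
// Assumes n and l are multiples of TILE, so every thread maps to a valid element.
__global__ void matmul_sketch(const float *a, const float *b, float *c,
                              int n, int m, int l) {
    cg::thread_block block = cg::this_thread_block();

    // My guess at the shape of the failing code (left commented out):
    //   auto tile = cg::tiled_partition<1024>(block);
    //   tile.sync();

    // One thread per output element.
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;

    float acc = 0.0f;
    for (int k = 0; k < m; ++k) {
        acc += a[row * m + k] * b[k * l + col];
    }

    // Barrier over all 1024 threads of the block. It is not needed for
    // correctness here; it only shows a block-wide sync that avoids the tile.
    block.sync();

    c[row * l + col] = acc;
}

// Grid shape for the schedule: l/32 blocks along x, n/32 blocks along y.
inline dim3 grid_for(int n, int l) {
    return dim3(l / TILE, n / TILE);
}

// Host-side launcher; da, db, dc are device pointers owned by the caller.
inline cudaError_t launch_matmul_sketch(const float *da, const float *db,
                                        float *dc, int n, int m, int l) {
    matmul_sketch<<<grid_for(n, l), dim3(TILE, TILE)>>>(da, db, dc, n, m, l);
    return cudaGetLastError();
}
```

If the codegen only needs a barrier across the full block, emitting the sync on the thread_block itself (or a plain __syncthreads()) would sidestep the tile type entirely. Whether that matches what the backend intends at lines 89, 108, and 127 is something I have not checked.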