GPU codegen issue with cooperative_groups and sync

So on my machine, with NVCC version 12.8, the example compiles fine. On miranda, I see the following error:

/usr/include/cooperative_groups.h(1510): error: static assertion failed with "Tiled partition with Size > 32 is supported only by cooperative_groups::experimental::tiled_partition available with experimental features enabled"
            detected during instantiation of "cooperative_groups::__v1::thread_block_tile<Size, ParentT> cooperative_groups::__v1::tiled_partition<Size,ParentT>(const ParentT &) [with Size=1024U, ParentT=cooperative_groups::__v1::thread_block]"

What are the benefits and drawbacks of changing the number of threads per block? The error seems to say that we need to either enable experimental features or use fewer threads per block.

Well, it's a matter of capability: the assertion is about tile size rather than block size, and capping tiles at 32 threads limits what we can do with tiles in general. We should really either update CUDA or enable experimental features (I don't actually know how to do this).
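For reference, a minimal sketch of a third option that avoids tiles larger than 32 altogether: partition the block into warp-sized tiles and combine the per-tile results through shared memory. This is not the code our generator emits. `block_sum`, `sum_kernel` and `run_block_sum` are hypothetical names made up for the illustration, and it assumes a block size that is a multiple of 32 and at most 1024. As far as I recall, the CUDA 11.x programming guide describes a `_CG_ABI_EXPERIMENTAL` define for the experimental route, but that is untested here and not used below.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical helper: sums one int per thread across the whole block
// using only 32-thread tiles (assumes blockDim is a multiple of 32, <= 1024).
__device__ int block_sum(int value) {
    __shared__ int partial[32];  // one slot per warp-sized tile
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);

    // Step 1: reduce within each 32-thread tile via shuffles.
    for (int offset = 16; offset > 0; offset /= 2) {
        value += tile.shfl_down(value, offset);
    }

    // Step 2: lane 0 of each tile publishes its partial sum.
    unsigned int tile_id = block.thread_rank() / 32;
    if (tile.thread_rank() == 0) {
        partial[tile_id] = value;
    }
    block.sync();

    // Step 3: the first tile reduces the partial sums.
    unsigned int num_tiles = block.size() / 32;
    if (tile_id == 0) {
        value = (tile.thread_rank() < num_tiles) ? partial[tile.thread_rank()] : 0;
        for (int offset = 16; offset > 0; offset /= 2) {
            value += tile.shfl_down(value, offset);
        }
        if (tile.thread_rank() == 0) {
            partial[0] = value;
        }
    }
    block.sync();
    return partial[0];
}

// Hypothetical kernel: every thread contributes 1, so the sum equals blockDim.x.
__global__ void sum_kernel(int *out) {
    int total = block_sum(1);
    if (threadIdx.x == 0) {
        *out = total;
    }
}

// Host-side helper that launches a single block and returns the result.
int run_block_sum(int threads) {
    int *d_out = nullptr;
    int h_out = -1;
    cudaMalloc(&d_out, sizeof(int));
    sum_kernel<<<1, threads>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}

int main() {
    printf("block sum = %d\n", run_block_sum(1024));
    return 0;
}
```

The trade-off is an extra shared-memory round trip and two `block.sync()` calls in place of a single 1024-wide tile, but it should compile on both the older toolkit and 12.8 without any experimental flags.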

Probably resolved by !214 (merged)

closed
