active
make rigid body object device pointer
organization
RigidBodyController (RBC) holds device pointers and manages force evaluation and integration
Opportunities for memory bandwidth savings
each block should (ideally) contain a compact set of density grid points
cache (automatically!?) potential grid lookups and coefficients!
each block should have same transformation matrices applied to each grid point?!
each block could have same inverse transformation matrix applied to each grid point
how well this performs will depend on the number
but also simplifies reductions
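A rough host-side sketch of the per-block idea above, not the actual kernel: one transform is loaded per block, every density point in the block reuses it, and the block reduces to a single partial sum. All names here (Vec3, BlockTransform, DensityPoint, block_partial_energy) are hypothetical, and the potential is passed in as a callable rather than a real grid lookup. Energy is accumulated instead of force to keep the sketch short.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Row-major 3x3 matrix (assumed layout for this sketch).
using Mat3 = std::array<double, 9>;

// Hypothetical per-block transform: p' = m * (p - origin).
// In a kernel this would be read once per block, e.g. into shared memory.
struct BlockTransform {
    Mat3 m;
    Vec3 origin;

    Vec3 apply(const Vec3& p) const {
        const double dx = p.x - origin.x;
        const double dy = p.y - origin.y;
        const double dz = p.z - origin.z;
        return { m[0] * dx + m[1] * dy + m[2] * dz,
                 m[3] * dx + m[4] * dy + m[5] * dz,
                 m[6] * dx + m[7] * dy + m[8] * dz };
    }
};

// One density grid point: position in the body frame plus its density value.
struct DensityPoint {
    Vec3 pos;
    double weight;
};

// Work done by one block: every point reuses the same transform, and the
// block produces exactly one partial sum, so the final reduction only has
// to combine one value per block.
template <typename Potential>
double block_partial_energy(const BlockTransform& t,
                            const std::vector<DensityPoint>& pts,
                            Potential u) {
    double sum = 0.0;
    for (const DensityPoint& p : pts) {
        sum += p.weight * u(t.apply(p.pos));
    }
    return sum;
}
```

The total is then a sum over the per-block partial results. On the device the inner loop would be spread across the threads of a block and followed by a block-level reduction.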
new data structure for grids?
each grid contains blocks of data of a size optimized for the device
Each thread can operate on multiple data points if needed (any advantage?)
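One possible shape for such a grid, as a minimal host-side sketch: the grid is cut into cubic tiles of B*B*B values and each tile is stored contiguously, so a block of threads touching one tile reads one compact span of memory. BlockedGrid, the tile edge B, and the zero padding at the grid boundary are all assumptions made for illustration. The best tile size for a given device would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical blocked layout: tile-major storage with cubic tiles of edge B.
template <std::size_t B>
struct BlockedGrid {
    std::size_t nx, ny, nz;   // logical grid dimensions
    std::size_t bx, by, bz;   // number of tiles along each axis
    std::vector<float> data;  // all tiles back to back, zero-padded

    BlockedGrid(std::size_t nx_, std::size_t ny_, std::size_t nz_)
        : nx(nx_), ny(ny_), nz(nz_),
          bx((nx_ + B - 1) / B),
          by((ny_ + B - 1) / B),
          bz((nz_ + B - 1) / B),
          data(bx * by * bz * B * B * B, 0.0f) {}

    // Index of the tile that contains grid point (i, j, k).
    std::size_t block_of(std::size_t i, std::size_t j, std::size_t k) const {
        return ((i / B) * by + (j / B)) * bz + (k / B);
    }

    // Flat offset into data: tile start plus position inside the tile.
    std::size_t offset(std::size_t i, std::size_t j, std::size_t k) const {
        assert(i < nx && j < ny && k < nz);
        const std::size_t local = ((i % B) * B + (j % B)) * B + (k % B);
        return block_of(i, j, k) * (B * B * B) + local;
    }

    float& at(std::size_t i, std::size_t j, std::size_t k) {
        return data[offset(i, j, k)];
    }

    float at(std::size_t i, std::size_t j, std::size_t k) const {
        return data[offset(i, j, k)];
    }
};
```

For example, BlockedGrid<4> g(10, 6, 5) pads the grid to 3 x 2 x 2 tiles of 64 values each. Letting one thread handle several points then amounts to giving it more than one local index within the same tile.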
questions
Q: overhead of dynamic parallelism?
Where does it make sense to have a kernel call subkernels? A: At least: to synchronize blocks