
GPU backend

Merged prathi3 requested to merge gpu-cg into main

Adds a GPU backend and a "CUDA" feature to all the tests. As of MR creation, none of the tested IRs contained forks, so only vanilla non-parallel (single block, single thread) codegen has been tested.

Merge request reports

Pipeline #201302 passed

Pipeline passed for 386d6159 on gpu-cg

Approval is optional

Merged by rarbore2 on Jan 31, 2025 11:54pm UTC

Merge details

  • Changes merged into main with 1c25c5c8 (commits were squashed).
  • Did not delete the source branch.
  • Auto-merge enabled

Pipeline #201303 passed

Pipeline passed for 1c25c5c8 on main

Activity

  • rarbore2
  • }
    }

    /*
     * This analysis determines the parallelization strategy within threadblocks.
     * We run post-order traversal on the fork tree to get the thread quota per
     * subtree. In particular, each fork starts with a base factor as the
     * maximum over its descendants (leafs have base 1). We traverse up (details
     * in helper) and pass the factor and a map from fork node to a tuple of
     * (max quota of its siblings (including itself), its quota, its fork factor)
     * from each node to its parents. The parent then compares
     * - all three are needed for codegen. A node is in the map IFF it will be parallelized.
     * If not, the fork will use the parent's quota and serialize over the Fork's
     * ThreadIDs. Nodes may be removed from the map when traversing up the tree
     * due to an ancestor having a larger factor that conflicts.
     */
    • This kind of fork analysis seems like it'd be extremely useful in general, potentially for other backends for example. I think we should discuss how we could abstract this out into something more general.
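As a starting point for that discussion, here is a deliberately simplified sketch of a backend-independent quota pass. It is not the MR's implementation: the `Fork` struct, the `thread_quota` function, and the `limit` parameter are all hypothetical, and the sketch does not model the sibling-quota tuple or the removal of descendants when an ancestor conflicts. It only shows the post-order shape of the analysis, where leaves start from a base of 1 and a fork appears in the map only if it will be parallelized.

```rust
use std::collections::HashMap;

/// Hypothetical fork-tree node (not the MR's data structure).
/// `factor` is the fork's thread count; `children` are directly nested forks.
struct Fork {
    id: usize,
    factor: usize,
    children: Vec<Fork>,
}

/// Post-order walk that returns the thread quota of the subtree rooted at `fork`.
/// Assumption: a fork is parallelized only if its factor times the largest
/// child quota fits within `limit`; otherwise it serializes and simply passes
/// the child quota upward. Parallelized forks are recorded in `parallel`.
fn thread_quota(fork: &Fork, limit: usize, parallel: &mut HashMap<usize, usize>) -> usize {
    // Base quota: maximum over the children, or 1 for a leaf fork.
    let base = fork
        .children
        .iter()
        .map(|c| thread_quota(c, limit, parallel))
        .max()
        .unwrap_or(1);
    match base.checked_mul(fork.factor) {
        Some(quota) if quota <= limit => {
            parallel.insert(fork.id, quota);
            quota
        }
        // Too many threads (or overflow): this fork is serialized.
        _ => base,
    }
}
```

For a root fork with factor 64 over two leaf forks with factors 8 and 32, a limit of 1024 parallelizes both leaves and serializes the root (quota 32), while a limit of 4096 parallelizes the root as well (quota 2048). Keeping the limit as a parameter is one way the pass could stay independent of any particular backend.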

  • prathi3 added 1 commit
  • prathi3 added 1 commit
  • prathi3 added 1 commit
  • prathi3 added 1 commit
  • prathi3 added 1 commit
  • prathi3 added 1 commit
  • prathi3 added 1 commit
  • prathi3 added 1 commit
  • prathi3 added 23 commits
  • prathi3 added 1 commit
  • prathi3 added 1 commit
  • prathi3 added 1 commit
  • prathi3 added 1 commit
  • prathi3 added 16 commits
  • prathi3 added 1 commit
    • d4a8a948 - not fixed yet but switching machines
  • prathi3 added 1 commit
    • 948fe3b9 - before get exposed by forkify
  • prathi3 added 21 commits
  • prathi3 added 1 commit
  • prathi3 added 1 commit

  • rarbore2
  • use std::collections::{HashMap, HashSet};

    use crate::*;

    /*
     * Construct a map from fork node to all control nodes (including itself) satisfying:
     * a) domination by F
     * b) no domination by F's join
     * c) no domination by any other fork that's also dominated by F, where we do count self-domination
    • Ah, I see that this is actually the wrong condition. The condition should be post-dominated by the join, and I see that this needs to be fixed in the fork-join nesting as well. I'll fix this later. My bad.

      Edited by rarbore2
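One possible reading of the corrected condition, sketched under stated assumptions: a control node belongs to a fork's region if it is dominated by the fork and post-dominated by the fork's join. The helper names `tree_dominates` and `fork_join_region`, the `u32` node IDs, and the immediate-(post-)dominator maps are all hypothetical and are not the project's API. Self-domination is counted here, so the fork and the join both land in the region, which the thread does not confirm either way. Condition (c), the exclusion of nested forks, is left out to keep the sketch short.

```rust
use std::collections::{HashMap, HashSet};

/// Returns true if `a` dominates `b` in the tree described by `parent`, which
/// maps each node to its immediate dominator (or immediate post-dominator).
/// The tree root has no entry. Every node dominates itself.
fn tree_dominates(parent: &HashMap<u32, u32>, a: u32, b: u32) -> bool {
    let mut cur = b;
    loop {
        if cur == a {
            return true;
        }
        match parent.get(&cur) {
            Some(&up) => cur = up,
            None => return false,
        }
    }
}

/// Hypothetical helper: all control nodes dominated by `fork` and
/// post-dominated by `join` (both relations include the node itself).
fn fork_join_region(
    nodes: &[u32],
    idom: &HashMap<u32, u32>,
    ipdom: &HashMap<u32, u32>,
    fork: u32,
    join: u32,
) -> HashSet<u32> {
    nodes
        .iter()
        .copied()
        .filter(|&n| tree_dominates(idom, fork, n) && tree_dominates(ipdom, join, n))
        .collect()
}
```

On a straight-line CFG `0 -> 1 (fork) -> 2 -> 3 (join) -> 4`, this yields the region `{1, 2, 3}`: node 4 is dominated by the fork but not post-dominated by the join, and node 0 is post-dominated by the join but not dominated by the fork.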
  • rarbore2 added 1 commit
  • rarbore2 added 1 commit
  • rarbore2 added 1 commit
    • 386d6159 - make cava call medianMatrix again
  • rarbore2 marked this merge request as ready
