ROCm inference optimization engine: MIGraphX

April 6, 2024

This page records my notes about exploring MIGraphX, henceforth MGX.

Overview

When reading an ONNX model (or a model in another supported format), MGX parses it into its own abstraction: a program holds modules, and each module holds a list of instructions. To run the model, MGX first finalizes every operation in the program (i.e., in all instructions of all of its modules), then invokes the compute function defined by each operation to evaluate the model.

Parse and compile an ONNX model

Compiler

Pass

Given a target (CPU, GPU or FPGA), MGX obtains the passes it needs from std::vector<pass> target::get_passes (e.g., the GPU implementation in src/targets/gpu/target.cpp). This function returns the registered optimization passes that the target supports; which ones are included depends on environment variables and the device type.
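To make the idea of "a target hands back an ordered list of passes" concrete, here is a minimal standalone sketch. It is not MGX code: toy_module, toy_pass, get_toy_passes and the two pass names are hypothetical, and the enable_fusion flag only stands in for the environment-variable checks mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a module: just a sequence of op names.
struct toy_module
{
    std::vector<std::string> ops;
};

// Hypothetical stand-in for a pass: a name plus a transformation.
struct toy_pass
{
    std::string name;
    std::function<void(toy_module&)> apply;
};

// The "target" decides which passes run and in which order.
// enable_fusion plays the role of an environment variable switch.
inline std::vector<toy_pass> get_toy_passes(bool enable_fusion)
{
    std::vector<toy_pass> passes;
    // Always drop no-op "identity" instructions.
    passes.push_back({"drop_identity", [](toy_module& m) {
                          std::vector<std::string> kept;
                          for(const auto& op : m.ops)
                              if(op != "identity")
                                  kept.push_back(op);
                          m.ops = kept;
                      }});
    if(enable_fusion)
    {
        // Fold each adjacent "mul", "add" pair into one "fused_mul_add".
        passes.push_back({"fuse_mul_add", [](toy_module& m) {
                              std::vector<std::string> out;
                              for(std::size_t i = 0; i < m.ops.size(); ++i)
                              {
                                  if(i + 1 < m.ops.size() && m.ops[i] == "mul" &&
                                     m.ops[i + 1] == "add")
                                  {
                                      out.push_back("fused_mul_add");
                                      ++i; // skip the "add" that was folded in
                                  }
                                  else
                                  {
                                      out.push_back(m.ops[i]);
                                  }
                              }
                              m.ops = out;
                          }});
    }
    return passes;
}

// Apply every pass in order, as a compile step would.
inline void run_toy_passes(toy_module& m, const std::vector<toy_pass>& passes)
{
    for(const auto& p : passes)
        p.apply(m);
}
```

Running the full pipeline on {"identity", "mul", "add", "relu"} leaves {"fused_mul_add", "relu"}; with fusion disabled only the identity is removed. The order of the vector matters, since each pass sees the output of the previous one.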

Finalize

After the optimization passes have been applied, the program is finalized, i.e., MGX determines which kernels will implement the semantics of each operation. Operator parameters inferred at compile time, e.g., the MIOpen solution ID obtained from compile-time tuning, are also written to the saved .mxr model file.

Evaluate and run a compiled program

Once a program has been compiled, users can call its eval function to run the model.
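The finalize-then-compute flow can be sketched with a toy program. Everything here (toy_op, toy_instruction, toy_program, the kernel_id field) is hypothetical and far simpler than the real MGX types; the point is only that finalize fixes a per-operation choice once, and eval then walks the instructions in order.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical, heavily simplified operation.
struct toy_op
{
    std::string name;
    int kernel_id = -1; // chosen at finalize time, like a tuned solution ID

    // Decide which "kernel" implements this op. Here it is a fixed lookup;
    // a real backend could benchmark several candidates instead.
    void finalize() { kernel_id = (name == "add") ? 0 : 1; }

    // Evaluate the op; finalize() must have been called first.
    int compute(int a, int b) const
    {
        assert(kernel_id >= 0);
        return kernel_id == 0 ? a + b : a * b;
    }
};

struct toy_instruction
{
    toy_op op;
    std::size_t lhs; // index into the value table
    std::size_t rhs; // index into the value table
};

struct toy_program
{
    std::list<toy_instruction> instructions;
    bool finalized = false;

    // Finalize every operation before the first evaluation.
    void finalize()
    {
        for(auto& ins : instructions)
            ins.op.finalize();
        finalized = true;
    }

    // The value table starts with the inputs; each instruction appends
    // its result, and the last result is the program output.
    int eval(std::vector<int> values) const
    {
        assert(finalized);
        for(const auto& ins : instructions)
            values.push_back(ins.op.compute(values.at(ins.lhs), values.at(ins.rhs)));
        return values.back();
    }
};

// Builds (x0 + x1) * (x0 + x1).
inline toy_program make_toy_program()
{
    toy_program p;
    p.instructions.push_back({{"add"}, 0, 1}); // v2 = x0 + x1
    p.instructions.push_back({{"mul"}, 2, 2}); // v3 = v2 * v2
    return p;
}
```

With inputs {3, 4}, calling finalize() and then eval() yields 49. Keeping the choice inside the operation is what makes it possible to save it alongside the model, as the Finalize section describes for the .mxr file.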

Quick ref: common data structures

Type erasure

Many data structures in MGX follow the type erasure design pattern: they hide the concrete type and expose a unified interface to the outside. Usually, an xxx struct holds a unique pointer to an xxx_impl, where the real data members are stored.
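As a reminder of how the pattern works in general, here is a minimal sketch of a type-erased wrapper. It is not the MGX implementation: toy_operation, concept_t, model, add_one and times_two are hypothetical names, and the interface is reduced to two member functions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type-erased wrapper: any type with name() and compute(int)
// can be stored, without inheriting from a common base class.
struct toy_operation
{
    private:
    // Internal virtual interface, invisible to users of toy_operation.
    struct concept_t
    {
        virtual ~concept_t()                             = default;
        virtual std::string name() const                 = 0;
        virtual int compute(int x) const                 = 0;
        virtual std::unique_ptr<concept_t> clone() const = 0;
    };

    // Adapter that forwards the virtual calls to the concrete object.
    template <class T>
    struct model : concept_t
    {
        explicit model(T v) : value(std::move(v)) {}
        std::string name() const override { return value.name(); }
        int compute(int x) const override { return value.compute(x); }
        std::unique_ptr<concept_t> clone() const override
        {
            return std::make_unique<model<T>>(value);
        }
        T value;
    };

    public:
    template <class T>
    toy_operation(T x) : impl(std::make_unique<model<T>>(std::move(x)))
    {
    }

    // Deep copy through clone(), so the wrapper behaves like a value.
    toy_operation(const toy_operation& other) : impl(other.impl->clone()) {}
    toy_operation(toy_operation&&) noexcept = default;

    std::string name() const { return impl->name(); }
    int compute(int x) const { return impl->compute(x); }

    private:
    std::unique_ptr<concept_t> impl;
};

// Two unrelated concrete types that satisfy the interface.
struct add_one
{
    std::string name() const { return "add_one"; }
    int compute(int x) const { return x + 1; }
};

struct times_two
{
    std::string name() const { return "times_two"; }
    int compute(int x) const { return x * 2; }
};
```

A std::vector<toy_operation> can then hold add_one{} and times_two{} side by side; applying them in that order to 5 gives 12. New operation types need no changes to the wrapper, only the two member functions.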

Common code snippets

template <class T, class F, F f> // NOLINT
using manage_ptr = std::unique_ptr<T, manage_deleter<F, f>>;
#define MIGRAPHX_MANAGE_PTR(T, F) \
    migraphx::manage_ptr<std::remove_pointer_t<T>, decltype(&F), &F>

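The MIGRAPHX_MANAGE_PTR macro expands to a std::unique_ptr type whose deleter calls F, so a resource obtained from a C-style API is released automatically when the pointer goes out of scope.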
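The snippet above does not show manage_deleter, so the following standalone sketch fills it in under an assumption: the deleter simply calls f on any non-null pointer. All toy_ names, TOY_MANAGE_PTR and the small create/destroy API are hypothetical and exist only to exercise the pattern.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Assumed shape of the deleter: call the function f on a non-null pointer.
template <class F, F f>
struct toy_manage_deleter
{
    template <class T>
    void operator()(T* p) const
    {
        if(p != nullptr)
            f(p);
    }
};

template <class T, class F, F f>
using toy_manage_ptr = std::unique_ptr<T, toy_manage_deleter<F, f>>;

// Same structure as MIGRAPHX_MANAGE_PTR: T is the raw handle type
// (a pointer), F is the function that releases it.
#define TOY_MANAGE_PTR(T, F) toy_manage_ptr<std::remove_pointer_t<T>, decltype(&F), &F>

// A tiny C-style API to manage; it counts live handles so the release
// can be observed.
struct toy_handle
{
    int id;
};

inline int& live_handles()
{
    static int n = 0;
    return n;
}

inline toy_handle* toy_create(int id)
{
    ++live_handles();
    return new toy_handle{id};
}

inline void toy_destroy(toy_handle* h)
{
    --live_handles();
    delete h;
}

// The release function is baked into the type, so the pointer stays the
// size of a raw pointer and no deleter has to be passed at construction.
using toy_handle_ptr = TOY_MANAGE_PTR(toy_handle*, toy_destroy);
```

Usage is then toy_handle_ptr h{toy_create(7)}; and toy_destroy runs when h leaves scope. Encoding the function as a template argument, rather than storing a function pointer, is what keeps the deleter stateless.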

Values

Shapes and arguments

struct MIGRAPHX_EXPORT shape
{
    shape(type_t t, std::vector<std::size_t> l);
    shape(type_t t, std::vector<std::size_t> l, std::vector<std::size_t> s);
    // for dynamic tensor shape
    struct MIGRAPHX_EXPORT dynamic_dimension
    {
        std::size_t min = 0;
        std::size_t max = 0;
        std::set<std::size_t> optimals{};
    };

    std::size_t elements() const;
    std::size_t bytes() const;
};

A shape is described by its dimension sizes and, optionally, its strides. In addition to fixed-shape tensors, MGX also supports shapes with dynamic dimensions, where each dimension carries min and max values (and a set of optimal sizes).
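To see how dimension sizes and strides interact, here is a small standalone sketch. toy_shape is hypothetical; in particular, defaulting to packed row-major strides and computing bytes from the spanned element range are my assumptions about reasonable behavior, not a copy of the MGX code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

struct toy_shape
{
    std::vector<std::size_t> lens;
    std::vector<std::size_t> strides;
    std::size_t type_size; // bytes per element, e.g. 4 for float

    // Packed row-major strides: the last dimension is contiguous.
    static std::vector<std::size_t> packed_strides(const std::vector<std::size_t>& l)
    {
        std::vector<std::size_t> s(l.size(), 1);
        for(std::size_t i = l.size(); i > 1; --i)
            s[i - 2] = s[i - 1] * l[i - 1];
        return s;
    }

    toy_shape(std::size_t tsize, std::vector<std::size_t> l)
        : lens(std::move(l)), strides(packed_strides(lens)), type_size(tsize)
    {
    }

    toy_shape(std::size_t tsize, std::vector<std::size_t> l, std::vector<std::size_t> s)
        : lens(std::move(l)), strides(std::move(s)), type_size(tsize)
    {
    }

    // Number of logical elements: the product of all dimension sizes.
    std::size_t elements() const
    {
        return std::accumulate(
            lens.begin(), lens.end(), std::size_t{1}, std::multiplies<std::size_t>{});
    }

    // Number of elements spanned in memory, from the first to the last one.
    // Assumes every dimension size is at least 1.
    std::size_t element_space() const
    {
        std::size_t last = 0;
        for(std::size_t i = 0; i < lens.size(); ++i)
            last += (lens[i] - 1) * strides[i];
        return last + 1;
    }

    std::size_t bytes() const { return element_space() * type_size; }

    // Linear offset of a multi-dimensional index.
    std::size_t index(const std::vector<std::size_t>& idx) const
    {
        return std::inner_product(idx.begin(), idx.end(), strides.begin(), std::size_t{0});
    }
};
```

For a float tensor of lens {2, 3, 4}, the packed strides are {12, 4, 1}, so elements() is 24, bytes() is 96, and index {1, 2, 3} maps to offset 23. Passing explicit strides, e.g. lens {3, 2} with strides {1, 3}, describes a transposed view of the same memory without moving any data.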

struct MIGRAPHX_EXPORT argument : raw_data<argument>
{
    template <class T>
    argument(shape s, T* d)
        : m_shape(std::move(s))
    {
        assign_buffer([d] { return reinterpret_cast<char*>(d); });
    }

    template <class T>
    argument(shape s, std::shared_ptr<T> d)
        : m_shape(std::move(s))
    {
        assign_buffer([d] { return reinterpret_cast<char*>(d.get()); });
    }

    private:
    void assign_buffer(std::function<char*()> d);
    struct data_t
    {
        std::function<char*()> get = nullptr;
        std::vector<data_t> sub = {};
        data_t share() const;
        static data_t from_args(const std::vector<argument>& args);
    };
    argument(const shape& s, const data_t& d);
    shape m_shape;
    data_t m_data{};
};

argument holds the actual data buffer whose layout is described by a shape.
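The two constructors above differ in ownership: the raw-pointer version only borrows the buffer, while the shared_ptr version captures the pointer in the lambda and so keeps the buffer alive. A minimal sketch of that idea, with a hypothetical toy_argument that stores an element count instead of a full shape:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>

struct toy_argument
{
    // Non-owning: the caller must keep the buffer alive.
    template <class T>
    toy_argument(std::size_t n, T* d)
        : count(n), get([d] { return reinterpret_cast<char*>(d); })
    {
    }

    // Owning: the lambda holds a copy of the shared_ptr, so the buffer
    // lives as long as any copy of this argument does.
    template <class T>
    toy_argument(std::size_t n, std::shared_ptr<T> d)
        : count(n), get([d] { return reinterpret_cast<char*>(d.get()); })
    {
    }

    char* data() const { return get(); }

    // View the untyped buffer as elements of type T.
    template <class T>
    T* cast() const
    {
        return reinterpret_cast<T*>(data());
    }

    std::size_t count;

    private:
    std::function<char*()> get;
};

// Allocate n floats, fill them with v, and hand ownership to the argument.
inline toy_argument make_filled(std::size_t n, float v)
{
    std::shared_ptr<float> buf(new float[n], std::default_delete<float[]>());
    std::fill(buf.get(), buf.get() + n, v);
    return toy_argument(n, buf);
}
```

Copying a toy_argument copies the std::function and therefore shares the same underlying buffer, which makes passing arguments around cheap.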

Instructions and op

using instruction_ref = std::list<instruction>::iterator;
// members in an instruction
struct MIGRAPHX_EXPORT instruction
{
    operation op;
    shape result{};
    std::vector<instruction_ref> output;
    std::vector<instruction_ref> arguments;
    std::vector<module_ref> module_args;
    literal lit;
    bool normalized       = false;
    std::size_t target_id = 0;
};

Modules

struct MIGRAPHX_EXPORT module
{
    std::unique_ptr<module_impl> impl;
};

struct module_impl
{
    // A list is used to keep references to an instruction stable
    std::list<instruction> instructions;
    std::unordered_set<instruction*> instruction_set;
    std::string name;
    uint32_t nparams = 0;
    bool bypass      = false;
};
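The comment about std::list matters because instruction_ref is a list iterator: inserting into a std::list never invalidates iterators to other elements, whereas a std::vector may reallocate and invalidate all of them. A standalone sketch with hypothetical mini_ins and mini_module types:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <utility>

// Hypothetical instruction that only carries a name.
struct mini_ins
{
    std::string name;
};

// Same idea as instruction_ref: a list iterator used as a stable handle.
using mini_ins_ref = std::list<mini_ins>::iterator;

struct mini_module
{
    std::list<mini_ins> instructions;

    // Append an instruction and return a handle to it.
    mini_ins_ref add(std::string name)
    {
        return instructions.insert(instructions.end(), mini_ins{std::move(name)});
    }

    // Insert before an existing instruction, as a rewriting pass would.
    mini_ins_ref insert_before(mini_ins_ref pos, std::string name)
    {
        return instructions.insert(pos, mini_ins{std::move(name)});
    }
};
```

After obtaining a handle to "relu", any number of insert_before calls elsewhere in the module leave that handle pointing at the same object. This is what lets passes hold on to instruction_ref values while they restructure a module.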

Programs

struct MIGRAPHX_EXPORT program
{
    std::unique_ptr<program_impl> impl;
};

struct program_impl
{
    // A map is used to keep references to modules of the program
    std::unordered_map<std::string, module> modules;
    std::vector<context> contexts;
    std::vector<target> targets;
};

Target and context

struct context
{
    context(std::size_t device_id = 0, std::size_t n = value_of(MIGRAPHX_NSTREAMS{}, 1))
        : current_device(std::make_shared<hip_device>(device_id, n)),
          begin_event(create_event()),
          finish_event(create_event())
    {
    }

    private:
    // TODO: Make this a vector to support multiple devices
    std::shared_ptr<hip_device> current_device;
    std::vector<shared<hip_event_ptr>> events;
    bool exhaustive_tune = false;
    bool measure_perf    = false;
    // for event perf timing
    shared<hip_event_ptr> start_event = nullptr;
    shared<hip_event_ptr> stop_event  = nullptr;
    // for stream synchronization
    shared<hip_event_ptr> begin_event  = nullptr;
    shared<hip_event_ptr> finish_event = nullptr;
    problem_cache pc{};
};

context is an abstraction for managing a HIP device (GPU, streams and events).

struct MIGRAPHX_GPU_EXPORT target
{
    std::string name() const;
    std::vector<pass> get_passes(migraphx::context& gctx, const compile_options& options) const;
    migraphx::context get_context() const;
    argument copy_to(const argument& arg) const;
    argument copy_from(const argument& arg) const;
    argument allocate(const shape& s) const;
};

target, in contrast, defines the possible "actions" (at both compile time and execution time) that can be taken on the device.

Ranges

template <class Iterator>
struct iterator_range
{
    Iterator start;
    Iterator last;

    Iterator begin() const { return start; }

    Iterator end() const { return last; }
};

template <class Iterator, MIGRAPHX_REQUIRES(not std::is_integral<Iterator>{})>
iterator_range<Iterator> range(Iterator start, Iterator last)
{
    return {start, last};
}

inline iterator_range<iota_iterator> range(std::ptrdiff_t start, std::ptrdiff_t last)
{
    return {{start, {}}, {last, {}}};
}
inline iterator_range<iota_iterator> range(std::ptrdiff_t last) { return range(0, last); }

Use case: for (auto i : range(N)) {}, which iterates i from 0 to N - 1.
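The integer overloads rely on iota_iterator, which is not shown above. The following standalone sketch uses a much simpler counting iterator to show why a range-based for loop works with them; toy_iota_iterator, toy_iota_range, toy_range and collect are hypothetical names, and the real iota_iterator is more general.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal counting iterator: dereferencing yields the current index.
struct toy_iota_iterator
{
    std::ptrdiff_t index;

    std::ptrdiff_t operator*() const { return index; }
    toy_iota_iterator& operator++()
    {
        ++index;
        return *this;
    }
    bool operator!=(const toy_iota_iterator& other) const { return index != other.index; }
};

// begin()/end() are all a range-based for loop needs.
struct toy_iota_range
{
    toy_iota_iterator start;
    toy_iota_iterator last;

    toy_iota_iterator begin() const { return start; }
    toy_iota_iterator end() const { return last; }
};

// Half-open interval [start, last).
inline toy_iota_range toy_range(std::ptrdiff_t start, std::ptrdiff_t last)
{
    return {{start}, {last}};
}

inline toy_iota_range toy_range(std::ptrdiff_t last) { return toy_range(0, last); }

// Helper for inspecting what a range produces.
inline std::vector<std::ptrdiff_t> collect(const toy_iota_range& r)
{
    std::vector<std::ptrdiff_t> out;
    for(auto i : r)
        out.push_back(i);
    return out;
}
```

Here collect(toy_range(4)) yields 0, 1, 2, 3 and collect(toy_range(2, 5)) yields 2, 3, 4. No storage is allocated for the sequence itself; the iterator just counts.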
