How Does the Dispatcher Work?
I wanted to write about how PT2 does autograd, but that requires understanding eager autograd, which in turn requires understanding the dispatcher. So let’s start there.

Let’s pretend we’re building Torch from first principles: we’ll hit problems one at a time and work out how to solve each of them.

Problem 1: We want to be able to call operators on each backend.

Solution: Polymorphism! We define a class in which every operator is a virtual method, and each backend implements every operator.

    class Torch:
        # One method per operator; each backend subclass overrides all of them.
        def mm(self, a: "Tensor", b: "Tensor") -> "Tensor": ...
        def einsum(self, equation: str, *operands: "Tensor") -> "Tensor": ...
        # ... and so on for every other operator

Now I just need to implement Torch for each “real” backend (CPU, CUDA, …
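A minimal runnable sketch of the polymorphic design described above, assuming a toy nested-list `Matrix` in place of a real `Tensor`. The names `CPUTorch`, `IncompleteBackend`, and `run`, and the `add` operator, are hypothetical and chosen only for illustration; this is not how PyTorch itself is implemented. Marking each operator with `abc.abstractmethod` makes Python enforce the “backends implement every operator” rule when a backend is instantiated.

```python
from abc import ABC, abstractmethod
from typing import List

# Hypothetical toy "tensor": a 2-D matrix stored as nested lists (not torch.Tensor).
Matrix = List[List[float]]


class Torch(ABC):
    """One abstract ("virtual") method per operator; every backend must override all of them."""

    @abstractmethod
    def mm(self, a: Matrix, b: Matrix) -> Matrix: ...

    @abstractmethod
    def add(self, a: Matrix, b: Matrix) -> Matrix: ...


class CPUTorch(Torch):
    """Hypothetical CPU backend written in pure Python."""

    def mm(self, a: Matrix, b: Matrix) -> Matrix:
        # Naive matrix multiply: out[i][j] = sum over k of a[i][k] * b[k][j]
        inner = len(b)
        cols = len(b[0])
        return [
            [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a
        ]

    def add(self, a: Matrix, b: Matrix) -> Matrix:
        # Elementwise addition of two same-shaped matrices.
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class IncompleteBackend(Torch):
    """Hypothetical backend that implements mm but forgets add."""

    def mm(self, a: Matrix, b: Matrix) -> Matrix:
        raise NotImplementedError  # placeholder; never reached in this sketch


def run(backend: Torch) -> Matrix:
    # Caller code is backend-agnostic: it only relies on the Torch interface.
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    return backend.add(backend.mm(a, b), a)


if __name__ == "__main__":
    print(run(CPUTorch()))  # [[20.0, 24.0], [46.0, 54.0]]
    try:
        IncompleteBackend()  # missing `add`, so instantiation fails
    except TypeError as e:
        print("TypeError:", e)
```

The caller in `run` never names a concrete backend, which is what makes the virtual-method approach appealing; the flip side is that a backend cannot even be constructed until it implements every operator in the interface.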