I did trained some research models using the existing PyTorch/XLA on TPUs, and it was a mess of undocumented behavior and bugs (silently hanging after 8 hours of training!).
If anyone is trying to use PyTorch on TPU before TorchTPU is released, you can check out the training pipeline that I ended up building to support my research: https://github.com/aklein4/easy-torch-tpu
Looks like a new backend, similar to MPS for Apple Silicon?
This is great to see.
I did trained some research models using the existing PyTorch/XLA on TPUs, and it was a mess of undocumented behavior and bugs (silently hanging after 8 hours of training!).
If anyone is trying to use PyTorch on TPU before TorchTPU is released, you can check out the training pipeline that I ended up building to support my research: https://github.com/aklein4/easy-torch-tpu
Sounds good, but my main question is: is this a fork, or a new backend they're building in (like MPS)?