Okay, so I finally got around to playing with TensorRT (TRT) and Vision Transformers (ViT). It was a bit of a mess at first, but I think I’ve got it sorted. Here’s how it went down:

First, I needed to get my environment set up. You know, the usual stuff. I installed the latest NVIDIA drivers, made sure I had a compatible CUDA version, and grabbed the TensorRT SDK. I already had Python and PyTorch ready to go.
Getting the Model
Next up, I needed a ViT model. I decided to go with a pre-trained one from the `transformers` library. I figured that would be the easiest way to get started. I just used `pip` to install it.
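For anyone following along, here is a minimal sketch of what loading the model might look like. The checkpoint name `google/vit-base-patch16-224` and the helper name `load_vit` are assumptions for illustration; the post doesn't say which checkpoint was used.

```python
# Assumed checkpoint; swap in whichever ViT you actually want.
MODEL_NAME = "google/vit-base-patch16-224"


def load_vit(model_name: str = MODEL_NAME):
    """Download (or load from the local cache) a pre-trained ViT classifier."""
    # Imported inside the function so the heavy dependency is only needed
    # at the moment the model is actually loaded.
    from transformers import ViTForImageClassification

    model = ViTForImageClassification.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout
    return model


# Usage (downloads the weights on first call):
#   vit = load_vit()
#   print(vit.config.image_size, vit.config.num_labels)
```

The first call needs network access to fetch the weights; after that they come from the local Hugging Face cache.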
Conversion Time!
This is where things got interesting. I had to convert the PyTorch model to ONNX format first. I found some example code online and tweaked it to fit my needs. It took some trial and error. Basically, I was just telling the script which model I wanted to convert and what input shape to expect.
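The export step can be sketched roughly like this. This isn't the exact script from the post: the function names, the tensor names `pixel_values` and `logits`, the fixed batch size of 1, the 224x224 input and opset 17 are all assumptions, and `model` is expected to be a Hugging Face ViT already loaded in PyTorch.

```python
def dummy_input_shape(batch_size: int = 1, image_size: int = 224) -> tuple:
    """Shape of the NCHW image batch the model expects: (batch, 3, H, W)."""
    return (batch_size, 3, image_size, image_size)


def export_vit_to_onnx(model, onnx_path: str = "vit.onnx", image_size: int = 224) -> None:
    """Trace `model` with a random input and write it out as an ONNX file."""
    import torch  # imported here so the shape helper works without PyTorch

    model.eval()
    # Hugging Face models return a dict-like object by default;
    # plain tuples are easier for the exporter to handle.
    model.config.return_dict = False

    dummy = torch.randn(*dummy_input_shape(1, image_size))
    torch.onnx.export(
        model,
        (dummy,),
        onnx_path,
        input_names=["pixel_values"],  # assumed name, reused when feeding the engine
        output_names=["logits"],
        opset_version=17,  # assumed; pick one your PyTorch and TensorRT both support
    )
```

With a static input shape like this, the resulting engine only accepts that one shape, which keeps the later steps simple.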
Once I had the ONNX model, I used the `trtexec` tool that comes with TensorRT to convert it to a TensorRT engine. This part was also a bit tricky. I had to play around with the command-line arguments to find the best settings for my GPU, which meant a lot more trial and error: change a flag, rebuild, try again.
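To keep everything in Python, here is a small sketch that builds and runs a `trtexec` command. It only uses the `--onnx`, `--saveEngine` and `--fp16` flags; the file names and helper names are assumptions, and these are not necessarily the settings I ended up with.

```python
import subprocess


def build_trtexec_command(onnx_path: str, engine_path: str, fp16: bool = True) -> list:
    """Assemble the trtexec argument list without running anything."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # allow half-precision kernels where TensorRT finds them faster
    return cmd


def build_engine(onnx_path: str = "vit.onnx", engine_path: str = "vit.engine",
                 fp16: bool = True) -> None:
    """Run trtexec (must be on PATH); raises CalledProcessError if the build fails."""
    subprocess.run(build_trtexec_command(onnx_path, engine_path, fp16), check=True)
```

Splitting the command construction from the call makes it easy to print the command and tweak one flag at a time.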
Putting It All Together
After I had the TensorRT engine, I wrote a simple Python script to load it and run inference. This involved loading the engine, allocating some memory on the GPU, copying the input data to the GPU, running the inference, and then copying the output back to the CPU. Honestly, this was the most time-consuming part for me.
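The steps above (load, allocate, copy in, run, copy out) can be sketched as follows. This is one possible way to do it, not my original script: it assumes a recent TensorRT release that has the name-based tensor API (`execute_async_v3`), an engine with static shapes, a single float32 input and float32 outputs, and it uses PyTorch CUDA tensors as the GPU buffers instead of a separate CUDA binding. `run_engine` is a made-up helper name.

```python
def run_engine(engine_path: str, image_batch):
    """Run one inference pass; `image_batch` is a float32 NumPy array in NCHW layout."""
    import tensorrt as trt
    import torch

    # Load the serialized engine from disk.
    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    # Copy the input to the GPU.
    device_input = torch.from_numpy(image_batch).contiguous().cuda()

    # Bind every I/O tensor to a GPU buffer, allocating fresh ones for outputs.
    outputs = {}
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            buffer = device_input
        else:
            shape = tuple(engine.get_tensor_shape(name))
            buffer = torch.empty(shape, dtype=torch.float32, device="cuda")  # assumes fp32 output
            outputs[name] = buffer
        context.set_tensor_address(name, buffer.data_ptr())

    # Run inference on PyTorch's current CUDA stream and wait for it to finish.
    stream = torch.cuda.current_stream()
    context.execute_async_v3(stream.cuda_stream)
    stream.synchronize()

    # Copy the results back to the CPU.
    return {name: tensor.cpu().numpy() for name, tensor in outputs.items()}
```

TensorRT may warn about running on the default stream; for a real workload a dedicated stream and reusable buffers would be the next step.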
The Results
I was pretty impressed with the speedup I got from using TensorRT. It was noticeably faster than running the model directly in PyTorch. So, that was time well spent! I’ll be using TensorRT for ViT models from now on.
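If you want to measure the difference yourself, a simple timing helper along these lines is enough to compare the two paths. It's a generic sketch (the `benchmark` name and the default iteration counts are assumptions), and no numbers are quoted here because they depend heavily on the GPU and settings.

```python
import statistics
import time


def benchmark(fn, warmup: int = 10, iters: int = 100) -> float:
    """Return the median wall-clock latency of `fn()` in milliseconds."""
    # Warm-up runs let caches, lazy initialization and clocks settle.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

One caveat: GPU work is asynchronous, so `fn` has to block until the result is ready (for example by synchronizing the stream or copying the output to the CPU), otherwise you only time the kernel launch.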
It was a journey, but a good one. I hope this helps anyone else trying to do the same thing!