The Tool
I packaged the profiler into something anyone can use. It's as simple as adding 3 lines to your training code.
Run training for 100 steps and it tells you:
- Where time is going
- What's slow
- How to fix it
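To make "where time is going" a bit more concrete, here's a rough sketch of the kind of per-phase timing a profiler like this collects. This is not the tool's actual API: `PhaseTimer`, `fake_work`, and the phase names are all made up for illustration, and the sleeps stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Hypothetical helper: accumulates wall-clock time per named phase."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add the elapsed time even if the phase raised an exception.
            self.totals[name] += time.perf_counter() - start

    def report(self):
        """Return (phase, fraction_of_total) pairs, slowest first."""
        total = sum(self.totals.values()) or 1.0
        ranked = sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
        return [(name, seconds / total) for name, seconds in ranked]


def fake_work(seconds):
    # Stand-in for real work (data loading, forward/backward, gradient sync).
    time.sleep(seconds)


timer = PhaseTimer()
for step in range(100):
    with timer.phase("data_loading"):
        fake_work(0.0002)
    with timer.phase("compute"):
        fake_work(0.0005)
    with timer.phase("gradient_sync"):
        fake_work(0.002)

for name, fraction in timer.report():
    print(f"{name:>14}: {fraction:6.1%}")
```

One caveat if you try this on real training code: GPU kernel launches are asynchronous, so plain wall-clock timing like this only means something if you synchronize the device before reading the clock.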
It's on GitHub now: gpu-training-profiler
If you're training on multiple GPUs and haven't checked for issues like this, give it a shot. Takes about 5 minutes to run.
It's open source, so PRs welcome, issues welcome, feedback welcome. I'm still figuring this stuff out.
And if you find it useful (or useless), let me know. That's how I learn.