grudge has recently started picking up PoCL 7.1 when it is installed for the downstream tests in pytato, and some of the CI tests have since started running very slowly or hanging. Aside from the CI, I've also been able to reproduce this on my MacBook and on Tuolumne (CPUs; the GPUs there don't use PoCL). mirgecom's CI runs out of time when using 7.1 as well. When I revert to PoCL 6, all of these work fine.
From what I can tell it's only tests that use inverse_metric_derivative or inverse_surface_metric_derivative that encounter the issue (but not all of them do). I took one of the slow/hanging grudge tests (test_dt_utils::test_wave_dt_estimate with dim == 2 and degree == 2) and tried to create a minimal reproducer. Here's what I have so far: reproducer
Some observations:
- It seems related to calling these inverse metric functions multiple times; when I do that, device operations after it (like the `x = actx.from_numpy(...)`) end up getting stuck (specifically, they get stuck somewhere inside `_enqueue_write_buffer` on the C++ side). If I add a `self.queue.finish()` inside `actx.freeze()` just after `pt_prg(...)`, then it gets stuck there instead.
- In the reproducer, if I freeze the calls separately, it runs fast (see the sketch after this list).
- Most of the tests that get stuck sit there seemingly forever, but the reproducer does eventually finish in 5-10 minutes. And on subsequent runs it's fast until I blow the PoCL cache away.
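
To make the "together vs. separately" distinction concrete, here is a minimal sketch of the pattern (not the actual reproducer: `result_a`/`result_b` are placeholder arrays standing in for the two inverse-metric-derivative results, and I'm assuming the stock `PytatoPyOpenCLArrayContext` with its container-aware `freeze`):

```python
import numpy as np
import pyopencl as cl
from arraycontext import PytatoPyOpenCLArrayContext
from pytools.obj_array import make_obj_array

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
actx = PytatoPyOpenCLArrayContext(queue)

# Placeholder arrays; in the reproducer these would be the two
# inverse-metric-derivative results.
result_a = actx.from_numpy(np.random.rand(1000))
result_b = actx.from_numpy(np.random.rand(1000))

# One freeze call over both results: the array context generates and compiles
# a single combined program. This is the variant that hangs in the reproducer.
frozen_both = actx.freeze(make_obj_array([result_a, result_b]))

# Two separate freeze calls: two smaller programs. This variant runs fast.
frozen_a = actx.freeze(result_a)
frozen_b = actx.freeze(result_b)
```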
It sounds like maybe something is slow in the kernel compilation, though I don't know whether that happens inside `pt_prg(...)` or before it.
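
If compilation is the culprit, it should be reproducible without pytato/loopy in the loop by handing the generated device code to PoCL directly and timing the build. A rough sketch (`generated_kernel.cl` is a hypothetical dump of the OpenCL source; see below for how to obtain it):

```python
import time
import pyopencl as cl

ctx = cl.create_some_context()

# Hypothetical file containing the generated OpenCL device code.
with open("generated_kernel.cl") as f:
    src = f.read()

t0 = time.perf_counter()
prg = cl.Program(ctx, src).build()  # the step PoCL's kernel cache would affect
print(f"build took {time.perf_counter() - t0:.1f} s")
```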
Here is the generated loopy kernel: code. (I would include the generated OpenCL too if I remembered how to retrieve that...)
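
If I'm remembering right, the device code can be pulled out of the loopy translation unit with something along these lines (`t_unit` standing in for the kernel linked above):

```python
import loopy as lp

# t_unit is the loopy translation unit for the kernel linked above.
code = lp.generate_code_v2(t_unit).device_code()
print(code)
```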
I'm not really sure where to go from here. Any ideas @inducer?