The current implementation uses host-pinned memory. Experiments with fine-grained device memory failed: device code does not appear to see host writes made while the kernel is running, even though fine-grained allocations are documented to support this.
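As an illustration of the host-pinned approach, here is a minimal sketch in CUDA. The note does not say which runtime is used, so CUDA is an assumption here, and `wait_for_host`, `run_handshake`, and the two-slot flag layout are hypothetical names, not taken from the actual implementation. The kernel first signals that it is running, then polls a flag in pinned, mapped host memory until the host writes to it, so the host write is known to happen during kernel execution. The sketch also assumes a platform where the launch starts executing without an explicit flush and where no display watchdog kills a spinning kernel.

```cuda
#include <cuda_runtime.h>

// flags[0]: written by the device ("kernel is running").
// flags[1]: written by the host while the kernel runs.
// `volatile` forces a fresh load/store on every access instead of a cached value.
__global__ void wait_for_host(volatile int *flags, int *seen) {
    flags[0] = 1;               // tell the host the kernel has started
    __threadfence_system();     // make that store visible system-wide
    while (flags[1] == 0) { }   // poll host-pinned memory for the host's write
    *seen = flags[1];           // record the value the device observed
}

// Hypothetical helper: returns the value the kernel observed, or -1 on a CUDA error.
int run_handshake() {
    int *h_flags = nullptr, *d_flags = nullptr, *d_seen = nullptr;
    int seen = -1;

    // Pinned, mapped host allocation: one buffer addressable by host and device.
    cudaError_t err = cudaHostAlloc((void **)&h_flags, 2 * sizeof(int), cudaHostAllocMapped);
    if (err == cudaSuccess) err = cudaHostGetDevicePointer((void **)&d_flags, h_flags, 0);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_seen, sizeof(int));

    if (err == cudaSuccess) {
        volatile int *h = h_flags;
        h[0] = 0;
        h[1] = 0;
        wait_for_host<<<1, 1>>>(d_flags, d_seen);  // asynchronous launch
        err = cudaGetLastError();                  // catch launch failures before spinning
        if (err == cudaSuccess) {
            while (h[0] == 0) { }                  // wait until the kernel is running
            h[1] = 42;                             // host write during kernel execution
            err = cudaDeviceSynchronize();         // returns once the kernel saw the write
        }
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(&seen, d_seen, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_seen);                              // cudaFree(nullptr) is a no-op
    if (h_flags) cudaFreeHost(h_flags);
    return err == cudaSuccess ? seen : -1;
}
```

Calling `run_handshake()` from `main` and checking the return value is enough to exercise the sketch. If the device never observes the host's store, the kernel does not terminate, which is the kind of failure the note describes for fine-grained device memory.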