Instead of using set up like the following:
let threadgroupSize = MTLSize(width: pipeline.threadExecutionWidth, height: pipeline.maxTotalThreadsPerThreadgroup / pipeline.threadExecutionWidth, depth: 1)
let threadgroups = MTLSize(width: (Int(imageSize.x) + threadgroupSize.width - 1) / threadgroupSize.width,
height: (Int(imageSize.y) + threadgroupSize.height - 1) / threadgroupSize.height,
depth: 1)
It may be more prudent to use fixed and known thread group sizes like:
let threadgroupSize = MTLSize(width: 16, height: 16, depth: 1)