We spend a lot of time on this, and it should be simpler. I'd guess the Java PDFBox library is the way to go here, since it can be used for this kind of thing in general. The R package tabulizer includes PDFBox, so that might be a good place to start exploring.
This wouldn't be part of the standard workflow. If doing it in R turns out to be too tedious, we could also go straight to Java, but that's less fun to run from the command line...