rhdiff is a simple implementation of input comparison tool based on rolling hash algorithm
The library uses the rsync algorithm to compare chunks of two files in rolling manner.
Given the delta between two input src and dst needs to be calculated, differ will do the following:
- Split
srcinto non-overlapping chunks of sizeS - Calculate weak checksum (adler32) and strong checksum (sha256) for each chunk
-
CalculateDeltafunction iterates over each overlapping chunk indstusing window of sizeS -
Create a map from
srcwith a weak checksum of each chunk as a key -
For each overlapping chunk consecutively check checksums and add moving blocks into the change list
- Calculate weak checksum of the current window
- If weak checksum is in the map, calculate strong checksum of the current window
- If strong and weak checksums match, the chunk in
dstis the same as insrc - Remove the key-value pair of fully matched from the map
- If offset of src chunk and offset of dst chunk match, the chunk wasn't moved
- If offset of src chunk and offset of dst chunk don't match, the chunk was moved
-
Write bytes between matching chunks into a buffer and dump the changes from the buffer as soon as a next matching block is found
-
Finally, iterate over remaining keys in the
srcmap and collect deleted chunks
Basic comparison of two strings split into one byte chunks
package main
import (
"bytes"
"fmt"
"rhdiff/pkg/differ"
)
const chunkSizeBytes = 3
func main() {
src := bytes.NewReader([]byte("abcxyzfoo"))
dst := bytes.NewReader([]byte("abc12xyzfo"))
srcChunks := differ.Split(src, chunkSizeBytes)
changes := differ.CalculateDelta(srcChunks, dst, chunkSizeBytes)
for _, change := range changes {
fmt.Printf(
"operation: %s, srcOffset %d; dstOffset %d; data: %s\n",
change.Operation,
change.SrcOffset,
change.DstOffset,
string(change.Data),
)
}
}Output
$ go run examples/basic.go
operation: EQUAL, srcOffset 0; dstOffset 0; data: abc
operation: MOVE, srcOffset 3; dstOffset 5; data: xyz
operation: ADD, srcOffset 0; dstOffset 3; data: 12
operation: ADD, srcOffset 0; dstOffset 8; data: fo
operation: DELETE, srcOffset 6; dstOffset 0; data: foo$ go test ./... -cover