[RFC] Get file hashing sum by faster algorithms like xxh #4290

@qiankehan

According to the article "Use Fast Data Algorithms", md5sum is slow for hashing trusted data, and xxHash (xxh) is a faster alternative. Here are the results of hashing a 21G image file:

➜  ~ du -sh /var/lib/libvirt/images/win11.qcow2 
21G     /var/lib/libvirt/images/win11.qcow2
➜  ~ time md5sum /var/lib/libvirt/images/win11.qcow2
8c4fa4df69d95a8b18af68d46213cf5b  /var/lib/libvirt/images/win11.qcow2
md5sum /var/lib/libvirt/images/win11.qcow2  27.35s user 2.51s system 36% cpu 1:22.86 total
➜  ~ time xxh3sum /var/lib/libvirt/images/win11.qcow2
XXH3_b8b1976b55cd6f59  /var/lib/libvirt/images/win11.qcow2
xxh3sum /var/lib/libvirt/images/win11.qcow2  0.50s user 2.29s system 99% cpu 2.798 total

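Switching to xxh would save a lot of time when hashing large files: in the run above, md5sum used 27.35s of user CPU time (1:22.86 total) while xxh3sum used 0.50s (2.798s total).
To use the xxh hash algorithm, the xxhash package has to be installed, either as an rpm or as a Python package.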
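
As a rough illustration of what this could look like on the Python side, here is a minimal sketch. It assumes the third-party `xxhash` Python package (which provides `xxhash.xxh3_64()`), and falls back to `hashlib.md5` when that package is not installed. The names `new_hasher` and `hash_file` are hypothetical helpers made up for this example, not existing functions in this project.

```python
import hashlib

try:
    import xxhash  # third-party package, e.g. `pip install xxhash`
except ImportError:
    xxhash = None  # fall back to md5 below


def new_hasher():
    """Return (algorithm_name, hasher), preferring XXH3 when available.

    Hypothetical helper for this sketch only.
    """
    if xxhash is not None:
        return "xxh3_64", xxhash.xxh3_64()
    return "md5", hashlib.md5()


def hash_file(path, chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large images are never
    loaded into memory at once. Returns (algorithm_name, hex_digest)."""
    name, hasher = new_hasher()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return name, hasher.hexdigest()
```

Returning the algorithm name together with the digest lets callers avoid comparing an xxh digest against a stored md5 one. Note that xxh is a non-cryptographic hash, so this only fits the trusted-data case described above.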
