Opened 16 years ago

Closed 16 years ago

#887 closed Defect (fixed)

Serious sample_bitwise_validator issues with binary files

Reported by: Nicolas Owned by: davea
Priority: Critical Milestone: Undetermined
Component: Server - Validator Version: 6.6.20
Keywords: Cc:

Description

sample_bitwise_validator works by calculating MD5 hashes of the files to be compared, then comparing the hashes, not the file data itself.

To calculate each hash, it loads the whole file into memory, calls md5_string on the data, and then deletes the data.

The read_file_string function it uses to read the file needs file_size*2 bytes of RAM and truncates the data at the first null byte. See #886 for that problem.
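Both problems can be illustrated with a minimal sketch. The function `read_whole_file_as_c_string` below is hypothetical. It is not BOINC's `read_file_string`, whose exact implementation is covered in #886. It reads a file into a heap buffer and then builds a `std::string` from that buffer as if it were a NUL-terminated C string. The buffer and the string copy exist at the same time, so peak memory is roughly twice the file size, and the `std::string(const char*)` constructor stops at the first `\0`.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical reader showing both failure modes; NOT the BOINC implementation.
std::string read_whole_file_as_c_string(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    // First copy: the raw bytes of the whole file.
    std::vector<char> buf((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
    buf.push_back('\0');  // terminate so buf.data() is a valid C string
    // Second copy (buf is still alive, so ~2x file size in memory).
    // Treating the buffer as a C string stops at the first NUL, so
    // everything after an embedded '\0' is silently dropped.
    return std::string(buf.data());
}

// Binary-safe variant: the iterator-range constructor keeps embedded NULs.
std::string read_whole_file_binary(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}
```

For a file containing the seven bytes `abc\0def`, the first function returns the 3-byte string `abc`, while the second returns all 7 bytes.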

As a result, if two files differ overall but are identical up to the first null byte, the validator says they match. For example, I found that it considers all object files (.o) and all linked binaries of the server code to be identical. I also tested it on six zip archives with completely different contents, and all except one were marked as matching each other. That is perhaps more worrying, because projects actually use zip files.
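One plausible explanation for these false matches is that common binary formats contain a 0x00 byte very early in their headers. A typical 64-bit little-endian Linux ELF file (object or executable) starts with `7F 45 4C 46 02 01 01 00`, so only the first 7 bytes survive truncation. A zip local file header starts with `PK\x03\x04` followed by a 2-byte "version needed to extract", which for version 2.0 is `14 00`, so only 5 bytes survive. Under that assumption, only archives with a different version byte would be told apart. The sketch below compares the truncated data directly instead of hashing it, because identical inputs always produce identical MD5 hashes. The helper names are hypothetical and do not come from the validator's code.

```cpp
#include <cassert>
#include <string>

// Keep only the bytes before the first NUL, mimicking the buggy reader.
std::string prefix_before_nul(const std::string& data) {
    return data.substr(0, data.find('\0'));  // npos keeps the whole string
}

// What the unpatched validator effectively compared: equal truncated data
// always yields equal MD5 hashes, so this is when it reported a match.
bool truncated_match(const std::string& a, const std::string& b) {
    return prefix_before_nul(a) == prefix_before_nul(b);
}

// What a bitwise validator should compare: every byte.
bool full_match(const std::string& a, const std::string& b) {
    return a == b;
}
```

Take two 12-byte buffers that share the ELF prefix `7F 45 4C 46 02 01 01 00` and differ only in the last four bytes. For these, `truncated_match` returns true and `full_match` returns false.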

Fixing #886 would also fix the binary-file issue here, but the validator would still load the whole file into memory. I attach a patch that uses md5_file instead. This gives it constant (O(1)) memory usage and fixes this problem independently of #886.
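The idea behind the constant-memory approach is to feed the hash function one fixed-size chunk at a time instead of the whole file. BOINC's `md5_file` is not reproduced here. To keep the sketch self-contained, 64-bit FNV-1a stands in for MD5. This is a swapped-in, non-cryptographic hash chosen only for brevity and is not what the patch uses. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hash a file in fixed-size chunks so memory use does not grow with file size.
// FNV-1a (64-bit) stands in for MD5 here; it is NOT what the patch uses.
bool hash_file_streaming(const std::string& path, std::uint64_t& out_hash) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;                       // unreadable file: report failure
    std::uint64_t h = 14695981039346656037ULL;   // FNV-1a 64-bit offset basis
    char chunk[4096];                            // the only buffer: O(1) memory
    // read() fails on the final partial chunk, but gcount() still reports
    // how many bytes arrived, so that tail is processed too.
    while (in.read(chunk, sizeof chunk) || in.gcount() > 0) {
        const std::streamsize n = in.gcount();
        for (std::streamsize i = 0; i < n; ++i) {
            h ^= static_cast<unsigned char>(chunk[i]);  // every byte, NULs included
            h *= 1099511628211ULL;                      // FNV-1a 64-bit prime
        }
    }
    out_hash = h;
    return true;
}

// Two result files are treated as matching if their streamed hashes are equal.
bool files_match(const std::string& a, const std::string& b) {
    std::uint64_t ha = 0, hb = 0;
    if (!hash_file_streaming(a, ha) || !hash_file_streaming(b, hb)) return false;
    return ha == hb;
}
```

Like the patched validator, this still compares hashes rather than raw bytes. The difference from the unpatched version is that every byte, including those after a NUL, contributes to the hash, and memory use is bounded by the 4096-byte buffer.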

Attachments (1)

zip-run.html (3.3 KB) - added by Nicolas 16 years ago.
Test run of bitwise validator (before this fix), comparing five zip files. OK means files matched.

Change History (4)

Changed 16 years ago by Nicolas

Attachment: zip-run.html added

Test run of bitwise validator (before this fix), comparing five zip files. OK means files matched.

comment:2 Changed 16 years ago by romw

Owner: changed from Bruce Allen to davea

comment:3 Changed 16 years ago by davea

Resolution: fixed
Status: new → closed

(In [17966]) - sample bitwise validator: make it work for binary files

fixes #886, #887
