MD5 vs. SFV
Moderators: Big-O Ryan, Big-O Mark
I see that MD5 has a much larger checksum value. Does anyone know the probability that a large file can have a byte modified without SFV detecting it? How does the SFV probability compare to MD5?
- Big-O Ryan
- Developer
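I'm not sure the probability of a false match (a damaged file that still produces the same checksum) is really going to tell you much. Even for CRC-32 (SFV), the probability of failure is very, very small: for an ideal 32-bit code, random damage slips through about once in 2^32 (roughly 4.3 billion) cases. For MD5 it is obviously much, much smaller.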
Neither algorithm is perfect. The simple fact of the matter is that there are only 2^32 CRC-32 codes, which means it can assign unique codes to at most 2^32 different files. MD5 can assign unique hashes to at most 2^128 files. Both of these statements assume the algorithms work ideally, which they do not (SHA-1 is now considered more secure than MD5). Obviously your chances are substantially better with MD5, but your success with either will be pretty good.
Depending on your use, you may eventually encounter two files with the same CRC-32 code; it is very unlikely that you will do so with MD5. In fact, many websites use MD5 codes as unique identifiers for images/documents. If they were to come across two files with the same MD5, this system would fail, but it seems to hold up well.
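To put rough numbers on this, here is a minimal Python sketch. It assumes both algorithms behave like ideal random hashes (which, as noted above, they do not exactly), and the function names are made up for illustration. It uses the standard birthday approximation for "at least two of n files share a code". As a side note, CRC-32 is guaranteed to catch any single error burst of 32 bits or fewer, so one modified byte on its own is always detected; the 1-in-2^32 figure applies to larger, random-looking damage.

```python
import math


def collision_probability(num_files: int, hash_bits: int) -> float:
    """Birthday approximation: chance that at least two of num_files
    unrelated files share a code, assuming an ideal hash_bits-bit hash."""
    space = 2.0 ** hash_bits
    exponent = -num_files * (num_files - 1) / (2.0 * space)
    # -expm1(x) equals 1 - exp(x) but stays accurate when x is tiny.
    return -math.expm1(exponent)


def undetected_change_probability(hash_bits: int) -> float:
    """Chance that random damage to one file leaves an ideal hash unchanged."""
    return 2.0 ** -hash_bits


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        print(n, collision_probability(n, 32), collision_probability(n, 128))
    print(undetected_change_probability(32))
    print(undetected_change_probability(128))
```

Under these assumptions a collection of around 100,000 files is already more likely than not to contain two files with the same CRC-32, while the MD5 figure stays negligibly small. For verifying one file against its own stored checksum, only the second function matters.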
One of the saving graces of algorithms like these, when used for file verification, is that some file formats have specific characteristics that define a 'valid' file, in addition to its MD5/CRC hash. So, while your chances of changing a byte and ending up with the same MD5 are very, very low, the chances of changing a byte of a zip file and having it both produce the same MD5 and still be a valid zip (or rar, or exe) file are astronomically low. Unfortunately, most file types (mp3, mpg, jpg, etc.) do not have validity checks like this.
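As an illustration of the two checks working together, here is a small Python sketch using only the standard library. The helper names `md5_of_file` and `check_zip` are hypothetical; `ZipFile.testzip()` is the standard-library call that re-reads every member of the archive and checks its stored CRC.

```python
import hashlib
import zipfile
import zlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_zip(path, expected_md5):
    """Return (md5_matches, zip_is_valid) for the archive at path."""
    md5_ok = md5_of_file(path) == expected_md5.lower()
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the first bad member name, or None if all pass.
            zip_ok = zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error):
        # Damaged headers or compressed data can fail before a CRC is compared.
        zip_ok = False
    return md5_ok, zip_ok
```

A damaged archive would have to pass both checks at once to go unnoticed, which is the "astronomically low" case described above.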
From a security standpoint, CRC-32 isn't really considered, so I don't see much research on it, but you won't have much trouble digging up research on MD5.
-Ryan
Big-O Software
Thanks for the informative response. My reason for asking is that I have recently started adding checksum files to my DVD-R and CD-R discs. I don't need the checksums to be unique, just as accurate as possible in reporting a defective file.
Creating MD5 checksums doesn't seem to take any additional time, so I started going that way instead of SFV. From your response it seems clear that either method will be adequate for my needs, but why not use the more accurate method if it doesn't take more time?
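For anyone who wants both, a rough Python sketch along these lines computes the CRC-32 and the MD5 in a single pass over the file, so the disc only has to be read once. The function names are made up for illustration, and the line layouts are the commonly used .sfv ("name CRC") and md5sum ("hash *name") conventions; whether a given checker accepts exactly these lines is an assumption.

```python
import hashlib
import zlib


def crc32_and_md5(path, chunk_size=1 << 20):
    """Read the file once and return (crc32_hex, md5_hex)."""
    crc = 0
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
            md5.update(chunk)
    return f"{crc & 0xFFFFFFFF:08X}", md5.hexdigest()


def sfv_line(name, crc_hex):
    """One entry in the usual .sfv layout: file name, then the CRC-32."""
    return f"{name} {crc_hex}"


def md5_line(name, md5_hex):
    """One entry in md5sum layout: hash, then '*' marking binary mode."""
    return f"{md5_hex} *{name}"
```

When reading from optical media the drive is usually the slow part, which would fit the observation that MD5 takes no noticeable extra time.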
Quantum wrote: Thanks for the informative response. My reason for asking is that I have recently started adding checksum files to my DVD-R and CD-R discs.
You can use FastSum from www.fastsum.com
i'm glad
well i'm glad he said that cuz i was looking for a decent .sfv and .md5 file checker and i tried that one and it's really nice... if anyone has any other suggestions for file checkers that support .sfv and .md5 files and recursive directory support during creation and verification, it'd be nice to hear about it...