Summary of Paper: RSync Algorithm

 Rsync Algorithm: TR-CS-96-05.dvi (cmu.edu)

RSync is a utility which is used to transfer files or folders from computer to another in unix based systems. The key advantage of Rsync is once the full data is transferred to destination, only the changed bytes are transferred next time onwards, saving time and network bandwidth.

The above mentioned paper describes the algorithm used to find the changed bytes and how to file data is transferred from source and recreated at destination.

Here is the summary.

Instead of transferring complete file, divide the file into chunks of bytes and calculate its hashes. First time all the bytes are transferred to the destination and then for each chunk of bytes a weak rolling checksum (inspired by Adler-32 Checksum) and strong checksum of each block of bytes is calculated and shared with the source.

At source, the combination of rolling hash and strong hash are used to find the indexes of chunks of bytes that are same as some block of bytes destination. This way the bytes are different in source also can be found. Then source sends to destination those bytes that are different, the index of the previous block of bytes which were matching at both source and destination and the index of the current chunk of bytes before which the new data has to be inserted. At destination the file is recreated using the received bytes.

Only the chunks which are changed are transferred in this scheme.

No comments: