
I'm confused about the reason a block can't exist in both S3 and Glacier at the same time, if the deduplication code decides that block is needed in a new archive.

Why couldn't you simply have a rule that each file is either in S3 or in Glacier, so that S3 lists of blocks can only reference other S3 blocks, while Glacier lists of blocks can only reference other Glacier blocks?

In the worst case, where every block was in both archives, this would only increase costs by 10% over storing everything in S3 alone, if Glacier costs a tenth of what S3 costs.
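To make the arithmetic behind that 10% figure explicit, here is a minimal sketch. The prices are hypothetical relative units (S3 = 1.0, Glacier = 0.1), not real AWS pricing, and the function name is made up for illustration.

```python
def worst_case_overhead(s3_price: float, glacier_price: float) -> float:
    """Fractional cost increase over S3-only storage when every block
    is stored once in S3 and once in Glacier."""
    both_tiers = s3_price + glacier_price
    return both_tiers / s3_price - 1.0


# Hypothetical relative prices: Glacier at one tenth of S3.
overhead = worst_case_overhead(s3_price=1.0, glacier_price=0.1)
print(f"{overhead:.0%}")
```

The overhead is just the ratio of the two prices, so it only stays small as long as Glacier remains much cheaper than S3.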



What happens when you decide that you don't want that new archive to be in S3 any more and tell Tarsnap to migrate it over to Glacier?


To S3 it's like you just deleted the file; to Glacier it's like you just created the file.


Except that then we're trying to store two different blocks in Glacier with the same hash ID.


From that I assume that if a block's hash matches something that's already in the archives, you retrieve the stored block(s) with the same hash ID in order to verify that they are exactly the same (byte-for-byte)? And this wouldn't be possible with Glacier, as you can't just retrieve the block from storage to check then and there.

Do you have any stats on the number of collisions you've seen?
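For anyone following along, this is roughly the behaviour I'm assuming above, as a small sketch. It is only my guess, not anything confirmed about how Tarsnap works; `BlockStore` and its methods are hypothetical names, SHA-256 is an assumed choice of hash, and the in-memory dict stands in for remote storage.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: blocks are keyed by their hash."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        block_id = hashlib.sha256(data).hexdigest()
        existing = self._blocks.get(block_id)
        if existing is None:
            # First time this hash is seen: store the block.
            self._blocks[block_id] = data
        elif existing != data:
            # Same hash, different bytes: a genuine collision.
            raise ValueError(f"hash collision on block {block_id}")
        # Otherwise the block is a verified duplicate; nothing to store.
        return block_id

    def __len__(self) -> int:
        return len(self._blocks)


store = BlockStore()
first = store.put(b"some block of data")
second = store.put(b"some block of data")  # deduplicated, not stored twice
print(first == second, len(store))
```

The `existing != data` check needs the stored bytes immediately, which is cheap here and with S3, but would mean waiting on a retrieval job with Glacier.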



