
Physical books wear out in decades to centuries, computers break in years to decades. Also digital formats are constantly evolving and backwards compatibility for old formats is removed. Even without publisher/legal issues it's clear that preservation of digital media for even the next 10 years, and especially for the next 100 or 1000, is at risk.

But I really wonder why there is no other major service like the Internet Archive. The way to preserve media is to distribute it and store it in many places, not just one central location. Does the Internet Archive store redundant copies in many locations? Does it use long-term physical formats?

Also, figuring out which information is "meaningful" and which is completely useless or redundant is extremely important. We want to store everything that has the potential to be important in the future, but we also have an incredible amount of data (which is one of the reasons IA is the only major archiver). It may be good to strategize data collection, e.g. a shallow list of all sources and a deeper list of popular sources, a sample of various sources from every region, etc.
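A minimal sketch of the tiered strategy described above, assuming each source is a dict with hypothetical "name", "region" and "popularity" keys: every source gets a shallow capture, and the deep list combines the most popular sources with a random sample from each region.

```python
import random
from collections import defaultdict


def plan_crawl(sources, deep_fraction=0.1, per_region_sample=2, seed=0):
    """Split a crawl into shallow and deep capture sets.

    `sources` is a list of dicts with hypothetical keys "name", "region"
    and "popularity". Every source gets a shallow capture; the deep set is
    the most popular `deep_fraction` of sources plus a random sample of
    up to `per_region_sample` sources from each region.
    """
    rng = random.Random(seed)
    shallow = {s["name"] for s in sources}

    # Deep tier, part 1: the most popular sources.
    ranked = sorted(sources, key=lambda s: s["popularity"], reverse=True)
    n_top = max(1, int(len(ranked) * deep_fraction)) if ranked else 0
    deep = {s["name"] for s in ranked[:n_top]}

    # Deep tier, part 2: a random sample from every region.
    by_region = defaultdict(list)
    for s in sources:
        by_region[s["region"]].append(s["name"])
    for names in by_region.values():
        deep.update(rng.sample(names, min(per_region_sample, len(names))))

    return shallow, deep
```

The fixed `seed` keeps a plan reproducible, which matters if several archivers want to agree on who captures what.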



> But I really wonder why there is no other major service like the Internet Archive.

Copyright, of course. The fact the Internet Archive hasn't been sued out of existence yet never ceases to amaze me.


> The fact the Internet Archive hasn't been sued out of existence yet never ceases to amaze me.

Because it's a library. If you happen to be lucky enough to go there and meet the people running it, it's very clear they are librarians doing what librarians do.


While I cherish the sentiment, I very much doubt it's shared by all rights holders and copyright lawyers. And it takes only a handful to disagree.


I think they put reasonable barriers to piracy. They are of course possible to circumvent, as is every other barrier in existence, but you can also scan or photocopy a physical book. And this is actually a good thing. Mildly related: the other day I was watching something on Netflix on my phone and found something funny, so I tried to take a screenshot to share with my friends, which quite frankly I think is fair use. It didn't work because it seems that although I paid for my phone, it's not really my phone.


Scanning and making copies of portions of a book used to be encouraged. In fact, it's how you collected information for book reports and research. It wasn't uncommon, for example, for me to print out entire journal articles or scan entire chapters at the University library. Now, they'd probably shoot my dog.

I tried Prime Video once and the way it locked down my computer to guarantee so-called "rights" was stupid. I tried Netflix, same thing. I'm slowly but surely trending back to unabashed piracy because it's the only way I can own anything anymore - and that's sad. I will happily spend hard-earned money as long as I can own something. At the very least I'd happily donate to my local library if they didn't spend time enforcing the will of some faceless multinational publisher.

Shockingly, as I've gotten older I seem to own less and every year the apes we elect take more of my rights away.


I was shocked to find that this wouldn't even work on my desktop! You have to go to the browser settings and turn off hardware acceleration. If you rent an HD movie from Prime Video, you get a notice saying that your screen has to be compliant with some specific standard, and part of the standard is essentially baked-in DRM that prevents taking screenshots of the content. The device belongs to me, the content was streamed to me, the device has no business preventing me from taking a screenshot or recording video. Yet here we are, apparently DRM is baked into all of our screens now.


"Why Rosyna Can't Take A Movie Screenshot" :

https://web.archive.org/web/20170617174641/https://www.alexr...

And that's why it's important to boycott devices and software that allow DRM: Intel and AMD Ryzen CPUs, Windows 11 (requires a DRM chip), maybe Chrome/Chromium if it doesn't let you easily disable DRM support, Steam...


Boycotting all recent x86 CPUs is probably not reasonable for most people. More actionable is to make sure you control the software (most importantly, the OS) that runs on those CPUs so that their user-hostile features are not used for DRM.


You're right, not for most people. But we aren't most people here: we can stick with older AMD for desktops, and we should be heavily investigating alternatives like RISC-V (and also the "true" Linux, non-Android smartphone alternatives, for somewhat related reasons).


> Yet here we are, apparently DRM is baked into all of our screens now.

This situation sucks. It's as if movie and tv studios (including Netflix, apparently) said "we want you to implement a bunch of intrusive, complicated, wasteful technology that will punish paying users while having little to no effect on pirates" and all the tech companies said "great idea, let's do it!"

I suppose the (self-correcting) endgame is when DRM makes paid media so inconvenient that it drives the masses to piracy.


It's already happening. You cannot get better than 720p on most services on Linux, while pirated content is available in 4K.


It has been baked into Windows since Vista. They had to add it in order to license HDMI, and this has led to the need to digitally sign drivers: they had to be able to revoke your driver if it turned out that you were doing something the movie and record industries didn't want you to do.


Just wait till they make it impossible to take a photo of your desktop screen with your phone.


Like they already made it impossible to make color photocopies of some content? https://en.wikipedia.org/wiki/EURion_constellation


That's why I encourage everyone with a desktop system to run their HDMI outputs through a capture card as well as the monitors.


You will own nothing and be happy.


Those user hostile features are what drive me to piracy. No way I'll pay money to have the privilege of installing spyware just so I can stream low-quality video to whatever device the provider deems worthy at that particular moment.

I'll just go on 1337x and in 2 minutes I have a high-quality MKV that I can play anywhere I want. I'd happily pay $5-10 a pop for this, and for indie movies I do make the effort of seeing if there's a way to pay to watch it somewhere, even if I just pay and don't watch it there, and watch the pirated rip anyway.


What a way to justify a crime. "I don't like how they act, so I don't comply with their rules".

You pirate because you have no respect for the law, because you are an entitled child, or because you don't have enough money. Piracy isn't plain theft, but it's consuming a service without paying for it to its legitimate owners. It's close, isn't it?

Would you sneak into an amusement park? Not pay for dinner if you found it not to your liking, or if the brand doesn't meet your standards?

The reason you think piracy is ok is because you don't associate the crime with the victim.


No, it isn't close. The reason none of those things are comparable is because the service provider has lost something in the transaction in your example. Sneaking into an amusement park is theft of services, not paying for dinner after you consumed it is plain old theft. In both of those cases, the seller's resources were consumed. They have lost resources or effort.

With digital items it isn't like that. I can make infinite copies at no harm, real or imagined, to the seller. What might make this more clear is the distinction between downloading a movie illicitly versus watching a friend's legit purchased copy with them. In both of these cases, the maker of the content was not compensated for a watch.

The reason I think piracy (defined as personal copyright infringement) is okay is because:

1. It is generally a victimless "crime" (and even this is a misnomer, the act of downloading a movie is a civil, not criminal matter. It has more in common with driving over the speed limit than it does stealing.), and as so is morally neutral.

2. Statistics show that habitual pirates tend to be habitual purchasers as well, further complicating any concept of harm.

3. Copyright is a government granted monopoly, not some natural right, and its continued abuse and deadweight on our culture is not something that should be respected.

4. Even if the previous 3 points were all invalid, why should I have any respect for an entity that has none for me? Slavish adherence to the law is mere obedience, it is not noble or useful on its own.


You realize the endgame for the MPAA is for all TVs to have mandatory cameras with face id to identify the people watching and automatically charge each person per view.

There already are unenforced (currently unenforceable) maximums on the number of friends you are allowed to enjoy something together with. Beyond that you would need a broadcast license.

There is no limit for the greed of the copyright cartel.


Wow, someone copied something. Better send out the battleships to blockade their country and bring them to their knees for this unspeakable crime against humanity.

You're damn right I have no respect for copyright law. I want to see it abolished. As far as I'm concerned, copyright infringement is civil disobedience and a moral imperative. These monopolists don't have any respect for our rights either: they systematically rob us of our fair use and public domain rights. So why should anyone give a shit about some monopolist's imaginary property?


The reason people think piracy is ok is because there is no victim being deprived of property.

But more than that, you're not even technically correct. Piracy is copyright infringement, which is not criminal.


> The reason you think piracy is ok is because you don't associate the crime with the victim.

Piracy is a victimless crime. You were not going to buy the license anyway, so it's not a lost digital sale for the creator. Additionally, digital copies cost exactly nothing to make, so nobody lost anything in the pirated transaction.


The reason you think that it is a crime is that you associate legality with morality. However there are deeply cruel and immoral things occurring entirely within the established "system" (as there have been all throughout human history), so that's a very poor compass indeed.


I would just as easily pirate movies as much as I would buy an xbox controller off amazon, replace it with a broken controller, and send it back saying it doesn't work.

These massive companies don't owe you anything. These companies exist purely to extract as much wealth out of you as possible while convincing you that they're the only ones that can provide that service.

It's so convenient that I can pirate videogames, movies, tv shows. These companies want you to keep paying for this mass produced content controlled and influenced by money and what "sells" based on algorithms, social media influence... These companies control and influence you (read: consumers/society) more than we think.

I'd rather see these companies go bankrupt and blown up into a billion different smaller companies than have massive conglomerates control and influence what we see.

If we were to see more originality in the higher-value sections of markets (think indie developers/filmmakers vs companies like EA, Tencent, etc.), then I'd probably pirate less.

The hyper-optimization of art and culture by these massive companies designed solely to make profit (and the population willing to throw their cash at it for toys, theme park rides, etc.) is morally repugnant to me.


> but it's consuming a service without paying for it to its legitimate owners

The legitimate owners of information are all of us. Copyright is a special monopoly granted to authors only because it was thought to benefit society as a whole.


To unlock that feature you have to buy a second phone.


> It didn't work because it seems that although I paid for my phone, it's not really my phone.

Root it.


Not practical if you want to use the phone for payments and identity services because a lot of those will refuse to work on a rooted device.


I was thinking of doing that, but then I don't think you can use Netflix at all, or if you find your way around that, your banking app might not work, etc.


Hardware remote attestation will end that soon. Every app will refuse to run on modified devices and it will be impossible to circumvent because it's in hardware.


How do you know the phone doesn't have multiple levels of root and you're only breaking the topmost one?


Not an option.

Rooting the device voids the warranty.

It's not a reasonable thing to ask the average person.


> Rooting the device voids the warranty.

I’m pretty sure that depends on the country you’re in, even if it’s actually the case in the US.


Even if it doesn’t void the warranty, it voids SafetyNet, so now Netflix and other apps might refuse to work.

And even if it didn’t, it increases the attack surface. Random apps can ask for root, and the average user won’t know what it means.

And even if you know the risks, now it’s much more of a hassle keeping your device up-to-date.

So yes, rooting is theoretically a solution, but it’s not practical in the least.


If Netflix refuses to offer their services on a device that you own, you're always free not to pay Netflix and instead take your money and attention elsewhere.


Just remember that that doesn't mean piracy.


If there's no ethical company left to purchase goods from, then there's no ethical issue with piracy.


It doesn't if you're using Magisk - which is the case 99% of the time. Magisk has been updated and passes SafetyNet under Android 12 too - no other means of rooting a modern phone is safe anyway.


So now you need to stay up-to-date with the newest advances in rooting (which I clearly wasn't).

It's nice if this is a hobby (and I have rooted my phones in the past), but for the average user (even the average technical user) who just wants to get on with their day without becoming an expert in rooting, it's a complete no-go.

Importantly, it's a very poor substitute for regulations mandating that manufacturers offer us these options without having to go through back doors. I believe that any energy spent on technical workarounds to manufacturer locks is better spent on political activism to enable true device ownership through regulation.


Is there a root method that falsely reports that SafetyNet is still intact?


And that's the problem with "owning your device entirely".

Most people are not able (or willing) to own their device to its full extent and take responsibility for it.


The options are not limited to "A sanctified centralized authority controls and limits all my devices" and "I need to recompile the kernel by hand".

For most people "owning their devices" means being allowed to meaningfully change OS (or even just modify it) without losing major functionalities.


Not having a warranty is, frankly, also a part of it being really your phone.

"I want to be allowed to do whatever I want, but if I do something I regret, I get a refund" is not a coherent position to take.


Bullshit. Warranties are the way to keep manufacturers honest: if it only has the minimum legal manufacturer warranty, you know it's crap.

And it should be illegal to void warranty for rooting.


My understanding is that (precedent aside) in the US the law clearly says that the manufacturer has to prove that you mishandled the product under warranty.

That is, those "warranty void" stickers are not enforceable (under that specific law) by themselves; they need to argue that the user actually damaged the product.


It is your phone. As such, it has DRM-capable hardware, which lets you stream Netflix.

Think of the alternative: you don't have protection. Who would license their media to Netflix?


Who wouldn't license their media to Netflix in a world without DRM? What are they going to do? Make a $100 million movie and then only release it in theaters, just out of spite? That'd be leaving a lot of money on the table.


> Because it's a library

Unfortunately digital libraries seem to be stuck with ebooks that often carry greater restrictions than physical books while costing more.



The latest On The Media episode also did a segment on this lawsuit, starting at 21 minutes.

https://www.wnycstudios.org/podcasts/otm/episodes/on-the-med...


> Also digital formats are constantly evolving and backwards compatibility for old formats is removed.

While that was certainly true at one time, Unicode text, HTML, XML, and ZIP files aren't going anywhere (or there'll be a collapse of society so thorough that print books are unlikely to survive, either).

EPUB books are basically XHTML and XML zipped up along with any external assets (e.g., images). There are nuances, but a DRM-free EPUB book should be readable indefinitely. You can check this out by unzipping the EPUB and looking at the files inside with a web browser (if you're doing it from a GUI, you will probably have to change the file extension to .zip first, so it unzips rather than opens in your ebook reader software).
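To do the same check from a script, Python's standard `zipfile` module can open an EPUB directly, with no renaming needed, because it reads the archive contents rather than the extension. A minimal sketch, assuming a DRM-free file (the function names are illustrative):

```python
import zipfile

EPUB_MIMETYPE = "application/epub+zip"


def inspect_epub(source):
    """Return (mimetype, member names) for a DRM-free EPUB.

    `source` may be a path or a binary file object.
    """
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        # EPUB containers are expected to carry a "mimetype" entry
        # holding exactly "application/epub+zip".
        mimetype = zf.read("mimetype").decode("ascii") if "mimetype" in names else None
        return mimetype, names


def looks_like_epub(source) -> bool:
    mimetype, _ = inspect_epub(source)
    return mimetype == EPUB_MIMETYPE
```

To browse the chapters in a web browser, `zipfile.ZipFile(path).extractall("unpacked")` writes every member out to a folder.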


HTML isn't immune from broken backwards compatibility.

The frame element has been completely removed, the hgroup element is gone, and so is the dir element. acronym is deprecated in favour of abbr. isindex, plaintext, xmp, and listing are all dead.

The attributes border, clear, background and bgcolor have all been removed by HTML, and shifted to be CSS' responsibility instead.
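A quick way to see whether an old page depends on any of these is to scan it with Python's built-in `html.parser`. This sketch checks only the elements and attributes named above; the lists are taken from this comment, not from the full HTML specification:

```python
from html.parser import HTMLParser

# Only the items named in the comment above; not an exhaustive list.
OBSOLETE_ELEMENTS = {"frame", "hgroup", "dir", "acronym",
                     "isindex", "plaintext", "xmp", "listing"}
OBSOLETE_ATTRIBUTES = {"border", "clear", "background", "bgcolor"}


class ObsoleteMarkupFinder(HTMLParser):
    """Record start tags and attributes that appear in the lists above."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag in OBSOLETE_ELEMENTS:
            self.findings.append(("element", tag))
        for name, _value in attrs:
            if name in OBSOLETE_ATTRIBUTES:
                self.findings.append(("attribute", f"{tag}[{name}]"))


def find_obsolete(markup: str) -> list:
    finder = ObsoleteMarkupFinder()
    finder.feed(markup)
    finder.close()
    return finder.findings
```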

Just moving between EPUB 2 and EPUB 3, you lose DAISY format support and external resources. (EPUB 2 let you use full URLs to reference parts hosted externally, like webpages, but EPUB 3 requires the book to be self-contained. Not a bad change, but still a breaking change.) The NCX is replaced by a plain HTML5 nav element, and a few more things change.

All of those things mean that there _are_ technical documents that exist, that _aren't_ readable without some effort to update them.
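For the EPUB side, the version and table-of-contents style can be read from the package document: `META-INF/container.xml` points to the OPF file, whose `package` element carries a `version` attribute; EPUB 2 books list an NCX (media type `application/x-dtbncx+xml`) in the manifest, while EPUB 3 books mark their navigation document with `properties="nav"`. A minimal sketch, assuming a well-formed, DRM-free file (the function name is illustrative):

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def epub_version_and_toc(source):
    """Return (package version, {"nav": bool, "ncx": bool})."""
    with zipfile.ZipFile(source) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        package = ET.fromstring(zf.read(rootfile.get("full-path")))

    items = package.findall("opf:manifest/opf:item", OPF_NS)
    # EPUB 3 navigation document: an item whose properties include "nav".
    has_nav = any("nav" in (i.get("properties") or "").split() for i in items)
    # EPUB 2 table of contents: an NCX file in the manifest.
    has_ncx = any(i.get("media-type") == "application/x-dtbncx+xml" for i in items)
    return package.get("version"), {"nav": has_nav, "ncx": has_ncx}
```

EPUB 3 books may still ship an NCX for older readers, so both flags can be true at once.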


Yes, but all of those deprecated elements can still be rendered just fine by some of the current browsers. And even if, say, you couldn't display an index page with a frameset, the stuff inside the frames is just plain old HTML files anyway, which you can display easily. bgcolor, etc. all still render fine in current browsers as well.

The main thing is that information which is important to people will get converted to newer formats over time, just as old print books get reprinted if they're popular.


> The main thing is that information which is important to people will get converted to newer formats over time, just as old print books get reprinted if they're popular.

Vs. decent-quality old "dead tree" books, properly stored, can be ignored for centuries and still be perfectly fine.

You might want to ask a good historian or librarian about all the incredibly important (historically, to us, now) documents which we know existed, but we do not have, because there was no continuously-operated, high-budget "Holy Brothers of Document Preservation" monastery doing all the re-copy work needed to preserve them. (And maintain off-site backups in case of fire at the Monastery, and ...)

(If you aren't familiar - most of the materials used for documents in ancient times degrade fairly quickly. Unless (say) carefully stashed in a nice, dry cave in an arid climate. And even that stuff tends to be "crumble if you touch it" fragile.)


> (And maintain off-site backups in case of fire at the Monastery, and ...)

https://en.wikipedia.org/wiki/Yongle_Encyclopedia

:(

For every important historical work we have, there are several more, at least as important, that we know we're missing. And presumably even more that we've never heard of.


Oral history. And before anyone says anything: the Haudenosaunee Thanksgiving Address, the four-day one. Done every year. Still.


Neither is LaTeX; but, like LaTeX and other text-based formats, the content of the document is, in general, human-readable when read as raw text, unlike, for example, JPEG images.


> The frame element has been completely removed, the hgroup element is gone, and so is the dir element. acronym is deprecated in favour of abbr. isindex, plaintext, xmp, and listing are all dead.

Those are all basically cosmetic/layout related, though. The text itself is still perfectly readable.



But it was removed in 2013 [0]. So depending on age and strictness of your rendering engine, you don't have a guarantee for how it'll be handled.

[0] https://lists.w3.org/Archives/Public/public-html-admin/2013A...


The day <marquee> goes from deprecated to removed will be a very sad one :(


None of that is unreadable. You can always download a browser from a few years back and open the page with that. So while there may be some inconvenience with old information, there is no loss of content.


Until you can't. You don't need to download and install anything to read a book from 50 years ago.


> HTML, XML, and ZIP files aren't going anywhere (or there'll be a collapse of society so thorough that print books are unlikely to survive, either)

I very strongly disagree. The amount of technical overhead required to render a simple zipped XML file from some storage medium is immense. Even if you store it on tape, you need a tape machine to read the bits. If you manage to pull that off, you won't get anything useful, because the XML is zipped. Deciphering a ZIP and translating it into bits, which only make sense if you know that they represent binary numbers and must be read according to an obscure translation table (UTF-8) that is also stored digitally someplace else, is more difficult than deciphering the Enigma code, the hieroglyphs, and the Inca knot language combined. And even if you manage to pull off this unbelievable feat, you still need a parser, a renderer, and a display or a printer to even get close to making it readable by a human. For all of that, you not only need electricity, but the required devices are usually easily broken, and manufacturing them requires knowledge and technical overhead (and also an amount of electricity) orders of magnitude larger than simply using them. Not to mention all the raw materials required to produce a display, a CPU, etc., which may simply be unavailable to future generations.

Now imagine humankind after a complete societal collapse. Tribe A finds, in some basement that survived the apocalypse, a collection of basic medical books. Tribe B finds, in another basement, an unlabeled hard disk containing the entire Wikipedia, as a zipped XML file. Which tribe is more likely to survive?


Yeah, I'm a bit worried about UTF-8: it might be impractical to read with 8-bit-only CPUs?

http://collapseos.org/

At least the ASCII part should still be easily readable? But that's not much of a relief for languages with non-Latin alphabets...
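For what it's worth, UTF-8 decoding needs only byte comparisons, shifts and masks, which an 8-bit CPU can do, and every byte below 0x80 is plain ASCII. A minimal decoder sketch in Python for illustration; it deliberately skips the overlong-encoding and surrogate checks that a conforming decoder must perform:

```python
def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 using only comparisons, shifts and masks.

    Illustrative only: overlong encodings and surrogate code points are
    not rejected, unlike a conforming decoder.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: one byte, identical to ASCII
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:     # 110xxxxx: start of a 2-byte sequence
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:    # 1110xxxx: start of a 3-byte sequence
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:   # 11110xxx: start of a 4-byte sequence
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02x} at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            cont = data[i + k]
            if cont >> 6 != 0b10:    # continuation bytes look like 10xxxxxx
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)  # append 6 payload bits
        chars.append(chr(cp))
        i += extra + 1
    return "".join(chars)
```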


I would argue that we shouldn't care about something in the extremely distant future that would be the second-worst-case scenario (the worst one being human extinction).

There's no way to really plan for it (prepping is a joke) and it slows down progress if we really do it.


> HTML, XML, and ZIP files aren't going anywhere

I wouldn’t be surprised if some of those go out of use in a few hundred years, if not much earlier. There will probably be major conversion efforts going on at some point. The alternative, the above formats being maintained for eternity, would be somewhat depressing.


I looked up those formats and they are all around 30 years old, which is insane for technology and around when Windows 3.1 and Mac OS 7 were released. So maybe they will stay...

They are all but deprecated, though: HTML 1.0 is much different from HTML5 (2014); XML was upgraded to XML 1.1 (2004) and most people use JSON nowadays; and ZIP is very inferior to gzip and 7-Zip. We're lucky web browsers are strict about never removing compatibility for old sites, and ZIP is still used by macOS instead of .tar.gz for some reason.


Yes, but HTML 1.0 renders perfectly in any web browser in existence. That's the amazing part. Browsers are designed to fall back to render older versions of standards.


HTML1 [1] isn’t that different, it’s mostly just a much smaller subset.

[1] https://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt


Gzip is not a container format. Pkzip is far more flexible.
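To illustrate the distinction: a ZIP archive keeps a central directory of named members, so one file can be read on its own, while gzip compresses a single stream (its header can carry at most one original filename), which is why multi-file bundles are usually tarred first (.tar.gz). A small sketch with Python's standard library:

```python
import gzip
import io
import zipfile

files = {"a.txt": b"first file\n", "b.txt": b"second file\n"}

# ZIP: a container with a central directory of named members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)
buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    zip_names = zf.namelist()   # member names survive
    b_only = zf.read("b.txt")   # read one member without touching the other

# gzip: one compressed stream; concatenating the files loses their boundaries.
gz_roundtrip = gzip.decompress(gzip.compress(b"".join(files.values())))
```

After this runs, `zip_names` is `["a.txt", "b.txt"]`, while `gz_roundtrip` is one anonymous blob containing both files back to back.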


You will also be happy to read about this: https://en.wikipedia.org/wiki/Lindy_effect

Once something is mainstream and non-perishable, odds of it going away go down and it's foreseeable that it will be around for at least as long as it's already been around.
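One common way to make this precise, assuming lifetimes $T$ follow a Pareto (power-law) distribution with scale $t_m$ and shape $\alpha > 1$ (an assumption, not something the Lindy effect itself guarantees): conditioning on survival to age $t$ gives another Pareto distribution, now with scale $t$.

```latex
\begin{align*}
P(T > t) &= \left(\frac{t_m}{t}\right)^{\alpha}, && t \ge t_m,\ \alpha > 1 \\
P(T > s \mid T > t) &= \left(\frac{t}{s}\right)^{\alpha}, && s \ge t \\
\mathbb{E}[T \mid T > t] &= \frac{\alpha\, t}{\alpha - 1} \\
\mathbb{E}[T - t \mid T > t] &= \frac{t}{\alpha - 1}
\end{align*}
```

So with $\alpha = 2$ the expected remaining lifetime equals the current age $t$, and any $1 < \alpha < 2$ gives more than that.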

This effect is super strong in tech, for base technologies, with lots of integrations, especially around enterprise/governmental systems (which don't get changed unless someone makes a strong business case why the change makes more money) and consumer hardware (which tends to linger for aeons since it's so distributed).


Back to vim or emacs then!


>and ZIP is still used by macOS instead of .tar.gz for some reason.

Also, several other tiny insignificant Operating Systems like Android(/Linux) [apk files] and Windows [builtin explorer support]. Let's also not forget Java JAR files.


The thing with first-mover formats (SGML and derivatives such as XML and HTML) is that they tend to stay around a long time. Where should an all-new doc format come from in this day and age anyway? The ecosystem of multiple parties adopting a format that led to XML and related standards has largely been abandoned by FAANG.

However, the threat of fubaring web media doesn't come from markup (HTML, XML, SGML) but from the complexity of CSS having gone rogue, and JS.


> However, the threat of fubaring web media doesn't come from markup (HTML, XML, SGML) but from the complexity of CSS having gone rogue, and JS.

Or DRM. HTML still being used isn't going to help you when it's wrapped up in an encrypted blob. We're not there yet (except for video), but you can bet there are plenty of actors who will jump at the chance.


It would be a bit tragic if HTML goes out of use, but JavaScript continues XD


There are already websites that intentionally hide everything until JS loads even though all the content is still in the HTML. Not to mention sites where the HTML is just a stub to load JS.


Literally ANY digital format can be converted to a modern version fully automatically once the need arises. You need to solve the problem once, build a tool, and from that moment on you can convert as many files as you need with a single click. This is way cheaper and easier than dealing with paper rot and ink degraded by acids from the air and the paper.


This is overly optimistic. Even just for videos you are going to have to accept either additional quality loss or size bloat when converting older obscure codecs to something your average modern device can play.


What you're saying suggests that those old codecs were superior to new ones (offering better quality at smaller sizes), which is not the case for most formats. Also, the "size bloat" with data from, say, 40 years ago means little today, as storage has become so much cheaper. If you have to expand your data from, say, 70 KB to 120 KB, which 40 years ago would have been a huge deal, today it's something no one will even bother to optimize. And anyway, the discussion was not about space economy, but about data being permanently lost, unable to be accessed ever again, which, unless the medium is physically damaged, is a very unlikely scenario. There are always people willing to reverse engineer old formats; just look at the state of retro gaming now. I converted my original ZX Spectrum games from tapes to digital files, and can play them on my modern computer almost 50 years later.


The PDF/A format was developed specifically for long term document archival.


They actually store multiple copies. They even use IPFS to store it over a decentralized network. But copyright is the main reason we don't see many services like Open Library or the Internet Archive. Rest in peace, Aaron Swartz.


Is it possible to "donate" storage to the Internet Archive by becoming part of its infrastructure, but distributed? Like plugging an off-the-shelf NAS into my home network that acts as a backup for some part of the archive? Perhaps that is wasteful, but then again it's also resilient.


What you’re describing is how IPFS Collaborative Clusters work.

The Internet Archive team would need to host one, and then others could join as “peers” to help seed.

https://collab.ipfscluster.io/


If they store on IPFS (someone else said they did), then the answer is yes. Grab a subset of the content and mirror it to your device. Keep said device online.
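Content addressing is what makes this kind of volunteer mirroring safe: a chunk is requested by the hash of its bytes, so a copy from any peer can be checked before it is used. A simplified sketch using plain SHA-256 hex digests and hypothetical function names; real IPFS uses multihash-based CIDs and a Merkle DAG, which this does not reproduce:

```python
import hashlib

CHUNK_SIZE = 4  # tiny for demonstration; real systems use far larger chunks


def chunk_and_address(data: bytes, size: int = CHUNK_SIZE):
    """Split data into chunks keyed by their SHA-256; return (manifest, store)."""
    manifest, store = [], {}
    for off in range(0, len(data), size):
        chunk = data[off:off + size]
        key = hashlib.sha256(chunk).hexdigest()
        manifest.append(key)
        store[key] = chunk
    return manifest, store


def fetch_verified(manifest, peers):
    """Rebuild data from whichever peers hold each chunk, skipping bad copies."""
    parts = []
    for key in manifest:
        for peer in peers:
            chunk = peer.get(key)
            # Accept a chunk only if its bytes actually hash to the requested key.
            if chunk is not None and hashlib.sha256(chunk).hexdigest() == key:
                parts.append(chunk)
                break
        else:
            raise KeyError(f"no valid copy of chunk {key[:12]}")
    return b"".join(parts)
```

Here each "peer" is just a dict standing in for someone's NAS; a peer holding a corrupted or forged chunk is simply skipped.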


Also remember the man responsible for his death: Stephen Heymann. Lest we forget.


Carmen Ortiz shares the blame. Heymann is no longer an AUSA and his career has stalled, to say the least. He will be forever remembered linked with Swartz. Karma is a cold goddess.


Saying anyone is "responsible" for a suicide is disingenuous.


People are absolutely, 100% responsible for his death.

A young person near the beginning of their career got sentenced to 13 federal crimes, 50 years of imprisonment and one million dollars in fines, for downloading some PDFs of scientific articles.

It's not like someone asked him to delete the PDFs and he killed himself in protest. His life was ruined to make an example out of him.


Well that is just objectively false. The man was never "sentenced to 13 federal crimes, 50 years of imprisonment and one million dollars in fines". It is blatantly false.


It is objectively false, but the spirit of your comment is grossly incorrect as well. He wasn't sentenced, but the "13 federal crimes" was a plea bargain offered by the prosecution. He killed himself beforehand.

FWIW, I actually have a close friend who ended up in North Kern for a couple of years because of a plea deal fiasco. For most of us who never directly interact with the criminal courts, navigating such circumstances is completely bonkers and stressful, as I'm sure it was for Swartz.


Overzealous prosecution can make one’s life a living hell, which certainly didn’t help.

Add to that a young man who might have had some depression issues (I’m not sure) and you have a heartbreaking tragedy.

From the Wikipedia:

Days before Swartz's funeral, Lawrence Lessig eulogized his friend and sometime-client in an essay, "Prosecutor as Bully." He decried the disproportionality of Swartz's prosecution and said, "The question this government needs to answer is why it was so necessary that Aaron Swartz be labeled a 'felon'. For in the 18 months of negotiations, that was what he was not willing to accept."

https://en.m.wikipedia.org/wiki/Aaron_Swartz

It’s heartbreaking even years on.


I haven't been able to figure out how to access the IPFS versions of the archive. They technically have a version of their site hosted through IPFS (https://www-dweb-cors.dev.archive.org/web), but when searching for a specific url like nytimes.com, it just redirects to the standard archive.org url.


> Physical books wear out in decades to centuries, computers break in years to decades. Also digital formats are constantly evolving and backwards compatibility for old formats is removed. Even without publisher/legal issues it's clear that preservation of digital media for even the next 10 years, and especially for the next 100 or 1000, is at risk.

If you dispense with the notion that publishers control that data, it actually becomes incredibly easy to achieve all of this and more. Worried about data loss? Make more digital copies, spread them on more media. Worried about shifting formats? Convert to plaintext, mobi, ePub or PDF - those will be supported in open source software indefinitely. Still worried about files trapped in proprietary formats that you don’t know how to convert? Fire up a VM.
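The "convert to plaintext" step can be done for DRM-free EPUB chapters (which are XHTML) with nothing but the standard library. A rough sketch: it keeps visible text, drops script, style and head content, and breaks lines at a few common block elements; the tag lists are assumptions and won't cover every book's markup:

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an XHTML chapter."""

    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr"}
    SKIP_TAGS = {"script", "style", "head"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; arrives as "&"
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "br":  # also fires for XHTML's self-closing <br/>
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            # Source newlines and indentation collapse to single spaces.
            self.parts.append(re.sub(r"\s+", " ", data))


def xhtml_to_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).split("\n"))
    return "\n".join(line for line in lines if line)
```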

As long as you can get the files out from under online DRM, they are safer than ever.


There are of course naughty book services using bittorrent and the like that store a lot of stuff without worrying about the copyright. https://en.wikipedia.org/wiki/Library_Genesis is popular.


Agreed. But also, since publishing is so cheap, an enormous amount of garbage is published every year. I don’t care much if a reality celeb’s memoir vanishes from existence. Or that Python 2 handbook from five years ago.

The books that changed my life are sitting on my shelf and will be offered to my kids one day.



>IA.BAK has been broken and unmaintained since about December 2016. The above status page is not accurate as of December 2019.

:-(


The licensing that libraries are forced to use in order to support lending digital books only allows them to "loan" the digital book a certain number of times. Depending on the publisher, it could be 50-200 loans, and then the license goes poof: time to buy a new copy of the digital book to lend out.

Yes, paper books can get destroyed. I remember spilling gravy all over one library book. That was an expensive book to replace. Plus I ruined all that gravy.

> Does the Internet Archive store redundant copies in many locations? Does it use long-term physical formats?

I'm sure they do. Personally, I'd really like to know what the "best practices" are for doing this. I'm a newbie at some of this, and other than the digital preservation group at the Library of Congress, it doesn't seem like anyone else has thought about it.

https://www.loc.gov/preservation/digital/index.html

The state agency I work for has to keep records forever. And they have to be readable forever. You can't have the statutes become unreadable because digital files suffer bit rot, or because folks decide that they don't want to support older versions.


> Even without publisher/legal issues it's clear that preservation of digital media for even the next 10 years, and especially for the next 100 or 1000, is at risk.

This is pure hyperbole. The one and only challenge is rent-seeking. There are already lots of ebooks from decades ago in PDF.


Many countries have their own digital archives, generally as part of their government's archives (so not just digital). I have worked on tools for sending data to the Danish digital archive, and I once interviewed for a job there, but I messed up the interview because I'm a dork.


They even have archives of all Danish (.dk) websites to some degree, though they are not generally accessible like the Internet Archive is.


Plain text files haven't changed that much in the past few decades. We could (should?) convert at least text into plain files (instead of stupid epub/mobi/etc.), and that might preserve them for ages.


An epub is a .zip containing HTML, I'm pretty happy with it as a long term format.
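To make the "zip containing HTML" point concrete, here is a minimal Python sketch, standard library only, that pulls the readable text out of an EPUB by opening it with zipfile and stripping tags from each (X)HTML member with html.parser. The function name epub_to_text is hypothetical. The sketch assumes UTF-8 content and reads members in archive order; a real EPUB defines reading order in its OPF spine, which this ignores.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def epub_to_text(epub_file) -> str:
    """Return the text of every (X)HTML member of an EPUB (path or file object).

    Assumption: archive order approximates reading order (the OPF spine is ignored).
    """
    chunks = []
    with zipfile.ZipFile(epub_file) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chunks.append("\n".join(parser.parts))
    return "\n\n".join(chunks)
```

Calling epub_to_text("book.epub") gives one plain string you could save as a .txt file. Nothing here depends on an ebook reader, only on the ZIP and HTML formats themselves.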


IPFS, torrents


Yes, the tech exists, but not the follow-through.


> why is there no other major service like the Internet Archive?

Copyright? I'd pay monthly to access an archive of media older than 20 years.


Use fb2, it was designed for archival purposes.


Check out Arweave. It's not quite as tested as something like BitTorrent but 'perpetual endowments' in theory make data on Arweave persist "forever"


That kinda misses the point. The issue is that it requires active effort to keep it alive for more than a few years, while a physical book needs to be actively destroyed to completely disappear.


Well, nftstorage seems to be free somehow in perpetuity. Something about extra Filecoin for those who store the massive 40GB files that can be uploaded there… can someone elaborate on the tokenomics of that?


You could use a laser to burn bubbles into a quartz crystal for longer-lasting digital media.


Or print it on paper... oh wait, that's a book ;)

BTW: https://en.wikipedia.org/wiki/Arctic_World_Archive#Storage_a...


Trade paperbacks will wear out in months or years. Zip files containing HTML documents... we'll be able to read them for a very long time.


I have paperbacks over 100 years old that are still easily readable. They are on my shelf. Before that they were on my grandfather's shelf. The transfer was quite easy: shelf to shelf.

Somewhere on an Amazon server I have a couple of thousand Kindle eBooks. How long will Kindle devices be around and usable? They may disappear at any moment, really.

I have also bought books and publications in other formats: PDFs, Word docs, plain text, epub, etc.

If I die tomorrow and someone is interested in my literary habits, they could go to my shelf, find that paperback, put it in their pocket and put it on their own bookshelf.

My Kindle books will be unseeable and untouchable. (And this might happen before I kick the bucket, so I won't be able to read them either; a couple of books and publications are already missing.)

All my books stored as computer files, in a cloud somewhere or on some hard disk, will be invisible as well.

Suppose someone took an interest in my laptop to look at my digital bookshelf of non-Kindle files.

Well, it is a MacBook and strongly encrypted, from what I understand. I don't think a future interested person will have an easy time looking at my shelf.

Perhaps I should start gluing thumb drives to my bookshelf at regular intervals. How long will USB thumb drives be around? How long do they last?

I have a nice collection of floppy disks, computer cassettes (with programs), ZX Microdrive cartridges, Iomega Zip drives, and a stack of backup streamer tapes of different formats and sizes. I have no idea what system they used and what it would take to read even one of them.

I also have some movies in HD DVD format and a laserdisc or two, not to mention a large assortment of camera memory cards that mostly contain RAW files from long-forgotten cameras.

All these data storage ideas were sensible at the time. A thumb drive is sensible now. I have no idea for how long.


There are still USB floppy disk drives out there. They work fine in Windows and Linux. They'll even work with USB-OTG on Android.

There's a listing on eBay right now for a USB 750MB Zip Drive. I imagine you could get it working in Windows and Linux without much trouble.

I've got a monitor that supports something like a dozen different memory cards. Memory card readers are incredibly cheap.

All those digital mediums are not lost, and all of those digital files can be transferred to more current storage mediums without any loss in resolution.

You're right that storing important files on your MacBook isn't a good archival strategy. It is a recipe for certain disaster. It's essentially storing your books loosely in the trunk of your car along with your gym bag and, every now and then, an open case of beer. It's not the place to store things long-term, just a place to put things while you're working on them. If your logic board dies, it'll take the storage with it.


Have you tried to read a 5-volt SmartMedia card recently? I came across a few 2-megabyte 5 V cards when going through my Dad's stuff a few years back. I stuck them on eBay because it was better than just throwing them away, and was astonished when they fetched £30 a pop. I can only imagine they're used by some classic synth gear or something like that.


I have not recently, but I think the card reader attached to my monitor supports the 5 V cards. I don't think I've touched a SmartMedia card since 2003.

They were really popular back in the day though. They seemed so futuristic, this ultra thin plastic chip seemingly storing your data on these etched gold patterns. It definitely seemed like something out of science fiction. SD cards are so plain in aesthetic comparison.


You overestimate how much people will care about your bookshelf after you die. Most book collections like that get sold for cheap or tossed out. Yes, a few interesting ones may be kept. But books are a dime a dozen; the point is to read them, not to keep them around indefinitely.


Another problem is the ease of reading. When there's a book on my shelf, anyone visiting can take a look and dive into the book. Or my kids could get their feet wet in different topics, something which is not possible with digital formats.


This is nonsense. I have almost never seen paperbacks wear out. Obviously it's possible, but even in their common failure modes, they are still easily read. I don't know why anyone would argue that saved files are somehow more reliable; the argument doesn't feel genuine at all.


> saved files are somehow more reliable

I have a whole set of ebooks I picked up at a less than reputable site in '11. They're all in pristine condition, all perfectly readable. I have books on my Kindle that are older than that (though not by a lot).

My 5-year-old copy of "Edge of Tomorrow" (the movie tie-in release of the light novel "All You Need Is Kill") is yellowed, and the binding is coming loose due to re-reading. The ebook manga I got at the same time is also in pristine condition, and will remain so when the book is long gone.


A physical book goes from pristine to yellowed and loose-leafed. An ebook just completely stops working. The failure modes are completely different.


Either that specific edition is of bad quality or you need to learn how to take care of books better.


But that's kind of the point of the whole "electronic books last longer" argument. You don't have to worry about either of those things with electronic books.


Chuck an ebook on any storage medium and wait 20 years, and odds are the file is unusable due to incompatibility with new software, the storage medium being unusable, bit rot or a number of other factors. Maintenance is required for both physical and digital. Physical requires a relatively dry space and not bending the spine, which anyone's grandma's dog can do. Digital requires backups + regular recovery testing to be guaranteed.

There's plenty to worry about with maintenance of both formats, but for paper most of the investment is upfront. For digital it is an ongoing investment, with the benefit of a pristine copy being maintained.
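The "backups + regular recovery testing" part is commonly done with fixity checks: record a checksum for every file once, then periodically re-hash each copy and compare. Below is a minimal Python sketch using hashlib's SHA-256. The function names are hypothetical, and the manifest is just an in-memory dict; in practice you would save it alongside the files.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose contents changed."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            problems.append(rel)
    return problems
```

Run verify_manifest against each backup copy on a schedule; any path it returns should be restored from another copy that still verifies.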


Instead you have to worry about new things, like not regularly backing up your files to new media since hard drive failures and bit rot are real. You may also have DRM, logins, and paywalls to overcome which are total non-issues with paperbacks.

Ultimately, it's a trade-off. Assuming that you have a DRM-free digital copy of something and that you're remarkably careful about keeping backups and multi-site/multi-format copies, a digital file could easily last 100 years, just like a well-cared-for paperback could last a hundred years. At that point, the difference between them is how easy it will be to read the contents. For the digital copy you will need a certain type of hardware and software, which may or may not exist or be easy to obtain in 100 years, but you need nothing at all for the paperback.

I like books, so I think the best bet is both the physical copy and multiple digital copies in various formats. It's not as if we have to choose one or the other, so it's okay that both options come with different strengths and weaknesses.


You've never seen a paperback made in the 1960s-1990s? The paper is brittle and the pages are literally falling apart. You can't read paper where pieces of it have fallen off and disintegrated.


I have run across many books where the binding has disintegrated and the paper has degraded, but never to the point where a book could not be read. Assuming the book was handled with care (so that the unbound pages are not lost), every book could be read in its entirety.

I'm not saying that your scenario can't happen. I am saying that I would be surprised if any but the most carefully maintained digital libraries outlived the cheapest of print books. Keep in mind that, at a bare minimum, a digital library must be backed up and transferred to new media every few years. You may get away with storing digital for a decade untended, but two decades is a bit of a stretch. Also keep in mind that even the slightest amount of bit rot can make an entire book unreadable. While this isn't really true of books stored as plain text, most modern formats seem to use some sort of compressed container (e.g. ePubs are compressed).
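A small Python sketch of the plain-text-versus-compressed point, using a single zlib stream as a stand-in for a compressed container: inverting one byte of plain text changes exactly one character, while inverting one byte in the middle of the compressed stream breaks the round trip (zlib normally raises an error, and the helper treats any mismatch as failure too). This is a simplifying assumption: in a real EPUB each ZIP member is compressed separately, so localized damage usually costs one file rather than the whole archive, unless it lands in the archive's central directory. The helper names are hypothetical.

```python
import zlib


def flip_byte(data: bytes, offset: int) -> bytes:
    """Simulate bit rot by inverting every bit of the byte at `offset`."""
    damaged = bytearray(data)
    damaged[offset] ^= 0xFF
    return bytes(damaged)


def chars_changed(original: bytes, damaged: bytes) -> int:
    """Count positions where two equal-length byte strings differ."""
    return sum(a != b for a, b in zip(original, damaged))


def compressed_copy_survives(original: bytes, offset: int) -> bool:
    """Compress, damage one byte of the compressed stream, then try to restore."""
    packed = zlib.compress(original)
    try:
        return zlib.decompress(flip_byte(packed, offset)) == original
    except zlib.error:
        return False


text = ("It was a dark and stormy night. " * 100).encode("ascii")
print(chars_changed(text, flip_byte(text, 1000)))  # 1
print(compressed_copy_survives(text, len(zlib.compress(text)) // 2))
```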


Certainly not true of every paperback from then. I wandered downstairs and found several books on my shelves printed in the '70s, '80s, and '90s in totally fine condition. On the oldest books the paper has started to yellow a touch, but are otherwise fine.

But even a book from the 90s is 20-30 years old at this point. I'm not sure if that fits colloquially with a claim that they fall apart in "months to years". I've never had a paperback book I've purchased fall apart within 5 years (hell, I can't think of a time one fell apart within 10 years with the exception of severe water damage).

I just...have never seen a book fall apart in months.


> I just...have never seen a book fall apart in months.

I have. I had one fall apart on my first read through it. Pages just dropping out of it.

I also recall, as a teenager some years ago, buying a hardcover book from a store in the airport, and realizing that the last 20 pages were the previous 20 pages pasted in again.

Ultimately, the books I cared about re-reading, I've replaced with ebooks. Because aside from the convenience, I don't have to worry about the book's condition, deterioration, mold, et al.


My AD&D Unearthed Arcana fell apart in months.

Although that was notoriously bound poorly.


> I just...have never seen a book fall apart in months.

I've had the binding snap on mass market paperbacks when I opened them for the first time. Not exactly a shining beacon of quality.


I have a few sci-fi novels from the '50s that are in fine shape and many more from the '60s to '90s that are even better. Where are these people buying such poor-quality paperbacks?

The oldest hardback in my library is right around 300 years old and beyond a bit of foxing in quite good condition.


They used much higher-quality paper 300 years ago, so those books don't fall apart. Somewhere in the 20th century they used really crappy acidic paper for the mass-market paperbacks. If you're looking at sci-fi novels in hardcover form, this doesn't describe those; they generally used high-quality paper. I have a bunch of those from the 70s-80s that are fine too. Go look at the crappy romance novels from the 80s, and you'll see a very different story.


Cheap pulp-based paper was a creation of the 19th century.

The 1898 Report of the Librarian of Congress has an appendix titled "The Durability of Paper" addressing just this concern:

<https://babel.hathitrust.org/cgi/pt?id=mdp.39015036735036&vi...>


I have many paperbacks from the 1950s to the present day. Some of the really old ones are a little yellow around the edges, but brittle and falling apart? Never. I have no idea what you are talking about.


> brittle and falling apart? Never.

How often have you read them? Many of my most beloved cheap paperbacks from the 80s and 90s have been replaced at least once since they've literally fallen apart as I was reading them. Also my books from the 50s and 60s seem to be of higher quality than books from the 80s and 90s (or that could also just be survivor bias).


My shelf is full of scifi/fantasy paperbacks from 80s to late 90s. A lot of these have their binding already broken. This happens especially to small (cheap) format books of more than 500 pages.

No problems with small books of reasonable lengths. At some point the number of pages in the popular books exceeded the durability of the cheap binding tech in use.


> You've never seen a paperback made in the 1960s-1990s?

I own tons from the 80s, all are in perfect condition. Also a few dozen from the 70s and they're all fine.

I also have a few books from the 1800s, those are hardcovers not paperbacks but also in reasonably good shape.


I've got a dozen or so paperbacks from the 1930s on my shelf that I read from time to time, and hundreds from later decades. Most (> 90%) are only slightly yellowed and stiff. The remaining 10% vary from crispy around the edges to falling apart. Sure, paper eventually degrades, especially cheap non-acid-free paper, but we know for sure (because they're in libraries and still quite readable) that the content of quality paper books can last for centuries, and probably, with good care, millennia.

Keep in mind that bindings often last a lot less than that: One of the reasons genuinely old leatherbound books have those horizontal ridges on the spines is that they cover the (often also leather) laces used to re-bind the book and hold the pages together. This gave the classic works a certain look that was later replicated by publishers as just decorative ridges on the spine - but the origin of the feature was that those leather ridges were functional (they also acted as wear bumpers, but this was a secondary bonus) and a key part of the rebound book's structure!

I'll add one more big item to the anti-electronic book column: It is simply impossible to build electronics that last for decades with lead-free solder. Leaded PCBs will still eventually have tin whisker problems, but the new "green/RoHS" lead-free PCBs always stop working much sooner than leaded ones. Worse yet, no one cares because obsolescence design cycles are single-digit years now. Prior to lead-free electronic controls, appliances lasted many decades: Almost all of my major appliances are over 30 years old now, and some have never been repaired at all! They're a little less efficient, but much cheaper over the long haul, as I'm not replacing them every few years - that makes them arguably better for the environment, too, as there's no waste filling the dump, either...

Real books last centuries. E-books can't. We're a long way from Andromeda's "flexies"...


I have books from the 1800s in perfect condition. Also comic books from the '70s, '80s and '90s.


Unless the ebook we are comparing to (from the same time period) was a literal .txt file, it is unlikely you will be able to open it on a modern computer without processing/conversion of some kind.


They can fall apart but can be re-bound. I have a bunch of such fixed books where I got the cover and the binding replaced.


Most paper contains acid. The acid breaks down the paper, and pages become very fragile after some time (100 to 200 years?). Paperbacks use cheap paper that contains acid, so paperback books will not last forever. More expensive acid-free paper is often used by artists. Acid-free paper is made from plant fibers, but often not wood; it could be cotton. It has a much longer lifespan, but it is too expensive for most books.


The problems with paperbacks, and even a fair number of hardcover books, is acidic paper (archival quality is acid-free cotton-rag), and for paperbacks, binding glue, which often fails with time.

If the pages don't literally crumble to dust, they fall from the binding, especially when actually read.

There are some formats which survive better, but many books will in fact deteriorate beyond readability within 50 years or so.


Bullshit. I can go to a second-hand bookstore in Spain right now and buy tons of volumes from the '70s and '80s.

The math book I'm reading right now is from the '70s.


I admire your enthusiasm.

You might care to temper it with some rigour: in verifying your own beliefs and anecdotal experiences, in your haste to dismiss those of others, and in assessing your own methodology and its potential weaknesses.

I could perhaps have been clearer that the use of acidic, pulp-based paper is more common in paperback publications, rather than universal. The point remains that, paperbacks being a cheaper publication mode, cheaper processes and materials are more prevalent among them. I have encountered issues with pulp-based decay in both paperback and hardcover books.

I'd linked an 1898 reference to issues with high-pulp, acidic paper degradation in an earlier comment (also submitted as an HN item). It begins:

The Library of Congress is indebted to the American ambassador at Berlin, the Hon Andrew D. White, for the following copy of the regulations adopted by the Prussian Government for the security of the national archives, and the special danger involved in printing or writing records on paper made of wood pulp.

Wood pulp is extensively used in the manufacture of modern paper.

Paper made from compositions containing wood pulp decays more or less rapidly in proportion to the amount of wood pulp used.

Such paper is unfit for official use where permanency of records is essential or important.

<https://babel.hathitrust.org/cgi/pt?id=mdp.39015036735036&vi...>

The specific issue is somewhat less the wood pulp than acidic materials used in its preparation and their interaction with the pulp. Wikipedia has a good article on the matter:

Paper degradation is a slow process, but it is significantly accelerated in an acidic environment. In the mid-nineteenth century, a method of paper production in which resin-alum glue was added to the paper pulp became popular.[3] The aluminum sulphate remaining in the paper forms, in reaction with water, acids that catalyze the decomposition of cellulose (acidic hydrolysis). In this process, the cellulose chains are shortened, which reduces the tear resistance of the paper, and at the same time increases the cross-linking of their structure that causes the paper to stiffen and become brittle.[4] Parallel to the degradation under the influence of water, the cellulose chains react with oxygen; as a result of oxidation the chains are also shortened.[5] Not only cellulose, but also the lignin contained in the paper is oxidized, which leads to the yellowing of the paper.

... The process of self-degradation of paper causes exceptional difficulties in safeguarding the collections of archives and libraries. For example, an analysis of the book collections of the Jagiellonian Library, Adam Mickiewicz University in Poznań, Książnica Cieszyńska, the AGH University of Science and Technology and the Cracow University of Technology proved that as much as 90% of the resources published by the mid-1990s (to be precise in 1996 in Poland) have all the features of acidic paper.[7] It turned out that these institutions, established to care for the heritage of the past, are not able to effectively carry out their mission.[8]

<https://en.wikipedia.org/wiki/Acidic_paper>

As to books published in the 1970s and 1980s, I have, or have had, numerous instances of these in my own personal library, purchased new, which are in the process of or have entirely degraded beyond usability, the latter having been discarded.

Those Spanish examples you've encountered might be a useful foil on which to consider a further concept (and common methodological / sampling error): survivorship bias:

<https://en.wikipedia.org/wiki/Survivorship_bias>


No, he's right. Your comment was bullshit. Even really poor quality books will last at least many decades longer than any ebook reader and format. Good books last centuries to millennia. (And the ones with really good content tend to get re-bound, as well, lasting nearly forever...)

No ebooks can last more than a decade or two, and even if the files are perfectly readable, you often need a LOT of supporting hardware and software to use them.

Take even a very simple example like a Kindle in 15-20 years time: The LiPoly battery will fail, it may not be able to phone home to validate your DRM anymore, for many reasons: Amazon may have updated the APIs to a form unusable by your old device, your account may no longer exist, the wireless network standards may no longer be compatible or in use (original Kindles already suffer from this!), the charger may be long-gone (or you can no longer find a type-A USB port to plug the charging cable into), the lead-free solder will have developed tin whiskers and shorted out PCB connections, the grid may have collapsed due to the instability of renewable energy or a Carrington event, and in my experience, the microUSB charging connector on the Kindle itself will most certainly have broken, anyway, as they always do (and I'm NOT hard on my devices...) Exactly NONE of those issues will ever prevent a book from providing you access to the information it had from the day of its printing.

Thanks, I'll take paper. *ALL* digital data/info storage is ephemeral. Anyone who thinks otherwise is fooling themselves.

(And BTW, history shows that the fall of civilizations (and their support structures, of course) does happen with alarming frequency. For the last few thousand years, books (or close cousins like scrolls, etc.) have proven to be as resistant to that sort of thing as possible.)


You are refuting an argument I did not in fact make.

Actually, several arguments I didn't make.


I have at least a couple of thousand paperbacks that are at least fifty years old. Not one of them is unreadable. A few of the cheaper ones (mostly very cheap US editions) are falling apart but that doesn't have much effect on readability.


Go buy a cheesy romance novel from the '70s at any local thrift store; chances are good that the pages have warped, yellowed, and become brittle. They will generally fall from the binding easily, or break apart in a jigsaw fashion under bending load.

I use 'cheesy romance novel' as the benchmark, as they were notoriously poorly bound with cheap materials to reduce cost per unit to the extreme.

Yes, a human could read the broken pages pretty easily with effort, but one of the niceties about the book format is that it reduces the user burden to such a degree as to allow them to become entrenched in the material rather than the physical good.


> Trade paperbacks will wear out in months or years

>> go buy a cheesy romance novel from the 70s at any local thrift store

Okay, but a "cheesy romance novel from the 70s" is 50 years old at this point. That's...a very generous interpretation of "will wear out in months or years". I suppose it's true that "500 months" is technically months, but...that's not how that phrase is typically used.


(also the fact that it's for sale in a thrift store is a dead giveaway that it's not completely worn out yet; are there many 3-1/4" floppies, or even CDs, storage media from 20-30 years ago, max, that are readily picked up by a layperson and perused now?)


It's beginning to be a hassle now to even extract info off of external drives with USB-A, due to all the migration to USB-C. More devices will likely switch to wireless-only, fed by big companies' shitty "cloud".

Hopefully some will fight the good fight to keep at least desktops with ports long enough in order to allow for a personal NAS and IPFS to keep our docs accessible.


No it isn’t: I’ve found books missing half the pages in a thrift store.


This. My wife runs a podcast about trashy novels, and some of the books she gets her hands on... or doesn't, as it were: in some cases the digital version is a must.


This entire thread is essentially "I have an anecdote about an old book falling apart, so therefore old books are bad". Reminds me of people at work who push back against changing business processes because "what about this one edge case that might happen?", where the edge case is minor, rare and can be easily identified and handled manually, while the new process would save literal hours of time per week.


The paper of mass-market paperbacks has a high acid content, and over time the paper will crumble. Take a look at mass-market paperbacks from the 1950s if you don't believe this. That is why librarians wash books: to get the acid out. Rag paper with little or no acid content will last 300, 400 years or more. Incunabula, properly tended, have not crumbled into dust.


If you like pulp fiction from, say, the 1930s, they are very expensive if you can find them.



Paperbacks are easily lost, stolen, or destroyed by things like flood, fire, mildew, insects.

While digital books are also stored on media that can be lost, stolen, etc., the magic comes from storing multiple copies in different places.
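One way the "multiple copies in different places" idea pays off is that independent replicas can outvote a corrupted one. A minimal Python sketch, assuming you can read every copy of a file into memory; recover_by_majority is a hypothetical helper that compares SHA-256 digests and refuses to guess when no version has a strict majority.

```python
import hashlib
from collections import Counter


def recover_by_majority(replicas: list[bytes]) -> bytes:
    """Return the version of a file that a strict majority of replicas agree on.

    Raises ValueError on a tie, since the copies alone cannot say which is correct.
    """
    if not replicas:
        raise ValueError("no replicas supplied")
    digests = [hashlib.sha256(r).hexdigest() for r in replicas]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no majority among replicas")
    return replicas[digests.index(winner)]
```

With only two copies you can detect a mismatch but not tell which side is right, which is one argument for keeping at least three.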


In the Third World, or maybe in the "cardboard" homes in the US, maybe. But on a proper shelf, or in a wardrobe, desk or box, books last a long time.

I still have a comic book compilation (hardcover) from the '70s right here. Its condition is perfect for its age.


And yet those things have happened to me, despite storing them properly.


Untrue. I still have and use the $1 paperbacks I bought for college 30 years ago, and they're in great shape.

When it comes to reading, non-electronic books are superior in just about every way for me.


For me, if I can really sit and enjoy reading and that's all I'm doing, I prefer a physical book.

But, electronic books do have a real convenience to them that I have to admit. If I'm reading one-handed (holding a baby, eating a sandwich, etc) I really appreciate a kindle for reading. Also, when traveling, the ability to bring 50 books with me on a flight, or camping, while taking up no space is amazing.

But, yea, for the endurance of the storage media, physical books definitely have an edge in my eyes.


> Trade paperbacks will wear out in months or years

I'm extremely dubious of this claim. Decades, I could start to believe. *Months*? I've never had a paperback be unusable within a decade. Do you have any citations or data for trade paperbacks wearing out in months?

It's just *so* incongruous with my personal experience of even the cheapest paperback I've ever bought.


> citations

My own experience - having a brand new paperback litter pages on my first read. Or buying a book only to find that the last signature was a duplicate of the one prior to it.

I've also had a manga spine break on a second reading, literally dropping a whole section of pages on my lap.


How are you reading them? Are you bending the cover to the back while reading?


I have trade paperbacks from the 1970s. My father has some from the 1950s. They are all perfectly fine. Maybe the ones from the 1950s are a little yellow around the edges, but that hardly affects the ability to read them.


A big point being made in this thread is that the technology for physical storage of the media becomes obsolete quickly. One could argue that the physical book itself is the original storage technology. But, as many have said, that book on floppy disk is harder to access in today's world, and in the future, an external 7200 RPM USB hard drive will probably be very difficult to use. It's already somewhat of a hassle for the many devices using USB-C rather than USB-A.

So when we move to storing archives or cold storage (likely where most books will go) on DNA, information is provided by LLM AI, and all our devices have no ports whatsoever - it's pretty likely that external hard drive will require a lot of effort and money to get the contents out.


I have a SCSI hard drive or two I cannot access.


LOL. Where are you from? I have books from 1800 at home.


Maybe from Hogwarts... have you seen what they do with their books?


Seriously, here in Europe books are treated with a lot of care because books were and are relatively expensive, unlike manga in Japan, which were sold like chip bags.

Maybe a lot of them have a folded page or such, but the content is readable. Try that with IDE disks from a Pentium II.


OK, I folded the disk, and after many hours of testing I have to say that you are right: it's not working anymore. However, for the sake of correctness, it was a Pentium 4-era hard disk.

But yeah, being Swiss, I know what you mean ;)



