This is a really weird article for me. Ebooks are zip files with HTML pages (that is, text files) and images inside. That's about it.
These technologies are some of the oldest and most robust we have when it comes to digital media at large. Being concerned about reading and displaying HTML pages in a zip file in a decade is absurd. HTML and zip are both already over 30 years old themselves.
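To make the "zip file with HTML pages" point concrete, here is a minimal Python sketch, assuming a DRM-free EPUB-style archive. The function names `list_text_pages`, `read_page`, and `build_sample_epub`, and the sample file layout, are made up for illustration; only the standard-library `zipfile` module is used.

```python
import io
import zipfile


def list_text_pages(epub_file):
    """Return the names of the HTML/XHTML members inside an EPUB-style zip."""
    with zipfile.ZipFile(epub_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith((".html", ".xhtml", ".htm"))
        ]


def read_page(epub_file, member_name):
    """Return one page as text, assuming it is UTF-8 encoded (an assumption)."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.read(member_name).decode("utf-8")


def build_sample_epub():
    """Build a tiny, hypothetical ebook-like archive in memory for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr(
            "OEBPS/chapter1.xhtml",
            "<html><body><p>Call me Ishmael.</p></body></html>",
        )
        archive.writestr("OEBPS/cover.jpg", b"\xff\xd8\xff")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for page in list_text_pages(build_sample_epub()):
        print(page)
```

Nothing here depends on an ebook reader: any tool that can unzip an archive and display text can get at the content, which is the robustness argument being made. A DRM-wrapped file would not open this way.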
DRM, yes. DRM is a complete and utter pox on the digital media world. Almost as bad as trade-bound paperbacks which tend to fall apart after (sometimes during, grrr) a single read. Preserving trade paperbacks, however, takes more effort than removing ebook DRM, and both procedures are well documented with broadly available tools.
The point is that the paperback can sit on a shelf for 100 years and be readable. An ebook takes a migration plan to remain readable. Yes, you can remove DRM, but libraries cannot; they have to keep paying, and they're being sued to prevent them from preserving and migrating their data.
> The point is that the paperback can sit on a shelf for 100 years and be readable.
Some can, most can’t. I’ve got a Bantam paperback copy of Moby Dick from 1985. It’s pretty much on its last legs. The pages are disintegrating and turning golden yellow, but the binding is still intact. Not sure if it will last another 63 years or so.
You'll have to treat this as a bread crumb, lacking sources as I do, but I believe that the nadir of printing quality was in the mid-1900s, when rapidly decaying mass-market paperbacks were on the market. I think that particular strain of bad quality has left the market.
OTOH, I have a full set of the 1911 Encyclopedia Britannica that is entirely readable and mostly in good condition, with nothing except gentle handling, regular dusting, and infrequent treatment of the leather binding to prevent red rot (because I don't read them frequently enough to prevent the condition with oil from my skin).
> the paperback can sit on a shelf for 100 years and be readable
Not really. The paper's acidic, which means that the paper degrades to the point of not being usable without a huge amount of intervention. I have trade paperbacks which are under 20 years old whose paper is downright brittle.
> but libraries cannot
Which is, I hate to say, irrelevant to the discussion of preserving digital documents. Libraries are getting shafted - humanity at large is not.
> Libraries are getting shafted - humanity at large is not.
Libraries are how humanity at large preserves documents! Sure, I might have a wonderful collection of books on my hard drive, but other than me (and whoever knows about my torrents), nobody's benefiting from that; whereas everyone with a local library knows (at least, in theory) that they can find information there.
When I kick the bucket, what happens to my hard drive? If somebody knew about it and cared enough, they might subsume its contents into their collection; and then the information would be safe for another… few decades? To quote Randall Munroe (https://xkcd.com/1909/ title text):
> I spent a long time thinking about how to design a system for long-term organization and storage of subject-specific informational resources without needing ongoing work from the experts who created them, only to realized I'd just reinvented libraries.
Historically, yes. Today? No. The internet has, in many ways, supplanted them. And where libraries are hobbled by following the laws, the internet community has found ways around this particular damage.
And that's cool.
Now then, should libraries be getting shafted like this? Fuck no. Can any actions I take - short of running for office and making my way up to the part of the federal government in charge of interstate commerce - affect that? Also, sadly, no. I will continue to support librarians seeking change while facing reality.
The internet is insanely prone to losing stuff. What percentage of the first 10,000 YouTube videos from 2005 do you think are still accessible? With copyright issues, account issues, etc., I would be surprised if it broke 20%, and that’s for a website that’s still around.
You can find old stuff thanks to the Internet Archive and that’s about it, which makes the internet basically a giant single point of failure in practice.
> With copyright issues, account issues, etc., I would be surprised if it broke 20%, and that’s for a website that’s still around.
On YouTube there are around 20 different labels claiming to own the copyright to a recording of the Red Army Choir singing the Soviet national anthem more than 50 years ago.
UMG has already decided you're not allowed to embed it on some other sites. Additionally, due to licensing issues, music "owned" by certain labels hasn't been, or is not, available in certain countries.
I remember an old website, Barrapunto, the Spanish-language clone of Slashdot. They shut down the servers because they had hardly any visits compared to Menéame, a Reddit clone for Spaniards.
Now you just have a few pieces at the Wayback Machine and that's it.
Now tell me, digital wankers, how digital preservation compares to books. Even if they had used NNTP, I don't think the whole archive would have been preserved; just ask Jason Scott.
And I love computers, I use OpenBSD and CWM on a netbook. But current media preservation efforts are ridiculous compared to a simple book from the 70's.
The internet is big and sprawling. It's easy to prune, by accident or malice. Think of how much of the web goes through Cloudflare! Think of how much we rely on the Internet Archive; if that went down, where would we be? Libraries, on the other hand, are decentralised, redundant, curated, indexed, and maintained against both bitrot and link rot.
We could make the internet like this. In fact, there are organisations that do so! But there's a reason so many of those organisations call themselves libraries.
Supporting the Internet Archive is very much worth doing – but we also need a parallel effort to archive the Internet Archive. There are other ways for it to go down than just "we ran out of money". (Then again, I don't know that a parallel effort exists; Archive Team were doing one a few years back, but I think that's stopped. So donating to IA is still your best bet, here.)
> When I kick the bucket, what happens to my hard drive?
Well you could write a will saying that you want it to be released to the public and then have someone come along and say, "they didn't really mean it."
The prior version is still available, though? And the caveat "the wishes that Aaron had in 2003 do not necessarily correspond with those that he had in 2013" is a far cry from "they didn't really mean it", especially when the old version is still available (encoded as data on the existing page).
I guess I'm seeing it more as a will where you state what you want to have happen after you die. So they are saying that what he wrote in 2003, he didn't mean in 2013. I'll agree it is a far cry in wording, but seems similar in intent.
Maybe they are saying that he forgot he had published that page. Maybe there was something he wrote or told someone that contradicted what he had published. All we have to go off of was his digital will for his hard drives.
Well, I think libraries getting shafted is humanity at large getting shafted. We've gone so far into monetization and privatization that academic knowledge is now gatekept, bartered and sold. I know that a century ago there would have been other gatekeeping issues around gender and skin color, but while we've made good progress on those fronts, we've also allowed both research and literature to become much more commoditized, and that is bad for society as a whole.
> Not really. The paper's acidic, which means that the paper degrades to the point of not being usable without a huge amount of intervention. I have trade paperbacks which are under 20 years old whose paper is downright brittle.
I thought modern books were made with acid-free paper? I know this is a huge problem for books made in, for instance, the 1970s. Is it only the nicer books that are made with acid-free paper?
And that's why it's great that the internet exists. A very large number of libraries can burn and no knowledge will be lost. Just copies of that knowledge.
Those Alexandrians should have focused more on their tech game. :)
Alexandria wasn't the only library in its day, obviously.
Sometimes you "only" lose redundancy. But occasionally a lost library (or digital archive) contains unique content, so statistically the survival rate still trends toward zero.
How is knowledge lost? "Slowly at first, then all at once."
What will happen to that digital knowledge if there is no electricity in the world for like… twenty years? I am not even thinking about “twenty generations”.
My impression is that trade paperbacks really suck. Frequently the bindings crack when you open them for the first time. Mass-market paperbacks are not great but they have some minimum level of quality that is not certain in trade paperbacks.
No, I mean what I say. Mass market paperbacks are at a low standard but it is a standard. They are great to put in a backpack because of small size.
A good trade paperback is better than any mass market paperback but trade paperbacks come in various sizes and designs and many of them have serious defects when manufactured.
Ah, I see where the size standardization could be valuable. I was thinking only of quality, and everything I've read (and I don't mean I just went to wikipedia ;-), though they do actually agree) suggested that a trade paperback was closer to a paper bound version of the original hardback, up to and including the size and everything, and built consistently to a higher standard than a mass market version.
Some of them are good but some of them split at the back with the slightest bit of handling. Self-published trade paperbacks are particularly bad.
As a category I don’t like them though I have plenty of them. My favorite kind of sci-fi book to collect these days are small hardcovers but for nonfiction I take what I can get.
Why don't they try it and find out? Seems like if anyone should have the legal right to do so from purchased books it should be them.
Internet Archive is trying something similar in court right now with their "controlled digital lending" concept. The ReDigi case was a true shame, but that doesn't mean we shouldn't be willing to propose other concepts and try to legalize some case here.
It's not really about physical or ebook though. Both require a plan. The physical book requires a shelf and some climate control. The ebook needs to be DRM free and stored on a 'digital shelf'.
In 100 years, if we can't unzip a file and read text files, then we have likely lost many of the physical books also.
Physical books also have the unique problems of taking up storage space and being hard to replicate.
When I go to Olin Library at Cornell to get physical books today, I find quite a few that are seriously yellowing, and I am not talking about seriously old books but books printed in the 1970s. Paperbacks I have from the 1970s in my farmhouse, which is a horrible place to keep books, are often still in usable condition but have gotten much worse in the last 20 years.
They shut down the Physical Science Library and the Engineering Library around the time I left CUL. Many of the science books (Q’s) went to the math library, and many of the engineering books went to Uris Library, which is split alphabetically by LC call numbers with Olin; the rest of the books went to the Library Annex. A tunnel from Olin goes to the Kroch Library, which specializes in Asia and rare manuscripts.
There is a very good service to request books out of the annex and deliver them to a service point, but people are more inclined to order books from AMZN than to pull books from the annex or use an express service like Borrow Direct that lets Cornellians get books from other Ivies in a few days.
There is still the fine arts library, the music library, the Africana library, Mann library at the ag school, the industrial labor relations library and the vet library. Those last three are in the state schools so any New Yorker can get a free library card for them. They closed down the Nestle library at the hotel school and consolidated that with the library at the b-school when they merged management and hotel.
There are still plenty of books, but people come to use the computers and there is intense demand for study and breakout space on central campus which competes with everything else.
If you have a Cornell netid you can access electronic resources from anywhere, but not if you are a member of the general public. Back when I was involved in digital library stuff I worked on numerous projects that were free to access, and my colleagues worked on even more, such as the global performing arts database, images of the fantastic, and the Reuleaux collection of gear contraptions, which is on display at the engineering school and has interactive models of everything online. They still do the arXiv preprint server, but the funding climate got hostile circa 2005 when the library was in general contraction.
> The point is that the paperback can sit on a shelf for 100 years and be readable.
Physically? Maybe.
But another factor is that human languages will naturally evolve to the point where future generations won't be able to read the words we're writing now without some kind of translation service. Even if it hasn't evolved enough to be classified as an entirely new language, it can change enough to be extremely difficult to decipher. This also requires a migration plan or historians to update the book for the modern language.
If you've ever picked up a very old book, you probably know exactly what I'm talking about. Even if it's still physically readable, you struggle to make sense of vocabulary and style in popular use at the time of publication.
What's curious to me is that some languages, and writing systems, preserve better than others.
Latin, having become a dead language used principally by the Church and for a time science / natural philosophy, remains highly-readable across centuries so long as you read Latin. Similarly for classical Greek.
I'm given to understand that the same holds for Arabic, itself significantly preserved in the Quran as a literal verbatim transmission from the 7th century.
Written Chinese, zhōngwén, or 中文, is logosyllabic, where symbols represent the concepts rather than the phonetic representation of a syllable. It is not only generally readable across millennia, but even across largely mutually-unintelligible dialects of Chinese, or other languages (Korean and Japanese, for example).
Contrast written English, which becomes highly idiosyncratic even only a couple of centuries back (long-S, nonuniform spellings, highly stylised and formalised expressions), and is difficult both to read and to understand (verbally) from as few as five centuries ago.
Many of the modern languages of Europe (Spanish, English, French, German) date only to about 900--1,000 CE or so, and the forms spoken then would be difficult or impossible for most moderns to understand. Written forms ... often didn't exist at all (government and business being transacted in Latin, Greek, or Arabic throughout much of the region).
Hence my suggestion of a good backup strategy. Which should ideally include data integrity measures such as checksums and/or integrity-aware storage mediums.
Your backup's only good if you can successfully restore it.
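As a rough illustration of the checksum idea, here is a minimal Python sketch that records a SHA-256 digest per file and later reports anything missing or altered. The helper names (`sha256_of`, `build_manifest`, `verify_manifest`) and the choice of SHA-256 are my own assumptions, not something from the thread; integrity-aware filesystems do this kind of checking at a lower level.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return the sorted relative paths that are missing or whose digest changed."""
    root = Path(root)
    damaged = []
    for relative_name, expected in manifest.items():
        target = root / relative_name
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(relative_name)
    return sorted(damaged)
```

The manifest would be stored alongside (and ideally separately from) the backup, then re-checked after every copy to new media. A check like this only detects damage; repairing it still requires a second good copy.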
The article isn't clear, but in the conclusion, the author is asking for libraries to have the legal right to remove ebook DRM, and to make copies of ebooks in the course of routine maintenance.
IMO, the article's title should say, "DRMed ebooks wear out faster than physical books."
And that's clearly correct: digital data that you don't make copies of is quite physically fragile, and even if the physical media survives, it may be difficult or impossible to read it. (Reading data off an old IDE hard drive or Jaz drive, for example.)
If you don't copy those files to new media, the piece of hardware itself that they reside on may be unusable. How easily can you read some HTML sitting on a 500 megabyte IDE drive from 1995, versus a book printed in 1995? If you come across such an object, you don't even know what is on it, and whether it is worth it to delve into it.
If I go to a library to read a trade paperback from 1995, it's going to have been re-bound. And that's if the paper has survived. Trade-bound paperback books - which are usually price equivalent to ebooks - require intervention to remain usable for more than a few years.
As for my own collection, I have no books from 1995 left. They've mostly fallen apart or been replaced with ebooks (which are still in great condition).
What do you do to your books?? Chew on them while reading?
I've read and bought paperbacks more than 30 years old. Yeah, I imagine a popular book in a library that's constantly checked out wears out faster, but every 27 year old trade you have is ruined? Seriously?
I totally get the point of the article, not only in books but in other media as well.
My CD collection is sitting next to me since forever and will continue to be there for the foreseeable future.
Guess what happened to the very carefully curated playlists I had on Grooveshark?
I've also lost dozens of songs from Spotify as they've been removed.
I have a huge list of music videos that I like on YouTube (~900), every month or so I find a few of them gone, I don't even know which ones they are as they only say "deleted video" or whatever.
> Ebooks are zip files with HTML pages (that is, text files) and images inside
There is almost no consumer storage medium that will last as long as a printed book. The files that will (probably) be easy to read in 50 years won't be readable on the hard drive/CD-ROM/etc. where they are stored. A printed book would likely still be readable.
PDF is also future-proof. At least, it can be. A PDF file is essentially a collection of pages described in a page-description language derived from PostScript. Each page's content is typically compressed, and some metadata is built around it to produce the PDF. There are tools which will extract the pages in their uncompressed (plain text) form.
> trade-bound paperbacks which tend to fall apart after (sometimes during, grrr) a single read
I'm sorry for what is probably a stupid question, but what are "trade-bound paperbacks"? I have tried to google it but I can't figure out which binding method is meant here..? Cheap glued paperback books? Sewn books? Something else? :/
Effectively hot glue bound paperbacks filled with low quality, poorly printed pages.
EDIT: I have probably confused Mass Market paperbacks with Trade paperbacks. The former is the main target of my ire, the latter is purportedly of higher quality in general.