If they are subcontracting the ebook dev process to a third party, perhaps not. And also, the whole editing to typesetting process is probably a hellish mix of Word and InDesign files, which leaves no "pristine copy" to pick the structured text out of. (Imagine: edits late in the process are made to the typeset InDesign files, after the structure has been lost.) You need structured text for creating an ebook because the ebook software does its own typesetting. So to pick up all the late edits, the only choice might be to OCR the final book and re-apply the structure. Yep, it's sad.
> If they are subcontracting the ebook dev process to a third party, perhaps not.
So they're not willing to provide the original source files, or what?
> edits late in the process are made to the typeset InDesign files, after the structure has been lost.
Wow, they do this so haphazardly that they don't even bother to go back to the original Word file and add the same edits?
Is InDesign used for all books? I don't see why you would need it for something without any figures/graphs/tables. You could just use Word, and things would work out fine.
Word's typographical capabilities are just as bad as current ebooks. Word is absolutely not capable of producing professional quality text. So, while it may be useful for drafting a book, it is worthless for getting it ready for press.
As for why OCR is used, just look at the range of books that are now available electronically. There are a lot of books from the 1980s and earlier that are now available as EPUBs. Do you really think that it would have been easier to use the source files off floppies? (Keep in mind that the floppies are, for the most part, long gone.)
> As for why OCR is used, just look at the range of books that are now available electronically. There are a lot of books from the 1980s and earlier that are now available as EPUBs. Do you really think that it would have been easier to use the source files off floppies?
Most likely, by using PostScript (edit: sometimes, even high-resolution raster) files that have no semantic markup and cannot be automatically reformatted in any way other than simple zooming/scaling. Such files are not much easier to transform into ebooks than scans are, but are probably harder to get access to.
This state of things is quite appalling. Given how small the source files are, I would've thought that publishers would keep an up-to-date copy of all of their books in a centralized repository.
Some publishers might do this. But it would be very surprising if any major publisher has been completely standardized on using the exact same toolchain and file formats for more than 20 years. Most publishers also have no incentive to modernize the markup for a book that's not getting any content updates.
With all due respect, I have extensively used Word, InDesign, and LaTeX, and contrary to your belief, typesetting can be (and often is) achieved quite effectively in Word.
Sure, ligatures have been supported for a while (since Office 2007, I believe). In any case, I was able to enable them in Office 2010. I would post a screenshot, but my Windows VM crashed when I tried to take one, and now refuses to start. I'm sure you can find info on it online.
I know that people like to harp on and on about how great LaTeX or InDesign are, and when it comes to complex layouts (particularly LaTeX if there are lots of referenced figures and InDesign for text wrapping, etc.), I agree. If you were writing a scientific article, I'd tell you to use LaTeX. If you were creating a magazine, I'd tell you to use InDesign.
But for your typical, text-only, fiction novel, the kind you're most likely to read on a Kindle? I don't think it really makes a difference.
FWIW, I don't think any publisher that uses an actual printing press uses Word to typeset books, simply because they almost certainly use the whole Adobe toolchain for prepress [1], and InDesign integrates much more nicely with this process. For example, Word has no facilities for color spaces, separation, or generating the actual plate images.
I guess they could generate PDFs from Word files with Acrobat and shoehorn them into the process, but any publisher that is actually satisfied with doing that is bonkers.
If I weren't fond of Emacs, I would probably do my writing in AbiWord or WordPad. Putting a lot of work into a bloated format like .doc(x) kind of scares me. Though maybe Word with RTF would work well.
Well, as you can probably tell from my initial post, my knowledge of mainstream publishing practices is nil. Though from what I can discern, a lot of the problems have to do with poor technology usage.
People make a big deal about how using Word will lock you into a certain format when they're storing their data on floppies or, even worse, rasterizing it and keeping just the PS. Seems like the latter is a lot worse than the former.
This is the point I was trying to make in my comments earlier. Typography is a feature. Right now we're seeing Minimum Viable Ebooks. Ship.
What is the incentive for better typography? Most books are like mini-monopolies (over the short term). If the typography of a book sucks and it sucks on all platforms, what is a consumer to do if they want to read the book? Fall back to print? Publishers probably don't mind that. It's true that, all things being equal, typography could be a deciding factor for the consumer, but my gut says that in most cases the content is weighted much more heavily.
I think the platform could benefit from typography. A consumer might choose an iPad over a Kindle if the iPad had a better reputation for typography. But since there is _art_ involved, it might not be economical for the platform to pursue it. Which is a better selling point: "Our platform has 100 really nice books" or "Our platform has 100,000 books"?
No, but this is how the iPod beat every other MP3 player: it was just indefinably better to use. Typography is one of those features; most people don't think they even notice it, but they do.
Your mom might select a Kindle Fire if she hears from all her savvy friends and family that the Kindle Fire delivers a better experience. She may even hear why it does, but she may not care about the why, just that trusted sources tell her it is better.
I believe this is true in general. A few savvy people catch on that a product is better and it becomes a meme that others rely upon for their decisions.
Don't talk about my mom like that! (In all honesty she is not really into savvy/hip people;)
I do not think that there are enough savvy, typography-valuing people to create a market-driving meme. If that were the case, Word would have died many years ago...