Remember the fear that Google would start a print on demand business and put all the publishers out of business? Well, Google appears to be getting into the bookselling business, but there’s no printing involved, nor are they cutting out publishers. Google’s new service will allow publishers to set their own price for online access to books. Readers won’t be able to save copies of the books on their computers nor will they be able to copy text from the books, and the books will only be viewable within the browser window. This looks like a great opportunity for publishers to provide online access to their books without having to set up their own systems. (via)
On Friday, as you may or may not have noticed, Amazon went down for about two hours. These days, we’re used to 100% uptime from the internet’s supersites – Google, Yahoo, Wikipedia, et al – but the Amazon outage reminded me of the late 1990s when even the biggest dot-coms, struggling to scale to the explosive growth of the Web, suffered routine and sometimes prolonged outages. (Of course, some more recent start-ups still experience such growing pains.)
As Amazon returned to service on Friday afternoon, speculation kicked into high gear about just how much revenue the world’s largest Internet retailer had lost during the two-hour outage. A little back-of-the-envelope math gives a rough idea. When the company reported its first quarter numbers, it estimated that it would have net sales of between $3.875 billion and $4.075 billion in the second quarter of this year. The midpoint of that is $3.975 billion, which over the quarter’s 91 days works out to $43,681,319 per day or $1,820,054 per hour. So, theoretically, the outage lost the company $3,640,109, with the caveat that this is just averaging the numbers out and not taking into account how busy mid-day Friday is, as opposed to other times of the week. Regardless, a decent chunk of change.
Of course, as Silicon Alley Insider pointed out, “When customers who wanted to buy something from Amazon went to the site and found it down, the majority of them likely figured the glitch was temporary and decided to check back later this afternoon. And lo and behold–it was temporary. So they’re probably placing their orders right now.” So, in reality, the likely damage is probably minimal. It would take repeated outages for Amazon to start feeling the impact from downtime.
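For anyone who wants to check the back-of-the-envelope math on the Amazon outage, here is a minimal sketch in Python. The function and variable names are made up for illustration. It makes the same simplifying assumption as the estimate above: sales are spread evenly across every hour of the quarter.

```python
# Guidance range for second-quarter net sales, as quoted in the post (USD).
LOW_GUIDANCE = 3_875_000_000
HIGH_GUIDANCE = 4_075_000_000

# April (30) + May (31) + June (30): the second quarter always has 91 days.
DAYS_IN_Q2 = 30 + 31 + 30
OUTAGE_HOURS = 2  # "about two hours", per the post


def estimate_outage_loss(low, high, days, outage_hours):
    """Hypothetical helper: spread the midpoint of a sales range evenly
    over a period and price an outage at the resulting hourly average."""
    midpoint = (low + high) / 2
    per_day = midpoint / days
    per_hour = per_day / 24
    return midpoint, per_day, per_hour, per_hour * outage_hours


midpoint, per_day, per_hour, lost = estimate_outage_loss(
    LOW_GUIDANCE, HIGH_GUIDANCE, DAYS_IN_Q2, OUTAGE_HOURS
)

print(f"midpoint of guidance: ${midpoint:,.0f}")
print(f"average per day:      ${per_day:,.0f}")
print(f"average per hour:     ${per_hour:,.0f}")
print(f"two-hour outage:      ${lost:,.0f}")
```

The per-hour and total figures quoted above appear to be truncated rather than rounded, so this sketch can land a dollar higher. Either way, the flat hourly average is the weak point of the estimate, not the arithmetic.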
NPR’s On the Media ran a feature recently on entrepreneur Joshua Karp’s new startup, the Printed Blog (TPB), a web aggregator that takes the best online content and… puts it on paper. Karp plans to print TPB twice a day and hand it out for free in major urban outlets. Content and advertising will be localized, and readers can go online to discuss and recommend articles and content they would like to see.
The timing of the announcement, coming as it does close on the heels of the Atlantic Monthly’s (hopefully) exaggerated reports of the NYT’s demise, seemed almost comical. My initial reaction was to check the calendar. Having confirmed that it wasn’t April, I became incredulous, made snarky comments to the radio, and finally accepted the idea.
Although at first glance Karp’s project seems endearingly quixotic, it does have one thing going for it: depending on how the content is selected, it could be an excellent tool for encouraging the development of a sense of physical community. Although the web has successfully connected people with similar interests, it hasn’t done the same for people with similar addresses. TPB could be a great tool for making highly visible, localized announcements. Having a block party? Print an ad in TPB’s morning edition. Canceled because of rain? An announcement in the evening edition will come out just in time to catch commuters on their way home. If done in the right way, TPB has the potential to provide a legitimate and much needed public service. Not to mention it will be a great way to expose less web-savvy members of the community to some of the fantastic writing that’s being done on blogs today.
On the downside, it will have to overcome several major obstacles. First, iPhones and similar technology have already made the web portable. I assume the target demographic is web-savvy young professionals between the ages of 22 and 30, a hunch confirmed by the web site’s blog (yes, they have a blog). This is precisely the group that is most likely to already have the Internet in their pocket, delivering their favorite blogs to them at the speed of inanity. TPB might introduce them to new content, but isn’t that what Digg and Delicious are for? And as for the reader-suggested content… If the readers don’t access blogs offline, what makes Karp think they’ll log on to share their opinion? To make matters worse, the people for whom this service would be most useful, those without the means or knowledge to use computers, won’t be able to vote. A mismatch between the content and the audience seems inevitable.
The second issue is cost. Karp estimates that his initial venture will cost $15,000. He anticipates selling ads for $25 apiece, meaning he’ll have to sell 600 to cover his overhead. Because the publication is intended to be “hyper-localized,” I assume he’s going to be targeting local businesses for ad revenue. I’m not sure how many of them will shell out that kind of money for a daily ad, but as a point of comparison, Google ads are free as long as no one clicks on them (and very cheap even then), run indefinitely, and are guaranteed to reach your target audience, regardless of their geographic location (a concern if you’re trying to advertise to tourists). Hard to beat that deal. And besides, isn’t the lack of willing advertisers print media’s biggest problem? I’d love to have seen Karp pitch that business model to potential investors. Granted, local free papers, like the San Francisco Guardian, seem to be doing well.
The experiment begins on January 27th in Chicago and San Francisco, but if successful I suppose the model can be easily rolled out at minimal cost nationwide. Although I’m still skeptical, The Millions never turns down free publicity. Why don’t you suggest they include us in their first issue?
For several years, it seemed as though the book industry was getting a reprieve. As the music industry was ravaged by file sharing, and the film and TV industries were increasingly targeted by downloaders, book piracy was but a quaint cul-de-sac in the vast file sharing ecology. The tide, however, may be changing. Ereaders have become mainstream, making reading ebooks palatable to many more readers. Meanwhile, technology for scanning physical books and breaking the DRM on ebooks has continued to advance.
A recent study by Attributor, a firm that specializes in monitoring content online, came to some spectacular conclusions, including the headline claim that book piracy costs the industry nearly $3 billion, or over 10% of total revenue. Of all the conclusions in the Attributor study, this one seemed the most outlandish, and the study itself might be met with some skepticism since Attributor is in the business of charging companies to protect their content from the threat of piracy.
Nonetheless, the study, which monitored 913 titles on several popular file hosting sites, did point to a level of activity that suggested illegal downloading of books was becoming more than just a niche pastime. Even if the various extrapolations that led to the $3-billion figure are easy to poke holes in, Attributor still directly counted 3.2 million downloaded books.
For some, however, the study may inspire more questions than answers. Who are the people downloading these books? How are they doing it and where is it happening? And, perhaps most critical for the publishing industry, why are people deciding to download books and why now? I decided to find out, and after a few hours of searching – stalled by a number of dead links and password-protected sites – I found, on an online forum focused on sharing books via BitTorrent, someone willing to talk.
He lives in the Midwest, he’s in his mid-30s and is a computer programmer by trade. By some measures, he’s the publishing industry’s ideal customer, an avid reader who buys dozens of books a year and enthusiastically recommends his favorites to friends. But he’s also uploaded hundreds of books to file sharing sites and he’s downloaded thousands. We discussed his file sharing activity over the course of a weekend, via email, and in his answers lie a critical challenge facing the publishing industry: how to quash the emerging piracy threat without alienating their most enthusiastic customers. As is typical of anonymous online communities, he has a peculiar handle: “The Real Caterpillar.” This is what he told me:
The Millions: How active are you? How many books have you uploaded or downloaded?
The Real Caterpillar: In the past month, I have uploaded approximately 50 books to the torrent site where you contacted me. I am much less active than I once was. I used to scan many books, but in the past two years I have only done a few. Between 2002 and 2005 I created around 200 ebooks by scanning the physical copy, OCRing and proofing the output, and uploading them to USENET. I generally only upload content that I have scanned, with some exceptions. I have been out of the book scene for a while, concentrating on rare and out of print movies instead of books because it is much easier to rip a movie from VHS or DVD than to scan and proof a book.
I have downloaded a couple thousand ebooks via USENET and private torrent sites.
TM: Do you typically see scanned physical books or ebooks where the DRM has been broken?
TRC: Most of what I have seen is scanned physical books. Stephen King’s Under the Dome was the first DRM-broken book I downloaded knowingly.
TM: Why have you gone this route as opposed to using a library or buying books? Do you consider this “stealing” or is it a gray area?
TRC: I own around 1,600 physical books, maybe a third of which were bought new, the rest used. I buy many hardcovers in a given year and generally purchase more books than I end up reading, so I have not chosen to collect electronic books as opposed to paper books but in addition to them. My electronic library has about a 50% crossover with my physical library, so that I can read the book on my electronic reader, “loan” the book without endangering my physical copy, or eventually rid myself of the paper copy if it is a book I do not have strong feelings about.
I do not buy DRM’d ebooks that are priced at more than a few dollars, but would pay up to $10 for a clean file if it was a new release.
I do not pretend that uploading or downloading unpurchased electronic books is morally correct, but I do think it is more of a grey area than some of your readers may. Perhaps this will change as the Kindle and other e-ink readers make electronic books more convenient, but the Baen Free Library is an interesting experiment that proves that at least in that case, their business was actually enhanced by giving away their product free. That is probably not a business model that will work for everyone, but what it shows is that as a company they have their ear to the ground and are willing to think in new directions and take chances instead of putting their fingers in their ears, closing their eyes, and railing against their customers, as the music industry is doing. The world is changing and business models have to change with it.
Three additional points:
1) With digital copies, what is “stolen” is not as clear as with physical copies. With physical copies, you can assign a cost to the physical product, and each unit costs x dollars to create. Therefore, if the product is stolen, it is easy to say that an object was stolen that was worth x dollars. With digital copies, it is more difficult to assign cost. The initial file costs x dollars to create, but you can make a million copies of that file for no cost. Therefore, it is hard to assign a specific value to a digital copy of a work except as it relates to lost sales.
2) Just because someone downloads a file, it does not mean they would have bought the product. I think this is the key fact that many people in the music industry ignore – a download does not translate to a lost sale. I own hundreds of paper copies of books I have e-copies of, many of which were bought after downloading the e-copy. In other cases I have downloaded books I would never have purchased, simply because they were recommended or sounded interesting.
3) Just because someone downloads a file, it doesn’t mean they will read it. I realize that buying a book doesn’t mean someone is going to read it either, but clicking a link and paying $10-$30 are very different – many more people will download a book and not read it than buy a book and not read it.
In truth, I think it is clear that morally, the act of pirating a product is, in fact, the moral equivalent of stealing… although that nagging question of what the person who has been stolen from is missing still lingers. Realistically and financially, however, I feel the impact of e-piracy is overrated, at least in terms of ebooks.
TM: How easy is it to go online and find a book you’re looking for? How long does it take to download and how much technical expertise is required?
TRC: I have specific tastes, so it is usually not very easy to find specifically what I am looking for. The dearth of material I was interested in is what prompted me to scan in the past, in order to share some of my favorite, less popular authors with as many people as possible. It does not take much time to download once something you want has been found, however, and little technical experience is required.
Since books are generally very small files, they can be downloaded in minutes. You can then convert the file using one of many applications, for instance Mobipocket Creator, to PRC or another format that works with your reader. You can then plug your Kindle into your computer and copy the file over. The entire process typically takes 5-10 minutes.
BitTorrent technology is easy to install and use, and just about anyone can install the basic software needed and begin downloading their first torrent in less than an hour. However, discovering and gaining access to private torrent sites (invite only) can take a lot of time – and of course, that is where the good stuff is. Public sites (no account needed) and semi-private sites (sites that require an account, but usually have open enrollment) have a limited selection, but are easily accessible and anyone with basic computer skills can find and download very popular novels.
Usenet is an older technology, and is considered a safer place to pirate files. For older users like me who were around at the beginning of the internet it seems very simple, but to newer computer users it may seem unnecessarily complex, and more expensive because you need an account separate from your regular internet connection to access it.
TM: Once you’ve downloaded a book, what format is it in and how do you read it? On your computer? Printed out?
TRC: My preferred format for distribution is RTF because it holds metadata such as italics, boldfaces, and special characters that TXT does not, is easily converted to other formats using Word, cannot contain a virus, and is an open format that will be readable forever. Other popular formats are DOC, HTML, PDF, LIT (Microsoft Reader), PRC (Palm), MOBI (Palm), CBR (rar’d image files) – and there is a new format with each new reader that is released. Most formats can be converted to your preferred format with enough ingenuity or the
To read, I convert to PRC and load the books onto my Kindle. Before I got that, I read on my Palm or laptop.
TM: How long does it take you to scan a physical book?
TRC: The scanning process takes about 1 hour per 100 scans. Mass market paperbacks can be scanned two pages at a time flat on the scanner bed, while large trades and hardcovers usually need to be scanned one page at a time. I’m sure that some of the more hardcore scanners disassemble the book and run it through an automatic feeder or something, but I prefer the manual approach because I’d like to save the book, and don’t want to invest in the tools. Usually I can scan a book while watching a movie or two.
Once scanned, the output needs to be OCR’d – this is a fairly quick process using a tool like ABBYY FineReader.
The final step is the longest and most grueling. I’ve spent anywhere from 5 to 40 hours proofing the OCR output, depending on the size of the book and the quality of type in the original. This can be done in your OCR tool side-by-side with the scan of the original image or separately in your final output type (RTF, DOC, HTML, etc.). If there are few errors on the first few pages of text, my preference is to proof in RTF; otherwise I do the proof within FineReader itself.
TM: What types of books do you look for? What is generally available? Is any fiction or popular non-fiction available?
TRC: I restrict my downloads to books I will likely read – this includes some popular novels, literary novels, and general non-fiction such as humor, biography, science, sociology, etc. Unlike DVD rips, the newest releases are not typically available two weeks before the product is released, if at all. I’m assuming that this is due to the smaller devoted audience books have, as well as the increased difficulty of sharing a book.
TM: Do you have a sense of where these books are coming from and who is putting them online?
TRC: I assume they are primarily produced by individuals like me – bibliophiles who want to share their favorite books with others. They likely own hundreds of books, and when asked what their favorite book is look at you like you are crazy before rattling off 10-15 authors, and then emailing you later with several more. The next time you see them, they have a bag of 5-10 books for you to borrow.
I’m sure that there are others – the compulsive collectors who download and re-share without ever reading one, the habitual pirates who want to be the first to upload a new release, and people with some other weird agenda that only they understand.
TM: Is it your sense that a lot of people are out there looking to get books this way? Or is it just a tiny group?
TRC: I would say that there is a small unaffiliated “group” of people responsible for sourcing the material.
Also, keep in mind that everything I’m saying applies mostly to fiction and general-interest non-fiction.
Textbooks, programming and technical manuals are all over the place and it’s very easy to obtain almost anything you want. I assume there are more sources for that material, and that their high price is a larger factor in people deciding to pirate them. Similarly, there are many communities creating comic, graphic novel and magazine content of whom I am only vaguely aware.
TM: Do you worry at all about getting in trouble for scanning and uploading ebooks?
TRC: A little, but the books I do are typically not bestsellers and are rarely new. I figure I have a bit of a buffer if trouble comes down because the Stephen King or Nora Roberts or “whoever the latest bestseller is” scanners would be the ones to get hit first. I’ve done a lot of out-of-print stuff, and when it is not out of print it’s books by authors like John Barth – someone who no longer sells very well, I imagine.
I’ve debated doing some newer authors and books, but I would need to protect myself better and resolve the moral dilemma of actually causing noticeable financial harm to the author whose work I love enough to spend so much time working on getting a nice e-copy if I were to do so.
TM: What changes in the ebook industry would inspire you to stop participating in ebook file sharing?
TRC: This is a tough question. I guess if every book was available in electronic format with no DRM for reasonable prices ($10 max for new/bestseller/omnibus, scaling downwards for popularity and value) it just wouldn’t be worth the time, effort, and risk to find, download, convert and load the book when the same thing could be accomplished with a single click on your Kindle. Even in this situation, I would probably still grab a book if I stumbled across the file and thought it might interest me – or if I wanted to check it out before buying a paper copy.
I was impressed by the indie filmmakers of the movie “Ink” – when their movie leaked before the DVD was released, they put a donation button on their site doubleedgefilms.com. I donated even though I haven’t watched the movie yet, just because of their thoughtfulness and sincerity. This didn’t seem to work for King’s “The Plant”, but I think that had a lot to do with the lack of reading technology at the time. I would like to see the experiment tried again by someone like Eggers or Murakami – someone with a very devoted fanbase.
Perhaps if readers were more confident that the majority of the money went to the author, people would feel more guilty about depriving the author of payment. I think most of the filesharing community feels that the record industry is a vestigial organ that will slowly fall off and die – I don’t know to what extent that feeling would extend to publishing houses since they are to some extent a different animal. In the end, I think that regular people will never feel very guilty “stealing” from a faceless corporation, or to a lesser extent, a multi-millionaire like King.
One thing that will definitely not change anyone’s mind or inspire them to stop is polemics from people like Mark Helprin and Harlan Ellison – attitudes like that ensure that all of their works are available online all of the time.
[Image credit: Patrick Feller]
Science fiction author and Boing Boing blogger Cory Doctorow explains why science fiction writers should be excited that theirs is the “only literature people care enough about to steal on the Internet.” Doctorow has made his books freely available on the Internet – while also selling copies through traditional channels – and has been impressed by the results:
“I’ve discovered what many authors have also discovered: releasing electronic texts of books drives sales of the print editions. An SF writer’s biggest problem is obscurity, not piracy. Of all the people who chose not to spend their discretionary time and cash on our works today, the great bulk of them did so because they didn’t know they existed, not because someone handed them a free e-book version.”
The full column is available at Locus Online. For my thoughts on these topics a good place to start is here.
In December, I wrote about HarperCollins’ plan to host digitized copies of their books on their own Web site rather than make them available to Google’s book search. Now the AP is reporting that HarperCollins has unleashed its first offering in this format, Go It Alone, a business book by Bruce Judson. The book is available, in its entirety, at Judson’s Web site. As Google does with its book search, HarperCollins has surrounded the book with contextual ads and provided a link to buy the book. The article points out the supposed irony of using Google ads, but I see Yahoo ads in there too and anyway, HarperCollins isn’t trying to screw over Google, they’re trying to maintain control over the process. HarperCollins has mostly gotten good reviews for their efforts, primarily because they’re not using any sort of Digital Rights Management (DRM) to “protect” their intellectual property. To some, this approach is nothing new. As is noted in the article, marketing guru Seth Godin and science fiction author Cory Doctorow (to give two examples) have both made their books available in this way. The news here is that a major publisher is doing it.
Based on this article, though, HarperCollins doesn’t seem to understand that by allowing easy, free access to the book, they are, in effect, using the book as marketing for itself in much the same way that one can flip through a book at a bookstore before buying. Instead they view the ads displayed next to the book’s pages as a “new revenue stream.” That’s why you shouldn’t expect to see any fiction as a part of this program. According to Brian Murray, group president of HarperCollins, “I don’t think advertisers are clamoring to place ads around literary fiction.” Hence, no literary fiction.