
Google Books, the Open Content Alliance and the Internet Archive


By Carlos Miranda Levy - Posted on 04 January 2010

(Note: This is still a very early draft. Feel free to comment and suggest corrections, and please excuse the errors.)

The Internet Archive and Google Books: Why can't they be friends?

The Internet Archive has been scanning books and making them available for free for a little longer than Google. (It was founded in 1996 with the primary goal of archiving all web pages for posterity, since they change constantly and eventually disappear forever.) The Google Books program for scanning the books in major libraries, starting with Oxford University, and making them available to the public for free was announced in December 2004.

Both projects seem like a well-meant blessing, and one would think they would be happy to cooperate with each other. Not so. As it turns out, the Internet Archive is a non-profit initiative founded by technology entrepreneur Brewster Kahle, while Google Books is an initiative of Google, a commercial operation. Yes, both aim to make all the books in the universe available in digital format, for free, to everyone, but one of them, given its nature, expects to profit from this.

The Internet Archive's Plan for Global Access to Culture and Content

Brewster Kahle, founder of the Internet Archive, knows all there is to know about Internet start-ups and business models where profit is based on providing free services. He also knows very well the unexpected turns a business endeavor can take: his abundant wealth comes from his own work as an entrepreneur, having invented WAIS and sold it to AOL, and later created Alexa.com and sold it to Amazon.

The Internet Archive, that is, Brewster Kahle in person and the people on his payroll, had been working for a couple of years on getting books, audio, video, TV, everything digitized when no one cared about them. Their collection includes thousands of hours of TV from Egypt, Iraq and places most of us do not care for, but that someone, some day, somewhere would eventually be interested in. Their engines crawled the Internet, saving copies of websites as they changed every day. Their emissaries traveled to remote places to negotiate setting up huge digital scanners to make high-resolution digital copies of arcane books, even restoring them when needed, and then carefully running optical character recognition on them to make their content searchable and portable. They had envisioned a universal digital library nirvana, funded by sponsors and by several services they offer to libraries and companies for a fee (aha) in order to be sustainable.

During my time as a fellow at Stanford University, Kahle explained their plan to me: the Internet Archive would help you scan all the books in your library if you wanted, but you had to pay for the scanner and for every page scanned. It was a small sum, he said, when you took into consideration that the Internet Archive would host the library's content forever, for free.

Google's Plan for Domination of Global Access to Culture and Content

Along comes Google, with plenty of resources, and announces in December 2004 its plan to digitize all the books in the world. Of course Brewster Kahle is hurt. Of course the Internet Archive resents this. That is what they had been saying they would do. And now Google is talking, and even not talking, to the same people they had been talking to for a couple of years.

Brewster Kahle calls himself a Digital Librarian; that is the actual title on his business card. Google couldn't care less about the title. They want to scan the books, make them available to everyone for free and devise a way to make money (lots of it, of course) in the process.

The Open Content Alliance

After much thought and plenty of discussions and meetings, in October 2005, 10 months after Google Books was announced, Yahoo! and the Internet Archive got together and, with support from Microsoft (until 2008, when it dropped out of the book digitizing race), the University of California, the University of Toronto, the National Archives in England and other institutions, announced the Open Content Alliance, with the explicit goal of offering resistance and an open alternative to the Google Books project.

Google's Approach vs. the Internet Archive's Approach

To the common citizen, the first main difference is that Google Books scans and indexes all books (public domain or not) in the libraries it partners with. The second critical difference is that, although many of the public domain books are available for download, all content digitized by Google is available only through the Google Books service. Yes, access is free and content can be embedded for free in any website or online application, but only via Google's service.

The Open Content Alliance, on the other hand, only works with digital versions of content (audio, text, videos, images, etc.) that is in the public domain (its copyright has expired) and/or has been released by its authors under Creative Commons licenses. This content is made available for free through the Internet Archive and its partners and is indexable by any search engine (Yahoo!, Microsoft's and even Google).

The Rebel Alliance vs. The Evil Empire

It is inevitable that some supporters of the Open Content Alliance will link the name to the Rebel Alliance and compare it to the resistance against the evil Empire in Star Wars. In 2005, Google, praised by many as the sworn enemy of the old Microsoft empire, found itself in a position it was not familiar with: Google was now considered the enemy and the empire.

The Open Content Alliance questions the power of Google to sequester public domain books, to profit from them and to offer exclusive access to them. Hence its open and fierce opposition to and criticism of Google Books.

Its members have been very vocal and active in the strong opposition the publishing world has mounted against Google Books in the press and in the courts (plenty of lawsuits). When Google finally announced a settlement with the publishers and authors who had filed formal lawsuits against the project, the Open Content Alliance, through the voice and hand of Brewster Kahle, celebrated and openly claimed the existence of "around 400 objections to the settlement objecting to Google’s treatment of out-of-print books– companies, libraries and even countries".

Is it a Matter of Control?

Google Books has been criticized and formally sued in court by a large number of libraries, editors, publishers and associations, including the USA Authors Guild, the Federation of European Publishers (Google lost a case in the French courts in December 2009 and is currently appealing), the New York Library Association and, as expected, Amazon.com, Microsoft and Yahoo!, among many others.

What is it that triggers this resistance? In the case of commercial publishers it is easy to understand, as Google is digitizing and indexing their books and allowing people to search them for free.

When it comes to authors, one might understand as well, especially in the case of those trying to make a living from book sales. But since copyrighted books can only be searched, not downloaded or read in full through the service, one would think that Google Books might actually boost their sales.

But libraries? Institutions meant to promote free access to content and culture, opposing an initiative that partners with libraries to digitize their content and make it freely available to the public? This one is a little harder to understand.

That is, unless you think about the shift in power and control over who is making the content freely available to the public: the library or Google. And it is even easier to understand when you see Brewster Kahle's voice and hand in most of this.

When Google announced a settlement of the court case with authors and publishers, Amazon.com, Microsoft, Yahoo!, the Internet Archive and most members of the Open Content Alliance announced the creation of the Open Book Alliance, to "counter Google, the Association of American Publishers and the Authors’ Guild’s scheme to monopolize the access, distribution and pricing of the largest digital database of books in the world", as expressed in its mission. Interestingly enough, the Open Book Alliance is co-chaired by Peter Brantley, Director of the Internet Archive and on Brewster Kahle's payroll.

Understanding the Fear

The Internet Archive, the Open Content Alliance and Brewster Kahle (it is hard to tell them apart most of the time) insist on the danger of digitized public domain content being made available through a single commercial source, even though the content is made available for free, even though Google now allows you to download such content in portable, searchable formats, and even though Google has reached a court-sanctioned agreement with authors and publishers.

One easy-to-grasp aspect of such a danger is that while public domain content is free and public, any reproduction you make of it is not. So, while the text of Treasure Island is in the public domain by default, Google's digital copy of it is only in the public domain if Google chooses to make it so. It is just as you can visit the Roman Colosseum, but any pictures you take of it are your own. You can choose to share or sell those pictures, as they are yours, and of course anyone else can take pictures of the Colosseum to compete with you. The one thing they cannot do is use, access or modify your pictures unless you give them permission to do so.

Google Advantage: Coolness and Ease of Use

Availability of All Books, not just old stuff. Google Books works with libraries to digitize all of their content. This includes not just books in the public domain, but also current texts and publications. This has been one of the many criticisms and reasons for opposition from the publishing world. In response to concerns about evil Google allowing people to read books which are available at a library for free, Google limits access to books not in the public domain to only a handful of pages at a time, although you can still search the whole book and see its table of contents, regardless of whether it is in the public domain or not.

Things change fast in the technology world. Recent changes in Google Books' approach to and handling of public domain texts, as well as the openness of Google's technology, eliminate most of the reasons for criticism in practical terms for the end user, although they do not eliminate the fear of potential control or enforcement of ownership by Google.

Downloading of Public Domain Books. Google announced in August 2009 that it was making more than one million public domain books available for download, not only as .pdf files but also in the popular open EPUB format, making them usable on most e-book readers and easy to search and index.

Embedding of Public Domain Books. You can also easily embed any of the public domain books in your own website, blog or online publication. There is no need to copy or reformat the book: just copy and paste a little piece of code available on the book's page on Google Books and the whole book is available on your site, search feature included. As an example, I am embedding Treasure Island right here:

Allowing any site to search Google Books. Thanks to Google's open Application Programming Interface (API), any website, online publication or application can freely search the books available in Google Books and list the results on its own pages, without its users leaving the page or going to the Google Books website.
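As a rough illustration of what such a search might look like from a third-party site, here is a minimal Python sketch. It assumes the present-day public Books API v1 "volumes" endpoint and its `q`, `maxResults` and `filter` parameters, which may differ from the API that was available when this post was written; the helper names `build_search_url` and `titles_from_response` are made up for this example.

```python
from urllib.parse import urlencode

# Assumption: the current public Google Books API v1 search endpoint.
# The post itself only says that an open API exists.
BOOKS_API = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(query, max_results=5, free_only=True):
    """Build a search URL for the given query (hypothetical helper)."""
    params = {"q": query, "maxResults": max_results}
    if free_only:
        # Restrict results to freely readable e-books, which is where
        # public domain titles would be expected to show up.
        params["filter"] = "free-ebooks"
    return BOOKS_API + "?" + urlencode(params)


def titles_from_response(payload):
    """Extract book titles from a decoded JSON search response."""
    return [
        item.get("volumeInfo", {}).get("title", "")
        for item in payload.get("items", [])
    ]


url = build_search_url("Treasure Island")
print(url)
```

A site would then fetch that URL (for instance with `urllib.request.urlopen`), decode the JSON and pass the result to `titles_from_response` to list the matches on its own page.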

The Internet Archive and Open Content Alliance Advantage: Multiple media formats, collaboration with other projects and searches available by multiple sources

Distribution by Multiple Providers. One of the first advantages pointed out by the Alliance is that the content of books digitized by the Internet Archive is available to all search engines affiliated with the Open Content Alliance, and not just through a single provider.

Multiple Formats. On the other hand, the Internet Archive is not just about scanning books, it digitizes video, audio and pictures too. It also maintains an Audio Archive which includes audio versions of thousands of books in the public domain or released under Creative Commons licenses.

This turns out to be especially important in today's changing media world, as the following example will show. There are over a dozen different versions of Treasure Island in the Internet Archive. These include both text and audio versions of the book, as well as videos about it and even a videogame based on it. For the purpose of our example, let us focus on the four most viewed/downloaded.

  • One is a digital copy of the text, made available by Project Gutenberg, which has been downloaded from the Internet Archive 472 times at the time of writing. It is only available as a plain text file or HTML web format.
  • The second one we find is another digital copy of the text, part of the Internet Archive American Libraries collection, which includes over 1.2 million books scanned from libraries in the USA using modern technologies, in partnership with the Internet Archive, available in multiple formats, including .pdf, Kindle, epub, plain text and also available for reading on-line. This one has been downloaded 932 times.
  • But the interesting point here is highlighted by the third and fourth versions we find. They are both audio versions of the book, one read by multiple volunteers and one read by a single person, and both part of the Librivox.org initiative. They are indexed and catalogued under the Internet Archive Audio Archive and available in the popular .mp3 format every teenager is familiar with. The one read by multiple volunteers has been downloaded 179,122 times and the one read by a single person 41,768 times.

The difference in views/downloads between formats in our example is enormous and might mean much more than just a preference, or the fact that Treasure Island remains mandatory or recommended reading in many schools and is still a favorite of reading aficionados and literary masters alike. It might be hinting at where the future of literature and reading is heading, regardless of what many die-hard plain text readers insist on.
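To put a number on that gap, the short Python sketch below adds up the four download counts quoted above by format. The counts are the ones reported in this post at the time of writing; the labels and the `totals_by_format` helper are only illustrative.

```python
# Download counts as quoted in the list above (format, downloads).
downloads = {
    "Project Gutenberg text": ("text", 472),
    "American Libraries scan": ("text", 932),
    "LibriVox, multiple readers": ("audio", 179_122),
    "LibriVox, single reader": ("audio", 41_768),
}


def totals_by_format(entries):
    """Sum the download counts for each format."""
    totals = {}
    for fmt, count in entries.values():
        totals[fmt] = totals.get(fmt, 0) + count
    return totals


totals = totals_by_format(downloads)
ratio = totals["audio"] / totals["text"]
print(totals)        # {'text': 1404, 'audio': 220890}
print(round(ratio))  # 157
```

In other words, for these four items the audio versions were downloaded roughly 157 times for every download of a text version.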

Cooperation with Multiple Initiatives and Sources

One of the strengths of the Internet Archive and the Open Content Alliance, as seen in the example above, is their cooperation with multiple initiatives.

While Google Books collaborates mostly with libraries and academic institutions, the Internet Archive is open to collaboration with other book and media digitizing initiatives, as it does not see them as competitors. It even lists and embeds books digitized by Google Books.

Three of the four versions of Treasure Island in the Internet Archive listed above come from sources outside the Internet Archive. You can actually read and download the Google Books version of Treasure Island at the Internet Archive.
