Comparison of e-book formats

From Wikipedia, the free encyclopedia

The following is a comparison of e-book formats used to create and publish e-books.

A writer or publisher has many formats to choose from for production, and every format has its proponents and champions. The multitude of e-book formats is sometimes referred to as the "Tower of eBabel".[1] For the average end user who simply wants to read a book, each format has its advantages and disadvantages.

Formats

Formats available include, but are by no means limited to:

Plain text files

Format: text
Published as: .txt

E-books in plain text are very small. For example, the Bible is about 4 MB.[2] Because of the ASCII standard, ASCII-only text files (unlike most other file types) can be interchanged and read on Unix, Macintosh, Microsoft Windows, DOS, and other systems. These systems differ in their preferred line-ending convention and in how they interpret values outside the ASCII range (their character encoding).
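
The two differences mentioned above, line endings and character encoding, can be illustrated with a minimal Python sketch. The function name `normalize_text` and the Latin-1 fallback are assumptions made for this example; real files may use other encodings, and nothing here detects them.

```python
def normalize_text(raw: bytes, fallback_encoding: str = "latin-1") -> str:
    """Decode a plain-text e-book and convert every line ending to LF."""
    try:
        # Pure ASCII is also valid UTF-8, so try that first.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback for bytes outside the ASCII range; not detected.
        text = raw.decode(fallback_encoding)
    # Replace CRLF first so it is not turned into two line breaks.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# The same two lines as saved by DOS/Windows, classic Macintosh, and Unix.
dos_sample = b"Chapter 1\r\nIt was a dark night.\r\n"
old_mac_sample = b"Chapter 1\rIt was a dark night.\r"
unix_sample = b"Chapter 1\nIt was a dark night.\n"

print(normalize_text(dos_sample) == normalize_text(unix_sample))
print(normalize_text(old_mac_sample) == normalize_text(unix_sample))
```

After normalization the three samples compare equal, which is the property a reader program needs before it lays out the text.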

Hypertext Markup Language

Format: Hypertext
Published as: .htm; .html

HTML is the markup language used for most web pages. E-books using HTML can be read using a Web browser. The specifications to the format are available without charge from the W3C.

As a markup language, HTML adds specially marked meta elements to otherwise plain text encoded using character sets such as ASCII or UTF-8. Suitably formatted files can therefore be, and sometimes are, written by hand using a plain text editor or programmer's editor. Many HTML generator applications exist to ease this process, and they often require less detailed knowledge of the format.

HTML on its own is not a particularly efficient format for storing information; it requires more storage space for a given work than many other formats. However, several e-book formats, including Amazon Kindle, Open eBook, Compressed HTML Help, Mobipocket and IDPF/EPUB, use one HTML file for each book chapter and then Zip-compress the files, along with images, metadata and style sheets, into one file.
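
The chapter-per-file packaging described above can be sketched with Python's standard `zipfile` module. The chapter names, the sample text and the `bundle` helper are hypothetical, and the sketch is not the container layout of any particular format; it only shows that several HTML files become one smaller file.

```python
import io
import zipfile

# Three hypothetical chapters; the repeated sentence stands in for real prose.
chapters = {
    f"chapter{n}.html": (
        f"<html><body><h1>Chapter {n}</h1>"
        + "<p>It was a dark and stormy night.</p>" * 200
        + "</body></html>"
    ).encode("utf-8")
    for n in range(1, 4)
}


def bundle(files: dict[str, bytes]) -> bytes:
    """Pack the given files into a single Deflate-compressed ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


raw_size = sum(len(data) for data in chapters.values())
packed = bundle(chapters)
print(f"{raw_size} bytes of HTML packed into {len(packed)} bytes")
```

Real prose compresses far less than this artificially repetitive sample, so the printed figures say nothing about typical savings.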

Amazon Kindle

Format: Kindle
Published as: .azw

With the launch of the Kindle e-book reader, Amazon.com created the AZW format. It is based on the Mobipocket standard, with a slightly different serial number scheme (it uses an asterisk instead of a dollar sign) and its own DRM formatting. Because e-books bought on the Kindle are delivered over its wireless system, called Whispernet, the user does not see the AZW files during the download process.

Open Electronic Package

Format: Open eBook
Published as: .opf

OPF is an XML-based e-book format created by E-Book Systems.

TomeRaider

Format: TomeRaider
Published as: .tr2; .tr3

The TomeRaider e-book format is a proprietary format. There are versions of TomeRaider for Windows, Windows Mobile (aka Pocket PC), Palm, Symbian, iPhone and more. Several Wikipedias are available as TomeRaider files with all articles unabridged, some even with nearly all images. The capabilities of the TomeRaider3 e-book reader vary considerably by platform: the Windows and Windows Mobile editions support full HTML and CSS. The Palm edition supports limited HTML (e.g., no tables, no fonts), and CSS support is missing. For Symbian there is only the older TomeRaider2 format, which does not render images or offer category search facilities. Despite these differences, any TomeRaider e-book can be browsed on all supported platforms. The TomeRaider website[3] claims to have over 4,000 e-books available, including free versions of the Internet Movie Database and Wikipedia.

Arghos Diffusion

Format: Arghos Reader
Published as: .aeh

The AEH format is an XML-based proprietary format developed by the French firm Arghos Diffusion. AEH files use a proprietary DRM and encryption method and are readable only in the Arghos Player. It supports various input formats for text, audio or video, such as PDF, WMA, MP3, WMV, and allows multiple interactive functions such as bookmarking, advanced plain-text searching, dynamic text highlighting, etc.

Flip Books

Format: Interaxive media
Published as:

A "Flip Book" is a type of E-Book distinguished by virtual pages that actually "flip", much like turning pages of paper in a real book or magazine. The first dynamic Flip Book Reader was developed in 2003/2004 by Interaxive Media for Nishe Media (Canada) and was therefore called "Nishe Pages". The first version was produced in part by Cybaris (Canada) and was first publicly showcased in August 2004. Soon thereafter, many copycat "flip books" started appearing thanks to technological advances in Macromedia Flash, mostly hard coded using Flash components.

The original software remains unique in that it is powered by a complete server-based CMS that allows the books to be created, published, and viewed remotely from a web server without requiring any custom software to be installed. Nishe Media went defunct in 2004, leaving the unfinished software to Interaxive Media, which continued its development in Hong Kong. Though not widely used outside of Asia, it is now at version 3.0 and can serve as a server-based e-book platform. It remains privately held by the original developer, Ryan Sutherland, owner and founder of Interaxive Media.

NISO Z39.86

Format: DAISY
Published as: DTB

DAISY is an XML-based e-book format created by the DAISY international consortium of libraries for people with print disabilities. DAISY implementations have focused on two main types: audio e-books and text e-books. A subset of the DAISY format has been adopted by law in the United States as the National Instructional Material Accessibility Standard, and K-12 textbooks and instructional materials are now required to be provided to students with disabilities.[4]

FictionBook

Format: FictionBook
Published as: .fb2

FictionBook is a popular XML-based e-book format, supported by free readers such as Haali Reader and FBReader. See http://haali.cs.msu.ru/pocketpc/FictionBook_description.html

Text Encoding Initiative

Format: TEI Lite
Published as: .xml

TEI Lite is the most popular of the TEI-based (and thus XML-based or SGML-based) electronic text formats.

Plucker

Format: Plucker
Published as:

Plucker is a free e-book reader application with its own associated file format and software to automatically generate Plucker files from HTML files, web sites or RSS feeds. The format is a compressed HTML archive, somewhat like Microsoft's CHM.

Compressed HTML Help

Format: Microsoft Compressed HTML Help
Published as: .chm

CHM format is a proprietary format based on HTML. Multiple pages and embedded graphics are distributed along with proprietary metadata as a single compressed file. In contrast, in HTML, a site consists of multiple HTML files and associated image files in standardized formats.

Portable Document

Format: Adobe Portable Document
Published as: .pdf

PDF is a file format created by Adobe Systems, initially to provide a standard form for storing and editing printed publishable documents. The format derives from PostScript, but without language features like loops, and with added support for features like compression and passwords. Because PDF documents can easily be viewed and printed by users on a variety of computer platforms, they are very common on the World Wide Web. The specification of the format is available without charge from Adobe.

PDF files typically contain brochures, product manuals, magazine articles, or even entire books, as they can embed fonts, images, and other documents. A PDF file contains one or more zoomable page images.

Since the format is designed to reproduce page images, the text traditionally could not be re-flowed to fit the screen width or size. As a result, PDF files designed for printing on standard paper sizes are less easily viewed on screens with limited size or resolution, such as those found on mobile phones and PDAs. Adobe has addressed this by adding a re-flow facility to its Acrobat Reader software, but for this to work the document must be marked for re-flowing at creation,[5] which means existing PDF documents will not benefit unless they are tagged and resaved. The Windows Mobile (aka Pocket PC) version of Adobe Acrobat will automatically attempt to tag a PDF for reflow during the synchronization process, using an installed plugin to ActiveSync. However, this tagging process will not work on most locked or password-protected PDF documents. It also does not work at present (2009-10) with the Windows Mobile Device Center (ActiveSync's successor) found in Windows Vista and Windows 7. This limits automatic tagging support during synchronization to Windows XP/2000.

Multiple products support creating and tagging PDF files, such as Adobe Acrobat, PDFCreator, OpenOffice.org, iText, and FOP, and several programming libraries. Adobe Reader (formerly called Acrobat Reader) is Adobe's product used to view PDF files; third party viewers such as xpdf are also available. Mac OS X has built-in PDF support, both for creation as part of the printing system and for display using the built-in Preview application.

Later versions of the specification add support for forms, comments, hypertext links, and even interactive elements such as buttons for forms entry and for triggering sound and video. Such features may not be supported by older or third-party viewers and some are not transferable to print.

PDF files are supported on the following e-book readers: iRex iLiad, iRex DR1000, Sony Reader, Bookeen Cybook, Foxit eSlick, Amazon Kindle (1, 2, International & DX), Barnes & Noble nook and the upcoming iPad.

PostScript

Format: PostScript
Published as: .ps

PostScript is a page description language used in the electronic and desktop publishing areas for defining the contents and layout of a printed page, which can be used by a rendering program to assemble and create the actual output bitmap. Many office printers directly support interpreting PostScript and printing the result. As a result, the format also sees wide use in the Unix world.

DjVu

Format: DjVu
Published as: .djvu

DjVu is a format that specializes in and particularly excels at storing scanned images. It includes advanced compressors optimized for low-color images, such as text documents. Individual files may contain one or more pages.

The contained page images are divided into separate layers (such as a multi-color, low-resolution background layer using lossy compression and a few-color, high-resolution, tightly compressed foreground layer), each compressed with the best available method. The format is designed to decompress very quickly, even faster than vector-based formats.

The advantage of DjVu is that it is possible to take a high-resolution scan (300-400 DPI), good enough for both on-screen reading and printing, and store it very efficiently. Several dozen 300 DPI black-and-white scans can be stored in less than a megabyte.
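
To give a rough sense of what that storage claim implies, the following back-of-the-envelope Python sketch compares an uncompressed one-bit-per-pixel scan with the per-page budget. The US Letter page size, the reading of "several dozen" as 36 pages, and a megabyte of 1,000,000 bytes are all assumptions made for the arithmetic, not figures from the DjVu documentation.

```python
def uncompressed_bitonal_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Size in bytes of a 1-bit-per-pixel scan stored without compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels // 8


# Assumed inputs: an 8.5 x 11 inch page at 300 DPI, 36 pages per megabyte.
page_bytes = uncompressed_bitonal_bytes(8.5, 11, 300)
pages_per_megabyte = 36
budget_per_page = 1_000_000 // pages_per_megabyte
implied_ratio = page_bytes / budget_per_page

print(f"uncompressed page: {page_bytes} bytes")
print(f"budget per page:   {budget_per_page} bytes")
print(f"implied compression ratio: about {implied_ratio:.0f}:1")
```

Under these assumptions each page would have to shrink from roughly one megabyte to under 30 kilobytes.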

Microsoft LIT

Format: Microsoft Reader
Published as: .lit

DRM-protected LIT files are only readable in the proprietary Microsoft Reader program, as the .LIT format, otherwise similar to Microsoft's CHM format, includes Digital Rights Management features. Other third party readers, such as Lexcycle Stanza, can read unprotected LIT files. There are also tools such as Convert Lit, which can convert .lit files to HTML files or OEBPS files.

Microsoft Reader uses patented ClearType display technology. In Reader, navigation works with a keyboard, mouse, stylus, or through electronic bookmarks. The Catalog Library records reader books in a personalized "home page", and books are displayed with ClearType to improve readability. A user can add annotations and notes to any page, create large-print e-books with a single command, or create free-form drawings on the reader pages. A built-in dictionary allows the user to look up words.

eReader

Formerly Palm Digital Media/Peanut Press
Format: Palm Media
Published as: .pdb

eReader is a freeware program for viewing Palm Digital Media electronic books. Versions are available for iPhone, PalmOS, Android, Symbian, BlackBerry, Windows Mobile Pocket PC/Smartphone, desktop Windows, and Macintosh. The reader shows text one page at a time, as paper books do. eReader supports embedded hyperlinks and images. Additionally, the Stanza application for the iPhone and iPod Touch can read both encrypted and unencrypted eReader files.

The company's web site, ereader.com, maintains a wide selection of eReader-formatted e-books, available for purchase and download, with a handful of public domain titles available for free. Those books that are not free are encrypted, with the key being the purchaser's full name and credit card number. This information is not preserved in the e-book. A one-way hash is used, so there is no risk of the user's information being extracted.
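
The general idea of deriving a key from personal details through a one-way hash can be sketched as follows. This is an illustration only: the text does not say which hash function or input formatting eReader uses, so SHA-256, the `derive_key` name and the normalization rules are assumptions, and the card number is a well-known test value rather than real data.

```python
import hashlib


def derive_key(full_name: str, card_number: str) -> bytes:
    """Derive a fixed-length key from a name and card number (illustrative)."""
    # Assumed normalization: ignore letter case and spacing differences.
    material = full_name.strip().lower() + "|" + card_number.replace(" ", "")
    # SHA-256 stands in for whatever one-way hash the real scheme uses.
    return hashlib.sha256(material.encode("utf-8")).digest()


key = derive_key("Jane Doe", "4111 1111 1111 1111")
print(len(key), "byte key:", key.hex())
```

Only the digest would need to travel with the book; the same name and number entered later reproduce the same key.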

The program supports features like bookmarks and footnotes, enabling the user to mark any page with a bookmark, and any part of the text with a footnote-like commentary. Footnotes can later be exported as a Memo document.

The company also offers two Windows/MacOS programs for producing e-books: the Dropbook, which is free, and the eBook Studio, which is not. Dropbook is a file-oriented PML-to-PDB converter; eBook Studio incorporates a WYSIWYG editor. Both programs are compatible with simple text files.

There is also support for an integrated reference dictionary (with many options up to and including a 476,000-word Merriam-Webster Dictionary, including pronunciation keys) so that any word in the text can be highlighted and looked up in the dictionary instantly. Commercial fonts can also be individually purchased and downloaded at the company's web site, ereader.com.

On July 20, 2009, Barnes & Noble announced[6] that eReader would be the format it would use to deliver e-books. Updated versions of the Palm Digital programs for Apple iPhone/Touch, BlackBerry, Mac OS X, and Windows platforms were made available on the Barnes & Noble eBooks website.

On October 20, 2009, Barnes & Noble announced[7] that its Nook reader would support the eReader format.

Desktop Author

Format: DNL Reader
Published as: .dnl; .exe

Desktop Author is an electronic publishing suite that allows creation of digital web books with virtual turning pages. Digital web books of any publication type can be written in this format, including brochures, e-books, digital photo albums, e-cards, digital diaries, online resumes, quizzes, exams, tests, forms and surveys. Desktop Author packages the e-book into a ".dnl" or ".exe" book. Each can be a single, plain stand-alone executable file which does not require any other programs to view it. DNL files can be viewed inside a web browser or stand-alone via the DNL Reader.

DNL is an e-book format that replicates its real-life counterpart, the page-turning book. The DNL e-book is developed by DNAML Pty Limited, an Australian company established in 1999. A DNL e-book can be produced using DeskTop Author or DeskTop Communicator.

Newton eBook

Format: Newton eBook
Published as: .pkg

Commonly known as an Apple Newton book; a single Newton package file can contain multiple books (for example, the three books of a trilogy might be packaged together). All systems running the Newton operating system (the most common include the Newton MessagePads, eMates, Siemens Secretary Stations, Motorola Marcos, Digital Ocean Seahorses and Tarpons) have built-in support for viewing Newton books. The Newton package format was released to the public by Newton, Inc. prior to that company's absorption into Apple Computer. The format is thus arguably open and various people have written readers for it (writing a Newton book converter has even been assigned as a university-level class project[8]).

Newton books have no support for DRM or encryption. They do support internal links, potentially multiple tables of contents and indexes, embedded grayscale images, and even some scripting capability (for example, it is possible to make a book in which the reader can influence the outcome).[9] Newton books use Unicode and are thus available in numerous languages. An individual Newton book may actually contain multiple views representing the same content in different ways (such as for different screen resolutions).

Founder Electronics

Format: Apabi Reader
Published as: .xeb; .ceb

APABI is a format devised by Founder Electronics. It is a popular format for Chinese e-books. It can be read using the Apabi Reader software, and produced using Apabi Publisher. Both .xeb and .ceb files are encoded binary files. The iLiad e-book device includes an Apabi 'viewer'.

Libris

Format: Mobile Information Device Profile
Published as: .lbr; .bin

Libris is a Java-based e-book reader for mobile devices such as cell phones. Libris will run on most Java-enabled devices that support MIDP. The reader formats books to fit the device screen, and shows one page at a time using high-quality anti-aliased fonts. Books may employ encryption or be unrestricted. Libris content may be produced using the MakeLibris tool. The Libris reader also supports the PalmDoc format.

Mobipocket

Format: Mobipocket
Published as: .prc; .mobi

The Mobipocket e-book format is based on the Open eBook standard using XHTML and can include JavaScript and frames. It also supports native SQL queries to be used with embedded databases. There is a corresponding e-book reader. A free e-book of the German Wikipedia has been published in Mobipocket format.[10]

The Mobipocket Reader has a home page library. Readers can add blank pages in any part of a book and add free-hand drawings. Annotations (highlights, bookmarks, corrections, notes, and drawings) can be applied, organized, and recalled from a single location. Mobipocket Reader has electronic bookmarks and a built-in dictionary.

The reader has a full screen mode for reading and support for many PDAs, Communicators, and Smartphones. Mobipocket products support most Windows, Symbian, BlackBerry and Palm operating systems. Using WINE, the reader works under Linux or Mac OS X. Third-party applications like Okular and FBReader can also be used under Linux or Mac OS X, but they work only with unencrypted files.

The Amazon Kindle's AZW format is essentially the Mobipocket format with a slightly different serial number scheme (it uses an asterisk instead of a dollar sign).

Mobipocket has developed an .epub to .mobi converter called KindleGen[11] (supports IDPF 1.0 and IDPF 2.0 epub format, according to the company).

Notably, Eastern European letters with diacritical marks are not supported.

EPUB

Format: IDPF/EPUB
Published as: .epub

The .epub or OEBPS format is an open standard for e-books created by the International Digital Publishing Forum (IDPF). It combines three IDPF open standards:

  • Open Publication Structure (OPS) 2.0, which describes the content markup (either XHTML or Daisy DTBook)
  • Open Packaging Format (OPF) 2.0, which describes the structure of an .epub in XML
  • OEBPS Container Format (OCF) 1.0, which bundles files together (as a renamed ZIP file)
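
Because the container is a renamed ZIP file, a reader can open it with ordinary ZIP tools and follow META-INF/container.xml to the OPF package document. The Python sketch below builds a tiny stand-in container in memory and then locates its package file. The helper names and the OEBPS/content.opf path are assumptions chosen for the example, and the stub files are not a valid publication.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def build_sample_epub() -> bytes:
    """Build a minimal stand-in .epub in memory (stub content only)."""
    container_xml = (
        '<?xml version="1.0"?>'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        "</container>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry comes first and is stored uncompressed.
        archive.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        archive.writestr("META-INF/container.xml", container_xml)
        archive.writestr("OEBPS/content.opf", "<package/>")  # stub OPF
        archive.writestr("OEBPS/chapter1.xhtml", "<html/>")  # stub OPS content
    return buffer.getvalue()


def find_package_document(epub_bytes: bytes) -> str:
    """Return the path of the OPF package file named in container.xml."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        mimetype = archive.read("mimetype").decode("ascii").strip()
        if mimetype != "application/epub+zip":
            raise ValueError("not an EPUB container")
        root = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            raise ValueError("container.xml names no package document")
        return rootfile.get("full-path")


print(find_package_document(build_sample_epub()))
```

A real reader would go on to parse the OPF file for the manifest and reading order; that step is left out here.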

Currently, the format can be read by the Apple iPad, Barnes and Noble Nook, Sony Reader, BeBook, Bookeen Cybook v. 2.0, Adobe Digital Editions, Lexcycle Stanza, BookGlutton, AZARDI, Aldiko and WordPlayer on Android and the Mozilla Firefox add-on OpenBerg Lector. Several other reader software programs are currently implementing support for the format, such as dotReader, FBReader, Mobipocket, uBook and Okular. Another software .epub reader, Lucidor, is in beta. Additionally, the Stanza application for the iPhone and iPod Touch can read ePub files offline.

In 2008 BookGlutton launched a server-side HTML-to-EPUB converter.[12]

Adobe Digital Editions uses the .epub format for its e-books, with DRM protection provided through Adobe's proprietary ADEPT mechanism. The recently developed INEPT framework and scripts, produced by reverse-engineering ADEPT, circumvent this DRM system.[13]

DSLibris, a Sourceforge.net project, is able to decode e-books in .epub and .xht format for reading on the Nintendo DS/DS Lite/DSi systems (through the use of a flash linker, such as SuperCard DS One). The e-book is presented in a natural page format (the DS console is held sideways with both screens simulating left and right pages of a book), and page turns are accomplished by either left or right buttons pressed on the directional pad or stylus taps on the left or right side of the touchscreen. Bookmarks can be created using the Select key, and the user can return to them using the up or down directional pad buttons when the e-book is reopened.[14]

Broadband eBooks

Format: Sony media
Published as: .lrf; .lrx

This is the digital book format used by Sony Corporation (ソニー株式会社, Sonī Kabushiki Kaisha).[2] It is a proprietary format, with no known reader software for non-Sony devices. The LRX file extension represents a DRM-encrypted e-book.

SSReader

Format: SSReader
Published as: .pdg

This is the digital book format used by a popular Chinese digital library company, 超星数字图书馆 (Chaoxing Digital Library).[3] It is a proprietary raster image compression and binding format, with OCR plug-in modules that run at reading time. The company scanned a huge number of Chinese books in the China National Library, and these scans make up the major stock of its service. The detailed format is not published. Some other commercial e-book formats are also used in Chinese digital libraries.

Multimedia eBooks

Format: Eveda
Published as: .exe or .html

A multimedia e-book is media and book content that combines different content forms. The term can be used as a noun (a medium with multiple content forms) or as an adjective describing a medium as having multiple content forms. Currently, combining several forms of media is possible only with Adobe Flash technology. The flip-book technique is used to preserve the sequential presentation of the traditional book.[15]

The term "multimedia e-book" is used in contrast to media that use only the traditional forms of printed or text books. A multimedia e-book includes a combination of text, audio, still images, animation, video, and interactive content. Formats used to create a literary fiction book sometimes add audio-visual elements and interactive content, allowing new forms of creativity. The user (e.g., the reader) can participate in the events that happen to the characters and experience the effect of the musical and graphic parts of the narration. Perceiving several media forms together considerably deepens the expressive power of the work.

Features and hardware tables

Features

| Format | Filename extension | DRM support | Image support | Word wrap support | Open standard | Embedded annotation support | Bookmarking |
|---|---|---|---|---|---|---|---|
| Plain text | .txt | No | No | Yes | Yes | No | No |
| HTML | .html | No | Yes | Yes | Yes | No | No |
| PostScript | .ps | No | Yes | No | Yes | ? | ? |
| Portable Document Format | .pdf | Yes | Yes | No | Yes | Yes | Yes |
| DjVu | .djvu | ? | Yes | No | Yes | ? | ? |
| EPUB (IDPF) | .epub | Yes | Yes | Yes | Yes | No | No |
| FictionBook | .fb2 | Yes | Yes | Yes | Yes | Yes | ? |
| Mobipocket | .prc, .mobi | Yes | Yes | Yes | Yes | Yes | Yes |
| Kindle | .azw | Yes | Yes | Yes | No | Yes | Yes |
| eReader | .pdb | Yes | Yes | Yes | No | Yes | Yes |
| Broadband eBook | .lrf, .lrx | Yes | Yes | Yes | No | ? | ? |
| WOLF | .wol | Yes | Yes | No | No | ? | ? |
| Tome Raider | .tr2, .tr3 | Yes | Yes | Yes | No | ? | ? |
| ArghosReader | .aeh | Yes | Yes | Yes | No | ? | Yes |
| Microsoft Reader | .lit | Yes | Yes | Yes | No | ? | Yes |

Supporting Hardware

| Reader | Plain text | PDF | ePub | HTML | MobiPocket | FictionBook | DjVu | Broadband eBook (1) | eReader (1) | Kindle (1) | WOLF (1) | Tome Raider (1) | Open eBook (2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Amazon Kindle 2, DX | Yes | Yes | No | No | Yes | No | No | No | No | Yes | No | No | No |
| Apple iPad | Yes | Yes | Yes | Yes | No | No | No | No | No | No | No | No | No |
| Azbooka WISEreader | Yes | No | Yes | Yes | Yes | Yes | No | No | No | No | No | No | No |
| Barnes & Noble Nook | No | Yes | Yes | No | No | No | No | No | Yes | No | No | No | No |
| Bookeen Cybook Gen3, Opus | Yes | Yes | Yes (3) | Yes | Yes (3) | Yes (4) | No | No | No | No | No | No | Yes |
| COOL-ER Classic | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No | No |
| Hanlin e-Reader V3 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | Yes | No | No |
| Hanvon WISEreader | Yes | Yes | Yes | Yes | No | No | No | No | No | No | No | No | No |
| iRex iLiad | Yes | Yes | Yes | No | Yes | No | Yes | No | No | No | No | No | No |
| Iriver Story | Yes | Yes | Yes | No | No | No | No | No | No | No | No | No | No |
| Nokia N900 | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No | Yes |
| NUUTbook 2 | Yes | Yes | Yes | No | No | No | No | No | No | No | No | No | No |
| Onyx Boox 60 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No |
| Pocketbook 301 Plus, 302, 360° | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No |
| Sony Reader | Yes | Yes | Yes | No | No | No | No | Yes | No | No | No | No | No |
| Viewsonic VEB612 | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No | No | No |

Notes: (1) Proprietary format. (2) Predecessor of ePUB. (3) Versions support either ePUB or MobiPocket. (4) Only ePUB version and with FW 2.0+.

References

General information
  • Chandler, S. (2007). From entrepreneur to infopreneur: Make money with books, ebooks, and information products. Hoboken, N.J.: John Wiley & Sons.
  • Rich, J. (2006). Self-publishing for dummies. Hoboken, N.J.: Wiley.
  • Cavanaugh, T. W. (2006). The digital reader: Using e-books in K-12 education. Eugene, OR: International Society for Technology in Education.
  • Cope, B., & Mason, D. (2002). Markets for electronic book products. C-2-C series, bk. 3.2. Altona, Vic: Common Ground Pub.
  • Henke, H. (2001). Electronic books and epublishing: A practical guide for authors. London: Springer.
  • Hanttula, D. (2001). Pocket PC handbook.