Jedisaber.com


.epub eBooks Tutorial Part 2:
Prepare the XML Files

 



First, let's go check out the official specs. Yes, they're very boring and hard to follow, but aren't they all?
These will come in handy later on though. Once you have the basic structure of the file set up, the official specs are handy to reference for tags that aren't used very often, or if you can't remember what exactly goes in a certain tag.
Don't let them scare you though: we really only have to fiddle with two XML files. The rest is either straight XHTML or files that you can copy from the sample file that we'll be looking at later.

IDPF Specs:

Note: If you want to download an ePub file to un-zip and poke at its guts to see how it works, I recommend either the sample ePub file mentioned in this guide, or "A Girl of the Commune" by G.A. Henty from the Books page. (Why that one, you ask? The reason is that I made all the books on my Books page back when the ePub format was brand spanking new. Since then, the format has had a number of minor revisions. You should be able to read all of those books, but they might not all validate properly. I've updated "A Girl of the Commune" so that it should validate now. Yes, but why that one in particular? Simple. It's the first one in the folder. ;) )

 

The XML files are everything else in the ePub book: they tell the reading software where your content is and what to do with it.

Before we start preparing our own eBook, let's look inside a sample file.

Great. Now what is all this stuff?


[Screenshots: the root of the zip file, the META-INF folder, and the OEBPS folder]

A .epub file contains, at a bare minimum, the following files/folders:

Let's look at each of these in more detail.
(Feel free to extract these files from the sample.epub and use them as a template)

One thing to note before we get started: the filenames are case sensitive.
This means that if you have a file named "Chapter1.xhtml" and you refer to it as "chapter1.xhtml" in the .OPF file or .NCX file, the book will not display properly.
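
If you want to catch this kind of slip before a reader does, here's a minimal Python sketch. It assumes you've already collected two lists yourself: the filenames you refer to in the .OPF or .NCX file, and the filenames that are really in the book. The function name find_case_mismatches is made up for this example.

```python
def find_case_mismatches(referenced, actual):
    """Map each referenced name that matches a real file only when case is ignored."""
    by_lower = {name.lower(): name for name in actual}
    mismatches = {}
    for name in referenced:
        # An exact match is fine; a match only after lowercasing is a case slip.
        if name not in actual and name.lower() in by_lower:
            mismatches[name] = by_lower[name.lower()]
    return mismatches


referenced = ["chapter1.xhtml", "chap02.xhtml"]
actual = ["Chapter1.xhtml", "chap02.xhtml"]
print(find_case_mismatches(referenced, actual))
```

Anything that shows up in the result is a reference that needs its capitalization fixed.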

mimetype
This file is just a plain ASCII text file that contains this single line (without the quotes):
application/epub+zip
The operating system can look at this file to figure out what a .epub file is instead of using the file extension.
This file must be the first file in the zip file, and must not be compressed.
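
Both rules are easy to check by machine. Here's a rough sketch using Python's standard zipfile module; the helper names (check_mimetype, build_demo_zip) are invented for this example, and the demo zip it builds in memory is only a stand-in, not a complete ePub.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_mimetype(epub_file):
    """Return a list of problems with the mimetype entry (empty list means OK)."""
    problems = []
    with zipfile.ZipFile(epub_file) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            return ["mimetype is not the first file in the zip"]
        if entries[0].compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if zf.read("mimetype") != EPUB_MIMETYPE:
            problems.append("mimetype has the wrong contents")
    return problems


def build_demo_zip(first_name="mimetype", compression=zipfile.ZIP_STORED):
    """Build a tiny in-memory zip to try the check on (not a complete ePub)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        if first_name != "mimetype":
            # Put some other file ahead of mimetype to trigger the first check.
            zf.writestr(first_name, "placeholder")
        zf.writestr("mimetype", EPUB_MIMETYPE, compress_type=compression)
    buffer.seek(0)
    return buffer


print(check_mimetype(build_demo_zip()))
print(check_mimetype(build_demo_zip(first_name="META-INF/container.xml")))
print(check_mimetype(build_demo_zip(compression=zipfile.ZIP_DEFLATED)))
```

To check a real book, pass the path to the .epub file to check_mimetype instead of the demo buffer.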

META-INF Folder
This contains the container.xml file, which points to the location of the Content.opf file.
This folder is the same for every e-book, so you should be able to recycle the whole folder from the sample file without making changes.
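
To see how a reading program might follow that pointer, here's a small Python sketch. The container.xml text inside it is a typical example that I'm assuming for illustration (with the .OPF file at OEBPS/content.opf); it isn't copied from the sample file, and find_opf_path is just a name made up for this example.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

# Assumed, typical container.xml contents (not taken from the sample file).
SAMPLE_CONTAINER = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
    </rootfiles>
</container>
"""


def find_opf_path(container_xml):
    """Return the full-path of the first rootfile listed in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find(f"{{{CONTAINER_NS}}}rootfiles/{{{CONTAINER_NS}}}rootfile")
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> element")
    return rootfile.get("full-path")


print(find_opf_path(SAMPLE_CONTAINER))
```

The full-path value is relative to the root of the zip file, which is why it includes the folder name.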

OEBPS Folder
Notes on the OEBPS folder:
This is the folder where the book content is stored. According to the IDPF spec, you don't have to put your book content in here, but it is recommended. I've come across at least two readers that won't read the book properly if the content isn't in this folder. (If you do put your book content somewhere else, make sure that you update container.xml to point to the correct location of the content.opf file.)

- images Folder
If you have any images for your eBook, they go in here.

Note: most reading systems support a variety of image formats, but according to the OPF spec, the only image types that a reading system must support are GIF, JPEG, PNG, and SVG.

- Content.opf
This file lists all the files in the .epub container, defines their reading order, and stores metadata (author, genre, publisher, etc.).
Note that this file can be named anything you want to call it, as long as the container.xml file mentioned above points to the correct filename.

<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookID" version="2.0" >
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:title>Sample .epub eBook</dc:title>
        <dc:creator opf:role="aut">Yoda47</dc:creator>
        <dc:language>en-US</dc:language>
        <dc:rights>Public Domain</dc:rights>
        <dc:publisher>Jedisaber.com</dc:publisher>
        <dc:identifier id="BookID" opf:scheme="UUID">jedisaber.com06282007214712</dc:identifier>
    </metadata>
    <manifest>
        <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml" />
        <item id="style" href="stylesheet.css" media-type="text/css" />
        <item id="pagetemplate" href="page-template.xpgt" media-type="application/vnd.adobe-page-template+xml" />
        <item id="titlepage" href="title_page.xhtml" media-type="application/xhtml+xml" />
        <item id="chapter01" href="chap01.xhtml" media-type="application/xhtml+xml" />
        <item id="chapter02" href="chap02.xhtml" media-type="application/xhtml+xml" />
        <item id="imgl" href="images/sample.png" media-type="image/png" />
    </manifest>

    <spine toc="ncx">
        <itemref idref="titlepage" />
        <itemref idref="chapter01" />
        <itemref idref="chapter02" />
    </spine>

</package>

Lots of stuff in this file. I'll go through each required tag here. Check the specs to see more information about optional meta data tags.

dc:title - Title of the book
dc:language - Identifies the language used in the book content. The value has to be a language code that complies with RFC 3066, such as "en-US". (I'd just copy the language tag from the sample...)
dc:identifier - This is the book's unique ID. It has to be different for every e-book. The spec doesn't give any sort of recommendation for what to use, but an ISBN would be a good bet. I used the name of my web site and the date and time.
One thing to note: this same value also goes in the dtb:uid meta tag of toc.ncx, and the two have to match. If your identifier starts with a "uuid:" prefix, just modify what comes after the "uuid:" on this line.

Next comes the manifest. This is just a listing of the files in the .epub container, and their file type.
Each item is also assigned an item ID that's used in the spine section of content.opf. This list does not have to be in any particular order. (But you'll be happier if it is. Also, see the section below on the NCX file for more information on the id attribute.)

The spine section lists the reading order of the contents. The spine doesn't have to list every file in the manifest, just the reading order. For example, if the manifest lists images, they do not have to be listed in the spine, and in fact, can't be. Only content (i.e. the XHTML files) can be listed here.

The value of the idref attribute in the spine has to match the id attribute for that entry in the manifest.
(example: if you have a file named "chap01.xhtml", and the manifest reference looks like: <item id="chapter01" href="chap01.xhtml" media-type="application/xhtml+xml" /> then your spine entry for Chapter 1 of your book would be <itemref idref="chapter01" />. See above for a live example from a complete file.)
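
The idref-to-id rule above is easy to check with a few lines of Python. This is only a sketch: unknown_spine_ids is a name invented for this example, and the OPF text is a cut-down version of the sample with a deliberate typo ("Chapter02") added to show what a failure looks like.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Cut-down OPF for illustration; "Chapter02" is a deliberate mistake.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookID" version="2.0">
    <manifest>
        <item id="titlepage" href="title_page.xhtml" media-type="application/xhtml+xml" />
        <item id="chapter01" href="chap01.xhtml" media-type="application/xhtml+xml" />
        <item id="chapter02" href="chap02.xhtml" media-type="application/xhtml+xml" />
    </manifest>
    <spine toc="ncx">
        <itemref idref="titlepage" />
        <itemref idref="chapter01" />
        <itemref idref="Chapter02" />
    </spine>
</package>"""


def unknown_spine_ids(opf_xml):
    """Return the spine idref values that have no matching manifest id."""
    root = ET.fromstring(opf_xml)
    manifest_ids = {item.get("id") for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    spine_ids = [ref.get("idref") for ref in root.findall("opf:spine/opf:itemref", OPF_NS)]
    return [idref for idref in spine_ids if idref not in manifest_ids]


print(unknown_spine_ids(SAMPLE_OPF))
```

An empty list means every spine entry points at something in the manifest. Note that the comparison is case sensitive, just like the filenames.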

- toc.ncx
This is the table of contents. This file controls what shows up in the left Table of Contents pane in Digital Editions.

<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">

<head>
    <meta name="dtb:uid" content="jedisaber.com06282007214712"/>
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
</head>

<docTitle>
    <text>Sample .epub eBook</text>
</docTitle>


<navMap>
    <navPoint id="title_page" playOrder="1">
        <navLabel>
            <text>Title Page</text>
        </navLabel>
        <content src="title_page.xhtml"/>
    </navPoint>

    <navPoint id="chapter01" playOrder="2">
        <navLabel>
            <text>Chapter 1</text>
        </navLabel>
        <content src="chap01.xhtml"/>
    </navPoint>

    <navPoint id="chapter02" playOrder="3">
        <navLabel>
            <text>Chapter 2</text>
        </navLabel>
        <content src="chap02.xhtml"/>
    </navPoint>


</navMap>
</ncx>
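
The dtb:uid value in the head of toc.ncx is meant to be the same string as the dc:identifier in the .OPF file. Here's a hedged Python sketch of a check that the two agree. The function names and the ID "example-book-id-001" are made up for this example, and the two XML snippets are trimmed-down stand-ins for the real files.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"

# Trimmed-down stand-ins for content.opf and toc.ncx (hypothetical ID).
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookID" version="2.0">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:identifier id="BookID">example-book-id-001</dc:identifier>
    </metadata>
</package>"""

SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
    <head>
        <meta name="dtb:uid" content="example-book-id-001"/>
    </head>
</ncx>"""


def opf_unique_id(opf_xml):
    """Return the text of the dc:identifier named by unique-identifier."""
    root = ET.fromstring(opf_xml)
    wanted = root.get("unique-identifier")
    for ident in root.iter(f"{{{DC_NS}}}identifier"):
        if ident.get("id") == wanted:
            return (ident.text or "").strip()
    return None


def ncx_uid(ncx_xml):
    """Return the content of the dtb:uid meta tag, if there is one."""
    root = ET.fromstring(ncx_xml)
    for meta in root.iter(f"{{{NCX_NS}}}meta"):
        if meta.get("name") == "dtb:uid":
            return meta.get("content")
    return None


print(opf_unique_id(SAMPLE_OPF) == ncx_uid(SAMPLE_NCX))
```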

Things you need to change (if you copy and re-use the sample toc.ncx file):

The navPoint tag

Each navPoint is a chapter listing: the text is the chapter name, and the src is the file it links to.
If you copy a navPoint tag set to add chapters, make sure to update the id and playOrder values.

Let's look at our example file to clarify this:

<navPoint id="chapter01" playOrder="2">
     <navLabel>
          <text>Chapter 1</text>
     </navLabel>
     <content src="chap01.xhtml" />
</navPoint>

    <navPoint id="chapter01" playOrder="2">

According to the spec, the id can be anything you want, but it's easier to keep track of things if you use the same id you used for that file in the .OPF file. Also, some readers won't properly display the Table of Contents if the id doesn't match.

    <navPoint id="chapter01" playOrder="2">

The playOrder values have to be in order. (An item with playOrder 1 comes before an item with playOrder 2, etc.) The navPoints also have to be listed in that order, and the numbering can't have any gaps. (You'll get an error if you jump from 1 to 20, etc.)
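
Here's a quick Python sketch of that rule, assuming a flat list of navPoints like the one in the sample. The names make_ncx and play_order_ok are invented for this example, and make_ncx only builds a bare-bones NCX for testing, not a complete toc.ncx.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def make_ncx(orders):
    """Build a bare-bones NCX string with one navPoint per playOrder value."""
    points = "".join(
        f'<navPoint id="item{n}" playOrder="{n}">'
        f"<navLabel><text>Item {n}</text></navLabel>"
        f'<content src="item{n}.xhtml"/></navPoint>'
        for n in orders
    )
    return f'<ncx xmlns="{NCX_NS}" version="2005-1"><navMap>{points}</navMap></ncx>'


def play_order_ok(ncx_xml):
    """True if the playOrder values run 1, 2, 3, ... in document order with no gaps."""
    root = ET.fromstring(ncx_xml)
    orders = [int(point.get("playOrder")) for point in root.iter(f"{{{NCX_NS}}}navPoint")]
    return orders == list(range(1, len(orders) + 1))


print(play_order_ok(make_ncx([1, 2, 3])))
print(play_order_ok(make_ncx([1, 20])))
print(play_order_ok(make_ncx([2, 1])))
```

Only the first call passes; the second has a gap and the third is out of order.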

    <text>Chapter 1</text>

The stuff you type inside the text tag is what actually shows up in the reading software's table of contents. This can be any text you want.

    <content src="chap01.xhtml" />

The content tag links the table of contents item to the XHTML file it points to. If your id tag and text tag both point to chapter 1, and your content tag points to chapter 4, you'll go to chapter 4 when you click the link to chapter 1 in the table of contents.

Notes on the toc.ncx:

You can't format the contents of the toc.ncx. This file is used by the reading software to display the table of contents, and each program will display the contents of toc.ncx differently. If you want to present a formatted table of contents to the reader, you need to make an XHTML file with the contents formatted however you want. (In fact, this is a good idea, as there are still some ePub reading programs that don't use toc.ncx.)

 

- page-template.xpgt


This file isn't part of the IDPF spec, but Adobe Digital Editions uses it for formatting and setting column settings and whatnot. You don't need this file at all, but your book will look nicer in Digital Editions if you include it. Other readers should just ignore it.

Note: You can use a .css style sheet file to lay out styles for your book as well. Just make sure to list it in the manifest section of Content.opf.
Also of note here, any styling should be done in a CSS stylesheet, and not in the document.

- Content .xhtml files

Content files should be XHTML 1.1 documents.
If you're not familiar with XHTML, it's basically HTML with closing tags for every element, and several style tags are not supported.
As for how to organize the content, you can have it all in one document with bookmarks at each chapter, or put each chapter in a separate .xhtml file. The latter looks nicer in most readers and decreases the time it takes for the book to load when you first open it.
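
The "closing tags for every element" part is the one that trips people up most, and a plain XML parser will catch it. Here's a minimal Python sketch (is_well_formed is a name made up for this example). It only checks that the markup is well-formed XML, not that it is valid XHTML 1.1, and because the parser doesn't load the XHTML DTD it will also reject named entities such as &nbsp;.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml_text):
    """True if the text parses as XML (every tag closed and properly nested)."""
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError:
        return False
    return True


# <br /> is closed, so this parses.
print(is_well_formed("<p>Line one<br />line two</p>"))
# A bare <br> is fine in old-style HTML but is never closed, so this fails.
print(is_well_formed("<p>Line one<br>line two</p>"))
```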

 

- A Note about Cover Images:

The ePub specification doesn't say anything about where or how to do your book cover, but a few "best practices" have emerged into a kind of unspoken convention.

Most readers will display the first image in the book as the book's cover, but not all of them do this. Almost all readers that support cover images will correctly display a cover if you do this:

 


    >> Continue to Part 3 of the ePub Tutorial: Creating the Container and Adding Your Files

You can also follow my blog for updates on books I write, and also some ePub content: http://aarondemott.blogspot.com

Comments? Questions?

If you have any comments, notice any bugs, or have any questions on any of the steps here, please e-mail me at: yoda47 (at) jedisaber (dot) com