SF Worth Reading

The book recommendations list has moved to sfworthreading.com. It’s a static site built with Jekyll and a Ruby “updater” script I wrote to do some busy work for me that Jekyll won’t, like building a custom authors index.

Getting Cover Images

A while ago I wanted to upgrade the “Book Recommendations” page to include links to reviews, book cover images, and other information, possibly Amazon links. At the time it was nothing more than a set of lists in markdown.

Thinking about the best way to achieve this, I realized I’d need an automatic way to take a book title and author and find a link to images and other resources. If I had to do it by hand to update all existing books in the list, it wouldn’t happen. Also it would be very nice to add a new book to the list and get the cover image added automatically.

The Amazon affiliate program will give you links which include nice images. However, my objective wasn’t to make money for Amazon and it didn’t seem likely I could earn enough to stay in the affiliate program anyhow. Plus, automating the generation of links for every book isn’t straightforward, and that was the most important point.

So, I turned to Open Library. It’s a database of information about published books that includes cover images. Every book has a web page on Open Library.

Open Library conveniently has an API for searching its database, and another API just for book cover images. The search API gives good enough results to automate adding cover image links to most books, with some notable exceptions. For instance, all of Iain M. Banks’ books seem to be missing from Open Library, and a few other books weren’t found because of how I named them in my list. These can be fixed by hand at some point. You can find a little more information on the process on the blog for sfworthreading.com.
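As a rough illustration of how the two APIs fit together, here is a minimal sketch, not the actual updater script. It assumes the public search endpoint at openlibrary.org/search.json and the covers endpoint at covers.openlibrary.org, and that search results carry `key`, `title`, `first_publish_year` and `cover_i` fields. The helper names (`search_url`, `cover_url`, `first_match`) are made up for this example, and the sample response is canned rather than fetched.

```ruby
require "json"
require "uri"

SEARCH_ENDPOINT = "https://openlibrary.org/search.json"
COVERS_ENDPOINT = "https://covers.openlibrary.org/b/id"

# Build the search URL for a title/author pair (hypothetical helper).
def search_url(title:, author:)
  query = URI.encode_www_form(title: title, author: author, limit: 1)
  "#{SEARCH_ENDPOINT}?#{query}"
end

# Build a cover image URL from a cover id; size is "S", "M", or "L".
def cover_url(cover_id, size: "M")
  "#{COVERS_ENDPOINT}/#{cover_id}-#{size}.jpg"
end

# Pick the fields of interest out of the first search hit, or nil if
# the search returned nothing.
def first_match(response_body)
  doc = JSON.parse(response_body).fetch("docs", []).first
  return nil if doc.nil?

  {
    title: doc["title"],
    year:  doc["first_publish_year"],
    page:  doc["key"] && "https://openlibrary.org#{doc["key"]}",
    cover: doc["cover_i"] && cover_url(doc["cover_i"])
  }
end

# Canned response standing in for a live request; the key and cover id
# are invented for the example.
sample = '{"docs":[{"key":"/works/OL0000W","title":"Redshirts",' \
         '"first_publish_year":2012,"cover_i":12345}]}'

puts search_url(title: "Redshirts", author: "John Scalzi")
p first_match(sample)
```

In a live script the response body would come from something like `Net::HTTP.get(URI(search_url(...)))`. Returning nil for an empty result makes it easy to collect the books that need fixing by hand.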

Enhancing a Jekyll Site With Additional Ruby Scripting

To get book titles and authors out of the book list, I had to parse the existing markdown file that held it. If I’d had any idea I’d be generating a site from this list, I’d never have structured it this way.

A Ruby script employing a few regular expressions plus some rather ugly logic managed to load the list into a reasonable data structure. Then I could add code to search Open Library for matches to my books and retrieve links to cover images, publication dates, publisher names, and book pages on Open Library. With those results, I could generate new pages that included cover images and other details, and sort the books. This automated process saved an enormous amount of time. Now I had a script that generated the set of pages that make up the sfworthreading.com website.
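To give a flavor of the regex approach, here is a small sketch. The original list layout isn’t shown in this post, so the format below (a `##` heading per author, one bullet per book, an optional rating in parentheses) is purely an assumption, as are the names `AUTHOR_LINE`, `BOOK_LINE` and `parse_book_list`.

```ruby
# Hypothetical layout: "## Author Name" headings followed by
# "- Title (Rating)" bullets. The real list was messier than this.
AUTHOR_LINE = /\A##\s+(?<author>.+?)\s*\z/
BOOK_LINE   = /\A[-*]\s+(?<title>.+?)(?:\s+\((?<rating>[^)]+)\))?\s*\z/

# Turn the markdown text into an array of { author:, title:, rating: }
# hashes. Bullets that appear before any author heading are skipped.
def parse_book_list(markdown)
  books = []
  current_author = nil

  markdown.each_line do |raw|
    line = raw.strip
    if (m = AUTHOR_LINE.match(line))
      current_author = m[:author]
    elsif current_author && (m = BOOK_LINE.match(line))
      books << { author: current_author, title: m[:title], rating: m[:rating] }
    end
  end

  books
end

list = <<~MARKDOWN
  ## John Scalzi
  - Redshirts (Excellent)

  ## Max Barry
  - Lexicon (Excellent)
MARKDOWN

parse_book_list(list).each { |book| p book }
```

A flat array of hashes like this is easy to group by author or feed to a search step, which is roughly what “a reasonable data structure” needs to allow.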

At that point I had to decide whether to keep updating the site by adding new books to the markdown list or come up with a whole new approach. I could keep running the Ruby script every time I added a new book, reading the whole markdown file again and retrieving all the data from Open Library. But did I really want to maintain this script? The markdown parser is pretty ugly because of how I originally structured the lists. More importantly, I didn’t have a way to save any links or other information on book entries retrieved from Open Library or elsewhere. Some of the information won’t change, so it makes sense to store it locally. Storing it locally also makes switching to another cover image provider easier, since the new source of book data doesn’t have to supply the rest of the information.

I decided to take the simple route and save all the data as JSON. That way I can add new fields in the future, such as tags or links to external reviews. Editing the file by hand isn’t too difficult, and other programs can read it easily.
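A JSON store along these lines needs very little code. The sketch below is only an illustration: the file name, the field names, and the helpers `load_books`, `save_books` and `add_record` are assumptions for this example, not the site’s actual schema or methods.

```ruby
require "json"
require "tmpdir"

# Read the stored book records, or start with an empty list if the
# file does not exist yet.
def load_books(path)
  File.exist?(path) ? JSON.parse(File.read(path)) : []
end

# Write the records back as pretty-printed JSON so the file stays
# easy to edit by hand.
def save_books(path, books)
  File.write(path, JSON.pretty_generate(books) + "\n")
end

# Return a new list with the record appended, unless a record with the
# same author and title is already stored.
def add_record(books, record)
  duplicate = books.any? do |book|
    book["author"] == record["author"] && book["title"] == record["title"]
  end
  duplicate ? books : books + [record]
end

# Demo in a temporary directory so nothing is left behind.
Dir.mktmpdir do |dir|
  path = File.join(dir, "books.json") # hypothetical file name
  books = add_record(load_books(path),
                     "author" => "Max Barry",
                     "title"  => "Lexicon",
                     "rating" => "Excellent")
  save_books(path, books)
  puts File.read(path)
end
```

Because each record is a plain hash, a later field such as a list of tags can be added to some records without touching the others.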

Adding new books and new authors is done with a simple set of Ruby methods. Now the site update process is pretty painless:

# Add a book by an author not in the existing list of authors 
# and retrieve metadata for the book from Open Library:
add_book_by_new_author(title: ["Lexicon"], author: "Max Barry", rating: "Excellent") 

# Add a book by an author with other books already on the site 
# and retrieve metadata for the book from Open Library:
add_new_book(title: ["Redshirts"], author: "John Scalzi", rating: "Excellent")

# Generate pages including the new books and authors
update_site

To refresh the metadata for a book that is already on the site, call refresh_metadata. This is useful when, for example, the entry at Open Library has been updated since you added the book and you want the new cover image.

refresh_metadata(author: "Lois McMaster Bujold", title: ["The Curse of Chalion"])

You can use update_site(book) to both refresh metadata for an existing book and generate the site with the updated metadata in one step:

# Retrieve metadata for the given book already on the site, then
# generate new pages using the new metadata (images, publication date, etc.)
update_site(author: "S. Andrew Swan", title: "Broken Crescent")