biblio.webquery

Retrieving book data from online sources.

Platforms & distributions

  • Python package
  • platform-independent source code

Repositories

Development status

Stable.

Latest version

See below.

Background

This package presents a number of methods for querying webservices for bibliographic information, and includes two scripts for querying and renaming files by ISBN.

Installation

This package can be installed by the usual Pythonic methods:

  1. use your favourite installation tool:

    % easy_install biblio.webquery
    
  2. or download the source, unpack it, change into the directory and call:

    % python setup.py install
    

Depending on your platform, the scripts may be installed as .py scripts, or some form of executable, or both.

Usage

This was initially written for my own purposes (querying webservices for book information so I could correct the metadata on my ebooks and name them appropriately), so the features may be a little eccentric and documentation is thin. However, the code is not complex and can be easily understood.

Two scripts are installed with this package: queryisbn and renamebyisbn.

queryisbn returns bibliographic information from webservices for supplied ISBNs. It is called:

% queryisbn.py [options] ISBNs ...

with the options:

--version show program's version number and exit
-h, --help show this help message and exit
--debug For errors, issue a full traceback instead of just a message.
-s SERVICE, --service=SERVICE
 The webservice to query. Choices are xisbn (WorldCat xISBN), isbndb (ISBNdb). The default is xisbn.
-k KEY, --key=KEY
 The access key for the webservice, if one is required.

For example:

% queryisbn.py 1568385048 1564145026
1568385048:
title: Drop the rock: removing character defects
authors: [Bill Pittman, Todd Weber]
publisher: Hazelden
year: 1999
lang: eng
1564145026:
title: Stop clutter from stealing your life : discover why you clutter and how you can stop
authors: [Mike Nelson]
publisher: New Page Books
year: 2001
lang: eng

% queryisbn.py --debug -s isbndb -k OPNH8HG2 1568385048 1564145026
1568385048:
title: Drop the Rock: Removing Character Defects
authors: [Bill Pittman, Todd Weber]
1564145026:
authors: [Mike Nelson]

renamebyisbn extracts an ISBN from a file name, looks up the associated bibliographic information in a webservice and renames the file appropriately. It is called:

% renamebyisbn.py [options] FILES ...

with the options:

--version show program's version number and exit
-h, --help show this help message and exit
--debug For errors, issue a full traceback instead of just a message.
-s SERVICE, --service=SERVICE
 The webservice to query. Choices are xisbn (WorldCat xISBN), isbndb (ISBNdb). The default is xisbn.
-k KEY, --key=KEY
 The access key for the webservice, if one is required.
-c CASE, --case=CASE
 Case conversion of the new file name. Choices are orig, upper, lower. The default is orig.
--leave_whitespace
 Leave excess whitespace. By default, consecutive spaces in names are compacted.
--replace_whitespace=REPLACE_WHITESPACE
 Replace whitespace in the new name with this string.
--strip_chars=STRIP_CHARS
 Remove these characters from the new name. By default these are ':!,'".?()'.
--overwrite Overwrite existing files.
--dryrun Check function without renaming files.
--template=TEMPLATE
 The form to use for renaming the file. The fields recognised are auth (primary author's family name), title (full title of the book), short_title (abbreviated title), isbn, year (year of publication). The default is '%(auth)s%(year)s_%(short_title)s_(isbn%(isbn)s)'.
--unknown=UNKNOWN
 Use this string if value is undefined.

The new name is generated from the template first, and then the processing options are applied. In order: characters are stripped from the name, excess whitespace is collapsed, and the case conversion is applied. The file extension, if any, is removed before renaming and re-applied afterwards. We suggest you try a run with --dryrun before renaming any files.
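The processing order described above can be sketched in Python. This is a minimal illustration, not the package's actual code: the function name make_new_name and its signature are hypothetical, and only the default template and strip characters are taken from the option descriptions.

```python
import os
import re

# Defaults copied from the --template and --strip_chars descriptions.
DEFAULT_TEMPLATE = '%(auth)s%(year)s_%(short_title)s_(isbn%(isbn)s)'
DEFAULT_STRIP_CHARS = ':!,\'".?()'


def make_new_name(path, fields, template=DEFAULT_TEMPLATE,
                  strip_chars=DEFAULT_STRIP_CHARS, case='orig'):
    """Hypothetical helper: build a new path for `path` from `fields`."""
    directory, filename = os.path.split(path)
    # The extension is set aside and re-applied at the end.
    _, ext = os.path.splitext(filename)
    # 1. Expand the template with the bibliographic fields.
    name = template % fields
    # 2. Strip unwanted characters.
    name = ''.join(c for c in name if c not in strip_chars)
    # 3. Collapse runs of whitespace into single spaces.
    name = re.sub(r'\s+', ' ', name).strip()
    # 4. Apply the case conversion.
    if case == 'upper':
        name = name.upper()
    elif case == 'lower':
        name = name.lower()
    return os.path.join(directory, name + ext)


fields = {
    'auth': 'Skilling',
    'year': '2006',
    'short_title': 'Data analysis',
    'isbn': '9780198568322',
}
print(make_new_name('books/tutor_9780198568322.rar', fields))
```

Note that the parentheses in the default template are removed by the default strip characters, which is why they do not appear in the resulting file names.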

For example, working with 4 files called '0763718165.Jones Course.djvu', 'helm_0671708821 (orig).pdf', 'tutor_9780198568322.rar', 'unce.9783540237730.27380.pdf':

% renamebyisbn.py --dryrun books/*
Original books/0763718165.JonesCourse.djvu ...
extracted ISBN 0763718165 ...
found Andersen - Data structures in Java : a laboratory course
new name Andersen2001_Data structures in Java_isbn0763718165.
new path books/Andersen2001_Data structures in Java_isbn0763718165.djvu.
Original books/helm_0671708821 (orig).pdf ...
extracted ISBN 0671708821 ...
found Helmstetter - What to say when you talk about yourself.
new name Helmstetter1990_What to say when you talk about yourself_isbn0671708821.
new path books/Helmstetter1990_What to say when you talk about yourself_isbn0671708821.pdf.
Original books/tutor_9780198568322.rar ...
extracted ISBN 9780198568322 ...
found Skilling - Data analysis : a Bayesian tutorial ; [for scientists and engineers]
new name Skilling2006_Data analysis_isbn9780198568322.
new path books/Skilling2006_Data analysis_isbn9780198568322.rar.
Original books/unce.9783540237730.27380.pdf ...
extracted ISBN 9783540237730 ...
found McDaniel - Uncertainty and surprise in complex systems questions on working with the unexpected
new name McDaniel2005_Uncertainty and surprise in complex systems questions on working with the unexpected_isbn9783540237730.
new path books/McDaniel2005_Uncertainty and surprise in complex systems questions on working with the unexpected_isbn9783540237730.pdf.
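The "extracted ISBN" step in the output above could be approximated as follows. This is a sketch under assumptions, not the script's actual logic: the names ISBN_PATTERN, is_valid_isbn and extract_isbn are hypothetical, and the approach (find a run of 10 or 13 characters, then verify the standard ISBN check digit) is one plausible way to avoid picking up unrelated numbers in a file name.

```python
import re

# Candidate ISBNs: 13 digits, or 9 digits followed by a digit or X,
# not embedded in a longer run of digits.
ISBN_PATTERN = re.compile(r'(?<!\d)(\d{13}|\d{9}[\dXx])(?!\d)')


def is_valid_isbn(isbn):
    """Check the ISBN-10 or ISBN-13 check digit."""
    isbn = isbn.upper()
    if not re.fullmatch(r'\d{13}|\d{9}[\dX]', isbn):
        return False
    if len(isbn) == 10:
        # Weights run from 10 down to 1; X stands for the value 10.
        total = sum((10 - i) * (10 if c == 'X' else int(c))
                    for i, c in enumerate(isbn))
        return total % 11 == 0
    # ISBN-13: weights alternate 1, 3, 1, 3, ...
    total = sum(int(c) * (1 if i % 2 == 0 else 3)
                for i, c in enumerate(isbn))
    return total % 10 == 0


def extract_isbn(filename):
    """Return the first valid ISBN found in `filename`, or None."""
    for match in ISBN_PATTERN.finditer(filename):
        if is_valid_isbn(match.group(1)):
            return match.group(1)
    return None


for name in ['helm_0671708821 (orig).pdf', 'unce.9783540237730.27380.pdf']:
    print(name, '->', extract_isbn(name))
```

Validating the check digit means that a shorter number such as the '27380' in the last file name is never mistaken for an ISBN.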

If you use the package directly in code, biblio.webquery presents several classes that may be useful:

  • BaseWebQuery, a simple class for encapsulating queries to webservices
  • BaseKeyedWebQuery, ditto except allowing for access keys
  • XisbnQuery and IsbndbQuery, for fetching bibliographic information from Worldcat xISBN and ISBNdb services respectively
  • QueryThrottle, for limiting the frequency or total number of service queries
  • BibRecord, a general class for holding bibliographic information
  • PersonalName, a class for holding a name, along with functions for parsing names into this class
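To illustrate what limiting the frequency or total number of queries involves, here is a small standalone sketch. It is not the package's QueryThrottle and does not reflect its interface: the class SimpleThrottle, its parameters and its wait method are all hypothetical.

```python
import time


class SimpleThrottle:
    """Illustrative stand-in, NOT the package's QueryThrottle."""

    def __init__(self, min_interval=1.0, max_queries=None,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between queries
        self.max_queries = max_queries    # None means no overall limit
        self.clock = clock
        self.sleep = sleep
        self.count = 0
        self.last = None

    def wait(self):
        """Block until a query is allowed, or raise if the quota is used up."""
        if self.max_queries is not None and self.count >= self.max_queries:
            raise RuntimeError('query limit reached')
        if self.last is not None:
            remaining = self.min_interval - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()
        self.count += 1


throttle = SimpleThrottle(min_interval=0.1, max_queries=3)
for isbn in ['1568385048', '1564145026']:
    throttle.wait()  # call this before each webservice request
    print('would query', isbn)
```

The clock and sleep functions are injectable only so that the behaviour can be checked without real delays.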

Thanos Vassilakis has posted what looks like a very useful module for querying by ISBN. However, it seems to have disappeared from its home website.

Limitations

Note that all bibliographic databases contain a certain number of malformed entries and some inconsistent formatting. For example, the edition may or may not be included in the title, and authors can be given as 'Firstname Lastname', 'Lastname, Firstname', 'F Lastname' and so on. To cope with this, the package uses a number of heuristics to normalise the data.
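As an illustration of the kind of heuristic involved, the following sketch splits the author formats mentioned above into given and family names. It is an assumption about how such normalisation might look, not the package's own parsing code: the function split_name is hypothetical and deliberately simple.

```python
def split_name(raw):
    """Hypothetical heuristic: return (given, family) for a raw author string."""
    raw = ' '.join(raw.split())  # normalise internal whitespace
    if ',' in raw:
        # 'Lastname, Firstname'
        family, _, given = raw.partition(',')
        return given.strip(), family.strip()
    parts = raw.split(' ')
    if len(parts) == 1:
        # A single word is treated as the family name.
        return '', parts[0]
    # 'Firstname Lastname' or 'F Lastname': last word is the family name.
    return ' '.join(parts[:-1]), parts[-1]


for raw in ['Bill Pittman', 'Pittman, Bill', 'B Pittman']:
    print(repr(raw), '->', split_name(raw))
```

A rule this simple mishandles multi-word family names such as 'van Gogh' when no comma is present, which is one reason real data needs more than one heuristic.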

To use ISBNdb, an access key must be provided. One can be obtained for free by signing up on the ISBNdb website.

References

Releases