Query Musicbrainz.org via HTTP: a proposal

Tuesday, December 30, 2003

This is a proposal to improve the current query model of Musicbrainz.org. It aims to simplify querying the Musicbrainz (MB) metabase by using standard HTTP requests, making MB queries more "software-agent friendly" and removing the need to install, compile or use additional client libraries.

Please note: some pieces of this proposal are already in place and match my current wishes; others should be added or refined.

My goal is to use most of the MB query features (hopefully the most useful ones) with simple HTTP GET requests, using URIs to identify resources, as best explained in "Building Web Services the REST Way".

Querying MB using HTTP POSTs should be limited to cases where relevant query results are not guaranteed to be returned (like a "find album" search), or where encoding issues prevent the query terms from being passed as normal querystring parameters.

What can be retrieved

Artists, albums, tracks, CD-index and TRMID metadata are already uniquely identified, so they should all be retrievable via HTTP GET requests. All of these have URIs and are therefore already described in the metabase as resources. For example:

http://www.musicbrainz.org/artist/4a4ee089-93b1-4470-af9a-6ff575d32704
Identifies the artist "The Prodigy".
http://www.musicbrainz.org/album/4cfca905-7d38-4bc7-881f-6202a0394786
Identifies the album "Music for the Jilted Generation".
http://www.musicbrainz.org/track/daa9e1bd-56aa-48bf-9c7b-6cd7cac8c223
Identifies the track "Break & enter".
http://www.musicbrainz.org/cdindex/XC87Kvf0Onwnu7g_FvE1I_im47I-
Identifies the album "Music for the Jilted Generation" using its CD-index (the same album in different countries may have a different CD-index value).
http://www.musicbrainz.org/trmid/dd6a5b51-a08b-409e-89af-b18392a32867
Identifies a track "Full Throttle" via its TRM Acoustic Fingerprint.

Currently, if you visit the above links with a web browser, the MB server displays HTML tables showing the metadata of the requested resource.

How RDF/XML info can be retrieved

The idea is to reuse the artist, album, track, CD-index and TRMID URIs to serve the same data in a different, more machine-readable format, so that software agents can browse the RDF/XML data and extract useful information for us, the users.

The MB server should choose which format to serve by using HTTP content negotiation ("format negotiation"), but I suspect that it currently just sniffs user-agent strings. Here is a sample Python client that asks for album metadata in RDF/XML format using content negotiation:

from urllib.request import urlopen, Request

# ask for RDF/XML instead of HTML
h = {'Accept': 'application/rdf+xml'}

# our album URI
uri = "http://www.musicbrainz.org/album/4cfca905-7d38-4bc7-881f-6202a0394786"
r = urlopen(Request(uri, headers=h))  # r holds the response

# read() returns raw bytes; decode them (assuming the server sends UTF-8)
print(r.read().decode('utf-8'))

This will perform a GET on the given album URI. The web server will look for an Accept HTTP header, check whether it contains application/rdf+xml and, if so, send back RDF/XML instead of HTML. The Content-Type response header should be set to application/rdf+xml. What to do when the Accept header is missing, or contains application/rdf+xml among other values, still needs to be discussed.
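
As a starting point for that discussion, here is a minimal sketch of one possible server-side policy. It is not how the MB server works: the names parse_accept and choose_format are made up for illustration, and the policy itself (HTML when the header is missing or names neither format, otherwise the supported type with the highest q value) is only an assumption.

```python
RDF = 'application/rdf+xml'
HTML = 'text/html'


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(',')):
        fields = [field.strip() for field in part.split(';')]
        media_type = fields[0].lower()
        quality = 1.0  # HTTP default when no q parameter is given
        for field in fields[1:]:
            if field.lower().startswith('q='):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0  # unparsable q: treat as "not acceptable"
        if media_type and quality > 0:
            # sort by descending quality, then by original position
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def choose_format(accept_header, default=HTML):
    """Pick the response format (hypothetical policy, see the text above)."""
    if not accept_header:
        return default  # assumption: browsers and bare clients get HTML
    for media_type in parse_accept(accept_header):
        if media_type in (RDF, HTML):
            return media_type
    return default  # wildcards and unknown types fall back to the default


print(choose_format('application/rdf+xml'))
print(choose_format('text/html, application/rdf+xml;q=0.5'))
print(choose_format(None))
```

With this policy a client that sends only application/rdf+xml gets RDF/XML, while a browser listing text/html first keeps getting HTML.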

The current query model and the related client library include a "depth" value that specifies the amount of metadata returned by the MB server.

With my proposed model, "depth" would default to a value of, say, 1 for straight-URI requests and could be explicitly specified by appending a /depth-value to the URI, for example:

http://www.musicbrainz.org/artist/4a4ee089-93b1-4470-af9a-6ff575d32704/3

This way agents unaware of MB internals would simply request URIs with a depth = 1 and then crawl result sets in subsequent steps.
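
To make the URI scheme concrete, here is a small sketch of how a server (or a client) could separate the optional depth from the resource URI. The function name split_depth is hypothetical, and the sketch assumes that resource paths always have the form /kind/identifier, as in the examples above.

```python
from urllib.parse import urlsplit

DEFAULT_DEPTH = 1  # the default suggested in the text


def split_depth(uri):
    """Return (resource_uri, depth) for URIs such as .../artist/<id>/3."""
    parts = urlsplit(uri)
    # '/artist/<id>/3' -> ['', 'artist', '<id>', '3']
    segments = parts.path.rstrip('/').split('/')
    depth = DEFAULT_DEPTH
    # assumption: a fourth, purely numeric segment is the depth value
    if len(segments) == 4 and segments[-1].isdigit():
        depth = int(segments.pop())
    resource = parts._replace(path='/'.join(segments)).geturl()
    return resource, depth


base = "http://www.musicbrainz.org/artist/4a4ee089-93b1-4470-af9a-6ff575d32704"
print(split_depth(base))         # depth falls back to 1
print(split_depth(base + "/3"))  # depth is 3, resource URI is unchanged
```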

Bootstrapping

All of the above assumes that the URI of a resource is known before the request. This makes sense in the "crawling" scenario: a URI can be found anywhere, within an e-mail message, in a playlist or inside a web page as a hyperlink.

When the URI of a desired resource is not known in advance, the current MB implementation already provides a complete set of query functions: mq:FindArtist, mq:FindAlbum and mq:FindTrack, among others, use POST requests with an RDF/XML payload.

Once again, clients making queries should add Accept and Content-Type headers to state that they can handle RDF/XML and that they are sending the query terms as an RDF/XML payload.
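
A minimal sketch of such a client request follows. It only builds the request and does not send it; the helper name build_query_request, the endpoint and the payload are all placeholders, and the real RDF/XML body of an mq:FindArtist query is deliberately left out.

```python
from urllib.request import Request

RDF = 'application/rdf+xml'


def build_query_request(endpoint, rdf_payload):
    """Build a POST request carrying an RDF/XML query (hypothetical helper)."""
    body = rdf_payload.encode('utf-8')
    headers = {
        'Accept': RDF,        # we can handle RDF/XML in the response
        'Content-Type': RDF,  # the body we send is RDF/XML
    }
    # passing data makes urllib issue a POST instead of a GET
    return Request(endpoint, data=body, headers=headers)


# placeholder endpoint and payload, for illustration only
req = build_query_request('http://example.org/query',
                          '<rdf:RDF><!-- query terms go here --></rdf:RDF>')
print(req.get_method())
print(req.header_items())
```

Passing the resulting object to urlopen would then perform the query.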

Cool URIs don't change

Finally, it would be cool to have a friendly and permanent URI to which queries can be submitted, something like "http://www.musicbrainz.org/search". Changes in the query engine should be handled by versioning the namespaces of the MM and MQ vocabularies, not by changing the URI of the search script.

Hope this helps. Discuss.


Generating HTML with closures

Wednesday, December 17, 2003

I was reading "Lisp in Web-Based Applications" and this passage caught my attention:

One way we used macros was to generate Html. There is a very natural fit between macros and Html, because Html is a prefix notation like Lisp, and Html is recursive like Lisp. So we had macro calls within macro calls, generating the most complicated Html, and it was all still very manageable.

I don't have a precise idea of what Lisp macros are, but in Python it is possible to dynamically generate the content and attributes of a series of HTML tags by using closures.

To give an idea of how this could work, I took a previously published example and rewrote it using closures and lists.

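# curry and fill are helper names explained in the text below;
# they are not defined in this snippet.
data = [('Spam', 'http://www.spam.com/'),
        ('Eggs', 'http://www.eggs.com/')]

def ul():
  def li(title, uri):
    def a():
      # attributes of the a tag, followed by its text content
      return {'href': uri}, 'Visit %s' % title
    # one-element tuple: each li holds a single link
    return a,

  # build a bunch of li's
  return [curry(li, title, uri) for title, uri in data]

print(fill(ul))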

The code in the example creates an unordered list (ul), its items (li) and, for each of these, a link (a).

The curry class binds values to the parameters of a function before the function is actually called.
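
The post does not show the curry class itself, so here is a minimal sketch of what such a class could look like. It is an assumption, not the original code; the standard library's functools.partial does essentially the same job.

```python
class curry:
    """Bind some arguments to a function now and call it later."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs
        # keep the wrapped function's name so callers can still inspect it
        self.__name__ = func.__name__

    def __call__(self, *more_args, **more_kwargs):
        # arguments given at call time are added to the bound ones
        merged = dict(self.kwargs)
        merged.update(more_kwargs)
        return self.func(*(self.args + more_args), **merged)


def li(title, uri):
    return '%s -> %s' % (title, uri)


bound = curry(li, 'Spam', 'http://www.spam.com/')
print(bound())  # Spam -> http://www.spam.com/
```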

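Starting from the ul function, the fill function calls it and walks through the values it returns. Each function defined returns a list of zero or more elements, and each element can be one of three different types; in the example above these are a dictionary (the attributes of the tag), a string (the text content of the tag) and a callable (a nested tag).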

The obvious limitation of this approach is that two functions with the same name cannot be siblings of each other. This is easily worked around by renaming the functions appropriately and having fill treat only part of the name as significant (for example, ul_foo and ul_bar would both produce the ul tag).
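
To show how the pieces could fit together, here is a rough sketch of a fill function. It is a guess at the idea, not the original implementation: it uses functools.partial as a stand-in for curry, treats dictionaries as attributes, callables as nested tags and everything else as text, and derives the tag name from the part of the function name before the first underscore.

```python
from functools import partial
from html import escape


def tag_name(fn):
    # unwrap partial objects, then keep only the part before the first "_"
    while isinstance(fn, partial):
        fn = fn.func
    return fn.__name__.split('_')[0]


def fill(fn):
    """Call fn and render its return values as one XHTML element."""
    attrs, children = {}, []
    for item in fn():
        if isinstance(item, dict):   # attributes of this tag
            attrs.update(item)
        elif callable(item):         # nested tag: recurse
            children.append(fill(item))
        else:                        # anything else is text content
            children.append(escape(str(item)))
    attr_text = ''.join(' %s="%s"' % (key, escape(str(value), quote=True))
                        for key, value in sorted(attrs.items()))
    name = tag_name(fn)
    return '<%s%s>%s</%s>' % (name, attr_text, ''.join(children), name)


data = [('Spam', 'http://www.spam.com/'), ('Eggs', 'http://www.eggs.com/')]


def ul_menu():  # only "ul" is significant for the tag name
    def li(title, uri):
        def a():
            return {'href': uri}, 'Visit %s' % title
        return a,
    return [partial(li, title, uri) for title, uri in data]


print(fill(ul_menu))
```

Because only the prefix of the name matters, ul_menu still renders as a ul element, which is the workaround described above.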

Update: I have published Deex: ...a little Python module that uses nested function lists to render sequences of XHTML tags.