Query Musicbrainz.org via HTTP: a proposal
Tuesday, December 30, 2003
This is a proposal to improve the current query model of Musicbrainz.org. It aims to simplify querying the Musicbrainz (MB) metabase using standard HTTP requests, making MB queries somewhat more "software-agent friendly" without the need to install, compile or use additional client libraries.
Please note: some pieces of this proposal are already in place and match my current wishes, others should be added or refined.
My goal is to use most of the MB query features — hopefully the most useful — with simple HTTP GET requests, using URIs to identify resources as best explained in "Building Web Services the REST Way".
Querying MB using HTTP POSTs should be limited to the cases where relevant query results are not guaranteed to be returned — like a "find album" search — or when there are encoding issues and query terms cannot be passed as normal querystring parameters.
What can be retrieved
Artists, albums, tracks, CD-index and TRMID metadata, being already uniquely identified, should all be retrieved via HTTP GET requests. All of these have URIs and are thus already described in the metabase as resources. For example:
- http://www.musicbrainz.org/artist/4a4ee089-93b1-4470-af9a-6ff575d32704
  - Identifies the artist "The Prodigy".
- http://www.musicbrainz.org/album/4cfca905-7d38-4bc7-881f-6202a0394786
  - Identifies the album "Music for the Jilted Generation".
- http://www.musicbrainz.org/track/daa9e1bd-56aa-48bf-9c7b-6cd7cac8c223
  - Identifies the track "Break & enter".
- http://www.musicbrainz.org/cdindex/XC87Kvf0Onwnu7g_FvE1I_im47I-
  - Identifies the album "Music for the Jilted Generation" using its CD-index (the same album in different countries may have a different CD-index value).
- http://www.musicbrainz.org/trmid/dd6a5b51-a08b-409e-89af-b18392a32867
  - Identifies a track "Full Throttle" via its TRM Acoustic Fingerprint.
Currently, if you visit the above links with a web browser, the MB server displays HTML tables showing the metadata of the requested resource.
How RDF/XML info can be retrieved
The idea is to reuse the artist, album, track, CD-index and TRMID URIs to serve the same data in a different, more machine-understandable format, so that software agents can browse the RDF/XML data and extract useful information for us, the users.
The MB server should choose which format to serve by using HTTP content negotiation, but I suspect that it currently just sniffs user-agent strings. Here is a sample Python client implementation asking for album metadata in RDF/XML format using content negotiation:
from urllib2 import urlopen, Request

# ask for RDF/XML rather than HTML
h = {'Accept': 'application/rdf+xml'}
# our album URI
uri = "http://www.musicbrainz.org/album/4cfca905-7d38-4bc7-881f-6202a0394786"
r = urlopen(Request(uri, headers=h))  # r holds the response
# read() returns the raw response body; no charset decoding happens here
print r.read()
This will perform a GET on the given album URI. The web server will look for an Accept HTTP header, check whether it contains application/rdf+xml and, if so, send back RDF/XML instead of HTML. The Content-type response header should be set to application/rdf+xml. A policy for what to do when the Accept header is missing, or contains application/rdf+xml among other values, needs to be discussed.
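As a starting point for that discussion, here is a minimal sketch of one possible server-side policy, written for Python 3. It is not how the MB server works: the function names parse_accept and choose_format are hypothetical, and the choices to fall back to HTML when the header is missing or nothing matches, and to handle only exact media types plus */*, are assumptions made for illustration.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_type, q) pairs."""
    entries = []
    for part in header.split(','):
        part = part.strip()
        if not part:
            continue
        fields = [f.strip() for f in part.split(';')]
        media_type = fields[0].lower()
        q = 1.0  # a media type without a q parameter has quality 1
        for param in fields[1:]:
            name, _, value = param.partition('=')
            if name.strip().lower() == 'q':
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: an unreadable q means "not wanted"
        entries.append((media_type, q))
    return entries


def choose_format(accept_header, default='text/html'):
    """Pick the response type for a resource available as HTML and RDF/XML."""
    available = ('text/html', 'application/rdf+xml')
    if not accept_header:
        return default  # assumed policy: a missing header gets HTML
    best_type, best_q = None, 0.0
    for media_type, q in parse_accept(accept_header):
        if media_type == '*/*':
            candidate = default
        elif media_type in available:
            candidate = media_type
        else:
            continue  # a type this server cannot produce
        if q > best_q:  # on a tie, the type listed first wins
            best_type, best_q = candidate, q
    return best_type or default  # assumed policy: nothing acceptable gets HTML


print(choose_format('text/html;q=0.5, application/rdf+xml'))
```

With this policy a browser that lists text/html first keeps getting HTML, while an agent that prefers application/rdf+xml gets RDF/XML even when it also accepts other types.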
The current query model and the related client library include a "depth" value to specify the amount of metadata returned by the MB server.
With my proposed model "depth" would default to a value of, say, 1 for straight-URI requests and could be explicitly specified by appending a /depth-value to the URI, for example:
http://www.musicbrainz.org/artist/4a4ee089-93b1-4470-af9a-6ff575d32704/3
This way agents unaware of MB internals would simply request URIs with a depth = 1 and then crawl result sets in subsequent steps.
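To make the URI convention concrete, here is a minimal Python 3 sketch of how a server (or a client) could split such a URI into its parts. The function name split_depth and the RESOURCE_KINDS tuple are hypothetical, and the default depth of 1 is just the value suggested above.

```python
from urllib.parse import urlsplit

# Resource kinds taken from the example URIs in this proposal.
RESOURCE_KINDS = ('artist', 'album', 'track', 'cdindex', 'trmid')


def split_depth(uri, default_depth=1):
    """Return (kind, identifier, depth) for a resource URI.

    The depth is the optional third path segment; when it is absent the
    default is used.
    """
    segments = [s for s in urlsplit(uri).path.split('/') if s]
    if len(segments) not in (2, 3) or segments[0] not in RESOURCE_KINDS:
        raise ValueError('not a resource URI: %r' % uri)
    kind, identifier = segments[0], segments[1]
    depth = default_depth
    if len(segments) == 3:
        if not segments[2].isdigit():
            raise ValueError('depth must be a non-negative integer: %r' % uri)
        depth = int(segments[2])
    return kind, identifier, depth


print(split_depth(
    'http://www.musicbrainz.org/artist/4a4ee089-93b1-4470-af9a-6ff575d32704/3'))
```

A crawling agent would never need to build the /depth form itself: it can request the plain URI, get depth 1, and follow the URIs found in the result.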
Bootstrapping
All the above discussion implies that a URI for a resource is known before the request, and this makes sense in the "crawling" scenario; a URI can be found anywhere: within an e-mail message, in a playlist or inside a web page as a hyperlink.
When a URI for a desired resource is not known in advance, the current MB implementation already provides a complete set of query functions: mq:FindArtist, mq:FindAlbum and mq:FindTrack, among others, use POST requests with an RDF/XML payload.
Once again, clients making queries should add both an Accept and a Content-type header to state that they can handle RDF/XML and that they are sending their query terms as an RDF/XML payload.
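A minimal Python 3 sketch of such a client request follows. It only builds the request and does not send it. The function name build_query_request and the example.org endpoint are hypothetical, and the payload is an empty RDF document used as a placeholder, not a real mq:FindArtist query.

```python
from urllib.request import Request

RDF_XML = 'application/rdf+xml'


def build_query_request(endpoint, rdf_payload):
    """Build a POST request that sends, and asks for, RDF/XML."""
    headers = {
        'Accept': RDF_XML,                             # what we can handle
        'Content-Type': RDF_XML + '; charset=utf-8',   # what we are sending
    }
    return Request(endpoint, data=rdf_payload.encode('utf-8'),
                   headers=headers, method='POST')


# Placeholder payload: an empty RDF document, not an actual MB query.
payload = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
req = build_query_request('http://example.org/mb-query', payload)
print(req.get_method(), req.get_header('Accept'))
```

Passing the resulting object to urllib.request.urlopen would perform the actual POST.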
Cool URIs don't change
Finally, it would be cool to have a friendly and permanent URI to which queries can be submitted, something like "http://www.musicbrainz.org/search". Changes in the query engine should be handled by versioning the namespaces of the MM and MQ vocabularies and not by changing the URI of the search script.
Hope this helps. Discuss.
Generating HTML with closures
Wednesday, December 17, 2003
I was reading "Lisp in Web-Based Applications" and this passage intrigued me:
One way we used macros was to generate Html. There is a very natural fit between macros and Html, because Html is a prefix notation like Lisp, and Html is recursive like Lisp. So we had macro calls within macro calls, generating the most complicated Html, and it was all still very manageable.
I don't have a precise idea of what Lisp macros are, but in Python it is possible to dynamically generate the content and attributes of a series of HTML tags by means of closures.
To give an idea of how this could work, I took an example I had already published and rewrote it using closures and lists.
data = [('Spam', 'http://www.spam.com/'),
        ('Eggs', 'http://www.eggs.com/')]

def ul():
    def li(title, uri):
        def a():
            return {'href': uri}, 'Visit %s' % title
        return a,
    # build a bunch of li's
    return [curry(li, title, uri) for title, uri in data]

print fill(ul)
The code in the example creates an unordered list (ul), its items (li) and, for each item, a link (a).
The curry class binds values to a function's parameters before the function is actually called.
Starting from the ul function, fill invokes it and iterates over the values it returns. Every function defined here returns a list of zero or more elements, and each element can be one of three types:
- A dictionary, used to map HTML tag attributes to their values.
- A string, which is written to the output as is.
- A function, which is invoked in turn, and so on recursively until the whole structure has been traversed. The name of the function (which becomes the name of the HTML tag) is taken from the function's own func_name property.
The obvious limitation of this approach is that two functions with the same name cannot be siblings of each other. This is easy to work around by renaming the functions appropriately and having fill treat only part of the name as significant (for example, ul_foo and ul_bar will both produce the ul tag).
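The example does not show curry and fill themselves, so here is a minimal sketch of how they could be written to match the description. This is a guess for illustration, not the code of the original example or of Deex. It targets Python 3, so it reads `__name__` where Python 2 would use func_name; escaping attribute values and cutting the name at the first underscore are assumptions of mine.

```python
from html import escape


class curry:
    """Bind arguments to a function now; call it later with no arguments."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.__name__ = func.__name__  # fill() reads the tag name from here

    def __call__(self):
        return self.func(*self.args)


def fill(func):
    """Render func as an HTML element, recursing into returned functions."""
    tag = func.__name__.split('_')[0]  # ul_foo and ul_bar both give "ul"
    attributes, children = {}, []
    for item in func():
        if isinstance(item, dict):      # attributes of this tag
            attributes.update(item)
        elif isinstance(item, str):     # text, copied to the output as is
            children.append(item)
        elif callable(item):            # a nested tag
            children.append(fill(item))
        else:
            raise TypeError('unexpected element: %r' % (item,))
    attrs = ''.join(' %s="%s"' % (name, escape(str(value), quote=True))
                    for name, value in attributes.items())
    return '<%s%s>%s</%s>' % (tag, attrs, ''.join(children), tag)


data = [('Spam', 'http://www.spam.com/'),
        ('Eggs', 'http://www.eggs.com/')]


def ul():
    def li(title, uri):
        def a():
            return {'href': uri}, 'Visit %s' % title
        return a,
    # build a bunch of li's
    return [curry(li, title, uri) for title, uri in data]


print(fill(ul))
```

Run on the data above, this prints a single ul element containing two li items, each wrapping a link of the form `<a href="http://www.spam.com/">Visit Spam</a>`.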
Update: I have published Deex: ...a little Python module that uses nested function lists to render sequences of XHTML tags.