What is it?
Purple is an RDF API for the Python programming language. It is heavily based on the Pyrple toolkit, written by Sean B. Palmer. Purple aims to be more extensible and to provide in-memory and MySQL storage support. Purple is a work in progress, hence some basic features are still missing. Purple is maintained by Andrea Peltrin.
Download
Download the latest snapshot of Purple.
License
Purple is released under the GPL 2 license.
Tutorial
This tutorial assumes a basic knowledge of RDF concepts.
Installation
Just unzip the purple archive into Python's site-packages folder (or any other folder listed in your PYTHONPATH environment variable). If everything is set up correctly, typing import purple into your Python console should succeed, and dir(purple) should list the package contents:
PythonWin 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on win32.
Portions Copyright 1994-2001...
>>> import purple
>>> dir(purple)
['Graph', 'Literal', 'NTriplesParser', 'Namespace', 'Node', 'Triple', 'TurtleParser', 'URI', 'Var', '__builtins__', '__doc__', '__file__', '__name__', '__path__', 'aliases', 'bNode', 'graph', 'namespaces', 'node', 'parsers', 'quoting', 'serializers', 'triple', 'www']
Graph is your best friend
In Purple all the interesting stuff is done through Graph instances, backed by either
in-memory or MySQL storage. You can create an in-memory RDF graph by passing
'memory' as the storage value.
from purple import Graph
#from purple.namespaces import FOAF, DC, VAR
G = Graph(storage='memory')
or using the shortcut
G = Graph()
Now you have a Graph bound to the variable G and you are ready to rock.
We can add one or more triples to a graph instance by using the add
method.
from purple import Triple, URI, Literal
from purple.namespaces import DC
t = Triple(URI('http://www.deelan.com'), DC.description, Literal('Personal website of Andrea Peltrin'))
G.add([t])
The add method expects an iterable as its parameter, so we can pass either
a list of triples or another graph instance.
F = Graph()
F.feedURI('http://www.deelan.com/foaf.rdf')
G.add(F)
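One way to see why add can take either a list or another graph: if a graph is itself iterable over its triples, a single loop covers both cases. The following is a minimal sketch in plain Python 3, not Purple's actual implementation; ToyGraph and its tuple-based triples are hypothetical stand-ins.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph; triples are plain 3-tuples."""

    def __init__(self):
        self._triples = set()

    def add(self, triples):
        # Accept any iterable: a list, a generator, or another ToyGraph.
        for triple in triples:
            self._triples.add(tuple(triple))

    def __iter__(self):
        # Iterating over a graph yields its triples, which is what lets
        # one graph be passed straight to another graph's add().
        return iter(self._triples)

    def __len__(self):
        return len(self._triples)


g = ToyGraph()
g.add([("http://www.deelan.com", "dc:description", "Personal website")])

f = ToyGraph()
f.add([("http://www.deelan.com", "dc:title", "deelan.com")])

g.add(f)  # merge every triple of f into g
```

Storing triples in a set also means that adding the same triple twice leaves the graph unchanged, which matches the set semantics of RDF graphs.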
Fundamental RDF building blocks such as URIs, literals and blank nodes are provided by the corresponding URI, Literal and bNode classes.
from purple import URI, Literal, bNode
from purple.namespaces import XSD
# URI resource
rez = URI('http://www.deelan.com/')
# Literals tagged with Italian and English languages
litIt = Literal('Ciao mondo!', lang='it-it')
litEn = Literal('Hello world!', lang='en')
# An anonymous blank node
anon = bNode()
# A named blank node
blah = bNode('blah123')
# A literal with an XML Schema floating point datatype
floatLit = Literal('66.6', dtype=XSD.float)
Connect to a MySQL database and automagically stuff data into it (the MySQLdb module must already be installed on the system).
db = Graph(storage='mysql', host='localhost', db='foo', user='bar', password='secret')
We can also supply an already initialized connection object (k in the example) using the connection named parameter.
db = Graph(storage='mysql', connection=k)
Sometimes it is more convenient or quicker to just scribble triples in the Turtle or N-Triples grammar and then create an in-memory graph instance from that data.
s = """
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://www.deelan.com>
    dc:title "deelan.com" ;
    dc:creator [ foaf:nick "deelan" ; foaf:fname "Andrea Peltrin" ]
    .
"""
G = Graph.fromString('turtle', s)
fromString lets us create a Graph instance directly from raw
data, without using the feed* methods. Since Purple needs to know the serialization
format used to encode the RDF data, we pass a MIME type (or an alias) to fromString
in order to invoke the correct parser.
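The MIME type or alias lookup described here can be pictured as a small registry that maps format names to parser functions. This is a sketch in plain Python 3 under stated assumptions: from_string, PARSERS, ALIASES, the alias names and the toy line-based parser are all hypothetical and do not reflect Purple's real parsers or aliases modules.

```python
def parse_ntriples(text):
    """Toy parser: one 'subject predicate object .' statement per line."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("."):
            line = line[:-1].rstrip()  # drop the terminating dot
        subject, predicate, obj = line.split(None, 2)
        triples.append((subject, predicate, obj))
    return triples


# Parsers are registered under a MIME type; aliases point at MIME types.
PARSERS = {"application/n-triples": parse_ntriples}
ALIASES = {"ntriples": "application/n-triples", "nt": "application/n-triples"}


def from_string(fmt, text):
    """Resolve fmt (MIME type or alias) to a parser and run it on text."""
    mime = ALIASES.get(fmt, fmt)
    try:
        parser = PARSERS[mime]
    except KeyError:
        raise ValueError("no parser registered for %r" % fmt)
    return parser(text)


data = '<http://www.deelan.com> <http://purl.org/dc/elements/1.1/title> "deelan.com" .'
triples = from_string("ntriples", data)
```

Keeping aliases separate from the parser table means a new short name can be added without touching the parsers themselves.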
Namespaces
Purple defines a series of commonly used namespaces as instances of the Namespace class. They live in the purple.namespaces module.
RDF = Namespace('http://www.w3.org/1999/02/22-rdf-syntax-ns#')
RDFS = Namespace('http://www.w3.org/2000/01/rdf-schema#')
OWL = Namespace('http://www.w3.org/2002/07/owl#')
FOAF = Namespace('http://xmlns.com/foaf/0.1/')
DC = Namespace('http://purl.org/dc/elements/1.1/')
CC = Namespace('http://web.resource.org/cc/')
XSD = Namespace('http://www.w3.org/2001/XMLSchema#')
Here we import the FOAF namespace.
from purple.namespaces import FOAF
FOAF.homepage
<http://xmlns.com/foaf/0.1/homepage>
To create your own, use the Namespace class.
from purple.namespaces import Namespace
EX = Namespace('http://example.com/')
EX.foo
<http://example.com/foo>
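Attribute access such as EX.foo producing a full URI is the kind of behaviour Python's __getattr__ hook makes easy. The sketch below shows one way it could work in plain Python 3; ToyNamespace is a hypothetical illustration that returns plain strings, whereas Purple's own Namespace presumably returns URI objects and may be implemented differently.

```python
class ToyNamespace:
    """Hypothetical namespace: attribute access appends a local name."""

    def __init__(self, base):
        self.base = base

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so 'base'
        # itself never reaches this method.
        if name.startswith("__"):
            # Leave special-method lookups alone.
            raise AttributeError(name)
        return "<%s%s>" % (self.base, name)


EX = ToyNamespace("http://example.com/")
foo = EX.foo
homepage = ToyNamespace("http://xmlns.com/foaf/0.1/").homepage
```

A side effect of this design is that misspelled terms are not caught: any attribute name yields a URI, whether or not the vocabulary defines it.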
Querying
db.feedURI('http://www.foafnaut.org/dump.rdf')
q=[
Triple(URI('http://deelan.com/'), FOAF.knows, VAR.who),
Triple(VAR.who, FOAF.nick, VAR.nick),
Triple(VAR.who, FOAF.given, VAR.given),
Triple(VAR.who, FOAF.mbox, VAR.mbox)
]
results = db.query(Graph(q))
Once you have filled the graph with some data you can query it for specific values. VARs let us bind matching triple terms to a label, so that we can extract those terms later.
for r in results:
    print('%s AKA %s mbox %s' % (r[VAR.who], r[VAR.nick], r[VAR.mbox]))
You can iterate over the results to find the values that match your search criteria.
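To make the role of variables concrete, here is a minimal sketch of conjunctive pattern matching in plain Python 3. It is not Purple's query engine: Var, match and query are hypothetical names, triples are plain tuples, and the algorithm is a naive nested loop over every triple.

```python
class Var:
    """Hypothetical query variable, identified by its name."""

    def __init__(self, name):
        self.name = name


def match(pattern, triple, binding):
    """Extend binding so pattern equals triple; return None on mismatch."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if isinstance(p, Var):
            if p.name in new and new[p.name] != t:
                return None  # variable already bound to something else
            new[p.name] = t
        elif p != t:
            return None  # constant term does not match
    return new


def query(triples, patterns):
    """Return one dict of variable bindings per way to satisfy all patterns."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


data = [
    ("deelan", "knows", "alice"),
    ("deelan", "knows", "bob"),
    ("alice", "nick", "ali"),
    ("bob", "nick", "bobby"),
    ("carol", "nick", "caz"),
]
who, nick = Var("who"), Var("nick")
results = query(data, [("deelan", "knows", who), (who, "nick", nick)])
```

Because who appears in both patterns, only people who are known by deelan and also have a nick survive; carol is dropped even though she has a nick.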
MySQL goodies
Graph instances with MySQL storage have more powerful query capabilities.
The limit and offset query parameters let you be more selective about matching
triples, and the exactMatch parameter turns on full-text search
for literal nodes (at the moment there is no Google-like boolean search). The following
restricts the number of matching triples to 5:
q=[
Triple(URI('http://deelan.com/'), FOAF.knows, VAR.who),
Triple(VAR.who, FOAF.nick, VAR.nick),
Triple(VAR.who, FOAF.given, VAR.given),
Triple(VAR.who, FOAF.mbox, VAR.mbox)
]
results = db.query(Graph(q), limit=5)
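The effect of limit and offset can be sketched on any sequence of results with itertools.islice. This is only an illustration of the semantics in plain Python 3: paginate is a hypothetical helper, and how Purple's MySQL storage actually applies these parameters (for example through SQL LIMIT and OFFSET clauses) is an assumption, not something the text above states.

```python
from itertools import islice


def paginate(results, limit=None, offset=0):
    """Skip 'offset' results, then return at most 'limit' of them."""
    stop = None if limit is None else offset + limit
    return list(islice(results, offset, stop))


rows = ["r%d" % i for i in range(12)]  # stand-in for query results

first_page = paginate(rows, limit=5)             # r0 .. r4
second_page = paginate(rows, limit=5, offset=5)  # r5 .. r9
remainder = paginate(rows, offset=10)            # r10, r11
```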
If you want to be really fancy you can store the queries themselves in the metabase.
import pickle, base64
s = base64.encodestring(pickle.dumps(q))
query = bNode()
t = [
Triple(query, THIS.query, Literal(s)),
Triple(query, DC.title, Literal('People I know')),
Triple(query, DC.creator, URI('http://deelan.com/'))
]
G.add(t)
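Reading a stored query back is the reverse of the encoding step: decode the base64 text, then unpickle it. The round trip below is a sketch for Python 3, where base64.b64encode and base64.b64decode stand in for the Python 2 era encodestring call; the query here is a hypothetical list of plain tuples rather than Purple Triple objects.

```python
import base64
import pickle

# Hypothetical stand-in for a list of Triple patterns.
query = [
    ("http://deelan.com/", "foaf:knows", "?who"),
    ("?who", "foaf:nick", "?nick"),
]

# Serialize to bytes, then to ASCII text that fits in a literal.
encoded = base64.b64encode(pickle.dumps(query)).decode("ascii")

# Later: turn the stored literal back into the original object.
restored = pickle.loads(base64.b64decode(encoded))
```

Unpickling executes code chosen by whoever produced the data, so this approach is only safe for queries read from a store you trust.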
Metadata extraction
We can also extract metadata from popular media formats such as MP3.
G = Graph(storage='memory')
G.feedURI('http://deelan.com/sample.mp3')
print(G)
blah blah...
TODO list
- Adapt Pyrple RDF/XML parser.
- Add Pyrple's infer / think / filter methods to Graph class.
- Write an RDF/XML serializer.
Contribute
Feel free to contribute with suggestions, ideas and code. Purple is pretty modular and it's easy to add parsers, serializers and additional storage implementations.