Tutorial
What is amplee and what it isn't
amplee is an implementation of the Atom Publishing Protocol (AtomPub) in the Python programming language. Its goal is to provide a toolkit for developers who wish to benefit from this protocol in their applications.
amplee tries hard not to feel like a framework. It cannot entirely avoid some framework-like aspects, but by decoupling its packages I hope to have shown that they can be used independently of one another.
Indeed, even though AtomPub is specified on top of HTTP, its data model is actually independent of HTTP and can be manipulated in many other contexts. amplee therefore splits the data model implementation from the HTTP interface, so that the former does not depend on the latter.
Requirements
Before starting with amplee you must ensure you have the basic requirements:
For the purpose of this tutorial, we also assume you are running CherryPy 3.
For further requirements, please look here.
Overview
amplee is built around several packages that work together as a whole but can also be used fairly independently of one another. Let's review them:
- amplee: The top-level package, which contains all the others. It also contains a set of standalone helper modules used by the other packages, including the loader module, which generates a complete application from an INI file.
- amplee.atompub: This package contains the implementation of the AtomPub data model.
- amplee.handler: This package contains the implementation of the operations of AtomPub.
- amplee.storage: This package contains a set of modules implementing a very simple interface to persist resources using different strategies such as RDBMS, filesystem, Amazon S3, ZODB, tarballs, Subversion and memcached.
- amplee.indexer: This package contains a simple indexing implementation that is suited for small to medium websites and designed around the AtomPub data model.
This tutorial will demonstrate how to use those different packages in conjunction to create a web application based on AtomPub.
Relationship between packages
When used as a whole, amplee relates the packages listed above as follows:
- The amplee.loader module reads an INI file and generates the complete structure of an AtomPub service using the amplee.atompub and amplee.storage packages. At that point, the resulting structure can be used in any application without going through the HTTP interface, which is handy for a fat client application that wants to manipulate an AtomPub store without making network calls.
- The generated service structure is then bound to one of the provided HTTP interfaces via the amplee.handler package.
Step 1 - The configuration file
When you create a service structure, certain tasks tend to be repeated over and over across projects. To avoid duplicating code and to make maintenance easier, amplee provides a loader module, which reads an INI file and transforms its content into a service instance.
A full description of the INI structure can be found here, but let's review its main sections.
- storage: This section describes which storage your service should use for member and media resources. The former are the Atom entries and the latter any other content. When content other than an Atom entry is stored, its associated Atom entry is called a media-link entry, or MLE. If needed, you can specify a distinct storage type for each kind of resource.
- store: This section sets the information to create a store, which is a simple interface between the amplee.atompub and the amplee.storage packages.
- service: Specifies global information about the service. For instance, if all your collections share the same base URI, you can specify it here and it will be propagated to them. You also tell the loader which workspaces are linked to this service.
- workspaces: The workspace sections set the common information about each workspace. They are also where a workspace is linked to its collections.
- collections: The collection sections are the largest in terms of information to provide as they are the core of the data model.
- handlers: Each media-type accepted by a collection is associated with what amplee calls a handler. Handlers are your entry points into amplee and the right place to add your application-specific code. Each handler has two parts:
  - a Python class that implements zero or more of the methods that amplee.handler looks up during request processing;
  - a Python class that inherits from amplee.atompub.member.atom.EntryResource for member resources or amplee.atompub.member.generic.MediaResource for media resources. These two classes implement default behavior for both cases, but if you need to implement your own member resource provider entirely, you can subclass amplee.atompub.member.MemberResource, which is the base class.
- members: A set of sections related to one or several handlers. When a handler links to one of these sections, each entry of that members section is passed as-is to the handler. This is useful for per-application configuration settings that are meaningless to amplee but relevant to the handler.
For an example configuration file, see the demo source code of amplee.
Once the configuration file is set up, you run code similar to the following:
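As a rough illustration of how a members section could become a plain dictionary of settings for a handler, here is a sketch using the standard library's configparser. The section names (collection:salads, members:salads) and the keys thumbnail_size and default_author are hypothetical, not amplee's documented INI layout; only enable_cache comes from this tutorial (see Appendix B). amplee's loader may well parse the file differently.

```python
import configparser

# Hypothetical excerpt: section and key names are illustrative only,
# except enable_cache, which Appendix B documents for collection sections.
SAMPLE_CONF = """
[collection:salads]
enable_cache = False

[members:salads]
thumbnail_size = 120
default_author = Chef
"""


def read_handler_settings(text, section):
    """Return the entries of a members-style section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Values stay strings: interpreting them is up to the handler.
    return dict(parser.items(section))


def cache_enabled(text, section):
    """Read enable_cache as a boolean, defaulting to True when absent."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # getboolean accepts True/False, yes/no, on/off and 1/0.
    return parser.getboolean(section, "enable_cache", fallback=True)
```

Passing the section through untouched keeps the loader ignorant of application settings, which matches the "meaningless to amplee" intent described above.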
import os
from amplee.loader import loader

base_dir = os.getcwd()

def setup_store():
    service, conf = loader(os.path.join(base_dir, 'cooker.conf'),
                           encoding='ISO-8859-1',
                           base_path=base_dir)
    return service, conf
The loader function will generate the service structure and return both the amplee.atompub.service.Service and amplee.loader.Config instances.
Note that in this example we provide the base_path argument, which ensures that all the paths defined in the config file are made absolute when used. This is why your config file can use relative paths in almost all cases.
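To make the role of base_path concrete, here is a sketch of the kind of resolution it implies: relative paths are joined to a base directory, while absolute ones are left alone. This is an assumption about the behavior, not amplee's loader code, and resolve_path is a hypothetical helper.

```python
import os


def resolve_path(path, base_path):
    """Make a config path absolute by joining it to base_path when relative."""
    if os.path.isabs(path):
        # Absolute paths in the config file are used unchanged.
        return path
    # normpath collapses segments such as "../" after joining.
    return os.path.normpath(os.path.join(base_path, path))
```

For example, with a base of /srv/cooker, the relative path data/entries resolves to /srv/cooker/data/entries.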
Step 2 - An HTTP interface
In the previous step we loaded a service structure. A fat client application that doesn't want to bother with the HTTP layer could stop there and manipulate the AtomPub service directly. Most of the time, however, you will use an HTTP interface, and this section shows you how.
amplee currently provides two interfaces:
- CherryPy in the amplee.handler.store.cp package. In that case the HTTP interface is an actual CherryPy application using the MethodDispatcher to route the requests based on their HTTP method.
- WSGI in the amplee.handler.store.wsgi package. In that case the HTTP interface is a pure WSGI implementation independent of any other toolkit. You can then bind each WSGI application to a URL dispatcher like selector or wsgidispatcher.
Note that even though a CherryPy 3 application is also a WSGI application, amplee provides a pure, neutral WSGI interface for people who do not want to use CherryPy at all.
Each of the above packages provides two classes:
- Service: Takes an amplee.atompub.service.Service instance and serves the AtomPub service document on GET and HEAD requests.
- Store: Takes an amplee.atompub.collection.Collection instance and serves the collection on the HTTP methods defined by AtomPub: GET, HEAD, POST, PUT and DELETE. It serves both the member resources and the collection feed.
Here is a quick example of how to bind the service structure to an HTTP interface:
Using the CherryPy interface
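from amplee.handler.store.cp import Service, Store

def setup_apps():
    service, conf = setup_store()
    workspace = service.get_workspace('starters')
    # Serves the service document
    pub = Service(service)
    # Serves the 'salads' collection under the salad attribute of pub
    col = workspace.get_collection('salads')
    pub.salad = Store(col, strict=True)
    # Return the application so it can be mounted in CherryPy
    return pub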
Using the WSGI interface
import selector
from amplee.handler.store.wsgi import Service, Store

def setup_apps():
    service, conf = setup_store()
    s = selector.Selector()
    # Service document
    pub = Service(service)
    s.add('/pub', GET=pub.get_service)
    # Collection feed and member resources of the 'salads' collection
    workspace = service.get_workspace('starters')
    store = Store(workspace.get_collection('salads'), strict=True)
    s.add('/pub/salad[/]', POST=store.create_member,
          GET=store.get_collection, HEAD=store.head_collection)
    s.add('/pub/salad/{rid:any}', GET=store.get_member,
          PUT=store.update_member, DELETE=store.delete_member,
          HEAD=store.head_member)
    # The Selector instance is itself the WSGI application
    return s
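Both interfaces route each request to a callable chosen by its HTTP method. The sketch below shows that idea as dependency-free WSGI, answering 405 Method Not Allowed for unregistered methods. method_dispatch and get_collection are hypothetical names used for illustration; they are not part of amplee, selector or CherryPy.

```python
def method_dispatch(**handlers):
    """Build a WSGI app that calls the handler registered for the request's
    HTTP method, answering 405 Method Not Allowed otherwise."""
    allowed = ", ".join(sorted(handlers))

    def app(environ, start_response):
        handler = handlers.get(environ.get("REQUEST_METHOD", "GET"))
        if handler is None:
            start_response("405 Method Not Allowed",
                           [("Allow", allowed), ("Content-Type", "text/plain")])
            return [b"Method not allowed\n"]
        return handler(environ, start_response)

    return app


def get_collection(environ, start_response):
    # Hypothetical stand-in for a handler serving the collection feed.
    body = b'<feed xmlns="http://www.w3.org/2005/Atom"/>'
    start_response("200 OK",
                   [("Content-Type", "application/atom+xml;type=feed"),
                    ("Content-Length", str(len(body)))])
    return [body]
```

Usage: method_dispatch(GET=get_collection) returns a WSGI application that can be served with wsgiref or mounted under a URL dispatcher.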
Step 3 - Application specific requirements
As explained above, you can extend amplee by implementing the handler classes for each media-type your service structure supports.
The HTTP handler interface
By implementing a Python class with amplee-specific callback points, you can add your own code to change the behavior of amplee. The handler interface is as follows:
class MyHandler(object):
    def __init__(self, member_type):
        # Compulsory instance of atompub.handler.MemberType created by amplee within the loader.
        self.member_type = member_type

    def on_error(self, exception, member):
        # Called when an unexpected error happened and could be trapped by amplee.
        # The first argument is the raised exception or error and the second is
        # the member on which the error happened.
        # Returns nothing.
        pass

    def on_create(self, member, content):
        # Called during request processing, once amplee has generated the member
        # from the request body.
        # The first argument is a member instance and the second is the raw content
        # of the request. Note that this can be of any type, even if most of the
        # time it will be either a string or a file object.
        # Returns the member and the content, which may have been altered here for
        # your application's needs. For instance, return (member, None) to indicate
        # that only the member resource must be persisted, not the content sent.
        # You can raise amplee.error.ResourceOperationError to stop the process,
        # preventing the resource from being persisted, while returning an
        # application-specific message to the client.
        return member, content

    def on_created(self, member):
        # Called after the member and its content have been persisted.
        # Allows for some extra work. Returns nothing.
        pass

    def on_update(self, existing_member, new_member, new_content):
        # Equivalent to on_create but for a PUT request. existing_member is the
        # member as currently loaded from the storage; new_member is generated from
        # the request itself. It is up to you to take what you need from each and
        # return the member that will actually be persisted in place of the
        # existing one. Otherwise this behaves like on_create.
        return new_member, new_content

    def on_updated(self, member):
        # Called once the member has been persisted. Returns nothing.
        pass

    def on_delete(self, member):
        # Called upon deletion of the resource. You can also stop the operation
        # by raising amplee.error.ResourceOperationError. Returns nothing.
        pass

    def on_deleted(self, member):
        # Called once the resource has been removed from the storage.
        pass

    def on_get_content(self, member, content, content_type):
        # Rarely needed, but lets you force amplee to return the content returned
        # by this method instead of using the default behavior.
        # Returns member, content, content_type, modified if needed.
        return member, content, content_type

    def on_get_atom(self, member):
        # Same as above, but for the Atom entry representing the member resource.
        # Returns a member.
        return member

    def on_update_feed(self, member):
        # Called for POST, PUT and DELETE after the resource was persisted or
        # removed from the storage, to let you update the feed accordingly (or not).
        # Returns nothing.
        pass
You can see an example in the demo source code of amplee.
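Here is a smaller, self-contained sketch of the on_create contract: raising an error to stop processing, or returning (member, None) to persist only the member resource. Because amplee is not imported, ResourceOperationError is a local stand-in for amplee.error.ResourceOperationError (its real constructor signature is assumed to accept a message), and the is_draft attribute and EmptyBodyGuardHandler class are hypothetical.

```python
class ResourceOperationError(Exception):
    """Local stand-in for amplee.error.ResourceOperationError (assumed signature)."""


class EmptyBodyGuardHandler(object):
    """Hypothetical handler: refuses empty uploads and persists only the
    member resource, not the content, for members flagged as drafts."""

    def __init__(self, member_type):
        self.member_type = member_type

    def on_create(self, member, content):
        if not content:
            # Stop processing: nothing is persisted, the message reaches the client.
            raise ResourceOperationError("empty request body")
        if getattr(member, "is_draft", False):
            # Persist the member resource only, dropping the content.
            return member, None
        return member, content

    def on_update(self, existing_member, new_member, new_content):
        # One possible policy: validate the new member exactly like a creation.
        return self.on_create(new_member, new_content)
```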
In the demo, note that the on_create method returns the member and a list of tuples as content. This is because the demo uses the tarfile storage to persist the content, and that storage expects such a list. It shows that you can return any kind of content from those handlers: as long as your storage supports it, amplee will not attempt to interpret it any further.
Note as well that whenever you want to access the Atom entry associated with a member resource, you use member.atom, which is an instance of bridge.Element from the bridge XML library. You may therefore need to learn its API first; fortunately, it is fairly simple.
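The exact tuple layout the demo's tarfile storage expects is not described in this tutorial. Assuming content is a list of (name, bytes) pairs, the sketch below shows how such a list maps onto a tarball with the standard tarfile module; write_tarball is a hypothetical helper, not the demo's storage.

```python
import io
import tarfile
import time


def write_tarball(items):
    """Pack a list of (member_name, bytes) tuples into an in-memory tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in items:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)          # tarfile needs the size up front
            info.mtime = int(time.time())
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```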
The member handler interface
In addition to letting you hook your own code into request processing, amplee lets you modify how the member resources themselves are created and edited, via the amplee.atompub.member package.
This is useful if you want to run extra code when the member resource, i.e. the Atom entry, is created or updated. The base class is amplee.atompub.member.MemberResource, which defines a whole set of methods to manipulate a member resource. Only four of them need to be reimplemented to suit your requirements. (Note that although the documentation of create and update says only a string is passed as the source, a file object may be passed as well.)
amplee comes with two built-in implementations of this base class: amplee.atompub.member.atom.EntryResource handles Atom entries, and amplee.atompub.member.generic.MediaResource handles any other media-type. These two classes construct the Atom entry associated with the content provided and set the entry attribute of the member resource.
For a good example, look once again at the demo source code of amplee. Its member class handles multipart/form-data by parsing the request body and transforming it into a dictionary, then uses each value to generate the Atom entry.
Note that the create and update methods return the parsed dictionary as content to be passed on to the HTTP handlers. This is useful: those handlers do not have to parse the body again, and the dictionary can be modified at that stage before continuing through the rest of request processing.
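The demo's own parsing code is not reproduced here. As a hedged sketch of the same idea using only the standard library (the old cgi module was removed in Python 3.13), parse_form_data below turns a multipart/form-data body into a dictionary of field values, which a create or update implementation could then use to build the Atom entry. parse_form_data is a hypothetical helper; uploaded files and repeated field names are not handled.

```python
from email.parser import BytesParser
from email.policy import default


def parse_form_data(body, content_type):
    """Parse a multipart/form-data body into a {field_name: str} dictionary."""
    # The email parser needs the Content-Type header (with its boundary)
    # in front of the body to recognize the multipart structure.
    head = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=default).parsebytes(head + body)
    fields = {}
    for part in message.iter_parts():
        name = part["Content-Disposition"].params.get("name")
        payload = part.get_payload(decode=True)
        # Fields without an explicit charset are assumed to be UTF-8.
        fields[name] = payload.decode(part.get_content_charset("utf-8"))
    return fields
```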
Appendix A - Feed handling
Collection feed
AtomPub defines the collection as a regular Atom feed whose entries are member resources, i.e. Atom entries containing atom:link elements whose rel attribute is set to "edit" or "edit-media". In amplee you can generate an Atom feed as follows:
collection.feed
This will compute and return a feed containing the entire set of member resources belonging to that collection.
However, because this operation can be costly, amplee provides a cache for this feed:
# Instance of amplee.atompub.collection.FeedHandler
collection.feed_handler
# Get an instance of the feed
collection.feed_handler.retrieve()
# Set the collection feed to a new version of the feed
collection.feed_handler.set(feed)
# Set an XSLT resource as a processing instruction
collection.feed_handler.set_collection_xslt(xslt_uri)
# Get the feed as an XML string
collection.feed_handler.collection_xml()
If generating the Atom feed is too costly, you could run a separate thread that periodically updates the collection feed cache like this:
collection.feed_handler.set(collection.feed)
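A minimal sketch of that periodic refresh, using a threading.Event so the loop can be stopped cleanly. CachedFeed is a hypothetical stand-in that mirrors the retrieve() and set() calls above; with amplee, the body of refresh() would be the collection.feed_handler.set(collection.feed) call.

```python
import threading


class CachedFeed(object):
    """Hypothetical stand-in for a feed cache with retrieve() and set()."""

    def __init__(self, compute_feed):
        self._compute_feed = compute_feed  # expensive: builds the full feed
        self._lock = threading.Lock()
        self._feed = None

    def retrieve(self):
        with self._lock:
            return self._feed

    def set(self, feed):
        with self._lock:
            self._feed = feed

    def refresh(self):
        # Compute outside the lock so readers keep getting the previous feed.
        self.set(self._compute_feed())


def start_refresher(cache, interval, stop_event):
    """Refresh the cache every `interval` seconds until stop_event is set."""
    def loop():
        while not stop_event.is_set():
            cache.refresh()
            stop_event.wait(interval)
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```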
Public feed
In addition to a collection feed, amplee handles a public feed. A public feed is the version of the collection feed that is suitable for public aggregation. The entries are not the member resources but their public representation.
This is handy when you want your application to have a public face without using the collection feed for aggregation.
# Accessing the public feed
collection.feed_handler.public_feed

# Initializing the public feed from the collection.
# The entry_processor is a callable that takes a member instance and must return
# an entry instance, allowing the member resource to be turned into its public
# representation. It can also return None to keep the entry out of the feed.
# The post_processor takes the feed instance and returns the same or a new feed
# instance, allowing for post-processing of the feed.
collection.feed_handler.init_public(collection, entry_processor=None,
                                    post_processor=None)

# Manipulating the public feed
collection.feed_handler.public_feed.add(entry)
collection.feed_handler.public_feed.remove(entry)
collection.feed_handler.public_feed.replace(entry)

# Setting an XSLT resource as a processing instruction
collection.feed_handler.set_public_xslt(xslt_uri)

# Get the feed as an XML string
collection.feed_handler.public_xml()
For an example of a function that transforms a member resource into a public entry, look at the source code of the demo.
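Here is a sketch of an entry_processor-style function. Since bridge's API is not covered in this tutorial, it uses the standard library's xml.etree.ElementTree instead and takes the entry element directly rather than a member (with amplee you would start from member.atom). The AtomPub namespace and the app:control/app:draft element come from RFC 5023; public_entry is a hypothetical name, and skipping drafts is just one example of returning None.

```python
import copy
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
APP = "http://www.w3.org/2007/app"
PRIVATE_RELS = {"edit", "edit-media"}


def public_entry(entry):
    """Return a public copy of an Atom entry, or None to leave it out of the feed."""
    draft = entry.find("{%s}control/{%s}draft" % (APP, APP))
    if draft is not None and (draft.text or "").strip() == "yes":
        return None  # drafts never reach the public feed
    public = copy.deepcopy(entry)
    # Drop the AtomPub editing links and the app:control element.
    for link in public.findall("{%s}link" % ATOM):
        if link.get("rel") in PRIVATE_RELS:
            public.remove(link)
    for control in public.findall("{%s}control" % APP):
        public.remove(control)
    return public
```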
Appendix B - Caching
amplee provides two distinct mechanisms to cache the resources it handles.
Internal simple cache
This mechanism should only be used for very small websites with little traffic and little content. Each collection holds a set of members in memory. When a member is requested, the collection looks in the cache and returns it from there if present, or loads it from the storage otherwise. Each member is weighted so that the most frequently requested members stay in the cache.
To enable this cache, add the following entry to any of your collection sections in the config file:
enable_cache = True # or False to disable it
The cache is enabled by default, but you should disable it if you plan to use the memcached solution described next.
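The tutorial does not detail amplee's weighting scheme. As an illustration of the general idea only, here is a small frequency-weighted cache: each hit increases an entry's weight, and when the cache is full the least-requested entry is evicted. FrequencyCache and its load callback are hypothetical, not amplee's implementation.

```python
class FrequencyCache(object):
    """Keep at most `capacity` (>= 1) items, evicting the least-requested one."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load   # fallback that fetches an item from the storage
        self.items = {}
        self.hits = {}

    def get(self, key):
        if key in self.items:
            self.hits[key] += 1
            return self.items[key]
        value = self.load(key)
        if len(self.items) >= self.capacity:
            # Evict the entry with the lowest request count.
            coldest = min(self.hits, key=self.hits.get)
            del self.items[coldest]
            del self.hits[coldest]
        self.items[key] = value
        self.hits[key] = 1
        return value
```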
Memcached
amplee fully supports memcached via the cmemcache or python-memcache packages. It does so with a memcache storage that acts as a proxy between the store and the underlying storage. Because this storage has the same API as any other, the store works with it exactly as usual, while the memcache storage takes care of delegating each operation to the underlying storage.
This means that when you request a resource that is present on one of the memcached servers, it is returned from there rather than loaded from the underlying physical storage.
To enable this feature, add the following lines to the store section of your config file:
# Comma-separated list of server IPs and their ports
member_storage_memcached = 127.0.0.1:11211,192.168.1.45:7878
media_storage_memcached = 192.168.1.46:4678
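The proxy arrangement described in this section can be sketched as follows. The storage interface (get, put, remove) and the dict-backed cache are assumptions made for illustration, not amplee's storage API; a memcached client wrapped to behave like a dictionary would take the place of the dict.

```python
class CachingStorageProxy(object):
    """Same interface as the wrapped storage; reads go through a cache first."""

    def __init__(self, storage, cache):
        self.storage = storage  # underlying physical storage
        self.cache = cache      # anything dict-like, e.g. a memcached wrapper

    def get(self, key):
        try:
            return self.cache[key]
        except KeyError:
            value = self.storage.get(key)
            self.cache[key] = value
            return value

    def put(self, key, value):
        self.storage.put(key, value)   # write through to the storage
        self.cache[key] = value

    def remove(self, key):
        self.storage.remove(key)
        self.cache.pop(key, None)


class DictStorage(object):
    """Toy in-memory storage used to exercise the proxy."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value

    def remove(self, key):
        del self.data[key]
```

Because the proxy exposes the same methods as the storage it wraps, code using it does not need to know whether a cache is present.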
Appendix C - Indexing
amplee comes with its own dedicated indexing system. It is not meant to compete with a solution like Lucene, but it nonetheless provides an interesting set of features geared towards AtomPub.
Setting up an indexer
import os
from amplee.indexer import *

base_dir = os.getcwd()

def setup_indexer():
    ind = Indexer()
    container = ShelveContainer(os.path.join(base_dir, 'index.p'))
    #container = MemcacheContainer(['127.0.0.1:7878'])
    #container = MemoryContainer()
    aid = AtomIDIndexer(name="entry_id", container=container)
    cid = CategoryIndex(name="category_index", container=container)
    ind.register(aid)
    ind.register(cid)
    return ind
The basic idea is to first create an Indexer instance that holds a set of indexing mechanisms, such as per Atom id, per Atom category, per date, etc. Each of these indexes is bound to a container. amplee provides three container mechanisms (in memory, shelve and memcached) to choose from depending on your needs.
Once the indexer is created and the indexes are registered, you can bind the indexer to the collection:
collection.add_indexer(ind)
Indexing
Each index mechanism expects a member instance and indexes it under the following key:
(collection_name, member_id)
Members are automatically indexed by the collection when they are persisted and when they are bulk loaded. As of version 0.5.1, they are no longer indexed on each access.
If you need to index a member manually, you can simply do:
member.index()
which will go through all the registered indexers of the collection and apply them all.
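To make the indexing flow concrete, here is a self-contained sketch of an indexer holding named indexes, each mapping a category term to a set of (collection_name, member_id) keys, with a lookup(term=...) call shaped like the one in the querying section. SimpleIndexer, CategoryTermIndex, Member and the member attributes (collection_name, member_id, categories) are hypothetical; amplee's own indexes also persist their data to a container.

```python
from collections import defaultdict


class CategoryTermIndex(object):
    """Map each category term to the set of (collection_name, member_id) keys."""

    def __init__(self, name):
        self.name = name
        self.terms = defaultdict(set)

    def index(self, member):
        key = (member.collection_name, member.member_id)
        for term in member.categories:
            self.terms[term].add(key)

    def lookup(self, term):
        # Return a copy so callers can combine results with set operators safely.
        return set(self.terms.get(term, ()))


class SimpleIndexer(object):
    def __init__(self):
        self.indexes = {}

    def register(self, index):
        self.indexes[index.name] = index

    def index(self, member):
        # Apply every registered index to the member.
        for index in self.indexes.values():
            index.index(member)


class Member(object):
    def __init__(self, collection_name, member_id, categories):
        self.collection_name = collection_name
        self.member_id = member_id
        self.categories = categories
```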
Querying an index
Once members are indexed you may query for them like this:
s = ind.indexes['category_index'].lookup(term=token)
The idea is to retrieve the index instance for a particular query, here the Atom category index, and call the appropriate method, here lookup. This returns a set, either empty or containing the (collection_name, member_id) tuples that matched the query. Since it is a Python set, you can apply all the set operators and methods to it, for instance to filter the results of several queries.
Once you have retrieved those tuples, you can load the related members as follows:
# Transform the set of (collection_name, member_id) into a dict of
# {collection_name: [member_ids,]}
items = indexer.to_dict(s)
members = []
for name in items:
    c = service.get_collection(name)
    members.extend(c.reload_members_from_list(items[name]))
The members list then holds the matching members from every collection.
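Because lookups return plain sets of (collection_name, member_id) tuples, combining queries is ordinary set arithmetic, and grouping the result by collection is what the to_dict call above is described as doing. The sketch below shows both; group_by_collection is a hypothetical re-implementation for illustration, not amplee's to_dict.

```python
from collections import defaultdict


def group_by_collection(keys):
    """Turn {(collection_name, member_id), ...} into {collection_name: [member_ids]}."""
    grouped = defaultdict(list)
    for collection_name, member_id in sorted(keys):
        grouped[collection_name].append(member_id)
    return dict(grouped)


summer = {("salads", "nicoise"), ("soups", "gazpacho")}
fish = {("salads", "nicoise"), ("mains", "sole")}

both = summer & fish      # members tagged with both categories
either = summer | fish    # members tagged with at least one of them
not_fish = summer - fish  # summer members without the fish category
```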
You can also generate an Atom feed (as a bridge.Document instance) directly from these items like this:
from amplee.comparer import app_updated_comparer

feed = service.make_feed(items, entry_processor=transform_member_resource,
                         title=u"Search Result", xslt_path=u"/static/search.xsl",
                         member_comparer=app_updated_comparer)
# Return the feed as a string
feed.xml()
The entry_processor is a callable that takes a member and returns an Atom entry suitable for the feed. The member_comparer argument takes a callable that accepts either member instances or entry instances (bridge.Element) and compares some of their fields so that the feed is sorted appropriately.
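As a sketch of what such a comparer does, here is a cmp-style function that orders Atom entries newest first by atom:updated, plugged into sorted() via functools.cmp_to_key (Python 3's sorted() has no cmp argument). It works on xml.etree.ElementTree elements rather than bridge elements, and updated_comparer is a hypothetical name; amplee's app_updated_comparer may compare a different field.

```python
import functools
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"


def updated(entry):
    """Parse an entry's atom:updated value (RFC 3339) into an aware datetime."""
    text = entry.findtext(ATOM + "updated").strip()
    # fromisoformat only accepts "Z" from Python 3.11 on, so normalize it first.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def updated_comparer(a, b):
    """cmp-style comparer: negative when a should come first (newer first)."""
    ua, ub = updated(a), updated(b)
    return (ub > ua) - (ub < ua)


def sort_entries(entries):
    return sorted(entries, key=functools.cmp_to_key(updated_comparer))
```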
Appendix D - Searching
amplee does not provide search facilities. However, using the index mechanism described in Appendix C, you can quite easily build your own search mechanism, as the demo source code shows.
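A minimal version of such a search, assuming an inverted index that maps lowercase terms to sets of (collection_name, member_id) keys: the query is split into words and the per-word sets are intersected, so only members matching every word are returned. search, tokenize and the index layout are hypothetical; they are not the demo's code.

```python
import re


def tokenize(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"\w+", text.lower())


def search(index, query):
    """Return the keys matching every word of the query (an AND search)."""
    results = None
    for token in tokenize(query):
        matches = index.get(token, set())
        results = set(matches) if results is None else results & matches
        if not results:
            break  # no point intersecting any further
    return results or set()
```

An OR search would use the union operator instead; the same sets could come from lookups on amplee's indexes.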
Summary
In this tutorial I have presented the broad outline of amplee. There is more to it, of course, but hopefully this helps you understand its design and good ways of using it. Please send me any feedback you might have, and I'll do my best to help you out.
