December 2012

Our previous post “SWORDv2 Compliance: what is it and why is it good?” introduced some reasons to make your scholarly systems compliant with SWORDv2, and what that really means. This post covers an approach to achieving SWORDv2 compliance for your particular use case(s), using DataStage and DataBank as examples.

First, if you are implementing a SWORD v2 solution, it’s worth having a passing acquaintance with the specification, at least so you are aware of the standard protocol operations.

An approach that we’ve found works well for fitting SWORD v2 to your workflow is as follows:

1. Diagram your deposit workflow

Draw a diagram of your deposit workflow, with all the systems and interactions required. Don’t mention SWORD v2 anywhere at this point; let’s make sure that it meets your workflow requirements, not that you fit your workflow to it.

For example, here’s a basic diagram showing how the DataStage to DataBank deposit looks:

2. Re-draw the diagram around SWORDv2

Re-draw the diagram in the following form:

By referring to the spec throughout, you can work out which SWORD v2 operations, with which HTTP headers and what deposit content, your workflow requires. For example, the diagram for DataFlow, which integrates DataStage and DataBank, looks basically like this:
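As an illustration of what the re-drawn diagram gives you, the sketch below records each workflow step as a SWORD v2 operation: an HTTP method, the SWORD IRI it targets, the headers, and the kind of content sent. The three-step create/add/complete sequence, the step descriptions, the filename and the `describe` helper are hypothetical, not DataFlow's actual design; the `In-Progress`, `Packaging` and `Content-Disposition` headers come from the SWORD v2 profile, but check them against the spec for your own workflow.

```python
from dataclasses import dataclass, field


@dataclass
class SwordStep:
    """One step of a deposit workflow, expressed as a SWORD v2 operation."""
    description: str
    method: str                      # HTTP method
    target: str                      # which SWORD IRI the request goes to
    headers: dict = field(default_factory=dict)
    body: str = ""                   # what kind of deposit content is sent


# Hypothetical workflow: create a record, add a package, then complete it.
PLAN = [
    SwordStep("Create dataset record with metadata", "POST", "Col-IRI",
              {"Content-Type": "application/atom+xml;type=entry",
               "In-Progress": "true"},
              "Atom Entry with Dublin Core metadata"),
    SwordStep("Add the dataset package", "POST", "EM-IRI",
              {"Content-Type": "application/zip",
               "Content-Disposition": "attachment; filename=dataset.zip",
               "Packaging": "http://dataflow.ox.ac.uk/package/DataBankBagIt"},
              "zipped BagIt package"),
    SwordStep("Complete the deposit", "POST", "SE-IRI",
              {"In-Progress": "false"},
              "empty body"),
]


def describe(plan):
    """Render the plan as one line per step, in execution order."""
    return [f"{i}. {s.method} {s.target}: {s.description}"
            for i, s in enumerate(plan, start=1)]


for line in describe(PLAN):
    print(line)
```

Keeping the plan as data rather than code makes it easy to compare against the diagram and against the operations offered by whichever client library you pick.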

3. Utilise the pre-existing code libraries

Take one of the code libraries (which exist in Python, Ruby, Java and PHP for the client side, and in Java and Python for the server side), and find the methods which implement the SWORD operations that your diagram says you need.

For example, consider some (simplified) Python code required by DataStage to deposit to DataBank:

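```python
# uses the sword2 Python client library
from sword2 import Connection, Entry

# service_document_url and the dataset_* values are assumed to be
# defined elsewhere by the application

# create a Connection object
conn = Connection(service_document_url)

# obtain a silo to deposit to (this just takes the first collection
# in the first workspace of the service document)
conn.get_service_document()
silo = conn.sd.workspaces[0][1][0]

# construct an Atom Entry document containing the metadata
e = Entry(id=dataset_identifier, title=dataset_title,
          dcterms_abstract=dataset_description)

# issue a create request; the result is the server's deposit receipt
receipt = conn.create(col_iri=silo.href, metadata_entry=e)
```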

It should then be a relatively straightforward task for a programmer to integrate this code into your application workflow.
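To see what a create call of this kind puts on the wire, the sketch below builds (but does not send) the equivalent HTTP request using only the Python standard library: a POST of an Atom Entry to the collection's Col-IRI. The collection URL, credentials and metadata values are hypothetical placeholders, the use of HTTP Basic authentication is an assumption about the server, and a real client library may send additional headers or entry fields (a fully valid Atom entry would also carry an author).

```python
import base64
import urllib.request
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def build_create_request(col_iri, identifier, title, abstract,
                         username, password, in_progress=True):
    """Build (without sending) a SWORD v2 create request: POST an Atom Entry to a Col-IRI."""
    updated = datetime.now(timezone.utc).isoformat()
    entry = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<entry xmlns="http://www.w3.org/2005/Atom" '
        'xmlns:dcterms="http://purl.org/dc/terms/">'
        f"<id>{escape(identifier)}</id>"
        f"<title>{escape(title)}</title>"
        f"<updated>{updated}</updated>"
        f"<dcterms:abstract>{escape(abstract)}</dcterms:abstract>"
        "</entry>"
    ).encode("utf-8")
    # HTTP Basic authentication (an assumption about the server)
    token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
    headers = {
        "Content-Type": "application/atom+xml;type=entry",
        "In-Progress": "true" if in_progress else "false",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(col_iri, data=entry, headers=headers,
                                  method="POST")


# Hypothetical values; the request is only constructed, never sent.
req = build_create_request("https://databank.example.org/swordv2/silo/demo",
                           "urn:example:dataset-1", "Example dataset",
                           "A hypothetical dataset description.",
                           "user", "secret")
print(req.get_method(), req.full_url)
print(req.get_header("Content-type"))
```

Note that `urllib.request.Request` normalises header names with `str.capitalize()`, so `Content-Type` is read back as `Content-type`.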

(This is the second part of a two-part article about SWORDv2 compliance. You can read the first part here: SWORDv2 Compliance: what is it and why is it good?)

What does it mean to be compliant, and why is this a good thing? Where are the edges of SWORD v2 (what is part of the standard, what isn’t)? This blog post aims to address these questions.

What does it mean to be compliant, and why is this a good thing?

A compelling reason to comply with SWORD v2 in your environment is that more and more sector-wide infrastructure is already being built on it, and you will be able to take advantage of that infrastructure easily. SWORD v2 allows content to be moved easily between scholarly systems. For example, Jisc, via the UKRepNet+ project, are working with EuroPMC and Nature Publishing Group to enable deposit from their publisher systems directly into digital repositories. It is likely that more cross-organisational bodies will use SWORD v2 for parts of their infrastructure in the future, and it will be valuable for repositories (and other systems) to be prepared.

From a technical point of view, compliance just means having implemented the parts of SWORD v2 that the specification marks as required. Because the specification has a lot of optional components, it is relatively easy for a software implementation to be compliant.

You can also comply with the specification while extending it for your own needs. Since SWORD v2 is a profile of AtomPub, it can take advantage of all of the extension mechanisms that AtomPub provides. This means that, for example, custom metadata (beyond the Dublin Core explicit in the specification) can be embedded into deposit requests.
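A minimal sketch of that extension mechanism: the Atom Entry carries the usual Dublin Core terms, plus elements in a namespace of your own. The `http://example.org/ns/dataset#` namespace, its `ds` prefix and the `instrument`/`embargoUntil` fields are hypothetical; the Atom and DC Terms namespaces are the standard ones. Atom (RFC 4287) requires processors not to stop or signal an error on foreign markup in allowed locations, which is what lets a server that doesn't know your namespace still accept the entry.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
DCTERMS = "http://purl.org/dc/terms/"
CUSTOM = "http://example.org/ns/dataset#"   # hypothetical local namespace

ET.register_namespace("", ATOM)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("ds", CUSTOM)


def build_entry(title, abstract, extra):
    """Return an Atom Entry with Dublin Core plus custom elements taken from `extra`."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{DCTERMS}}}abstract").text = abstract
    # Foreign-namespace elements are the extension point: a server that
    # does not understand them can pass over them without failing.
    for name, value in extra.items():
        ET.SubElement(entry, f"{{{CUSTOM}}}{name}").text = value
    return ET.tostring(entry, encoding="unicode")


entry_xml = build_entry("Example dataset", "A hypothetical description.",
                        {"instrument": "microscope-01",
                         "embargoUntil": "2013-06-01"})
print(entry_xml)
```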

A more general benefit of being compliant is that any compliant client/server pair can understand each other. This doesn’t strictly mean that one will be able to deposit to the other, since a successful deposit implies a common understanding of deposit package formats and metadata, which SWORD v2 is not designed to deal with. What it means instead is that they can mutually determine whether deposit can take place, and carry it out if so.
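One way a client determines whether deposit can take place is to read the server's service document and check which package formats each collection advertises. The sketch below does this with the standard library. The sample document, its URLs and the helper names are hypothetical; the AtomPub, Atom and SWORD namespaces and the `sword:acceptPackaging` element are taken from the SWORD v2 profile.

```python
import xml.etree.ElementTree as ET

NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
    "sword": "http://purl.org/net/sword/terms/",
}


def accepted_packaging(service_doc_xml):
    """Map each collection href to the package formats it advertises."""
    root = ET.fromstring(service_doc_xml)
    result = {}
    for coll in root.iterfind(".//app:collection", NS):
        result[coll.get("href")] = [
            (p.text or "").strip()
            for p in coll.findall("sword:acceptPackaging", NS)
        ]
    return result


def can_deposit(service_doc_xml, col_href, packaging):
    """True if the named collection advertises the given package format."""
    return packaging in accepted_packaging(service_doc_xml).get(col_href, [])


# Hypothetical service document for illustration.
SAMPLE = """<?xml version="1.0"?>
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom"
         xmlns:sword="http://purl.org/net/sword/terms/">
  <sword:version>2.0</sword:version>
  <workspace>
    <atom:title>Example workspace</atom:title>
    <collection href="https://repo.example.org/swordv2/collection/1">
      <atom:title>Datasets</atom:title>
      <accept>*/*</accept>
      <sword:acceptPackaging>http://purl.org/net/sword/package/SimpleZip</sword:acceptPackaging>
      <sword:acceptPackaging>http://dataflow.ox.ac.uk/package/DataBankBagIt</sword:acceptPackaging>
    </collection>
  </workspace>
</service>"""

print(accepted_packaging(SAMPLE))
```

A check like this only confirms that the two sides share a format identifier; whether the server can actually make sense of the package contents is still outside SWORD v2.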

Another benefit of complying with SWORD v2 is a reduction in the overall effort of supporting deposit. Even though there may be work to do in your client or server environment to support the package and metadata formats you need, the transport layer itself is well defined and well supported by software libraries and repository systems. As a result, the overall implementation effort for a deposit workflow is significantly reduced.

Where are the edges of SWORD v2?

There are three key principles to bear in mind regarding what SWORD v2 is:

SWORD v2 is about TRANSPORT – it is for getting content from one place to another, but is agnostic to what that content is (it does not care what the package format or metadata format is)
SWORD v2 is about ACTIONS – it defines operations that you can do to content on a server (e.g. create, retrieve, update, delete), which can be mapped on to deposit use cases
SWORD v2 is a SPECIFICATION – it defines ways to carry out deposit actions over HTTP using the semantics of AtomPub, and provides some extensions. It is not, in any way, a piece of software, although there are software implementations of SWORD v2 just as there are software implementations of AtomPub.
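The ACTIONS principle can be pictured as two small lookup tables: deposit use cases map onto abstract operations, and those operations map onto HTTP methods in the AtomPub style. This is a simplified sketch; the use-case names are hypothetical, and SWORD v2 also uses POST on existing items (for example to add files or to complete an in-progress deposit), which a one-method-per-action table does not capture.

```python
from enum import Enum


class Action(Enum):
    CREATE = "create"
    RETRIEVE = "retrieve"
    UPDATE = "update"
    DELETE = "delete"


# Abstract operations onto HTTP methods, as in AtomPub.
HTTP_METHOD = {
    Action.CREATE: "POST",     # POST to a collection (Col-IRI) creates a new item
    Action.RETRIEVE: "GET",    # GET an item's Edit-IRI (metadata) or EM-IRI (content)
    Action.UPDATE: "PUT",      # PUT replaces metadata (Edit-IRI) or files (EM-IRI)
    Action.DELETE: "DELETE",   # DELETE removes the item (Edit-IRI) or its files (EM-IRI)
}

# Hypothetical deposit use cases mapped onto the abstract operations.
USE_CASES = {
    "deposit a new dataset": Action.CREATE,
    "fetch the deposit receipt": Action.RETRIEVE,
    "replace the metadata": Action.UPDATE,
    "withdraw a deposit": Action.DELETE,
}


def method_for(use_case):
    """Return the HTTP method a client would use for a given use case."""
    return HTTP_METHOD[USE_CASES[use_case]]


for case in USE_CASES:
    print(f"{case}: {method_for(case)}")
```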

SWORD v2 does, however, help with some content-related issues. First, it lets you give your package formats identifiers, so that a client and server can tell when they support the same format. The definition of those identifiers is not part of the standard, though: you can just make them up yourself!

For example, the BagIt format used by DataStage and DataBank is identified by:

    http://dataflow.ox.ac.uk/package/DataBankBagIt

This allows both sides of the deposit to know that they are dealing with a BagIt file in the particular format that they both understand.

Second, it does specify a minimum metadata format (Dublin Core) and package format (a simple ZIP file), so that any client/server pair should always be able to transfer binary content, even if they can’t exchange more structured packages or metadata.
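Putting the two together, a client can prefer the shared custom identifier and fall back to the SimpleZip baseline when the server doesn't list it. In the sketch below, the DataBank BagIt identifier is the one quoted above and `http://purl.org/net/sword/package/SimpleZip` is the SWORD v2 SimpleZip identifier; the EM-IRI, the filename, the placeholder bytes and the helper names are hypothetical, and the request is built but never sent.

```python
import urllib.request

BAGIT = "http://dataflow.ox.ac.uk/package/DataBankBagIt"
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def choose_packaging(server_accepts, preferred=BAGIT):
    """Prefer the shared custom format; otherwise fall back to SimpleZip."""
    if preferred in server_accepts:
        return preferred
    if SIMPLE_ZIP in server_accepts:
        # The server will treat the upload as a plain ZIP and will not
        # interpret any BagIt-specific structure inside it.
        return SIMPLE_ZIP
    raise ValueError("no package format in common with the server")


def build_package_request(em_iri, zip_bytes, filename, packaging):
    """Build (without sending) a POST of a zipped package to an EM-IRI."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": packaging,
    }
    return urllib.request.Request(em_iri, data=zip_bytes, headers=headers,
                                  method="POST")


packaging = choose_packaging([SIMPLE_ZIP, BAGIT])
req = build_package_request("https://repo.example.org/swordv2/edit-media/123",
                            b"placeholder zip bytes", "dataset.zip", packaging)
print(req.get_method(), req.get_header("Packaging"))
```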

What if you’re having trouble?

Because SWORD v2 is a specification rather than a piece of software, there are a variety of software implementations in different languages for different purposes, and they all vary somewhat in how they implement the spec. Since much of that software is new, things will sometimes not go as expected. In those cases there are some things that you can do:

Come to sword-app-tech, the general SWORD technical discussion forum. Many implementers from different communities hang out there, and general questions about SWORD can often be answered there.
Go to your community discussion forum. For example, DSpace has its own implementations of SWORD v1 and SWORD v2, and the developers there might be better placed to help you than on sword-app-tech.
Check out the SWORD v2 and AtomPub specs – they will tell you what you can do with the standard, and the common code libraries should be sufficiently general to allow you to extend your use of SWORD v2.

(This is the first part of a two-part article about SWORDv2 compliance. You can read the second part here: SWORDv2 Compliance: how to achieve it)