Over the past year or so the concept and use of linked data seems to have been gaining more and more traction. At CETIS we’ve been skirting around the edges of semantic technologies for some time, trying to explore how the vision might be realised, particularly for the teaching and learning community, and most recently through our semantic technologies working group. Lorna’s blog post from the last meeting of the group summarized some potential activity areas we could be involved in.
The day started with a short presentation from Tom Heath, Talis, who set the scene by giving an overview of the linked data view of the web. He described it as a move away from the document centric view to a more exploratory one – the web of things. These “things” are commonly described, identified and shared. He outlined 10 tasks with potential for linked data and put forward a case for how linked data could enhance each one. Take locating, for example: just now we can find a place, say Aberdeen, but linked data allows us to begin to disambiguate the concept of Aberdeen for our own context(s). Sharing content is another: with a linked data approach we just need to be able to share and link to (persistent) identifiers, and not worry about how we can move content around. According to Tom, the document centric metaphor of the web hides information in documents and limits our imagination in terms of what we could do with that information and how we could use it.
The next presentation was from Tom Scott, BBC, who illustrated some key linked data concepts being exploited by the BBC’s Wildlife Finder website. The site allows people to make their own “wildlife journeys” by exploring the natural world in their own context. It also allows the BBC to, in the nicest possible way, “pimp” their own programme archives. Almost all the data on the site comes from other sources, either within the BBC or on the wider web (e.g. WWF, Wikipedia). As well as using Wikipedia, their editorial team are feeding back into the Wikipedia knowledge base – a virtuous circle of information sharing. That worked well in this instance and subject area, but I have a feeling that it might not always be the case; I know I’ve had my run-ins with Wikipedia editors over content.
They have used DBpedia as a controlled vocabulary. However, as it only provides identifiers and no structure, they have built their own graph to link content and concepts together. There should be RDF available from their site now – it was going live yesterday. Their ontology is available online.
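To make the "shared identifiers plus your own graph" idea a bit more concrete, here is a minimal Python sketch. It treats DBpedia resource URIs as the controlled vocabulary and links them together with triples under a made-up namespace. The `example.org` namespace, the `features` and `livesIn` predicates and the programme URI are all my own illustrative assumptions, and none of this is taken from the BBC's actual ontology.

```python
# Sketch only: DBpedia URIs act as shared identifiers for "things";
# the structure that links them lives in our own (hypothetical) graph.

DBPEDIA = "http://dbpedia.org/resource/"
EX = "http://example.org/wildlife/"  # hypothetical namespace, not the BBC's

# Each triple is (subject, predicate, object), all written as full URIs.
triples = [
    (EX + "programme/1", EX + "features", DBPEDIA + "Lion"),
    (DBPEDIA + "Lion", EX + "livesIn", DBPEDIA + "Savanna"),
    (DBPEDIA + "Cheetah", EX + "livesIn", DBPEDIA + "Savanna"),
]


def to_ntriples(graph):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in graph)


def subjects_of(graph, predicate, obj):
    """Return every subject that is linked to obj by predicate."""
    return [s for s, p, o in graph if p == predicate and o == obj]


if __name__ == "__main__":
    print(to_ntriples(triples))
    # Which things in our graph live in the savanna?
    print(subjects_of(triples, EX + "livesIn", DBPEDIA + "Savanna"))
```

The point of the sketch is that the identifiers are borrowed while the relationships are local: anyone else using the same DBpedia URIs could join their data to this graph without agreeing on anything beyond the identifiers.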
Next we had John Sheridan and Jeni Tennison from data.gov.uk. They very aptly built their presentation around a wild-west pioneer theme. They took us through how they are staking their claim and laying tracks for others to follow, and outlined the civil wars they don’t want to fight. As they pointed out, we’re all pioneers in this area and at an early stage of development/deployment.
The data.gov.uk project wants to:
* develop social capital and improve delivery of public service
* make progress and leave a legacy for the future
* use open standards
* look at approaches to publishing data in a distributed way
Like most people (and from my perspective, the teaching and learning community in particular) they are looking for, to continue with the western theme, the “Winchester ’73” for linked data. Just now they are investigating creating (simple) design patterns for linked data publishing to see what can be easily reproduced. I really liked their “brutally pragmatic and practical” approach, particularly in terms of developing simple patterns which can be re-tooled in order to allow the “rich seams” of government data to be used, e.g. tools to create linked data from Excel. Provenance and trust are recognised as being critical and they are working with the W3C provenance group. Jeni also pointed out that data needs to be easy to query and process – we all neglect usability of data at our peril. There was quite a bit of discussion about trust, and John emphasised that the data.gov.uk initiative was about public and not personal data.
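As a rough illustration of what a simple, reproducible pattern for spreadsheet data might look like, here is a small Python sketch that turns rows into N-Triples. It assumes the sheet has been exported from Excel as CSV, mints one subject URI per row from an identifier column, and uses one predicate per column heading. The `example.org` base URI, the column names and the sample row are invented for the example; this is not the data.gov.uk tooling.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base URI for minted identifiers


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(csv_text, id_column):
    """One subject per row, one predicate per column, cell values as literals."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}id/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip surplus cells, the id itself (already in the URI) and blanks.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}def/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Invented sample data standing in for a spreadsheet export.
SAMPLE = "code,name,region\nS12,Aberdeen City,Scotland\n"

if __name__ == "__main__":
    for line in rows_to_ntriples(SAMPLE, "code"):
        print(line)
```

Everything here comes out as a plain string literal. A real pattern would also need to decide which cells should become links to other resources and which need datatypes, and that is exactly the modelling work that is harder than the publishing.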
Lin Clark then gave an overview of the RDF capabilities of the Drupal content management system. For example, it has default RDF settings and FOAF capability built in. The latest version now has an RDF mapping user interface, and it can be set up to offer SPARQL endpoints. A nice example of the “out of the box” functionality which is needed for general uptake of linked data principles.
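For anyone who hasn't met a SPARQL endpoint before, the sketch below shows roughly what asking one for FOAF data involves. It builds a query for people and their names and encodes it as an HTTP GET request, which under the SPARQL protocol carries the query text in a `query` parameter. The endpoint URL is a placeholder of mine, not a real Drupal path, and the sketch deliberately stops short of sending the request.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

# List people and their names using the FOAF vocabulary.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name .
}
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Build a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    print(build_query_url(ENDPOINT, QUERY))
```

Fetching that URL from a live endpoint would return the matching results, in whatever result format the server offers.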
The morning finished with a panel session where some of the key issues raised through the morning presentations were discussed in a bit more depth. In terms of technical barriers, Ian Davies (CEO, Talis) said that there needs to be a mind shift for application development from one centralised database to one where multiple apps access multiple data stores. But as Tom Scott pointed out, if you start with things people care about and create URIs for them, then a linked approach is much more intuitive; it is “insanely easy to convert HTML into RDF”. It was generally agreed that identifying real world “things”, and modelling and linking the data, was the really hard bit. After that, publishing is relatively straightforward.
The afternoon consisted of a number of themed workshops which were mainly discussions around the issues people are grappling with just now. I think for me the human/cultural issues are crucial, particularly provenance and trust. If linked data is to gain more traction in any kind of organisation, we need to foster a “good data in, good data out” philosophy and move away from the fear of exposing data. We also need to ensure that people understand that taking a linked data approach doesn’t automatically presume that you are going to make that data available outwith your organisation. It can help with internal information sharing/knowledge building too. Of course, what we need are more killer examples or Winchester ’73s. Hopefully over the past couple of days at Dev8 progress will have been made towards those killer apps or at least some lethal bullets.
The meet up was a great opportunity to share experiences with people from a range of sectors about their ideas and approaches to linked data. My colleague Wilbert Kraan has also blogged about his experiments with some of our data about JISC funded projects.
For an overview of the current situation in UK HE, it was timely that Paul Miller’s Linked Data Horizon Scan for JISC was published on Wednesday too.