The Map Is Not The Territory

A blog by Christian Willmes.

New SemanticMediawiki based OSGeo Member Map

| categories: webdev, semantic web, geospatial, osgeo, semantic mediawiki

In this post, I give some background on the new Semantic Mediawiki based OSGeo Members map, which replaced the userMap. I start with the Mediawiki update and the introduction of Semantic Mediawiki, add some words about the history of the userMap and, most importantly, give an overview of the new implementation and of possible additional applications of Semantic Mediawiki in the OSGeo Wiki.

The introduction of Semantic Mediawiki into the OSGeo Wiki

Recently, thanks to an effort by OSGeo SAC (namely Martin Spott), the Mediawiki software underlying the OSGeo Wiki was upgraded from an ancient version (I think it was 1.12) to the current 1.25.3. Additionally, the Semantic Mediawiki (SMW) extension, including Semantic Maps, was installed to enhance the OSGeo Wiki with its features.

SMW is a Mediawiki extension that allows you to structure wiki content (as data) and provides tools for querying, exporting and visualizing this structured data. The Semantic Maps extension adds the capability to visualize SMW content that contains data of the special type "Geographic Coordinate" on maps. SMW even offers an API that allows external applications to query the structured data stored in the wiki and to export data based on queries. SMW is a mature project running on many large Mediawiki installations of well known organizations like NASA, OLPC, The Free Software Directory and semanticweb.org, to name just a few.
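To give an idea of how an external application could talk to that API, here is a minimal Python sketch that only builds the request URL for the SMW "ask" API module. The endpoint, the category and the property name are hypothetical placeholders and not the ones used in the OSGeo Wiki.

```python
from urllib.parse import urlencode


def build_ask_url(api_url, conditions, printouts, limit=50):
    """Build a request URL for the Semantic MediaWiki `ask` API module."""
    # An ask query is the condition, the printouts ("?Property") and
    # optional parameters, all separated by "|".
    query = "|".join(
        [conditions] + ["?" + p for p in printouts] + ["limit=%d" % limit]
    )
    params = {"action": "ask", "query": query, "format": "json"}
    return api_url + "?" + urlencode(params)


# Hypothetical endpoint, category and property, for illustration only.
url = build_ask_url(
    "https://wiki.example.org/api.php",
    "[[Category:Members]]",
    ["Has location"],
    limit=10,
)
print(url)
```

Fetching this URL with any HTTP client should return the query results as JSON, provided the wiki exposes its API publicly.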

The OSGeo Wiki userMap

The original OSGeo Wiki userMap, which I implemented in 2008 during an internship at WhereGroup, is now broken because it depends on the no longer supported Mediawiki extension Simple Forms. The extension implemented a parser hook that allowed the spatial locations of users to be stored in a PostGIS database. This first version of the userMap also implemented parser hooks for including OpenLayers based maps in wiki pages, displaying a user's location as well as a map of all users. The now deprecated documentation is, for now, still available in the wiki, if you want to get an overview.

SMW based OSGeo Members map

The SMW data model was developed using a tool called mobo. Mobo is a command line toolset that helps to build Semantic MediaWiki structures in an agile, model driven engineering (MDE) way. With mobo, an SMW data model can be developed and maintained from a central point in a consistent manner, which improves maintainability, helps to coordinate collaboration and allows the schema to grow to additional applications and scopes over time. The schema is formulated according to the JSON Schema specification, in JSON or YAML notation, in a defined folder structure that follows file naming conventions. This is a bit similar to how some MVC frameworks model the domain of a web application. The documentation of the mobo toolkit, including a tutorial and examples, can be found here.

The development code files of the mobo model are stored and published in a GitHub repository, for community review and allowing anyone to send pull requests for helping to improve the SMW based capabilities of the OSGeo Wiki.

It was even possible to preserve the locations that had been entered through the previous userMap implementation and stored in the mentioned PostGIS table. This was done by exporting the data from the PostGIS table as CSV, applying some Python foo to the CSV (especially to the geometries in WKB notation, using Shapely) and importing the data into the wiki as CSV, using the Mediawiki Data Transfer extension.
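The conversion step can be sketched roughly as follows. This is not the script that was actually used: instead of Shapely it decodes hex encoded WKB points with the standard library's struct module, so that it runs without extra dependencies. The CSV column names and the "User:" page prefix are hypothetical, and the output assumes that the wiki expects coordinates as "latitude, longitude".

```python
import csv
import io
import struct

SRID_FLAG = 0x20000000  # PostGIS EWKB: set when an SRID follows the type field


def point_from_wkb_hex(wkb_hex):
    """Decode a hex encoded 2D point in (E)WKB notation into (x, y)."""
    data = bytes.fromhex(wkb_hex)
    endian = "<" if data[0] == 1 else ">"  # first byte: 1 = little endian
    (geom_type,) = struct.unpack_from(endian + "I", data, 1)
    offset = 5
    if geom_type & SRID_FLAG:
        offset += 4  # skip the 4-byte SRID
    if (geom_type & 0xFF) != 1:
        raise ValueError("only point geometries are handled in this sketch")
    return struct.unpack_from(endian + "2d", data, offset)


def rows_to_csv(rows):
    """Turn (user, wkb_hex) rows into CSV text with 'lat, lon' coordinates."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Title", "Coordinates"])  # hypothetical column names
    for user, wkb_hex in rows:
        lon, lat = point_from_wkb_hex(wkb_hex)  # WKB stores x (lon) first
        writer.writerow(["User:" + user, "%f, %f" % (lat, lon)])
    return out.getvalue()


# Round trip with a synthetic point (x = longitude, y = latitude).
sample = struct.pack("<BIdd", 1, 1, 6.96, 50.94).hex()
print(rows_to_csv([("Example", sample)]))
```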

Conclusion and Outlook

The application of SMW technology in the OSGeo wiki has, with the introduction of the OSGeo Members model, created a valuable directory that gives a nice overview of the OSGeo community. It is possible to extend the model in the future, to a directory of Charter Members, or OSGeo Advocates. This would yield sortable tables and of course maps of these contacts.

It is even possible to develop models for the Service Providers, to replace the current, sometimes hard to maintain Service Provider directory, or for example a model of the Geo4All laboratories to generate a directory and a corresponding map. But one of my favorite possible models would be a model for an Open Geo Data directory in the OSGeo wiki.

All these models and the emerging directories would be collaboratively created and maintained by the OSGeo community, just by editing the wiki. And that is not even to speak of what is possible with the Mediawiki API, which allows querying the structured data and returns the results nicely in JSON format, let alone of enabling the SPARQL endpoint that comes with Semantic Mediawiki.

So, the OSGeo Wiki has a bright future if we want it to. I will do my best to reach this goal.

Have fun!


 


Modelling bibliographic records in Semantic MediaWiki using BibTeX schema and result format

| categories: webdev, semantic web, semantic mediawiki

Disclaimer: This is a rather long post about modelling bibliographic information in Semantic MediaWiki.

Semantic MediaWiki (SMW) can manage bibliographic records and deliver them in BibTeX format by using the Semantic Result Format BibTeX. This is a useful feature if you understand how to implement it in your SMW instance, which is not trivial if you are not already an SMW expert. In this post I try to describe this modelling and implementation process.

OK, the last paragraph and the headline of this post contain several terms that may be new to non SMW experts and need to be clarified first:

  • Bibliographic Record
  • BibTeX Format
  • SMW Semantic Result Formats
  • BibTeX Semantic Result Format

Bibliographic Record

A bibliographic record is an entity that references a specific content item, in most cases an academic publication such as a journal paper. Bibliographic records usually follow a schema or formalism that applies in a given context. For example, the references in an academic publication mostly follow a citation formalism defined by the publisher.

BibTeX Format

The BibTeX Format is a tool to model such citation formalisms. It originates from the LaTeX community, where it is used to handle bibliographic records in LaTeX. There is no official specification of the BibTeX schema (aside from the BibTeX implementation in the LaTeX code base), so in the following we refer to the Wikipedia entry, which defines the schema sufficiently well.
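For readers who have never seen a BibTeX entry, the following small Python sketch renders one record in BibTeX notation. The function name and the cite key are made up for illustration; the field values are taken from one of my own publications. It is only meant to show the shape of the format, not how the BibTeX SRF works internally.

```python
def to_bibtex(entry_type, cite_key, fields):
    """Render one bibliographic record as a BibTeX entry."""
    lines = ["@%s{%s," % (entry_type, cite_key)]
    for name, value in fields.items():
        lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


record = {
    # Multiple authors are separated by "and" in BibTeX.
    "author": "Willmes, C. and Brocks, S. and Hoffmeister, D.",
    "title": "Facilitating integrated spatio-temporal visualization and "
             "analysis of heterogeneous archaeological and "
             "palaeoenvironmental research data",
    "journal": "ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.",
    "year": "2012",
    "pages": "223--228",
}

# "willmes2012" is a hypothetical cite key.
entry = to_bibtex("article", "willmes2012", record)
print(entry)
```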

Semantic Result Format

Semantic Result Formats (SRF) is an SMW extension that renders the results of an SMW #ask query or inline query in a defined format.

BibTeX SRF

The BibTeX SRF renders bibliographic information stored in an SMW instance in BibTeX Format. Here are some demos of the BibTeX SRF.

Modelling BibTeX schema in SMW

In the mentioned Wikipedia article, the BibTeX schema is defined in terms of bibliographic items, which are the basic attributes or properties of the bibliographic entry types or classes.

Bibliographic Items

The bibliographic items are modelled as SMW properties. The BibTeX Wikipedia page defines 26 items, to which we add three more: (1) keyword, to handle the keywords defined for the content of the publication as semantic properties, which has the advantage that you can browse and filter by keyword in the resulting bibliographic database, and (2) DOI and (3) ISBN, two well accepted unique identifier schemes for publications. This gives us the following list of bibliographic items:

  • address: Publisher's address (usually just the city, but can be the full address for lesser-known publishers)
  • annote: An annotation for annotated bibliography styles (not typical)
  • author: The name(s) of the author(s) (in the case of more than one author, separated by and)
  • booktitle: The title of the book, if only part of it is being cited
  • chapter: The chapter number
  • crossref: The key of the cross-referenced entry
  • DOI: Digital Object Identifier (www.doi.org)
  • edition: The edition of a book, long form (such as "First" or "Second")
  • editor: The name(s) of the editor(s)
  • eprint: A specification of an electronic publication, often a preprint or a technical report
  • howpublished: How it was published, if the publishing method is nonstandard
  • institution: The institution that was involved in the publishing, but not necessarily the publisher
  • ISBN: International Standard Book Number
  • journal: The journal or magazine the work was published in
  • key: A hidden field used for specifying or overriding the alphabetical order of entries (when the "author" and "editor" fields are missing). Note that this is very different from the key (mentioned just after this list) that is used to cite or cross-reference the entry.
  • keyword: Keyword(s) to tag/categorize the content of the publication
  • month: The month of publication (or, if unpublished, the month of creation)
  • note: Miscellaneous extra information
  • number: The "(issue) number" of a journal, magazine, or tech-report, if applicable. (Most publications have a "volume", but no "number" field.)
  • organization: The conference sponsor/host
  • pages: Page numbers, separated either by commas or double-hyphens.
  • publisher: The publisher's name
  • school: The school where the thesis was written
  • series: The series of books the book was published in (e.g. "The Hardy Boys" or "Lecture Notes in Computer Science")
  • title: The title of the work
  • type: The field overriding the default type of publication (e.g. "Research Note" for techreport, "{PhD} dissertation" for phdthesis, "Section" for inbook/incollection)
  • url: The WWW address
  • volume: The volume of a journal or multi-volume book
  • year: The year of publication (or, if unpublished, the year of creation)

You are free to extend this list with any item you want or think would be useful, for example a citation item in which you store the complete citation as you would add it to the reference list at the end of a publication. I use the note item for this purpose, but...

Entry Types

The entry types are modelled as SRF classes holding the corresponding properties (bibliographic items) in SMW. According to the Wikipedia BibTeX scheme there are 14 entry types, of which I show the five most used here:

| Entry Type | Description | Required Items | Optional Items |
| --- | --- | --- | --- |
| article | An article from a journal or magazine. | author, title, journal, year | keywords, volume, number, pages, month, DOI, URL, note, key |
| book | A book with an explicit publisher. | author/editor, title, publisher, year | keywords, volume, number, series, address, edition, month, ISBN, URL, note, key |
| inbook | A part of a book, usually untitled. May be a chapter (or section or whatever) and/or a range of pages. | author/editor, title, chapter/pages, publisher, year | keywords, volume/number, series, type, address, edition, month, ISBN, URL, DOI, note, key |
| inproceedings | An article in a conference proceedings. | author, title, booktitle, year | keywords, editor, volume/number, series, pages, address, month, organization, publisher, DOI, URL, ISBN, note, key |
| techreport | A report published by a school or other institution, usually numbered within a series. | author, title, institution, year | keywords, type, number, address, month, DOI, URL, note, key |
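The required items of these five entry types can also be written down as a small data structure, which makes it easy to check whether a record is complete. The following Python sketch is only an illustration of the schema and is not part of SMW; the names REQUIRED_ITEMS and missing_items are made up. Alternatives such as author/editor are modelled as groups of which at least one item must be present.

```python
# Required items per entry type; each tuple is a group of alternatives.
REQUIRED_ITEMS = {
    "article": [("author",), ("title",), ("journal",), ("year",)],
    "book": [("author", "editor"), ("title",), ("publisher",), ("year",)],
    "inbook": [("author", "editor"), ("title",), ("chapter", "pages"),
               ("publisher",), ("year",)],
    "inproceedings": [("author",), ("title",), ("booktitle",), ("year",)],
    "techreport": [("author",), ("title",), ("institution",), ("year",)],
}


def missing_items(entry_type, record):
    """Return the required item groups that have no value in the record."""
    return [
        "/".join(group)
        for group in REQUIRED_ITEMS[entry_type]
        if not any(record.get(item) for item in group)
    ]


# A book with an editor and a title, but no publisher and no year.
print(missing_items("book", {"editor": "A. Editor", "title": "Some Book"}))
```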

Implementation

To implement the data structure in SMW, we use the Semantic Forms extension. Semantic Forms provides GUIs to create and edit structured data in SMW. Basically, it allows users to add, edit and query data in SMW using forms.

The easiest way to implement the bibliographic data model is to use the Semantic Form "Create a Class". This creates all properties, forms, and templates automatically by filling out a form.

Screenshot of the "Create a Class" form, defining the BibBook class.

After filling out the form and clicking on "create", you need to go to Special:SMWAdmin and run "Start updating data". This triggers SMW to create all needed links, so you can find and work with the forms and templates in your wiki.

You can repeat this create class process for each entry type you want to implement. You need to enter all the bibliographic item properties again each time, so that the forms and templates will contain them. Properties that already exist in SMW will not be duplicated, though.

Semantic Forms

Using the "Create a Class" form, SMW automatically created templates and forms to display and edit the data of the corresponding class. The automatically created forms are fine, but with two minor edits you don't have to specify the entry type for each new item, which would be redundant because the entry type is already defined through the class. As an example, we now edit the template Template:BibArticle and the form Form:BibArticle of the BibArticle class to set the entry type automatically.

From the Form, we remove the

! BibType:
| {{{field|BibType}}}
|-
part, which would let the user enter a value for the BibType property, which we do not want in our model. The resulting form definition looks as follows:

Form:BibArticle

<noinclude>
This is the "BibArticle" form.
To create a page with this form, enter the page name below;
if a page with that name already exists, you will be sent to a form to edit that page.

{{#forminput:form=BibArticle}}
</noinclude><includeonly>
<div id="wikiPreview" style="display: none; padding-bottom: 25px; margin-bottom: 25px; border-bottom: 1px solid #AAAAAA;"></div>
{{{for template|BibArticle}}}
{| class="formtable"
! Author(s):
| {{{field|Author(s)}}}
|-
! Title:
| {{{field|Title}}}
|-
! Journal:
| {{{field|Journal}}}
|-
! Year:
| {{{field|Year}}}
|-
! Volume:
| {{{field|Volume}}}
|-
! Number:
| {{{field|Number}}}
|-
! Pages:
| {{{field|Pages}}}
|-
! Date:
| {{{field|Date}}}
|-
! DOI:
| {{{field|DOI}}}
|-
! URL:
| {{{field|URL}}}
|-
! Keyword(s):
| {{{field|Keyword(s)}}}
|-
! Key:
| {{{field|Key}}}
|-
! Note:
| {{{field|Note}}}
|}
{{{end template}}}

'''Free text:'''

{{{standard input|free text|rows=10}}}

{{{standard input|summary}}}

{{{standard input|minor edit}}} {{{standard input|watch}}}

{{{standard input|save}}} {{{standard input|preview}}} {{{standard input|changes}}} {{{standard input|cancel}}}
</includeonly>

In the template we set the BibType property statically, so that every BibArticle is of BibType::Article: we set [[BibType::Article]] as the first entry. Additionally, we set the category "Bibliographic Record" for the entry (last line), because every BibArticle is a bibliographic record. This way you can later query, for example, for all bibliographic records, which yields entries of different types. See the following template definition code:

Template:BibArticle

<noinclude>
This is the "BibArticle" template.
It should be called in the following format:
<pre>
{{BibArticle
|BibType=
|Author(s)=
|Title=
|Journal=
|Year=
|Volume=
|Number=
|Pages=
|Date=
|DOI=
|URL=
|Keyword(s)=
|Note=
|Key=
}}
</pre>
Edit the page to see the template text.
</noinclude><includeonly>{| class="wikitable"
! BibType
| [[BibType::Article]]
|-
! Author(s)
| {{#arraymap:{{{Author(s)|}}}|,|x|[[BibAuthor::x]]}}
|-
! Title
| [[BibTitle::{{{Title|}}}]]
|-
! Journal
| [[BibJournal::{{{Journal|}}}]]
|-
! Year
| [[BibYear::{{{Year|}}}]]
|-
! Volume
| [[BibVolume::{{{Volume|}}}]]
|-
! Number
| [[BibNumber::{{{Number|}}}]]
|-
! Pages
| [[BibPages::{{{Pages|}}}]]
|-
! Date
| [[BibDate::{{{Date|}}}]]
|-
! DOI
| [[BibDOI::{{{DOI|}}}]]
|-
! URL
| [[BibURL::{{{URL|}}}]]
|-
! Keyword(s)
| {{#arraymap:{{{Keyword(s)|}}}|,|x|[[BibKeyword::x]]}}
|-
! Note
| [[BibNote::{{{Note|}}}]]
|-
! Key
| [[BibKey::{{{Key|}}}]]
|}

[[Category:BibArticle]]
[[Category:Bibliographic Record]]
</includeonly>
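The two #arraymap calls in the template deserve a short explanation: they split the comma separated values of Author(s) and Keyword(s) and annotate each value separately, so that every author and every keyword becomes its own property value. The following Python function is a rough analogue of that behavior, written only to illustrate the idea. It is not the parser function's actual implementation, and the default output delimiter of ", " is an assumption.

```python
def arraymap(value, delimiter, var, formula, new_delimiter=", "):
    """Rough Python analogue of the #arraymap parser function."""
    # Split the input, trim whitespace and drop empty items.
    items = [item.strip() for item in value.split(delimiter)]
    # Substitute the variable in the formula once per item.
    return new_delimiter.join(
        formula.replace(var, item) for item in items if item
    )


annotated = arraymap(
    "semantic web, geospatial, osgeo", ",", "x", "[[BibKeyword::x]]"
)
print(annotated)
```

Note that the variable is replaced by plain string substitution, so the formula should not contain the variable name anywhere else.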

Here you can find further examples and the sources of more entry type definitions.

Authoring and editing bibliographic data

All authoring and editing is done through the forms we have created for the entry type classes. You can create new entries as well as edit existing entries using those forms.
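A page saved through such a form contains, in essence, a call to the class template with the entered values. If you ever want to prepare entries outside of the wiki, for example for a bulk import, that wikitext can be generated with a few lines of Python. The helper below is a hypothetical sketch and only uses a subset of the BibArticle fields.

```python
def template_call(template, fields):
    """Build the wikitext of a template call from a dict of field values."""
    lines = ["{{" + template]
    lines += ["|%s=%s" % (name, value) for name, value in fields.items()]
    lines.append("}}")
    return "\n".join(lines)


wikitext = template_call(
    "BibArticle",
    {
        "Journal": "ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.",
        "Year": "2012",
        "Pages": "223--228",
    },
)
print(wikitext)
```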

Screenshot of the form for editing BibArticle entries.

Conclusion

In this post, the implementation of a bibliographic model in SMW was described in detail. You can find this implementation in my SMW instance, where you can look at the details I may have forgotten to mention here.

The actual use of the SMW based bibliographic database will be described in an upcoming blog post. There I will dig into the powerful browsing, filtering and data rendering capabilities of SMW.

I hope this post helps some people to get their heads around the SMW concept, which can be kind of complex... When I first heard of SMW, it was immediately clear to me that this is a very powerful technology which makes a lot of sense. But I had to chew a bit on all of the concepts before it worked for me (after a lot of trial and error)...

Have fun!


 


RDFa content markup

| categories: webdev, semantic web

In this post I document parts of the RDFa markup used on this website. Because I am convinced that Semantic Web technology is the future of the web, I am keen to apply these techniques on my own website. I aimed to follow the best practices recommended in the W3C RDFa Primer, and the following describes how they are implemented on this site. This post does not aim to teach the concept of RDFa (for this, please see the links under Resources), though a short overview is given.

What is RDFa

Resource Description Framework in Attributes (RDFa) is a technique that facilitates adding structured (machine readable) data to HTML using a set of attributes (see Table 1). Sometimes RDFa is also referred to as a CSS for meaning.

| attribute | description |
| --- | --- |
| about | a URI or CURIE specifying the resource the metadata is about |
| content | optional attribute that overrides the content of the element when using the property attribute |
| datatype | optional attribute that specifies the datatype of text specified for use with the property attribute |
| property | specifying a property for the content of an element or the partner resource |
| rel and rev | specifying a relationship and reverse-relationship with another resource, respectively |
| src, href and resource | specifying the partner resource |
| typeof | optional attribute that specifies the RDF type(s) of the subject or the partner resource (the resource that the metadata is about) |

Table 1: The RDFa attributes. (Source) http://en.wikipedia.org/wiki/RDFa

To be able to provide well defined ("standardized") semantics for your content markup, it is favorable to re-use existing vocabularies. This will also increase the interoperability of the data marked up this way. Here you can find a list of officially (W3C) recommended/supported vocabularies for RDFa markup.

Implementation of RDFa within this website

If you want to develop RDFa markup for a website, it is really helpful to use some tools to validate the markup you wrote. A tool which I can recommend is RDFa Developer, which is available as an Add-On for the Firefox web browser. This Add-On provides a window at the bottom of the page displaying warnings and errors for the RDFa markup, much like the famous Firebug Add-On for client side web development. Additionally, it provides a useful view of the RDF triples modelled in the site and also a SPARQL interface to query that data. The other tool I can recommend is of course the official W3C RDFa Validator. In addition to warnings, errors and "informational messages", the validator states whether the validated document is a valid RDFa resource or not.

The vocabularies used to markup content within the site are defined as XML Namespaces (xmlns) within the <html> tag.

<html  xmlns="http://www.w3.org/1999/xhtml"
xmlns:xhv="http://www.w3.org/1999/xhtml/vocab#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:bibo="http://purl.org/ontology/bibo/"
version="XHTML+RDFa 1.0" xml:lang="en">

Listing 1: Vocabularies included in the <html>-tag.

This part shows the vocabularies included for RDFa markup on this website. For standard XHTML properties used in property attributes, I assigned the prefix "xhv". This is normally not necessary, but I find it increases the clarity of the semantics, i.e. it clearly states the vocabulary in which the property is defined.
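What the prefixes do can be illustrated with a few lines of Python: a prefixed name (CURIE) such as dc:title is simply shorthand for the full URI, which is obtained by concatenating the namespace URI and the part after the colon. The function name is made up, and the sketch only covers the dc and bibo prefixes declared above.

```python
# Prefix declarations as in the <html> tag of this site.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "bibo": "http://purl.org/ontology/bibo/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a prefixed name like 'dc:title' to its full URI."""
    prefix, _, reference = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError("unknown prefix: %s" % prefix)
    return prefixes[prefix] + reference


print(expand_curie("dc:title"))
print(expand_curie("bibo:doi"))
```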

Blog

The RDFa markup of a blog entry is pretty simple. As you can see in listing 2, I used just two Dublin Core terms, "title" and "created", to add semantics to blog posts.

<h2 class="blog_post_title" property="dc:title">
<a href="..." rel="bookmark">
RDFa content markup
</a>
</h2>
[...]
<span class="..." property="dc:created">August 01, 2013 at 03:23 </span> | categories:
[...]

Listing 2: RDFa markup for blog posts.
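To get a feeling for what a machine can read out of such markup, here is a small Python sketch that collects the property attributes and their text content from the snippet in listing 2. It is by no means a conforming RDFa processor (it ignores about, rel, content, datatype and so on) and it assumes well-formed XHTML in which every element is closed. The class name is made up for this example.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one [property or None, text parts] per open element
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        self.stack.append([dict(attrs).get("property"), []])

    def handle_data(self, data):
        # Text counts towards every element that is currently open.
        for entry in self.stack:
            entry[1].append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        prop, parts = self.stack.pop()
        if prop:
            # Normalize whitespace before storing the text.
            self.pairs.append((prop, " ".join("".join(parts).split())))


snippet = """
<h2 class="blog_post_title" property="dc:title">
<a href="..." rel="bookmark">
RDFa content markup
</a>
</h2>
<span class="..." property="dc:created">August 01, 2013 at 03:23 </span>
"""

collector = PropertyCollector()
collector.feed(snippet)
collector.close()
for prop, text in collector.pairs:
    print(prop, "=", text)
```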

About

For the informal about page, I used the vCard vocabulary, to markup the contact information. See listing 3:

<div xmlns:v="http://www.w3.org/2006/vcard/ns#">    
<div about="http://cwillmes.de/about.html" typeof="v:VCard">
<span property="v:fn">Christian Willmes</span>
<div rel="v:adr">
<div typeof="v:Address v:Home">
<span property="v:street-address">Stammstr. 26</span>,
<span property="v:postal-code">50823</span>,
<span property="v:locality">K&ouml;ln</span>,
<span property="v:country-name">Germany</span>.
</div>
</div>
email:
<a rel="v:email" href="mailto:c.willmes@uni-koeln.de">c.willmes@uni-koeln.de</a>
<span property="v:key" rel="link">(<a href="Christian_Willmes_c.willmes@uni-koeln.de_(0x2FA87ED4)_pub.asc" target="_blank">PGP Public Key</a>)</span>.<br/>
See <span property="v:nickname">@cwillmes</span> Box at the right sidebar for more contact options.
</div>
</div>

Listing 3: RDFa vCard markup.

Curriculum vitae

On the CV page, I have only marked up the publications so far. I will mark up the other structured information as well, but this is a bit more tricky, because I could not find good vocabularies for this purpose yet. (Please send me recommendations if you have some.)

Publications

The publication entries of my academic record are marked up using the Bibliographic Ontology (Bibo). This ontology provides a comprehensive vocabulary to describe scientific publications. See listing 4 for an example publication with RDFa bibo annotations.

<span typeof="bibo:Article">
<span property="bibo:authorList">
<span property="dc:creator">Willmes, C.</span>,
<span property="dc:creator">Brocks, S.</span>,
<span property="dc:creator">Hoffmeister, D.</span>,
<span property="dc:creator">H&uuml;tt, C.</span>,
<span property="dc:creator">K&uuml;rner, D.</span>,
<span property="dc:creator">Volland, K.</span>,
<span property="dc:creator">Bareth, G.</span>
</span>
(<span property="dc:date" datatype="xsd:gYear">2012</span>):
<span property="dc:title">Facilitating integrated spatio-temporal visualization and analysis of heterogeneous archaeological and palaeoenvironmental research data.</span>
<span property="bibo:Journal">ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci.</span>
<span property="bibo:Issue">I-2</span>,
<span property="bibo:pages">
<span property="bibo:pageStart">223</span>-
<span property="bibo:pageEnd">228</span>
</span>. DOI:
<span property="bibo:doi">10.5194/isprsannals-I-2-223-2012</span>.
</span>

Listing 4: RDFa markup of bibliographic information.

Conclusion

I am aware that the RDFa markup I implemented is not perfect. But hey, better to take some steps in the right direction than none. From time to time, I will write follow up posts with extensions and improvements to the RDFa implementation of this site. I am happy about any feedback regarding the RDFa markup I used, and I hope I could provide something helpful to others who want to implement RDFa on their websites.

Have fun!
Christian



 
