The Map Is Not The Territory

A blog by Christian Willmes.

Modelling bibliographic records in Semantic MediaWiki using BibTeX schema and result format

categories: webdev, semantic web, semantic mediawiki

Disclaimer: This is a rather long post about modelling bibliographic information in Semantic MediaWiki.

Semantic MediaWiki (SMW) can manage bibliographic records and deliver them in BibTeX format using the BibTeX Semantic Result Format. This is a useful feature once you understand how to implement it in your SMW instance, which is not trivial unless you are already an SMW expert. In this post I describe this modelling and implementation process.

OK, the last paragraph and the headline of this post contain several terms that may be new to non-SMW experts, so they need to be clarified first:

  • Bibliographic Record
  • BibTeX Format
  • SMW Semantic Result Formats
  • BibTeX Semantic Result Format

Bibliographic Record

A bibliographic record is an entity that references a specific content item, in most cases an academic publication such as a journal paper. Bibliographic records usually follow a schema or formalism that applies in a given context; for example, the references in an academic publication usually follow a citation style defined by the publisher.

BibTeX Format

The BibTeX format, which originates from the LaTeX community, is a way to model such citation formalisms and to handle bibliographic records in LaTeX. There is no official specification of the BibTeX schema (aside from the BibTeX implementation itself), so in the following we refer to the Wikipedia entry, which describes the schema sufficiently.
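To make the format concrete, here is a minimal Python sketch that serializes a single record into a BibTeX entry. The function name to_bibtex and the sample values are hypothetical illustrations, not part of SMW or the BibTeX SRF; the sketch wraps every value in braces as-is and does not escape LaTeX special characters.

```python
def to_bibtex(entry_type: str, key: str, fields: dict[str, str]) -> str:
    """Serialize one bibliographic record as a BibTeX entry.

    Hypothetical helper for illustration: values are wrapped in braces
    unchanged; LaTeX special characters are not escaped.
    """
    lines = [f"@{entry_type}{{{key},"]
    # Skip empty fields, e.g. items that were left blank in a form.
    items = [(name, value) for name, value in fields.items() if value]
    for i, (name, value) in enumerate(items):
        # Every field line except the last one ends with a comma.
        sep = "," if i < len(items) - 1 else ""
        lines.append(f"  {name} = {{{value}}}{sep}")
    lines.append("}")
    return "\n".join(lines)


print(to_bibtex("article", "doe2012", {
    "author": "Jane Doe and John Roe",
    "title": "An Example Article",
    "journal": "Journal of Examples",
    "year": "2012",
    "volume": "",
}))
```

The call above prints an @article{doe2012, ...} entry with the four non-empty fields; the empty volume is left out.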

Semantic Result Format

Semantic Result Formats (SRF) is an SMW extension that renders the results of an SMW #ask query or inline query in a defined format.

BibTeX SRF

The BibTeX SRF renders bibliographic information stored in an SMW instance in BibTeX format. Here are some demos of the BibTeX SRF.

Modelling BibTeX schema in SMW

In the Wikipedia article mentioned above, the BibTeX schema is defined in terms of bibliographic items, which are the basic attributes (properties) of the bibliographic entry types (classes).

Bibliographic Items

The bibliographic items are modelled as SMW properties. The BibTeX Wikipedia article defines 26 items, to which we add three more: (1) keyword, to handle the keywords describing the content of the publication as semantic properties, which has the advantage that you can browse and filter by keyword in the resulting bibliographic database; and (2) DOI and (3) ISBN, two widely accepted unique identifier schemes for publications. This gives us the following list of bibliographic items:

  • address: Publisher's address (usually just the city, but can be the full address for lesser-known publishers)
  • annote: An annotation for annotated bibliography styles (not typical)
  • author: The name(s) of the author(s) (in the case of more than one author, separated by and)
  • booktitle: The title of the book, if only part of it is being cited
  • chapter: The chapter number
  • crossref: The key of the cross-referenced entry
  • DOI: Digital Object Identifier (www.doi.org)
  • edition: The edition of a book, long form (such as "First" or "Second")
  • editor: The name(s) of the editor(s)
  • eprint: A specification of an electronic publication, often a preprint or a technical report
  • howpublished: How it was published, if the publishing method is nonstandard
  • institution: The institution that was involved in the publishing, but not necessarily the publisher
  • ISBN: International Standard Book Number
  • journal: The journal or magazine the work was published in
  • key: A hidden field used for specifying or overriding the alphabetical order of entries (when the "author" and "editor" fields are missing). Note that this is very different from the citation key that is used to cite or cross-reference the entry.
  • keyword: Keyword(s) to tag/categorize the content of the publication
  • month: The month of publication (or, if unpublished, the month of creation)
  • note: Miscellaneous extra information
  • number: The "(issue) number" of a journal, magazine, or tech-report, if applicable. (Most publications have a "volume", but no "number" field.)
  • organization: The conference sponsor/host
  • pages: Page numbers, separated either by commas or double-hyphens.
  • publisher: The publisher's name
  • school: The school where the thesis was written
  • series: The series of books the book was published in (e.g. "The Hardy Boys" or "Lecture Notes in Computer Science")
  • title: The title of the work
  • type: The field overriding the default type of publication (e.g. "Research Note" for techreport, "{PhD} dissertation" for phdthesis, "Section" for inbook/incollection)
  • url: The WWW address
  • volume: The volume of a journal or multi-volume book
  • year: The year of publication (or, if unpublished, the year of creation)
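Two of the items above hold lists with their own separators: author (names separated by the word and) and pages (entries separated by commas, with a double hyphen marking a page range). The following minimal Python sketch splits these two fields; the function names are hypothetical, and the sketch ignores BibTeX brace grouping, so a name such as {Barnes and Noble} would be split incorrectly.

```python
import re


def split_authors(field: str) -> list[str]:
    """Split a BibTeX author/editor field into individual names.

    Names are separated by the word "and" surrounded by whitespace.
    Simplification: braces are not honoured, so "{Barnes and Noble}"
    would be split in two.
    """
    return [name.strip() for name in re.split(r"\s+and\s+", field) if name.strip()]


def split_pages(field: str) -> list[tuple[str, str]]:
    """Split a pages field into (first, last) page pairs.

    Commas separate entries and a double hyphen "--" marks a range.
    A single page is returned as (page, page); an open-ended range
    such as "43--" also falls back to ("43", "43").
    """
    pages = []
    for part in field.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("--")
        pages.append((first.strip(), last.strip() or first.strip()))
    return pages


print(split_authors("Jane Doe and John Roe"))  # ['Jane Doe', 'John Roe']
print(split_pages("7, 41, 73--97"))            # [('7', '7'), ('41', '41'), ('73', '97')]
```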

You are free to extend this list with any item you think would be useful, for example a citation item in which you store the complete citation as it would appear in the reference list at the end of a publication. I use the note item for this purpose, but...

Entry Types

The entry types are modelled as classes in SMW that hold the corresponding properties (bibliographic items). The Wikipedia BibTeX schema defines 14 entry types, of which I show the five most used here:

  • article: An article from a journal or magazine.
    Required items: author, title, journal, year
    Optional items: keywords, volume, number, pages, month, DOI, URL, note, key
  • book: A book with an explicit publisher.
    Required items: author/editor, title, publisher, year
    Optional items: keywords, volume, number, series, address, edition, month, ISBN, URL, note, key
  • inbook: A part of a book, usually untitled. May be a chapter (or section or whatever) and/or a range of pages.
    Required items: author/editor, title, chapter/pages, publisher, year
    Optional items: keywords, volume/number, series, type, address, edition, month, ISBN, URL, DOI, note, key
  • inproceedings: An article in a conference proceedings.
    Required items: author, title, booktitle, year
    Optional items: keywords, editor, volume/number, series, pages, address, month, organization, publisher, DOI, URL, ISBN, note, key
  • techreport: A report published by a school or other institution, usually numbered within a series.
    Required items: author, title, institution, year
    Optional items: keywords, type, number, address, month, DOI, URL, note, key
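The required items of the five entry types above can also be written down as data, which makes it easy to check a record before exporting it. In this minimal Python sketch, the names REQUIRED and missing_required are hypothetical, and a slash such as author/editor is read as "at least one of these items must be filled in".

```python
# Required items of the five entry types listed above.
# A tuple such as ("author", "editor") means "at least one of these".
REQUIRED = {
    "article": ["author", "title", "journal", "year"],
    "book": [("author", "editor"), "title", "publisher", "year"],
    "inbook": [("author", "editor"), "title", ("chapter", "pages"), "publisher", "year"],
    "inproceedings": ["author", "title", "booktitle", "year"],
    "techreport": ["author", "title", "institution", "year"],
}


def missing_required(entry_type: str, record: dict[str, str]) -> list[str]:
    """Return the required items (or alternatives) that the record lacks."""
    missing = []
    for item in REQUIRED[entry_type]:
        alternatives = item if isinstance(item, tuple) else (item,)
        # An item counts as present only if it has a non-empty value.
        if not any(record.get(name, "").strip() for name in alternatives):
            missing.append("/".join(alternatives))
    return missing


print(missing_required("book", {"editor": "E. Editor", "title": "T", "year": "2012"}))  # ['publisher']
```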

Implementation

To implement the data structure in SMW, we use the Semantic Forms extension, which provides GUIs to create and edit structured data in SMW. Basically, it allows users to add, edit and query data in SMW using forms.

The easiest way to implement the bibliographic data model is to use the Semantic Forms page "Create a Class". By filling out a single form, it creates all properties, forms, and templates automatically.

Screenshot of the "Create a Class" form, defining the BibBook class.

After filling out the form and clicking "create", go to Special:SMWAdmin and run "Start updating data". This triggers SMW to create all needed links, so that you can find and work with the forms and templates in your wiki.

Repeat this class creation process for each entry type you want to implement. You need to enter all bibliographic item properties of that entry type again so that its form and template contain them; properties that already exist in SMW will not be duplicated, though.

Semantic Forms

Using the "Create a Class" form, Semantic Forms automatically created templates and forms to display and edit the data of the corresponding class. The automatically created forms are fine, but with two minor edits you no longer have to specify the entry type for each new item, which would be redundant because the entry type is already defined by the class. As an example, we now edit the template Template:BibArticle and the form Form:BibArticle of the BibArticle class so that the entry type is set automatically.

From the form, we remove the following part, which would let the user enter a value for the BibType property, something we do not want in our model:

! BibType:
| {{{field|BibType}}}
|-

The resulting form definition looks as follows:

Form:BibArticle

<noinclude>
This is the "BibArticle" form.
To create a page with this form, enter the page name below;
if a page with that name already exists, you will be sent to a form to edit that page.

{{#forminput:form=BibArticle}}
</noinclude><includeonly>
<div id="wikiPreview" style="display: none; padding-bottom: 25px; margin-bottom: 25px; border-bottom: 1px solid #AAAAAA;"></div>
{{{for template|BibArticle}}}
{| class="formtable"
! Author(s):
| {{{field|Author(s)}}}
|-
! Title:
| {{{field|Title}}}
|-
! Journal:
| {{{field|Journal}}}
|-
! Year:
| {{{field|Year}}}
|-
! Volume:
| {{{field|Volume}}}
|-
! Number:
| {{{field|Number}}}
|-
! Pages:
| {{{field|Pages}}}
|-
! Date:
| {{{field|Date}}}
|-
! DOI:
| {{{field|DOI}}}
|-
! URL:
| {{{field|URL}}}
|-
! Keyword(s):
| {{{field|Keyword(s)}}}
|-
! Key:
| {{{field|Key}}}
|-
! Note:
| {{{field|Note}}}
|}
{{{end template}}}

'''Free text:'''

{{{standard input|free text|rows=10}}}

{{{standard input|summary}}}

{{{standard input|minor edit}}} {{{standard input|watch}}}

{{{standard input|save}}} {{{standard input|preview}}} {{{standard input|changes}}} {{{standard input|cancel}}}
</includeonly>
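Every field in the form above follows the same pattern: a header cell with the label, a data cell with a {{{field|...}}} tag, and a |- row separator. As a minimal sketch, the field table can be produced from a list of field names; the helper form_rows is hypothetical and not part of Semantic Forms, which generates the form itself via "Create a Class". Removing the BibType row then simply means leaving BibType out of the list.

```python
# Fields of Form:BibArticle in the order used above (BibType deliberately omitted).
BIBARTICLE_FIELDS = [
    "Author(s)", "Title", "Journal", "Year", "Volume", "Number", "Pages",
    "Date", "DOI", "URL", "Keyword(s)", "Key", "Note",
]


def form_rows(fields: list[str]) -> str:
    """Build the {| class="formtable" ... |} block of a Semantic Forms form.

    Each field becomes a header cell ("! Label:") and a data cell holding a
    {{{field|Label}}} tag; rows are separated by "|-".
    """
    rows = ["! " + name + ":\n| {{{field|" + name + "}}}" for name in fields]
    return '{| class="formtable"\n' + "\n|-\n".join(rows) + "\n|}"


print(form_rows(BIBARTICLE_FIELDS))
```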

In the template we set the BibType property statically by adding [[BibType::Article]] as the first entry, so that every BibArticle has BibType::Article. Additionally, we assign the category "Bibliographic Record" to each entry (last line), because every BibArticle is a bibliographic record. This lets you later query, for example, for all bibliographic records regardless of their entry type. See the following template definition:

Template:BibArticle

<noinclude>
This is the "BibArticle" template.
It should be called in the following format:
<pre>
{{BibArticle
|Author(s)=
|Title=
|Journal=
|Year=
|Volume=
|Number=
|Pages=
|Date=
|DOI=
|URL=
|Keyword(s)=
|Note=
|Key=
}}
</pre>
Edit the page to see the template text.
</noinclude><includeonly>{| class="wikitable"
! BibType
| [[BibType::Article]]
|-
! Author(s)
| {{#arraymap:{{{Author(s)|}}}|,|x|[[BibAuthor::x]]}}
|-
! Title
| [[BibTitle::{{{Title|}}}]]
|-
! Journal
| [[BibJournal::{{{Journal|}}}]]
|-
! Year
| [[BibYear::{{{Year|}}}]]
|-
! Volume
| [[BibVolume::{{{Volume|}}}]]
|-
! Number
| [[BibNumber::{{{Number|}}}]]
|-
! Pages
| [[BibPages::{{{Pages|}}}]]
|-
! Date
| [[BibDate::{{{Date|}}}]]
|-
! DOI
| [[BibDOI::{{{DOI|}}}]]
|-
! URL
| [[BibURL::{{{URL|}}}]]
|-
! Keyword(s)
| {{#arraymap:{{{Keyword(s)|}}}|,|x|[[BibKeyword::x]]}}
|-
! Note
| [[BibNote::{{{Note|}}}]]
|-
! Key
| [[BibKey::{{{Key|}}}]]
|}

[[Category:BibArticle]]
[[Category:Bibliographic Record]]
</includeonly>
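The template above turns the form input into SMW annotations: BibType is fixed to Article, Author(s) and Keyword(s) are split on commas by #arraymap into one value each, and every other parameter becomes a single value. The following minimal Python sketch approximates that mapping and also joins the stored authors with " and ", the separator BibTeX expects. The function names are hypothetical, and skipping empty parameters is a simplification. Note that with the template as written, an author entered as "Doe, Jane" would be split into two values.

```python
# Template parameter -> SMW property, in the order used by Template:BibArticle.
FIELD_PROPERTIES = [
    ("Author(s)", "BibAuthor"), ("Title", "BibTitle"), ("Journal", "BibJournal"),
    ("Year", "BibYear"), ("Volume", "BibVolume"), ("Number", "BibNumber"),
    ("Pages", "BibPages"), ("Date", "BibDate"), ("DOI", "BibDOI"),
    ("URL", "BibURL"), ("Keyword(s)", "BibKeyword"), ("Note", "BibNote"),
    ("Key", "BibKey"),
]
# Parameters the template passes through #arraymap (comma-separated lists).
MULTI_VALUED = {"Author(s)", "Keyword(s)"}
# Categories every BibArticle page is placed in by the template.
CATEGORIES = ["BibArticle", "Bibliographic Record"]


def split_list(value: str, delimiter: str = ",") -> list[str]:
    """Approximate the #arraymap splitting: split, trim, drop empty values."""
    return [v.strip() for v in value.split(delimiter) if v.strip()]


def article_annotations(params: dict[str, str]) -> list[tuple[str, str]]:
    """Return the (property, value) pairs a BibArticle page would carry."""
    pairs = [("BibType", "Article")]  # set statically by the template
    for param, prop in FIELD_PROPERTIES:
        value = params.get(param, "").strip()
        if not value:
            continue  # simplification: empty parameters are skipped
        values = split_list(value) if param in MULTI_VALUED else [value]
        pairs.extend((prop, v) for v in values)
    return pairs


def bibtex_author_field(pairs: list[tuple[str, str]]) -> str:
    """Join the BibAuthor values with " and ", the BibTeX name separator."""
    return " and ".join(v for p, v in pairs if p == "BibAuthor")


pairs = article_annotations({"Author(s)": "Jane Doe, John Roe", "Title": "T", "Year": "2012"})
print(bibtex_author_field(pairs))  # Jane Doe and John Roe
```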

Here you can find further examples and the sources of more entry type definitions.

Authoring and editing bibliographic data

All authoring and editing is done through the forms we created for the entry type classes. You can use them both to create new entries and to edit existing ones.

Screenshot of the form for editing BibArticle entries.

Conclusion

In this post, I described the implementation of a bibliographic model in SMW in detail. You can find this implementation in my SMW instance, where you can look at any details I may have forgotten to mention here.

The actual use of the SMW-based bibliographic database will be described in an upcoming blog post, where I will dig into the powerful browsing, filtering and data rendering capabilities of SMW.

I hope this post helps some people get their heads around the SMW concepts, which can be rather complex. When I first heard of SMW, it was immediately clear to me that this is a very powerful technology that makes a lot of sense. But I had to chew on all of the concepts for a while before it worked for me (after a lot of trial and error).

Have fun!


 
