Semantic Wikis

Semantic wikis combine two technologies. Wikis enable simple and quick collaborative text editing over the Web. The Semantic Web enriches data on the Web with well-defined meaning, which makes it easier to find, share, and combine information. A semantic wiki extends a classical wiki by integrating it with management capabilities for formal knowledge representations. The long-term goal of semantic wikis should be to provide well-structured knowledge representations, and sound reasoning based on these representations, in a user-friendly way.

We can classify semantic wikis into two groups [5]:

  • Text-centered
  • Logic-centered

The text-centered semantic wikis enrich classical wiki environments with semantic annotations relating the textual content to a formal ontology. These wikis are text oriented. The goal of these semantic wikis is not to manage ontologies but to provide a formal backbone to wiki articles. Semantic MediaWiki, KiWi (continuation of IkeWiki), and KnowWE are examples of this semantic wiki type.

The logic-centered semantic wikis are designed and used as ontology engineering platforms. AceWiki and OntoWiki are examples of this semantic wiki type.

There are other applications that also exhibit semantic wiki characteristics (e.g. Freebase, Twine, Knoodl, and others).

Text-Centered Wikis

We will describe some key features of three text-centered wikis: Semantic MediaWiki, KiWi, and KnowWE.

Semantic MediaWiki (SMW)

SMW (Fig. 1) [7] (http://semantic-mediawiki.org/wiki/Semantic_MediaWiki) is an extension to MediaWiki that enables wiki users to semantically annotate wiki pages, so that the wiki contents can be browsed, searched, and reused in novel ways. RDF and OWL are used in the background to formally annotate information in wiki pages.

The integration between MediaWiki and SMW is based on MediaWiki’s extension mechanism: SMW registers for certain events or requests, and MediaWiki calls SMW functions when needed.

SMW organizes content within wiki pages. These pages are further classified into namespaces, which differentiate pages according to their function. Namespaces are defined through the wiki configuration; they cannot be defined by users. Examples of namespaces are “User:” for user home pages and “Help:” for documentation pages. Every page corresponds to an ontological element (including classes and properties) that can be further described by annotations on that page. The semantic roles that wiki pages can play are distinguished by their namespaces. Wiki pages can be:

  • Individual elements of a domain of interest
  • Categories (used to classify individual elements and to create sub-categories)
  • Properties (relationships between two pages or a page and a data value)
  • Types (used to distinguish various kinds of properties)

Each page in SMW can be assigned to one or more categories where each category is associated with a page in the “Category:” namespace. Category pages can be used to browse the classified pages and to organize categories hierarchically.

SMW collects information about the concept represented by a wiki page, not about the associated text. SMW collects semantic data via semantic annotations (markup) added to the wiki text by users. Markup processing is done by SMW’s parsing and rendering components (Fig. 1).

Figure 1: Semantic MediaWiki Architecture (copied from [7])

The SMW conceptual framework, based on properties and types, is the core component of SMW’s semantic processing. Properties specify relationships between one entity (as represented by a wiki page) and other entities or data values. SMW lets wiki users control the set of available properties, since each community is interested in different types of relationships in its domain of interest. Properties augment the content of a wiki page in a structured way. SMW characterises hyperlinks between wiki pages as properties (relationships), where the link’s target becomes the value of a user-provided property. This does not mean that all properties take link targets as their values: property values can also be geographic coordinates, numbers, dates, etc.
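To make the link-as-property idea concrete, here is a minimal Python sketch that extracts annotations of the form [[Property::Value]] (SMW’s inline annotation syntax, with an optional |display text) from wiki text and turns them into (subject, property, value) triples, the page itself being the subject. It is a simplification under stated assumptions: the regular expression and the function names are hypothetical, and the real SMW parser handles many more markup forms.

```python
import re

# Matches SMW-style inline annotations such as [[Capital of::Germany]]
# or [[Located at::Spree|river Spree]]; the optional part after "|"
# is the text shown to readers instead of the value.
ANNOTATION = re.compile(r"\[\[([^:\]|]+)::([^\]|]+)(?:\|([^\]]*))?\]\]")


def extract_annotations(page_title, wikitext):
    """Return (subject, property, value) triples; the page is the subject."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value, _display in ANNOTATION.findall(wikitext)
    ]


def render_text(wikitext):
    """Replace each annotation with the text a reader would see."""
    def visible(match):
        return match.group(3) if match.group(3) is not None else match.group(2)
    return ANNOTATION.sub(visible, wikitext)
```

For the text "Berlin is the capital of [[Capital of::Germany]], on the [[Located at::Spree|river Spree]]." on the page “Berlin”, the sketch yields two triples, while readers still see ordinary prose. Plain category links such as [[Category:City]] are not matched, because they use a single colon.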

SMW also provides a special type of wiki page just for properties. For example, a wiki might contain a page “Property:Population”, where “Property:” is a namespace prefix. A property page can contain a textual description of the property, the data type of its values, etc. SMW provides a number of data types that can be used with properties (e.g. “String”, “Date”, “Geographic coordinate”, etc.).

Semantic annotations of a subject described by a wiki page are mapped to the OWL DL [8] ontology language. Most annotations are mapped to OWL statements similar to RDF triples: wiki pages to abstract individuals, properties to OWL properties, categories to OWL classes, and property values to either abstract individuals or typed literals. Since OWL further distinguishes object properties, data properties, and annotation properties, SMW properties can map to any of these depending on their type. SMW also provides built-in properties that may have a special semantic meaning. SMW can also be configured to interpret MediaWiki’s hierarchical organisation of categories as an OWL class hierarchy.
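The mapping just described can be sketched as follows. This is an illustration under assumptions, not SMW’s actual export code: the output uses OWL functional-syntax-style strings, and the ":" prefix, the PROPERTY_TYPES table, the XSD_TYPES choices, and the function names are all hypothetical. In this sketch a property of type Page (also the fallback for undeclared properties) becomes an object property, while any other type produces a typed literal on a data property.

```python
# Hypothetical declared types, as they might be set on "Property:" pages.
PROPERTY_TYPES = {"Capital of": "Page", "Area": "Number", "Founded": "Date"}

# Assumed XML Schema datatypes for the non-page types in this sketch.
XSD_TYPES = {"Number": "xsd:decimal", "Date": "xsd:date", "Text": "xsd:string"}


def iri(name):
    """Turn a wiki title into a local IRI; spaces become underscores."""
    return ":" + name.replace(" ", "_")


def to_owl(subject, categories, annotations, property_types=PROPERTY_TYPES):
    """Map one page's categories and annotations to OWL-style axioms."""
    axioms = [f"ClassAssertion({iri(c)} {iri(subject)})" for c in categories]
    for _, prop, value in annotations:
        kind = property_types.get(prop, "Page")
        if kind == "Page":
            # Page-valued: the value is another individual (object property).
            axioms.append(
                f"ObjectPropertyAssertion({iri(prop)} {iri(subject)} {iri(value)})"
            )
        else:
            # Any other type: the value becomes a typed literal (data property).
            literal = value.replace(",", "") if kind == "Number" else value
            datatype = XSD_TYPES.get(kind, "xsd:string")
            axioms.append(
                f'DataPropertyAssertion({iri(prop)} {iri(subject)} "{literal}"^^{datatype})'
            )
    return axioms


def category_hierarchy(child_parent_pairs):
    """Optionally read category/subcategory links as an OWL class hierarchy."""
    return [f"SubClassOf({iri(child)} {iri(parent)})" for child, parent in child_parent_pairs]
```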

SMW is not intended as a general-purpose ontology platform; consequently, the semantic information representable in SMW is of limited scope.

SMW has its own query language, whose syntax is closely related to wiki text and whose semantics corresponds to certain class expressions in OWL DL.
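To show how such a query reads, the sketch below evaluates query conditions written in SMW’s wiki-text style, such as [[Category:City]] [[Located in::Germany]], against a tiny in-memory fact base. Read as OWL, this query corresponds roughly to the class expression ObjectIntersectionOf(:City ObjectHasValue(:Located_in :Germany)), i.e. a conjunction of conditions. The sample pages and function names are hypothetical, and the sketch ignores printouts, subqueries, and the other features of SMW’s query language.

```python
import re

# Tiny sample fact base: page title -> categories and property values.
SAMPLE_PAGES = {
    "Berlin": {"categories": {"City"}, "Located in": "Germany"},
    "Munich": {"categories": {"City"}, "Located in": "Germany"},
    "Paris": {"categories": {"City"}, "Located in": "France"},
    "Rhine": {"categories": {"River"}, "Flows through": "Germany"},
}

CONDITION = re.compile(r"\[\[([^\]]+)\]\]")


def parse_ask(query):
    """Split '[[Category:City]] [[Located in::Germany]]' into conditions."""
    conditions = []
    for body in CONDITION.findall(query):
        if "::" in body:
            prop, value = body.split("::", 1)
            conditions.append(("property", prop.strip(), value.strip()))
        elif body.startswith("Category:"):
            conditions.append(("category", body[len("Category:"):].strip(), None))
    return conditions


def ask(pages, query):
    """Return the sorted titles of pages that satisfy every condition."""
    conditions = parse_ask(query)

    def matches(data):
        for kind, key, value in conditions:
            if kind == "category" and key not in data["categories"]:
                return False
            if kind == "property" and data.get(key) != value:
                return False
        return True

    return sorted(title for title, data in pages.items() if matches(data))
```

Here ask(SAMPLE_PAGES, "[[Category:City]] [[Located in::Germany]]") returns Berlin and Munich: Paris fails the property condition and the Rhine fails the category condition.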

Generally speaking, queries are among the most performance-costly operations in any system, and SMW is no exception. SMW has features and best-practice recommendations to help users manage query performance: a caching mechanism, SMW parameters that restrict query processing and complexity, a limited size of query constructs, the ability to disable individual reasoning features, etc. SMW uses two separate data stores, one for MediaWiki pages and another for semantic data about the subjects (concepts) described by these pages. Both stores are based on a MySQL database. However, the MySQL semantic data store can be replaced by a faster data store if one is available.

KiWi

KiWi (Knowledge in a Wiki) [6] (continuation of IkeWiki) (http://www.kiwi-project.eu) aims to provide collaborative knowledge management based on semantic wikis. It augments existing informal articles (e.g. from Wikipedia) with formal annotations.

KiWi provides a platform for building different kinds of social semantic software powered by Semantic Web technologies. It enables content versatility, that is, the reuse of the same content in different kinds of social software applications.

In KiWi, every piece of information is a combination of human-readable content and associated metadata. The same piece of information can be presented to the user in many different forms, such as a wiki page, a blog post, or a comment on a blog. How the information is displayed is determined by the content’s metadata and by the context in which the content is used (e.g., user preferences, the device used, the type of application, etc.). Since metadata in KiWi is represented in RDF, it does not require a priori schema definitions, and the system’s meta-model can be extended at run time.
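The idea that display depends on both metadata and usage context can be sketched as a single rendering function. Everything here is an assumption for illustration: the Item class, the "type" metadata key, the "application" and "device" context keys, and the HTML shapes are hypothetical and do not reproduce KiWi’s actual view components (the sketch also skips HTML escaping).

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    uri: str
    title: str
    content: str
    metadata: dict = field(default_factory=dict)  # simplified stand-in for RDF metadata


def render(item, context):
    """Choose a presentation from the item's metadata and the usage context."""
    kind = item.metadata.get("type", "WikiPage")
    if context.get("application") == "blog" or kind == "BlogPost":
        html = f"<article><h2>{item.title}</h2><p>{item.content}</p></article>"
    elif kind == "Comment":
        html = f"<blockquote>{item.content}</blockquote>"
    else:
        html = f"<h1>{item.title}</h1><div>{item.content}</div>"
    if context.get("device") == "mobile":
        # The same markup, wrapped for a small screen.
        html = f'<div class="mobile">{html}</div>'
    return html
```

The same Item rendered with an empty context appears as a wiki page, while a {"application": "blog"} context turns it into a blog post without touching the stored content.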

“Content item” is the smallest piece of information in KiWi. It consists of human-readable content in XHTML and associated metadata in RDF, and it is identified by a URI. KiWi creates, stores, updates, versions, searches, and queries content items. The core properties of a content item (e.g. content, author, and creation date) are represented in XML and persisted in a relational database; all other properties can be defined using RDF properties and relations. Depending on how content item URIs are generated, a KiWi system can be made part of the Linked Open Data cloud.

With regard to search, KiWi supports a combination of full-text search, metadata search (tags, types, persons), and database search (date, title).

The KiWi platform (Fig. 2) is structured into four layers: Model Layer, Service Layer, Controller Layer, and View Layer.

Figure 2: KiWi Architecture (copied from [6])

The Model Layer manages content and its related metadata. It is implemented via a relational database, a triple store, and a full-text index. Entities are persisted in the relational database using the Hibernate framework. The KiWi triple (RDF) store is implemented on top of the relational database. The full-text index is implemented using Hibernate Search.

The Service Layer provides services to the upper layers. For example, the EntityManager service provides unified access to content items in the entity database, while the TripleStoreService provides unified access to the RDF store.

The Controller Layer includes action components that implement specific KiWi functionality. Action components mostly implement functionality offered in the user interface (e.g. view, edit, annotate, etc.). They use service components to access KiWi content and metadata.

The View Layer enables user interaction with KiWi via a browser, and it also offers web services for accessing the triple store and SKOS thesauri. There is also a Linked Open Data service that provides KiWi content to Linked Open Data clients.

The KiWi core data model includes three concepts: Content Item, Tag, and Triple. Additional functionality can be added through KiWi Facades. The Content Item is the core concept in KiWi. It represents a “unit of information”, and a user always interacts with a primary Content Item. Content Items can be any type of content (e.g. a wiki page, a user profile, a rule definition, etc.); KiWi is not restricted to specific content formats. All Content Items are identified by Uniform Resource Identifiers (URIs). The textual or media content that belongs to a resource is intended for human consumption. Generally speaking, each resource has both a machine-readable and a human-readable form (description). Machine-readable content (semantic data) is represented as RDF and stored in a triple (RDF) store. Human-readable (text) content is internally structured as an XML document that can be queried and transformed into other representations such as HTML or XSL-FO (for PDF and other printable formats). Tags are used to annotate Content Items. For example, they can associate a Content Item with a specific topic or group Content Items into knowledge spaces. Tags are mapped to an RDF structure. Machine-readable metadata is stored in the form of extended Triples that contain additional information related to internal maintenance (e.g. versioning, transactions, associations between triples and other resources in KiWi, etc.).
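The three core concepts can be sketched as plain data classes. This is a hedged approximation, not KiWi’s Java entity model: the field names, the two maintenance fields standing in for versioning and transaction data, and the ex: predicate names used to map a Tag onto RDF-style triples are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ContentItem:
    uri: str        # every Content Item is identified by a URI
    text_xml: str   # human-readable content, kept as an XML document
    author: str
    created: datetime


@dataclass(frozen=True)
class Tag:
    label: str
    tagged_item: str  # URI of the annotated Content Item
    tagger: str


@dataclass(frozen=True)
class ExtendedTriple:
    subject: str
    predicate: str
    obj: str
    # Maintenance data that a plain RDF triple would not carry (simplified).
    version: int = 1
    transaction_id: str = ""


def tag_to_triples(tag, tag_uri, transaction_id):
    """Map a Tag onto RDF-style extended triples (predicate names are hypothetical)."""
    return [
        ExtendedTriple(tag_uri, "ex:taggedResource", tag.tagged_item, transaction_id=transaction_id),
        ExtendedTriple(tag_uri, "ex:tagLabel", tag.label, transaction_id=transaction_id),
        ExtendedTriple(tag_uri, "ex:taggedBy", tag.tagger, transaction_id=transaction_id),
    ]
```

Keeping the maintenance fields on the triple itself, rather than in a separate table, mirrors the text’s point that extended Triples carry their own bookkeeping.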

KnowWE

KnowWE (http://www.is.informatik.uni-wuerzburg.de/en/research/applications/knowwe/) [3,4] is a knowledge wiki. In a semantic wiki, every wiki page represents a distinct concept from a specific domain of interest. Knowledge wikis additionally represent a possible solution with every wiki page. On every page, the content is described by semantically annotated text and/or by multimedia content (e.g., pictures, diagrams, etc.). The embedded knowledge can be used to derive the concept related to the particular wiki article.

In KnowWE the knowledge base is entered together with the standard text by using appropriate textual knowledge markups. When a wiki page is saved, the included markups are extracted and compiled into an executable knowledge base corresponding to the wiki page and stored in a knowledge base repository.
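The save-time step (extract markup, compile it, store one knowledge base per page) can be sketched as below. The rule markup shown is hypothetical and heavily simplified, only loosely in the spirit of textual knowledge markup; it is not KnowWE’s actual syntax, and "compiling" here just means turning each rule line into a dictionary in an in-memory repository.

```python
import re

# Hypothetical, simplified rule markup embedded in a wiki page:
#   %%Rule
#   IF <question> = <answer> THEN <solution>
#   %
RULE_BLOCK = re.compile(r"^%%Rule[ \t]*\n(.*?)^%[ \t]*$", re.S | re.M)
RULE_LINE = re.compile(r"IF\s+(.+?)\s*=\s*(.+?)\s+THEN\s+(.+)")

REPOSITORY = {}  # page title -> compiled rules for that page


def save_page(title, text, repository=None):
    """On save: extract the markup, compile it into rules, store them per page."""
    repo = REPOSITORY if repository is None else repository
    rules = []
    for block in RULE_BLOCK.findall(text):
        for line in block.splitlines():
            match = RULE_LINE.match(line.strip())
            if match:
                question, answer, solution = (part.strip() for part in match.groups())
                rules.append({"if": (question, answer), "then": solution})
    repo[title] = rules
    return rules
```

The ordinary prose around the markup is left untouched; only the marked-up block contributes to the page’s knowledge base.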

In KnowWE, a user is able to start an interactive interview by entering a problem description. When the user enters his/her inputs, an appropriate list of solutions is presented. These solutions are linked to the wiki pages representing the presented solution concepts. Every solution represented in the wiki is considered during the problem-solving process.
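A toy version of the interview step can make the flow clearer: the user’s findings are matched against every rule, each matching rule adds points to its solution, and the ranked solutions link back to their wiki pages. The additive scoring is a hypothetical stand-in for d3web’s problem solvers, and the sample rules, point values, and /wiki/ path are assumptions.

```python
# Hypothetical rules: (question, answer) -> (solution, points).
RULES = [
    (("Exhaust fumes", "black"), ("Clogged air filter", 10)),
    (("Engine noise", "knocking"), ("Bad fuel", 8)),
    (("Exhaust fumes", "black"), ("Bad fuel", 3)),
]


def rank_solutions(findings, rules=RULES, wiki_base="/wiki/"):
    """Score every solution against the findings and link it to its wiki page."""
    scores = {}
    for (question, answer), (solution, points) in rules:
        if findings.get(question) == answer:
            scores[solution] = scores.get(solution, 0) + points
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, pts, wiki_base + name.replace(" ", "_")) for name, pts in ranked]
```

With only black exhaust fumes reported, "Clogged air filter" ranks first; once knocking engine noise is added, "Bad fuel" accumulates 11 points and overtakes it, so the list updates as the user answers more questions.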

KnowWE uses a problem-solving ontology as an upper ontology of all concepts defined in an application project. All concepts and properties of an application inherit from the concepts and properties of the upper ontology.

A new solution is added to KnowWE by creating a new wiki page having the solution’s name. The wiki page contains human readable text and explicit knowledge for deriving the new solution.

KnowWE is intended for small closed communities and semi-open medium-sized communities. It is based on the implementation of JSPWiki. Its parsing engines and problem-solvers are based on the d3web project [9].

Logic-Centered Wikis

AceWiki and OntoWiki will be presented below as examples of logic-centered wikis.

AceWiki

AceWiki (http://attempto.ifi.uzh.ch/acewiki/) uses the controlled natural language Attempto Controlled English (ACE) to represent its content. The goal of logic-centered (ontology engineering) semantic wikis is to make the acquisition, maintenance, and analysis of formal knowledge simpler and faster. According to the AceWiki team, most existing semantic wikis “have a very technical interface and are restricted to a relatively low level of expressivity” [5]. AceWiki offers two advantages: first, it improves usability and achieves a shallow learning curve because it uses a controlled natural language; second, ACE is more expressive than existing semantic formal languages (e.g. RDF, OWL, etc.).

The use of a controlled natural language allows ordinary users who are not experts in ontologies and formal semantic languages to create, understand, and modify formal wiki content. The main goals of AceWiki are to improve knowledge aggregation and knowledge representation, and to support a higher degree of expressivity. The key design principles of AceWiki are naturalness (formal semantics is expressed in natural language), uniformity (only one language is used at the user interface layer), and strict user guidance (a predictive editor ensures that users create only well-formed statements).
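Strict user guidance can be illustrated with a toy predictive editor: given the words typed so far, it offers only the words that can continue some well-formed sentence. The two sentence templates and the tiny lexicon are hypothetical stand-ins; ACE’s real grammar and AceWiki’s editor are far richer.

```python
# A toy lexicon and sentence templates standing in for ACE's grammar.
LEXICON = {"noun": ["city", "country", "place"], "name": ["Berlin", "Germany"]}
TEMPLATES = [
    ["Every", "<noun>", "is", "a", "<noun>", "."],
    ["<name>", "is", "a", "<noun>", "."],
]


def expand(slot):
    """A template position is either a literal word or a lexical category."""
    if slot.startswith("<") and slot.endswith(">"):
        return LEXICON[slot[1:-1]]
    return [slot]


def next_words(prefix):
    """Return the words that may follow `prefix` in some well-formed sentence."""
    options = set()
    for template in TEMPLATES:
        if len(prefix) >= len(template):
            continue  # this template is already complete or exceeded
        if all(word in expand(slot) for word, slot in zip(prefix, template)):
            options.update(expand(template[len(prefix)]))
    return sorted(options)
```

At the start the editor offers only "Berlin", "Every", and "Germany"; after "Every" it offers only nouns; and after an ill-formed prefix such as "Every Berlin" it offers nothing, so the user cannot continue an invalid sentence.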

Other semantic wikis used as ontology engineering platforms (e.g. OntoWiki) allow the creation of formal statements (annotations, metadata) that are treated not as the main content but as an enrichment of it. This is not the case when a controlled natural language is used, because the formal statements are then the content itself.

ACE looks like English, but it is fully controlled and free of the ambiguities of natural language. Every ACE text has a well-defined formal meaning, since the ACE parser translates ACE text into Discourse Representation Structures, which are a syntactic variant of first-order logic. The Semantic Web generally classifies semantic languages into three main high-level categories: ontology languages (e.g. OWL), rule languages (e.g. SWRL, RIF), and query languages (e.g. SPARQL). ACE can play the role of all these languages. AceWiki translates ACE sentences into OWL, which enables reasoning with existing OWL reasoners.
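To give a feel for the ACE-to-OWL direction, the sketch below maps three simple sentence patterns to OWL functional-syntax-style axioms. It is only an illustration under assumptions: the real pipeline parses ACE into Discourse Representation Structures first, whereas this hypothetical ace_to_owl function uses regular expressions and returns None for anything outside its three patterns.

```python
import re

# Three simple ACE-like sentence patterns and the OWL axioms they express.
PATTERNS = [
    (re.compile(r"^Every (\w+) is an? (\w+)\.$"), "SubClassOf(:{0} :{1})"),
    (re.compile(r"^No (\w+) is an? (\w+)\.$"), "DisjointClasses(:{0} :{1})"),
    (re.compile(r"^([A-Z]\w*) is an? (\w+)\.$"), "ClassAssertion(:{1} :{0})"),
]


def ace_to_owl(sentence):
    """Translate one sentence, or return None if no pattern applies."""
    for pattern, template in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            return template.format(*match.groups())
    return None
```

"No city is a country." becomes a disjointness axiom, which is exactly the kind of statement an OWL reasoner can then use to detect contradictions.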

OntoWiki

OntoWiki [1] (http://ontowiki.net/Projects/OntoWiki) supports distributed knowledge engineering scenarios. OntoWiki does not mix text editing with knowledge engineering; it applies the wiki paradigm to knowledge engineering only. OntoWiki facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables authoring of semantic content, with an inline editing mode for RDF content similar to WYSIWYG editing of text documents. OntoWiki regards RDF-based knowledge bases as “information maps”. Each node in the information map (represented by an OntoWiki Web page) is interlinked with related digital resources. The details of the OntoWiki Web page structure are provided in [1].

OntoWiki supports collaboration by tracking changes, allowing comments and discussions on every part of a knowledge base, enabling users to rate and measure the popularity of content, and honoring the activity of users.

The main goal of the OntoWiki approach is to rapidly simplify the presentation and acquisition of knowledge.

OntoWiki pages that represent nodes in the information map are divided into three sections: a left sidebar, a main content section, and a right sidebar. The left sidebar offers selections that include knowledge bases, a class hierarchy, and a full-text search. When a selection is made, the main content section shows the matching content in a list view, linking to individual views of the matching nodes. The right sidebar provides tools and complementary information specific to the selected content.
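The class-hierarchy selection and the two views it drives can be sketched over a plain set of triples. This is a hypothetical model of the behaviour described above, not OntoWiki’s code: selecting a class lists the instances of that class and of all its subclasses (list view), and selecting a node shows every statement about it (individual view). The ex: resources are invented sample data.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Invented sample knowledge base.
TRIPLES = {
    ("ex:City", SUBCLASS, "ex:Place"),
    ("ex:Capital", SUBCLASS, "ex:City"),
    ("ex:Berlin", RDF_TYPE, "ex:Capital"),
    ("ex:Munich", RDF_TYPE, "ex:City"),
    ("ex:Alps", RDF_TYPE, "ex:Place"),
}


def subclasses(cls, triples):
    """The selected class plus all of its transitive subclasses."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS and o == current and s not in result:
                result.add(s)
                frontier.append(s)
    return result


def list_view(cls, triples):
    """Main content section: sorted instances of the selected class."""
    classes = subclasses(cls, triples)
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o in classes)


def node_view(node, triples):
    """Individual view: all (predicate, object) statements about one node."""
    return sorted((p, o) for s, p, o in triples if s == node)
```

Selecting ex:City lists both ex:Munich and ex:Berlin, because ex:Capital is a subclass of ex:City.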

OntoWiki provides reusable user interface components for data editing, called widgets. Examples of the provided widgets include Statements (to edit subjects, predicates, and objects), Nodes (to edit literals or resources), and File (to upload files).

Social collaboration within OntoWiki is one of its main characteristics. According to the OntoWiki authors, this eases the exchange of meta-information about the knowledge base and promotes collaboration scenarios where face-to-face communication is hard. It also contributes to creating an “architecture of participation” that enables users to add value to the system as they use it. The main social collaboration features of OntoWiki are:

  • Change tracking – All changes applied to a knowledge base are tracked.
  • Commenting – Statements presented to the user can be annotated and commented on.
  • Rating – OntoWiki allows users to rate instances.
  • Popularity – All knowledge base accesses are logged, which allows views on the content to be arranged by popularity.
  • Activity/Provenance – OntoWiki keeps track of what was contributed and by whom.
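The five features listed above map naturally onto a small bookkeeping layer kept beside the knowledge base. The sketch below is a hypothetical design, not OntoWiki’s implementation: the SocialLayer class, its method names, and the representation of statements as tuples are assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SocialLayer:
    changes: list = field(default_factory=list)           # change tracking + provenance
    comments: dict = field(default_factory=dict)          # statement -> [(user, text)]
    ratings: dict = field(default_factory=dict)           # instance -> {user: score}
    accesses: Counter = field(default_factory=Counter)    # popularity

    def record_change(self, user, action, statement):
        self.changes.append((datetime.now(timezone.utc), user, action, statement))

    def comment(self, user, statement, text):
        self.comments.setdefault(statement, []).append((user, text))

    def rate(self, user, instance, score):
        # One rating per user and instance; a new rating replaces the old one.
        self.ratings.setdefault(instance, {})[user] = score

    def average_rating(self, instance):
        scores = list(self.ratings.get(instance, {}).values())
        return sum(scores) / len(scores) if scores else None

    def log_access(self, resource):
        self.accesses[resource] += 1

    def by_popularity(self):
        return [resource for resource, _ in self.accesses.most_common()]

    def contributions_by(self, user):
        return [change for change in self.changes if change[1] == user]
```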

OntoWiki has been implemented as an alternative user interface for the schema editor in pOWL [2] that is a platform for Semantic Web application development.

OntoWiki is designed to work with knowledge bases of arbitrary size. OntoWiki loads only those parts of the knowledge base into main memory which are required to display the requested information.
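Loading only what a page needs can be sketched with an indexed triple table: the whole knowledge base stays in the store, and each view fetches just the statements about the requested resource. This uses Python’s built-in sqlite3 (with an in-memory database for the demo) purely as an assumption for illustration; OntoWiki’s actual storage back end is different.

```python
import sqlite3


def open_store(triples):
    """Put the knowledge base in an indexed table (in-memory here for the demo)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    db.execute("CREATE INDEX idx_subject ON triples (s)")
    db.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return db


def load_view(db, resource):
    """Fetch only the statements needed to display one resource."""
    rows = db.execute(
        "SELECT p, o FROM triples WHERE s = ? ORDER BY p, o", (resource,)
    )
    return rows.fetchall()
```

Because the lookup goes through the subject index, the cost of displaying one node depends on that node’s statements rather than on the size of the whole knowledge base.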

References

1. Auer, S., Dietzold, S., Lehmann, J., Riechert, T.: OntoWiki: A Tool for Social, Semantic Collaboration. CKC, 2007. http://www2007.org/workshops/paper_91.pdf
2. Auer, S.: pOWL – A Web Based Platform for Collaborative Semantic Web Development. http://powl.sourceforge.net/overview.php
3. Baumeister, J., Reutelshoefer, J., Puppe, F.: KnowWE: A Semantic Wiki for Knowledge Engineering. In: Applied Intelligence (2010)
4. Baumeister, J., Puppe, F.: Web-based Knowledge Engineering using Knowledge Wikis. Proc. of the AAAI 2008 Spring Symposium on “Symbiotic Relationships between Semantic Web and Knowledge Engineering”, pp. 1-13, Stanford University, USA, 2008. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.143.955&rep=rep1&type=pdf
5. Kuhn, T.: How Controlled English can Improve Semantic Wikis. Proceedings of the Fourth Workshop on Semantic Wikis. European Semantic Web Conference 2009, CEUR Workshop Proceedings, 2009. http://attempto.ifi.uzh.ch/site/pubs/papers/semwiki2009_kuhn.pdf
6. Schaffert, S., Eder, J., Grünwald, S., Kurz, T., Radulescu, M., Sint, R., Stroka, S.: KiWi – A Platform for Semantic Social Software. In: 4th Workshop on Semantic Wikis (SemWiki2009) at ESWC09, Heraklion, Greece, June 2009.
7. Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., Studer, R.: Semantic Wikipedia. In Journal of Web Semantics 5/2007, pp. 251–261. Elsevier 2007. http://korrekt.org/page/Semantic_Wikipedia_(JWS2007)
8. W3C: OWL, http://www.w3.org/standards/techs/owl#w3c_all
9. d3web, http://sourceforge.net/apps/mediawiki/d3web/index.php?title=Main_Page


Uniform Resource Identifier (URI) – Definition and Types

The Uniform Resource Identifier (URI) is one of the most important concepts of the Semantic Web. As stated in [3], one of the key features of the Semantic Web is the introduction of knowledge representation technologies and inference that operate over Web resources identified, and indirectly defined, via Uniform Resource Identifiers (URIs).

URI Definition

A Uniform Resource Identifier (URI) is a string of characters used for naming, identifying, addressing, and defining resources.

What really matters here is what we can do with a URI [2]. If we can dereference the URI, it will indirectly give us authoritative information about the resource it identifies. On the other hand, a URI is also useful to others if it uniquely identifies the resource, even if the resource is not completely described. This unique identification enables others to provide more information about the resource, which creates a network effect.

URIs were originally defined as two types:

  • Uniform Resource Locators (URLs) which are addresses with network locations, and
  • Uniform Resource Names (URNs) which are persistent names that are address independent.

A URL is a URI that identifies a network-homed resource and also specifies the means of acting upon it or obtaining its representation, either by describing the primary access mechanism or by giving its network “location”. For example, the URL http://www.yahoo.com identifies a resource (Yahoo’s home page). It also implies that a representation of that resource (the home page’s current HTML code) is obtainable via HTTP from a network host named www.yahoo.com.

A Uniform Resource Name (URN) is a URI that identifies a resource by name, in a particular namespace. A URN can be used to identify a resource without implying its location or how to access it. For example, the URN urn:isbn:0-112-99333-8 is a URI that specifies the unique reference within the International Standard Book Number (ISBN) identifier system. It references a book, but doesn’t suggest where and how to obtain an actual copy of the book.

The URN defines a resource’s identity, while the URL provides a method for finding it.
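Python’s standard urllib.parse module makes the difference visible: a URL carries a network location (host), while a URN carries a namespace identifier and a name but no location. The classification rule in this sketch (the "urn" scheme means URN, a non-empty network location means URL) is a simplifying assumption for illustration, not a definition from the URI specifications.

```python
from urllib.parse import urlparse


def describe(uri):
    """Roughly classify a URI as a URN, a URL, or neither (simplified rule)."""
    parts = urlparse(uri)
    if parts.scheme == "urn":
        # urn:<namespace identifier>:<namespace-specific string>
        nid, _, nss = parts.path.partition(":")
        return {"kind": "URN", "namespace": nid, "name": nss}
    if parts.netloc:
        return {"kind": "URL", "scheme": parts.scheme, "host": parts.hostname}
    return {"kind": "other", "scheme": parts.scheme}
```

For urn:isbn:0-112-99333-8 the sketch reports the namespace "isbn" and the name "0-112-99333-8", with nothing about where to get the book; for http://www.yahoo.com it reports the scheme and host needed to fetch a representation.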

Today, most sources simply use the term URI. The term URL now serves only as a reminder that some URIs act as addresses because they have schemes that imply some kind of network accessibility.

Clearing Up the Confusion Between Uniform and Universal

Tim Berners-Lee originally used the word “Universal” in naming the Universal Resource Identifier (URI). Later, the publication of RFC 2396 [4] in August 1998 changed the significance of the “U” in “URI” from “Universal” to “Uniform”. Unfortunately, the two words are still mixed in URI articles and papers today. To be consistent, we refer to the URI as the Uniform Resource Identifier.

Resources and URIs

To publish data, we first have to identify the items of interest in our domains. The items of interest are the things whose attributes (properties) and relationships we want to describe in the data. We refer to these items of interest as resources.

We distinguish between two types of resources:

  • Information resources
  • Non-information resources

Information resources are files located on the Web (Internet and/or Intranet) and they include documents, images, and other media files.

Non-information resources are real-world resources that exist outside of the Web. They can be classified into two groups: physical objects (e.g., people, books, buildings) and abstract concepts (e.g., color, height, weight).

Resource Identification

Each resource can be identified using a URI. We recommend using HTTP URIs only and avoiding other URI schemes such as URNs.

Widely available mechanisms (DNS and web servers, respectively) support the use of HTTP URIs both to identify resources globally without centralized management and to retrieve representations of information resources.

HTTP also provides substantial benefits in terms of installed software base, scalability, and security, at low cost.

Resource Representation

Information resources can have representations. A representation is a stream of bytes in a certain format, such as HTML, JPEG, or RDF/XML. A single information resource can have many different representations (e.g., in different content formats or natural languages).
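Choosing among several representations of one information resource is usually done with HTTP content negotiation on the client’s Accept header. The sketch below is a simplified, hypothetical negotiator: it honours q-values and the */* and type/* wildcards, but ignores the finer precedence rules of the HTTP specification.

```python
def parse_accept(header):
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if media:
            prefs.append((media, q))
    # sorted() is stable, so equal q-values keep the header's order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_representation(accept, available):
    """Return the first available media type the client accepts, else None."""
    for media, q in parse_accept(accept):
        if q <= 0:
            continue
        for candidate in available:
            if media in (candidate, "*/*") or (
                media.endswith("/*") and candidate.startswith(media[:-1])
            ):
                return candidate
    return None
```

A browser that prefers HTML and a Linked Data client that asks for application/rdf+xml can thus receive different representations of the same resource.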

Dereferencing HTTP URIs

URI dereferencing is the process of looking up a URI on the Web in order to get information about the referenced resource. Since there are two types of resources (information and non-information), URIs identifying them are dereferenced as follows:

  • Information Resources – The server responsible for the URI generates a representation of the information resource and sends it back to the client with the HTTP response code 200 OK.
  • Non-Information Resources – They cannot be dereferenced directly. Instead of sending a representation of the resource, the server responds with the HTTP response code 303 See Other and the URI of an information resource that describes the non-information resource. In one more step, the client dereferences this new URI and gets a representation that describes the original non-information resource.
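Both cases can be demonstrated end to end with Python’s standard library: a small local HTTP server answers a non-information resource URI with 303 See Other and a Location header, and answers the document URI with 200 OK. The /id/ and /doc/ URI layout is a hypothetical convention chosen for this sketch; urllib follows the 303 redirect automatically, which is the "one more step" described above.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class LinkedDataHandler(BaseHTTPRequestHandler):
    """Hypothetical layout: /id/<name> names a thing, /doc/<name> describes it."""

    def do_GET(self):
        if self.path.startswith("/id/"):
            # Non-information resource: no representation, point to a description.
            self.send_response(303)
            self.send_header("Location", "/doc/" + self.path[len("/id/"):])
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path.startswith("/doc/"):
            # Information resource: return a representation with 200 OK.
            body = f"A document describing {self.path[len('/doc/'):]}".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the demo server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), LinkedDataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def dereference(uri):
    """Follow redirects (urllib handles 303) and return final URL, status, body."""
    with urllib.request.urlopen(uri) as response:
        return response.url, response.status, response.read().decode()
```

Dereferencing http://127.0.0.1:<port>/id/guitar ends at /doc/guitar with status 200, so the client never mistakes the guitar itself for the document about it.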

For data publishing we can use two approaches to provide clients with URIs of information resources describing non-information resources:

  • Hash URIs
  • 303 redirects

These two methods are described in [1].

Hash URIs are the better choice for small and stable sets of resources that evolve together. RDF Schema vocabularies and OWL ontologies are an ideal case: their terms are used together, and the number of terms usually does not grow much.

Hash URIs without content negotiation can be implemented by simply uploading static RDF files to a Web server, without any special server configuration. This makes them popular for quick-and-dirty RDF publication.
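Hash URIs work without server configuration because an HTTP client strips the fragment (the part after "#") before sending the request, so every term of a vocabulary resolves to the same static file. The sketch shows this with urllib.parse.urldefrag; the example.org vocabulary URIs are hypothetical.

```python
from urllib.parse import urldefrag


def document_for(uri):
    """The document actually requested when a hash URI is dereferenced."""
    return urldefrag(uri).url


def group_by_document(uris):
    """Group hash URIs by the single document that describes them."""
    groups = {}
    for uri in uris:
        document, fragment = urldefrag(uri)
        groups.setdefault(document, []).append(fragment)
    return groups
```

Both http://example.org/vocab#Guitar and http://example.org/vocab#Amplifier lead to one request for http://example.org/vocab, which is why this approach suits small vocabularies whose terms are served together.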

303 URIs are better suited for large sets of data that may grow, where it becomes impractical to serve all related resources in a single document.

If in doubt, it’s better to use the more flexible 303 URI approach.

URI Types

HTTP-based URIs are mostly used as unambiguous names of non-information resources. At the same time, the same URI can be used as a document (information resource) locator. For example, if you use a URI to name a guitar, you can also provide a document, accessible via that URI (document locator), that describes (in a formal or in an informal way) the guitar the URI names.

However, using the same URI for the name of a resource and for the location of a document describing the resource creates an ambiguity. To avoid this situation, the W3C Technical Architecture Group (W3C TAG) recommends that the URI naming the non-information resource should forward (using an HTTP 303 See Other status code) to a related URI for retrieving the descriptive document (information resource) about the resource.

What does this mean in practice? For each non-information resource we need at least two URIs: one URI (Identifier URI) to name the resource and another URI (Document URI) for the location of its descriptive document.

Each resource should also correspond to a concept in an ontology that models the domain the resource belongs to. This means that we also need a URI (Concept URI) to name the concept that models the resource and another URI (Ontology URI) that names the domain ontology this concept belongs to. The Ontology URI should be directly derivable from the Concept URI, since the Concept URI should fully contain the URI of the ontology it belongs to. All this applies when a concept is fully defined in the domain ontology. However, if a concept is referenced from another domain ontology, its Ontology URI should still belong to the current domain ontology, but the other semantic metadata of the concept should be extracted from its original domain ontology when needed.

There is one more aspect of a document describing the resource: its representations. Each document can have one or more representations (e.g., Text, HTML, RDF, OWL, etc.), and each document representation needs its own URI. We call this URI type the Document Representation URI.

Finally, here is a list of all the URI types described above that relate to a single resource:

  • Identifier URI
  • Document URI
  • Document Representation URI (one per each document representation)
  • Concept URI
  • Ontology URI
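The five URI types can be minted from one resource name once a URI pattern is fixed. The pattern below (/id/, /doc/, file extensions for representations, and /ontology/<name>#<Concept> for concepts) is a hypothetical convention chosen for this sketch, as are the names mint_uris and ontology_of. It also shows how the Ontology URI is recovered from the Concept URI that fully contains it.

```python
from dataclasses import dataclass
from urllib.parse import urldefrag


@dataclass
class ResourceURIs:
    identifier: str        # names the (non-information) resource
    document: str          # locates the descriptive document
    representations: dict  # format -> Document Representation URI
    concept: str           # names the concept modelling the resource
    ontology: str          # names the domain ontology of that concept


def mint_uris(base, name, concept, ontology_name, formats=("html", "rdf")):
    """Mint all URI types for one resource under a hypothetical URI pattern."""
    ontology = f"{base}/ontology/{ontology_name}"
    return ResourceURIs(
        identifier=f"{base}/id/{name}",
        document=f"{base}/doc/{name}",
        representations={fmt: f"{base}/doc/{name}.{fmt}" for fmt in formats},
        concept=f"{ontology}#{concept}",
        ontology=ontology,
    )


def ontology_of(concept_uri):
    """The Concept URI fully contains its Ontology URI (here, the part before '#')."""
    return urldefrag(concept_uri).url
```

Calling mint_uris("http://example.org", "guitar-42", "Guitar", "instruments") yields, among others, the Identifier URI http://example.org/id/guitar-42 and the Concept URI http://example.org/ontology/instruments#Guitar, from which ontology_of recovers http://example.org/ontology/instruments.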

URI Denoting

Two approaches are available for denoting the four core URI uses (name, concept, document location, and document representation(s)):

  • Different URI types
  • Different context

The different URI types are already explained above. The different context approach requires syntactic conventions for indicating the intended context in which the URI is referenced. Both approaches have their advantages and disadvantages. We recommend the use of the different URI types approach since it is emerging as a common approach that is also used in the Linked Data area.

Some pros and cons of both approaches described in [5] are:

Different Names
Pro: Name “shows” what a given URI identifies. Consistent meaning across languages.
Con: It requires people to agree on which of these four things a URI should indicate.

Different Context
Pro: It does not require everyone to agree on which of these four things a URI should indicate.
Con: Each Semantic Web language must have a language construct to clearly specify which of these four things is intended when a URI is written in that language.

Reference Material

1. Cool URIs for the Semantic Web
W3C Working Draft, 17 December 2007, http://www.w3.org/TR/2007/WD-cooluris-20071217/
2. URIs and the Myth of Resource Identity
David Booth, HP Software, 2006, http://www.dbooth.org/2006/identity/
3. An Ontology of Resources for Linked Data
Harry Halpin, Valentina Presutti, 2009, http://events.linkeddata.org/ldow2009/papers/ldow2009_paper19.pdf
4. RFC 2396, http://www.ietf.org/rfc/rfc2396.txt
5. Four Uses of a URL: Name, Concept, Web Location and Document Instance
David Booth, 2003, http://www.w3.org/2002/11/dbooth-names/dbooth-names_clean.htm