Daprota M2 Modeling of MongoDB Manual References and DBRefs – Part 1/2

This series of posts provides details and examples for modeling MongoDB Manual References and DBRefs with the Daprota M2 service. You can access the M2 service at this link:

https://m2.daprota.com

For some data models, it is fine to model data with embedded documents (de-normalized model), but in some cases referencing documents (normalized model) is a better choice.

A referenced document can be

  • in the same collection, or
  • in a separate collection in the same database, or
  • in a separate collection in another database.

MongoDB supports two types of references:

  • Manual Reference
  • DBRef

Manual References are used to reference documents either in the same collection or in a separate collection in the same database. The parent documents are referenced via the value of their primary key, the _id field.

Database references (DBRefs) are references from one document to another using the value of the referenced (parent) document’s _id field, its collection name, and the database name.
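
DBRefs are covered in detail in Part 2, but for orientation, a DBRef is stored as a sub-document with the reserved field names $ref (collection), $id (the referenced _id value) and, optionally, $db (database). A minimal sketch (the collection, database, and field names are illustrative only):

// A Book document whose publisher field is a DBRef pointing to a document
// in the Publisher collection of the "press" database (illustrative names)
book = { title: "Sample Book",
         publisher: { "$ref": "Publisher", "$id": "pub1", "$db": "press" } }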

In this part of the series we will look at the Manual Reference only. The second part will provide insights into DBRefs.

The Manual Reference MongoDB type indicates that the associated field references another document’s _id. The _id is a unique ID field that acts as a primary key. Manual references are simple to create and they should be used for nearly every use case where you want to store a relationship between two documents.

We will use MongoDB’s Publisher-Book example of the Referenced One-to-Many model. This model comes as a pre-created public model in M2:

Referenced One-to-Many V2 Model

[Screenshot: Ref-One-To-Many]

The Publisher’s _id is of type String and it is referenced in the Book document by the publisher_id field of the type Manual reference:String. This means that the values of the publisher_id field will reference the values of the Publisher document’s _id field.
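
In the mongo shell the resulting documents could look like the sketch below (the field values are illustrative; only _id and publisher_id come from the model). The last two lines show how an application follows the manual reference with a second query:

// Parent document in the Publisher collection (_id is a String)
db.Publisher.insert({ _id: "oreilly", name: "O'Reilly Media" })

// Child document in the Book collection; publisher_id stores the parent's _id
db.Book.insert({ _id: "book1", title: "Sample Book", publisher_id: "oreilly" })

// Resolving the manual reference requires a second query
book = db.Book.findOne({ _id: "book1" })
publisher = db.Publisher.findOne({ _id: book.publisher_id })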

Now, we will demonstrate how we created this model in M2. For this example, we will concentrate only on the creation of the Publisher and Book collections and of their relevant fields (_id and publisher_id).

First we will create the Referenced One-to-Many model in M2. Enter the name and description of the model and click the Create Model button to create the model as shown below:

[Screenshot: AddModel-RefOneToMany]

When the model is created, the M2 models page will be loaded and we will click the Referenced One-to-Many model link to load the model’s page:

[Screenshot: AddModel-RefOneToMany-2]

When the model page is loaded, click the Add Collection tab in order to add the Publisher collection to the model:

[Screenshot: AddModel-RefOneToMany-2_2]

Enter the Publisher name and description and click the Add Collection button to create it:

[Screenshot: AddModel-RefOneToMany-3]

Also create the Book collection.

When both collections are created

[Screenshot: AddModel-RefOneToMany-5]

we will continue with the creation of the Publisher document’s _id field. Click the Publisher collection link to load the Publisher collection page and then click the Publisher document link to load the Publisher document page. When the Publisher document page is loaded, click the Add Field tab to add the _id field first:

[Screenshot: AddModel-RefOneToMany-6]

Click the Add Field button to add the field. When the field is added, the Publisher document page will be reloaded:

[Screenshot: AddModel-RefOneToMany-8]

Click the Full Model View to load the full model view page and then click the Book document link, as depicted below, to load the Book document page:

[Screenshot: AddModel-RefOneToMany-9]

When the Book document page is loaded, click the Add Field tab to add the publisher_id field. First we will select the Manual Reference as its type:

[Screenshot: AddModel-RefOneToMany-10]

and then we will add String as the second part of its composite type, which applies to the field’s values:

[Screenshot: AddModel-RefOneToMany-11]

Click the Add Field button to add the field. The document page will be reloaded when the field is added:

[Screenshot: AddModel-RefOneToMany-12]

Click the publisher_id link to load the field page and then click the Manual Reference tab to specify reference details:

[Screenshot: AddModel-RefOneToMany-13]

When the Manual Reference section is loaded, select the Publisher collection’s document and click the Reference Collection button to complete the Manual reference setup for the publisher_id field:

[Screenshot: AddModel-RefOneToMany-14]

M2 will create the manual reference and reload the Manual Reference section:

[Screenshot: AddModel-RefOneToMany-15]

Click the Referenced One-to-Many model link above to load the Referenced One-to-Many model page:

[Screenshot: AddModel-RefOneToMany-16]

The References section of the page (please see above) lists the reference that was just created. Both the Source (Parent) and Target (Child) columns have the format Collection –> Document –> Field. For example, the Target (Child) column contains the value Book –> Book –> publisher_id, which means that Book is the target (child) collection and publisher_id is the field in the Book document of the Book collection whose value will reference the _id field value of the parent collection (Publisher) document. The Database column is reserved for DBRefs only.

It is also possible that the target (child) document, in the Collection –> Document –> Field value, is not the target collection document but an embedded document (on any level) in the target collection. For example, the Role document in the User –> Role –> _id target reference value in the RBAC model below

[Screenshot: AddModel-RefOneToMany-17]

is not related to the Role collection but to the Role embedded document of the User document’s roles field:

[Screenshot: AddModel-RefOneToMany-18]

If you click the embedded Role document’s _id field link, the field page will be loaded with the full path for the _id field:

[Screenshot: AddModel-RefOneToMany-19]


Daprota M2 Cloud Service for MongoDB Data Modeling

The data model design is one of the key elements of the overall application design when MongoDB is used as a back-end (database management) system. While some people would still argue that data modeling is not needed with MongoDB since it is schemaless, the more you develop, deploy, and manage applications using MongoDB technology, the more obvious the need for data model design becomes. At the same time, while the format of documents in a single collection can change over time, in practice collections are in most cases highly homogeneous. Even when collection structures change more frequently, a modeling tool can help you properly document these changes.

Daprota has just released the M2 cloud service, the first service for MongoDB data modeling. It enables the creation and management of data models for MongoDB.

Only a free M2 service plan is provided for now. It enables the creation and management of up to five private data models and unlimited access to public data models provided by Daprota and M2 service users. Plan upgrades with either a larger or an unlimited number of private models will be available in the near future.

The current public models include Daprota models and models based on design patterns and use cases provided by MongoDB via the MongoDB website.

M2 features include:

  • Management of models and their elements (Collections, Documents, and Fields)
  • Copying and versioning of Models, Collections and Documents via related Copy utilities
  • Export/Import Models
  • Full models view in JSON format
  • Public models sharing
  • Models documentation repository
  • Messaging between M2 users

Daprota plans on adding more features to the service in the near future.

Spinoza’s Geometric Ontology

Geometric ontology is an ontology used by Baruch Spinoza (later Benedict de Spinoza), one of the great philosophers, to establish and elaborate the elements of the Ethics, his principal work.

The Ethics is divided into five parts. Each part begins with a set of definitions and axioms. They are followed by a series of propositions and their related demonstrations. Each demonstration relies on previously introduced definitions and axioms and previously demonstrated propositions.

By using the Geometric ontology, Spinoza demonstrated his philosophy regarding the truth about God, nature, and ourselves. The crucial message of Spinoza’s Ethics is that our well-being lies not in passions and transitory goods, nor in religion, but rather in the life of reason.

Steven Nadler, Professor of Philosophy at the University of Wisconsin-Madison, is the author of the book Spinoza’s Ethics: An Introduction. This book is a great philosophical commentary on Spinoza’s Ethics. While reading The Geometric Method chapter, I came up with the idea to present a short overview of Spinoza’s Geometric ontology.

The concepts of Spinoza’s Geometric ontology are:

  • Definition
  • Axiom
  • Proposition
  • Demonstration
  • Corollary
  • Scholia

[Diagram: GeometricOntology]

Definition
A definition of a thing is such that “when it is considered alone without any others conjoined, all the thing’s properties can be deduced from it”. Definitions must be simple and basic, relative to the rest of the system. Understanding a definition must not require understanding of any other element of the system.

Axiom
Axioms are general principles about things.  Axioms sometimes require definitions. While a definition may or may not be true, an axiom must be true. Spinoza believes that the truth of an axiom should be self-evident.

Proposition
A proposition is a theorem about a basic claim. Propositions are the core elements of the Ethics’ philosophical conclusions about God, Nature, and the human being.

Demonstration
Spinoza uses demonstrations to establish the truth of each proposition. Once a proposition is demonstrated, it is used as a premise in the demonstrations of subsequent propositions. Some propositions are also followed by corollaries and scholia.

Corollary
A corollary is a theorem related to a proposition. Each corollary has a respective demonstration.

Scholia
A scholium (plural: scholia) is an informal discussion in which Spinoza explains particular themes.

MongoDB Data Models

When creating MongoDB data models, besides knowing the internal details of how the MongoDB database engine works, there are a few other factors that should be considered first:

  • How will your data grow and change over time?
  • What is the read/write ratio?
  • What kinds of queries will your application perform?
  • Are there any concurrency-related constraints you should look at?

These factors very much affect what type of model you should create. There are several types of MongoDB models you can create:

  • Embedding Model
  • Referencing Model
  • Hybrid Model that combines embedding and referencing models.

There are also other factors that can affect your decision regarding the type of the model that will be created. These are mostly operational factors and they are documented in Data Modeling Considerations for MongoDB Applications.

The key question is:

  • should you embed related objects within one another or
  • should you reference them by their identifier (ID)?

You will need to consider performance, complexity and flexibility of your solution in order to come up with the most appropriate model.

Embedding Model (De-normalization)

The embedding model enables de-normalization of data, which means that two or more related pieces of data are stored in a single document. Generally, embedding provides better read performance since the data can be retrieved in a single database operation. In other words, embedding supports locality. If your application frequently accesses related data objects together, the best performance is achieved by putting them in a single document, which is exactly what the embedding model supports.
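
As an illustration, a hypothetical order document with its line items embedded, so that a single read returns everything the application needs:

// One document holds the order together with its related items (locality)
db.orders.insert({
    _id: 1001,
    customer: "Jane Smith",
    items: [ { sku: "A-100", qty: 2, price: 9.95 },
             { sku: "B-200", qty: 1, price: 24.50 } ]
})

// A single operation retrieves the order and its embedded items
db.orders.findOne({ _id: 1001 })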

MongoDB provides atomic operations on a single document only. If fields have to be modified together, all of them have to be embedded in a single document in order to guarantee atomicity. MongoDB does not support multi-document transactions. Distributed transactions and distributed join operations are two main challenges associated with distributed database design; by not supporting these features, MongoDB has been able to implement a highly scalable and efficient sharding solution.
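
Because the related fields live in a single document, they can be modified together atomically with one update. A sketch using the hypothetical order document from the previous example:

// The status and the shipping date change together in one atomic operation
db.orders.update(
    { _id: 1001 },
    { $set: { status: "shipped", shippedOn: new Date() } }
)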

Embedding also has its disadvantages. If we keep embedding related data in documents, or constantly update this data, the document size may grow after the document’s creation. This can lead to data fragmentation. At the same time, the size limit for documents in MongoDB is the maximum BSON document size, which is 16 MB. For larger documents, you have to consider using GridFS.

Furthermore, the larger the documents, the fewer of them fit in RAM and the more likely the server will have to page fault to retrieve documents. Page faults lead to random disk I/O that can significantly slow down the system.

Referencing Model (Normalization)

The referencing model enables normalization of data by storing references between two documents to indicate a relationship between the data stored in each document. Generally, a referencing model should be used:

  • when embedding would result in extensive data duplication and/or data fragmentation (increased storage usage that can also lead to reaching the maximum document size) with minimal or even negative performance implications;
  • to increase flexibility in performing queries, if your application queries data in many different ways or if you do not know in advance the patterns in which data may be queried;
  • to enable many-to-many relationships;
  • to model large hierarchical data sets (e.g., tree structures).

Using referencing requires more roundtrips to the server.
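
A sketch of the same hypothetical order modeled with references: the customer lives in its own collection and the order stores only the customer’s _id, so assembling the related data takes two queries (roundtrips):

// Normalized: customer data is stored once, in its own collection
db.customers.insert({ _id: "c1", name: "Jane Smith", city: "Boston" })
db.orders.insert({ _id: 2001, customer_id: "c1", total: 44.40 })

// Two roundtrips are needed to read the order together with its customer
order = db.orders.findOne({ _id: 2001 })
customer = db.customers.findOne({ _id: order.customer_id })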

Hybrid Model

The hybrid model is a combination of the embedding and referencing models. It is usually used when neither embedding nor referencing alone is the best choice, but their combination makes for the most balanced model.
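
A sketch of a hybrid version of the same hypothetical example: the small, frequently displayed part of the customer is embedded in the order for fast reads, while the full customer document is still referenced through the embedded _id:

// Embedded summary for fast display; the _id still references the
// complete document in the customers collection
db.orders.insert({
    _id: 2002,
    customer: { _id: "c1", name: "Jane Smith" },
    total: 18.00
})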

Polymorphic Schemas

MongoDB does not enforce a common structure for all documents in a collection. It is therefore possible (although generally not recommended) for documents in a MongoDB collection to have different structures.

However, our applications evolve over time, and we have to update the document structure of the MongoDB collections they use. This means that at some point documents in the same collection can have different structures, and the application has to take care of it. Eventually you can fully migrate the collection to the latest document structure, which will enable the same application code to manage the collection.

You should also keep in mind that MongoDB’s lack of schema enforcement requires the document structure details to be stored on a per-document basis, which increases storage usage. In particular, you should keep field names reasonably short, since the field names add to the overall storage used by the collection.
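
For example, documents with the old and the new structure can coexist in a hypothetical products collection while the migration is in progress (the schema_version field used to tell them apart is an illustrative convention, not a MongoDB feature):

// Old document structure
db.products.insert({ _id: 1, name: "Widget", price: 9.95 })

// New document structure; the application handles both shapes until the
// collection is fully migrated
db.products.insert({ _id: 2, name: "Gadget",
                     price: { amount: 19.95, currency: "USD" },
                     schema_version: 2 })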

MongoDB Indexes

MongoDB indexes are based on the B-tree data structure. Indexes are important elements in maintaining MongoDB performance if they are used properly. On the other hand, indexes have associated costs that include memory usage, disk usage, and slower updates. MongoDB provides an explain plan capability and a database profiler utility to collect data about database operations that can be used for database tuning.
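
A short sketch of both tuning tools in the mongo shell (the collection and field names are illustrative):

// Show the query plan, including which index (if any) is used
db.orders.find({ customer_id: "c1" }).explain()

// Enable the database profiler for operations slower than 100 ms
db.setProfilingLevel(1, 100)

// The collected data can then be inspected in the system.profile collection
db.system.profile.find().sort({ ts: -1 }).limit(5)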

Memory

Ideally, the entire index should be resident in RAM. If the number of distinct values for the index is high, we have to ensure that the index fits in RAM; otherwise performance will be impacted.

The parts of the index related to recently inserted data will always be in active RAM. If you query recent data, the index will perform well and MongoDB will use less memory. This could be the case, for example, when the index is based on a time/date field.

Compound Index

Besides single-field indexes, you can also create compound indexes containing more than one field. The order of fields in a compound index can significantly impact performance. It will improve performance if you place the more selective element of your query first in the compound index. At the same time, your other queries may be impacted by this choice. This is an example that shows you have to analyze your entire application in order to make appropriate design decisions regarding indexes.

Each index is stored in sorted order on all its fields, and these rules should be followed to provide efficient indexing:

  • Fields that will be queried by equality should be the first fields in the index.
  • The next should be the fields used to sort query results. If sorting is based on multiple fields, they should occur in the same order in the index definition.
  • The last field in the index definition should be the one queried by range (see the sketch after this list).
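
A sketch of these rules for a hypothetical query that filters on status by equality, sorts on order_date, and restricts total by range:

// Equality field first, then the sort field, then the range field
db.orders.createIndex({ status: 1, order_date: -1, total: 1 })

// A query that can use this index efficiently
db.orders.find({ status: "shipped", total: { $gt: 100 } }).sort({ order_date: -1 })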

It is also good to know that an additional benefit of a compound index is that its leading field can be used on its own. So if we query with a condition on a single field that is the leading field of an index, the index will be used.

On the other hand, an index will be less efficient if we do not range and sort on the same set of fields.

MongoDB provides the hint() method to force the use of a specific index.
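
For example, assuming the compound index sketched above, a specific index can be forced like this:

// Force the query to use the { status: 1, order_date: -1, total: 1 } index
db.orders.find({ status: "shipped" }).hint({ status: 1, order_date: -1, total: 1 })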

Unique Index

You can create a unique index that enforces uniqueness of the indexed field’s value. A compound index can also be specified as unique, in which case each combination of index field values has to be unique.
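
A minimal sketch, using a hypothetical users collection:

// No two documents may share the same email value
db.users.createIndex({ email: 1 }, { unique: true })

// Unique compound index: each (account_id, userName) combination must be unique
db.users.createIndex({ account_id: 1, userName: 1 }, { unique: true })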

Fields with No Values and Sparse Index

If an indexed field does not have a value, then an index entry with the value null will be created. In a unique index, only one document can have a null value for the indexed field, unless the sparse option is specified for the index, in which case index entries are not created for documents that do not have the field.

You should be aware that using a sparse index will sometimes produce incomplete results when index-based operations (e.g., sorting, filtering, etc.) are used.
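
A sketch of the sparse option on a hypothetical phone field; documents without the field get no index entry, so a sort or filter served by this index may silently skip them:

// Index only the documents that actually contain the phone field;
// combined with unique, many documents may still omit phone entirely
db.users.createIndex({ phone: 1 }, { unique: true, sparse: true })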

Geospatial Index

MongoDB provides geospatial indexes that are used to optimize queries involving locations within a two-dimensional space. When dealing with locations, documents must have a field with a two-element array (longitude and latitude) to be indexed with a geospatial index.
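
A sketch of a 2d geospatial index on an illustrative places collection (MongoDB expects the longitude value first in the coordinate array):

// Each document stores its location as [ longitude, latitude ]
db.places.insert({ name: "Cafe", loc: [ -73.98, 40.75 ] })
db.places.createIndex({ loc: "2d" })

// Find places near a given point
db.places.find({ loc: { $near: [ -73.99, 40.73 ] } })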

Array Index

Fields that are arrays can also be indexed, in which case each array value is stored as a separate index entry.
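
A sketch of an array (multikey) index on an illustrative tags field; each array value gets its own index entry:

db.articles.insert({ title: "Modeling in MongoDB",
                     tags: [ "mongodb", "modeling", "nosql" ] })
db.articles.createIndex({ tags: 1 })

// Served by the index entry created for the "modeling" array value
db.articles.find({ tags: "modeling" })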

Create Index Operation

Creation of an index can be either a foreground or a background operation. Foreground index builds consume resources intensively and in some cases require a lot of time; they are blocking operations in MongoDB. When an index is created via a background operation, more time is needed to build it, but the database is not blocked and can be used while the index is being created.
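
A sketch of a background index build (the background option applies to MongoDB versions of this article’s era; recent releases use an optimized build process and ignore the option):

// Build the index without blocking other database operations
db.orders.createIndex({ customer_id: 1 }, { background: true })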

AND and OR Query Tips

If you know that a certain criterion in a query will match fewer documents, and if this criterion is indexed, make sure it goes first in your query. This will enable the selection of the smallest number of documents needed to retrieve the data.

OR-style queries are the opposite of AND queries: the most inclusive clauses (returning the largest number of documents) should go first, since MongoDB has to check every document that is not yet part of the result set against each clause.
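
A sketch of both tips with illustrative clauses, assuming sku is indexed and far more selective than status:

// AND: the more selective, indexed condition goes first
db.orders.find({ sku: "A-100", status: "shipped" })

// OR: the most inclusive clause goes first
db.orders.find({ $or: [ { status: "shipped" },   // matches many documents
                        { sku: "A-100" } ] })    // matches few documents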

Useful Information

  • The MongoDB optimizer generally uses one index at a time. If more than one predicate is used in a query, then a compound index should be created based on the rules previously described.
  • The maximum length of an index name is 128 characters, and an index entry cannot exceed 1,024 bytes.
  • Index maintenance during add, remove, or update operations slows down those operations. If your application performs heavy updates, you should select indexes carefully.
  • Ideally, indexes should reduce the set of possible documents to select from, so it is important to create highly selective indexes. For example, an index based on a phone number is more selective than an index based on a ‘yes/no’ flag.
  • Indexes are not efficient for inequality queries.
  • When regular expressions are used, leading wildcards degrade query performance because indexes are ordered.
  • Indexes are generally useful when we are retrieving a small subset of the total data. They usually stop being useful when a query returns half of the data in a collection or more.
  • A query that returns only a few fields should be fully covered by an index.
  • Whenever possible, create a compound index that can be used by multiple queries.

Tomcat Security Realm with MongoDB

1. User and Role Document Model

[Diagram: Daprota User Model]

2. web.xml

 We have two roles, ContentOwner and ServerAdmin. This is how we set up form-based authentication in web.xml:

  …
 <security-constraint>
     <web-resource-collection>
         <url-pattern>/*</url-pattern>
     </web-resource-collection>
     <auth-constraint>
         <role-name>ServerAdmin</role-name>
         <role-name>ContentOwner</role-name>
     </auth-constraint>
 </security-constraint>

 <login-config>
     <auth-method>FORM</auth-method>
     <realm-name>MongoDBRealm</realm-name>
     <form-login-config>
         <form-login-page>/login.jsp</form-login-page>
         <form-error-page>/login_error.jsp</form-error-page>
     </form-login-config>
 </login-config>

 <!-- Security roles referenced by this web application -->
 <security-role>
     <role-name>ServerAdmin</role-name>
 </security-role>
 <security-role>
     <role-name>ContentOwner</role-name>
 </security-role>

3. Create passwords for admin and test users

($CATALINA_HOME/bin/digest.[bat|sh] -a {algorithm} {cleartext-password})

os-prompt> digest -a SHA-256 manager
manager: 6ee4a469cd4e91053847f5d3fcb61dbcc91e8f0ef10be7748da4c4a1ba382d17

os-prompt> digest -a SHA-256 testpwd
testpwd:a85b6a20813c31a8b1b3f3618da796271c9aa293b3f809873053b21aec501087

Execute this JavaScript code in the MongoDB JS shell:

use mydb
usr = { userName: 'admin',
        password: '1a8565a9dc72048ba03b4156be3e569f22771f23',
        roles: [ { _id: ObjectId(),
                   name: 'ServerAdmin'}
               ]
}
db.user.insert(usr);
usr = { userName: 'test',
        password: '05ec834345cbcf1b86f634f11fd79752bf3b01f3',
        roles: [ { _id: ObjectId(),
                   name: 'ContentOwner'}
               ]
}
db.user.insert(usr);
db.user.find().pretty();

role = { name: 'ServerAdmin',
         description: 'Server administrator role'
}
db.role.insert(role);
role = { name: 'ContentOwner',
         description: 'End-user (client) role'
}
db.role.insert(role);
db.role.find().pretty();

mydb is the MongoDB database name we use in this example.

4. Realm element setup

Set up the Realm element, as shown below, in your $CATALINA_HOME/conf/server.xml file:

      <Host name="localhost"  appBase="webapps"
            unpackWARs="true" autoDeploy="true">
        ...
        <Realm className="com.daprota.m2.realm.MongoDBRealm"
               connectionURL="mongodb://localhost:27017/mydb"
               digest="SHA-256"/>
      </Host>
 </Engine>

5. How to encrypt user’s password

The following Java code snippet is an example of how to encrypt a user’s password:

String password = "password";

// Compute the SHA-256 digest of the password
MessageDigest messageDigest = java.security.MessageDigest.getInstance("SHA-256");
messageDigest.update(password.getBytes());
byte[] byteData = messageDigest.digest();

// Convert the digest bytes to hex format
StringBuffer hexString = new StringBuffer();
for (int i = 0; i < byteData.length; i++) {
    String hex = Integer.toHexString(0xff & byteData[i]);
    if (hex.length() == 1)
        hexString.append('0');
    hexString.append(hex);
}

When you store the password in MongoDB, store the value of hexString.toString().

6. MongoDB realm source code

The source code of the Tomcat security realm implementation with MongoDB and the ready-to-use m2-mongodb-realm.jar are available at

https://github.com/gzugic/mongortom

You just need to copy m2-mongodb-realm.jar to your $CATALINA_HOME/lib.

Semantic Wikis

Semantic wikis combine wikis, which enable simple and quick collaborative text editing over the Web, with the Semantic Web, which enriches the data on the Web with well-defined meaning to provide an easier way to find, share, and combine information. A semantic wiki extends a classical wiki by integrating it with management capabilities for formal knowledge representations. The long-term goal of semantic wikis should be to provide well-structured knowledge representations and sound reasoning based on these representations in a user-friendly way.

We can classify semantic wikis into two groups [5]:

  • Text-centered
  • Logic-centered

The text-centered semantic wikis enrich classical wiki environments with semantic annotations relating the textual content to a formal ontology. These wikis are text oriented. The goal of these semantic wikis is not to manage ontologies but to provide a formal backbone to wiki articles. Semantic MediaWiki, KiWi (continuation of IkeWiki), and KnowWE are examples of this semantic wiki type.

The logic-centered semantic wikis are designed and used as ontology engineering platforms. AceWiki and OntoWiki are examples of this semantic wiki type.

There are other applications that also exhibit semantic wiki characteristics (e.g. Freebase, Twine, Knoodl, and others).

Text-Centered Wikis

We will describe some key features of three text-centered wikis: Semantic MediaWiki, KiWi, and KnowWE.

Semantic MediaWiki (SMW)

SMW (Fig. 1) [7] (http://semantic-mediawiki.org/wiki/Semantic_MediaWiki) is an extension to MediaWiki that enables wiki users to semantically annotate wiki pages, based on which the wiki contents can be browsed, searched, and reused in novel ways. RDF and OWL are used in the background to formally annotate information in wiki pages.

The integration between MediaWiki and SMW is based on MediaWiki’s extension mechanism: SMW registers for certain events or requests, and MediaWiki calls SMW functions when needed.

SMW organizes content within wiki pages. These pages are further classified into namespaces, which differentiate pages according to their function. The namespaces are defined through the wiki configuration; they cannot be defined by users. Examples of namespaces are “User:” for user home pages, “Help:” for documentation pages, etc. Every page belongs to an ontological element (including classes and properties) that can be further described by annotations on that page. The semantic roles that wiki pages can play are distinguished by the namespaces. The wiki pages can be:

  • Individual elements of a domain of interest
  • Categories (used to classify individual elements and to create sub-categories)
  • Properties (relationships between two pages or a page and a data value)
  • Types (used to distinguish various kinds of properties)

Each page in SMW can be assigned to one or more categories where each category is associated with a page in the “Category:” namespace. Category pages can be used to browse the classified pages and to organize categories hierarchically.

SMW collects information about the concept represented by a wiki page, not about the associated text. SMW collects semantic data via semantic annotations (markups) added to the wiki text by users. The markup processing is done by the SMW components for parsing and rendering (Fig. 1).

Figure 1: Semantic MediaWiki Architecture (copied from [7])

The underlying SMW semantic conceptual framework, based on properties and types, is the core component of SMW’s semantic processing. Properties are used to specify relationships between one entity (as represented by a wiki page) and other entities and data values. SMW lets wiki users control the set of available properties, since each community is interested in different types of relationships in its domain of interest. Properties are used to augment a wiki page’s content in a structured way. SMW characterises hyperlinks between wiki pages as properties (relationships) where the link’s target becomes the value of a user-provided property. This does not mean that all properties take link targets as their values; property values can also be in the form of geographical coordinates, numeric values, dates, etc.

SMW also provides use of a special type of a wiki page just for properties. For example, a wiki might contain a page “Property:Population” where “Property:” is a namespace prefix. A property page can contain textual description of the page, data type for property values, etc. SMW provides a number of data types that can be used with properties (e.g. “String”, “Date”, “Geographic coordinate”, etc.).

Semantic annotations of a subject described by a wiki page are mapped to the OWL DL [8] ontology language. Most annotations are mapped to OWL statements similar to RDF triples: wiki pages to abstract individuals, properties to OWL properties, categories to OWL classes, and property values to either abstract individuals or typed literals. Since OWL further distinguishes object properties, data properties, and annotation properties, SMW properties can map to any of those depending on their type. SMW also provides built-in properties that may have a special semantic meaning. SMW can also be configured to interpret MediaWiki’s hierarchical organisation of categories as an OWL class hierarchy.

SMW is not intended as a general purpose ontology platform and because of that the semantic information representable in SMW is of limited scope.

SMW has a proprietary query language whose syntax is closely related to wiki text and whose semantics corresponds to certain class expressions in OWL DL.

Generally speaking, queries have one of the highest performance costs in any system, and it is the same with SMW. SMW has features and best-practice recommendations to help users manage query performance: a caching mechanism, SMW parameters to restrict query processing and complexity, limits on the size of query constructs, individual reasoning features that can be disabled, etc. SMW uses two separate data stores, one for MediaWiki pages and another for semantic data related to the subjects (concepts) described by these pages. Both stores are based on the MySQL database. However, the MySQL semantic data store can be replaced by faster data stores if they are available.

KiWi

KiWi (Knowledge in a Wiki) [6] (a continuation of IkeWiki) (http://www.kiwi-project.eu) aims at providing collaborative knowledge management based on semantic wikis. It augments existing informal articles (e.g. from Wikipedia) with formal annotations.

KiWi provides a platform for building different kinds of social semantic software powered by Semantic Web technologies. It enables content versatility, that is, the reuse of the same content in different kinds of social software applications.

In KiWi, every piece of information is a combination of human-readable content and associated metadata. The same piece of information can be presented to the user in many different forms, such as a wiki page, a blog post, or a comment on a blog. The display of the information is determined by the metadata of the content and the context in which the content is used (e.g., user preferences, device used, type of application, etc.). Since metadata in KiWi is represented by RDF, it does not require a priori schema definitions, and the meta-model of a system can be extended in real time.

“Content item” is the smallest piece of information in KiWi. It consists of human-readable content in XHTML and associated metadata in RDF, and it is identified by a URI. KiWi creates, stores, updates, versions, searches, and queries content items. The core properties of a content item (e.g. content, author, and creation date) are represented in XML and persisted in a relational database; all other properties can be defined using RDF properties and relations. It is possible to make a KiWi system part of the Linked Open Data cloud based on the way the content item’s URI is generated.

With regards to search, KiWi supports a combination of full-text search, metadata search (tags, types, persons), and database search (date, title).

The KiWi platform (Fig. 2) is structured into layers: the Model Layer, Service Layer, Controller Layer, and View Layer.

Figure 2: KiWi Architecture (copied from [6])

The Model Layer manages content and its related metadata. It is implemented via a relational database, a triple store, and a full-text index. Entities are persisted in the relational database using the Hibernate framework. The KiWi triple (RDF) store is an implementation based on the relational database. The full-text index is implemented using Hibernate Search.

The Service Layer provides services to upper layers. For example, the EntityManager service provides a unified access to content items in an entity database while the TripleStoreService provides a unified access to an RDF store.

The Controller Layer includes action components that implement specific KiWi functionality. Action components mostly implement functionalities offered in the user interface (e.g. view, edit, annotate, etc.). They use service components to access the KiWi content and metadata.

The View Layer enables user interaction with KiWi via a browser, and it also offers web services for accessing the triple store and SKOS thesauri. There is also a linked open data service that provides the KiWi content to linked open data clients.

The KiWi core data model includes three concepts: Content Item, Tag, and Triple. Additional functionality can be added through KiWi Facades. The Content Item is the core concept in KiWi. It represents a “unit of information”. A user always interacts with a primary Content Item. Content Items can be of any type (e.g. wiki page, user profile, rule definition, etc.); KiWi is not restricted to specific content formats. All Content Items are identified by Uniform Resource Identifiers (URIs). The textual or media content that belongs to a resource is for human consumption. Generally speaking, each resource has both a machine-readable and a human-readable form (description). Machine-readable content (semantic data) is represented as RDF and stored in a triple (RDF) store. Human-readable (text) content is internally structured as an XML document that can be queried and transformed into other representations such as HTML or XSL-FO (for PDF and other printable formats). Tags are used to annotate Content Items. For example, they can be used to associate a Content Item with a specific topic or to group Content Items into knowledge spaces. Tags are mapped to an RDF structure. Machine-readable metadata is stored in the form of extended Triples that contain additional information related to internal maintenance (e.g. versioning, transactions, associations between triples and other resources in KiWi, etc.).

KnowWE

KnowWE (http://www.is.informatik.uni-wuerzburg.de/en/research/applications/knowwe/ ) [3,4] is a knowledge wiki. In a semantic wiki every wiki page represents a distinct concept from a specific domain of interest. Knowledge wikis further represent a possible solution with every wiki page. On every page the content is described by semantically annotated text and/or by multimedia content (e.g., pictures, diagrams, etc.). The embedded knowledge can be used to derive the concept related to the particular wiki article.

In KnowWE the knowledge base is entered together with the standard text by using appropriate textual knowledge markups. When a wiki page is saved, the included markups are extracted and compiled into an executable knowledge base corresponding to the wiki page and stored in a knowledge base repository.

In KnowWE, a user is able to start an interactive interview by entering a problem description. When the user enters his/her inputs, an appropriate list of solutions is presented. These solutions are linked to the wiki pages representing the presented solution concepts. Every solution represented in the wiki is considered during the problem-solving process.

KnowWE uses a problem-solving ontology as an upper ontology of all concepts defined in an application project. All concepts and properties are inherited by concepts and properties of the upper ontology.

A new solution is added to KnowWE by creating a new wiki page having the solution’s name. The wiki page contains human readable text and explicit knowledge for deriving the new solution.

KnowWE is intended for small closed communities and semi-open medium-sized communities. It is based on the implementation of JSPWiki. Its parsing engines and problem-solvers are based on the d3web project [9].

Logic-Centered Wikis

AceWiki and OntoWiki will be presented below as examples of logic-centered wikis.

AceWiki

AceWiki (http://attempto.ifi.uzh.ch/acewiki/) uses the controlled natural language Attempto Controlled English (ACE) for representing its content. The goal of logic-centered (ontology engineering) semantic wikis is to make the acquisition, maintenance, and analysis of formal knowledge simpler and faster. According to the AceWiki team, most of the existing semantic wikis “have a very technical interface and are restricted to a relatively low level of expressivity” [5]. AceWiki provides two advantages: first, it improves usability and achieves a shallow learning curve since a controlled natural language is used; second, ACE is more expressive than existing formal semantic languages (e.g. RDF, OWL, etc.).

The use of a controlled natural language allows ordinary users who are not experts in ontologies and formal semantic languages to create, understand, and modify formal wiki content. The main goals of AceWiki are to improve knowledge aggregation, knowledge representation, and to support a higher degree of expressivity. The key design principles of AceWiki are: naturalness (formal semantics is in a form of a natural language), uniformity (only one language is used at the user interface layer), and strict user guidance (a predictive editor enables well-formed statements created by users).

All other semantic wikis used as ontology engineering platforms (e.g. OntoWiki and others) allow the creation of formal statements (annotations, metadata) that are not considered the main content but rather an enrichment of it, which is not the case when a controlled natural language is used.

ACE looks like English, but it is fully controlled, without the ambiguities of natural language. Every ACE text has a well-defined formal meaning, since the ACE parser translates the ACE text into Discourse Representation Structures, which are a syntactic variant of first-order logic. The Semantic Web generally classifies semantic languages into three main high-level categories: ontology languages (e.g. OWL), rule languages (e.g. SWRL, RIF), and query languages (e.g. SPARQL). ACE plays the role of all these languages. AceWiki translates ACE sentences into OWL, which enables reasoning with existing OWL reasoners.

OntoWiki

OntoWiki [1] (http://ontowiki.net/Projects/OntoWiki) provides support for distributed knowledge engineering scenarios. OntoWiki does not mix text editing with knowledge engineering; it applies the wiki paradigm to knowledge engineering only. OntoWiki facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents. OntoWiki regards RDF-based knowledge bases as “information maps”. Each node in the information map (represented by an OntoWiki Web page) is interlinked with related digital resources. The details of the OntoWiki Web page structure are provided in [1].

OntoWiki supports collaboration aspects through tracking changes, allowing comments and discussions on every part of a knowledge base, enabling to rate and measure the popularity of content and honoring the activity of users.

The main goal of the OntoWiki approach is to rapidly simplify the presentation and acquisition of knowledge.

OntoWiki pages, which represent nodes in the information map, are divided into three sections: a left sidebar, a main content section, and a right sidebar. The left sidebar offers selections that include knowledge bases, a class hierarchy, and a full-text search. When a selection is made, the main content section shows the matching content in a list view linking to individual views for matching nodes. The right sidebar provides tools and complementary information specific to the selected content.

OntoWiki provides reusable user interface components for data editing, called widgets. Some of the provided widgets are: Statements, to edit subjects, predicates, and objects; Nodes, to edit literals or resources; File, to upload files; etc.

Social collaboration within OntoWiki is one of its main characteristics. According to OntoWiki, this eases the exchange of meta-information about the knowledge base and promotes collaboration scenarios where face-to-face communication is hard. It also contributes to creating an “architecture of participation” that enables users to add value to the system as they use it. The main social collaboration features of OntoWiki are:

  • Change tracking – All changes applied to a knowledge base are tracked.
  • Commenting – Statements presented to the user can be annotated and commented on.
  • Rating – OntoWiki allows users to rate instances.
  • Popularity – All knowledge base accesses are logged, which allows views on the content to be arranged based on popularity.
  • Activity/Provenance – OntoWiki keeps track of what was contributed and by whom.

OntoWiki has been implemented as an alternative user interface for the schema editor in pOWL [2], which is a platform for Semantic Web application development.

OntoWiki is designed to work with knowledge bases of arbitrary size; it loads into main memory only those parts of the knowledge base that are required to display the requested information.

References

1. Auer, S., Dietzold, S., Lehmann, J., Riechert, T.: OntoWiki: A Tool for Social, Semantic Collaboration. CKC, 2007. http://www2007.org/workshops/paper_91.pdf
2. Auer, S.: pOWL – A Web Based Platform for Collaborative Semantic Web Development. http://powl.sourceforge.net/overview.php
3. Baumeister, J., Reutelshoefer, J., Puppe, F.: KnowWE: A Semantic Wiki for Knowledge Engineering. In: Applied Intelligence (2010)
4. Baumeister, J.; Puppe, F.: Web-based Knowledge Engineering using Knowledge Wikis. Proc. of the AAAI 2008 Spring Symposium on “Symbiotic Relationships between Semantic Web and Knowledge Engineering”, pp. 1-13, Stanford University, USA, 2008. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.143.955&rep=rep1&type=pdf
5. Kuhn, T.: How Controlled English can Improve Semantic Wikis. Proceedings of the Fourth Workshop on Semantic Wikis. European Semantic Web Conference 2009, CEUR Workshop Proceedings, 2009. http://attempto.ifi.uzh.ch/site/pubs/papers/semwiki2009_kuhn.pdf
6. Schaffert, S., Eder, J., Grünwald, S., Kurz, T., Radulescu, M., Sint, R., Stroka, S.: KiWi – A Platform for Semantic Social Software. In: 4th Workshop on Semantic Wikis (SemWiki2009) at ESWC09, Heraklion, Greece, June 2009.
7. Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., Studer, R.: Semantic Wikipedia. In Journal of Web Semantics 5/2007, pp. 251–261. Elsevier 2007. http://korrekt.org/page/Semantic_Wikipedia_(JWS2007)
8. W3C: OWL, http://www.w3.org/standards/techs/owl#w3c_all
9. d3web, http://sourceforge.net/apps/mediawiki/d3web/index.php?title=Main_Page