MEAN and Full-Stack Development

JavaScript, followed by Node.js, enables the use of a single language across all application layers. Before this change emerged a few years ago, we had fragmented technologies and separate teams of designers and developers working in these fragmented technology fields in order to build applications. “JavaScript everywhere” enabled the appearance of full-stack frameworks that bring common modules from different technology layers together in order to build software in a faster and more agile way, making it a more efficient solution for frequently changing and highly scalable systems.

Full-stack development is about developing all parts of an application using a single framework. It includes the back end, where the database resides; the middleware, where the application logic and control reside; and, last but not least, the user interface.

MEAN is a JavaScript and Node.js full-stack framework comprised of four main technologies:

  • MongoDB database
  • Express middleware framework
  • AngularJS front-end framework
  • Node.js server platform

You will need some time to learn all the technologies involved in MEAN, but it will be rewarding and professionally exciting. A single language, JavaScript, is used throughout the framework, and all parts of the application can use and/or enforce the Model-View-Controller (MVC) pattern. MVC is fully data oriented: the model holds data, the controller processes data, and the view renders data. Data marshaling is done using JSON, so that additional serialization and deserialization of data structures are not needed.

The big advantage of a full-stack framework is its holistic approach: it looks at the system as a whole, with all of its components existing on their own. In order to function together as a whole, the system components have some interdependencies that need to be considered as well. These interdependencies should be minimized in order to properly support decoupling between the system components, which is one of the most important aspects of the overall architecture of the system.

The framework is modular, which means that if one of the components becomes obsolete tomorrow, it can be replaced with a new component. This would require some changes to the dependent components, but they should be minimal.

The full-stack approach gives you better overall control, since the different parts are built by a single developer or a small team of developers and work seamlessly together. It also supports a microservice style of service design and implementation, especially for systems that change frequently and/or have to be web scalable. Disposable services are the way to go in these kinds of environments.

This post contains an overview of MEAN applications, MEAN technologies, and MEAN architectural patterns.

If you are interested in additional details about the MEAN architectural patterns, the book Getting MEAN with Mongo, Express, Angular, and Node by Simon Holmes is a good source of information.

MEAN Applications

There are two types of MEAN applications:

  • Server Applications
  • Single Page Applications (SPA)

With a server application, each user request is routed through Express. Express determines from its routes which controller will handle the request. This process is the same for each user request. This application type supports one-way data binding: Node.js gets the data from MongoDB, Express compiles this data into HTML via the provided templates, and finally the HTML is delivered to the browser. This implies that most of the processing is done on the server, and the browser just renders HTML and runs JavaScript if any is provided for interactivity.
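
For illustration, here is a minimal sketch of that flow, assuming a hypothetical Book Mongoose model and a books template; the names and the route are illustrative, not part of MEAN itself:

// Hypothetical Express route: Mongoose fetches the data, Express renders HTML
var express = require('express');
var router = express.Router();
var Book = require('../models/book'); // assumed Mongoose model

router.get('/books', function (req, res, next) {
  Book.find({}, function (err, books) {
    if (err) { return next(err); }
    // Express compiles the template and data into HTML for the browser
    res.render('books', { title: 'Book List', books: books });
  });
});

module.exports = router;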

MEAN-Architecture-NodeExpress_NEA

With an SPA, the application logic is moved from the server to the front end, which is why it is called a Single Page Application (SPA). The most widely used JavaScript frameworks for SPAs are AngularJS, Backbone, and Ember; MEAN uses AngularJS. While this approach has its pros and cons, moving the application processing from the host (server) to the users’ browsers lowers the load on the server and the network and brings the cost down. In some cases it will also improve the performance of the application. The browser sends an initial user request to the server, and the server returns the AngularJS application with the requested data. Subsequent user requests are processed most of the time by the AngularJS application running in the browser, while data goes back and forth between the browser and the server. An SPA also supports two-way data binding, where the template and data are sent independently to the browser. The browser compiles the template into a view and the data into a model. The view is “live” since it is bound to the model: if the model changes, the view changes as well, and if the view changes, then the model also changes.
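
A minimal sketch of two-way data binding in AngularJS; the module and controller names are illustrative:

<div ng-app="demoApp" ng-controller="GreetingController">
  <!-- ng-model binds the input to the model; the expression re-renders on change -->
  <input type="text" ng-model="name">
  <p>Hello, {{ name }}!</p>
</div>

<script>
  angular.module('demoApp', [])
    .controller('GreetingController', ['$scope', function ($scope) {
      $scope.name = 'MEAN'; // editing the input updates the model, and vice versa
    }]);
</script>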

MEAN-Architecture-Angular_SPA

MEAN.IO and MEAN.JS are full-stack frameworks for developing MEAN-based applications.

 

MEAN Technologies

MEAN includes five main technologies:

  • MongoDB database and Mongoose object data modeling (ODM) tool
  • Express middleware
  • AngularJS front-end
  • Node.js server platform

MongoDB is a NoSQL document-based database management system whose data model includes:

  • Collections
  • Documents
  • Fields
  • References

M2_Model

A collection is the top-level model element. Each model can have one or more collections. Collections are analogous to tables in a relational database. Each collection contains documents, which are analogous to records in a relational database. Collections model one or more concepts (e.g., account, user, order, publisher, book, etc.) on which the data is based.

Documents are JSON-like data structures containing fields that have values of different types (e.g., String, Date, Number, Boolean, etc.). A value can also be another document or an array of documents embedded in the document. Documents in a collection can have different structures; however, in most cases in practice, collections are highly homogeneous.

Fields are analogous to columns in a relational database. The field/value pairs (better known as key/value pairs) make up a document’s structure.

MongoDB resolves relationships between documents by either embedding related documents or referencing related documents.
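
For example, a hypothetical order document can combine plain fields, an embedded document, and a reference; all names and values here are illustrative:

{
    _id: ObjectId("54f1a2b3c4d5e6f7a8b9c0d1"),  // field of type ObjectId
    status: "shipped",                          // String field
    orderedOn: ISODate("2015-01-15T00:00:00Z"), // Date field
    customer: {                                 // embedded (related) document
        name: "Jane Doe",
        email: "jane@example.com"
    },
    publisher_id: "oreilly"                     // reference to a document in another collection
}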

Mongoose is a MongoDB object data modeling (ODM) tool designed to work in an asynchronous environment. Besides data modeling in Node.js, Mongoose provides a layer of CRUD features on top of MongoDB. It also makes it easier to manage connections to MongoDB databases and to perform data validations.
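
A minimal sketch of a Mongoose schema and model; the schema, data, and connection URL are illustrative:

var mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/mydb');

// The schema defines the document structure and its validations
var bookSchema = new mongoose.Schema({
  title:     { type: String, required: true },
  published: Date,
  pages:     Number
});

var Book = mongoose.model('Book', bookSchema);

// CRUD through the model
Book.create({ title: 'Getting MEAN', pages: 440 }, function (err, book) {
  if (err) { return console.error(err); }
  console.log('Saved:', book.title);
});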

Express is a middleware framework for Node.js that abstracts away common web server functionality, such as session management, routing, and templating.

Node.js is the foundation of the MEAN stack. Node.js is not a language; it is a software platform based on JavaScript. You will use it to build your own web server and the applications that run on top of it. Node.js applications, when coded correctly, are fast and use system resources efficiently. This is supported by a core Node.js feature: it is single-threaded and executes a non-blocking event loop.

The web server running on Node.js is different from traditional multi-threaded web servers (e.g., Apache, IIS, etc.). Multi-threaded servers create a new thread for each new user session and allocate memory and other computing resources to it. During peak periods, when many users access the server concurrently, its resources can get exhausted, in which case the system could halt its operations until the load decreases and/or more machines and resources are added. The cautious approach that many systems take is to overprovision the servers, even if they do not need that many resources most of the time. This definitely increases the cost of system operations. When Node.js is used, rather than giving each user a separate thread and pool of resources, each user joins the same thread, and the interaction between the user and the thread exists only when it is needed. To make this approach work, Node.js runs potentially blocking operations asynchronously.
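
A minimal sketch of this non-blocking style, using only Node.js core modules and assuming a local page.html file:

var http = require('http');
var fs = require('fs');

// The single thread is never blocked: readFile runs asynchronously and the
// event loop invokes the callback when the file data is ready.
http.createServer(function (req, res) {
  fs.readFile('page.html', function (err, data) {
    if (err) {
      res.writeHead(500);
      return res.end('Error loading page');
    }
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end(data);
  });
}).listen(3000);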

While you can use Node.js, Express, and MongoDB to build data-driven applications, adding AngularJS brings more sophisticated features to the interactivity element of the MVC architectural pattern supported by MEAN. AngularJS puts HTML together based on the provided data. It also supports two-way data binding by immediately updating the HTML when the data changes, and by updating the data when the HTML changes.

 

MEAN Architectural Patterns

When you create MEAN-based applications, you can choose any of the architectural patterns listed here, or a combination of them (hybrid architectural patterns).

MEAN architectural patterns are based on the Model-View-Controller (MVC) pattern.

The MVC pattern is data oriented: the model holds data, the controller processes data, and the view renders data. There is also a route component between the controller and the users’ browsers (the web). The route component coordinates interactions with the controller.

MVC.png

A common way to architect the MEAN stack is to have a REST interface feeding a single page application (SPA). The REST interface is implemented as a REST API built with MongoDB, Node.js, and Express, while the SPA is built with AngularJS and runs in the browser.

The REST API creates a stateless interface to your database and enables other applications to work with your data. There is one more important technology component, Mongoose, which acts as a liaison between the controller and MongoDB.
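
A minimal sketch of such a REST endpoint, reusing the hypothetical Book Mongoose model and router from the earlier sketch; the route is illustrative:

// Hypothetical REST endpoint: returns JSON for the SPA instead of rendered HTML
router.get('/api/books/:id', function (req, res) {
  Book.findById(req.params.id, function (err, book) {
    if (err)   { return res.status(500).json(err); }
    if (!book) { return res.status(404).json({ message: 'Book not found' }); }
    res.status(200).json(book);
  });
});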

Mongo-Mongoose-Express-Angular-Communications.png

MongoDB communicates with Mongoose only, Mongoose communicates with Node.js and Express, and AngularJS communicates with Express only.

The REST API is a common architectural element used in all MEAN architectural patterns.

The following architectural patterns are enabled by the MEAN framework:

  • Node.js and Express Application (NEA)
  • Node.js and Express application with AngularJS addition for better interactivity (NEA2)
  • AngularJS Single Page Application (SPA)
  • Hybrid Patterns:
    • NEA and SPA
    • NEA2 and SPA

Node.js and Express Application (NEA)

HTML and content are delivered directly from the server. The HTML requires data that is delivered via the REST API, which is developed with Node.js, Express, Mongoose, and MongoDB.

MEAN-Architecture-NodeExpress_NEA

Node.js and Express Application with AngularJS Addition for Better Interactivity (NEA2)

If you need a richer interactive experience for your users, you can add AngularJS to your pages.

MEAN-Architecture-NodeExpressAngular_NEA2

AngularJS Single Page Application (SPA)

In order to implement Single Page Applications, AngularJS is needed.

MEAN-Architecture-Angular_SPA

Hybrid Patterns

The three architectural patterns listed above can also be combined into hybrid architectural patterns. The two most common combinations are:

  • NEA and SPA
  • NEA2 and SPA

NEA and SPA

This pattern is for applications whose constraints are best supported by a combination of NEA and SPA. For example, application constraints best supported by NEA include a short duration of user interactions, low interactivity, rich content, etc. Application constraints best supported by SPA include feature richness, high interactivity, a long duration of user interactions, privacy, and fast response.

MEAN-Architecture-NodeExpress_NEA-SPA.png

NEA2 and SPA

Finally, NEA2 and SPA is like NEA and SPA, but with a bit richer interactivity on the server-delivered pages (NEA2) via the AngularJS addition.

MEAN-Architecture-NodeExpress_NEA2-SPA.png

 

Java Code Generation from MongoDB Data Models Created in Daprota M2

Daprota just released a new version of its MongoDB data modeling service, M2, which now provides a generator of Java and JSON code from the MongoDB data models created in it.

The generated code includes persistence APIs based on the MongoDB Java driver, NoSQLUnit tests, and test data in JSON format.

When the code is generated, you can download it and then use Apache Maven to build the software and run the tests via a single Maven command (mvn install).

You can save a significant amount of time and effort by creating MongoDB data models in M2 and then generating Java code with a single click. The quality of your code will also improve: it will be unit tested, and all of this will be done for you by M2 in a fully automated fashion.

This kind of service can also be very useful for the quick creation of disposable schemas (data models) in an agile environment, when you want to quickly create a schema, generate Java persistence code from it, immediately test it, and repeat the procedure, starting over with an update of the current schema or the creation of a completely new one.

As soon as you become familiar with creating data models in M2, which is very intuitive, generating and building software from them becomes nearly instantaneous.

All your models are fully managed in M2, where you can also store your models’ documentation in the M2 repository or provide links to external documentation.

Daprota has documented its MongoDB Data Modeling Adviser, which you can access to find out more about MongoDB schema design (data modeling) patterns and best practices.

M2 is a free service, including the Java code generator.

Data Modeling Adviser for MongoDB

Daprota just published the Data Modeling Adviser guide for MongoDB. It documents:

  • MongoDB data modeling basics;
  • Key considerations with data modeling for MongoDB;
  • Key properties of MongoDB data model types (embedded, referenced, hybrid);
  • Data design optimization patterns for these model types.

This guide is a work in progress, and it will be regularly updated, especially with new optimization patterns.

Its content is very much based on the following sources:

All model samples in the guide are created by the Daprota M2 service and they can be accessed via provided links.

Selecting a MongoDB Shard Key

Scalability is an important non-functional requirement for systems using MongoDB as a back-end component. A system has to be able to scale when a single server running MongoDB cannot handle a large dataset and/or a high level of data processing. In general, there are two standard approaches to scalability:

  • Vertical scaling
  • Horizontal scaling

Vertical scaling is achieved by adding more computing resources (CPU, memory, storage) to a single machine. This is considered an expensive option, and computing resources on a single machine have physical limitations.

Horizontal scaling extends the system by adding commodity machines in order to distribute data processing across all of them. This option is considered less expensive and in most cases meets computing needs without the physical limitations of a single machine. MongoDB supports horizontal scaling by implementing sharding (data partitioning) across the machines (shards) clustered in a MongoDB cluster.

The success of MongoDB sharding depends on the selected shard key, which is used to partition data across multiple shards. MongoDB distributes data by the shard key at the collection level.

You should carefully analyze all options before selecting the shard key, since it can significantly affect your system’s performance, and it cannot be changed after data is inserted into MongoDB. Shard keys cannot be arrays, and you cannot shard on a geospatial index. When selecting a shard key, you should also keep in mind that its values cannot be updated. If you still have to change a value, you will have to remove the document first, change the key value, and reinsert the document.

MongoDB divides data into chunks based on the values of the shard key and distributes them evenly across the shards.

Sharding

Usually you will not shard all collections, but only the collections whose data needs to be distributed over shards to improve read and/or write performance. All un-sharded collections are held on a single shard called the primary shard (e.g., Shard A in the picture above). The primary shard can also contain sharded collections.

MongoDB supports three types of sharding:

  • Range-based sharding
  • Hash-based sharding
  • Tag-aware sharding

With range-based sharding, MongoDB divides datasets into ranges determined by the shard key values. With hash-based sharding, MongoDB creates chunks via hash values it computes from the values of the shard key field. In general, range-based sharding provides better support for range queries that need query isolation, while hash-based sharding supports write operations more efficiently.

With tag-aware sharding, users associate shard key values with specific shards. This type of sharding is usually used to optimize the physical locations of documents for location-based applications.
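
A minimal sketch of enabling range-based and hash-based sharding in the mongo shell; the database, collection, and key names are illustrative:

// Enable sharding for the database
sh.enableSharding("mydb")

// Range-based sharding on the zipcode field
sh.shardCollection("mydb.customers", { zipcode: 1 })

// Hash-based sharding on the _id field
sh.shardCollection("mydb.events", { _id: "hashed" })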

In order to properly select a shard key for your MongoDB sharded cluster, it is important to understand how your application reads and writes data. The main question is:

        What is more critical: query isolation, write scaling, or both?

For query isolation, the ideal situation is when queries are routed to a single shard or a small subset of shards. In order to select an optimal shard key for query isolation, you must take the following into consideration:

  • Analyze which query operations are the most performance dependent;
  • Determine which fields are used the most in these operations and include them in the shard key;
  • Make sure that the selected shard key enables even (balanced) distribution of data across shards;
  • Prefer a high-cardinality field. Low-cardinality fields tend to group documents on a small number of shards, which would require frequent rebalancing of the chunks.

The MongoDB query router (mongos) will route queries to a shard or a subset of shards only when the shard key or a prefix of the shard key is used in the query; otherwise, mongos will route the query to all shards. Also, every sharded collection must have an index that starts with the shard key. All documents having the same value for the shard key will reside on the same shard.

For efficient write scaling, choose a shard key that has high cardinality and enables even distribution of write operations across the shards.

You should keep in mind that whatever shard key you choose, it should be easily divisible so that data can remain evenly distributed across shards as it grows. Shard keys that have a limited number of possible values can result in chunks that are “unsplittable”.

The most common techniques people use to distribute data are:

  • Ascending key distribution – The shard key field is usually of the Date, Timestamp, or ObjectId type. With this pattern, all writes are routed to one shard, which MongoDB will keep splitting, spending lots of time migrating data between shards to keep the distribution relatively balanced. This pattern is definitely not good for write scaling.
  • Random distribution – This pattern is achieved by fields that do not have an identifiable pattern in the dataset. For example, these fields include usernames, UUIDs, email addresses, or any field whose value has a high level of randomness. This is a preferable pattern for write scaling since it enables balanced distribution of write operations and data across the shards. However, this pattern does not work well for query isolation if the critical queries must retrieve large amounts of “close” data based on range criteria, in which case the query will be spread across most of the shards in the cluster.
  • Location-based distribution – The idea behind the location-based data distribution pattern is that documents with some location-related similarity will fall into the same range. The location-related field could be a postal address, IP address, postal code, latitude and longitude, etc.
  • Compound shard key – Combine more than one field into a shard key in order to come up with shard key values that have high cardinality and distribute data in a balanced way, for efficient write scaling and query isolation.
  • Data modeling to the rescue – Design the data model to include a field that will be used exclusively to enable balanced distribution of data with good support for write scaling and query isolation. First analyze your application’s read and write operations to get a full understanding of its writing and data retrieval patterns.

The following lists key considerations for shard key selection with respect to the query isolation and write scaling requirements.

High query isolation importance, low write scaling importance:

  • Range shard key.
  • If the selected key does not provide relatively even distribution of data, you can either:
    • use a compound shard key (containing more than one document field); or
    • add a special-purpose field to your data model that will be used as the shard key (an example of data modeling coming to the rescue); or
    • for location-based applications, manually associate specific ranges of the shard key with a specific shard or subset of shards.

Low query isolation importance, high write scaling importance:

  • Hashed shard key with high cardinality that will efficiently distribute write operations across the shards.
  • High cardinality alone does not guarantee appropriate write scaling all the time; the ascending key distribution is a good example. Write operations that require a high level of scaling should be carefully analyzed to find the best field candidate for the shard key.
  • If the selected key does not provide relatively even distribution of data, you can add a special-purpose field to your data model that will be used as the shard key.

High query isolation importance, high write scaling importance:

  • A shard key enabling mid-to-high randomness and relatively even distribution of data. Compound shard keys are usually good candidates.
  • Since an ideal shard key is almost impossible in this case, determine which shard key has the least performance impact on the most critical use cases for both query isolation and write scaling.
  • Data modeling can also help, with embedding, referencing, and hybrid model options to consider for improving performance.
  • If the selected key does not provide relatively even distribution of data, you can add a special-purpose field to your data model that will be used as the shard key.

Daprota M2 Modeling of MongoDB Manual References and DBRefs – Part 2/2

This series of posts provides details, with examples, on modeling MongoDB Manual References and DBRefs with the Daprota M2 service. You can access the M2 service using this link:

https://m2.daprota.com

The previous part of the series (Daprota M2 Modeling of MongoDB Manual References and DBRefs – Part 1/2) covered Manual References. In this part of the series we will look at DBRefs.

Database references (DBRefs) are references from one document to another using the value of the referenced (parent) document’s _id field, its collection name, and the database name. While MongoDB allows DBRefs without the database name, M2 models require the database name to be provided. The reason is that a Manual Reference in an M2 model must already specify the collection name for the model to be complete, so from the M2 point of view, a DBRef without a database name would be the same as a Manual Reference. The database name is more of an implementation aspect of the model, and it is needed to make the DBRef definition complete.

To resolve DBRefs, your application must perform additional queries to return the referenced documents. Many language drivers supporting MongoDB have helper methods that form the query for the DBRef automatically, but some drivers do not automatically resolve DBRefs into documents. Please refer to the MongoDB language driver documentation for more details.

The DBRef format provides common semantics for representing links between documents if your database must interact with multiple frameworks and tools.

Most of the data model design patterns can be supported by Manual References. Generally speaking you should use Manual References unless you have a firm reason for using DBRefs.

The example below is taken from MongoDB’s DBRef documentation page:

        {
            "_id" : ObjectId("5126bbf64aed4daf9e2ab771"),
            // .. application fields
            "creator" : {
                  "$ref" : "creators",
                  "$id" : ObjectId("5126bc054aed4daf9e2ab772"),
                  "$db" : "users"
            }
        }

The DBRef in this example references the document in the creators collection whose _id field has the value ObjectId("5126bc054aed4daf9e2ab772"). The creators collection is stored in the users database.
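
A minimal sketch of resolving this DBRef manually in the mongo shell, using the values from the example above:

// The DBRef points to the creators collection in the users database;
// resolving it requires an additional query against that collection
var creator = db.getSiblingDB("users").creators.findOne({ _id: ObjectId("5126bc054aed4daf9e2ab772") })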

Let us model a sample collection Object in M2.

First we will create a model with the name DBRef Sample Model:

CreateModel-DBRef

Click the Create Model button to create the model. When the model is created, the M2 home page will be reloaded:

DBRef-ListModels

Click the DBRef Sample Model link to load the model page and then click the Add Collection tab to load the section for the collection creation. Enter the name and description of the Object collection:

DBRef-ObjectCollection

Click the Add Collection button to create the collection. M2 will also automatically create the collection’s document:

DBRef-ListCollection

Click the Object collection link to load the collection page and then click the Object document link in the Documents section to load the document page:

DBRef-Document

Click the Add Field tab to load the section for the field creation. Enter the name and description of the creator field and select DBRef for the field’s type. When DBRef is selected as the field’s type, M2 will also require the selection of the field’s value type, which corresponds to the value type of the referenced document’s _id field. It will be ObjectId in this example:

DBRef-AddField

Click the Add Field button to create the field. When the field is created it will be listed on the document page:

DBRef-DocWithField

Click the creator field link to load the field page:

DBRef-DBRef

Click the DBRef tab to load the DBRef section and specify the referenced collection name (creators) and its database (users) to complete the creator field creation:

DBRef-Spec

As you can see, you can either specify a collection name if it is not included in the model (as in this case) or select a collection from the Collections list if it is included in the model. Click the Add DBRef button to update the creator field definition:

DBRef-Final2

Click the model link above to load the DBRef Sample Model page:

DBRef-Final3

The References section of the page, as represented above, lists the reference that was just created. The Target (Child) column has the format Collection –> Document –> Field. It contains the Object –> Object –> creator value, which means that Object is the target (child) collection and creator is the field in the Object document of the Object collection whose value will reference the _id field value of the parent collection (creators) document. The Database column specifies the database of the source (parent) collection.

It is also possible that the target (child) document, in the Collection –> Document –> Field value, is not the target collection document but an embedded document (on any level) in the target collection.

Daprota M2 Modeling of MongoDB Manual References and DBRefs – Part 1/2

This series of posts provides details, with examples, on modeling MongoDB Manual References and DBRefs with the Daprota M2 service. You can access the M2 service using this link:

https://m2.daprota.com

For some data models it is fine to model data with embedded documents (a de-normalized model), but in some cases referencing documents (a normalized model) is a better choice.

A referenced document can be

  • in the same collection, or
  • in a separate collection in the same database, or
  • in a separate collection in another database.

MongoDB supports two types of references:

  • Manual Reference
  • DBRef

Manual References are used to reference documents either in the same collection or in a separate collection in the same database. The parent documents are referenced via the value of their primary key’s _id field.

Database references are references from one document to another using the value of the referenced (parent) document’s _id field, its collection name, and the database name.

In this part of the series we will look at the Manual Reference only. The second part will provide insights into DBRefs.

The Manual Reference MongoDB type indicates that the associated field references another document’s _id field. The _id is a unique ID field that acts as a primary key. Manual references are simple to create, and they should be used for nearly every use case where you want to store a relationship between two documents.

We will use MongoDB’s Publisher-Book example of the Referenced One-to-Many model. This model comes as a pre-created public model in M2:

Referenced One-to-Many V2 Model

Ref-One-To-Many

The Publisher’s _id field is of type String, and it is referenced in the Book document by the publisher_id field of type Manual reference:String. This means that the values of the publisher_id field will reference the values of the Publisher document’s _id field.
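
A minimal sketch of what this model’s data could look like in the mongo shell, with illustrative values:

// Parent document in the Publisher collection (_id of type String)
db.publisher.insert({ _id: "oreilly", name: "O'Reilly Media" })

// Child document in the Book collection referencing the publisher by _id
db.book.insert({ title: "MongoDB: The Definitive Guide", publisher_id: "oreilly" })

// Resolving a manual reference requires a second query
var book = db.book.findOne({ title: "MongoDB: The Definitive Guide" })
var publisher = db.publisher.findOne({ _id: book.publisher_id })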

Now we will demonstrate how we created this model in M2. For this example, we will concentrate only on the creation of the Publisher and Book collections and of their relevant fields (_id and publisher_id).

First we will create the Referenced One-to-Many model in M2. Enter the name and description of the model and click the Create Model button to create the model as shown below:

AddModel-RefOneToMany

When the model is created, the M2 models page will be loaded and we will click the Referenced One-to-Many model link to load the model’s page:

AddModel-RefOneToMany-2

When the model page is loaded, click the Add Collection tab in order to add the Publisher collection to the model:

AddModel-RefOneToMany-2_2

Enter the Publisher name and description and click the Add Collection button to create it:

AddModel-RefOneToMany-3

Also create the Book collection.

When both collections are created

AddModel-RefOneToMany-5

we will continue with the Publisher document’s _id field creation. Click the Publisher collection link to load the Publisher collection page, and then click the Publisher document link to load the Publisher document page. When the Publisher document page is loaded, click the Add Field tab to add the _id field first:

AddModel-RefOneToMany-6

Click the Add Field button to add the field. When the field is added, the Publisher document page will be reloaded:

AddModel-RefOneToMany-8

Click the Full Model View to load the full model view page and then click the Book document link, as depicted below, to load the Book document page:

AddModel-RefOneToMany-9

When the Book document page is loaded, click the Add Field tab to add the publisher_id field. First we will select the Manual Reference as its type:

AddModel-RefOneToMany-10

and then we will add String as the second part of its composite type, which corresponds to its values:

AddModel-RefOneToMany-11

Click the Add Field button to add the field. The document page will be reloaded when the field is added:

AddModel-RefOneToMany-12

Click the publisher_id link to load the field page and then click the Manual Reference tab to specify reference details:

AddModel-RefOneToMany-13

When the Manual Reference section is loaded, select the Publisher collection’s document and click the Reference Collection button to complete the Manual reference setup for the publisher_id field:

AddModel-RefOneToMany-14

M2 will create the manual reference and reload the Manual Reference section:

AddModel-RefOneToMany-15

Click the Referenced One-to-Many model link above to load the Referenced One-to-Many model page:

AddModel-RefOneToMany-16

The References section of the page (please see above) lists the reference that was just created. Both the Source (Parent) and Target (Child) columns have the format Collection –> Document –> Field. For example, the Target (Child) column contains the Book –> Book –> publisher_id value, which means that Book is the target (child) collection and publisher_id is the field in the Book document of the Book collection whose value will reference the _id field value of the parent collection (Publisher) document. The Database column is reserved for DBRefs only.

It is also possible that the target (child) document, in the Collection –> Document –> Field value, is not the target collection document but an embedded document (on any level) in the target collection. For example, the Role document in the User –> Role –> _id target reference value in the RBAC model below

AddModel-RefOneToMany-17

is not related to the Role collection but to the Role embedded document of the User document’s roles field:

AddModel-RefOneToMany-18

If you click the embedded Role document’s _id field link, the field page will be loaded with the full path for the _id field:

AddModel-RefOneToMany-19

Daprota M2 Cloud Service for MongoDB Data Modeling

The data model design is one of the key elements of the overall application design when MongoDB is used as the back-end (database management) system. While some people would still argue that data modeling is not needed with MongoDB since it is schemaless, the more you develop, deploy, and manage applications using MongoDB, the more obvious the need for data model design becomes. At the same time, while the format of documents in a single collection can change over time, in most cases in practice collections are highly homogeneous. Even with more frequent structural collection changes, a modeling tool can help you properly document these changes.

Daprota just released the M2 cloud service, the first service for MongoDB data modeling. It enables the creation and management of data models for MongoDB.

Only a free M2 service plan is provided for now. It enables the creation and management of up to five private data models and unlimited access to public data models provided by Daprota and M2 service users. Plan upgrades with either a larger or an unlimited number of private models will be available in the near future.

The current public models include Daprota models and models based on design patterns and use cases provided by MongoDB via the MongoDB website.

M2 features include:

  • Management of models and their elements (Collections, Documents, and Fields)
  • Copying and versioning of Models, Collections and Documents via related Copy utilities
  • Export/Import Models
  • Full models view in JSON format
  • Public models sharing
  • Models documentation repository
  • Messaging between M2 users

Daprota plans on adding more features to the service in the near future.

MongoDB Data Models

When creating MongoDB data models, besides knowing the internal details of how the MongoDB database engine works, there are a few other factors that should be considered first:

  • How will your data grow and change over time?
  • What is the read/write ratio?
  • What kinds of queries will your application perform?
  • Are there any concurrency-related constraints you should look at?

These factors very much affect what type of model you should create. There are several types of MongoDB models you can create:

  • Embedding Model
  • Referencing Model
  • Hybrid Model that combines embedding and referencing models.

There are also other factors that can affect your decision regarding the type of model to create. These are mostly operational factors, and they are documented in Data Modeling Considerations for MongoDB Applications.

The key question is:

  • should you embed related objects within one another or
  • should you reference them by their identifier (ID)?

You will need to consider performance, complexity and flexibility of your solution in order to come up with the most appropriate model.

Embedding Model (De-normalization)

The embedding model enables de-normalization of data, which means that two or more related pieces of data are stored in a single document. Generally, embedding provides better read performance since data can be retrieved in a single database operation; in other words, embedding supports locality. If your application frequently accesses related data objects, the best performance can be achieved by putting them in a single document, which is what the embedding model supports.

MongoDB provides atomic operations on a single document only. If fields have to be modified together, all of them have to be embedded in a single document in order to guarantee atomicity; MongoDB does not support multi-document transactions. Distributed transactions and distributed join operations are two main challenges associated with distributed database design; by not supporting these features, MongoDB has been able to implement a highly scalable and efficient atomic sharding solution.
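
A minimal sketch in the mongo shell, with an illustrative order document: the embedded line items and the total live in one document, so they are updated together atomically:

db.orders.insert({
    _id: 1,
    total: 35,
    items: [ { sku: "book-101", price: 35 } ]   // embedded related data
})

// One document, one atomic operation: both changes apply together or not at all
db.orders.update(
    { _id: 1 },
    { $push: { items: { sku: "book-102", price: 20 } }, $inc: { total: 20 } }
)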

Embedding also has its disadvantages. If we keep embedding related data in a document, or constantly update this data, the document may grow after its creation, which can lead to data fragmentation. At the same time, the size limit for documents in MongoDB is determined by the maximum BSON document size, which is 16 MB. For larger documents, you have to consider using GridFS.

On the other hand, if documents are large, fewer of them can fit in RAM, and the server is more likely to page fault when retrieving documents. Page faults lead to random disk I/O that can significantly slow down the system.

Referencing Model (Normalization)

The referencing model enables normalization of data by storing references between two documents to indicate a relationship between the data stored in each document. Generally, referencing models should be used:

  • when embedding would result in extensive data duplication and/or data fragmentation (with increased data storage usage that can also lead to reaching the maximum document size) with minimal performance advantages or even negative performance implications;
  • to increase flexibility in performing queries, if your application queries data in many different ways or if you do not know in advance the patterns in which data may be queried;
  • to enable many-to-many relationships;
  • to model large hierarchical data sets (e.g., tree structures).

Using referencing requires more roundtrips to the server.

Hybrid Model

The hybrid model is a combination of the embedding and referencing models. It is usually used when neither the embedding nor the referencing model is the best choice on its own, but their combination makes the most balanced model.

Polymorphic Schemas

MongoDB does not enforce a common structure for all documents in a collection. It is possible (though generally not recommended) for documents in a MongoDB collection to have different structures.

However, our applications evolve over time, so we have to update the document structure for the MongoDB collections they use. This means that at some point documents in the same collection can have different structures, and the application has to take care of that. Meanwhile, you can fully migrate the collection to the latest document structure, which will enable the same application code to manage the collection.

You should also keep in mind that MongoDB’s lack of schema enforcement requires the document structure details to be stored on a per-document basis, which increases storage usage. In particular, you should use reasonable lengths for the documents’ field names, since field names add to the overall storage used by the collection.

MongoDB Indexes

MongoDB indexes are based on the B-tree data structure. Properly used, indexes are important elements in maintaining MongoDB performance. On the other hand, indexes have associated costs that include memory usage, disk usage, and slower updates. MongoDB provides an explain plan capability and a database profiler utility to collect data about database operations that can be used for database tuning.

Memory

Ideally, the entire index should be resident in RAM. If the number of distinct values in the index is high, we have to ensure that the index fits in RAM; otherwise, performance will be impacted.

The parts of the index related to recently inserted data will always be in active RAM. If you query on recent data, the index will perform well and MongoDB will use less memory. For example, this could be the case when the index is based on a time/date field.

Compound Index

Besides single-field indexes, you can also create compound indexes containing more than one field. The order of fields in a compound index can significantly impact performance. Placing the more selective element of your query first in the compound index will improve performance; at the same time, your other queries may be impacted by this choice. This is an example that shows that you have to analyze your entire application in order to make appropriate design decisions regarding indexes.

Each index is stored in sorted order on all fields in the index, and these rules should be followed to provide efficient indexing:

  • Fields that are queried by equality should be the first fields in the index.
  • Next should be the fields used to sort query results. If sorting is based on multiple fields, they should occur in the same order in the index definition.
  • The last field in the index definition should be the one queried by range.

It is also good to know that an additional benefit of a compound index is that a leading field within the index can be used on its own. So if we query with a condition on a single field that is the leading field of an index, the index will be used.

On the other hand, an index will be less efficient if we do not range and sort on the same set of fields.

MongoDB provides the hint() method to force the use of a specific index.
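
A minimal sketch of these rules in the mongo shell; the collection and field names are illustrative:

// Equality field first, then the sort field, then the range field
db.orders.createIndex({ status: 1, orderedOn: 1, total: 1 })

// Uses the index: equality on status, sort on orderedOn, range on total
db.orders.find({ status: "shipped", total: { $gt: 50 } }).sort({ orderedOn: 1 })

// The leading field alone can also use the same index
db.orders.find({ status: "shipped" })

// Force the use of a specific index if the optimizer does not pick it
db.orders.find({ status: "shipped" }).hint({ status: 1, orderedOn: 1, total: 1 })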

Unique Index

You can create a unique index that enforces uniqueness of the indexed field’s values. A compound index can also be specified as unique, in which case each combination of index field values has to be unique.

Fields with No Values and Sparse Index

If an index field does not have a value, an index entry with the value null will be created. For a unique index, only one document can lack a value for the index field, unless the sparse option is specified, in which case index entries are not created for documents that do not have the field.

You should be aware that using a sparse index will sometimes produce an incomplete result when index-based operations (e.g., sorting, filtering, etc.) are used.
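
A minimal sketch of unique and sparse indexes in the mongo shell; the field names are illustrative:

// Unique index: no two documents may share the same email value
db.user.createIndex({ email: 1 }, { unique: true })

// Sparse index: documents without the nickname field get no index entry,
// so a unique constraint on it tolerates multiple documents missing the field
db.user.createIndex({ nickname: 1 }, { unique: true, sparse: true })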

Geospatial Index

MongoDB provides geospatial indexes that are used to optimize queries involving locations within a two-dimensional space. To be indexed with a geospatial index, documents must have a field with a two-element array (longitude and latitude).

Array Index

Fields that are arrays can also be indexed, in which case each array value is stored as a separate index entry.

Create Index Operation

Creation of an index can be either a foreground or a background operation. Foreground operations consume resources intensively and in some cases require a lot of time; they are blocking operations in MongoDB. When an index is created via a background operation, more time is needed to create it, but the database is not blocked and can be used while the index is being built.

AND and OR Query Tips

If you know that a certain criterion in a query will match fewer documents, and if this criterion is indexed, make sure that it goes first in your query. This enables the selection of the smallest number of documents needed to retrieve the data.

OR-style queries are the opposite of AND queries: the most inclusive clauses (returning the largest number of documents) should go first, since MongoDB has to check documents that are not yet part of the result set for every match.

Useful Information

  • The MongoDB optimizer generally uses one index at a time. If more than one predicate is used in a query, a compound index should be created based on the rules previously described.
  • The maximum length of an index name is 128 characters, and an index entry size cannot exceed 1,024 bytes.
  • Index maintenance during add, remove, or update operations slows down these operations. If your application performs heavy updates, you should select indexes carefully.
  • Ideally, indexes should reduce the set of possible documents to select from, so it is important to create highly selective indexes. For example, an index based on a phone number is more selective than an index based on a yes/no flag.
  • Indexes are not efficient for inequality queries.
  • When regular expressions are used, leading wildcards will degrade query performance, because indexes are ordered.
  • Indexes are generally useful when we are retrieving a small subset of the total data. They usually stop being useful when we return half of the data or more in a collection.
  • A query that returns only a few fields should be fully covered by an index.
  • Whenever possible, create a compound index that can be used by multiple queries.

Tomcat Security Realm with MongoDB

1. User and Role Document Model

Daprota User Model

2. web.xml

We have two roles, ContentOwner and ServerAdmin. This is how we set up form-based authentication in web.xml:

  …
 <security-constraint>
     <web-resource-collection>
         <url-pattern>/*</url-pattern>
     </web-resource-collection>
     <auth-constraint>
         <role-name>ServerAdmin</role-name>
         <role-name>ContentOwner</role-name>
     </auth-constraint>
 </security-constraint>

 <login-config>
     <auth-method>FORM</auth-method>
     <realm-name>MongoDBRealm</realm-name>
     <form-login-config>
         <form-login-page>/login.jsp</form-login-page>
         <form-error-page>/login_error.jsp</form-error-page>
     </form-login-config>
 </login-config>

 <!-- Security roles referenced by this web application -->
 <security-role>
     <role-name>ServerAdmin</role-name>
 </security-role>
 <security-role>
     <role-name>ContentOwner</role-name>
 </security-role>

3. Create passwords for admin and test users

($CATALINA_HOME/bin/digest.[bat|sh] -a {algorithm} {cleartext-password})

os-prompt> digest -a SHA-256 manager
manager: 6ee4a469cd4e91053847f5d3fcb61dbcc91e8f0ef10be7748da4c4a1ba382d17

os-prompt> digest -a SHA-256 testpwd
testpwd:a85b6a20813c31a8b1b3f3618da796271c9aa293b3f809873053b21aec501087

Execute the following JavaScript code in the MongoDB JS shell:

use mydb
usr = { userName: 'admin',
        // SHA-256 digest of 'manager' generated in step 3
        password: '6ee4a469cd4e91053847f5d3fcb61dbcc91e8f0ef10be7748da4c4a1ba382d17',
        roles: [ { _id: ObjectId(),
                   name: 'ServerAdmin'}
               ]
}
db.user.insert(usr);
usr = { userName: 'test',
        // SHA-256 digest of 'testpwd' generated in step 3
        password: 'a85b6a20813c31a8b1b3f3618da796271c9aa293b3f809873053b21aec501087',
        roles: [ { _id: ObjectId(),
                   name: 'ContentOwner'}
               ]
}
db.user.insert(usr);
db.user.find().pretty();

role = { name: 'ServerAdmin',
         description: 'Server administrator role'
}
db.role.insert(role);
role = { name: 'ContentOwner',
         description: 'End-user (client) role'
}
db.role.insert(role);
db.role.find().pretty();

mydb is the MongoDB database name we use in this example.

4. Realm element setup

Set up the Realm element, as shown below, in your $CATALINA_HOME/conf/server.xml file:

      <Host name="localhost"  appBase="webapps"
            unpackWARs="true" autoDeploy="true">
        ...
        <Realm className="com.daprota.m2.realm.MongoDBRealm"
               connectionURL="mongodb://localhost:27017/mydb"
               digest="SHA-256"/>
      </Host>
 </Engine>

5. How to digest (hash) a user’s password

The following Java code snippet is an example of how to digest a user’s password:

import java.security.MessageDigest;

String password = "password";

// getInstance may throw NoSuchAlgorithmException, which the caller should handle
MessageDigest messageDigest = MessageDigest.getInstance("SHA-256");
messageDigest.update(password.getBytes());
byte[] byteData = messageDigest.digest();

// Convert the digest bytes to hex format
StringBuilder hexString = new StringBuilder();
for (int i = 0; i < byteData.length; i++) {
    String hex = Integer.toHexString(0xff & byteData[i]);
    if (hex.length() == 1)
        hexString.append('0');
    hexString.append(hex);
}

When you store the password in MongoDB, store the value of hexString.toString().

6. MongoDB realm source code

The source code of the Tomcat security realm implementation with MongoDB, along with the ready-to-use m2-mongodb-realm.jar, is available at

https://github.com/gzugic/mongortom

You just need to copy m2-mongodb-realm.jar to your $CATALINA_HOME/lib.