Lesson 5
Schema Validation in MongoDB
Introduction

Welcome back to our journey through MongoDB. In the last few lessons, we explored the data types MongoDB supports and tackled different kinds of relationships: one-to-one, one-to-many, and many-to-many. Now we'll add another tool to our toolkit: Schema Validation. Validation lets us enforce rules on our data so that every document stored in a collection meets the requirements we define.

Structuring Documents

As mentioned earlier, MongoDB is schemaless, which means that a single collection can store documents with entirely different structures. For example:

```javascript
[
  { "title": "To Kill a Mockingbird", "published_year": NumberInt(1960) },
  { "name": "To Kill a Mockingbird", "published_at": NumberInt(1960) },
  { "book_title": "To Kill a Mockingbird", "year": NumberInt(1960) }
]
```

Although this is technically possible, developers usually make sure that all documents in a collection share a similar structure, or at least a consistent set of core fields. Otherwise, queries become harder to write and data integrity is harder to guarantee. For instance:

```javascript
[
  { "title": "To Kill a Mockingbird", "author": "Harper Lee", "published_year": NumberInt(1960), "genre": "Fiction", "copies_sold": NumberInt(40000000) },
  { "title": "1984", "author": "George Orwell", "published_year": NumberInt(1949), "cover_type": "Hardcover" },
  { "title": "Brave New World", "author": "Aldous Huxley", "published_year": NumberInt(1932), "language": "English", "pages": NumberInt(288) }
]
```

In this example, while title, author, and published_year are the core fields required for each document, several optional fields like genre, copies_sold, cover_type, language, and pages are also included. These optional fields provide additional information but aren't mandatory for each document. To enforce the required fields and maintain consistency, we can use Schema Validation.
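To make the required/optional split concrete, here is a plain-JavaScript illustration (not MongoDB code) that checks only for the core fields and ignores everything else. The helper name `missingCoreFields` and the sample data are hypothetical.

```javascript
// Hypothetical helper: lists which core fields a document is missing.
// Optional fields (genre, pages, cover_type, ...) are simply ignored.
const CORE_FIELDS = ["title", "author", "published_year"];

function missingCoreFields(doc) {
  return CORE_FIELDS.filter(
    (field) => !Object.prototype.hasOwnProperty.call(doc, field)
  );
}

const books = [
  { title: "1984", author: "George Orwell", published_year: 1949, cover_type: "Hardcover" },
  { name: "To Kill a Mockingbird", published_at: 1960 }
];

for (const book of books) {
  console.log(missingCoreFields(book));
}
```

The first document has every core field plus an optional one, so nothing is reported; the second uses different field names, so all three core fields are reported missing. Schema Validation lets MongoDB perform this kind of check for us on every write.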

Schema Validation in MongoDB

Schema validation allows us to define specific rules that all documents in a collection must follow. These rules can range from simple—such as requiring a field to be a string—to complex, like ensuring a numerical field falls within a certain range. By implementing validation, we ensure data integrity and consistency across our collections.

MongoDB provides two important options for fine-tuning schema validation: validationLevel and validationAction.

The validationLevel option specifies which write operations MongoDB checks against the rules:

  • strict: This is the default setting. MongoDB applies the validation rules to every insert and every update. Documents already in the collection are not re-checked when the rules are added, but any update to one of them must leave it valid.
  • moderate: MongoDB applies the rules to inserts and to updates of documents that already satisfy them. Updates to existing documents that were already invalid are not checked, so those documents can still be modified without errors.
  • off: Validation is turned off for the collection.
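The difference between the levels comes down to which writes get checked. Following MongoDB's documented behavior (strict checks every insert and update, moderate skips updates to documents that were already invalid, and off checks nothing), here is a simplified plain-JavaScript model. It is not MongoDB's implementation; the function `isWriteValidated` and its parameters are hypothetical.

```javascript
// Simplified model of validationLevel (not MongoDB's implementation).
// op: "insert" or "update"
// storedDocWasValid: for updates, whether the stored document already
//                    satisfied the rules before this write
function isWriteValidated(validationLevel, op, storedDocWasValid) {
  switch (validationLevel) {
    case "strict":
      return true; // every insert and every update is checked
    case "moderate":
      // inserts are checked; updates only if the stored document was valid
      return op === "insert" || storedDocWasValid === true;
    case "off":
      return false; // nothing is checked
    default:
      throw new Error(`unknown validationLevel: ${validationLevel}`);
  }
}

console.log(isWriteValidated("strict", "update", false));   // true
console.log(isWriteValidated("moderate", "update", false)); // false
console.log(isWriteValidated("moderate", "insert", false)); // true
```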

The validationAction option specifies what should happen when validation fails:

  • error: This is the default setting. Any document that does not meet the validation rules will cause the insertion or update operation to fail with an error.
  • warn: Instead of failing, the operation proceeds and MongoDB records the violation in its server log. This is useful for tracking rule violations without rejecting data, for example while trying out new rules on a collection that already holds data.
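Where validationLevel decides whether a write is checked, validationAction decides what happens when the check fails. The sketch below is a simplified plain-JavaScript model, not MongoDB code; the function name `applyValidationAction` and its string results are hypothetical labels.

```javascript
// Simplified model of validationAction (not MongoDB's implementation):
// decides what happens to a write after validation has run.
function applyValidationAction(validationAction, passedValidation) {
  if (passedValidation) {
    return "accepted";
  }
  switch (validationAction) {
    case "error":
      return "rejected"; // the insert or update fails with an error
    case "warn":
      return "accepted-with-warning"; // the write succeeds; the violation is logged
    default:
      throw new Error(`unknown validationAction: ${validationAction}`);
  }
}

console.log(applyValidationAction("error", false)); // rejected
console.log(applyValidationAction("warn", false));  // accepted-with-warning
console.log(applyValidationAction("error", true));  // accepted
```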

You can specify these options when creating or modifying a collection with schema validation.

Creating a New Collection with Schema Validation

Let's jump into a practical example. Suppose we're creating a new collection called journals. We want each journal document to have three fields: title, author, and published_year. We also want published_year to be no earlier than 1440 (around the time Gutenberg's printing press was invented!).

This is how we do it:

```javascript
use library_db

db.createCollection("journals", {
  validator: { // Start of the validator object
    $jsonSchema: {
      bsonType: "object", // The documents must be objects
      required: ["title", "author", "published_year"], // These fields are required
      properties: {
        title: {
          bsonType: "string", // The 'title' field must be a string
          description: "must be a string and is required"
        },
        author: {
          bsonType: "string", // The 'author' field must be a string
          description: "must be a string and is required"
        },
        published_year: {
          bsonType: "int", // The 'published_year' field must be an integer
          minimum: 1440, // 'published_year' cannot be earlier than 1440
          description: "must be an integer no less than 1440 and is required"
        }
      }
    }
  },
  validationLevel: "strict", // Setting validation level to 'strict'
  validationAction: "error" // Setting validation action to 'error'
});
```

This command creates a new journals collection in the library_db database with a schema validator. From now on, every document inserted into or updated in the journals collection must satisfy the rules we've set.
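To see which documents this validator would accept or reject, here is a rough plain-JavaScript re-implementation of just these three rules. It is an approximation, not how MongoDB evaluates $jsonSchema: JavaScript has a single number type, so `Number.isInteger` stands in for the BSON "int" check. The function name `validateJournal` and the sample documents are hypothetical.

```javascript
// Hypothetical plain-JS approximation of the journals validator above.
// Returns a list of rule violations; an empty list means the document passes.
function validateJournal(doc) {
  const has = (field) => Object.prototype.hasOwnProperty.call(doc, field);
  const errors = [];

  // required: ["title", "author", "published_year"]
  for (const field of ["title", "author", "published_year"]) {
    if (!has(field)) errors.push(`${field} is required`);
  }

  // title and author: bsonType "string"
  for (const field of ["title", "author"]) {
    if (has(field) && typeof doc[field] !== "string") {
      errors.push(`${field} must be a string`);
    }
  }

  // published_year: bsonType "int" with minimum 1440.
  // Number.isInteger only approximates BSON int, since JS has one number type.
  if (has("published_year")) {
    const year = doc.published_year;
    if (!Number.isInteger(year)) {
      errors.push("published_year must be an integer");
    } else if (year < 1440) {
      errors.push("published_year must be no less than 1440");
    }
  }
  return errors;
}

console.log(validateJournal({ title: "Example Journal", author: "A. Writer", published_year: 1905 })); // []
console.log(validateJournal({ title: "Old Chronicle", author: "Unknown", published_year: 1200 }));
// [ 'published_year must be no less than 1440' ]
```

With validationAction set to "error", MongoDB would reject the second document, while the first would be stored normally.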

Modifying an Existing Collection with Schema Validation

But what if we want to add schema validation to a collection that already exists, or change the validation rules we defined for a collection? MongoDB has us covered. Let's say we want to update the validation rules for the journals collection so that published_year must be at least 1800 instead of the 1440 we defined earlier. Here's how we can do that:

```javascript
use library_db

db.runCommand({
  collMod: "journals",
  validator: { // collMod replaces the existing validator as a whole, so the full schema is repeated
    $jsonSchema: {
      bsonType: "object",
      required: ["title", "author", "published_year"],
      properties: {
        title: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        author: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        published_year: {
          bsonType: "int",
          minimum: 1800, // This field was updated
          description: "must be an integer no less than 1800 and is required"
        }
      }
    }
  },
  validationLevel: "strict",
  validationAction: "error"
});
```

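In this command, we call db.runCommand() with the collMod command to update the schema validation of an existing collection. Because collMod replaces the whole validator rather than merging into it, we repeat the full schema; the only change is that published_year must now be no less than 1800. Documents already stored are not re-checked by collMod; with validationLevel set to "strict", the new rule applies to every subsequent insert and update.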
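Since collMod does not re-check stored documents, a journal saved under the old 1440 minimum can remain in the collection even though it now breaks the 1800 rule. The plain-JavaScript sketch below, using hypothetical in-memory data rather than a real collection, picks out the documents that were valid before the change but would fail the new minimum.

```javascript
// Hypothetical in-memory data: documents inserted while the minimum was 1440.
const storedJournals = [
  { title: "Early Gazette", author: "J. Doe", published_year: 1665 },
  { title: "Modern Review", author: "A. Smith", published_year: 1923 }
];

const OLD_MINIMUM = 1440;
const NEW_MINIMUM = 1800;

// Documents that satisfied the old rule but would be rejected by the new one.
// collMod leaves them in place; they are only checked when next written.
const nowNonConforming = storedJournals.filter(
  (doc) => doc.published_year >= OLD_MINIMUM && doc.published_year < NEW_MINIMUM
);

console.log(nowNonConforming.map((doc) => doc.title)); // [ 'Early Gazette' ]
```

With validationLevel "strict" and validationAction "error", the next update to Early Gazette, even one that only changes its title, would be rejected unless published_year is also raised to at least 1800. In mongosh, MongoDB's documentation describes finding such documents by wrapping the schema in a $nor query, along the lines of db.journals.find({ $nor: [{ $jsonSchema: <schema> }] }), where <schema> stands for the full schema object passed to collMod.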

Validating Basic Fields

MongoDB allows you to validate basic BSON types such as strings, numbers, booleans, and more. Here is how you can validate a document's structure for various basic fields:

```javascript
use library_db

// createCollection does not overwrite an existing collection's validator, so drop
// the journals collection from the earlier example (db.journals.drop()) or use a
// different collection name before running this.
db.createCollection("journals", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["title", "pages", "isPublished", "publishDate", "createdAt", "metadata", "price", "views"],
      properties: {
        title: { bsonType: "string", description: "must be a string and is required" },
        pages: { bsonType: "int", minimum: 1, description: "must be a positive integer and is required" },
        isPublished: { bsonType: "bool", description: "must be a boolean and is required" },
        publishDate: { bsonType: "date", description: "must be a Date object and is required" },
        createdAt: { bsonType: "timestamp", description: "must be a Timestamp object and is required" },
        metadata: { bsonType: ["null", "object"], description: "must be either null or an object and is required" },
        price: { bsonType: "decimal", description: "must be a decimal and is required" },
        views: { bsonType: "long", description: "must be a long integer and is required" }
      }
    }
  }
});
```

In this example, the journals collection requires the fields title, pages, isPublished, publishDate, createdAt, metadata, price, and views, each with a specific BSON type. For metadata, listing an array of BSON types, ["null", "object"], means the value may be either of those types. Because metadata is also in the required list, the field must be present, but it may hold null.
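The "array of types" rule means a value passes if it matches at least one listed type. The plain-JavaScript sketch below models that behavior for a few type names; the mapping from BSON type names to JavaScript checks is a hypothetical approximation (types such as int, long, decimal, date, and timestamp have no exact plain-JS equivalent and are left out), and `matchesBsonType` is not a MongoDB function.

```javascript
// Approximate JS stand-ins for a few BSON type names (hypothetical mapping).
const TYPE_CHECKS = {
  "null": (v) => v === null,
  "object": (v) => typeof v === "object" && v !== null && !Array.isArray(v),
  "string": (v) => typeof v === "string",
  "bool": (v) => typeof v === "boolean",
  "array": (v) => Array.isArray(v)
};

// bsonType may be a single name or an array of names; a value passes
// if it matches at least one of them.
function matchesBsonType(value, bsonType) {
  const names = Array.isArray(bsonType) ? bsonType : [bsonType];
  return names.some((name) => {
    const check = TYPE_CHECKS[name];
    if (!check) throw new Error(`type not covered by this sketch: ${name}`);
    return check(value);
  });
}

console.log(matchesBsonType(null, ["null", "object"]));                 // true
console.log(matchesBsonType({ source: "import" }, ["null", "object"])); // true
console.log(matchesBsonType("n/a", ["null", "object"]));                // false
```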

Validating Nested Objects

When dealing with nested objects, you can define validation rules for each nested field. Here's an example of validating an object structure:

```javascript
use library_db

db.createCollection("journals", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["title", "editor"],
      properties: {
        title: { bsonType: "string", description: "must be a string and is required" },
        editor: {
          bsonType: "object",
          required: ["name", "years_of_experience"],
          properties: {
            name: { bsonType: "string", description: "must be a string and is required" },
            years_of_experience: { bsonType: "int", minimum: 1, description: "must be an integer and at least 1" }
          },
          description: "must be an object and is required"
        }
      }
    }
  }
});
```

Here, the journals collection requires an editor field that must be an object containing name and years_of_experience with their respective rules.
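A nested object is validated by applying the same kinds of rules one level down. The sketch below is a tiny, hypothetical subset of $jsonSchema checking written in plain JavaScript (it supports only bsonType "object", "string", and "int", plus required, properties, and minimum); it recurses into properties to mirror the editor schema above. It is an illustration, not MongoDB's validator, and `checkSchema` is a made-up name.

```javascript
// Tiny hypothetical subset of $jsonSchema checking, in plain JS.
function checkSchema(schema, value, path = "document") {
  const has = (obj, field) => Object.prototype.hasOwnProperty.call(obj, field);
  const errors = [];
  switch (schema.bsonType) {
    case "object":
      if (typeof value !== "object" || value === null || Array.isArray(value)) {
        return [`${path} must be an object`];
      }
      for (const field of schema.required || []) {
        if (!has(value, field)) errors.push(`${path}.${field} is required`);
      }
      // Recurse into each declared property that is present.
      for (const [field, subSchema] of Object.entries(schema.properties || {})) {
        if (has(value, field)) {
          errors.push(...checkSchema(subSchema, value[field], `${path}.${field}`));
        }
      }
      break;
    case "string":
      if (typeof value !== "string") errors.push(`${path} must be a string`);
      break;
    case "int":
      // Number.isInteger only approximates BSON int (JS has one number type).
      if (!Number.isInteger(value)) {
        errors.push(`${path} must be an integer`);
      } else if (schema.minimum !== undefined && value < schema.minimum) {
        errors.push(`${path} must be at least ${schema.minimum}`);
      }
      break;
    default:
      throw new Error(`unsupported bsonType in this sketch: ${schema.bsonType}`);
  }
  return errors;
}

// Mirrors the nested editor schema from the createCollection call above.
const journalSchema = {
  bsonType: "object",
  required: ["title", "editor"],
  properties: {
    title: { bsonType: "string" },
    editor: {
      bsonType: "object",
      required: ["name", "years_of_experience"],
      properties: {
        name: { bsonType: "string" },
        years_of_experience: { bsonType: "int", minimum: 1 }
      }
    }
  }
};

console.log(checkSchema(journalSchema, { title: "Field Notes", editor: { name: "R. Lee", years_of_experience: 0 } }));
// [ 'document.editor.years_of_experience must be at least 1' ]
```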

Validating Array Fields

MongoDB also supports validation of arrays and their contents. Below is an example of how to enforce rules on array fields:

```javascript
use library_db

db.createCollection("journals", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["keywords", "ratings"],
      properties: {
        keywords: {
          bsonType: "array",
          items: { bsonType: "string", description: "each item must be a string" },
          description: "must be an array of strings and is required"
        },
        ratings: {
          bsonType: "array",
          items: { bsonType: "int", minimum: 1, maximum: 5, description: "each rating must be an integer between 1 and 5" },
          description: "must be an array of integers and is required"
        }
      }
    }
  }
});
```

In this snippet, the journals collection requires keywords to be an array of strings and ratings to be an array of integers between 1 and 5, inclusive.
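The items keyword applies its rule to every element of the array. The plain-JavaScript sketch below approximates the two array rules above; `checkArrays` is a hypothetical name, a missing field is reported simply as "must be an array" for brevity, and `Number.isInteger` stands in for the BSON "int" check.

```javascript
// Hypothetical plain-JS version of the two array rules above:
// keywords: array whose items are all strings
// ratings:  array whose items are all integers from 1 to 5 (inclusive)
function checkArrays(doc) {
  const errors = [];
  if (!Array.isArray(doc.keywords)) {
    errors.push("keywords must be an array");
  } else if (!doc.keywords.every((k) => typeof k === "string")) {
    errors.push("every keyword must be a string");
  }
  if (!Array.isArray(doc.ratings)) {
    errors.push("ratings must be an array");
  } else if (!doc.ratings.every((r) => Number.isInteger(r) && r >= 1 && r <= 5)) {
    errors.push("every rating must be an integer between 1 and 5");
  }
  return errors;
}

console.log(checkArrays({ keywords: ["mongodb", "schema"], ratings: [5, 4, 3] })); // []
console.log(checkArrays({ keywords: [], ratings: [] })); // []
```

The second call shows that empty arrays pass: items only constrains elements that are present, so requiring at least one element would need an extra rule such as minItems.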

Summary

Great job making it this far! We've covered what schema validation in MongoDB is, why it's useful, and how to apply it to new and existing collections, using the journals collection to validate basic BSON types, nested objects, and arrays.

Next, you'll be designing your own schemas and applying them to various collections through hands-on exercises. This practice will solidify your skills and prepare you for our next topic: data querying and retrieval. Keep up the impressive work as we continue to dive deeper into MongoDB's capabilities.
