MongoDB vs. Relational Databases: Myth vs. Reality in NoSQL Design

MongoDB

A document oriented database

Recap

•

Document oriented DB (vs. Relational) [aka noSQL]

•

Collections instead of tables

•

Relationships not required between collections

•

Consistency between collections objects not required

•

Data stored (invisibly) as text files

•

Data manipulation via Mongo API

Myth vs. Reality of noSQL DBs

•

While it is true that noSQL/ Mongo does not require a rigid schema and multiple

tables/ formal relationships are not required …

Myth

•

No design is needed

Reality

•

You STILL need a schema design

•

in some ways more important now, since it becomes too easy to create ‘random’ data

Myth

•

No relationships needed

Reality

•

It’s a choice.  You CAN embed all data in each document, or you can refer to other

documents

Myth vs. Reality of noSQL DBs

Reality

•

Lists of items are acceptable

•

Duplication of data is acceptable

Limits

•

There is a 16MB limit in document size.  If you embed too much …

•

Sooo … use references

Embedding vs. Referencing

MongoDB schema design actually comes down to only two choices for every piece of data. You can either

embed that data directly or reference another piece of data using the

$lookup

 operator (similar to a JOIN).

Let's look at the pros and cons of using each option in your schema.

Embedding

•

Advantages

•

You can retrieve all relevant information in a single query.

•

Avoid implementing joins in application code or using

$lookup

•

Update related information as a single atomic operation.

By default, all CRUD operations on a single

document are ACID compliant

•

However, if you need a transaction across multiple operations, you can use the

transaction operator

•

Though

transactions are available starting 4.0

, however, I should add that it's an anti-pattern to be overly

reliant on using them in your application.

•

Limitations

•

Large documents mean more overhead if most fields are not relevant. You can increase query performance

by limiting the size of the documents that you are sending over the wire for each query.

•

There is a

16-MB document size limit in MongoDB

. If you are embedding too much data inside a single

document, you could potentially hit this limit.

MongoDB Schema Design Best Practices | MongoDB

Referencing

•

Okay, so the other option for designing our schema is referencing another document

using a document's

unique object ID

 and connecting them together using

the

$lookup

 operator. Referencing works similarly as the JOIN operator in an SQL query.

It allows us to split up data to make more efficient and scalable queries, yet maintain

relationships between data.

•

Advantages

•

By splitting up data, you will have smaller documents.

•

Less likely to reach

16-MB-per-document limit

•

Infrequently accessed information not needed on every query.

•

Reduce the amount of duplication of data. However, it's important to note that data

duplication should not be avoided if it results in a better schema.

•

Limitations

•

In order to retrieve all the data in the referenced documents, a minimum of two queries

or

$lookup

 required to retrieve all the information.

Working with MongoDB

We will be using the native mongodb drive from Nodejs

Quick Reference - Node.js Driver v6.8 (mongodb.com)

We’ll look at a few scenarios using this library

•

Creating a collection (aka a table)

•

Adding a document (aka a row) to a collection

•

Getting a collection

•

Getting data

•

Querying multiple collections

MongoDB perennial question

•

Should I be in a relationship?

•

What kind of relationship should I be in?

•

Answer

•

It depends

•

Rule(s) of thumb

•

If it’s a one-to-one: Keep data in the object

•

If it’s one-to-many

•

How many? (One to a few?  One to dozens, hundreds, thousands …)

•

Does the ‘other data’ need to exist separately?

•

If both answers are ‘no’, then embed the data

•

If the other data needs to exist separately … then use a reference

•

Anything else ‘it depends’ (but if you have many, many … probably a reference)

mongosh

•

The command shell for mongo (akin to psql)

swen343db> db['Food-HR'].find()

    _id: ObjectId('667dd6330243dd1dbbc0e385'),

    'First name': 'John',

    'Last name': 'Doe',

    Email: 'johndoe@example.com',

    'Home address': '123 Main St',

    'Job title': 'Manager',

    Salary: '60000',

    Department: 'Management',

    'Hire date': '2020-03-15'

},

> mongosh

test> use swen343db

switched to db swen343db

swen343db> show collections

auto-models-v2

auto-teams-v2

autos

Food-HR

Food-IT

Food-payroll

….

…

swen343db> db['Food-HR'].count()

RDB vs Doc DB

•

If your mindset is still in Relational DB, this might be a useful

reference

•

SQL to MongoDB Mapping Chart - MongoDB Manual v7.0

e.g.

Primary Key?

•

There isn’t really the same concept (no ‘rows’ in mongo), however, each

document is assigned a system generated unique id per document

•

This is the field “_id” (becomes part of the document)

•

You can override the system to create your own ‘unique id’ … not recommended

•

Generating Globally Unique Identifiers for Use with MongoDB | MongoDB

•

Instead, let the system create the _id value, and use it as a ‘primary key’ type

identifier

•

In JS, this is an object, and the mongodb module has its own ObjectId type for this

•

import

ObjectId

from

"mongodb"

•

And then you can do things like

•

const

filter

= {

_id:

ObjectId

createFromHexString

idValue

) };

Updating data

•

Mongodb has commands to help manage data

•

Update a document (the $set is the mongo command with the data as the

param)

•

const

updateDoc

= {

$set:

data

};

//Update the whole row

•

const

result

await

collection

updateOne

filter

updateDoc

);

•

Create a new document

•

const

result

await

collection

insertOne

data

);

Searching for data

•

db.getCollections(‘collection-name’).find(

{‘field1’:’value1’})

//Any number of parameters in the find clause

//If you have a hyphen in the collection name, you will have to use

getCollection to retrieve it vs. using the name of the collection directly

And, of course, data can be sorted

•

db.getCollections(‘collection-name’).find(

{‘field1’:’value1’}).sort({‘fieldX’: 1}) //1 = ascending, -1 = descending

A few suggestions/ cautions

•

Some commands are different between monosh and mongodb for

nodejs

•

Always check the documentation for each

•

Commands can be complex

•

I have found it useful create a single purpose API (JS method) for a particular

situation and test locally before creating a more general purpose RESTful API

with flexible parameters

•

Implement server method

•

Implement Jest test

•

Once it works, refactor for flexibility

•

Implement RESTful API

•

Implement client UI

Advanced Mongo

More complex DB work

•

Working within a single set of documents and basic fields is pretty

straightforward

•

And this is one of the attractive parts of Mongo (quick startup)

•

Doing more complex queries is another story

•

Personal view: SQL is easier to work with; but Mongo exists, and you work

with what you have!

Aggregation

let

pipeline

 = [ {

$group:

({

_id:

`$

${

field

count:

$sum:

 }} )

}];

let

aggCursor

collection

aggregate

pipeline

);

let

result

 = [];

for

await

const

doc

of

aggCursor

) {

result

push

doc

);

This is similar to the GROUP

and COUNT(*) in SQL

Note the aggregation works as

a sequence of operations

(pipeline).

Specify the field you want to

group by and then the

operation (count, in this case)

Also note you get a cursor

back and need to iterate the

cursor to get the actual data

Aggregation … more

let

pipeline

 = [ {

$group:

({

_id:

`$

${

field

averageSalary:

'$avg'

`$Gross Pay This Period`

 }} )

}];

let

aggCursor

collection

aggregate

pipeline

);

And you can do average …

Note how you can handle

annoying column names (with

spaces … or just don’t do

that!)

Even more …

•

The pipeline sequences operations

•

So if you want to aggregate and then sort, the array has two stages

•

Note how the variable ‘sortField’ is treated in []

pipeline

= [

$group:

({

_id:

'$ Condition'

count:

$sum:

 }} ) },

$sort:

({

[sortField] :

})}

];

Joins

•

But what if you want to connect data from more than one collection

…

•

$lookup: This is the basic tool in mongo for joins

•

However!!!  It does the equivalent of an outer left join! (Don’t ask me why)

•

The good news

•

You can still get what you need

•

The heart of this is still $lookup, but you use more of the mongo ‘pipeline’

approach

It’s as easy as …

let

aggCursor

db

collection

‘uni-students'

).

aggregate

([{

$lookup:

from:

“student-grades"

//The 2nd collection

localField:

“student_id"

//The join field in the 1st collection

foreignField:

“student_id"

//The join field in the 2nd collection

as:

"result"

//The output

},{

$match:

result:

$ne:

[]

//The removal of non-matching fields (inner join)

},{

$addFields:

result:

$arrayElemAt:

"$result"

//Put remaining fields back into the result

},{

$project:

//Create the joined 'view', picking the desired fields

‘StudentID

’

’$student_id'

‘Student

 First Name’

’$Student First Name'

‘Student Last Name'

‘$Student Last Name'

Degree:

’$Degree'

‘Advisor'

‘$Advisor'

Year:

"$Year"

Grade:

"$result.

grade

//Note this is from the result

}])

let

joinedData

 = [];

for

await

const

doc

of

aggCursor

) {

joinedData

push

doc

);

Slide Note

Embed Share

Download

MongoDB is a document-oriented database that challenges traditional relational databases by offering flexible schema designs. While myths suggest no need for schema design or relationships in MongoDB, the reality emphasizes the importance of thoughtful schema design and choice in data relationships. The discussion between embedding and referencing data in MongoDB schema design presents advantages and limitations for each approach.

scarlet Follow

Uploaded on Jul 16, 2024 | 2 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

MongoDB A document oriented database

Recap Document oriented DB (vs. Relational) [aka noSQL] Collections instead of tables Relationships not required between collections Consistency between collections objects not required Data stored (invisibly) as text files Data manipulation via Mongo API

Myth vs. Reality of noSQL DBs While it is true that noSQL/ Mongo does not require a rigid schema and multiple tables/ formal relationships are not required Myth No design is needed Reality You STILL need a schema design in some ways more important now, since it becomes too easy to create random data Myth No relationships needed Reality It s a choice. You CAN embed all data in each document, or you can refer to other documents

Myth vs. Reality of noSQL DBs Reality Lists of items are acceptable Duplication of data is acceptable Limits There is a 16MB limit in document size. If you embed too much Sooo use references

Embedding vs. Referencing MongoDB schema design actually comes down to only two choices for every piece of data. You can either embed that data directly or reference another piece of data using the $lookup operator (similar to a JOIN). Let's look at the pros and cons of using each option in your schema. Embedding Advantages You can retrieve all relevant information in a single query. Avoid implementing joins in application code or using $lookup. Update related information as a single atomic operation. By default, all CRUD operations on a single document are ACID compliant. However, if you need a transaction across multiple operations, you can use the transaction operator. Though transactions are available starting 4.0, however, I should add that it's an anti-pattern to be overly reliant on using them in your application. Limitations Large documents mean more overhead if most fields are not relevant. You can increase query performance by limiting the size of the documents that you are sending over the wire for each query. There is a 16-MB document size limit in MongoDB. If you are embedding too much data inside a single document, you could potentially hit this limit. MongoDB Schema Design Best Practices | MongoDB

Referencing Okay, so the other option for designing our schema is referencing another document using a document's unique object ID and connecting them together using the $lookup operator. Referencing works similarly as the JOIN operator in an SQL query. It allows us to split up data to make more efficient and scalable queries, yet maintain relationships between data. Advantages By splitting up data, you will have smaller documents. Less likely to reach 16-MB-per-document limit. Infrequently accessed information not needed on every query. Reduce the amount of duplication of data. However, it's important to note that data duplication should not be avoided if it results in a better schema. Limitations In order to retrieve all the data in the referenced documents, a minimum of two queries or $lookup required to retrieve all the information.

Working with MongoDB We will be using the native mongodb drive from Nodejs Quick Reference - Node.js Driver v6.8 (mongodb.com) We ll look at a few scenarios using this library Creating a collection (aka a table) Adding a document (aka a row) to a collection Getting a collection Getting data Querying multiple collections

MongoDB perennial question Should I be in a relationship? What kind of relationship should I be in? Answer It depends Rule(s) of thumb If it s a one-to-one: Keep data in the object If it s one-to-many How many? (One to a few? One to dozens, hundreds, thousands ) Does the other data need to exist separately? If both answers are no , then embed the data If the other data needs to exist separately then use a reference Anything else it depends (but if you have many, many probably a reference)

mongosh The command shell for mongo (akin to psql) swen343db> db['Food-HR'].find() [ > mongosh test> use swen343db switched to db swen343db swen343db> show collections auto-models-v2 auto-teams-v2 autos Food-HR Food-IT Food-payroll . swen343db> db['Food-HR'].count() 50 { _id: ObjectId('667dd6330243dd1dbbc0e385'), 'First name': 'John', 'Last name': 'Doe', Email: 'johndoe@example.com', 'Home address': '123 Main St', 'Job title': 'Manager', Salary: '60000', Department: 'Management', 'Hire date': '2020-03-15' }, {

RDB vs Doc DB If your mindset is still in Relational DB, this might be a useful reference SQL to MongoDB Mapping Chart - MongoDB Manual v7.0 e.g. SELECT * FROM people db.people.find() SELECT COUNT(*) FROM people db.people.count() INSERT INTO people(user_id, age, status) VALUES ("bcd001", 45, "A") db.people.insertOne( { user_id: "bcd001", age: 45, status: "A" } )

Primary Key? There isn t really the same concept (no rows in mongo), however, each document is assigned a system generated unique id per document This is the field _id (becomes part of the document) You can override the system to create your own unique id not recommended Generating Globally Unique Identifiers for Use with MongoDB | MongoDB Instead, let the system create the _id value, and use it as a primary key type identifier In JS, this is an object, and the mongodb module has its own ObjectId type for this import { ObjectId } from "mongodb"; And then you can do things like const filter = { _id: ObjectId.createFromHexString(idValue) };

Updating data Mongodb has commands to help manage data Update a document (the $set is the mongo command with the data as the param) const updateDoc = { $set: data };//Update the whole row const result = await collection.updateOne(filter, updateDoc); Create a new document const result = await collection.insertOne(data);

Searching for data db.getCollections( collection-name ).find( { field1 : value1 }) //Any number of parameters in the find clause //If you have a hyphen in the collection name, you will have to use getCollection to retrieve it vs. using the name of the collection directly And, of course, data can be sorted db.getCollections( collection-name ).find( { field1 : value1 }).sort({ fieldX : 1}) //1 = ascending, -1 = descending

A few suggestions/ cautions Some commands are different between monosh and mongodb for nodejs Always check the documentation for each Commands can be complex I have found it useful create a single purpose API (JS method) for a particular situation and test locally before creating a more general purpose RESTful API with flexible parameters Implement server method Implement Jest test Once it works, refactor for flexibility Implement RESTful API Implement client UI

Advanced Mongo

More complex DB work Working within a single set of documents and basic fields is pretty straightforward And this is one of the attractive parts of Mongo (quick startup) Doing more complex queries is another story Personal view: SQL is easier to work with; but Mongo exists, and you work with what you have!

Aggregation let pipeline = [ { This is similar to the GROUP and COUNT(*) in SQL Note the aggregation works as a sequence of operations (pipeline). Specify the field you want to group by and then the operation (count, in this case) Also note you get a cursor back and need to iterate the cursor to get the actual data $group:({ _id: `$${field}`, count: { $sum: 1 }} ) }]; let aggCursor = collection.aggregate(pipeline); let result = []; for await (const doc of aggCursor) { result.push(doc); }

Aggregation more let pipeline = [ { And you can do average $group:({ _id: `$${field}`, averageSalary: { '$avg': `$Gross Pay This Period` }} ) Note how you can handle annoying column names (with spaces or just don t do that!) }]; let aggCursor = collection.aggregate(pipeline);

Even more The pipeline sequences operations So if you want to aggregate and then sort, the array has two stages pipeline = [ { $group:({ _id: '$ Condition', count: { $sum: 1 }} ) }, {$sort: ({ [sortField] : 1})} ]; Note how the variable sortField is treated in []

Joins But what if you want to connect data from more than one collection $lookup: This is the basic tool in mongo for joins However!!! It does the equivalent of an outer left join! (Don t ask me why) The good news You can still get what you need The heart of this is still $lookup, but you use more of the mongo pipeline approach

Its as easy as let aggCursor = db.collection( uni-students').aggregate([{ $lookup: { from: student-grades", //The 2nd collection localField: student_id", //The join field in the 1st collection foreignField: student_id", //The join field in the 2nd collection as: "result" //The output } },{$match: { result: { $ne: [] //The removal of non-matching fields (inner join) } } },{ $addFields: { result: { $arrayElemAt: ["$result",0] //Put remaining fields back into the result } } },{$project: { //Create the joined 'view', picking the desired fields StudentID : $student_id', Student First Name : $Student First Name', Student Last Name': $Student Last Name', Degree: $Degree', Advisor': $Advisor', Year: "$Year", Grade: "$result.grade" //Note this is from the result } }]) let joinedData = []; for await (const doc of aggCursor) { joinedData.push(doc); }

MongoDB vs. Relational Databases: Myth vs. Reality in NoSQL Design

Download Presentation

Presentation Transcript

Related

More Related Content