MongoDB vs. Relational Databases: Myth vs. Reality in NoSQL Design

 
MongoDB
 
A document oriented database
 
Recap
 
Document oriented DB (vs. Relational) [aka noSQL]
Collections instead of tables
Relationships not required between collections
Consistency between collections objects not required
Data stored (invisibly) as text files
Data manipulation via Mongo API
 
Myth vs. Reality of noSQL DBs
 
While it is true that noSQL/ Mongo does not require a rigid schema and multiple
tables/ formal relationships are not required …
Myth
No design is needed
Reality
You STILL need a schema design
in some ways more important now, since it becomes too easy to create ‘random’ data
Myth
No relationships needed
Reality
It’s a choice.  You CAN embed all data in each document, or you can refer to other
documents
 
Myth vs. Reality of noSQL DBs
 
Reality
Lists of items are acceptable
Duplication of data is acceptable
 
Limits
There is a 16MB limit in document size.  If you embed too much …
Sooo … use references
 
Embedding vs. Referencing
 
MongoDB schema design actually comes down to only two choices for every piece of data. You can either
embed that data directly or reference another piece of data using the 
$lookup
 operator (similar to a JOIN).
Let's look at the pros and cons of using each option in your schema.
Embedding
Advantages
You can retrieve all relevant information in a single query.
Avoid implementing joins in application code or using 
$lookup
.
Update related information as a single atomic operation. 
By default, all CRUD operations on a single
document are ACID compliant
.
However, if you need a transaction across multiple operations, you can use the 
transaction operator
.
Though 
transactions are available starting 4.0
, however, I should add that it's an anti-pattern to be overly
reliant on using them in your application.
Limitations
Large documents mean more overhead if most fields are not relevant. You can increase query performance
by limiting the size of the documents that you are sending over the wire for each query.
There is a 
16-MB document size limit in MongoDB
. If you are embedding too much data inside a single
document, you could potentially hit this limit.
 
MongoDB Schema Design Best Practices | MongoDB
 
Referencing
 
Okay, so the other option for designing our schema is referencing another document
using a document's 
unique object ID
 and connecting them together using
the 
$lookup
 operator. Referencing works similarly as the JOIN operator in an SQL query.
It allows us to split up data to make more efficient and scalable queries, yet maintain
relationships between data.
Advantages
By splitting up data, you will have smaller documents.
Less likely to reach 
16-MB-per-document limit
.
Infrequently accessed information not needed on every query.
Reduce the amount of duplication of data. However, it's important to note that data
duplication should not be avoided if it results in a better schema.
Limitations
In order to retrieve all the data in the referenced documents, a minimum of two queries
or 
$lookup
 required to retrieve all the information.
 
Working with MongoDB
 
We will be using the native mongodb drive from Nodejs
 
Quick Reference - Node.js Driver v6.8 (mongodb.com)
We’ll look at a few scenarios using this library
Creating a collection (aka a table)
 
Adding a document (aka a row) to a collection
 
Getting a collection
 
Getting data
 
Querying multiple collections
MongoDB perennial question
 
Should I be in a relationship?
What kind of relationship should I be in?
Answer
It depends
Rule(s) of thumb
If it’s a one-to-one: Keep data in the object
If it’s one-to-many
How many? (One to a few?  One to dozens, hundreds, thousands …)
Does the ‘other data’ need to exist separately?
If both answers are ‘no’, then embed the data
If the other data needs to exist separately … then use a reference
Anything else ‘it depends’ (but if you have many, many … probably a reference)
 
mongosh
 
The command shell for mongo (akin to psql)
 
swen343db> db['Food-HR'].find()
[
  {
    _id: ObjectId('667dd6330243dd1dbbc0e385'),
    'First name': 'John',
    'Last name': 'Doe',
    Email: 'johndoe@example.com',
    'Home address': '123 Main St',
    'Job title': 'Manager',
    Salary: '60000',
    Department: 'Management',
    'Hire date': '2020-03-15'
  },
  {
 
> mongosh
test> use swen343db
switched to db swen343db
swen343db> show collections
auto-models-v2
auto-teams-v2
autos
Food-HR
Food-IT
Food-payroll
….
swen343db> db['Food-HR'].count()
50
 
RDB vs Doc DB
 
If your mindset is still in Relational DB, this might be a useful
reference
SQL to MongoDB Mapping Chart - MongoDB Manual v7.0
 
e.g.
 
Primary Key?
 
There isn’t really the same concept (no ‘rows’ in mongo), however, each
document is assigned a system generated unique id per document
This is the field “_id” (becomes part of the document)
You can override the system to create your own ‘unique id’ … not recommended
Generating Globally Unique Identifiers for Use with MongoDB | MongoDB
Instead, let the system create the _id value, and use it as a ‘primary key’ type
identifier
In JS, this is an object, and the mongodb module has its own ObjectId type for this
import
 { 
ObjectId
 } 
from
 
"mongodb"
;
And then you can do things like
 
    
const
 
filter
 = { 
_id:
 
ObjectId
.
createFromHexString
(
idValue
) };
 
Updating data
 
Mongodb has commands to help manage data
Update a document (the $set is the mongo command with the data as the
param)
const
 
updateDoc
 = { 
$set:
 
data
 };
//Update the whole row
const
 
result
 = 
await
 
collection
.
updateOne
(
filter
, 
updateDoc
);
 
Create a new document
   
const
 
result
 = 
await
 
collection
.
insertOne
(
data
);
 
Searching for data
db.getCollections(‘collection-name’).find(
{‘field1’:’value1’})
//Any number of parameters in the find clause
//If you have a hyphen in the collection name, you will have to use
getCollection to retrieve it vs. using the name of the collection directly
 
And, of course, data can be sorted
db.getCollections(‘collection-name’).find(
{‘field1’:’value1’}).sort({‘fieldX’: 1}) //1 = ascending, -1 = descending
 
A few suggestions/ cautions
 
Some commands are different between monosh and mongodb for
nodejs
Always check the documentation for each
Commands can be complex
I have found it useful create a single purpose API (JS method) for a particular
situation and test locally before creating a more general purpose RESTful API
with flexible parameters
Implement server method
Implement Jest test
Once it works, refactor for flexibility
Implement RESTful API
Implement client UI
 
Advanced Mongo
 
 
More complex DB work
 
Working within a single set of documents and basic fields is pretty
straightforward
And this is one of the attractive parts of Mongo (quick startup)
Doing more complex queries is another story
Personal view: SQL is easier to work with; but Mongo exists, and you work
with what you have!
 
Aggregation
    
let
 
pipeline
 = [ {
        
$group:
({ 
_id:
 
`$
${
field
}
`
, 
count:
 { 
$sum:
 
1
 }} )
      }];
    
let
 
aggCursor
 =  
collection
.
aggregate
(
pipeline
);
    
let
 
result
 = [];
    
for
 
await
 (
const
 
doc
 
of
 
aggCursor
) {
        
result
.
push
(
doc
);
    }
 
This is similar to the GROUP
and COUNT(*) in SQL
Note the aggregation works as
a sequence of operations
(pipeline).
Specify the field you want to
group by and then the
operation (count, in this case)
Also note you get a cursor
back and need to iterate the
cursor to get the actual data
 
Aggregation … more
    
let
 
pipeline
 = [ {
        
$group:
({ 
_id:
 
`$
${
field
}
`
, 
averageSalary:
 {
'$avg'
:
 
`$Gross Pay This Period`
 }} )
      }];
    
let
 
aggCursor
 =  
collection
.
aggregate
(
pipeline
);
 
And you can do average …
Note how you can handle
annoying column names (with
spaces … or just don’t do
that!)
 
Even more …
 
The pipeline sequences operations
So if you want to aggregate and then sort, the array has two stages
 
 
 
 
 
Note how the variable ‘sortField’ is treated in []
        
pipeline
 = [
            { 
$group:
({ 
_id:
 
'$ Condition'
, 
count:
 { 
$sum:
 
1
 }} ) },
            {
$sort:
 ({ 
[sortField] :
 
1
})}
        ];
 
Joins
 
But what if you want to connect data from more than one collection
$lookup: This is the basic tool in mongo for joins
However!!!  It does the equivalent of an outer left join! (Don’t ask me why)
The good news
You can still get what you need
The heart of this is still $lookup, but you use more of the mongo ‘pipeline’
approach
 
It’s as easy as …
     
let
 
aggCursor
 = 
db
.
collection
(
‘uni-students'
).
aggregate
([{
        
$lookup:
 {
            
from:
 
“student-grades"
, 
//The 2nd collection
            
localField:
 
“student_id"
, 
//The join field in the 1st collection
            
foreignField:
 
“student_id"
, 
//The join field in the 2nd collection
            
as:
 
"result"
 
//The output
          }
      },{
         
$match:
 {
             
result:
 {
               
$ne:
 [] 
//The removal of non-matching fields (inner join)
             }
          }
      },{
        
$addFields:
 {
             
result:
 {
               
$arrayElemAt:
 [
"$result"
,
0
] 
//Put remaining fields back into the result
             }
        }
      },{
         
$project:
 { 
//Create the joined 'view', picking the desired fields
            
‘StudentID
:
’$student_id'
,
            
‘Student
 First Name’
:
’$Student First Name'
,
            
‘Student Last Name'
:
 
‘$Student Last Name'
,
            
Degree:
’$Degree'
,
            
‘Advisor'
:
 
‘$Advisor'
,
            
Year:
 
"$Year"
,
            
Grade:
 
"$result.
grade
"
 
//Note this is from the result
         }
      }])
    
let
 
joinedData
 = [];
    
for
 
await
 (
const
 
doc
 
of
 
aggCursor
) {
        
joinedData
.
push
(
doc
);
    }
Slide Note
Embed
Share

MongoDB is a document-oriented database that challenges traditional relational databases by offering flexible schema designs. While myths suggest no need for schema design or relationships in MongoDB, the reality emphasizes the importance of thoughtful schema design and choice in data relationships. The discussion between embedding and referencing data in MongoDB schema design presents advantages and limitations for each approach.

  • MongoDB
  • Relational Databases
  • NoSQL
  • Schema Design
  • Data Relationships

Uploaded on Jul 16, 2024 | 2 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. MongoDB A document oriented database

  2. Recap Document oriented DB (vs. Relational) [aka noSQL] Collections instead of tables Relationships not required between collections Consistency between collections objects not required Data stored (invisibly) as text files Data manipulation via Mongo API

  3. Myth vs. Reality of noSQL DBs While it is true that noSQL/ Mongo does not require a rigid schema and multiple tables/ formal relationships are not required Myth No design is needed Reality You STILL need a schema design in some ways more important now, since it becomes too easy to create random data Myth No relationships needed Reality It s a choice. You CAN embed all data in each document, or you can refer to other documents

  4. Myth vs. Reality of noSQL DBs Reality Lists of items are acceptable Duplication of data is acceptable Limits There is a 16MB limit in document size. If you embed too much Sooo use references

  5. Embedding vs. Referencing MongoDB schema design actually comes down to only two choices for every piece of data. You can either embed that data directly or reference another piece of data using the $lookup operator (similar to a JOIN). Let's look at the pros and cons of using each option in your schema. Embedding Advantages You can retrieve all relevant information in a single query. Avoid implementing joins in application code or using $lookup. Update related information as a single atomic operation. By default, all CRUD operations on a single document are ACID compliant. However, if you need a transaction across multiple operations, you can use the transaction operator. Though transactions are available starting 4.0, however, I should add that it's an anti-pattern to be overly reliant on using them in your application. Limitations Large documents mean more overhead if most fields are not relevant. You can increase query performance by limiting the size of the documents that you are sending over the wire for each query. There is a 16-MB document size limit in MongoDB. If you are embedding too much data inside a single document, you could potentially hit this limit. MongoDB Schema Design Best Practices | MongoDB

  6. Referencing Okay, so the other option for designing our schema is referencing another document using a document's unique object ID and connecting them together using the $lookup operator. Referencing works similarly as the JOIN operator in an SQL query. It allows us to split up data to make more efficient and scalable queries, yet maintain relationships between data. Advantages By splitting up data, you will have smaller documents. Less likely to reach 16-MB-per-document limit. Infrequently accessed information not needed on every query. Reduce the amount of duplication of data. However, it's important to note that data duplication should not be avoided if it results in a better schema. Limitations In order to retrieve all the data in the referenced documents, a minimum of two queries or $lookup required to retrieve all the information.

  7. Working with MongoDB We will be using the native mongodb drive from Nodejs Quick Reference - Node.js Driver v6.8 (mongodb.com) We ll look at a few scenarios using this library Creating a collection (aka a table) Adding a document (aka a row) to a collection Getting a collection Getting data Querying multiple collections

  8. MongoDB perennial question Should I be in a relationship? What kind of relationship should I be in? Answer It depends Rule(s) of thumb If it s a one-to-one: Keep data in the object If it s one-to-many How many? (One to a few? One to dozens, hundreds, thousands ) Does the other data need to exist separately? If both answers are no , then embed the data If the other data needs to exist separately then use a reference Anything else it depends (but if you have many, many probably a reference)

  9. mongosh The command shell for mongo (akin to psql) swen343db> db['Food-HR'].find() [ > mongosh test> use swen343db switched to db swen343db swen343db> show collections auto-models-v2 auto-teams-v2 autos Food-HR Food-IT Food-payroll . swen343db> db['Food-HR'].count() 50 { _id: ObjectId('667dd6330243dd1dbbc0e385'), 'First name': 'John', 'Last name': 'Doe', Email: 'johndoe@example.com', 'Home address': '123 Main St', 'Job title': 'Manager', Salary: '60000', Department: 'Management', 'Hire date': '2020-03-15' }, {

  10. RDB vs Doc DB If your mindset is still in Relational DB, this might be a useful reference SQL to MongoDB Mapping Chart - MongoDB Manual v7.0 e.g. SELECT * FROM people db.people.find() SELECT COUNT(*) FROM people db.people.count() INSERT INTO people(user_id, age, status) VALUES ("bcd001", 45, "A") db.people.insertOne( { user_id: "bcd001", age: 45, status: "A" } )

  11. Primary Key? There isn t really the same concept (no rows in mongo), however, each document is assigned a system generated unique id per document This is the field _id (becomes part of the document) You can override the system to create your own unique id not recommended Generating Globally Unique Identifiers for Use with MongoDB | MongoDB Instead, let the system create the _id value, and use it as a primary key type identifier In JS, this is an object, and the mongodb module has its own ObjectId type for this import { ObjectId } from "mongodb"; And then you can do things like const filter = { _id: ObjectId.createFromHexString(idValue) };

  12. Updating data Mongodb has commands to help manage data Update a document (the $set is the mongo command with the data as the param) const updateDoc = { $set: data };//Update the whole row const result = await collection.updateOne(filter, updateDoc); Create a new document const result = await collection.insertOne(data);

  13. Searching for data db.getCollections( collection-name ).find( { field1 : value1 }) //Any number of parameters in the find clause //If you have a hyphen in the collection name, you will have to use getCollection to retrieve it vs. using the name of the collection directly And, of course, data can be sorted db.getCollections( collection-name ).find( { field1 : value1 }).sort({ fieldX : 1}) //1 = ascending, -1 = descending

  14. A few suggestions/ cautions Some commands are different between monosh and mongodb for nodejs Always check the documentation for each Commands can be complex I have found it useful create a single purpose API (JS method) for a particular situation and test locally before creating a more general purpose RESTful API with flexible parameters Implement server method Implement Jest test Once it works, refactor for flexibility Implement RESTful API Implement client UI

  15. Advanced Mongo

  16. More complex DB work Working within a single set of documents and basic fields is pretty straightforward And this is one of the attractive parts of Mongo (quick startup) Doing more complex queries is another story Personal view: SQL is easier to work with; but Mongo exists, and you work with what you have!

  17. Aggregation let pipeline = [ { This is similar to the GROUP and COUNT(*) in SQL Note the aggregation works as a sequence of operations (pipeline). Specify the field you want to group by and then the operation (count, in this case) Also note you get a cursor back and need to iterate the cursor to get the actual data $group:({ _id: `$${field}`, count: { $sum: 1 }} ) }]; let aggCursor = collection.aggregate(pipeline); let result = []; for await (const doc of aggCursor) { result.push(doc); }

  18. Aggregation more let pipeline = [ { And you can do average $group:({ _id: `$${field}`, averageSalary: { '$avg': `$Gross Pay This Period` }} ) Note how you can handle annoying column names (with spaces or just don t do that!) }]; let aggCursor = collection.aggregate(pipeline);

  19. Even more The pipeline sequences operations So if you want to aggregate and then sort, the array has two stages pipeline = [ { $group:({ _id: '$ Condition', count: { $sum: 1 }} ) }, {$sort: ({ [sortField] : 1})} ]; Note how the variable sortField is treated in []

  20. Joins But what if you want to connect data from more than one collection $lookup: This is the basic tool in mongo for joins However!!! It does the equivalent of an outer left join! (Don t ask me why) The good news You can still get what you need The heart of this is still $lookup, but you use more of the mongo pipeline approach

  21. Its as easy as let aggCursor = db.collection( uni-students').aggregate([{ $lookup: { from: student-grades", //The 2nd collection localField: student_id", //The join field in the 1st collection foreignField: student_id", //The join field in the 2nd collection as: "result" //The output } },{$match: { result: { $ne: [] //The removal of non-matching fields (inner join) } } },{ $addFields: { result: { $arrayElemAt: ["$result",0] //Put remaining fields back into the result } } },{$project: { //Create the joined 'view', picking the desired fields StudentID : $student_id', Student First Name : $Student First Name', Student Last Name': $Student Last Name', Degree: $Degree', Advisor': $Advisor', Year: "$Year", Grade: "$result.grade" //Note this is from the result } }]) let joinedData = []; for await (const doc of aggCursor) { joinedData.push(doc); }

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#