MongoDB vs. Relational Databases: Myth vs. Reality in NoSQL Design

Slide Note
Embed
Share

MongoDB is a document-oriented database that challenges traditional relational databases by offering flexible schema designs. While myths suggest no need for schema design or relationships in MongoDB, the reality emphasizes the importance of thoughtful schema design and choice in data relationships. The discussion between embedding and referencing data in MongoDB schema design presents advantages and limitations for each approach.


Uploaded on Jul 16, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. MongoDB A document oriented database

  2. Recap Document oriented DB (vs. Relational) [aka noSQL] Collections instead of tables Relationships not required between collections Consistency between collections objects not required Data stored (invisibly) as text files Data manipulation via Mongo API

  3. Myth vs. Reality of noSQL DBs While it is true that noSQL/ Mongo does not require a rigid schema and multiple tables/ formal relationships are not required Myth No design is needed Reality You STILL need a schema design in some ways more important now, since it becomes too easy to create random data Myth No relationships needed Reality It s a choice. You CAN embed all data in each document, or you can refer to other documents

  4. Myth vs. Reality of noSQL DBs Reality Lists of items are acceptable Duplication of data is acceptable Limits There is a 16MB limit in document size. If you embed too much Sooo use references

  5. Embedding vs. Referencing MongoDB schema design actually comes down to only two choices for every piece of data. You can either embed that data directly or reference another piece of data using the $lookup operator (similar to a JOIN). Let's look at the pros and cons of using each option in your schema. Embedding Advantages You can retrieve all relevant information in a single query. Avoid implementing joins in application code or using $lookup. Update related information as a single atomic operation. By default, all CRUD operations on a single document are ACID compliant. However, if you need a transaction across multiple operations, you can use the transaction operator. Though transactions are available starting 4.0, however, I should add that it's an anti-pattern to be overly reliant on using them in your application. Limitations Large documents mean more overhead if most fields are not relevant. You can increase query performance by limiting the size of the documents that you are sending over the wire for each query. There is a 16-MB document size limit in MongoDB. If you are embedding too much data inside a single document, you could potentially hit this limit. MongoDB Schema Design Best Practices | MongoDB

  6. Referencing Okay, so the other option for designing our schema is referencing another document using a document's unique object ID and connecting them together using the $lookup operator. Referencing works similarly as the JOIN operator in an SQL query. It allows us to split up data to make more efficient and scalable queries, yet maintain relationships between data. Advantages By splitting up data, you will have smaller documents. Less likely to reach 16-MB-per-document limit. Infrequently accessed information not needed on every query. Reduce the amount of duplication of data. However, it's important to note that data duplication should not be avoided if it results in a better schema. Limitations In order to retrieve all the data in the referenced documents, a minimum of two queries or $lookup required to retrieve all the information.

  7. Working with MongoDB We will be using the native mongodb drive from Nodejs Quick Reference - Node.js Driver v6.8 (mongodb.com) We ll look at a few scenarios using this library Creating a collection (aka a table) Adding a document (aka a row) to a collection Getting a collection Getting data Querying multiple collections

  8. MongoDB perennial question Should I be in a relationship? What kind of relationship should I be in? Answer It depends Rule(s) of thumb If it s a one-to-one: Keep data in the object If it s one-to-many How many? (One to a few? One to dozens, hundreds, thousands ) Does the other data need to exist separately? If both answers are no , then embed the data If the other data needs to exist separately then use a reference Anything else it depends (but if you have many, many probably a reference)

  9. mongosh The command shell for mongo (akin to psql) swen343db> db['Food-HR'].find() [ > mongosh test> use swen343db switched to db swen343db swen343db> show collections auto-models-v2 auto-teams-v2 autos Food-HR Food-IT Food-payroll . swen343db> db['Food-HR'].count() 50 { _id: ObjectId('667dd6330243dd1dbbc0e385'), 'First name': 'John', 'Last name': 'Doe', Email: 'johndoe@example.com', 'Home address': '123 Main St', 'Job title': 'Manager', Salary: '60000', Department: 'Management', 'Hire date': '2020-03-15' }, {

  10. RDB vs Doc DB If your mindset is still in Relational DB, this might be a useful reference SQL to MongoDB Mapping Chart - MongoDB Manual v7.0 e.g. SELECT * FROM people db.people.find() SELECT COUNT(*) FROM people db.people.count() INSERT INTO people(user_id, age, status) VALUES ("bcd001", 45, "A") db.people.insertOne( { user_id: "bcd001", age: 45, status: "A" } )

  11. Primary Key? There isn t really the same concept (no rows in mongo), however, each document is assigned a system generated unique id per document This is the field _id (becomes part of the document) You can override the system to create your own unique id not recommended Generating Globally Unique Identifiers for Use with MongoDB | MongoDB Instead, let the system create the _id value, and use it as a primary key type identifier In JS, this is an object, and the mongodb module has its own ObjectId type for this import { ObjectId } from "mongodb"; And then you can do things like const filter = { _id: ObjectId.createFromHexString(idValue) };

  12. Updating data Mongodb has commands to help manage data Update a document (the $set is the mongo command with the data as the param) const updateDoc = { $set: data };//Update the whole row const result = await collection.updateOne(filter, updateDoc); Create a new document const result = await collection.insertOne(data);

  13. Searching for data db.getCollections( collection-name ).find( { field1 : value1 }) //Any number of parameters in the find clause //If you have a hyphen in the collection name, you will have to use getCollection to retrieve it vs. using the name of the collection directly And, of course, data can be sorted db.getCollections( collection-name ).find( { field1 : value1 }).sort({ fieldX : 1}) //1 = ascending, -1 = descending

  14. A few suggestions/ cautions Some commands are different between monosh and mongodb for nodejs Always check the documentation for each Commands can be complex I have found it useful create a single purpose API (JS method) for a particular situation and test locally before creating a more general purpose RESTful API with flexible parameters Implement server method Implement Jest test Once it works, refactor for flexibility Implement RESTful API Implement client UI

  15. Advanced Mongo

  16. More complex DB work Working within a single set of documents and basic fields is pretty straightforward And this is one of the attractive parts of Mongo (quick startup) Doing more complex queries is another story Personal view: SQL is easier to work with; but Mongo exists, and you work with what you have!

  17. Aggregation let pipeline = [ { This is similar to the GROUP and COUNT(*) in SQL Note the aggregation works as a sequence of operations (pipeline). Specify the field you want to group by and then the operation (count, in this case) Also note you get a cursor back and need to iterate the cursor to get the actual data $group:({ _id: `$${field}`, count: { $sum: 1 }} ) }]; let aggCursor = collection.aggregate(pipeline); let result = []; for await (const doc of aggCursor) { result.push(doc); }

  18. Aggregation more let pipeline = [ { And you can do average $group:({ _id: `$${field}`, averageSalary: { '$avg': `$Gross Pay This Period` }} ) Note how you can handle annoying column names (with spaces or just don t do that!) }]; let aggCursor = collection.aggregate(pipeline);

  19. Even more The pipeline sequences operations So if you want to aggregate and then sort, the array has two stages pipeline = [ { $group:({ _id: '$ Condition', count: { $sum: 1 }} ) }, {$sort: ({ [sortField] : 1})} ]; Note how the variable sortField is treated in []

  20. Joins But what if you want to connect data from more than one collection $lookup: This is the basic tool in mongo for joins However!!! It does the equivalent of an outer left join! (Don t ask me why) The good news You can still get what you need The heart of this is still $lookup, but you use more of the mongo pipeline approach

  21. Its as easy as let aggCursor = db.collection( uni-students').aggregate([{ $lookup: { from: student-grades", //The 2nd collection localField: student_id", //The join field in the 1st collection foreignField: student_id", //The join field in the 2nd collection as: "result" //The output } },{$match: { result: { $ne: [] //The removal of non-matching fields (inner join) } } },{ $addFields: { result: { $arrayElemAt: ["$result",0] //Put remaining fields back into the result } } },{$project: { //Create the joined 'view', picking the desired fields StudentID : $student_id', Student First Name : $Student First Name', Student Last Name': $Student Last Name', Degree: $Degree', Advisor': $Advisor', Year: "$Year", Grade: "$result.grade" //Note this is from the result } }]) let joinedData = []; for await (const doc of aggCursor) { joinedData.push(doc); }

Related