
Accessing Databases and Managing Data: Steps and Tools
Delve into the world of databases with insights on database management, common operations, servers, clients, and practical demos. Learn how to connect, query, and access databases using tools like DBeaver, MySQL, and Python for efficient data handling in data science projects.
Uploaded on | 0 Views
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Data Management and Database Access
Outline Understanding Databases What is a database? Types of databases Common database operations for data science Database servers and clients Demo - Connecting to a database server from a client - MySQL and DBeaver Demo - Accessing the employees database on a MySQL server from DBeaver Demo - Querying the employees database with SQL Accessing databases programmatically Demo - Connecting to a MySQL database from Python Demo - Reading data from the employees table with Python Getting the information you need from a database Demo - Querying the employees database for information Summary Observations, tips & tricks
Understanding Databases What is a database? a structured collection of data organized for quick retrieval and updating of pieces of data Types of databases relational - organized as relationships between entities - e.g. employees, departments, tables, rows, columns SQL noSQL - organized as documents and collections objects in a JSON-like format Common database operations Retrieving (most common for data science), adding, modifying, and deleting data
Database Servers and Clients Database servers and clients Server - where the database software is installed and the data is stored - E.g. MySQL server Client - any software that can talk to the server to access the data - E.g. MySQL shell, DBeaver Popular Relational (SQL) databases MySQL, MSSQL, Oracle, Postgres Popular non-relational NoSQL databases MongoDB, Redis, Cassandra
Getting Ready - Database Client and Server DBeaver Database Client: https://dbeaver.io/download/ MySQL Server Host URL: dsmethods-sample-database.ckpjakz3adxb.us-east- 2.rds.amazonaws.com Port: 3306 Username: dsmethods_student Password: dsmethods_student database: employees
Demo - Connecting to a Database using DBeaver MySQL Server DBeaver - A GUI-based Database client software Employees database
Demo - Exploring the Employees Database Tables in the employees database Relationships between tables
Demo - Querying the Employees Database with SQL Sample queries How many employees are there? (Difficulty - easy) How many departments are there? (Difficulty - easy) How many unique first names? (Difficulty - medium) Who is the highest paid employee? (Difficulty - hard)
Accessing databases programmatically Why programmatically? GUI-based and command line clients are mainly used for db administration what if we want to do complex data analysis? how about visualizing the data?
Demo - Connecting to MySQL from Python The Python MySQL Connector A Python module to connect to and interact with MySQL databases.
Summary Structured vs unstructured data relational databases - SQL non-relational databases - NoSQL Queries and the SELECT statement MySQL, DBeaver, and Python MySQL connector
Useful Links DBeaver: https://dbeaver.io/download/ Python: https://www.python.org/downloads/ Jupyter Notebook: https://jupyter.org/install MySQL Connector/Python: https://dev.mysql.com/doc/connector-python/en/ SQLAlchemy: https://www.sqlalchemy.org/
Optional - Observations, tips and tricks - - - Are databases an efficient storage for data science projects? Database tables vs JSON The Intersection between Software Engineering and Data Science - software testing - start with writing tests - computational complexity - time and space constraints - only read what you need - automation - fire and forget - error handling