---
marp: true
paginate: true
math: mathjax
theme: buutti
title: Databases
---

# Databases

# SQL

- Structured Query Language
- A language for organizing and manipulating data in a relational database
- Originally developed by IBM in the 70s
- Quickly became the most popular database language

```
SELECT id, email FROM users WHERE first_name = 'Teppo';
```

# Relational Database Management Systems

* In relational databases, values are stored in **tables**
* Each table has **rows** and **columns**
* Data is displayed in a two-dimensional matrix
* Values in a table are related to each other
* Values can also be related to values in other tables
* A relational database management system (RDBMS) is a program that executes queries to relational databases

# Relational Database Management Systems

![](imgs/3-databases-with-docker_0.png)

[https://db-engines.com/en/ranking](https://db-engines.com/en/ranking)

# PostgreSQL

A free, open-source, cross-platform relational database management system

Emphasizes extensibility and SQL compliance

Fully [ACID](https://en.wikipedia.org/wiki/ACID)-compliant (atomicity, consistency, isolation and durability)

# Running Postgres in Docker

Using the official [Postgres Docker image](https://hub.docker.com/_/postgres), let's create a locally running Postgres instance. You can choose the values for POSTGRES_PASSWORD and POSTGRES_USER freely.

```
docker run --name my-postgres --env POSTGRES_PASSWORD=pgpass --env POSTGRES_USER=pguser -p 5432:5432 -d postgres:15.2
```

![](imgs/3-databases-with-docker_1.png)

# Managing the PostgreSQL database with psql

*psql is a terminal-based front-end to PostgreSQL. It enables you to type in queries interactively, issue them to PostgreSQL, and see the query results. Alternatively, input can be from a file or from command line arguments.
In addition, psql provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.*

https://www.postgresql.org/docs/current/app-psql.html

You can use psql directly from the container running the database:

```
docker exec -it my-postgres psql -U pguser
```

# Managing the PostgreSQL database with psql

* If you have connected to the server as user "pguser", you are by default connected to the "pguser" database.
  * This is a database for user information. Do not use it for storing program data!!
* All databases can be listed with the *list* meta-command: `\l`
* PostgreSQL uses a default database named "postgres"
* Users can connect to a different database with the *connect* meta-command `\c`:
  * `\c <database_name>`
* A new database is created with a CREATE statement:
  * `CREATE DATABASE <database_name>;` <-- notice the semicolon!
* After creating a new database, you still need to connect to it!
* Exit psql with the command `exit`

# Exercise 1: Postgres Server

Start a local instance of Postgres in Docker

Connect to the server using psql

Use the command `\l` to see which databases already exist on the server.

Create a new database called "sqlpractice".

Connect to the newly created database.

# pgAdmin

Administration and development platform for PostgreSQL

Cross-platform, features a web interface

Basically a control panel application for your PostgreSQL database

A graphical alternative to psql

Completely separate from PostgreSQL, so pgAdmin is not required for using PostgreSQL. It is only one example of how you can administer the database. Other options include the aforementioned psql and a wide variety of other database management tools, like [DBeaver](https://dbeaver.io/).

# Running pgAdmin in Docker

Using the official [pgAdmin](https://hub.docker.com/r/dpage/pgadmin4) image, we'll run pgAdmin alongside Postgres. Choose the login email and password yourself:

```
docker run --name my-pgadmin -p 5050:80 --env PGADMIN_DEFAULT_EMAIL=<email> --env PGADMIN_DEFAULT_PASSWORD=<password> -d dpage/pgadmin4
```

![](imgs/3-databases-with-docker_2.png)

# Logging into pgAdmin

With pgAdmin running, navigate your web browser to http://localhost:5050 and use the email and password you provided to log in.

![](imgs/3-databases-with-docker_3.png)

# PostgreSQL internal IP Address

Both **PostgreSQL** and **pgAdmin** are now running in our *local* Docker. To connect pgAdmin to PostgreSQL, we need the IP address used inside Docker.

Using Docker's inspect command, we'll get Docker's internal IP for the my-postgres container. Since the command produces quite a lot of information, we pipe the result to grep (Linux) or findstr (Windows) to see only the rows that contain the word "IPAddress":

```
docker inspect my-postgres | grep IPAddress
docker inspect my-postgres | findstr IPAddress
```

In the example output, the IP address is 172.17.0.2

![](imgs/3-databases-with-docker_4.png)

# Connecting pgAdmin to our DB

Now we have all that we need for a connection. In pgAdmin, select "Object" > "Register" > "Server". In the "General" tab, give the server a name that identifies the connection.

![](imgs/3-databases-with-docker_5.png)

---

If the Object menu is greyed out, click on Servers.

![](imgs/3-databases-with-docker_6.png)

---

* In the "Connection" tab, enter
  * Host name/address:
    * the PostgreSQL internal Docker address
  * Port, Username, Password:
    * the values defined when running the PostgreSQL container
* Then click Save. You should now see all the databases available on this server.

---

![](imgs/3-databases-with-docker_7.png)

---

![](imgs/3-databases-with-docker_8.png)

# Exercise 2: pgAdmin

Start a local instance of pgAdmin in Docker

Following lecture instructions, connect pgAdmin to your already running PostgreSQL server.
Verify that you can see the database created in the previous assignment.

# PostgreSQL: Querying

With psql: After connecting to a database, just type a query and hit enter.

With pgAdmin: Right-click a database > Query Tool

Insert a query into the Query Editor and hit Execute (F5)

![](imgs/3-databases-with-docker_9.png)

---

![](imgs/3-databases-with-docker_10.png)

# Editing Data with pgAdmin

* Tables of data in a database are found under
  * Database > Schemas > Tables
* Inspect and edit data in pgAdmin by right-clicking a table and selecting View/Edit Data

---

![](imgs/3-databases-with-docker_11.png)

---

![](imgs/3-databases-with-docker_12.png)

Individual values in the table can be modified directly by double-clicking the value and then editing it in the visual user interface

Save the changes with the Save Data Changes button

![](imgs/3-databases-with-docker_13.png)

# Exercise 3: Preparing the Database

Using either pgAdmin or psql, run the [provided query](https://gitea.buutti.com/education/academy-assignments/src/branch/master/Databases/initdb.txt) against the database you created previously.

Verify that the query has created new tables in your database.

# Types of queries

- Select
- Insert
- Delete
- Update
- Create & Drop

# Querying data with SELECT

Syntax:

```
SELECT column1, column2, column3 FROM table_name;
```

Examples:

- `SELECT full_name, email FROM users;`
- `SELECT full_name AS name, email FROM users;`
- `SELECT * FROM users;`

# Filtering data with WHERE

Syntax:

```
SELECT column1, column2 FROM table_name WHERE condition;
```

Text is captured in **single** quotes. In a LIKE condition, the % sign acts as a wildcard. IS and IS NOT are also valid comparison operators.

Examples:

- `SELECT full_name FROM users WHERE full_name = 'Teppo Testaaja';`
- `SELECT * FROM books WHERE name LIKE '%rr%';`
- `SELECT * FROM books WHERE author IS NOT null;`

# Ordering data with ORDER BY

Syntax:

```
SELECT column1 FROM table_name ORDER BY column1 ASC;
```

Examples:

- `SELECT full_name FROM users ORDER BY full_name ASC;`
- `SELECT full_name FROM users ORDER BY full_name DESC;`

# Combining with JOIN

Also known as INNER JOIN

Corresponds to intersection from set theory

![](imgs/3-databases-with-docker_14.png)

# JOIN examples

```
SELECT users.id, users.full_name, borrows.id, borrows.user_id, borrows.due_date, borrows.returned_at
FROM users
JOIN borrows ON users.id = borrows.user_id;
```

```
SELECT U.full_name AS name, B.due_date AS due_date, B.returned_at AS returned_at
FROM users AS U
JOIN borrows AS B ON U.id = B.user_id;
```

# Exercise 4: Querying the Library

Using SQL queries, get

1) All columns of loans made before 1.3.2000
2) All columns of loans that have been returned
3) Columns users.full_name and borrows.borrowed_at for the user with an id of 1
4) Columns books.name, books.release_year and languages.name of all books released after 1960

# INSERT

Syntax:

```
INSERT INTO table_name (column1, column2, column3) VALUES (value1, value2, value3);
```

Example:

- `INSERT INTO users (full_name, email, created_at) VALUES ('Pekka Poistuja', 'pekka.poistuja@buutti.com', NOW());`

Since id is not provided, it will be generated automatically.

# UPDATE

Syntax:

```
UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition;
```

Notice: if a condition is not provided, all rows will be updated! If updating only one row, it is usually best to filter by id.

Example:

- `UPDATE users SET email = 'taija.testaaja@gmail.com' WHERE id = 2;`

# DELETE

Syntax:

```
DELETE FROM table_name WHERE condition;
```

Again, if the condition is not provided, DELETE affects all rows.

Before deleting, it is good practice to execute an equivalent SELECT query to make sure that only the proper rows will be affected.

Example:

- `SELECT * FROM users WHERE id = 5;`
  - First use SELECT to confirm the data you're about to delete
- `DELETE FROM users WHERE id = 5;`
  - Then delete the data

# Exercise 5: Editing Data

Postpone the due date of the loan with an id of 2 by two days in the borrows table

Add a couple of new books to the books table

Delete one of the loans.

# CREATE TABLE

Before data can be manipulated, a database and its tables need to be initialized.

Syntax:

```
CREATE TABLE table_name (
  column1 datatype,
  column2 datatype,
  …
);
```

Example:

```
CREATE TABLE "users" (
  "id" SERIAL PRIMARY KEY,
  "full_name" varchar NOT NULL,
  "email" varchar UNIQUE NOT NULL,
  "created_at" timestamp NOT NULL
);
```

# DROP

In order to remove tables or databases, we use a DROP statement.

Syntax:

```
DROP TABLE table_name;
DROP DATABASE database_name;
```

These statements do not ask for confirmation, and there is no undo feature. Take care when using a DROP statement.

# NoSQL

* In addition to SQL databases, there are also NoSQL databases
* Many differing definitions, but...
  * most agree that NoSQL databases store data in a format other than tables
  * They can still store relational data - just differently
* Four different database types:
  * Document databases
  * Key-value databases
  * Wide-column stores
  * Graph databases
* Example database engines include MongoDB, Redis and Cassandra

---

Document databases store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types, including strings, numbers, booleans, arrays, or objects, and their structures typically align with the objects developers are working with in code.

Because of their variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used as a general-purpose database. They can scale out horizontally to accommodate large data volumes. MongoDB is consistently ranked as the world's most popular NoSQL database according to DB-Engines and is an example of a document database.

---

Key-value databases are a simpler type of database where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don't need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Redis and DynamoDB are popular key-value databases.

---

Wide-column stores store data in tables, rows, and dynamic columns. They provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great when you need to store large amounts of data and you can predict what your query patterns will be. They are commonly used for storing Internet of Things data and user profile data. Cassandra and HBase are two of the most popular wide-column stores.

---

Graph databases store data in nodes and edges. Nodes typically store information about people, places, and things, while edges store information about the relationships between the nodes. Graph databases excel in use cases where you need to traverse relationships to look for patterns, such as social networks, fraud detection, and recommendation engines. Neo4j and JanusGraph are examples of graph databases.

# Object-Relational Mappers

* ORMs allow developers to manipulate databases with code instead of SQL queries
  * For example, for performing CRUD operations on their database
* Some popular ORMs:
  * Hibernate (Java)
  * EFCore (.NET)
  * Sequelize (Node.js)
  * TypeORM (TypeScript) - [documentation](https://typeorm.io/#installation)

# PostgreSQL with Node

# Local environment

Using our previously created, Dockerized Postgres instance, we'll create a Node application to connect to our database.

If you have deleted your postgres container, just create a new one with the same command. If your container has shut down (after a reboot, for example), you can start it with

```
docker start my-postgres
```

# Preparing our Node application

Initialize a new TypeScript application.

Install [express](https://www.npmjs.com/package/express) and the [PostgreSQL client for Node.JS](https://www.npmjs.com/package/pg), and their respective TypeScript types

Install [dotenv](https://www.npmjs.com/package/dotenv), nodemon and ts-node as development dependencies

# Dotenv

Example of a `.env` file:

```
PORT=3000
PG_HOST=localhost
PG_PORT=5432
PG_USERNAME=pguser
PG_PASSWORD=mypassword
PG_DATABASE=postgres
```

These values must match the values declared when running the PostgreSQL container. The database must exist as well. Note that the example uses the default "postgres" database, but you can use any database you want.

# Dotenv (continued)

```json
{
  "name": "products_api",
  "version": "1.0.0",
  "scripts": {
    "dev": "nodemon -r dotenv/config ./src/index.ts"
  },
  "dependencies": {
    "express": "^4.18.2",
    "pg": "^8.9.0"
  },
  "devDependencies": {
    "@types/express": "^4.17.17",
    "@types/pg": "^8.6.6",
    "dotenv": "^16.0.3",
    "nodemon": "^2.0.20",
    "ts-node": "^10.9.1",
    "typescript": "^4.9.5"
  }
}
```

---

Dotenv is usually only used in development, not in production.
In a professional setting, the dotenv config is often preloaded in the development startup script.

Here dotenv is required when running `npm run dev`: `-r` is short for `--require`.

# Dotenv and Git

* .env files usually contain sensitive data that we do _not_ want to store in Git repositories.
* Thus, the _.env_ file is usually excluded from the Git repository
  * Add _.env_ to _.gitignore_
* If you have auto-generated .gitignore with `npx gitignore Node`, environment files are excluded automatically

![](imgs/3-databases-with-docker_16.png)

# Connecting to PostgreSQL

Our database file contains functions and configuration for initializing the Postgres pool, creating tables and running queries.

At the moment we have only one query. It is a single-use query that creates a products table in the database if no such table exists yet.

```ts
// db.ts
import pg from "pg";

const { PG_HOST, PG_PORT, PG_USERNAME, PG_PASSWORD, PG_DATABASE } = process.env;

const pool = new pg.Pool({
  host: PG_HOST,
  port: Number(PG_PORT),
  user: PG_USERNAME,
  password: PG_PASSWORD,
  database: PG_DATABASE,
});

const executeQuery = async (query: string, parameters?: Array<any>) => {
  const client = await pool.connect();
  try {
    const result = await client.query(query, parameters);
    return result;
  } catch (error: any) {
    console.error(error.stack);
    error.name = "dbError";
    throw error;
  } finally {
    client.release();
  }
};

export const createProductsTable = async () => {
  const query = `CREATE TABLE IF NOT EXISTS "products" (
    "id" SERIAL PRIMARY KEY,
    "name" VARCHAR(100) NOT NULL,
    "price" REAL NOT NULL
  )`;
  await executeQuery(query);
  console.log("Products table initialized");
};
```

---

At the moment our `index.ts` does nothing but create a single table in our database and launch the express server. It doesn't even have any endpoints, so it's not much of a server yet.

```ts
import express from "express";
import { createProductsTable } from "./db";

const server = express();

createProductsTable();

const { PORT } = process.env;
server.listen(PORT, () => {
  console.log("Products API listening to port", PORT);
});
```

# Launching the application

Let's use our predefined `npm run dev`

![](imgs/3-databases-with-docker_17.png)

---

…And check with psql that our application succeeds in connecting to the database and creating the table. Epic success!

![](imgs/3-databases-with-docker_18.png)

# Exercise 6: Node & PostgreSQL

Following the lecture example, create an Express server that connects to your local PostgreSQL instance. The database information should be stored in environment variables.

When the server starts, it should create a products table with three columns: id (serial, primary key), name (varchar) and price (real).

# Creating Queries

* Next, we will create an actual CRUD API for communicating with the database.
* For that we need endpoints for creating, reading, updating and deleting products.
* All of these need their own queries.

# Using queries

* We'll use the pre-made executeQuery() function from a few slides back for querying the database
* It takes two arguments:
  * the actual query string
  * an optional parameters array
* When supplying parameters, the query string should have placeholders $1, $2, etc.
  * These will be replaced with the contents of the parameters array.

# Parameterized queries example

![](imgs/3-databases-with-docker_19.png)

When running executeQuery(query, parameters) with the above values defined, the query would be parsed as

```
SELECT * FROM cats WHERE color = 'yellow' and age > 10;
```

# Why not just use String templating?

…Because of [SQL injections](https://fi.wikipedia.org/wiki/SQL-injektio). Always use the database library's built-in parameterization!

_NEVER DO THIS!!!_

![](imgs/3-databases-with-docker_20.png)

![](imgs/3-databases-with-docker_21.png)

# Creating queries

We will create a Data Access Object, `dao.ts`, that will handle interacting with our database. The idea is that we just tell our DAO what we want done (e.g. "add this customer to the database") and the DAO handles the details of that action. The DAO also returns any additional information that was created during the action.

---

Our insertProduct function

- generates a new, unique ID for the product using [uuid](https://www.npmjs.com/package/uuid)
- constructs a parameters array containing said id, the name of the product and the price of the product
- executes the query using the db.executeQuery method
- returns the database result object

![](imgs/3-databases-with-docker_22.png)

---
![](imgs/3-databases-with-docker_23.png)
![](imgs/3-databases-with-docker_24.png)

---

The rest of the DAO operations work in similar fashion. The router that declares the endpoints uses the DAO to interact with the database.

# Testing the API
Now we can use Insomnia to verify that all the endpoints work as expected. We can also use psql to observe the changes in the database.

![](imgs/3-databases-with-docker_26.png)
![](imgs/3-databases-with-docker_25.png)

# Exercise 7: Creating Queries

Continue following the lecture example. Create a router and a database access object to handle

- Creating a product
- Reading a product
- Updating a product
- Deleting a product
- Listing all products

# Dockerized PostgreSQL App

# Setting Environment Variables

Docker has two kinds of environment variables: run-time and build-time. In this scenario we want to set our environment variables at _build time_. This means that the Docker image will contain all the environment variable information, including sensitive things like passwords. This might be an issue in some scenarios. In those cases the environment variables need to be set at _run time_.

In the Dockerfile we set the build-time values with ARG parameters. Then we use these values to set the run-time environment variables with ENV parameters.

More information: [https://vsupalov.com/docker-arg-env-variable-guide/](https://vsupalov.com/docker-arg-env-variable-guide/)

---

When the ARGs and ENVs have been set in the Dockerfile, we provide the ARG values when building the Docker image using _--build-arg <name>=<value>_ flags. To build an image with these parameters, we'd use something like

```
docker build --build-arg PORT=3000 --build-arg PG_HOST=https://my.postgres.server --build-arg PG_PORT=5432 --build-arg PG_USERNAME=pguser --build-arg PG_PASSWORD=pgpass --build-arg PG_DATABASE=my-database -t my-app .
```

---

And include the build-arg parameters in our Dockerfile, mapped to environment variables:

```
ARG PORT
ARG PG_HOST
ARG PG_PORT
ARG PG_USERNAME
ARG PG_PASSWORD
ARG PG_DATABASE

ENV PORT=${PORT}
ENV PG_HOST=${PG_HOST}
ENV PG_PORT=${PG_PORT}
ENV PG_USERNAME=${PG_USERNAME}
ENV PG_PASSWORD=${PG_PASSWORD}
ENV PG_DATABASE=${PG_DATABASE}
```

[Docker documentation here!](https://www.docker.com/blog/how-to-use-the-postgres-docker-official-image/)

# Exercise 8: Dockerized PG App

Dockerize the application you have built.
Build the Docker image, run the app and test that it works using Insomnia/Postman.

Remember that when you run the application in your local Docker, both the app and the database are in the same Docker network, so you have to check the database IP address just like when running pgAdmin.
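
---

As a reference point, a minimal Dockerfile sketch combining the ARG/ENV mapping from the earlier slide with a typical Node setup. The base image tag and the install/start commands are assumptions, not part of the lecture:

```
FROM node:18-alpine
WORKDIR /app

# Build-time arguments, provided with --build-arg flags
ARG PORT
ARG PG_HOST
ARG PG_PORT
ARG PG_USERNAME
ARG PG_PASSWORD
ARG PG_DATABASE

# Mapped to run-time environment variables baked into the image
ENV PORT=${PORT}
ENV PG_HOST=${PG_HOST}
ENV PG_PORT=${PG_PORT}
ENV PG_USERNAME=${PG_USERNAME}
ENV PG_PASSWORD=${PG_PASSWORD}
ENV PG_DATABASE=${PG_DATABASE}

COPY package*.json ./
RUN npm ci
COPY . .

# Assumes ts-node is available in the image; a production build
# would typically compile with tsc and run node dist/index.js.
CMD ["npx", "ts-node", "src/index.ts"]
```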