# Databases

# SQL
* __Structured Query Language__
* A language used to organize and manipulate data in a relational database
* Originally developed by IBM in the 70s
* Quickly became the most popular database language

SELECT id, email
FROM users
WHERE first\_name = 'Teppo';

# Relational Database Management Systems
* In relational databases, values are stored in __tables__
* Each table has __rows__ and __columns__
* Data is displayed as a two-dimensional matrix
* Values in a table are related to each other
* Values can also be related to values in other tables
* A relational database management system (RDBMS) is a program that executes queries against relational databases

![](imgs/3-databases-with-docker_0.png)

[https://db-engines.com/en/ranking](https://db-engines.com/en/ranking)

# PostgreSQL
A free, open-source, cross-platform relational database management system
Emphasizes extensibility and SQL compliance
Fully [ACID](https://en.wikipedia.org/wiki/ACID)-compliant (atomicity, consistency, isolation and durability)

# pgAdmin
An administration and development platform for PostgreSQL
Cross-platform, features a web interface
Basically a control panel application for your PostgreSQL database

# Running Postgres in Docker
Using the official [Postgres Docker image](https://hub.docker.com/_/postgres), let's create a locally running Postgres instance. You can choose the values for POSTGRES\_PASSWORD and POSTGRES\_USER freely.
__docker run --name my-postgres --env POSTGRES\_PASSWORD=pgpass --env POSTGRES\_USER=pguser -p 5432:5432 -d postgres:15.2__

![](imgs/3-databases-with-docker_1.png)

# PostgreSQL: Using psql
If you have PostgreSQL installed, use the Windows search to find _psql_ and open _SQL Shell (psql)_. Press Enter four times to accept the default values (or enter values matching your configuration), and enter your password at the fifth prompt.

If you do not have PostgreSQL installed, you can use psql directly from the container running the database:

docker exec -it my-postgres psql -U pguser

* If you have connected to the server as the user "pguser", you are by default connected to the "pguser" database.
* This is a database for user information.
* You should not use it for storing program data!
* All databases can be listed with the _list_ command: __\\l__
* PostgreSQL uses a default database named "postgres"
* Users can connect to a different database with the _connect_ command __\\c__ :
  * __\\c database\_name__
* A new database is created with the CREATE command:
  * __CREATE DATABASE database\_name;__
  * Do not forget the __;__ at the end
* After creating a new database, you still need to connect to it!
* Exit psql with the command __exit__

# Exercise 1: Postgres Server
Start a local instance of Postgres in Docker
Connect to the server using psql
Use the command \\l to see which databases have already been created on the server.
Create a new database called "sqlpractice".
Connect to the newly created database.

# Running pgAdmin in Docker
Using the official [pgAdmin](https://hub.docker.com/r/dpage/pgadmin4) image, we'll run pgAdmin alongside Postgres:

__docker run --name my-pgadmin -p 5050:80 -e PGADMIN\_DEFAULT\_EMAIL=&lt;email&gt; -e PGADMIN\_DEFAULT\_PASSWORD=&lt;password&gt; -d dpage/pgadmin4__

![](imgs/3-databases-with-docker_2.png)

# Logging into pgAdmin
With pgAdmin running, navigate your web browser to [http://localhost:5050](http://localhost:5050) and use the email and password you provided to log in.
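The psql steps from Exercise 1 can also be run non-interactively with psql's -c flag; this is a sketch that reuses the container and user names from the earlier docker run example:

```shell
# create the database without opening an interactive session
docker exec -it my-postgres psql -U pguser -c "CREATE DATABASE sqlpractice;"

# open an interactive psql session connected directly to the new database
docker exec -it my-postgres psql -U pguser -d sqlpractice
```

The -d flag saves you from typing \c after the session opens.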
![](imgs/3-databases-with-docker_3.png)

# PostgreSQL internal IP Address
Both __PostgreSQL__ and __pgAdmin__ are now running in our _local_ Docker. To connect pgAdmin to PostgreSQL, we need the IP address used _inside_ Docker.

Using Docker's inspect command, we'll get Docker's internal IP address for the my-postgres container. Since the command produces quite a lot of information, we pipe the result to grep (or findstr on Windows) to see only the rows that contain the word "IPAddress":

__docker inspect my-postgres | grep IPAddress__ <- unix
__docker inspect my-postgres | findstr IPAddress__ <- windows

In the example output, the IP address is 172.17.0.2

![](imgs/3-databases-with-docker_4.png)

# Connecting PgAdmin to our DB
Now we have everything we need for a connection. In pgAdmin, select "Object" > "Register" > "Server". In the "General" tab, give the server a name that identifies the connection.

![](imgs/3-databases-with-docker_5.png)

If the Object menu is greyed out, click on Servers first.

![](imgs/3-databases-with-docker_6.png)

* In the "Connection" tab, enter
  * Host name/address: the PostgreSQL internal Docker address
  * Port, Username, Password: the values defined when running the PostgreSQL container
* Then click Save. You should now see all the databases available on this server.

![](imgs/3-databases-with-docker_7.png)

![](imgs/3-databases-with-docker_8.png)

# Exercise 2: pgAdmin
Start a local instance of pgAdmin in Docker
Following the lecture instructions, connect pgAdmin to your already running PostgreSQL server.
Verify that you can see the database created in the previous exercise.

# PostgreSQL: Querying
With psql: after connecting to a database, just type a query and hit enter.
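For instance, typed directly at the psql prompt (NOW() is just an illustrative built-in; any statement works):

```sql
-- a statement ends with a semicolon;
-- psql keeps waiting for more input until it sees one
SELECT NOW();
```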
With pgAdmin: right-click a database > _Query Tool_, insert a query into the _Query Editor_ and hit _Execute_ (F5)

![](imgs/3-databases-with-docker_9.png)

![](imgs/3-databases-with-docker_10.png)

---

# Editing Data with pgAdmin
* Tables of data in a database are found under
  * Database > Schemas > Tables
* Inspect and edit data in pgAdmin by right-clicking a table and selecting _View/Edit Data_

![](imgs/3-databases-with-docker_11.png)

![](imgs/3-databases-with-docker_12.png)

Individual values in the table can be modified directly by double-clicking a value and editing it in the visual user interface. Save the changes with the _Save Data Changes_ button.

![](imgs/3-databases-with-docker_13.png)

# Exercise 3: Preparing the Database
Using either pgAdmin or psql, run the [provided query](https://gitea.buutti.com/education/academy-assignments/src/branch/master/Databases/initdb.txt) against the database you created previously. Verify that the query has created new tables in your database.

# Types of queries
Select
Insert
Delete
Update
Create & Drop

# Querying data with SELECT
Syntax:
__SELECT column1, column2, column3 FROM table\_name;__
Examples:
SELECT full\_name, email FROM users;
SELECT full\_name AS name, email FROM users;
SELECT * FROM users;

# Filtering data with WHERE
Syntax:
__SELECT column1, column2 FROM table\_name WHERE condition;__
Text is captured in _single quotes_. In a LIKE condition, the _%_ sign acts as a wildcard. IS and IS NOT are also valid comparison operators.
Examples:
SELECT full\_name FROM users WHERE full\_name = 'Teppo Testaaja';
SELECT * FROM books WHERE name LIKE '%rr%';
SELECT * FROM books WHERE author IS NOT NULL;

# Ordering data with ORDER BY
Syntax:
__SELECT column1 FROM table\_name ORDER BY column1 ASC;__
Examples:
SELECT full\_name FROM users ORDER BY full\_name ASC;
SELECT full\_name FROM users ORDER BY full\_name DESC;

# Combining with JOIN
Also known as INNER JOIN
Corresponds to an intersection in set theory

![](imgs/3-databases-with-docker_14.png)

# JOIN examples
SELECT users.id, users.full\_name, borrows.id, borrows.user\_id, borrows.due\_date, borrows.returned\_at
FROM users
JOIN borrows ON users.id = borrows.user\_id;

SELECT U.full\_name AS name, B.due\_date AS due\_date, B.returned\_at AS returned\_at
FROM users AS U
JOIN borrows AS B ON U.id = B.user\_id;

# Combining with LEFT JOIN
Also known as LEFT OUTER JOIN
Example:
SELECT U.full\_name AS name, B.due\_date AS due\_date, B.returned\_at AS returned\_at
FROM users AS U
LEFT JOIN borrows AS B ON U.id = B.user\_id;

![](imgs/3-databases-with-docker_15.png)

# Exercise 4: Querying the Library
Using SQL queries, get
All columns of the loans that were borrowed before 1.3.2000
All columns of the loans that have been returned
Columns user.full\_name and borrows.borrowed\_at of the user with an id of 1
Columns book.name, book.release\_year and language.name of all books released after 1960

# INSERT
Syntax:
__INSERT INTO table\_name (column1, column2, column3)__
__VALUES (value1, value2, value3);__
Example:
INSERT INTO users (full\_name, email, created\_at)
VALUES ('Pekka Poistuja', 'pekka.poistuja@buutti.com', NOW());
Since id is not provided, it will be generated automatically.

# UPDATE
Syntax:
__UPDATE table\_name__
__SET column1 = value1, column2 = value2__
__WHERE condition;__
__Notice:__ if a condition is not provided, all rows will be updated! If updating only one row, it is usually best to use the id.
Example:
UPDATE users
SET email = 'taija.testaaja@gmail.com'
WHERE id = 2;

# DELETE
Syntax:
__DELETE FROM table\_name WHERE condition;__
Again, if the _condition_ is not provided, DELETE affects _all_ rows. Before deleting, it is good practice to execute an equivalent SELECT query to make sure that only the intended rows will be affected.
Example:
SELECT * FROM users WHERE id = 5;
DELETE FROM users WHERE id = 5;

# Exercise 5: Editing Data
Postpone the due date of the loan with an id of 2 by two days in the _borrows_ table
Add a couple of new books to the _books_ table
Delete one of the loans.

# CREATE TABLE
Before data can be manipulated, a database and its tables need to be initialized.
Syntax:
__CREATE TABLE table\_name (__
__column1 datatype,__
__column2 datatype,__
__…__
__);__
Example:
CREATE TABLE "users" (
"id" SERIAL PRIMARY KEY,
"full\_name" varchar NOT NULL,
"email" varchar UNIQUE NOT NULL,
"created\_at" timestamp NOT NULL
);

# DROP
To remove tables or databases, we use a DROP statement:
__DROP TABLE table\_name;__
__DROP DATABASE database\_name;__
These statements do not ask for confirmation, and there is no undo feature. Take care when using a DROP statement.

# NoSQL
* In addition to SQL databases, there are also NoSQL databases
* There are many differing definitions, but...
  * most agree that NoSQL databases store data in a format other than tables
  * they can still store relational data, just differently
* Four common database types:
  * Document databases
  * Key-value databases
  * Wide-column stores
  * Graph databases
* Example database engines include MongoDB, Redis and Cassandra

---

Document databases store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types, including strings, numbers, booleans, arrays, or objects, and their structures typically align with the objects developers are working with in code.
Because of their variety of field value types and powerful query languages, document databases are great for a wide variety of use cases and can be used as a general-purpose database. They can scale out horizontally to accommodate large data volumes. MongoDB is consistently ranked as the world's most popular NoSQL database according to DB-Engines and is an example of a document database.

Key-value databases are a simpler type of database where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but don't need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Redis and DynamoDB are popular key-value databases.

Wide-column stores store data in tables, rows, and dynamic columns. They provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and can predict what your query patterns will be. They are commonly used for storing Internet of Things data and user profile data. Cassandra and HBase are two of the most popular wide-column stores.

Graph databases store data in nodes and edges. Nodes typically store information about people, places, and things, while edges store information about the relationships between the nodes. Graph databases excel in use cases where you need to traverse relationships to look for patterns, such as social networks, fraud detection, and recommendation engines. Neo4j and JanusGraph are examples of graph databases.
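To make the document model described above concrete, a library user and their loans could be stored as a single JSON document; the field names here are illustrative, not taken from the exercise schema:

```json
{
  "_id": "f47ac10b",
  "full_name": "Teppo Testaaja",
  "email": "teppo.testaaja@example.com",
  "borrows": [
    {
      "due_date": "2000-03-01",
      "returned_at": null
    }
  ]
}
```

Note how the nested borrows array takes the place of the separate borrows table and the JOIN used on the relational side.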
# Object-Relational Mappers
* ORMs allow developers to manipulate databases with code instead of SQL queries
  * For example, for performing CRUD operations on their database
* Some popular ORMs:
  * Hibernate (Java)
  * EF Core (.NET)
  * Sequelize (Node.js)
  * TypeORM (TypeScript)
* We recommend using TypeORM instead of writing SQL yourself. You can find a good step-by-step guide in [their documentation](https://typeorm.io/#installation).

# PostgreSQL with Node

# Local environment
Using our previously created, Dockerized Postgres instance, we'll create a Node.js application that connects to our database. If you have deleted your Postgres container, a new one can be created with the following command:

__docker run --name my-postgres -e POSTGRES\_PASSWORD=mypassword -e POSTGRES\_USER=pguser -e POSTGRES\_DB=mydb -p 5432:5432 -d postgres:15.2__

If the container already exists, this command will start it:

__docker start my-postgres__

# Preparing our Node application
Initialize the application with:
__npm init -y__
Install [express](https://www.npmjs.com/package/express) and the [PostgreSQL client for Node.js](https://www.npmjs.com/package/pg):
__npm install express pg__
Install [dotenv](https://www.npmjs.com/package/dotenv) and nodemon as development dependencies:
__npm install --save-dev dotenv nodemon__
If using TypeScript, install types for express and pg, plus ts-node for running nodemon:
__npm install --save-dev @types/express @types/pg ts-node__

# Dotenv
In development, it's customary to store ports, database login info, etc. as _environment variables_ in a separate .env file. The previously installed package _dotenv_ is used to load the variables from .env into the main program.

Example of a .env file:

PORT=3000
PG\_HOST=localhost
PG\_PORT=5432
PG\_USERNAME=pguser
PG\_PASSWORD=mypassword
PG\_DATABASE=postgres

These values must match the values declared when running the PostgreSQL container. The database must exist as well.
Note that the example uses the default "postgres" database, but you can use any database you want.

dotenv can be imported with:

import dotenv from 'dotenv'
dotenv.config()

# Dotenv (continued)

{
  "name": "products\_api",
  "version": "1.0.0",
  "scripts": {
    "dev": "nodemon -r dotenv/config ./src/index.ts"
  },
  "dependencies": {
    "express": "^4.18.2",
    "pg": "^8.9.0"
  },
  "devDependencies": {
    "@types/express": "^4.17.17",
    "@types/pg": "^8.6.6",
    "dotenv": "^16.0.3",
    "nodemon": "^2.0.20",
    "ts-node": "^10.9.1",
    "typescript": "^4.9.5"
  }
}

Dotenv is usually only used in development, not in production. In a professional setting, the dotenv config is often preloaded in the development startup script: you can require dotenv when running _npm run dev_, as in the "dev" script above (-r is short for --require).

# Dotenv and Git
* .env files usually contain sensitive data that we do __not__ want to store in Git repositories.
* Thus, the __.env__ file is usually excluded from the Git repository
  * Add __.env__ to __.gitignore__
* If you have auto-generated a .gitignore with npx gitignore Node, environment files are excluded automatically

![](imgs/3-databases-with-docker_16.png)

# Connecting to PostgreSQL
Our database file contains functions and configuration for initializing the Postgres pool, creating tables and running queries. At the moment, we have only one query. It is a single-use query that creates a products table in the database if such a table does not yet exist.
// db.ts
import pg from "pg";

const { PG\_HOST, PG\_PORT, PG\_USERNAME, PG\_PASSWORD, PG\_DATABASE } = process.env;

const pool = new pg.Pool({
  host: PG\_HOST,
  port: Number(PG\_PORT),
  user: PG\_USERNAME,
  password: String(PG\_PASSWORD),
  database: PG\_DATABASE,
});

export const executeQuery = async (query: string, parameters?: any[]) => {
  const client = await pool.connect();
  try {
    const result = await client.query(query, parameters);
    return result;
  } catch (error: any) {
    console.error(error.stack);
    error.name = "dbError";
    throw error;
  } finally {
    client.release();
  }
};

export const createProductsTable = async () => {
  const query = \`
    CREATE TABLE IF NOT EXISTS "products" (
      "id" SERIAL PRIMARY KEY,
      "name" VARCHAR(100) NOT NULL,
      "price" REAL NOT NULL
    )\`;
  await executeQuery(query);
  console.log("Products table initialized");
};

At the moment, our __index.ts__ does nothing but create a single table in our database and launch the Express server. It doesn't even have any endpoints, so it's not much of a server yet.

// index.ts
import express from "express";
import { createProductsTable } from "./db";

const server = express();

createProductsTable();

const { PORT } = process.env;
server.listen(PORT, () => {
  console.log("Products API listening to port", PORT);
});

# Launching the application
Let's use our predefined _npm run dev_ script

![](imgs/3-databases-with-docker_17.png)

…and check with psql that our application succeeds in connecting to the database and creating the table. Epic success!

![](imgs/3-databases-with-docker_18.png)

# Exercise 6: Node & PostgreSQL
Following the lecture example, create an Express server that connects to your local PostgreSQL instance. The database information should be stored in environment variables. When the server starts, it should create a products table with three columns: id (serial, primary key), name (varchar) and price (real).

# Creating Queries
* Next, we will create an actual CRUD API for communicating with the database.
* For that, we need endpoints for creating, reading, updating and deleting products.
* Each of these needs its own query.

# Using queries
* We'll use the pre-made executeQuery() function from a few slides back for querying the database
* It takes two arguments:
  * the actual query string
  * an optional parameters array
* When supplying parameters, the query string should have placeholders $1, $2, etc.
  * These will be replaced with the contents of the parameters array.

# Parameterized queries example
When running executeQuery(query, parameters) with the values defined above, the query would be parsed as

![](imgs/3-databases-with-docker_19.png)

SELECT * FROM cats WHERE color = 'yellow' AND age > 10;

# Why not just use string templating?
…Because of [SQL injections](https://fi.wikipedia.org/wiki/SQL-injektio). Always use the database library's built-in parameterization!
_NEVER DO THIS!!!_

![](imgs/3-databases-with-docker_20.png)

![](imgs/3-databases-with-docker_21.png)

# Creating queries
We will create a Data Access Object, __dao.js__, that will handle interacting with our database. The idea is that we just tell our DAO what we want done (e.g. "add this customer to the database") and the DAO handles the details of that action. The DAO also returns any additional information that was created during the action.

Our _insertProduct_ function
* generates a new, unique ID for the product using [uuid](https://www.npmjs.com/package/uuid)
* constructs a parameters array containing said id, the name of the product and the price of the product
* executes the query using the _db.executeQuery_ method
* returns the database result object

![](imgs/3-databases-with-docker_22.png)

![](imgs/3-databases-with-docker_23.png)

![](imgs/3-databases-with-docker_24.png)

The rest of the DAO operations work in a similar fashion. The router that declares the endpoints uses the DAO to interact with the database.
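To make the earlier injection warning concrete, here is a small self-contained sketch (no database needed). The malicious input is hypothetical; it shows how a template literal splices attacker text into the SQL itself, while the parameterized form keeps the statement fixed:

```typescript
// Hypothetical attacker-controlled input, e.g. from an HTTP request body
const userInput = "'; DROP TABLE products; --";

// BAD: string templating makes the input part of the SQL statement itself
const templated = `SELECT * FROM products WHERE name = '${userInput}'`;
// templated now contains a second, destructive statement

// GOOD: the SQL stays constant; the value travels separately, and the
// pg driver substitutes $1 on the server side, treating it as plain data
const parameterized = "SELECT * FROM products WHERE name = $1";
const parameters = [userInput];
// in the lecture's code, this pair would be passed to executeQuery(parameterized, parameters)

console.log(templated);
```

Printing the templated string makes the problem visible: the attacker's quote character terminates the intended string literal, and everything after it is executed as SQL.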
# Testing the API

![](imgs/3-databases-with-docker_25.png)

Now we can use Insomnia to verify that all the endpoints work as expected. We can also use psql to observe the changes in the database.

![](imgs/3-databases-with-docker_26.png)

# Exercise 7: Creating Queries
Continue following the lecture example. Create a router and a data access object to handle
Creating a product
Reading a product
Updating a product
Deleting a product
Listing all products

# Dockerized PostgreSQL App

# Setting Environment Variables
Docker has two kinds of environment variables: run-time and build-time. In this scenario, we want to set our environment variables at __build time__. This means that _the Docker image will contain all the environment variable information_, including sensitive things like passwords. This might be an issue in some scenarios; in those cases, the environment variables need to be set at __run time__ instead.

In the Dockerfile, we set the build-time values with ARG parameters. Then we use these values to set the run-time environment variables with ENV parameters. More information: [https://vsupalov.com/docker-arg-env-variable-guide/](https://vsupalov.com/docker-arg-env-variable-guide/)

When the ARGs and ENVs have been set in the Dockerfile, we provide the ARG values when building the Docker image by using __--build-arg name=value__ flags.
To build an image with these parameters, we'd use something like

__docker build__
__--build-arg PORT=3000__
__--build-arg PG\_HOST=https://my.postgres.server__
__--build-arg PG\_PORT=5432__
__--build-arg PG\_USERNAME=pguser__
__--build-arg PG\_PASSWORD=pgpass__
__--build-arg PG\_DATABASE=my-database__
__-t my-app .__

The corresponding Dockerfile declarations:

ARG PORT
ARG PG\_HOST
ARG PG\_PORT
ARG PG\_USERNAME
ARG PG\_PASSWORD
ARG PG\_DATABASE

ENV PORT=${PORT}
ENV PG\_HOST=${PG\_HOST}
ENV PG\_PORT=${PG\_PORT}
ENV PG\_USERNAME=${PG\_USERNAME}
ENV PG\_PASSWORD=${PG\_PASSWORD}
ENV PG\_DATABASE=${PG\_DATABASE}

[Docker documentation here!](https://www.docker.com/blog/how-to-use-the-postgres-docker-official-image/)

# Exercise 8: Dockerized PG App
Dockerize the application you have built. Build the Docker image, run the app and test that it works using Insomnia/Postman. Remember that when you run the application in your local Docker, both the app and the database are in the same Docker network, so you have to check the database IP address just like when running pgAdmin.
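Putting the pieces together, a complete Dockerfile for the app might look like the sketch below. This is only an outline under stated assumptions: the base image, file layout and the ts-node start command are not from the lecture material.

```dockerfile
# minimal sketch; base image and paths are assumptions
FROM node:18-alpine
WORKDIR /app

# build-time arguments, provided with --build-arg
ARG PORT
ARG PG_HOST
ARG PG_PORT
ARG PG_USERNAME
ARG PG_PASSWORD
ARG PG_DATABASE

# bake them into the image as run-time environment variables
ENV PORT=${PORT}
ENV PG_HOST=${PG_HOST}
ENV PG_PORT=${PG_PORT}
ENV PG_USERNAME=${PG_USERNAME}
ENV PG_PASSWORD=${PG_PASSWORD}
ENV PG_DATABASE=${PG_DATABASE}

# install dependencies first so Docker can cache this layer
COPY package*.json ./
RUN npm ci
COPY . .

# run the TypeScript entry point directly with ts-node
CMD ["npx", "ts-node", "src/index.ts"]
```

In a production image you would typically compile the TypeScript and run plain Node instead of ts-node, but this keeps the sketch close to the development setup used in the lecture.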