This a plain file that contains the commands that can be used to replicate the
hands-on example "PSQL Query Plan and Indexes".

Commands are wrapped in a blocks of ``` (according to Markdown:
https://en.wikipedia.org/wiki/Markdown; please use a Markdown viewer for proper
readability) and on top the description of the code is underline with `=======`.

## (0) Preparation
================================================================================

Follow the instructions of Assignment 1 to set up the PostgreSQL database such
that the queries Q1-Q4 (cf. Assignment 1) can be executed.

Link to the description of Assignment 1:
- https://dbresearch.uni-salzburg.at/teaching/2023ss/dim/assignment1.pdf

Then, open a psql terminal.

## (1) Initial Query Plan
================================================================================
```
EXPLAIN SELECT * FROM titles WHERE startyear = 1990;
```

We observe that the optimizer chooses to perform a sequential scan on the table
"titles" while checking the condition "staryear = 1990" for each single tuple in
this table:

```
                           QUERY PLAN                            
-----------------------------------------------------------------
 Seq Scan on titles  (cost=0.00..168978.50 rows=12808 width=113)
   Filter: (startyear = 1990)
(2 rows)
```

## (2) Create an Index (Shortcut)
================================================================================
```
CREATE INDEX startyear_idx ON titles(startyear);
```

This command will take some time and effectively creates a potential shortcut
for queries that involve the attribute "startyear" of the table "titles".
However, the optimizer does not have to use this shortcut and we will see that
this essentially depends on the specific query.

## (3) "Tell" PostgreSQL that a shortcut exists
================================================================================
```
ANALYZE titles;
```

Sometimes, we need to "tell" the database system that additional information is
available (such a shortcuts). This is also the case for PostgreSQL, hence we use
the "ANALYZE" command, which gathers information about the entire database or
about a specific table (in our example, we only analyze the table "titles").

## (4) New Query Plan
================================================================================
```
EXPLAIN SELECT * FROM titles WHERE startyear = 1990;
```

We observe that the optimizer now makes a different choice, namely it chooses to
use the newly created index as shortcut to find the movies with
"staryear = 1990". This is also reflected by the overall costs that are reported
by the optimizer (32,919.48 units with index vs. 168,978.50 units without
index), hence the optimizer chooses to use the index as this is probably faster.
In the next step, we will see that this is not always the case.

```
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Index Scan using startyear_idx on titles  (cost=0.43..32919.48 rows=12556 width=113)
   Index Cond: (startyear = 1990)
(2 rows)
```

## (5) Query Plan for Another Query
================================================================================
```
EXPLAIN SELECT * FROM titles WHERE startyear > 1990;
```

We slightly adapt our query by replacing "=" with ">". In this case, we observe
that the optimizer still chooses the sequential scan over using an index.
This implies that the index (shortcut) has basically little to no effect in
terms of runtime (i.e., the index shortcut does not lead to faster runtimes).

```
                            QUERY PLAN                             
-------------------------------------------------------------------
 Seq Scan on titles  (cost=0.00..168978.41 rows=4040479 width=113)
   Filter: (startyear > 1990)
(2 rows)
```

## (6) Cleanup - Remove the Index
================================================================================
```
DROP INDEX startyear_idx;
```

It is also possible to remove an index, e.g., if has little/no positive effect
on the query runtimes. Recall that indexes also consume additional memory and
may even slow down a system if they must be updated frequently (also the
shortcut must be kept up to date in order to improve the performance).