Why Indexes

Why indexes? We want our apps to go faster and delight our users. Indexes, when written well, make database queries run faster and consume fewer resources. As the number of users and data in an application grow, it's crucial to have good indexing.

To get a tangible feel for how databases use indexes we'll pretend we are the database. We'll use the classic analogy of a phonebook. Let's build a mental model.

Making Plans

When someone calls with a query, we don't read pages randomly hoping to stumble across the answer. We create a plan.

First, we need to know what useful elements for searching are available. Is it sorted? Is there a table of contents? Are there chapters and verses?

Second, we use those elements to create a few plans. If it's sorted we could skim the first word of each page. If we have a glossary we could look up the word and then go to the page number. We pick the plan that will be the least amount of work.

Finally, we execute our plan. Now we actually do the lookups, thumb through the pages, and read the information.

No Good Plans

It's our first day on the job and someone calls us up with a query. The comment at the top will be the English version of the request.

-- Please get me the phone number for Joe Jones.
SELECT phone_number
FROM phone_book
WHERE
  firstname = 'Joe'
  AND lastname = 'Jones'

The entries in the phone book are completely random, unsorted. The best plan we can come up with is to read through the phone book until we find the person 'Joe Jones'.

If 'Joe Jones' is on the first page it will be a quick query. If he's on the last page our customer will be waiting a long time for the result. If the 'Joe Jones' doesn't exist at all we'll still have to read the whole phone book to verify that he's not in it.

This is a tremendous amount of work, so it's a bad plan. But with an unsorted book, it is the only plan we can come up with.

Plan Options

We need to create some options for ourselves. What plans could we come up with if we sorted the phone book by the last name?

  • Thumb to about the 1/3rd of the phone book to get close to the J's
  • If we're before the J's scan forward, if we're past the J's scan backward
  • Once we find J scan for Jo, then scan for Jon, then Jone
  • Once we're at Jones we scan for Joe until we find him, and return his phone number

We did a lot less work than reading the whole phone book.

Our basic query to find a phone number is performing much better than before. We're going to keep the sorting.

Deletes & Updates

How does our now sorted book affect our speed when updating or deleting records?

-- I'm Joe Jones. Please change my phone number to 555-555-5555.
UPDATE phone_book
SET phone_number = '555-555-5555'
WHERE
  firstname = 'joe'
  AND lastname = 'jones'

When we plan for this request we'll recognize we do the same thing as the 'Get Phone Number' request. We can seek to find 'Joe Jones' but rather than return the phone number we will change his phone number.

Now for deleting.

-- I'm Joe Jones. I'm moving away so please delete me from the phone book.
DELETE FROM phone_book
WHERE
  firstname = 'joe'
  AND lastname = 'jones'

Once again we can do the seek to 'Joe Jones'. But now we delete him from the phone book. Both our update and delete statements where we can seek with names have gotten much better.

What Does The Index Look Like?

Our index for sorting by lastname, then firstname would look like this.

CREATE INDEX IX_phone_book_lastname_firstname -- Name the index
ON phone_book -- Put the index on the phone_book table
(lastname, firstname) -- Sort by lastname, then by firstname
INCLUDE (phone_number) -- Store the phone_number in the index but don't sort it

If we could query the whole index it would have this structure.

| lastname | firstname | phone_number |
|----------|-----------|--------------|
| Aaberg   | Aaron     | 555-555-1234 |
| Aaberg   | Bob       | 555-555-1234 |
| ...      | ...       | 555-555-2345 |

Index Tradeoffs

Our IX_phone_book_lastname_firstname index has made our main queries for Find, Update and Delete requests a lot faster. But are there any queries that are actually made slower because of the index? Let's think about inserts.

-- I'm new in town. My name is Joe Jones. My phone number is 555-555-5555
INSERT INTO phone_book
    (firstname, lastname, phone_number)
VALUES
    ('Joe', 'Jones', '555-555-5555')

This is the first query we've looked at that is actually harder because of the index. To understand why let's think about both plans.

Pre-Index

  • Add Joe Jones to the last page of the phone book.

Post-Index

  • Seek to where Joe Jones fits in sort order.
  • If there's no blank space on the page shuffle the other people onto previous or next pages to make room.
  • Add him to the right spot.

The 2nd step of this one is strange. If the pages of our phone book were airtight, leaving no room for extra inserts this would be an awful task. We'd have to shift everyone to the right on every insert. Let's not get too hung up on this right now. Let's say we found a good balance of leaving spaces in our phone book so that we rarely have to reshuffle.

This is a much more complex issue in reality. You have to be careful in SQL to make sure you don't end up 'page shuffling' every time you insert a row. But we don't want to get hung up on it right now since it's more advanced. If you want to take a detour to understand it, check out [link] SQL Index Fill Factor and Page Shuffling.

Regardless of the possible page shuffling, the insert is harder now with the index. This is a pure tradeoff that we'll always face when indexing. Maintaining the index order has a cost.

Making the Tradeoff

In our case with the IX_phone_book_lastname_firstname the cost is well worth it. We avoid full phone book scans on our most common Find, Update and Delete statements. That is a huge gain compared to the cost of the seeking we do on our most common Insert.

Another tradeoff to note is the storage space required for our index. We'll talk a bit more about that when we get into multiple indexes. For now, we'll say the storage is also worth avoiding those full phone book table scans.

Other Queries

So far our queries have only been about finding individual people. And they've all used firstname and lastname in the WHERE clauses. That is why they've all benefited so much from our index that sorts by name. We're dealing with the same "family" of queries.

We're going to expand our understanding by looking at a new query, that doesn't fit in the "family" of queries so far. Consider this query.

-- How many people in the phone book have the firstname Beau?
SELECT COUNT(*)
FROM phone_book
WHERE
  firstname = 'Beau'

Try to answer the following.

  • Can this query use our lastname, firstname index? Why or why not.
  • Are there new indexes that would yield better plans for this query?

Spoilers ahead. Make a guess at these before reading on.

--

--

--

--

--

--

Can it use our lastname, firstname index?

No. This is a key thing to know about indexes. The order of columns matter. Remember our index goes lastname, firstname. There is no way to seek all the firstnames with that index. Let's visualize that.

| lastname | firstname | phone_number |
|----------|-----------|--------------|
| Aaberg   | Aaron     | 555-555-1233 |
| Aaberg   | Beau      | 555-555-1234 |
| ...      | ...       | 555-555-2346 |
| ...      | ...       | 555-555-2347 |
| Connor   | Beau      | 555-555-2348 |
| ...      | ...       | 555-555-2349 |
| Zelko    | Beau      | 555-555-2341 |

The last name sorting is no help. There could be a Beau on Aaberg or on Zelko. We have to start at the beginning and stop at the end. The index is not useful for the query.

Can We Make a Good Index for the Query?

Yes. We can make another copy of our phone book that is perfectly suited to answer this query.

CREATE INDEX IX_phone_book_firstname -- Name the index
ON phone_book -- Put the index on the phone_book table
(firstname) -- Sort by firstname only
-- Note: no included columns

Let's visualize the index.

| firstname |
|---------- |
| ...       |
| Aaron     |
| Beau      |
| Beau      |
| ...       |
| Beau      |
| Bob       |
| ...       |

Our plan becomes pretty clear.

  • Seek to our first Beau and start counting
  • Stop counting when we see any non-Beau

Notice how we only have firstname. We left off lastname and phone_number because they are completely for that query. By leaving them off we save storage and can actually read the index faster. We can fit more firstnames on a page so we read fewer pages.

Summary

  • Good indexes create good plans for queries
  • Good plans execute quickly and can scale to more users
  • Indexes have tradeoffs
    • Inserts and Updates must maintain index ordering
    • Indexes cost storage because they copy the table data
  • We can have multiple indexes on a table to serve different queries

The best way to get a knack for indexes is to practice. Write queries check the plans, then add/remove indexes and recheck to see what the SQL engine does. After some time you'll have a gut feel for what indexes are needed just by looking at a query.

Next, we'll talk about a holistic approach to indexing a real app. We'll look at techniques for finding hotspots that need tuning. We'll cover when indexes are not the right answer for performance. We'll look at some general guidelines for indexing.