Database Indexing Explained: A Complete Guide for Technical Interviews

Database Indexing Explained: A Complete Guide for Technical Interviews

If you've ever wondered why some database queries return results in milliseconds while others take forever, the answer often lies in indexing. As someone who's spent years optimizing database performance and interviewing engineers, I can tell you that understanding indexing is crucial—not just for building efficient systems, but for acing technical interviews.

Think of a database index like the index at the back of a textbook. Instead of flipping through every page to find information about "B-trees," you check the index, find the page numbers, and jump directly there. Database indexes work similarly, creating shortcuts to your data.

How Database Indexes Work Under the Hood

Most modern databases use B-tree indexes as their default structure. A B-tree is a self-balancing tree that maintains sorted data and allows searches, insertions, and deletions in logarithmic time—O(log n) instead of O(n) for a full table scan.

Here's what happens when you create an index:

-- Creating an index on the email column
CREATE INDEX idx_user_email ON users(email);

-- This query now uses the index for fast lookups
SELECT * FROM users WHERE email = 'john@example.com';

When you execute that SELECT query, the database engine:

  • Traverses the B-tree index to find the email value

  • Retrieves the row pointer from the index

  • Fetches the actual row data using that pointer
  • This process typically requires only 3-4 disk reads even for tables with millions of rows, compared to potentially millions of reads for a full table scan.

    Types of Database Indexes You Should Know

    Clustered vs Non-Clustered Indexes

    A clustered index determines the physical storage order of data in the table. Think of it as a dictionary—words are physically arranged alphabetically. Each table can have only one clustered index because data can only be sorted one way physically.

    A non-clustered index is like a separate reference guide. The actual data stays in its original order, but the index provides pointers to find specific rows quickly.

    -- Primary key creates a clustered index by default
    CREATE TABLE orders (
        order_id INT PRIMARY KEY,  -- Clustered index
        customer_id INT,
        order_date DATE
    );
    
    

    -- This creates a non-clustered index
    CREATE INDEX idx_customer_date ON orders(customer_id, order_date);

    Composite Indexes

    Composite indexes span multiple columns and follow the "leftmost prefix" rule. If you create an index on (lastname, firstname, age), it can efficiently handle queries filtering by:

    • lastname only

    • lastname + firstname

    • lastname + firstname + age


    But it won't help with queries filtering only by firstname or age.

    Unique and Partial Indexes

    Unique indexes enforce data integrity while providing performance benefits. Partial indexes (available in PostgreSQL and some other databases) only index rows meeting specific conditions, saving space and maintenance overhead.

    Database Index Performance Optimization Strategies

    The key to effective indexing is understanding your query patterns. I've seen developers create indexes that never get used and miss obvious optimization opportunities.

    Query Execution Plans

    Always examine execution plans to understand how your queries use indexes:

    -- PostgreSQL
    EXPLAIN ANALYZE SELECT * FROM orders 
    WHERE customer_id = 12345 AND order_date > '2023-01-01';
    
    

    -- Look for:
    -- - Index Scan vs Seq Scan
    -- - Actual vs estimated rows
    -- - Execution time

    Index Selectivity

    High selectivity means an index helps narrow down results significantly. Indexing a boolean column with 50% true/false distribution provides little benefit, but indexing unique email addresses provides maximum benefit.

    Calculate selectivity as: SELECT COUNT(DISTINCT column) / COUNT(*) FROM table

    A selectivity above 0.1 (10% unique values) typically justifies an index.

    Covering Indexes

    A covering index includes all columns needed for a query, eliminating the need to access the table data:

    -- Instead of accessing the table, this index covers everything
    CREATE INDEX idx_covering ON orders(customer_id) 
    INCLUDE (order_date, total_amount);
    
    

    -- This query uses only the index
    SELECT order_date, total_amount
    FROM orders WHERE customer_id = 12345;

    Common Database Indexing Mistakes in Technical Interviews

    I've interviewed hundreds of engineers, and these mistakes come up repeatedly:

    Over-indexing

    Every index consumes storage space and slows down INSERT, UPDATE, and DELETE operations. I've seen production databases with 20+ indexes per table, most unused. Start with indexes supporting your most frequent queries, then add others based on performance monitoring.

    Ignoring Index Maintenance

    Indexes can become fragmented over time, especially with frequent updates. Most databases provide commands to rebuild or reorganize indexes:

    -- SQL Server
    ALTER INDEX idx_user_email ON users REBUILD;
    
    

    -- PostgreSQL
    REINDEX INDEX idx_user_email;

    Function-based Query Issues

    Functions in WHERE clauses often prevent index usage:

    -- This won't use an index on email
    SELECT * FROM users WHERE UPPER(email) = 'JOHN@EXAMPLE.COM';
    
    

    -- Create a functional index instead
    CREATE INDEX idx_email_upper ON users(UPPER(email));

    Premature Optimization

    Don't create indexes before you understand your data access patterns. Profile your application under realistic load, identify slow queries, then create targeted indexes.

    Advanced Indexing Concepts for Senior Engineers

    As you progress in your career, you'll encounter more sophisticated indexing scenarios:

    Hash Indexes

    Perfect for equality comparisons but useless for range queries. PostgreSQL supports hash indexes for specific use cases where you only need exact matches.

    Bitmap Indexes

    Excellent for data warehouse scenarios with low-cardinality columns and complex WHERE clauses combining multiple conditions.

    Spatial Indexes

    R-tree indexes for geographic data, enabling efficient "find all restaurants within 5 miles" queries.

    Monitoring Index Usage

    Regularly audit your indexes to identify unused ones:

    -- PostgreSQL: Find unused indexes
    SELECT schemaname, tablename, indexname, idx_scan
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    AND indexrelname NOT LIKE '%_pkey';

    Remember, indexing is both an art and a science. The best approach depends on your specific data patterns, query workload, and performance requirements. In interviews, demonstrate that you understand these trade-offs rather than just memorizing index types.

    The goal isn't to create the most indexes—it's to create the right indexes that genuinely improve your application's performance while minimizing overhead.

    Practice this on Goliath Prep — AI-graded mock interviews with instant feedback. Try it free at app.goliathprep.com

    Practice Interview Questions with AI

    Goliath Prep gives you AI-powered mock interviews with instant feedback across 29+ technologies.

    Start Practicing Free →