Database Indexing Explained: A Complete Guide for Technical Interviews
If you've ever wondered why some database queries return results in milliseconds while others take forever, the answer often lies in indexing. As someone who's spent years optimizing database performance and interviewing engineers, I can tell you that understanding indexing is crucial—not just for building efficient systems, but for acing technical interviews.
Think of a database index like the index at the back of a textbook. Instead of flipping through every page to find information about "B-trees," you check the index, find the page numbers, and jump directly there. Database indexes work similarly, creating shortcuts to your data.
How Database Indexes Work Under the Hood
Most modern databases use B-tree indexes as their default structure. A B-tree is a self-balancing tree that maintains sorted data and allows searches, insertions, and deletions in logarithmic time—O(log n) instead of O(n) for a full table scan.
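The gap between O(log n) and O(n) is easy to see without a database. Here's a minimal Python sketch in which a sorted list stands in for the B-tree's sorted keys, binary search stands in for the tree descent, and a linear pass stands in for a full table scan:

```python
# Binary search over a sorted list of keys stands in for walking a B-tree;
# a linear pass stands in for a full table scan. Pure illustration, no database.
keys = list(range(1_000_000))

def linear_scan(target):
    """O(n): examine rows one by one, like a full table scan."""
    steps = 0
    for k in keys:
        steps += 1
        if k == target:
            break
    return steps

def binary_search(target):
    """O(log n): halve the search range each step, like a B-tree descent."""
    steps, lo, hi = 0, 0, len(keys)
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps

print(linear_scan(999_999))   # 1000000 steps
print(binary_search(999_999)) # about 20 steps
```

For a million keys, the worst-case linear pass needs a million comparisons while binary search needs about twenty, which is the same ratio that makes an indexed lookup so much cheaper than a scan.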
Here's what happens when you create an index:
-- Creating an index on the email column
CREATE INDEX idx_user_email ON users(email);
-- This query now uses the index for fast lookups
SELECT * FROM users WHERE email = 'john@example.com';
When you execute that SELECT query, the database engine:
- Starts at the root node of the B-tree index
- Follows pointers down through the internal nodes, comparing the search key at each level
- Reaches the leaf node holding the matching key and a pointer to the row's location
- Fetches the full row from the table
This process typically requires only 3-4 disk reads even for tables with millions of rows, compared to potentially millions of reads for a full table scan.
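You can watch the switch from a full scan to an index lookup happen. A rough sketch using SQLite's EXPLAIN QUERY PLAN (table and index names follow the example above; any SQLite-backed environment will do):

```python
import sqlite3

# A minimal sketch using SQLite; table and index names follow the article's example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

def plan(sql):
    """Return the query plan detail (SCAN = full scan, SEARCH = index lookup)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT * FROM users WHERE email = 'user500@example.com'"
before = plan(query)   # e.g. "SCAN users" -- every row is examined
conn.execute("CREATE INDEX idx_user_email ON users(email)")
after = plan(query)    # e.g. "SEARCH users USING INDEX idx_user_email (email=?)"
print(before)
print(after)
```

The same before/after check works in any database that exposes query plans; only the EXPLAIN syntax differs.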
Types of Database Indexes You Should Know
Clustered vs Non-Clustered Indexes
A clustered index determines the physical storage order of data in the table. Think of it as a dictionary—words are physically arranged alphabetically. Each table can have only one clustered index because data can only be sorted one way physically.
A non-clustered index is like a separate reference guide. The actual data stays in its original order, but the index provides pointers to find specific rows quickly.
-- Primary key creates a clustered index by default
-- (true in SQL Server and MySQL/InnoDB; PostgreSQL stores tables as heaps)
CREATE TABLE orders (
order_id INT PRIMARY KEY, -- Clustered index
customer_id INT,
order_date DATE
);
-- This creates a non-clustered index
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
Composite Indexes
Composite indexes span multiple columns and follow the "leftmost prefix" rule. If you create an index on (lastname, firstname, age), it can efficiently handle queries filtering by:
- lastname only
- lastname + firstname
- lastname + firstname + age
But it won't help with queries filtering only by firstname or age.
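The leftmost-prefix rule can be checked directly. A sketch using SQLite's query planner (the people table is hypothetical; the extra city column keeps the index from incidentally covering the whole query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (lastname TEXT, firstname TEXT, age INT, city TEXT)")
conn.execute("CREATE INDEX idx_name_age ON people(lastname, firstname, age)")

def plan(where):
    # First line of the query plan: SEARCH means the index is used, SCAN means not.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM people WHERE " + where).fetchall()
    return rows[0][-1]

p1 = plan("lastname = 'Smith'")                        # SEARCH via idx_name_age
p2 = plan("lastname = 'Smith' AND firstname = 'Ann'")  # SEARCH via idx_name_age
p3 = plan("firstname = 'Ann'")                         # SCAN: no leftmost prefix
print(p1)
print(p2)
print(p3)
```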
Unique and Partial Indexes
Unique indexes enforce data integrity while providing performance benefits. Partial indexes (available in PostgreSQL and some other databases) only index rows meeting specific conditions, saving space and maintenance overhead.
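Here's a short sketch of both, using SQLite, which supports partial indexes via a WHERE clause on CREATE INDEX (the table and index names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, ref TEXT)")

# Unique index: enforces integrity and doubles as a fast lookup path on ref.
conn.execute("CREATE UNIQUE INDEX idx_ref ON orders(ref)")
conn.execute("INSERT INTO orders (status, ref) VALUES ('open', 'A-1')")
duplicate_rejected = False
try:
    conn.execute("INSERT INTO orders (status, ref) VALUES ('open', 'A-1')")
except sqlite3.IntegrityError:
    duplicate_rejected = True  # UNIQUE constraint failed: orders.ref
print(duplicate_rejected)

# Partial index: only rows matching the WHERE clause are indexed,
# keeping the index small when most orders are closed.
conn.execute(
    "CREATE INDEX idx_open_ref ON orders(ref) WHERE status = 'open'")
```

PostgreSQL's partial-index syntax is the same shape; the space savings grow with the fraction of rows the predicate excludes.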
Database Index Performance Optimization Strategies
The key to effective indexing is understanding your query patterns. I've seen developers create indexes that never get used and miss obvious optimization opportunities.
Query Execution Plans
Always examine execution plans to understand how your queries use indexes:
-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 12345 AND order_date > '2023-01-01';
-- Look for:
-- - Index Scan vs Seq Scan
-- - Actual vs estimated rows
-- - Execution time
Index Selectivity
High selectivity means an index helps narrow down results significantly. Indexing a boolean column with 50% true/false distribution provides little benefit, but indexing unique email addresses provides maximum benefit.
Calculate selectivity as: SELECT COUNT(DISTINCT column) * 1.0 / COUNT(*) FROM table (the * 1.0 matters—many databases perform integer division otherwise and round the result down to zero)
As a rough rule of thumb, a selectivity above 0.1 (10% unique values) typically justifies an index.
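Here's a sketch of that calculation in SQLite, comparing a unique email column against a near-boolean flag (hypothetical users table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT, is_active INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(f"u{i}@example.com", i % 2) for i in range(1000)])

def selectivity(column):
    # * 1.0 forces floating-point division; COUNT returns integers.
    return conn.execute(
        f"SELECT COUNT(DISTINCT {column}) * 1.0 / COUNT(*) FROM users"
    ).fetchone()[0]

print(selectivity("email"))      # 1.0   -> excellent index candidate
print(selectivity("is_active"))  # 0.002 -> poor index candidate
```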
Covering Indexes
A covering index includes all columns needed for a query, eliminating the need to access the table data. The INCLUDE syntax below is supported by SQL Server and PostgreSQL 11+; in other databases you get the same effect by adding the extra columns to the index key:
-- Instead of accessing the table, this index covers everything
CREATE INDEX idx_covering ON orders(customer_id)
INCLUDE (order_date, total_amount);
-- This query uses only the index
SELECT order_date, total_amount
FROM orders WHERE customer_id = 12345;
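SQLite has no INCLUDE clause, but a composite index shows the same effect; its query planner explicitly reports a covering index when the table itself never needs to be touched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INT, order_date TEXT, total_amount REAL)""")

# No INCLUDE in SQLite: put the extra columns in the index key instead.
conn.execute(
    "CREATE INDEX idx_covering ON orders(customer_id, order_date, total_amount)")

detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT order_date, total_amount "
    "FROM orders WHERE customer_id = 12345").fetchall()[0][-1]
print(detail)  # e.g. "SEARCH orders USING COVERING INDEX idx_covering ..."
```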
Common Database Indexing Mistakes in Technical Interviews
I've interviewed hundreds of engineers, and these mistakes come up repeatedly:
Over-indexing
Every index consumes storage space and slows down INSERT, UPDATE, and DELETE operations. I've seen production databases with 20+ indexes per table, most unused. Start with indexes supporting your most frequent queries, then add others based on performance monitoring.
Ignoring Index Maintenance
Indexes can become fragmented over time, especially with frequent updates. Most databases provide commands to rebuild or reorganize indexes:
-- SQL Server
ALTER INDEX idx_user_email ON users REBUILD;
-- PostgreSQL
REINDEX INDEX idx_user_email;
Function-based Query Issues
Functions in WHERE clauses often prevent index usage:
-- This won't use an index on email
SELECT * FROM users WHERE UPPER(email) = 'JOHN@EXAMPLE.COM';
-- Create a functional index instead
CREATE INDEX idx_email_upper ON users(UPPER(email));
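SQLite also supports indexes on expressions, which makes the before/after easy to demonstrate; note the WHERE clause must use the indexed expression verbatim for the planner to match it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_email ON users(email)")  # plain index on email

def plan(sql):
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]

q = "SELECT * FROM users WHERE upper(email) = 'JOHN@EXAMPLE.COM'"
before = plan(q)  # SCAN: the plain index on email can't satisfy upper(email)
conn.execute("CREATE INDEX idx_email_upper ON users(upper(email))")
after = plan(q)   # SEARCH via idx_email_upper
print(before)
print(after)
```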
Premature Optimization
Don't create indexes before you understand your data access patterns. Profile your application under realistic load, identify slow queries, then create targeted indexes.
Advanced Indexing Concepts for Senior Engineers
As you progress in your career, you'll encounter more sophisticated indexing scenarios:
Hash Indexes
Perfect for equality comparisons but useless for range queries, since hash values preserve no ordering. PostgreSQL supports hash indexes (crash-safe since version 10) for cases where you only need exact matches.
Bitmap Indexes
Excellent for data warehouse scenarios with low-cardinality columns and complex WHERE clauses combining multiple conditions.
Spatial Indexes
R-tree indexes for geographic data, enabling efficient "find all restaurants within 5 miles" queries.
Monitoring Index Usage
Regularly audit your indexes to identify unused ones:
-- PostgreSQL: Find unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey';
Remember, indexing is both an art and a science. The best approach depends on your specific data patterns, query workload, and performance requirements. In interviews, demonstrate that you understand these trade-offs rather than just memorizing index types.
The goal isn't to create the most indexes—it's to create the right indexes that genuinely improve your application's performance while minimizing overhead.
Practice this on Goliath Prep — AI-graded mock interviews with instant feedback. Try it free at app.goliathprep.com