MySQL Indexes
Why use indexes?
Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees
A B-tree is a self-balancing tree data structure that keeps data sorted and allows searches,
sequential access, insertions, and deletions in logarithmic time
B-tree
Time complexity:
Full table scan = O(n)
Using index = O(log(n))
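The gap between O(n) and O(log(n)) can be illustrated with a small Python sketch. This is not MySQL code: a sorted list searched by binary search stands in for the B-tree, and the function names and the one-million-row size are assumptions chosen for illustration.

```python
def full_scan_steps(rows, target):
    """Count the comparisons a row-by-row scan makes before finding target."""
    steps = 0
    for value in rows:
        steps += 1
        if value == target:
            break
    return steps


def index_lookup_steps(sorted_keys, target):
    """Count the comparisons a binary search over sorted keys makes."""
    lo, hi = 0, len(sorted_keys)
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


keys = list(range(1_000_000))   # stand-in for one million indexed values
target = 999_999                # worst case for the scan: the last row

scan = full_scan_steps(keys, target)
lookup = index_lookup_steps(keys, target)
print(scan, lookup)
```

The scan touches every row, while the binary search needs at most about 20 comparisons for a million keys.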
Selectivity
Selectivity is the ratio of unique values to the total number of rows in a certain column
The more unique the values, the higher the selectivity
The query engine likes highly selective key columns
The higher the selectivity, the faster the query engine can reduce the
size of the result set
Selectivity and Cardinality
Cardinality is the number of unique values in the index.
In simple words:
Max cardinality: all values are unique
Min cardinality: all values are the same
Selectivity of index = cardinality/(number of records) * 100%
Perfect selectivity is 100%. It is reached by unique indexes on NOT NULL
columns.
Query optimization
The main idea is not to try to tune your database, but to optimize your queries based on the data you have
Selectivity by example
Example:
Table of 10,000 rows with column `gender` (number of males ~ number of females)
Let’s count selectivity for the `gender` column
Selectivity = 2/10000 * 100% = 0.02% which is very low
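The same arithmetic can be written as a short Python sketch. The helper name `selectivity_percent` and the sample columns are assumptions made for illustration; only the formula comes from the slides.

```python
def selectivity_percent(values):
    """Selectivity = cardinality / (number of records) * 100%."""
    if not values:
        raise ValueError("selectivity is undefined for an empty column")
    cardinality = len(set(values))      # number of unique values
    return cardinality / len(values) * 100


gender = ["M", "F"] * 5_000             # 10,000 rows, two distinct values
user_id = list(range(10_000))           # 10,000 rows, all values unique

print(selectivity_percent(gender))      # very low selectivity
print(selectivity_percent(user_id))     # perfect selectivity
```

A unique column such as the hypothetical `user_id` reaches 100%, while `gender` stays near zero however many rows the table holds.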
When selectivity can be neglected
Selectivity can be neglected when values are distributed unevenly
Example:
If our query selects rows with stat IN (0,1), and those values occur in only a small share of the rows, then the index is still useful.
As a general idea, we should create indexes on tables that are often queried for less than
15% of the table's rows
How MySQL uses indexes
• Data Lookups
• Sorting
• Avoiding reading “data”
• Special Optimizations
Data Lookups
SELECT * FROM employees WHERE lastname='Smith'
The classical use of index on (lastname)
Can use Multiple column indexes
SELECT * FROM employees WHERE lastname='Smith' AND
dept='accounting'
Will use index on (dept, lastname)
Use cases
Index (a,b,c) - order of columns matters
Will use Index for lookup (all listed keyparts)
a>5
a=5 AND b>6
a=5 AND b=6 AND c=7
a=5 AND b IN (2,3) AND c>5
Will NOT use Index
b>5 – Leading column is not referenced
b=6 AND c=7 - Leading column is not referenced
Will use Part of the index
a>5 AND b=2 - range on first column; only use this key part
a=5 AND b>6 AND c=2 - range on second column, use 2 parts
The thing with ranges
MySQL stops using key parts in a multi-part index as soon as it meets a real range (<, >, BETWEEN). It can, however, continue using key parts further to the right if an IN(…) range is used
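The key-part rules above can be modelled with a few lines of Python. This is a simplified sketch of the rule as stated on the slides, not how the MySQL optimizer is implemented; the function name and the "eq" / "in" / "range" labels are assumptions made for illustration.

```python
def usable_key_parts(index_columns, conditions):
    """Return how many leading key parts a lookup can use.

    conditions maps a column name to "eq" (=), "in" (IN list),
    or "range" (<, >, BETWEEN).
    """
    used = 0
    for column in index_columns:
        kind = conditions.get(column)
        if kind is None:
            break           # column not referenced: later key parts are unusable
        used += 1
        if kind == "range":
            break           # a real range ends the usable prefix
    return used


index = ("a", "b", "c")
print(usable_key_parts(index, {"a": "range"}))                        # a>5
print(usable_key_parts(index, {"a": "eq", "b": "in", "c": "range"}))  # a=5 AND b IN (2,3) AND c>5
print(usable_key_parts(index, {"b": "eq", "c": "eq"}))                # b=6 AND c=7
print(usable_key_parts(index, {"a": "eq", "b": "range", "c": "eq"}))  # a=5 AND b>6 AND c=2
```

A result of 0 corresponds to "Will NOT use Index", and a result smaller than the number of referenced columns corresponds to "Will use Part of the index".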
Sorting
SELECT * FROM players ORDER BY score DESC LIMIT 10
Will use index on SCORE column
Without index MySQL will do “filesort” (external sort) which is very expensive
Often Combined with using Index for lookup
SELECT * FROM players WHERE country='US' ORDER BY score DESC LIMIT 10
Best served by Index on (country, score)
Use Cases
It becomes even more restricted!
KEY(a,b)
Will use Index for Sorting
ORDER BY a - sorting by leading column
a=5 ORDER BY b - EQ filtering by 1st and sorting by 2nd
ORDER BY a DESC, b DESC - Sorting by 2 columns in same order
a>5 ORDER BY a - Range on the column, sorting on the same
Will NOT use Index for Sorting
ORDER BY b - Sorting by second column in the index
a>5 ORDER BY b – Range on first column, sorting by second
a IN(1,2) ORDER BY b - In-Range on first column
ORDER BY a ASC, b DESC - Sorting in the different order
Sorting rules
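You can’t sort by 2 columns in different orders (unless the index itself mixes directions, e.g. KEY(a ASC, b DESC), which MySQL 8.0 supports as descending indexes)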
You can only have Equality comparison (=) for columns which are not part of ORDER BY
Not even IN() works in this case
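The sorting rules can be checked against the use cases with a small Python model. It is a simplified sketch under stated assumptions: all key parts are ascending, only equality-filtered columns may be skipped, and the function name and argument shapes are hypothetical rather than anything MySQL exposes.

```python
def can_sort_with_index(index_columns, equal_columns, order_by):
    """Return True if ORDER BY can be served by the index order.

    equal_columns: columns filtered with '=' (IN and ranges do not count).
    order_by: list of (column, direction) pairs, direction "ASC" or "DESC".
    """
    if len({direction for _, direction in order_by}) > 1:
        return False                    # mixed ASC/DESC cannot follow the index
    pos = 0
    for column, _ in order_by:
        # Key parts pinned to a single value by '=' may be skipped.
        while (pos < len(index_columns)
               and index_columns[pos] != column
               and index_columns[pos] in equal_columns):
            pos += 1
        if pos >= len(index_columns) or index_columns[pos] != column:
            return False                # ORDER BY does not follow the key order
        pos += 1
    return True


key = ("a", "b")
print(can_sort_with_index(key, {"a"}, [("b", "ASC")]))               # a=5 ORDER BY b
print(can_sort_with_index(key, set(), [("b", "ASC")]))               # a>5 ORDER BY b
print(can_sort_with_index(key, set(), [("a", "ASC"), ("b", "DESC")]))
```

Because IN() and range conditions are deliberately left out of `equal_columns`, the model rejects `a IN(1,2) ORDER BY b` and `a>5 ORDER BY b` in the same way.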
Avoid reading the data
“Covering Index”
Applies to index use for specific query, not type of index.
Reading Index ONLY and not accessing the “data”
SELECT status FROM orders WHERE customer_id=123
KEY(customer_id, status)
Index is typically smaller than data
Access is a lot more sequential
Access through data pointers is often quite “random”
Aggregation functions
Indexes help the MIN()/MAX() aggregate functions
But only these
SELECT MAX(id) FROM table;
SELECT MAX(salary) FROM employee GROUP BY dept_id
Will benefit from (dept_id, salary) index
“Using index for group-by”
Joins
MySQL Performs Joins as “Nested Loops”
SELECT * FROM posts p, comments c WHERE p.author='Peter' AND c.post_id=p.id
Scan table `posts` finding all posts which have Peter as an author
For every such post go to `comments` table to fetch all comments
Very important to have all JOINs Indexed
Index is only needed on table which is being looked up
The index on posts.id is not needed for this query performance
Re-Design JOIN queries which can’t be well indexed
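The nested-loop idea can be sketched in Python. The sample rows and the dictionary standing in for an index on `comments.post_id` are assumptions made for illustration; this shows the shape of the algorithm, not MySQL's implementation.

```python
from collections import defaultdict

posts = [
    {"id": 1, "author": "Peter"},
    {"id": 2, "author": "Anna"},
    {"id": 3, "author": "Peter"},
]
comments = [
    {"id": 10, "post_id": 1},
    {"id": 11, "post_id": 2},
    {"id": 12, "post_id": 3},
    {"id": 13, "post_id": 3},
]

# Stand-in for an index on comments.post_id: key -> matching rows.
comments_by_post_id = defaultdict(list)
for comment in comments:
    comments_by_post_id[comment["post_id"]].append(comment)


def nested_loop_join(author):
    """Outer loop over posts, indexed lookup into comments."""
    result = []
    for post in posts:                              # scan of the outer table
        if post["author"] != author:
            continue
        # Inner lookup: only the table being looked up needs the index.
        for comment in comments_by_post_id.get(post["id"], []):
            result.append((post["id"], comment["id"]))
    return result


joined = nested_loop_join("Peter")
print(joined)
```

Note that nothing here looks up `posts` by `id`, which matches the point that an index on posts.id does not help this query.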
Multiple indexes
MySQL Can use More than one index
“Index Merge”
SELECT * FROM table WHERE a=5 AND b=6
Can often use Indexes on (a) and (b) separately
Index on (a,b) is much better
SELECT * FROM table WHERE a=5 OR b=6
2 separate indexes are as good as it gets
Index (a,b) can’t be used for this query
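A rough Python picture of “Index Merge”: each single-column index yields a set of row ids, and the sets are intersected for AND or united for OR. The sample rows and the helper `build_index` are hypothetical, and a real index merge works on sorted row-id lists rather than Python sets.

```python
from collections import defaultdict

rows = {
    1: {"a": 5, "b": 6},
    2: {"a": 5, "b": 7},
    3: {"a": 4, "b": 6},
    4: {"a": 1, "b": 2},
}


def build_index(column):
    """Map each value of the column to the set of row ids holding it."""
    index = defaultdict(set)
    for row_id, row in rows.items():
        index[row[column]].add(row_id)
    return index


index_a = build_index("a")
index_b = build_index("b")

# WHERE a=5 AND b=6: intersect the two lookups.
and_ids = index_a.get(5, set()) & index_b.get(6, set())
# WHERE a=5 OR b=6: unite the two lookups.
or_ids = index_a.get(5, set()) | index_b.get(6, set())
print(sorted(and_ids), sorted(or_ids))
```

For the AND case a combined index on (a,b) would find the matching rows in one lookup, which is why the slides call it much better.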
String indexes
There is no difference… really
Sort order is defined for strings (collation)
“AAAA” < “AAAB”
Prefix LIKE is a special type of Range
LIKE 'ABC%' means
'ABC[LOWEST]' < KEY < 'ABC[HIGHEST]'
LIKE '%ABC' can’t be optimized by use of the index
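Treating a prefix LIKE as a range can be sketched in Python with two binary searches over sorted keys. The assumptions are loud ones: plain code-point ordering stands in for a MySQL collation, the helper name `prefix_range` is hypothetical, and the upper bound is built by bumping the last character of the prefix.

```python
from bisect import bisect_left


def prefix_range(sorted_keys, prefix):
    """Return the keys that start with prefix, using two binary searches."""
    if not prefix:
        return list(sorted_keys)
    # Smallest string greater than every string that starts with prefix
    # (assumes the last character is not the highest code point).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    start = bisect_left(sorted_keys, prefix)
    end = bisect_left(sorted_keys, upper)
    return sorted_keys[start:end]


names = sorted(["ABB", "ABC", "ABCD", "ABCZ", "ABD", "XABC"])
matches = prefix_range(names, "ABC")
print(matches)
```

A pattern with a leading wildcard such as '%ABC' gives no lower or upper bound to search for, so the same trick does not apply.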
Real case: Problem
Let's take an example from the real world (Voltu first page campaigns list)
Real case: Timing
Initially the query took about 1m 20s on the first run
After MySQL cached the response, it took about 20s
Real case: Query
SELECT wk2_campaign.*,
       wk2_campaignGroup.category_id AS group_category_id,
       wk2_campaignGroup.subcategory_id AS group_subcategory_id,
       wk2_campaignGroup.summary AS group_summary,
       IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) category_id
FROM `wk2_campaign`
LEFT JOIN wk2_resource_status ON (wk2_resource_status.id = wk2_campaign.CaID)
LEFT JOIN campaign_has_group ON (wk2_campaign.CaID = campaign_has_group.campaign_id)
LEFT JOIN wk2_campaignGroup ON (campaign_has_group.campaign_group_id = wk2_campaignGroup.GrID)
LEFT JOIN si_private_campaigns pc ON (pc.campaign_id = wk2_campaign.CaID)
WHERE (wk2_campaign.tracking_active = '1')
  AND ((IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) IS NOT NULL)
       AND (IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) NOT IN
            (SELECT id FROM campaign_categories WHERE name IN ('Mobile Content Subscription')))
       AND (countries REGEXP 'US'))
  AND (((wk2_campaign.stat IN ('0', '1'))
        AND (wk2_resource_status.resource_type = 'ca')
        AND (wk2_resource_status.status = '1')
        AND (wk2_campaign.access != '0')
        AND (wk2_campaign.external_id IS NULL)
        AND (wk2_campaign.name IS NOT NULL)
        AND (wk2_campaign.countries IS NOT NULL)
        AND (trim(wk2_campaign.countries) IS NOT NULL))
       OR (pc.campaign_id IS NOT NULL));
Steps to optimize
1. Add missing indexes for the joined tables
2. Check the selectivity for different columns of the main table wk2_campaign
The `tracking_active` and `stat` columns have a low number of possible values, so by the definition above their selectivity is low. With unevenly distributed values, though, an index on them can still be built quickly and can boost query response time.
Steps to optimize
3. Add an index on these columns:
ALTER TABLE wk2_campaign ADD INDEX(tracking_active, stat);
4. We only needed to move some conditions so that they would fit the index
Result of optimization
With these manipulations we made the query use only indexes
The explain select of this query:
Query run                      Before    After    Performance increase
First time                     1m 20s    0m 2s    4000%
Subsequent (cached by MySQL)   20s       0.26s    7692%
Another example with “or”
Before
SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE (name LIKE '%buscape%' OR caid LIKE 'buscape%')
   OR mobile_app_id LIKE '%buscape%'
   OR caid IN ('89630','89632');
130 rows in set (7.43 sec)
After
SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE (name LIKE '%buscape%' OR caid LIKE 'buscape%')
UNION
SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE mobile_app_id LIKE '%buscape%'
UNION
SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE caid IN ('89630','89632');
130 rows in set (4.12 sec)
> SELECT text FROM questions LIMIT 5;
> EXPLAIN