Hi,
I'm currently enabling InnoDB table compression on one big table that contains many BLOB and TEXT columns. I use InnoDB 1.1.8-26 (MySQL 5.5), and compression works pretty well: I get up to 86% reduction (using key_block_size=16).
What I noticed is that the statistics are now wrong. The table has around 300k rows, with an average row length of 300KB. It was around 90GB before compression and 10GB after. The average row length still shows 300KB. My guess is that MySQL uses the AVG_ROW_LENGTH and DATA_LENGTH columns of INFORMATION_SCHEMA.TABLES to calculate the TABLE_ROWS column. So, since the table size dropped from 90GB to 10GB, TABLE_ROWS dropped from 300k rows to only 33k rows. (My understanding is that MySQL computes TABLE_ROWS = DATA_LENGTH / AVG_ROW_LENGTH, i.e. 90GB / 300KB = 300,000 rows before compression and 10GB / 300KB ≈ 33,000 rows after.)
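The arithmetic in the guess above can be checked with a few lines of Python. This is a minimal sketch that assumes the guessed formula TABLE_ROWS = DATA_LENGTH / AVG_ROW_LENGTH and decimal units (1GB = 10^9 bytes, 1KB = 10^3 bytes). The function name estimated_rows is hypothetical, and the sketch only restates the hypothesis; it does not describe how InnoDB actually computes its statistics.

```python
# Hypothetical sketch of the guessed formula: TABLE_ROWS = DATA_LENGTH / AVG_ROW_LENGTH.
# This mirrors the hypothesis in the post; it is NOT confirmed InnoDB behavior.

GB = 1000**3  # decimal units, matching the round numbers in the post
KB = 1000


def estimated_rows(data_length: int, avg_row_length: int) -> int:
    """Row estimate under the guessed formula (integer division)."""
    return data_length // avg_row_length


AVG_ROW_LENGTH = 300 * KB  # reported average row length, unchanged after compression

before = estimated_rows(90 * GB, AVG_ROW_LENGTH)  # uncompressed DATA_LENGTH
after = estimated_rows(10 * GB, AVG_ROW_LENGTH)   # compressed DATA_LENGTH

print(before)  # 300000
print(after)   # 33333
```

If AVG_ROW_LENGTH stays fixed while DATA_LENGTH shrinks by a factor of 9, the estimate shrinks by the same factor, which matches the 33k figure.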
I believe this data is important to the query optimizer (query plan).
If I run "EXPLAIN SELECT * FROM mytable", I see the same value, 33k, in the rows column.
Since 33k rows is far below the real number of rows, I'm worried that MySQL will prefer a full table scan over an index (thinking it's faster). Is there anything wrong with my understanding? Is there any way to fix this problem?
Should I report a bug to MySQL?
Thank you for your help.