Hi everyone!
I previously posted a question about the performance of mass inserts into InnoDB tables using INSERT...SELECT. However, it seems the speed problem was actually caused by random-access I/O in the SELECT part. More specifically, the SELECT joins two tables and then sorts the output by the primary key of the first table.
The join is optimized and the proper indexes are used, but it still ends up reading many different pages from the hard disk in an almost random pattern. The output of the SELECT contains a few INTs, one VARCHAR(255) and two MEDIUMBLOBs.
What would you recommend I do to improve the performance?
1) Increase the number of records that I request in the SELECT part? This way I increase the probability of retrieving more table rows from each hard disk page that I request. The trade-off in this case is that I request even more pages from the disk and need to sort more records afterwards.
2) Decrease the number of records that I request in the SELECT part? This way I request far fewer pages each time and can sort the result faster, but I also reduce the throughput and the amount of data that I feed to the INSERT part.
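To make the two options concrete, here is a minimal sketch of the kind of batched copy I mean, where batch_size is the knob in question. It uses Python with the standard-library sqlite3 module purely as a stand-in for MySQL/InnoDB, and every table and column name (a, b, target, name, payload) is hypothetical, not my real schema.

```python
import sqlite3


def copy_in_batches(conn, batch_size):
    """Copy joined rows into `target`, one primary-key range per statement.

    Hypothetical schema: a(id, name), b(a_id, payload), target(id, name, payload).
    Returns the number of INSERT...SELECT statements that were issued.
    """
    cur = conn.cursor()
    lo, hi = cur.execute("SELECT MIN(id), MAX(id) FROM a").fetchone()
    if lo is None:  # source table is empty
        return 0
    batches = 0
    start = lo
    while start <= hi:
        end = start + batch_size  # half-open key range [start, end)
        cur.execute(
            """
            INSERT INTO target (id, name, payload)
            SELECT a.id, a.name, b.payload
            FROM a JOIN b ON b.a_id = a.id
            WHERE a.id >= ? AND a.id < ?
            ORDER BY a.id
            """,
            (start, end),
        )
        conn.commit()  # one transaction per batch
        batches += 1
        start = end
    return batches


def demo(n_rows=10, batch_size=4):
    """Build a tiny in-memory data set and run the batched copy on it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE b (a_id INTEGER PRIMARY KEY, payload BLOB);
        CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, payload BLOB);
        """
    )
    ids = range(1, n_rows + 1)
    conn.executemany("INSERT INTO a VALUES (?, ?)", [(i, f"row{i}") for i in ids])
    conn.executemany("INSERT INTO b VALUES (?, ?)", [(i, bytes([i % 256])) for i in ids])
    batches = copy_in_batches(conn, batch_size)
    copied = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return batches, copied
```

A larger batch_size corresponds to option 1 (fewer statements, each touching more pages and sorting more rows), a smaller one to option 2. The sketch says nothing about which is faster on InnoDB; it only shows where the parameter sits.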
Testing is extremely difficult since the DB is huge (300+ GB) and the above scripts are actually used to reorder the DB based on a new primary key.
Are there any parameters, configurations or buffers that can help me increase speed? Do you have any other suggestions?
PS: I apologize for posting a new question but I think that this topic is significantly different from the previous one.