We are seeing a lot of storage fragmentation when repeatedly updating the same JSON column. The documentation does not make it clear whether this is expected behavior, so we would like to understand what is going wrong here.
The example to reproduce is fairly simple:
CREATE TABLE IF NOT EXISTS Documents (
id VARCHAR(255) PRIMARY KEY,
custom_data JSON);
Then you can run the following query many times (say 1000) for different JSON keys:
INSERT INTO Documents (id, custom_data)
VALUES ('id', JSON_OBJECT('random_key', 'val'))
ON DUPLICATE KEY UPDATE
  custom_data = IF(custom_data IS NOT NULL,
                   JSON_SET(custom_data, CONCAT('$.random_key'), 'val'),
                   JSON_OBJECT('random_key', 'val')),
  id = 'id';
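For reference, the repro loop can be sketched as the following shell snippet, which generates the upsert statements so they can be piped into the mysql client. The key names `key_1`..`key_N` and the `gen_upserts` helper are illustrative (the actual workload uses random keys):

```shell
# gen_upserts N: emit N upsert statements, one per distinct JSON key key_1..key_N.
gen_upserts() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    # Each statement adds one new key to the same row's JSON document.
    printf "INSERT INTO Documents (id, custom_data) VALUES ('id', JSON_OBJECT('key_%d', 'val')) ON DUPLICATE KEY UPDATE custom_data = IF(custom_data IS NOT NULL, JSON_SET(custom_data, '\$.key_%d', 'val'), JSON_OBJECT('key_%d', 'val')), id = 'id';\n" "$i" "$i" "$i"
    i=$((i + 1))
  done
}

# Usage (requires a running MySQL server):
#   gen_upserts 1000 | mysql document
gen_upserts 3
```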
The result after running these queries is that the table uses 14 MB:
mysql> select * from INFORMATION_SCHEMA.INNODB_TABLESPACES where NAME="document/Documents" limit 1;
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
| SPACE | NAME | FLAG | ROW_FORMAT | PAGE_SIZE | ZIP_PAGE_SIZE | SPACE_TYPE | FS_BLOCK_SIZE | FILE_SIZE | ALLOCATED_SIZE | AUTOEXTEND_SIZE | SERVER_VERSION | SPACE_VERSION | ENCRYPTION | STATE |
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
| 17 | document/documents | 16417 | Dynamic | 16384 | 0 | Single | 4096 | 14680064 | 14680064 | 0 | 8.0.31 | 1 | N | normal |
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
1 row in set (0.01 sec)
while the actual data is fairly small:
mysql> select OCTET_LENGTH(custom_data) from Documents;
+---------------------------+
| OCTET_LENGTH(custom_data) |
+---------------------------+
| 333020 |
+---------------------------+
Also, running OPTIMIZE TABLE reduces the size considerably:
mysql> optimize table Documents;
+--------------------+----------+----------+-------------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+--------------------+----------+----------+-------------------------------------------------------------------+
| document.documents | optimize | note | Table does not support optimize, doing recreate + analyze instead |
| document.documents | optimize | status | OK |
+--------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (0.06 sec)
mysql> select * from INFORMATION_SCHEMA.INNODB_TABLESPACES where NAME="document/Documents" limit 1;
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
| SPACE | NAME | FLAG | ROW_FORMAT | PAGE_SIZE | ZIP_PAGE_SIZE | SPACE_TYPE | FS_BLOCK_SIZE | FILE_SIZE | ALLOCATED_SIZE | AUTOEXTEND_SIZE | SERVER_VERSION | SPACE_VERSION | ENCRYPTION | STATE |
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
| 18 | document/documents | 16417 | Dynamic | 16384 | 0 | Single | 4096 | 475136 | 475136 | 0 | 8.0.31 | 1 | N | normal |
+-------+--------------------+-------+------------+-----------+---------------+------------+---------------+-----------+----------------+-----------------+----------------+---------------+------------+--------+
1 row in set (0.01 sec)
Thanks in advance for any help.