Skip to content

MariaDB JSON text indexing

It is not new that we can store a JSON content in a normal table text field. This has always been the case in the past. But two key features were missing : filtering based on JSON content attributes and indexing of the JSON content. With MariaDB 10.1 CONNECT storage Engine we offer support for external content including JSON files. The MariaDB CONNECT storage Engine also comes with a set of JSON related UDFs. This allows us to do the following thing :

MariaDB > CREATE TABLE jassets (
       item_name varchar(32) primary key, /* A common attribute for all items */
       json_cols varchar(512)  /* JSON data will be stored here */
     );
MariaDB > INSERT INTO jassets VALUES ('MariaDB T-shirt', Json_Object('blue' color, 'XL' size));
MariaDB > INSERT INTO jassets VALUES ('Thinkpad Laptop', Json_Object('black' color, 500 price));
...
MariaDB > SELECT * FROM jassets;
+-----------------+-------------------------------+
| item_name       | json_cols                     |
+-----------------+-------------------------------+
| MariaDB T-shirt | {"color":"blue","size":"XL"}  |
| Thinkpad Laptop | {"color":"black","price":500} |
...
+-----------------+-------------------------------+

Through the JSON UDFs We can select a limited number of field's values from the JSON content. If we want to restrict access only to records having a specific value for one of the JSON attribute we could write a request like :

MariaDB [test]> SELECT item_name, JsonGet_String(json_cols, 'color') AS color FROM jassets 
    ->   where JsonGet_String(json_cols, 'color') = 'blue';
+-------------------+-------+
| item_name         | color |
+-------------------+-------+
| MariaDB T-shirt   | blue  |
+-------------------+-------+

This is quite inefficient. MariaDB need to do a full table scan and for each row we need to parse the JSON content to evaluate the JSON extract function (defined as an UDF for the MariaDB CONNECT Storage Engine ). Is there any possible speedup ?

First in MariaDB we can define a virtual column implementing this JSON function call. We can make this virtual column persistent. We can then put an index on this persistent column (For MariaDB being persistent is mandatory to be able to put an index 🙁 )

ALTER TABLE jassets ADD column color varchar(10) AS JsonGet_String(json_cols, 'color') PERSISTENT;
CREATE INDEX colori  ON jassets (color);

We can then issue a request like :

MariaDB [test]> SELECT item_name, color FROM jassets where color='blue';
+-------------------+-------+
| item_name         | color |
+-------------------+-------+
| MariaDB T-shirt   | blue  |
+-------------------+-------+

This is much more efficient as only records satisfying the where clause through the index will be accessed. This does not requires any full table scan. JSON Parsing will still be required to extract the color value from the JSON content that match the where clause.

It must be noticed that having a JSON field with multiple associated indexes can turn very inefficient on updates. It must be noted that an innoDB table can contain a maximum of 64 secondary indexes. This can be a limitation for complex JSON structures. After a modification of the JSON content there is currently no way to discriminate the only impacted indexes. All of them need to be updated.

This technique of storing JSON into a text field works with any storage engine. This can be of interest for some specific workload requiring continuous high insert rate like what we can obtain with TokuDB and it fractal tree indexing technology.

Here is the Documentation on MariaDB CONNECT Storage Engine support for JSON.

Leave a Reply

Your email address will not be published. Required fields are marked *