MariaDB and Native JSON support?

A question raised by my previous post is: what about MariaDB and native JSON support? In my previous post I mentioned the possibility of using the MariaDB CONNECT storage engine to store and access JSON content in a normal text field. Of course, having a native JSON datatype brings more value: it introduces JSON validation and more efficient access to attributes through an optimized storage format.

What about MariaDB and native JSON support?
Native JSON support, indexing of purely virtual columns, and optimizer improvements to smartly decide when to use such an index will come with MariaDB 10.2.
These JSON features are part of the MariaDB 10.2 roadmap. Here is the MariaDB 10.2 roadmap.
For a more detailed feature list for MariaDB 10.2, here is the full list of MariaDB 10.2 community voted features.
The MariaDB engineering team has taken its time to evaluate various aspects of a native JSON implementation. Two different implementations exist: besides Oracle's MySQL implementation, Facebook has developed its own, called DocStore, which is part of the Facebook MySQL branch. Both have been evaluated by MariaDB engineering. Another area that must be addressed is the current evolution of the standard regarding SQL/JSON.

Oracle MySQL 5.7 already supports a native JSON data type. This JSON datatype comes with an extensive set of functions to create, modify and search JSON content. Oracle MySQL 5.7 has also introduced virtual columns (called generated columns). MySQL 5.7 generated columns can be materialized or not. MariaDB already had virtual columns, but Oracle MySQL 5.7 is currently ahead of MariaDB 10.1 as it is possible to add an index on a purely virtual column that is not persistent. The MySQL 5.7 optimizer has also been improved to detect, in the WHERE clause, an expression corresponding to the virtual column. These virtual columns with indexing capabilities allow optimized filtering of the native JSON datatype through a WHERE clause.
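
To make this concrete, here is a minimal sketch using the MySQL 5.7 GA syntax (the table, column and index names are mine, chosen only for illustration):

-- MySQL 5.7 sketch: native JSON column, a non-materialized (VIRTUAL)
-- generated column extracting one attribute, and an index on that column.
CREATE TABLE assets (
  item_name varchar(32) PRIMARY KEY,
  json_cols JSON,
  color varchar(10) AS (JSON_UNQUOTE(JSON_EXTRACT(json_cols, '$.color'))) VIRTUAL,
  KEY color_idx (color)
);

-- Queries can filter on the virtual column (or, in 5.7, on the matching
-- extraction expression) and use the index instead of a full table scan.
SELECT item_name FROM assets WHERE color = 'blue';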

Neither MariaDB 10.2 nor Oracle MySQL 5.7 implements in-place update of JSON content. This is an area where a fully JSON-based datastore like MongoDB performs better. This advantage is balanced by the extra space consumed by padding. This extra padding space is required to avoid having to move data around and to avoid having to update many indexes for a single field update. This area definitely needs to be benchmarked, and the result depends on JSON content size and access patterns.

However, for certain use cases native JSON has some drawbacks compared to storing JSON in a text field. The insert rate is much lower, as the JSON needs to be parsed and converted to the internal storage format. If the client application completely handles the JSON content, we are wasting server resources.

So for MariaDB fans and users, be patient: native JSON is coming with MariaDB 10.2. And of course feel free to comment. Our engineering and development is not done behind closed doors and you can be part of it. ;-)

MariaDB JSON text indexing

It is not new that we can store JSON content in a normal table text field. This has always been possible. But two key features were missing: filtering based on JSON content attributes and indexing of the JSON content. With the MariaDB 10.1 CONNECT storage engine we offer support for external content, including JSON files. The MariaDB CONNECT storage engine also comes with a set of JSON related UDFs. This allows us to do the following:

MariaDB > CREATE TABLE jassets (
       item_name varchar(32) primary key, /* A common attribute for all items */
       json_cols varchar(512)  /* JSON data will be stored here */
     );
MariaDB > INSERT INTO jassets VALUES ('MariaDB T-shirt', Json_Object('blue' color, 'XL' size));
MariaDB > INSERT INTO jassets VALUES ('Thinkpad Laptop', Json_Object('black' color, 500 price));
...
MariaDB > SELECT * FROM jassets;
+-----------------+-------------------------------+
| item_name       | json_cols                     |
+-----------------+-------------------------------+
| MariaDB T-shirt | {"color":"blue","size":"XL"}  |
| Thinkpad Laptop | {"color":"black","price":500} |
...
+-----------------+-------------------------------+

Through the JSON UDFs we can select a limited number of field values from the JSON content. If we want to restrict access to only the records having a specific value for one of the JSON attributes, we can write a query like:

MariaDB [test]> SELECT item_name, JsonGet_String(json_cols, 'color') AS color FROM jassets 
    ->   where JsonGet_String(json_cols, 'color') = 'blue';
+-------------------+-------+
| item_name         | color |
+-------------------+-------+
| MariaDB T-shirt   | blue  |
+-------------------+-------+

This is quite inefficient: MariaDB needs to do a full table scan, and for each row the JSON content has to be parsed to evaluate the JSON extract function (defined as a UDF by the MariaDB CONNECT storage engine). Is there any possible speedup?

First, in MariaDB we can define a virtual column implementing this JSON function call. We can make this virtual column persistent. We can then put an index on this persistent column (with MariaDB, being persistent is mandatory to be able to put an index on it :-( ).

ALTER TABLE jassets ADD COLUMN color varchar(10) AS (JsonGet_String(json_cols, 'color')) PERSISTENT;
CREATE INDEX colori  ON jassets (color);

We can then issue a query like:

MariaDB [test]> SELECT item_name, color FROM jassets where color='blue';
+-------------------+-------+
| item_name         | color |
+-------------------+-------+
| MariaDB T-shirt   | blue  |
+-------------------+-------+

This is much more efficient, as only the records satisfying the WHERE clause are accessed, through the index. This does not require any full table scan. JSON parsing is still required to extract the color value from the JSON content of the rows matching the WHERE clause.
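
To double-check that the index is actually used, you can look at the query plan; assuming reasonably up-to-date statistics, EXPLAIN should report a ref access on the colori index rather than a full table scan:

MariaDB > EXPLAIN SELECT item_name, color FROM jassets WHERE color='blue';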

It must be noted that having a JSON field with multiple associated indexes can become very inefficient on updates. Note also that an InnoDB table can contain a maximum of 64 secondary indexes, which can be a limitation for complex JSON structures. After a modification of the JSON content there is currently no way to determine which indexes are actually impacted: all of them need to be updated.

This technique of storing JSON in a text field works with any storage engine. This can be of interest for specific workloads requiring a continuously high insert rate, like what we can obtain with TokuDB and its fractal tree indexing technology.
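
As a quick sketch (the table name is mine, and this assumes the TokuDB plugin is installed), the exact same text column can live in a TokuDB table; the JSON UDFs only care about the text content, not about the engine that stores it:

CREATE TABLE jassets_toku (
  item_name varchar(32) PRIMARY KEY, /* A common attribute for all items */
  json_cols varchar(512)             /* JSON data stored as plain text */
) ENGINE=TokuDB;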

Here is the Documentation on MariaDB CONNECT Storage Engine support for JSON.

Using JSON’s Arrays for MariaDB Dynamic Columns

The JSON format includes the concept of an array. A JSON object can contain an attribute of array type. We have seen that we can use the UDFs (user defined functions) provided by the MariaDB CONNECT storage engine to implement dynamic columns.

Let us create a table with a text column containing a JSON string and insert some JSON content into this column. In JSON an attribute can be an array. Here the size attribute is a JSON array. We will use this array to manage the list of available sizes for a product.

MariaDB > CREATE TABLE jassets (
       item_name varchar(32) primary key, 
       json_cols varchar(512) 
     );
MariaDB > INSERT INTO jassets VALUES ('T-shirt', 
       Json_Object('blue' color, Json_Array('XL','M') size));
MariaDB > INSERT INTO jassets VALUES ('Thinkpad Laptop', Json_Object('black' color, 500 price));
MariaDB > SELECT * FROM jassets;
+-----------------+------------------------------------+
| item_name       | json_cols                          |
+-----------------+------------------------------------+
| T-shirt         | {"color":"blue","size":["XL","M"]} |
| Thinkpad Laptop | {"color":"black","price":500}      |
+-----------------+------------------------------------+

Through the available MariaDB JSON UDFs it is possible to add or remove values from this array:

MariaDB > UPDATE jassets SET json_cols = json_array_add(json_cols, "S", 0, 'size')  WHERE item_name='T-shirt';
MariaDB > SELECT * FROM jassets;
+-----------------+----------------------------------------+
| item_name       | json_cols                              |
+-----------------+----------------------------------------+
| T-shirt         | {"color":"blue","size":["S","XL","M"]} |
| Thinkpad Laptop | {"color":"black","price":500}          |
+-----------------+----------------------------------------+
MariaDB >
MariaDB > UPDATE jassets SET json_cols = json_array_delete(json_cols, 2, 'size')  WHERE item_name='T-shirt';
MariaDB > SELECT *  from jassets;
+-----------------+------------------------------------+
| item_name       | json_cols                          |
+-----------------+------------------------------------+
| T-shirt         | {"color":"blue","size":["S","XL"]} |
| Thinkpad Laptop | {"color":"black","price":500}      |
+-----------------+------------------------------------+

This new way of addressing the dynamic column problem using JSON arrays gives great flexibility. This capability to use arrays does not currently exist in MariaDB dynamic columns.
This hybrid approach, mixing relational fixed columns with schemaless JSON content, overcomes the limitations of a pure NoSQL database. It relies on the strongly proven and consistent capabilities of a relational database.

As I said previously, this JSON access through the MariaDB CONNECT storage engine should not be confused with the native JSON support available in Oracle/MySQL 5.7 (and in Facebook/MySQL DocStore).

Using JSON as Dynamic Columns with MariaDB

The MariaDB CONNECT storage engine handles access to JSON files through standard SQL. It comes with a set of UDFs (user defined functions) to manipulate the JSON format. This JSON content can be stored in a normal text column. This approach can be used to implement dynamic columns. The dynamic column concept was first introduced with MariaDB 5.3.

The schema-less flexibility offered by a free-content JSON column gives MariaDB a NoSQL flavor. This creates more flexibility for developers than the usual static column definition of a relational database.
Let us create a table with a column json_cols of type varchar(512). This column can contain a JSON string that will embed dynamic attributes.

MariaDB > CREATE TABLE jassets (
       item_name varchar(32) primary key, /* A common attribute for all items */
       json_cols varchar(512)  /* JSON data will be stored here */
     );

Let us insert some JSON content into this column. This content represents data that would not fit in a strictly typed, predefined column set. This schemaless JSON content can be freely modified, as in a NoSQL database. This is a very classic pattern for modeling products and attributes on an e-commerce web site.

MariaDB > INSERT INTO jassets VALUES ('MariaDB T-shirt', Json_Object('blue' color, 'XL' size));
MariaDB > INSERT INTO jassets VALUES ('Thinkpad Laptop', Json_Object('black' color, 500 price));
MariaDB > SELECT * FROM jassets;
+-----------------+-------------------------------+
| item_name       | json_cols                     |
+-----------------+-------------------------------+
| MariaDB T-shirt | {"color":"blue","size":"XL"}  |
| Thinkpad Laptop | {"color":"black","price":500} |
+-----------------+-------------------------------+

We can access a subset of the attributes embedded in this JSON column's content.

SELECT item_name, JsonGet_String(json_cols, 'color') AS color FROM jassets;
+-----------------+-------+
| item_name       | color |
+-----------------+-------+
| MariaDB T-shirt | blue  |
| Thinkpad Laptop | black |
+-----------------+-------+
2 rows in set (0.00 sec)

It is possible to add or remove attributes from the dynamic column:

MariaDB > UPDATE jassets SET json_cols=Json_Object_Delete(json_cols, 'price')
    -> WHERE JsonGet_String(json_cols, 'color')='black';
MariaDB > UPDATE jassets SET json_cols=Json_Object_Add(json_cols, '3 years' warranty)
    -> WHERE item_name='Thinkpad Laptop';
 
MariaDB > select * from jassets;
+-----------------+----------------------------------------+
| item_name       | json_cols                              |
+-----------------+----------------------------------------+
| MariaDB T-shirt | {"color":"blue","size":"XL"}           |
| Thinkpad Laptop | {"color":"black","warranty":"3 years"} |
+-----------------+----------------------------------------+
2 rows in set (0.00 sec)

We can also list all attribute names of the dynamic columns, or get them together with their values in JSON format:

SELECT item_name, Json_Object_List(json_cols) FROM jassets;
+-----------------+-----------------------------+
| item_name       | Json_Object_List(json_cols) |
+-----------------+-----------------------------+
| MariaDB T-shirt | ["color","size"]            |
| Thinkpad Laptop | ["color","warranty"]        |
+-----------------+-----------------------------+

As we have seen before, we can also get the full JSON value of the column as plain text. This is very useful for a client application that directly manipulates JSON (a JavaScript app, for example).

SELECT item_name, json_cols FROM jassets;
+-----------------+----------------------------------------+
| item_name       | json_cols                              |
+-----------------+----------------------------------------+
| MariaDB T-shirt | {"color":"blue","size":"XL"}           |
| Thinkpad Laptop | {"color":"black","warranty":"3 years"} |
+-----------------+----------------------------------------+

This new way of addressing the dynamic column problem with the JSON format gives great flexibility. This approach is open to the use of a JSON-friendly language like JavaScript on the client side. In some use cases the full JSON content will be sent to the client, where it will be created, validated and modified.

This JSON access through the MariaDB CONNECT storage engine should not be confused with the native JSON support available in MySQL 5.7 (and in Facebook DocStore). Storing JSON in a binary optimized format or in standard text has respective advantages and drawbacks.

The advantage of the text format is that it works with any storage engine: InnoDB, TokuDB, RocksDB.
If the JSON content is handled by the client, there is no overhead paid on the MariaDB server for validation and parsing. The binary format also has a space consumption impact.

On the other hand, the binary format is likely to be more efficient for server-side manipulation.

MariaDB CONNECT Storage Engine JSON Autodiscovery

The MariaDB CONNECT storage engine offers access to JSON files and allows you to see an external JSON file as a MariaDB table.
A nice feature of the CONNECT storage engine is its capability to auto-discover a table structure when the table corresponds to external data. In our case the CONNECT storage engine will automatically define the columns based on the JSON file's hierarchical structure.
This is possible thanks to a feature of the MariaDB storage engine API. This auto-discovery feature allows a CREATE statement to delegate the discovery of the table structure to the chosen storage engine.

The CONNECT storage engine also uses this auto-discovery feature for other table types, like ODBC tables. This relieves the developer of the burden of defining all the columns that have already been defined in the target data source.
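
For example, a minimal ODBC sketch could look like the following (the DSN, credentials and remote table name are made up for illustration); CONNECT retrieves the column definitions from the remote data source:

CREATE TABLE remote_customers
ENGINE=CONNECT table_type=ODBC
connection='DSN=mydsn;UID=user;PWD=secret' tabname='customers';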

Let us start from the file “biblio3.json”. This file consists of an array of objects. Some object properties have object or array values:

[
{
  "ISBN": "9782212090819",
  "LANG": "fr",
  "SUBJECT": "applications",
  "AUTHOR": [
    {
      "FIRSTNAME": "Jean-Christophe",
      "LASTNAME": "Bernadac"
    },
    {
      "FIRSTNAME": "François",
      "LASTNAME": "Knab"
    }
  ],
  "TITLE": "Construire une application XML",
  "PUBLISHER": {
    "NAME": "Eyrolles",
    "PLACE": "Paris"
  },
  "DATEPUB": 1999
  },
  {
    "ISBN": "9782840825685",
    "LANG": "fr",
    "SUBJECT": "applications",
    "AUTHOR": [
      {
        "FIRSTNAME": "William J.",
        "LASTNAME": "Pardi"
      }
    ],
    "TITLE": "XML en Action",
    "TRANSLATED": {
      "PREFIX": "adapté de l'anglais par",
      "TRANSLATOR": {
        "FIRSTNAME": "James",
        "LASTNAME": "Guerin"
      }
    },
    "PUBLISHER": {
      "NAME": "Microsoft Press",
      "PLACE": "Paris"
    },
    "DATEPUB": 1999
  }
]

To use the CONNECT storage engine JSON auto-discovery capabilities, the ‘create table’ is minimalist: there is no need to describe what should be retrieved from the JSON file and how it should be processed:

MariaDB > CREATE TABLE jsample engine=CONNECT table_type=JSON 
file_name='/var/lib/mysql/json/biblio3.jsn';

We can check how the table has been created with the SHOW CREATE TABLE statement:

MariaDB > show create table jsample;
...
CREATE TABLE `jsample` (
  `ISBN` char(13) NOT NULL,
  `LANG` char(2) NOT NULL,
  `SUBJECT` char(12) NOT NULL,
  `AUTHOR` varchar(256) DEFAULT NULL,
  `TITLE` char(30) NOT NULL,
  `TRANSLATED` varchar(256) DEFAULT NULL,
  `PUBLISHER` varchar(256) DEFAULT NULL,
  `DATEPUB` int(4) NOT NULL
) ENGINE=CONNECT DEFAULT CHARSET=latin1 `TABLE_TYPE`='JSON' `FILE_NAME`='biblio3.json';

Auto-discovery does a mapping from JSON hierarchical data to tabular data. Doing a CREATE TABLE with auto-discovery is a good way to help build a more precise create table once the JSON structure has been discovered. After auto-discovery the column names come from the object keys, and the column definitions are deduced from the object values.

The column size is calculated from the file as the maximum length of the corresponding column when it holds a plain value. For columns that are JSON arrays or objects, the column is specified as a VARCHAR string of length 256, supposedly big enough to contain the sub-object's concatenated values. Nullable is set to true if the column is null or missing in some rows, or if its JPATH contains arrays.

MariaDB [test]> select * from jsample;
+---------------+------+--------------+--------------------------+--------------------------------+--------------------------+--------------+-----------------------+---------+
| ISBN          | LANG | SUBJECT      | AUTHOR                   | TITLE                          | TRANSLATION              | TRANSLATOR   | PUBLISHER             | DATEPUB |
+---------------+------+--------------+--------------------------+--------------------------------+--------------------------+--------------+-----------------------+---------+
| 9782212090819 | fr   | applications | Jean-Christophe Bernadac | Construire une application XML | NULL                     | NULL         | Eyrolles Paris        |    1999 |
| 9782840825685 | fr   | applications | William J. Pardi         | XML en Action                  | adapté de l'anglais par  | James Guerin | Microsoft Press Paris |    1999 |
+---------------+------+--------------+--------------------------+--------------------------------+--------------------------+--------------+-----------------------+---------+
2 rows in set (0.00 sec)

It is important to notice that CONNECT cannot guess what you want to do with arrays. Here the AUTHOR array is left undefined, which means that only its first value will be retrieved. You can specify through a CONNECT table creation parameter whether you want this array presented in a normalized or denormalized way (“Expand=AUTHOR” in the option list), with all the array elements presented.
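
For instance, a sketch of a table asking CONNECT to expand the AUTHOR array (one row per author) would simply add this option (the table name is mine):

create table jsample_expand engine=CONNECT table_type=JSON
file_name='/var/lib/mysql/json/biblio3.jsn'
option_list='Expand=AUTHOR';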

It is also important to notice that by default the auto-discovery feature only looks at the first level of the JSON structure.
If a more complex definition is desired, you can ask CONNECT to analyze the JPATH up to a given level using the LEVEL option in the option list. The level value is the number of sub-objects that are taken into the JPATH. For instance:

create table jsample2 engine=connect table_type=JSON 
file_name='biblio3.json' 
option_list='level=1';

This will define the table as:

CREATE TABLE `jsample2` (
  `ISBN` char(13) NOT NULL,
  `LANG` char(2) NOT NULL,
  `SUBJECT` char(12) NOT NULL,
  `AUTHOR_FIRSTNAME` char(15) NOT NULL `FIELD_FORMAT`='AUTHOR::FIRSTNAME',
  `AUTHOR_LASTNAME` char(8) NOT NULL `FIELD_FORMAT`='AUTHOR::LASTNAME',
  `TITLE` char(30) NOT NULL,
  `TRANSLATED_PREFIX` char(23) DEFAULT NULL `FIELD_FORMAT`='TRANSLATED:PREFIX',
  `TRANSLATED_TRANSLATOR` varchar(256) DEFAULT NULL `FIELD_FORMAT`='TRANSLATED:TRANSLATOR',
  `PUBLISHER_NAME` char(15) NOT NULL `FIELD_FORMAT`='PUBLISHER:NAME',
  `PUBLISHER_PLACE` char(5) NOT NULL `FIELD_FORMAT`='PUBLISHER:PLACE',
  `DATEPUB` int(4) NOT NULL
) ENGINE=CONNECT DEFAULT CHARSET=latin1 
`TABLE_TYPE`='JSON' `FILE_NAME`='biblio3.json' `OPTION_LIST`='level=1';

In this table definition the column names come from the concatenation of attribute names in the JSON hierarchy. For example AUTHOR_FIRSTNAME comes from the concatenation of the AUTHOR and FIRSTNAME attribute names.

MariaDB [test]> select * from jsample2;
+---------------+------+--------------+------------------+-----------------+--------------------------------+--------------------------+----------------------+---------------------+-----------------+-----------------+---------+
| ISBN          | LANG | SUBJECT      | AUTHOR_FIRSTNAME | AUTHOR_LASTNAME | TITLE                          | TRANSLATION              | TRANSLATOR_FIRSTNAME | TRANSLATOR_LASTNAME | PUBLISHER_NAME  | PUBLISHER_PLACE | DATEPUB |
+---------------+------+--------------+------------------+-----------------+--------------------------------+--------------------------+----------------------+---------------------+-----------------+-----------------+---------+
| 9782212090819 | fr   | applications | Jean-Christophe  | Bernadac        | Construire une application XML | NULL                     | NULL                 | NULL                | Eyrolles        | Paris           |    1999 |
| 9782840825685 | fr   | applications | William J.       | Pardi           | XML en Action                  | adapté de l'anglais par  | James                | Guerin              | Microsoft Press | Paris           |    1999 |
+---------------+------+--------------+------------------+-----------------+--------------------------------+--------------------------+----------------------+---------------------+-----------------+-----------------+---------+
2 rows in set (0.00 sec)

This method can be used as a quick way to make a “template” table definition that can later be edited into the desired definition. Column names are constructed from all the object keys of their path in order to have distinct column names. They can be manually edited to have the desired names, provided their JPATH key names are not modified.
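
As a sketch, the level=1 template above could be edited to use shorter column names while keeping the generated FIELD_FORMAT values untouched (the new table and column names below are just an example):

CREATE TABLE `jsample3` (
  `ISBN` char(13) NOT NULL,
  `FIRST_NAME` char(15) NOT NULL `FIELD_FORMAT`='AUTHOR::FIRSTNAME',
  `LAST_NAME` char(8) NOT NULL `FIELD_FORMAT`='AUTHOR::LASTNAME',
  `PUBLISHER` char(15) NOT NULL `FIELD_FORMAT`='PUBLISHER:NAME'
) ENGINE=CONNECT DEFAULT CHARSET=latin1
`TABLE_TYPE`='JSON' `FILE_NAME`='biblio3.json';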

Here we want the array of authors to be transformed into a concatenated string.

CREATE TABLE `jsample6` (
    -> `ISBN` char(13) NOT NULL,
    -> `AUTHOR` char(128) DEFAULT NULL `field_format`='AUTHOR:[" and "]'
    -> ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `TABLE_TYPE`='JSON' `FILE_NAME`='biblio.json' ;
Query OK, 0 rows affected (0.00 sec)

This gives a flat representation of the author array attribute:

MariaDB [test]> select * from jsample6;
+---------------+---------------------------------------------+
| ISBN          | AUTHOR                                      |
+---------------+---------------------------------------------+
| 9782212090819 | Jean-Christophe Bernadac and François Knab  |
| 9782840825685 | William J. Pardi                            |
+---------------+---------------------------------------------+
2 rows in set (0.00 sec)

The MariaDB CONNECT storage engine JSON auto-discovery capability can be a very easy way to search or compute aggregates on a JSON structured file. This can be very useful, as loading this hierarchical data into tables can be quite complex. You get the benefit of the well-known SQL query language.
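
For example, an aggregate over the auto-discovered jsample2 table is plain SQL (a small sketch; the output is omitted here):

SELECT PUBLISHER_NAME, COUNT(*) AS nb_books
FROM jsample2
GROUP BY PUBLISHER_NAME;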

This JSON file access through the MariaDB CONNECT storage engine should not be confused with the native JSON support available in MySQL 5.7 (and in Facebook DocStore). The MariaDB CONNECT storage engine's goal is to provide access to data sources that remain external to MariaDB.

Happy to see this MySQL JSON momentum!

Last week at Percona Live, Facebook presented DocStore for the first time, a native JSON implementation in MySQL. Oracle also presented their MySQL 5.7 lab release that includes the implementation of a native JSON type.
This is an important move, as MySQL was behind other RDBMSs regarding JSON (PostgreSQL already had a nice and complete implementation). JSON being widely adopted in various areas (JS development, data exchange, database formats, …), this move was awaited.

The two implementations of a native JSON type are different, but they both provide validation of input data and a binary JSON storage format.

If we look at the Oracle implementation, the native JSON format is optimized for property retrieval. To achieve good performance, the binary JSON storage format includes a preamble with a lookup table. This lookup table speeds up key/value pair access when executing functions like JSN_EXTRACT.
Speeding up JSN_EXTRACT is important for indexing JSON fields. In the Oracle implementation, JSON indexes are indexes on non-materialized virtual columns (functional indexes). These virtual columns are a call to JSN_EXTRACT for the JSON property we want to index.

The Facebook native JSON implementation uses a docpath syntax that is understood by the MySQL parser. It is possible to have a secondary composite index containing a JSON index part. The optimizer can use the secondary index as a covering index, for index-only range scans, and so on. There is not yet full optimizer support, but this is a good start.
Contrary to Facebook, Oracle has not modified the parser: everything is wrapped in JSON helper functions.

The main difference I see between these two implementations is that with the Oracle implementation the secondary index is bound to a virtual column that needs to be in the DDL. With the Oracle implementation it is not possible to have a composite index mixing in a regular column. So for the moment the Facebook implementation seems to be more flexible on this point.
What I like about the Oracle implementation is that the provided JSON functions work both on the native JSON format and on JSON text. That might explain why the JSON UDF lab release has been removed from the downloads.

So will it still make sense to store JSON as a text field?
It depends on the use case. Having no validation and no binary format conversion in and out to present the JSON payload to the application makes inserts faster. For small JSON documents the speedup resulting from the lookup table might be marginal when running JSN_EXTRACT. Building the index entry is done just once, so access through the index will not have to redo the JSN_EXTRACT.

What I really like about this addition of the JSON format to MySQL is that we are able to have a hybrid approach mixing tabular data with JSON columns. Contrary to JSON-only, schemaless datastores like MongoDB, this hybrid data store might turn out to be the best choice for a wide range of applications and might please developers.

Select from a MariaDB table into JSON format?

Can we output the content of a MariaDB table in JSON using the CONNECT storage engine?
MariaDB and MySQL are not currently JSON-friendly databases. The usage of JSON in MySQL is almost nonexistent, with the exception of the EXPLAIN JSON output and of the MariaDB dynamic column dump.
There also exist a few UDFs (in the MySQL lab area) to help manipulate JSON strings (searching, modification, …).

A basic question often raised is how to output the content of a MariaDB table in JSON format. Can the MariaDB CONNECT storage engine help do that?

Suppose we have a simple table:

MariaDB > create table book
(
  id       integer,
  isbn     varchar(20),
  lang     varchar(20),
  subject  varchar(80),
  title    varchar(80),
  datepub  date
)  engine=innodb;

Then insert a few rows.

insert into book values (1,'9782212090819','fr','applications','Construire une application XML','2008-7-04');
insert into book values (2,'9782840825685','fr','applications','XML en Action','1999-7-04');

There is no direct way to query this InnoDB table and get the result in JSON format. Many users have written PHP (or other language) scripts to do this in a more or less generic way.
Let us create a CONNECT/JSON table based on this InnoDB table.

MariaDB > create table bookj  
  engine=CONNECT table_type=JSON 
  file_name='/var/lib/mysql/json/bookj.jsn'  
  as select * from book;

This create table will produce a JSON file with the JSON representation of the table content:

bookj.jsn :
[
	{
		"id": 1,
		"isbn": "9782212090819",
		"lang": "fr",
		"subject": "applications",
		"title": "Construire une application XML",
		"datepub": 1215129600
	},
	{
		"id": 2,
		"isbn": "9782840825685",
		"lang": "fr",
		"subject": "applications",
		"title": "XML en Action",
		"datepub": 931046400
	}
]

But how can we directly request the JSON representation through a select?
We will create a second CONNECT/JSON table based on the same underlying file produced by the previous step.

create table bookj2 (
	ROW varchar(500) field_format='*'
)
engine=CONNECT table_type=JSON file_name='/var/lib/mysql/json/bookj.jsn';

The field_format='*' is a JSONPath specification meaning that we take the JSON as is and do not try to navigate its properties.

MariaDB > select * from bookj2;
+------------------------------------------------------------------------------------------------------------------------------------+
| ROW                                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------+
| {"id":1,"isbn":"9782212090819","lang":"fr","subject":"applications","title":"Construire une application XML","datepub":1215129600} |
| {"id":2,"isbn":"9782840825685","lang":"fr","subject":"applications","title":"XML en Action","datepub":931046400}                   |
+------------------------------------------------------------------------------------------------------------------------------------+

That is not really simple, but it does the job. The structure produced is a flat structure without arrays or embedded JSON.
In the real world the equivalent of a JSON hierarchy is a set of tables linked through foreign keys. With MariaDB this can also include dynamic columns for multi-value columns.

MariaDB CONNECT Storage Engine and JSONPath

The MariaDB CONNECT storage engine allows you to access a JSON file as if it were a local table.
The mapping between a hierarchical structure and a relational structure is not ‘natural’ and needs to be defined. To achieve that we need a specific syntax: we associate with each column a ‘field_format’ attribute. This ‘field_format’ column attribute defines which property in the JSON hierarchy we want to map to the MariaDB table column.

With field_format='AUTHOR:*' the AUTHOR column, which is an array of objects, will remain in JSON format (we use the same JSON file example as in the previous post).

MariaDB > create table bibop1 (
     ISBN char(15),
     AUTHOR char(200) field_format='AUTHOR:*',
     PUBLISHER char(20) field_format='PUBLISHER:NAME')
     engine=CONNECT table_type=JSON file_name='/var/lib/mysql/json/biblio3.jsn';
 
MariaDB > select * from bibop1;
+-------------------------------------------------------------------------------------|
| ISBN          | AUTHOR                                            | PUBLISHER       |
+-------------------------------------------------------------------------------------|
| 9782212090819 | [{"FIRSTNAME":"Jean-Christophe","LASTNAME":"Bernadac"},
…

With field_format='AUTHOR:[]' the AUTHOR column is represented as a comma-separated list. Each JSON object is represented as a concatenated string of its properties.

+---------------+------------------------------------------------------------------------|
| ISBN          | AUTHOR                                               | PUBLISHER       |
+---------------+------------------------------------------------------------------------|
| 9782212090819 | Jean-Christophe Bernadac, John Knab, Charles Dickens | Eyrolles        |

To obtain a normalized view of the author first name and last name, we associate the AuthorFN column with field_format='AUTHOR:[X]:FIRSTNAME' and obtain a presentation with each author split onto its own line.

+---------------+-----------------+----------+
| ISBN          | AuthorFN        | AuthorLN |
+---------------+-----------------+----------+
| 9782212090819 | Jean-Christophe | Bernadac |

Here is a more exhaustive view of the various possibilities offered for the mapping.
We will see later that we can also use this syntax to modify a JSON document, including arrays embedded in objects.

Specification   Array type   Description
[n]             All          Take the nth value of the array. Ignore it if n is 0.
[X] or [x]      All          Expand. Generate one row for each array value.
["string"]      String       Concatenate all values separated by the specified string.
[+]             Numeric      Make the sum of all the array values.
[*]             Numeric      Make the product of all the array values.
[!]             Numeric      Make the average of all the array values.
[>] or [<]      All          Return the greatest or least value of the array.
[#]             All          Return the number of values in the array.
[]              All          Expand if under an expanded object; otherwise sum if numeric, else concatenate separated by ", ".
(none)          All          If the value is an array, expand it if under an expanded object, or take its first value.
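
As a small sketch of one of these specifications, the [#] form can expose the number of authors of each book as a column (the table and column names are mine):

create table jauthcount (
  ISBN char(15),
  NB_AUTHORS int field_format='AUTHOR:[#]')
  engine=CONNECT table_type=JSON file_name='/var/lib/mysql/json/biblio3.jsn';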

We all know how XPath works for XML documents. There has been some work done to define a JSONPath syntax to access JSON document content. The scope of what has been defined by JSONPath is broader than what the MariaDB CONNECT storage engine needs to define the mapping between table columns and JSON object properties. Here is the syntax available with JSONPath (with the equivalent XPath syntax):

XPath   JSONPath            Description
/       $                   The root object/element.
.       @                   The current object/element.
/       . or []             Child operator.
..      n/a                 Parent operator.
//      ..                  Recursive descent. JSONPath borrows this syntax from E4X.
*       *                   Wildcard. All objects/elements regardless of their names.
@       n/a                 Attribute access. JSON structures don't have attributes.
[]      []                  Subscript operator. XPath uses it to iterate over element collections and for predicates. In JavaScript and JSON it is the native array operator.
|       [,]                 The union operator in XPath results in a combination of node sets. JSONPath allows alternate names or array indices as a set.
n/a     [start:end:step]    Array slice operator borrowed from ES4.
[]      ?()                 Applies a filter (script) expression.
n/a     ()                  Script expression, using the underlying script engine.
()      n/a                 Grouping in XPath.

A more complete description of what has been proposed for JSONPath is here.
The MySQL JSON UDF functions version 0.4.0 use part of this syntax for functions like JSON_EXTRACT or JSON_SEARCH.
With the MySQL JSON UDFs you can also express a path as a variable list of keyparts, each keypart being a JSON property or an index in a JSON array: select json_extract(doc, keypart1, keypart2 ...)
PostgreSQL with its native JSON type also uses a variable list of path elements to define a JSON path. For example
json_extract_path_text('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}','f4', 'f6'), where the keypath is 'f4', 'f6', returns foo.

We will discover at Percona Live which syntax and set of helper functions have been chosen by Facebook and Oracle for their respective MySQL native JSON type implementations. In fact I do not know whether the two implementations are different. We will see.

The JSONPath syntax also allows you to define queries on the JSON data. The following JSONPath query

$..book[?(@.price<10)]

will filter all books cheaper than 10 when applied to the following JSON document:

{ "store": {
    "book": [
      { "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      { "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      { "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  }
}

With CONNECT we will first map the JSON properties to table columns. We will then use plain SQL to do the query.
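
Here is a hedged sketch of what that could look like for the bookstore document above, assuming it is stored in a file named store.json and using the ':' JPATH separator and '[X]' expansion shown earlier in this post (table and column names are mine):

create table jbooks (
  category char(16)    field_format='store:book:[X]:category',
  author   char(32)    field_format='store:book:[X]:author',
  title    char(64)    field_format='store:book:[X]:title',
  price    double(8,2) field_format='store:book:[X]:price')
  engine=CONNECT table_type=JSON file_name='/var/lib/mysql/json/store.json';

/* The JSONPath filter $..book[?(@.price<10)] becomes a plain WHERE clause */
select title, price from jbooks where price < 10;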

As the MariaDB CONNECT storage engine JSON functionality is still experimental, your feedback on bugs, improvements and features is welcome. To file a bug: https://mariadb.atlassian.net/projects/MDEV

Additional info:
MariaDB Knowledge Base documentation for the CONNECT storage engine: CONNECT
GitHub repository for MariaDB 10.0 including the CONNECT storage engine
Presentation at the upcoming Percona Live 2015 Santa Clara conference:
MariaDB CONNECT Storage Engine to simplify heterogeneous data access (JSON focus)

MariaDB CONNECT storage engine now offers access to JSON

The MariaDB CONNECT storage engine now offers access to JSON files and allows you to see an external JSON file as a MariaDB table. JSON (JavaScript Object Notation) is a lightweight data-interchange format widely used on the Internet.
JSON, like XML, represents data hierarchically. The mapping from hierarchical data to tabular data needs to be specified.

To use the CONNECT storage engine JSON capabilities, the ‘create table’ describes what should be retrieved from the file and how it should be processed. Let us start from the file “biblio3.json”. This file consists of an array of objects. Some object properties have object or array values:

[
{
  "ISBN": "9782212090819",
  "LANG": "fr",
  "SUBJECT": "applications",
  "AUTHOR": [
    {
      "FIRSTNAME": "Jean-Christophe",
      "LASTNAME": "Bernadac"
    },
    {
      "FIRSTNAME": "François",
      "LASTNAME": "Knab"
    }
  ],
  "TITLE": "Construire une application XML",
  "PUBLISHER": {
    "NAME": "Eyrolles",
    "PLACE": "Paris"
  },
  "DATEPUB": 1999
  },
  {
    "ISBN": "9782840825685",
    "LANG": "fr",
    "SUBJECT": "applications",
    "AUTHOR": [
      {
        "FIRSTNAME": "William J.",
        "LASTNAME": "Pardi"
      }
    ],
    "TITLE": "XML en Action",
    "TRANSLATED": {
      "PREFIX": "adapté de l'anglais par",
      "TRANSLATOR": {
        "FIRSTNAME": "James",
        "LASTNAME": "Guerin"
      }
    },
    "PUBLISHER": {
      "NAME": "Microsoft Press",
      "PLACE": "Paris"
    },
    "DATEPUB": 1999
  }
]

We can see this JSON hierarchical collection of objects this way:

                      |                  …                             |
            <BOOK:ISBN,LANG,SUBJECT>                     <BOOK:ISBN,LANG,SUBJECT>
        ______________|_______________          ________________________|__________________
       |        |         |          |          |            |         |        |         |
   <AUTHOR> <TITLE> <PUBLISHER> <DATEPUB>   <AUTHOR> <TRANSLATOR> <TITLE> <PUBLISHER> <DATEPUB>
    ____|____            ___|____          _____|_        ___|___            ___|____ 
   |    |    |          |        |        |       |      |       |          |        | 
<FIRST> | <LAST>     <NAME>   <PLACE>   <FIRST> <LAST> <FIRST> <LAST>     <NAME> <PLACE>
        |
     <AUTHOR> 
    ____|____   
   |         |
<FIRST>   <LAST>

The create table options allow specifying how the JSON file's properties will be mapped to table columns:

MariaDB > create table bibop1 (
     ISBN char(15),
     AUTHOR char(200) field_format='AUTHOR:*',
     PUBLISHER char(20) field_format='PUBLISHER:NAME')
     engine=CONNECT table_type=JSON file_name='/var/lib/mysql/json/biblio3.jsn';
 
MariaDB > select * from bibop1;
+-------------------------------------------------------------------------------------|
| ISBN          | AUTHOR                                            | PUBLISHER       |
+-------------------------------------------------------------------------------------|
| 9782212090819 | [{"FIRSTNAME":"Jean-Christophe","LASTNAME":"Bernadac"},
                   {"FIRSTNAME":"John","LASTNAME":"Knab"},
                   {"FIRSTNAME":"Charles","LASTNAME":"Dickens"}]
                                                                    | Eyrolles        |
| 9782840825685 | [{"FIRSTNAME":"William J.","LASTNAME":"Pardi"}]
                                                                    | Microsoft Press |
+-------------------------------------------------------------------------------------|

In that table, part of the JSON content is preserved as JSON text. This happens because we have defined the AUTHOR column with field_format='AUTHOR:*'.

MariaDB > create table bibop2 (
     ISBN char(15),
     AUTHOR char(200) field_format='AUTHOR:[]',
     PUBLISHER char(20) field_format='PUBLISHER:NAME')
     engine=CONNECT table_type=JSON file_name='/var/lib/mysql/json/biblio3.jsn';

In that case the array is represented by comma-separated values (we could choose a different separator, such as the word ‘AND’). This happens because we have defined the AUTHOR column with field_format='AUTHOR:[]'.

MariaDB > select * from bibop2;
+---------------+------------------------------------------------------------------------|
| ISBN          | AUTHOR                                               | PUBLISHER       |
+---------------+------------------------------------------------------------------------|
| 9782212090819 | Jean-Christophe Bernadac, John Knab, Charles Dickens | Eyrolles        |
| 9782840825685 | William J. Pardi                                     | Microsoft Press |
+---------------+------------------------------------------------------------------------|

We can also choose to normalize the presentation with one author name per row. We define the AuthorFN column with field_format='AUTHOR:[X]:FIRSTNAME'.

MariaDB > create table bibop3 (
     ISBN char(15),
     AuthorFN char(128) field_format='AUTHOR:[X]:FIRSTNAME',
     AuthorLN char(128) field_format='AUTHOR:[X]:LASTNAME')
     engine=CONNECT table_type=JSON file_name='/var/lib/mysql/json/biblio3.jsn';
 
MariaDB > select * from bibop3;
+---------------+-----------------+----------+
| ISBN          | AuthorFN        | AuthorLN |
+---------------+-----------------+----------+
| 9782212090819 | Jean-Christophe | Bernadac |
| 9782212090819 | John            | Knab     |
| 9782212090819 | Charles         | Dickens  |
| 9782840825685 | William J.      | Pardi    |
+---------------+-----------------+----------+

All these CONNECT tables can exist simultaneously. They can be considered as views on the same underlying raw JSON file. The CONNECT storage engine keeps with JSON the same principle it applies to other data sources: the data remains fully external.

Is MySQL going to challenge MongoDB with native JSON?

When looking at the Percona Live Santa Clara 2015 agenda I saw two intriguing talks:
- JSON support in MySQL 5.7
- Docstore: document database for MySQL at Facebook

Both talks are about supporting a native JSON type in MySQL! This sounds really great. I had never seen any blog post or GitHub repository around this subject, so this is a good surprise. Having in MySQL a native JSON datatype with a set of functions for handling JSON documents, with indexing capabilities, and with good underlying storage sounds very cool!

MongoDB and Couchbase, as document stores, have opened the road to JSON usage by a database. They built their databases on a JSON storage format. PostgreSQL has also introduced a native JSON type with associated functions.

But MongoDB's storage foundations were not that solid. Memory-mapped files were the critical piece of the storage engine in MongoDB. Locking was done at the database level. The atomicity of a commit was at the document level. All this impacted the concurrency capabilities of the data store.
The NoSQL paradigm of MongoDB also means no join capabilities. It has led to highly denormalized data models. Excessive denormalization can hurt, as it incurs high space consumption. Efficient indexing required padding to avoid too much work rebuilding indexes when data changed. The schema-less nature of the database, where the schema lives in the code, is problematic for application evolution.

Then came TokuMX, built by the TokuDB team to solve some of the flaws of MongoDB. The nice idea was to use the efficient storage capabilities of TokuDB. The storage was more solid, with true recoverability, transactions, excellent compression and online operations.

MongoDB, maybe inspired by what TokuMX achieved, has recently decided to change their product's architecture. MongoDB can now use the WiredTiger storage engine, which provides document-level locking and compression. But as we all know, changing the foundations of a house while you live in it is not that easy.

The MySQL approach is different. The foundations are very solid. The InnoDB transactional storage engine has huge qualities: high concurrency, reliability, safe crash recovery, online operations, reliable replication capabilities, …

Building a document store on these foundations might lead to a good end result. We have to wait for feedback from the field. Benchmarks need to be run to show how much one can expect from the MySQL/JSON approach. With the newly added JSON support in MySQL, you can combine the flexibility of NoSQL with the strength of a relational database. I am a big fan of this hybrid approach. This dual approach, mixing relational content with the flexibility of schemaless JSON, might turn out to be very attractive from a developer perspective.