MariaDB CONNECT Storage Engine JSON Autodiscovery

The MariaDB CONNECT storage engine offers access to JSON file and allows you to see a external JSON file as a MariaDB table.
A nice feature of the CONNECT storage Engine is its capability to auto discover a table structure when the table correspond to external data. In our case the CONNECT storage engine will automatically define the columns based on the JSON file hierarchical structure.
This is possible thanks to a feature of MariaDB Storage Engine API. This auto discovery feature allows a create statement to delegate the discovery of the table structure to the chosen storage engine.

The CONNECT storage Engine also use this auto discovery feature for other table types like the ODBC table. This alleviates the developer of the burden of defining all the columns that have already been defined in the target data source.

Let us start from the file “biblio3.json”. This file consists in an array of objects. Some object’s properties have object or array values :

[
{
  "ISBN": "9782212090819",
  "LANG": "fr",
  "SUBJECT": "applications",
  "AUTHOR": [
    {
      "FIRSTNAME": "Jean-Christophe",
      "LASTNAME": "Bernadac"
    },
    {
      "FIRSTNAME": "François",
      "LASTNAME": "Knab"
    }
  ],
  "TITLE": "Construire une application XML",
  "PUBLISHER": {
    "NAME": "Eyrolles",
    "PLACE": "Paris"
  },
  "DATEPUB": 1999
  },
  {
    "ISBN": "9782840825685",
    "LANG": "fr",
    "SUBJECT": "applications",
    "AUTHOR": [
      {
        "FIRSTNAME": "William J.",
        "LASTNAME": "Pardi"
      }
    ],
    "TITLE": "XML en Action",
    "TRANSLATED": {
      "PREFIX": "adapté de l'anglais par",
      "TRANSLATOR": {
        "FIRSTNAME": "James",
        "LASTNAME": "Guerin"
      }
    },
    "PUBLISHER": {
      "NAME": "Microsoft Press",
      "PLACE": "Paris"
    },
    "DATEPUB": 1999
  }
]

To use the CONNECT storage engine JSON autodiscovery capabilities the  ’create table‘ is minimalist. There is no need to describe what should be retrieved from the JSON file and how it should be processed. :

MariaDB > CREATE TABLE jsample engine=CONNECT table_type=JSON 
file_name='/var/lib/mysql/json/biblio3.jsn';

We can check how the table has been created with the SHOW CREATE TABLE statement:

MariaDB > show create table jsample;
...
CREATE TABLE `jsample` (
  `ISBN` char(13) NOT NULL,
  `LANG` char(2) NOT NULL,
  `SUBJECT` char(12) NOT NULL,
  `AUTHOR` varchar(256) DEFAULT NULL,
  `TITLE` char(30) NOT NULL,
  `TRANSLATED` varchar(256) DEFAULT NULL,
  `PUBLISHER` varchar(256) DEFAULT NULL,
  `DATEPUB` int(4) NOT NULL
) ENGINE=CONNECT DEFAULT CHARSET=latin1 `TABLE_TYPE`='JSON' `FILE_NAME`='biblio3.json';

Autodiscovery does a mapping from JSON hierarchical data to tabular data. Doing a CREATE TABLE with auto discovery is a good way to help build a more precise create table once the JSON structure has been discovered. After autodiscovery the column names come from the objects keys and the column definition is deducted from the objects values.

The column sizes is calculated from the file as the maximum length of the corresponding column when it was a normal value. For columns that are JSON arrays or objects, the column is specified as a VARCHAR string of length 256, supposedly big enough to contain the sub-object’s concatenated values. Nullable is set to true if the column is null or missing in some rows or if its JPATH contains arrays.

MariaDB [test]> select * from jsample;
+---------------+------+--------------+--------------------------+--------------------------------+--------------------------+--------------+-----------------------+---------+
| ISBN          | LANG | SUBJECT      | AUTHOR                   | TITLE                          | TRANSLATION              | TRANSLATOR   | PUBLISHER             | DATEPUB |
+---------------+------+--------------+--------------------------+--------------------------------+--------------------------+--------------+-----------------------+---------+
| 9782212090819 | fr   | applications | Jean-Christophe Bernadac | Construire une application XML | NULL                     | NULL         | Eyrolles Paris        |    1999 |
| 9782840825685 | fr   | applications | William J. Pardi         | XML en Action                  | adapté de l'anglais par  | James Guerin | Microsoft Press Paris |    1999 |
+---------------+------+--------------+--------------------------+--------------------------------+--------------------------+--------------+-----------------------+---------+
2 rows in set (0.00 sec)

It is important to notice that CONNECT cannot guess what you want to do with arrays. Here the AUTHOR array is left undefined, which means that only its first value will be retrieved. You can specify through a CONNECT table creation parameter if you want to present this array in a normalized or denormalized way (“Expand=AUTHOR” in the option list) with all the array elements presented.

It is also important to notice that by default the autodiscovery feature only looks at the first level of the JSON structure.
If a more complex definition is desired, you can ask CONNECT to analyze the JPATH up to a given level using the LEVEL option in the option list. The level value is the number of sub-objects that are taken in the JPATH. For instance:

create table jsample2 engine=connect table_type=JSON 
file_name='biblio3.json' 
option_list='level=1';

This will define the table as:

CREATE TABLE `jsample2` (
  `ISBN` char(13) NOT NULL,
  `LANG` char(2) NOT NULL,
  `SUBJECT` char(12) NOT NULL,
  `AUTHOR_FIRSTNAME` char(15) NOT NULL `FIELD_FORMAT`='AUTHOR::FIRSTNAME',
  `AUTHOR_LASTNAME` char(8) NOT NULL `FIELD_FORMAT`='AUTHOR::LASTNAME',
  `TITLE` char(30) NOT NULL,
  `TRANSLATED_PREFIX` char(23) DEFAULT NULL `FIELD_FORMAT`='TRANSLATED:PREFIX',
  `TRANSLATED_TRANSLATOR` varchar(256) DEFAULT NULL `FIELD_FORMAT`='TRANSLATED:TRANSLATOR',
  `PUBLISHER_NAME` char(15) NOT NULL `FIELD_FORMAT`='PUBLISHER:NAME',
  `PUBLISHER_PLACE` char(5) NOT NULL `FIELD_FORMAT`='PUBLISHER:PLACE',
  `DATEPUB` int(4) NOT NULL
) ENGINE=CONNECT DEFAULT CHARSET=latin1 
`TABLE_TYPE`='JSON' `FILE_NAME`='biblio3.json' `OPTION_LIST`='level=1';

In this table definition the column names come from concatenation of attributes in the JSON hierarchy. For example AUTHOR_FIRSTNAME comes from the concatenation of AUTHOR and FIRSTNAME attribute name.

MariaDB [test]> select * from jsample2;
+---------------+------+--------------+------------------+-----------------+--------------------------------+--------------------------+----------------------+---------------------+-----------------+-----------------+---------+
| ISBN          | LANG | SUBJECT      | AUTHOR_FIRSTNAME | AUTHOR_LASTNAME | TITLE                          | TRANSLATION              | TRANSLATOR_FIRSTNAME | TRANSLATOR_LASTNAME | PUBLISHER_NAME  | PUBLISHER_PLACE | DATEPUB |
+---------------+------+--------------+------------------+-----------------+--------------------------------+--------------------------+----------------------+---------------------+-----------------+-----------------+---------+
| 9782212090819 | fr   | applications | Jean-Christophe  | Bernadac        | Construire une application XML | NULL                     | NULL                 | NULL                | Eyrolles        | Paris           |    1999 |
| 9782840825685 | fr   | applications | William J.       | Pardi           | XML en Action                  | adapté de l'anglais par  | James                | Guerin              | Microsoft Press | Paris           |    1999 |
+---------------+------+--------------+------------------+-----------------+--------------------------------+--------------------------+----------------------+---------------------+-----------------+-----------------+---------+
2 rows in set (0.00 sec)

This method can be used as a quick way to make a “template” table definition that can later be edited to make the desired definition. Column names are constructed from all the object keys of their path in order to have distinct column names. This can be manually edited to have the desired names, provided their JPATH key names are not modified.

Here we want the array of authors to be transformed into a concatenated string.

CREATE TABLE `jsample6` (
    -> `ISBN` char(13) NOT NULL,
    -> `AUTHOR` char(128) DEFAULT NULL `field_format`='AUTHOR:[" and "]'
    -> ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `TABLE_TYPE`='JSON' `FILE_NAME`='biblio.json' ;
Query OK, 0 rows affected (0.00 sec)

This give this flat representation of the author array attribute :

MariaDB [test]> select * from jsample6;
+---------------+---------------------------------------------+
| ISBN          | AUTHOR                                      |
+---------------+---------------------------------------------+
| 9782212090819 | Jean-Christophe Bernadac and François Knab  |
| 9782840825685 | William J. Pardi                            |
+---------------+---------------------------------------------+
2 rows in set (0.00 sec)

MariaDB CONNECT Storage Engine JSON Autodiscovery capability can be a very easy way to search or compute aggregates on a JSON structured file. This can be very useful as loading this hierarchical data in tables can be quite complex. You get the benefit of well known SQL query language.

This JSON file access through MariaDB CONNECT storage engine should not be confused with the native JSON support available in MySQL 5.7 (also in Facebook Docstore). MariaDB CONNECT storage engine target is to provide access to data sources that remain external to MariaDB.

Leave a Reply

  

  

  


9 − five =

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" highlight="">