Skip to content

MariaDB CONNECT storage engine now offers access to JSON

The MariaDB CONNECT storage engine now offers access to JSON file and allows you to see a external JSON file as a MariaDB table. JSON (JavaScript Object Notation) is a lightweight data-interchange format widely used on the Internet.
JSON like XML represents data hierarchically. The mapping from hierarchical data to tabular data needs to be specified.

To use the CONNECT storage engine JSON capabilities the  'create table' describes what should be retrieved from the file and how it should be processed. Let us start from the file “biblio3.json”. This file consists in an array of objects. Some object's properties have object or array values :

[
{
  "ISBN": "9782212090819",
  "LANG": "fr",
  "SUBJECT": "applications",
  "AUTHOR": [
    {
      "FIRSTNAME": "Jean-Christophe",
      "LASTNAME": "Bernadac"
    },
    {
      "FIRSTNAME": "François",
      "LASTNAME": "Knab"
    }
  ],
  "TITLE": "Construire une application XML",
  "PUBLISHER": {
    "NAME": "Eyrolles",
    "PLACE": "Paris"
  },
  "DATEPUB": 1999
  },
  {
    "ISBN": "9782840825685",
    "LANG": "fr",
    "SUBJECT": "applications",
    "AUTHOR": [
      {
        "FIRSTNAME": "William J.",
        "LASTNAME": "Pardi"
      }
    ],
    "TITLE": "XML en Action",
    "TRANSLATED": {
      "PREFIX": "adapté de l'anglais par",
      "TRANSLATOR": {
        "FIRSTNAME": "James",
        "LASTNAME": "Guerin"
      }
    },
    "PUBLISHER": {
      "NAME": "Microsoft Press",
      "PLACE": "Paris"
    },
    "DATEPUB": 1999
  }
]

We can see this JSON hierarchical collection of objects that way :

                      |                  …                             |
            <BOOK:ISBN,LANG,SUBJECT>                     <BOOK:ISBN,LANG,SUBJECT>
        ______________|_______________          ________________________|__________________
       |        |         |          |          |            |         |        |         |
   <AUTHOR> <TITLE> <PUBLISHER> <DATEPUB>   <AUTHOR> <TRANSLATOR> <TITLE> <PUBLISHER> <DATEPUB>
    ____|____            ___|____          _____|_        ___|___            ___|____ 
   |    |    |          |        |        |       |      |       |          |        | 
<FIRST> | <LAST>     <NAME>   <PLACE>   <FIRST> <LAST> <FIRST> <LAST>     <NAME> <PLACE>
        |
     <AUTHOR> 
    ____|____   
   |         |
<FIRST>   <LAST>

The create table options allow to specifying how JSON file's properties will be mapped to table columns :

MariaDB > create table bibop1 (
     ISBN char(15),
     AUTHOR char(200) field_format='AUTHOR:*',
     PUBLISHER char(20) field_format='PUBLISHER:NAME')
     engine=CONNECT table_type=JSON file_name='/var/lib/mysql/json/biblio3.jsn';
 
MariaDB > select * from bibop1;
+-------------------------------------------------------------------------------------|
| ISBN          | AUTHOR                                            | PUBLISHER       |
+-------------------------------------------------------------------------------------|
| 9782212090819 | [{"FIRSTNAME":"Jean-Christophe","LASTNAME":"Bernadac"},
                   {"FIRSTNAME":"John","LASTNAME":"Knab"},
                   {"FIRSTNAME":"Charles","LASTNAME":"Dickens"}]
                                                                    | Eyrolles        |
| 9782840825685 | [{"FIRSTNAME":"William J.","LASTNAME":"Pardi"}]
                                                                    | Microsoft Press |
+-------------------------------------------------------------------------------------|

In that table part of the JSON format is preserved as a JSON text. This happens as we have defined the AUTHOR column with field_format='AUTHOR:*'

MariaDB > create table bibop2 (
     ISBN char(15),
     AUTHOR char(200) field_format='AUTHOR:[]',
     PUBLISHER char(20) field_format='PUBLISHER:NAME')
     engine=CONNECT table_type=JSON file_name='/var/lib/mysql/json/biblio3.jsn';

In that case the array is represented by values comma separated(we could choose a different separator like a word 'AND'). This happens as we have defined the AUTHOR column with field_format='AUTHOR:[]'

MariaDB > select * from bibop2;
+---------------+------------------------------------------------------------------------|
| ISBN          | AUTHOR                                               | PUBLISHER       |
+---------------+------------------------------------------------------------------------|
| 9782212090819 | Jean-Christophe Bernadac, John Knab, Charles Dickens | Eyrolles        |
| 9782840825685 | William J. Pardi                                     | Microsoft Press |
+---------------+------------------------------------------------------------------------|

We can also choose to normalize the presentation with one author name by row.We can define the AuthorFN column with field_format='AUTHOR:[X]:FIRSTNAME'

MariaDB > create table bibop3 (
     ISBN char(15),
     AuthorFN char(128) field_format='AUTHOR:[X]:FIRSTNAME',
     AuthorLN char(128) field_format='AUTHOR:[X]:LASTNAME')
     engine=CONNECT table_type=JSON file_name='/var/lib/mysql/json/biblio3.jsn';
 
MariaDB > select * from bibop3;
+---------------+-----------------+----------+
| ISBN          | AuthorFN        | AuthorLN |
+---------------+-----------------+----------+
| 9782212090819 | Jean-Christophe | Bernadac |
| 9782212090819 | John            | Knab     |
| 9782212090819 | Charles         | Dickens  |
| 9782840825685 | William J.      | Pardi    |
+---------------+-----------------+----------+

All these CONNECT tables can exist simultaneously. They can be considered as view on the same underlying raw JSON file. The CONNECT storage engine keeps with JSON the same principle it does with other data sources : the data remains fully external.

Leave a Reply

Your email address will not be published. Required fields are marked *