MariaDB CONNECT Storage Engine and JSONPath

MariaDB CONNECT Storage Engine allows to access a JSON file as if it was a local table.
The mapping between a hierarchical structure and a relational structure is not ‘natural’ and needs to be defined. To achieve that we need a specific syntax. We associate with each column a ‘field_format’ attribute. This ‘field_format’ column attribute defines what property in the JSON hierarchy we want to map to the MariaDB table column.

With field_format=’AUTHOR:*’ the AUTHOR column which is an array of objects will remain in JSON format (We use the same JSON file example as in the previous post).

MariaDB > create table bibop1 (
     ISBN char(15),
     AUTHOR char(200) field_format='AUTHOR:*',
     PUBLISHER char(20) field_format='PUBLISHER:NAME')
     engine=CONNECT table_type=JSON file_name='/var/lib/mysql/json/biblio3.jsn';
 
MariaDB > select * from bibop1;
+-------------------------------------------------------------------------------------|
| ISBN          | AUTHOR                                            | PUBLISHER       |
+-------------------------------------------------------------------------------------|
| 9782212090819 | [{"FIRSTNAME":"Jean-Christophe","LASTNAME":"Bernadac"},
…

With field_format=’AUTHOR:[]‘ the AUTHOR column will be represented as a coma separated list. Each JSON object is represented as a concatenated string of its properties.

+---------------+------------------------------------------------------------------------|
| ISBN          | AUTHOR                                               | PUBLISHER       |
+---------------+------------------------------------------------------------------------|
| 9782212090819 | Jean-Christophe Bernadac, John Knab, Charles Dickens | Eyrolles        |

To obtain a normalize view of Author first name and last name we associate column AuthorFN with field_format=’AUTHOR:[X]:FIRSTNAME’ and obtain a presentation with each author split on a line.

+---------------+-----------------+----------+
| ISBN          | AuthorFN        | AuthorLN |
+---------------+-----------------+----------+
| 9782212090819 | Jean-Christophe | Bernadac |

Here is a more exhaustive view of the various possibilities offered for the mapping:
We will see later that we can also use this syntax to modify a JSON document including array embedded in an objects.

Specification Array Type Limit Description
[n] All N.A Take the nth value of the array. Ignore it if n is 0.
[X] or [x] All Expand. Generate one row for each array value.
["string"] String Concatenate all values separated by the specified string.
[+] Numeric Make the sum of all the array values.
[*] Numeric Make the product of all array values.
[!] Numeric Make the average of all the array values.
[>] or [<] All Return the greatest or least value of the array.
[#] All N.A Return the number of values in the array.
[] All Expand if under an expanded object. Otherwise sum if numeric, else concatenation separated by “, “.
All N.A If an array, expand it if under an expanded object or take the first value of it.

We all know how XPAH works for XML documents. There has been some work done to define a JSONPath syntax to access JSON document content. The scope of what has been defined by JSONPath is broader than what we need for the MariaDB CONNECT storage Engine to define the mapping between table columns and JSON object properties. Here is the syntax available with JSONPath (with equivalent XPATH syntax):

XPath JSONPath Description
/ $ the root object/element
. @ the current object/element
/ . or [] child operator
.. n/a parent operator
// .. recursive descent. JSONPath borrows this syntax from E4X.
* * wildcard. All objects/elements regardless their names.
@ n/a attribute access. JSON structures don’t have attributes.
[] [] subscript operator. XPath uses it to iterate over element collections and for predicates. In Javascript and JSON it is the native array operator.
| [,] Union operator in XPath results in a combination of node sets. JSONPath allows alternate names or array indices as a set.
n/a [start:end:step] array slice operator borrowed from ES4.
[] ?() applies a filter (script) expression.
n/a () script expression, using the underlying script engine.
() n/a grouping in Xpath

A more complete description of What has been proposed for JSONPath is here
MySQL JSON UDF functions version 0.4.0 use part of this syntax for function like JSON_EXTRACT or JSON_SEARCH.
With MySQL JSON udf you can also express a path as a variable list of keyparts. Each keypart being a JSON property or an index in a JSON array : select json_extract(doc, keypart1, keypart2 ...)
PostgreSQL with its native JSON also uses a variable list of path elements to define a JSON path. For example
json_extract_path_text('{"f2":{"f3":1},"f4":{"f5":99,"f6":"foo"}}','f4', 'f6') where the keypath is 'f4', 'f6' return foo.

We will discover at Percona Live what syntax and set of helper functions have been chosen by Facebook and Oracle for their respective MySQL Native JSON type implementation. In fact I do not know if the two implementations are different. We will see.

The JSONPath syntax also allows to define queries on the JSON data. The following JSONPath query

$..book[?(@.price<10)]

will filter all books cheaper than 10 when applied to the following JSON document :

{ "store": {
    "book": [
      { "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      { "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      { "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 19.95
    }
  }
}

With CONNECT we will first do the mapping of JSON properties to table columns. We will then use plain SQL to do the query.

As MariaDB CONNECT Storage Engine JSON functionality is still experimental your feedback on bugs, improvements, features is welcome. To fill bug it is https://mariadb.atlassian.net/projects/MDEV

Additional info :
MariaDB Knowledge Base Documentation for CONNECT Storage Engine CONNECT
github repository for MariaDB 10.0 including CONNECT Storage Engine
Presentation at future Percona Live 2015 SantaClara conference :
MariaDB CONNECT Storage Engine to simplify heterogeneous data access (JSON focus)

Leave a Reply

  

  

  


seven − = 4

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" highlight="">