Support of Nested Data Structures in Composer
Composer supports aggregations for nested (or hierarchical) data structures for some data stores.
Support for this feature by Composer connectors is shown in the following table.
Key:Y - Supported; N - Not Supported; N/A - not applicable
Connector | Supported? | |
---|---|---|
Amazon Redshift | N/A | |
Amazon S3 | N/A | |
Apache Drill | N/A | |
Apache Phoenix | N/A | |
Apache Phoenix Query Server (QS) | N/A | |
Apache Solr | N | |
BigQuery | N/A | |
Cloudera Impala | N/A | |
Cloudera Search | N | |
Couchbase | N/A | |
Dremio | N/A | |
Elasticsearch 6.0 | Y | |
Elasticsearch 7.0 | Y | |
File Upload | N/A | |
HDFS | N/A | |
Hive | N/A | |
MemSQL | N/A | |
Microsoft SQL Server | N/A | |
MongoDB | Y | |
MySQL | N/A | |
Oracle | N/A | |
PostgreSQL | N/A | |
Presto | N/A | |
Real Time Sales | N/A | |
SAP Hana | N/A | |
SAP IQ | N/A | |
Snowflake | N/A | |
Spark SQL | N/A | |
Teradata | N/A | |
TIBCO DV | N/A | |
File Upload (Upload API) | N/A | |
Vertica | N/A |
There are two ways to store nested structure:
Store all hierarchy as a single document, for example, in JSON format (nested documents).
Store hierarchy items as separate documents and additional info on hierarchical links internally (block join).
Nested Documents
Hierarchical structure can be represented in JSON format. In MongoDB and Elasticsearch, storing such structures is supported.
Consider the following example. We need to store a hierarchy of divisions by country with two divisions in country. Also we need to store some general country information, for example, foundation year.
In this case, the following JSON is sent to the index document:
{ "country":"Germany", "foundation year":2008, "divisions":[ { "city":"Berlin", "sales":200, "manager":{ "first name":"Robert", "last name":"Simmons", "years in company":4 } }, { "city":"Munich", "sales":200, "manager":{ "first name":"Robert", "last name":"Simmons", "years in company":4 } } ] }
In MongoDB, you can store such documents and then query them as is, without any restrictions. However, the performance may be slow if the document contains a lot of arrays.
In Elasticsearch, we recommend using the "nested" type for complex objects before the document is indexed.
Block Join Support
There is another way to store hierarchical structures. All hierarchy items are stored as separate elements, with information about the hierarchical links stored internally. Apache Solr supports this approach.
Consider the following example. We need to store a hierarchy of divisions by country with two divisions in country. Also we need to store some general country information, for example, foundation year.
In this case, the following JSON is sent to the index document:
{ "country":"Germany", "foundationYear":2008, "_childDocuments_":[ { "city":"Berlin", "sales":200, "managerFirstName":"Robert", "managerLastName":"Simmons", "managerYearsInCompany":4 }, { "city":"Munich", "sales":200, "managerFirstName":"Robert", "managerLastName":"Simmons", "managerYearsInCompany":4 } ] }
As a result, there are three documents in the index. Information on hierarchical linking of these objects is stored internally in Solr.
{ "country":"Germany", "foundationYear":2008 }, { "city":"Berlin", "sales":200, "managerFirstName":"Robert", "managerLastName":"Simmons", "managerYearsInCompany":4 }, { "city":"Munich", "sales":200, "managerFirstName":"Robert", "managerLastName":"Simmons", "managerYearsInCompany":4 }
You must specify what fields are used in parent documents. To do this, you must select the checkbox in the Parent Field column on the Fields tab while creating or modifying the data source configuration .
Comments
0 comments
Please sign in to leave a comment.