OPTIONS

MongoDB Connector for BI

Note

The MongoDB Business Intelligence Connector and associated utilities are only available with MongoDB Enterprise.

The MongoDB Connector for BI (Business Intelligence) allows users to visualize their MongoDB Enterprise data using existing relational business intelligence tools such as Tableau.

These tools expect to connect to a data source and find data in tabular form following a fixed schema. This presents a challenge when working with MongoDB’s flexible schema and rich, multi-dimensional, documents.

The MongoDB Connector for BI acts as a layer that passes queries and data between a mongod or mongos instance and your reporting tool. It uses a foreign data wrapper with PostgreSQL to provide a relational SQL view into your MongoDB data.

Using this model, the MongoDB Connector for BI components store no data, and purely serve to bridge your MongoDB server with business intelligence tools.

Diagram of the MongoDB Business Intelligence Connector.

Installation

Note

The MongoDB Connector for BI works only with MongoDB versions 3.2 and greater.

To set up the MongoDB Connector for BI with a business intelligence tool such as Tableau, follow the steps in this section.

1

Install the Business Intelligence Connector.

Download the Business Intelligence Connector for your version of Red Hat Enterprise Linux or CentOS from the MongoDB Download Center.

You can check your operating system version with the following command:

cat /etc/redhat-release

Extract the tarball you downloaded, and install the containing RPM packages:

sudo yum install *.rpm
2

Create at least one Business Intelligence Connector user.

Replace biuser in the following command with your desired Business Intelligence username, and fill in the connection string that points to your MongoDB instance.

mongobiuser create biuser mongodb://your.mongodbhost.com:27017/db1

mongobiuser will prompt you to set a password for biuser by default.

3

Generate your schema definition file.

Provide a MongoDB server hostname, replace dbname with the name of your database, and optionally replace collname with the name of a collection. If you do not specify a collection, mongodrdl will generate a schema from all collections in dbname.

mongodrdl --host your.mongohost.com -d dbname [-c collname] -o schema.drdl

You can examine the generated schema.drdl file, and remove any fields or tables that are not part of your intended schema.

4

Load the generated schema into the Business Intelligence Connector.

mongobischema import biuser schema.drdl

When prompted, you must provide the password you set with mongobiuser create. You can also provide the --password flag to mongobischema on the command line.

5

Install business intelligence PostgreSQL drivers.

On the Windows or OSX machine where you will be running your business intelligence tool, install the appropriate PostgreSQL driver for that tool.

If you are using Tableau for example, you can install the PostgreSQL driver from their drivers download page, and follow the instructions there.

6

Connect to the Business Intelligence Connector

Launch your business intelligence tool, and connect to the Business Intelligence Connector using the following settings:

Host The hostname of the server on which you installed the Business Intelligence Connector in step 1.
Port 27032 by default.
Username The username you specified when creating your Business Intelligence Connector user with mongobiuser in step 2.
Password The password you specified when creating your Business Intelligence Connector user with mongobiuser in step 2.
Database Same as your username.

Schema Configuration

Business Intelligence tools connect to a data source and, given a fixed tabular schema, allow the user to visually explore their data. However, because MongoDB uses a flexible schema, these tools currently cannot use MongoDB as a native data source.

To use MongoDB with a relational business intelligence tool such as Tableau, you must define a relational mapping for your MongoDB data.

The schema generation tool mongodrdl will output such a schema in the Document Relational Definition Language (DRDL) format, but you should verify that the schema it proposes matches your data requirements. Examine both the table structure and field types to ensure that you will be able to generate your desired reports.

You may have to edit the schema definition files by hand to remove unneeded fields; add fields that mongodrdl did not discover within the subset of documents that it sampled; and modify erroneous field types.

You may edit DRDL files in the following ways:

  • Add aggregation pipeline stages,
  • Add fields,
  • Remove fields,
  • Remove tables,
  • Rename tables, so long as the collection field remains the same,
  • Rename fields using the sqlname option

New in version 1.1: Rename fields using the sqlname option.

Document Relational Definition Language

The Document Relational Definition Language (DRDL) defines a relational view of a MongoDB schema.

mongodrdl produces these files using one or more MongoDB collections as a guideline, and mongobischema imports them into the Business Intelligence Connector.

File Format

Using the YAML format, the DRDL file lists the databases, tables, and columns to be “exported” for use in Business Intelligence tools.

schema:
- db: <database name>
  tables:
  - table: <SQL table name>
    collection: <MongoDB collection name>
    pipeline:
    - <optional pipeline elements>
    columns:
    - name: <MongoDB field name>
      sqlname: <mapped SQL column name>
      mongotype: <MongoDB field type>
      sqltype: <mapped SQL column type>

Example

Given documents of the following shape in the collection abc in the database test:

{
    "_id": ObjectId("....")
    "close": 7.45,
    "detail": { "a": 2, "b": 3 }
}

You can use mongodrdl to generate a schema based on this collection by running the following command:

mongodrdl -d test -c abc -o schema.drdl

The generated schema file schema.drdl will look similar to the following:

schema:
- db: test
  tables:
  - table: abc
    collection: abc
    pipeline: []
    columns:
    - name: _id
      mongotype: float64
      sqltype: numeric
    - name: close
      mongotype: float64
      sqltype: numeric
    - name: detail.a
      mongotype: float64
      sqltype: numeric
    - name: detail.b
      mongotype: float64
      sqltype: numeric

Field Types

The MongoDB Connector for BI will correctly map fields that always contain the same data type into the relational model. Schema generation will deal specially with the following cases:

Dates
The connector will treat any field of type data_date as the SQL timestamp type.
Timestamps
The connector will treat any field of type data_timestamp as a varchar string type.
Geospatial

New in version 1.1.0.

If a collection contains a geospatial index, the connector will map the indexed field into an array of numeric longitude-latitude coordinates.

See Geospatial Data for an example.

Heterogeneous Fields

If a field can contain multiple types, mongodrdl will pick the type that appears most frequently. If a value does not match the type specified in the DRDL schema, the MongoDB Connector for BI will report an error. However, in the case where schema specifies a string, the MongoDB Connector for BI can convert the value into a string.

If a field can contain either a string or an array of strings, the generated schema will always specify that the field contains an array of strings.

Embedded Documents

The MongoDB Connector for BI will map embedded documents to simple fields that have a . separator character, making them appear similar to the way you would reference them using dot notation in a MongoDB query.

While Tableau properly quotes identifiers, within “ad-hoc” SQL expressions you must double-quote every identifier that contains . characters or mixed-case characters.

For example, consider the following document:

{
    "_id": 1,
    "familyName": "Partridge",
    "hometown" : "Hollywood Hills",
    "address" : { "street": "123 Main Street",
                  "city" : "Hollywood",
                  "state" : "CA",
                  "zip" : "90210" },
    "members_since" : ISODate("2002-04-12T00:00:00Z")
}

This will result in the following fields in the generated schema:

_id numeric
familyName varchar
hometown varchar
address.street varchar
address.city varchar
address.state varchar
address.zip varchar
members_since timestamp

Arrays

The MongoDB Connector for BI exposes arrays to business intelligence tools using two collections: one without the array, and the other with the array “pre-joined” to its parent.

For example, consider the following document in the collection families:

{
   "_id": 1,
   "familyName": "Partridge",
   "hometown" : "Hollywood Hills",
   "familyMembers" : [
     {
         "firstname" : "Shirley",
         "age" : 42,
         "attributes" : [
             { "name" : "instrument", "value" : "singer" },
             { "name" : "role", "value" : "mom" }
         ]
     },
     {
         "firstname" : "Keith",
         "age" : 18,
         "attributes" : [
             { "name" : "instrument", "value" : "guitar" },
             { "name" : "role", "value" : "son" }
         ]
     },
     {
         "firstname" : "Laurie",
         "age" : 16,
         "attributes" : [
             { "name" : "instrument", "value" : "keyboard" },
             { "name" : "role", "value" : "sister" }
         ]
     }]
}

This will result in the following three tables:

families
_id numeric
familyName varchar
hometown varchar
families_familyMembers
_id numeric
familyMembers.age numeric
familyMembers.firstname varchar
familyMembers_idx numeric
familyName varchar
hometown varchar
families_familyMembers_attributes
_id numeric
familyMembers.age numeric
familyMembers.attributes.name varchar
familyMembers.attributes.value varchar
familyMembers.attributes_idx numeric
familyMembers.firstname varchar
familyMembers_idx numeric
familyName varchar
hometown varchar

Aggregation Pipelines

The MongoDB Connector for BI can use aggregation pipelines as part of the schema to transform documents from the collection into the proper form for the relational tables.

For example, consider a simple document in the collection simpleFamilies:

{
   "_id": 1,
   "familyName": "Partridge",
   "familyMembers" : [ "Shirley", "Keith", "Laurie"]
}

mongodrdl will generate a schema with the tables simpleFamilies and simpleFamilies_familyMembers.

The table simpleFamilies_familyMembers enumerates each family members, and has the following pipeline:

pipeline:
- $unwind:
    includeArrayIndex: familyMembers_idx
    path: $familyMembers

This pipeline uses $unwind to create a new record for each member of familyMembers. The schema will track the array index in the field familyMembers_idx.

Geospatial Data

New in version 1.1.0.

If a collection contains a geospatial index, the connector will map the indexed field into an array of numeric longitude-latitude coordinates.

For example, given the following collection:

db.points.createIndex( { pos : "2dsphere" } )
db.points.insert({
    pos : { type: "Point", coordinates: [ -73.97, 40.77 ] },
    name: "Central Park",
    category : "Parks"
})

mongodrdl will generate the following schema:

schema:
- db: test
  tables:
  - table: points
    collection: points
    pipeline: []
    columns:
    - name: _id
      mongotype: bson.ObjectId
      sqlname: _id
      sqltype: varchar
    - name: category
      mongotype: string
      sqlname: category
      sqltype: varchar
    - name: name
      mongotype: string
      sqlname: name
      sqltype: varchar
    - name: pos.coordinates
      mongotype: geo.2darray
      sqlname: pos.coordinates
      sqltype: numeric[]

Was this page helpful?

Yes No

Thank you for your feedback!

We're sorry! You can Report a Problem to help us improve this page.