- Additional Products >
- MongoDB Connector for BI
MongoDB Connector for BI¶
On this page
Note
The MongoDB Business Intelligence Connector and associated utilities are only available with MongoDB Enterprise.
The MongoDB Connector for BI (Business Intelligence) allows users to visualize their MongoDB Enterprise data using existing relational business intelligence tools such as Tableau.
These tools expect to connect to a data source and find data in tabular form following a fixed schema. This presents a challenge when working with MongoDB’s flexible schema and rich, multi-dimensional, documents.
The MongoDB Connector for BI acts as a layer that passes queries and data between a mongod or mongos instance and your reporting tool. It uses a foreign data wrapper with PostgreSQL to provide a relational SQL view into your MongoDB data.
Using this model, the MongoDB Connector for BI components store no data, and purely serve to bridge your MongoDB server with business intelligence tools.
Installation¶
Note
The MongoDB Connector for BI works only with MongoDB versions 3.2 and greater.
To set up the MongoDB Connector for BI with a business intelligence tool such as Tableau, follow the steps in this section.
Install the Business Intelligence Connector.¶
Download the Business Intelligence Connector for your version of Red Hat Enterprise Linux or CentOS from the MongoDB Download Center.
You can check your operating system version with the following command:
cat /etc/redhat-release
Extract the tarball you downloaded, and install the containing RPM packages:
sudo yum install *.rpm
Create at least one Business Intelligence Connector user.¶
Replace biuser in the following command with your desired Business Intelligence username, and fill in the connection string that points to your MongoDB instance.
mongobiuser create biuser mongodb://your.mongodbhost.com:27017/db1
mongobiuser will prompt you to set a password for biuser by default.
Generate your schema definition file.¶
Provide a MongoDB server hostname, replace dbname with the name of your database, and optionally replace collname with the name of a collection. If you do not specify a collection, mongodrdl will generate a schema from all collections in dbname.
mongodrdl --host your.mongohost.com -d dbname [-c collname] -o schema.drdl
You can examine the generated schema.drdl file, and remove any fields or tables that are not part of your intended schema.
Load the generated schema into the Business Intelligence Connector.¶
mongobischema import biuser schema.drdl
When prompted, you must provide the password you set with mongobiuser create. You can also provide the --password flag to mongobischema on the command line.
Install business intelligence PostgreSQL drivers.¶
On the Windows or OSX machine where you will be running your business intelligence tool, install the appropriate PostgreSQL driver for that tool.
If you are using Tableau for example, you can install the PostgreSQL driver from their drivers download page, and follow the instructions there.
Connect to the Business Intelligence Connector¶
Launch your business intelligence tool, and connect to the Business Intelligence Connector using the following settings:
Host | The hostname of the server on which you installed the Business Intelligence Connector in step 1. |
Port | 27032 by default. |
Username | The username you specified when creating your Business Intelligence Connector user with mongobiuser in step 2. |
Password | The password you specified when creating your Business Intelligence Connector user with mongobiuser in step 2. |
Database | Same as your username. |
Schema Configuration¶
Business Intelligence tools connect to a data source and, given a fixed tabular schema, allow the user to visually explore their data. However, because MongoDB uses a flexible schema, these tools currently cannot use MongoDB as a native data source.
To use MongoDB with a relational business intelligence tool such as Tableau, you must define a relational mapping for your MongoDB data.
The schema generation tool mongodrdl will output such a schema in the Document Relational Definition Language (DRDL) format, but you should verify that the schema it proposes matches your data requirements. Examine both the table structure and field types to ensure that you will be able to generate your desired reports.
You may have to edit the schema definition files by hand to remove unneeded fields; add fields that mongodrdl did not discover within the subset of documents that it sampled; and modify erroneous field types.
You may edit DRDL files in the following ways:
- Add aggregation pipeline stages,
- Add fields,
- Remove fields,
- Remove tables,
- Rename tables, so long as the collection field remains the same,
- Rename fields using the sqlname option
New in version 1.1: Rename fields using the sqlname option.
Document Relational Definition Language¶
The Document Relational Definition Language (DRDL) defines a relational view of a MongoDB schema.
mongodrdl produces these files using one or more MongoDB collections as a guideline, and mongobischema imports them into the Business Intelligence Connector.
File Format¶
Using the YAML format, the DRDL file lists the databases, tables, and columns to be “exported” for use in Business Intelligence tools.
schema:
- db: <database name>
tables:
- table: <SQL table name>
collection: <MongoDB collection name>
pipeline:
- <optional pipeline elements>
columns:
- name: <MongoDB field name>
sqlname: <mapped SQL column name>
mongotype: <MongoDB field type>
sqltype: <mapped SQL column type>
Example¶
Given documents of the following shape in the collection abc in the database test:
{
"_id": ObjectId("....")
"close": 7.45,
"detail": { "a": 2, "b": 3 }
}
You can use mongodrdl to generate a schema based on this collection by running the following command:
mongodrdl -d test -c abc -o schema.drdl
The generated schema file schema.drdl will look similar to the following:
schema:
- db: test
tables:
- table: abc
collection: abc
pipeline: []
columns:
- name: _id
mongotype: float64
sqltype: numeric
- name: close
mongotype: float64
sqltype: numeric
- name: detail.a
mongotype: float64
sqltype: numeric
- name: detail.b
mongotype: float64
sqltype: numeric
Field Types¶
The MongoDB Connector for BI will correctly map fields that always contain the same data type into the relational model. Schema generation will deal specially with the following cases:
- Dates
- The connector will treat any field of type data_date as the SQL timestamp type.
- Timestamps
- The connector will treat any field of type data_timestamp as a varchar string type.
- Geospatial
New in version 1.1.0.
If a collection contains a geospatial index, the connector will map the indexed field into an array of numeric longitude-latitude coordinates.
See Geospatial Data for an example.
- Heterogeneous Fields
If a field can contain multiple types, mongodrdl will pick the type that appears most frequently. If a value does not match the type specified in the DRDL schema, the MongoDB Connector for BI will report an error. However, in the case where schema specifies a string, the MongoDB Connector for BI can convert the value into a string.
If a field can contain either a string or an array of strings, the generated schema will always specify that the field contains an array of strings.
Embedded Documents¶
The MongoDB Connector for BI will map embedded documents to simple fields that have a . separator character, making them appear similar to the way you would reference them using dot notation in a MongoDB query.
While Tableau properly quotes identifiers, within “ad-hoc” SQL expressions you must double-quote every identifier that contains . characters or mixed-case characters.
For example, consider the following document:
{
"_id": 1,
"familyName": "Partridge",
"hometown" : "Hollywood Hills",
"address" : { "street": "123 Main Street",
"city" : "Hollywood",
"state" : "CA",
"zip" : "90210" },
"members_since" : ISODate("2002-04-12T00:00:00Z")
}
This will result in the following fields in the generated schema:
_id | numeric |
familyName | varchar |
hometown | varchar |
address.street | varchar |
address.city | varchar |
address.state | varchar |
address.zip | varchar |
members_since | timestamp |
Arrays¶
The MongoDB Connector for BI exposes arrays to business intelligence tools using two collections: one without the array, and the other with the array “pre-joined” to its parent.
For example, consider the following document in the collection families:
{
"_id": 1,
"familyName": "Partridge",
"hometown" : "Hollywood Hills",
"familyMembers" : [
{
"firstname" : "Shirley",
"age" : 42,
"attributes" : [
{ "name" : "instrument", "value" : "singer" },
{ "name" : "role", "value" : "mom" }
]
},
{
"firstname" : "Keith",
"age" : 18,
"attributes" : [
{ "name" : "instrument", "value" : "guitar" },
{ "name" : "role", "value" : "son" }
]
},
{
"firstname" : "Laurie",
"age" : 16,
"attributes" : [
{ "name" : "instrument", "value" : "keyboard" },
{ "name" : "role", "value" : "sister" }
]
}]
}
This will result in the following three tables:
- families
_id numeric familyName varchar hometown varchar - families_familyMembers
_id numeric familyMembers.age numeric familyMembers.firstname varchar familyMembers_idx numeric familyName varchar hometown varchar - families_familyMembers_attributes
_id numeric familyMembers.age numeric familyMembers.attributes.name varchar familyMembers.attributes.value varchar familyMembers.attributes_idx numeric familyMembers.firstname varchar familyMembers_idx numeric familyName varchar hometown varchar
Aggregation Pipelines¶
The MongoDB Connector for BI can use aggregation pipelines as part of the schema to transform documents from the collection into the proper form for the relational tables.
For example, consider a simple document in the collection simpleFamilies:
{
"_id": 1,
"familyName": "Partridge",
"familyMembers" : [ "Shirley", "Keith", "Laurie"]
}
mongodrdl will generate a schema with the tables simpleFamilies and simpleFamilies_familyMembers.
The table simpleFamilies_familyMembers enumerates each family members, and has the following pipeline:
pipeline:
- $unwind:
includeArrayIndex: familyMembers_idx
path: $familyMembers
This pipeline uses $unwind to create a new record for each member of familyMembers. The schema will track the array index in the field familyMembers_idx.
Geospatial Data¶
New in version 1.1.0.
If a collection contains a geospatial index, the connector will map the indexed field into an array of numeric longitude-latitude coordinates.
For example, given the following collection:
db.points.createIndex( { pos : "2dsphere" } )
db.points.insert({
pos : { type: "Point", coordinates: [ -73.97, 40.77 ] },
name: "Central Park",
category : "Parks"
})
mongodrdl will generate the following schema:
schema:
- db: test
tables:
- table: points
collection: points
pipeline: []
columns:
- name: _id
mongotype: bson.ObjectId
sqlname: _id
sqltype: varchar
- name: category
mongotype: string
sqlname: category
sqltype: varchar
- name: name
mongotype: string
sqlname: name
sqltype: varchar
- name: pos.coordinates
mongotype: geo.2darray
sqlname: pos.coordinates
sqltype: numeric[]
Thank you for your feedback!
We're sorry! You can Report a Problem to help us improve this page.