Import from Neo4j using GraphML
This section describes the process of importing data from Neo4j to OrientDB using GraphML. For general information on the possible Neo4j to OrientDB migration strategies, please refer to the Import from Neo4j section.
Neo4j can export data in GraphML, an XML-based file format for graphs. Because OrientDB can read GraphML, you can use this format to import data from Neo4j into OrientDB through the Console or the Java API.
Note:
For large and complex datasets, the preferred way to migrate from Neo4j is using the Neo4j to OrientDB Importer.
Neo4j and Cypher are registered trademarks of Neo Technology, Inc.
Exporting GraphML
To export data from Neo4j into GraphML, you need to install the Neo4j Shell Tools plugin. Once this package is installed, you can use the export-graphml utility to export the database.
Change into the Neo4j home directory:
$ cd /path/to/neo4j-community-2.3.2
Download the Neo4j Shell Tools:
$ curl http://dist.neo4j.org/jexp/shell/neo4j-shell-tools_2.3.2.zip \
    -o neo4j-shell-tools.zip
Unzip the neo4j-shell-tools.zip file into the lib directory:
$ unzip neo4j-shell-tools.zip -d lib
Restart the Neo4j server so that it picks up the plugin:
$ ./bin/neo4j restart
If the server is not running, start it instead with ./bin/neo4j start.
Once Neo4j has restarted with the Neo4j Shell Tools installed, launch the Neo4j Shell, located in the bin/ directory:
$ ./bin/neo4j-shell
Welcome to the Neo4j Shell! Enter 'help' for a list of commands
NOTE: Remote Neo4j graph database service 'shell' at port 1337
Export the database into GraphML:
neo4j-sh (0)$ export-graphml -t -o /tmp/out.graphml
Wrote to GraphML-file /tmp/out.graphml 0. 100%: nodes = 302 rels = 834 properties = 4221 time 59 sec total 59 sec
This exports the database to /tmp/out.graphml.
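Before importing, it may be worth checking that the export is complete. The following is a minimal, stdlib-only Java sketch that streams the GraphML file with StAX and counts its node and edge elements. You can then compare the totals with the nodes and rels figures that export-graphml prints. The class name GraphMLStats is hypothetical and not part of Neo4j or OrientDB. The sketch assumes the file uses the standard GraphML node and edge element names.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper (not part of Neo4j or OrientDB): counts the <node> and
// <edge> elements of a GraphML export without loading the whole file.
public class GraphMLStats {

    /** Element counts found in one GraphML document. */
    public static final class Counts {
        public final long nodes;
        public final long edges;

        Counts(long nodes, long edges) {
            this.nodes = nodes;
            this.edges = edges;
        }
    }

    /** Streams the document event by event, so large exports stay out of memory. */
    public static Counts count(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        long nodes = 0;
        long edges = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // Local names ignore the GraphML namespace declaration.
                    String name = reader.getLocalName();
                    if (name.equals("node")) {
                        nodes++;
                    } else if (name.equals("edge")) {
                        edges++;
                    }
                }
            }
        } finally {
            reader.close(); // does not close the underlying stream
        }
        return new Counts(nodes, edges);
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "/tmp/out.graphml";
        try (InputStream in = new FileInputStream(path)) {
            Counts c = count(in);
            System.out.println("nodes = " + c.nodes + " rels = " + c.edges);
        }
    }
}
```

Streaming with StAX keeps memory usage flat even on large exports, which a DOM parser would not do.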
Importing GraphML
There are three ways to import the GraphML file into OrientDB: through the OrientDB Console, through the Gremlin Console, or through the Java API.
Importing through the OrientDB Console
In recent versions of OrientDB, you can import GraphML data through the OrientDB Console. If you have version 2.0 or later, this is the recommended method, because it automatically translates Neo4j labels into classes.
Launch the OrientDB Console:
$ $ORIENTDB_HOME/bin/console.sh
In OrientDB, create a database to receive the import:
orientdb> CREATE DATABASE PLOCAL:/tmp/db/test
Creating database [plocal:/tmp/db/test] using the storage type [plocal]...
Database created successfully.
Current database is: plocal:/tmp/db/test
Import the data from the GraphML file:
orientdb {db=test}> IMPORT DATABASE /tmp/out.graphml
Importing GRAPHML database database from /tmp/out.graphml... Transaction 8 has been committed in 12ms
This imports the Neo4j data into the OrientDB test database.
Importing through the Gremlin Console
For older versions of OrientDB, you can import data from GraphML through the Gremlin Console. If you have version 1.7 or earlier, this is the method to use. It is not recommended for more recent versions, because it ignores the labels declared in Neo4j: everything is imported as the base vertex and edge classes (that is, V and E). This means that after importing through Gremlin, you need to refactor your graph elements to fit a more structured schema.
To import the GraphML file into OrientDB, complete the following steps:
Launch the Gremlin Console:
$ $ORIENTDB_HOME/bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
From the Gremlin Console, create a new graph, specifying the path to your graph database (here, /tmp/db/test):
gremlin> g = new OrientGraph("plocal:/tmp/db/test");
==>orientgraph[plocal:/db/test]
Load the GraphML file into the graph object (that is, g):
gremlin> g.loadGraphML("/tmp/out.graphml");
==>null
Exit the Gremlin Console:
gremlin> quit
This imports the GraphML file into your OrientDB database.
Importing through the Java API
The OrientDB Console calls the Java API internally. Using the Java API directly gives you greater control over the import process. For instance:
new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).inputGraph("/temp/neo4j.graphml");
This line imports the GraphML file into OrientDB.
Defining Custom Strategies
Beginning in version 2.1, OrientDB allows you to modify the import process through custom strategies for vertex and edge attributes. It supports the following strategies:
com.orientechnologies.orient.graph.graphml.OIgnoreGraphMLImportStrategy
Defines attributes to ignore.
com.orientechnologies.orient.graph.graphml.ORenameGraphMLImportStrategy
Defines attributes to rename.
Examples
Ignore the vertex attribute __type__:
new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).defineVertexAttributeStrategy("__type__", new OIgnoreGraphMLImportStrategy()).inputGraph("/temp/neo4j.graphml");
Ignore the edge attribute weight:
new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).defineEdgeAttributeStrategy("weight", new OIgnoreGraphMLImportStrategy()).inputGraph("/temp/neo4j.graphml");
Rename the vertex attribute __type__ to type:
new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).defineVertexAttributeStrategy("__type__", new ORenameGraphMLImportStrategy("type")).inputGraph("/temp/neo4j.graphml");
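To make the effect of these strategies concrete, here is a hypothetical, stdlib-only Java model of applying per-attribute strategies to one element's attributes. It is not OrientDB's implementation: the Strategy interface, the AttributeStrategies class, and the sample attribute values are illustrative assumptions. Attributes with no strategy are copied unchanged. IGNORE drops an attribute, and renameTo stores it under a new name, much as the examples above do with __type__.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of per-attribute import strategies. This is NOT OrientDB
// code; the real classes are OIgnoreGraphMLImportStrategy and
// ORenameGraphMLImportStrategy.
public class AttributeStrategies {

    /** Decides what happens to one attribute while it is copied to the target element. */
    interface Strategy {
        void apply(String key, Object value, Map<String, Object> target);
    }

    /** Drops the attribute entirely. */
    static final Strategy IGNORE = (key, value, target) -> { };

    /** Copies the attribute unchanged; used when no strategy is defined for a key. */
    static final Strategy COPY = (key, value, target) -> target.put(key, value);

    /** Stores the value under a different property name. */
    static Strategy renameTo(String newKey) {
        return (key, value, target) -> target.put(newKey, value);
    }

    private final Map<String, Strategy> strategies = new HashMap<>();

    /** Registers a strategy for one attribute name; returns this for chaining. */
    AttributeStrategies define(String key, Strategy strategy) {
        strategies.put(key, strategy);
        return this;
    }

    /** Applies the registered strategies to every attribute, preserving order. */
    Map<String, Object> transform(Map<String, Object> attributes) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : attributes.entrySet()) {
            strategies.getOrDefault(e.getKey(), COPY).apply(e.getKey(), e.getValue(), result);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up attributes of one imported vertex.
        Map<String, Object> vertex = new LinkedHashMap<>();
        vertex.put("__type__", "Movie");
        vertex.put("title", "The Matrix");
        vertex.put("released", 1999);

        Map<String, Object> imported = new AttributeStrategies()
            .define("__type__", renameTo("type"))
            .define("released", IGNORE)
            .transform(vertex);
        System.out.println(imported); // {type=Movie, title=The Matrix}
    }
}
```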
Import Tips and Tricks
Dealing with Memory Issues
If you experience memory issues while importing from Neo4j, consider reducing the batch size. By default, the batch size is 1000. A smaller value causes OrientDB to process the import in smaller units.
Import with adjusted batch size through the Console:
orientdb {db=test}> IMPORT DATABASE /tmp/out.graphml batchSize=100
Import with adjusted batch size through the Java API:
new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).setBatchSize(100).inputGraph("/temp/neo4j.graphml");
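The following hypothetical Java model illustrates why a smaller batch size helps. For illustration only, it assumes that the importer buffers elements in an open transaction and commits every batchSize elements, counting vertices and edges alike. The class BatchModel is not part of OrientDB. With the 302 nodes and 834 relationships from the export above, the default of 1000 gives two commits, while 100 gives twelve smaller ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of batched importing (not OrientDB code): elements are
// buffered and committed every batchSize elements, so at most batchSize
// uncommitted elements are held at any time.
public class BatchModel {

    /** Returns the number of elements in each commit, in order. */
    static List<Integer> batches(int totalElements, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Integer> commits = new ArrayList<>();
        int pending = 0;
        for (int i = 0; i < totalElements; i++) {
            pending++;                  // element added to the open transaction
            if (pending == batchSize) { // batch full: commit and start a new one
                commits.add(pending);
                pending = 0;
            }
        }
        if (pending > 0) {
            commits.add(pending);       // final, partial batch
        }
        return commits;
    }

    public static void main(String[] args) {
        // 302 nodes + 834 relationships, as reported by export-graphml.
        int total = 302 + 834;
        System.out.println(batches(total, 1000));       // [1000, 136]
        System.out.println(batches(total, 100).size()); // 12
    }
}
```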
Storing the Vertex IDs
By default, OrientDB assigns its own IDs to the imported vertices. To preserve the original vertex IDs from Neo4j, use the storeVertexIds option.
Import with the original vertex IDs through the Console:
orientdb {db=test}> IMPORT DATABASE /tmp/out.graphml storeVertexIds=true
Import with the original vertex IDs through the Java API:
new OGraphMLReader(new OrientGraph("plocal:/temp/bettergraph")).setStoreVertexIds(true).inputGraph("/temp/neo4j.graphml");
Example
A complete example of a migration from Neo4j to OrientDB using the GraphML method can be found in the section Tutorial: Importing the movie Database from Neo4j.