"and start the ArangoDB server daemon with the following command:\n",
"\n",
"> arangod\n",
"\n",
"We recommend using the ArangoDB web UI to run the queries (the default URL is *localhost:8529*). If your operating system has no GUI, you can use the *arangosh* shell instead."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Document store: collections and documents\n",
"\n",
"*Relational databases* contain *tables* of *records* (as *rows*).\n",
"\n",
"An **ArangoDB document database** contains **collections** that contain **documents**. Documents are JSON objects; internally, ArangoDB stores them in a binary format (VelocyPack)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"<img src = \"http://json.org/object.gif\">\n",
"<img src = \"http://json.org/array.gif\">\n",
"<img src = \"http://json.org/value.gif\">\n",
"\n",
"Below is an example of a JSON document containing a student's information and the corresponding scores.\n",
"\n",
"**Query 10:** inner join between two collections\n",
"\n",
"    FOR doc1 IN collection1\n",
"        FOR doc2 IN collection2\n",
"            FILTER doc1.id == doc2.id\n",
"            RETURN {doc1: doc1, doc2: doc2}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Graph store: nodes and edges\n",
"\n",
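"An ArangoDB graph consists of vertex (node) collections and edge collections. Each edge document stores the `_id` of its source and target vertices in its `_from` and `_to` attributes."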
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1 Loading the example graphs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"arangosh> var examples = require(\"@arangodb/graph-examples/example-graph.js\");\n",
"arangosh> var g = examples.loadGraph(\"knows_graph\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 Traversing the graphs\n",
"\n",
"    FOR vertex[, edge[, path]]\n",
"      IN [min[..max]]\n",
"      OUTBOUND|INBOUND|ANY startVertex\n",
"      GRAPH graphName\n",
"      [OPTIONS options]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"// Query 11: find the friends of a given person\n",
"\n",
"// get a random person p\n",
"Let p= (For person in persons Sort rand() limit 1 return person)\n",
"\n",
"// find the friends of p\n",
"FOR v,e,path\n",
"IN 1..1 any p[0]._id\n",
"GRAPH \"knows_graph\"\n",
"RETURN {p,v,e}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"// Query 12: filtering\n",
"// (a) Filtering on the vertex: find alice among bob's direct friends\n",
"// get person bob\n",
"Let p= (For person in persons Filter person._key=='bob' return person)\n",
"\n",
"// find the friends of p\n",
"FOR v,e\n",
"IN 1..1 any p[0]._id\n",
"GRAPH \"knows_graph\"\n",
"Filter v._key=='alice'\n",
"RETURN {p,v,e}\n",
"\n",
"// (b) Filtering on the path (run this as a separate query)\n",
"// get person bob\n",
"Let p= (For person in persons Filter person._key=='bob' return person)\n",
"\n",
"// find the friends of friends of p (paths with exactly two edges)\n",
"FOR v,e,path\n",
"IN 1..2 any p[0]._id\n",
"GRAPH \"knows_graph\"\n",
"Filter length(path.edges)>1\n",
"RETURN {p,v,e,path}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"// Query 13: graph functions, shortest path\n",
"\n",
"// find the shortest path from charlie to alice\n",
"FOR v,e\n",
"IN Any SHORTEST_PATH\n",
"'persons/charlie' to 'persons/alice'\n",
"GRAPH \"knows_graph\"\n",
"RETURN {v,e}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.3 Visualization "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## 3. Your turn: exploring the movie dataset\n",
"\n",
"Download the IMDB dataset dump and import it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# import the IMDB dataset (the leading ! runs a shell command from the notebook)\n",
"!arangorestore --input-directory dump --server.database handson --create-database true"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Questions\n",
"(1) How many distinct vertex types and distinct edge labels are there in the two collections, respectively? HINT: UNIQUE function\n",
"\n",
"(2) Some documents in the collection imdb_vertices have a \"releaseDate\" attribute. What is the newest movie in the collection? HINT: MAX function\n",
"\n",
"(3) Update the edge between \"imdb_vertices/crime\" and \"imdb_vertices/5541\" in the collection imdb_edges with the label \"has_movie\"; if the edge does not exist, create it and insert it into the edge collection. HINT: keyword UPSERT\n",
"\n",
"(4) For the documents in the collection imdb_vertices, find the ids that do not contain any digit, and save them with a label into a new collection named \"genre\". HINTS: use the regular expression test SUBSTRING(doc._id, 14) !~ \"[0-9]\", and create the genre collection beforehand.\n",
"\n",
"(5) Find the actors whose name includes \"David\", and return those documents that have a \"birthplace\" attribute. HINT: keyword LIKE and HAS function\n",
"\n",
"(6) Find the actor who has acted in the most movies. HINT: keyword COLLECT\n",
"\n",
"(7) Across all movie genres, find the top 5 genres with the most movies of all time. HINT: keyword COLLECT\n",
"\n",
"(8) Return the number of persons who are both actors and directors. HINT: self-join on imdb_edges\n",
"\n",
"(9) Given the movie \"Forrest Gump\", find all of its actors and return their real names and role names. HINT: graph traversal\n",
"\n",
"(10) Given the actor \"Tom Hanks\", find the directors who have worked with him more than twice. HINT: graph traversal and COLLECT.\n",
"\n",
"(11) Pick a movie or actor you are interested in, visualize it in the ArangoDB web UI, and present some insights from the visualization."
]
}
],
"metadata": {
"anaconda-cloud": {},
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
%% Cell type:markdown id: tags:
# Hands-On Session: Multi-Model Data Query Languages and Processing Paradigms at CIKM 2020
%% Cell type:markdown id: tags:
## Part 1: Multi-model queries in ArangoDB
%% Cell type:markdown id: tags:
### 1.1: ArangoDB Installation
%% Cell type:markdown id: tags:
To get started, please download and install the latest Community Edition build of ArangoDB (version 3.4 at the time of writing) from the official website, and start the ArangoDB server daemon with the following command:
> arangod
We recommend using the ArangoDB web UI to run the queries (the default URL is *localhost:8529*). If your operating system has no GUI, you can use the *arangosh* shell instead.
%% Cell type:markdown id: tags:
## 1. Document store: collections and documents
*Relational databases* contain *tables* of *records* (as *rows*).
An **ArangoDB document database** contains **collections** that contain **documents**. Documents are JSON objects; internally, ArangoDB stores them in a binary format (VelocyPack).
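To make the table/collection analogy concrete, here is a minimal Python sketch that models a collection as a dictionary of documents keyed by `_key`. It follows ArangoDB's convention that a document's `_id` is `<collection name>/<_key>`; the `persons` collection, its attributes, and the `insert` helper are hypothetical and only illustrate the data model, not how ArangoDB actually stores data.

```python
import json

# Hypothetical "persons" collection: documents keyed by _key,
# much as rows in a table are keyed by a primary key.
collection_name = "persons"
persons = {}

def insert(doc: dict) -> dict:
    """Store a copy of doc and add an _id of the form '<collection>/<_key>'."""
    stored = dict(doc)
    stored["_id"] = f"{collection_name}/{stored['_key']}"
    persons[stored["_key"]] = stored
    return stored

# Unlike the rows of a table, documents in one collection need not share a schema.
insert({"_key": "alice", "name": "Alice", "age": 30})
insert({"_key": "bob", "name": "Bob", "hobbies": ["chess", "hiking"]})

# Each document serializes to JSON as-is.
print(json.dumps(persons["bob"], sort_keys=True))
```

The `_id` prefix is also why ids such as `imdb_vertices/5541` in the exercises below carry their collection name.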
%% Cell type:markdown id: tags:
<img src="http://json.org/object.gif">
<img src="http://json.org/array.gif">
<img src="http://json.org/value.gif">
Below is an example of a JSON document containing a student's information and the corresponding scores.
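As a hypothetical illustration (the field names and values are made up), a student document can combine the three JSON building blocks shown above: an object, an array, and scalar values. Python's `json` module maps them onto `dict`, `list`, and `str`/`int`/`float`/`bool`/`None`, which makes it easy to experiment with nested access before writing AQL.

```python
import json

# Hypothetical student document: an object whose values include a string,
# a boolean, and an array of objects holding course scores.
student_json = """
{
  "_key": "s001",
  "name": "Jane Doe",
  "enrolled": true,
  "scores": [
    {"course": "Databases", "score": 92},
    {"course": "Algorithms", "score": 85}
  ]
}
"""

student = json.loads(student_json)  # JSON object -> dict, JSON array -> list

# Nested values are reached by attribute name and array index,
# similar to doc.scores[0].score in AQL.
first_score = student["scores"][0]["score"]
average = sum(s["score"] for s in student["scores"]) / len(student["scores"])
print(first_score, average)
```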
%% Cell type:markdown id: tags:
## 3. Your turn: exploring the movie dataset
Download the IMDB dataset dump and import it.
%% Cell type:code id: tags:
``` python
# import the IMDB dataset
!arangorestore --input-directory dump --server.database handson --create-database true
```
%% Cell type:markdown id: tags:
#### Questions
(1) How many distinct vertex types and distinct edge labels are there in the two collections, respectively? HINT: UNIQUE function
(2) Some documents in the collection imdb_vertices have a "releaseDate" attribute. What is the newest movie in the collection? HINT: MAX function
(3) Update the edge between "imdb_vertices/crime" and "imdb_vertices/5541" in the collection imdb_edges with the label "has_movie"; if the edge does not exist, create it and insert it into the edge collection. HINT: keyword UPSERT
(4) For the documents in the collection imdb_vertices, find the ids that do not contain any digit, and save them with a label into a new collection named "genre". HINTS: use the regular expression test SUBSTRING(doc._id, 14) !~ "[0-9]", and create the genre collection beforehand.
(5) Find the actors whose name includes "David", and return those documents that have a "birthplace" attribute. HINT: keyword LIKE and HAS function
(6) Find the actor who has acted in the most movies. HINT: keyword COLLECT
(7) Across all movie genres, find the top 5 genres with the most movies of all time. HINT: keyword COLLECT
(8) Return the number of persons who are both actors and directors. HINT: self-join on imdb_edges
(9) Given the movie "Forrest Gump", find all of its actors and return their real names and role names. HINT: graph traversal
(10) Given the actor "Tom Hanks", find the directors who have worked with him more than twice. HINT: graph traversal and COLLECT.
(11) Pick a movie or actor you are interested in, visualize it in the ArangoDB web UI, and present some insights from the visualization.
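Before writing the AQL for questions (4) and (7), it can help to check the logic on a toy sample in plain Python. The sketch below uses hypothetical `_id` values and genre-to-movie edges, not the real IMDB data. It shows that a pattern that only looks for a letter (`[a-zA-Z]`) also accepts mixed ids such as `a1`, whereas rejecting any digit matches the question's wording. It then counts movies per genre, which is roughly what `COLLECT genre = e._from WITH COUNT INTO n` does in AQL.

```python
import re
from collections import Counter

# Hypothetical vertex ids. SUBSTRING(doc._id, 14) in AQL drops the
# 14-character prefix "imdb_vertices/", just like this slice.
ids = ["imdb_vertices/crime", "imdb_vertices/5541", "imdb_vertices/a1", "imdb_vertices/drama"]
prefix = "imdb_vertices/"
keys = [i[len(prefix):] for i in ids]

# Contains at least one letter (like =~ "[a-zA-Z]"): also keeps "a1".
has_letter = [k for k in keys if re.search(r"[a-zA-Z]", k)]
# Contains no digit at all (like !~ "[0-9]"): what question (4) asks for.
no_digit = [k for k in keys if not re.search(r"[0-9]", k)]
print(has_letter)
print(no_digit)

# Hypothetical genre -> movie edges as (_from, _to) pairs.
edges = [
    ("imdb_vertices/crime", "imdb_vertices/5541"),
    ("imdb_vertices/drama", "imdb_vertices/5541"),
    ("imdb_vertices/drama", "imdb_vertices/7001"),
]

# Group by genre and count the movies, then keep the five largest groups.
counts = Counter(src for src, _ in edges)
top = counts.most_common(5)
print(top)
```

In AQL, the same grouping would typically be followed by `SORT n DESC LIMIT 5`.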