Commit 28cab976 authored by Chao Zhang

ArangoDB installation

parent 3fd9ca11
%% Cell type:markdown id: tags:
# Hands-On Session: Multi-model Data Query Languages and Processing Paradigms at CIKM 2020
%% Cell type:markdown id: tags:
## Part 1: Multi-model queries in ArangoDB
%% Cell type:markdown id: tags:
### 1.1: ArangoDB Installation
%% Cell type:markdown id: tags:
To get started, download and install the latest community build (version 3.4) of ArangoDB from the official website:
* https://www.arangodb.com/docs/stable/installation.html

Then start the ArangoDB server daemon with the following command:
> arangod

We recommend using the ArangoDB Web UI to run the queries; its default URL is *localhost:8529*. If your operating system has no GUI, you can use the *arangosh* shell instead.
%% Cell type:markdown id: tags:
## 1. Document store: collections and documents
*Relational databases* contain *tables* of *records* (stored as *rows*).
Analogously, an **ArangoDB document database** contains **collections**, which in turn contain **documents**. Documents are JSON objects; internally, ArangoDB stores them in a binary format (VelocyPack).
%% Cell type:markdown id: tags:
<img src = "http://json.org/object.gif">
<img src = "http://json.org/array.gif">
<img src = "http://json.org/value.gif">
Below is an example JSON document containing a student's name and the corresponding scores.
**Score Document**
```
{"_id":0,"name":"aimee Zank",
"scores":[{"score":1.463179736705023,"type":"exam"},
{"score":11.78273309957772,"type":"quiz"},
{"score":35.8740349954354,"type":"homework"}]
}
```
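Because documents are plain JSON, the score document above can be inspected locally with Python's standard `json` module. This is only an illustration of the document structure (no ArangoDB involved); the list comprehension mirrors the AQL array expansion `doc.scores[*].type` used later in this notebook.

```python
import json

# The score document from above, as a JSON string.
raw = """
{"_id":0,"name":"aimee Zank",
 "scores":[{"score":1.463179736705023,"type":"exam"},
           {"score":11.78273309957772,"type":"quiz"},
           {"score":35.8740349954354,"type":"homework"}]
}
"""

doc = json.loads(raw)  # JSON object -> Python dict, JSON array -> list

# Roughly what doc.scores[*].type returns in AQL.
score_types = [s["type"] for s in doc["scores"]]

# Pick the nested score whose type is "exam".
exam = next(s["score"] for s in doc["scores"] if s["type"] == "exam")

print(doc["name"])   # aimee Zank
print(score_types)   # ['exam', 'quiz', 'homework']
print(exam)
```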
%% Cell type:markdown id: tags:
### 1.1 Loading the score documents
%% Cell type:code id: tags:
``` python
// create and select a database (run in the arangosh shell)
arangosh> db._createDatabase("handson");
arangosh> db._useDatabase("handson");
# import the example dataset (run in a bash shell; older releases call this tool arangoimp)
arangoimport --file scores.json --collection scores --create-collection true --server.database handson
```
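%% Cell type:markdown id: tags:
If you want to prepare your own input file, a minimal sketch is to write one JSON document per line (JSON Lines), a layout that the import tool accepts for JSON input. The file name `my_scores.json` and the sample records below are hypothetical.

```python
import json

# Hypothetical sample records; replace them with your own data.
students = [
    {"name": "Student A", "scores": [{"score": 71.0, "type": "exam"}]},
    {"name": "Student B", "scores": [{"score": 88.5, "type": "exam"}]},
]

def write_json_lines(path, docs):
    """Write one compact JSON document per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for d in docs:
            f.write(json.dumps(d) + "\n")

write_json_lines("my_scores.json", students)
```

The resulting file could then be imported with the same options as above, e.g. `--file my_scores.json --collection scores --server.database handson`.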
%% Cell type:markdown id: tags:
### 1.2 Arango Query Language (AQL) on documents
AQL queries are composed of the following high-level operations:
* **FOR**: array iteration
* **RETURN**: result projection
* **FILTER**: result filtering
* **SORT**: result sorting
* **LIMIT**: result slicing
* **LET**: variable assignment
* **COLLECT**: result grouping
* **INSERT**: insertion of new documents
* **UPDATE**: (partial) update of existing documents
* **REPLACE**: replacement of existing documents
* **REMOVE**: removal of existing documents
* **UPSERT**: update of an existing document, or insertion if it does not exist
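
To see how the read operations compose, here is a rough Python analogy over a hypothetical in-memory list of documents: `FOR` becomes iteration, `FILTER` a condition, `SORT` a sort key, `LIMIT` a slice, and `RETURN` a projection. It only illustrates the semantics; it is not how ArangoDB executes queries.

```python
# Hypothetical in-memory "collection".
scores = [
    {"_key": "1", "name": "Ann", "total": 71},
    {"_key": "2", "name": "Bob", "total": 88},
    {"_key": "3", "name": "Cid", "total": 95},
    {"_key": "4", "name": "Dee", "total": 64},
]

# Roughly: FOR doc IN scores FILTER doc.total > 70
#          SORT doc.total DESC LIMIT 2 RETURN doc.name
filtered = [doc for doc in scores if doc["total"] > 70]             # FOR + FILTER
ordered = sorted(filtered, key=lambda d: d["total"], reverse=True)  # SORT ... DESC
top = ordered[:2]                                                   # LIMIT 2
result = [doc["name"] for doc in top]                               # RETURN doc.name

print(result)  # ['Cid', 'Bob']
```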
%% Cell type:code id: tags:
``` python
// Run each statement below as a separate query.
// Create a document:
INSERT {
"_key":"211",
"name": "Chao",
"surname": "Zhang",
"score": [60,80,90]
} INTO scores
// Retrieve a document:
Return document("scores","211")
// Update a document:
UPDATE "211" WITH { score: [90,90,90] } IN scores
// Delete a document:
REMOVE { _key: "211" } IN scores
```
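%% Cell type:markdown id: tags:
The difference between the write operations can be sketched with a Python dict keyed by `_key`, standing in for a collection. The helper names (`insert`, `update`, `replace`, `remove`) are hypothetical. Note one simplification: `dict.update` is a shallow merge, whereas AQL's `UPDATE` merges nested objects by default; arrays such as `score` are replaced in both cases.

```python
# Hypothetical in-memory stand-in for a collection, keyed by _key.
collection = {}

def insert(doc):
    """Like INSERT: fail if the _key already exists."""
    if doc["_key"] in collection:
        raise KeyError("unique constraint violated: " + doc["_key"])
    collection[doc["_key"]] = dict(doc)

def update(key, patch):
    """Like UPDATE: merge the given attributes into the stored document (shallow)."""
    collection[key].update(patch)

def replace(key, new_doc):
    """Like REPLACE: keep the _key, drop all other old attributes."""
    collection[key] = {"_key": key, **new_doc}

def remove(key):
    """Like REMOVE: delete the document with this _key."""
    del collection[key]

insert({"_key": "211", "name": "Chao", "surname": "Zhang", "score": [60, 80, 90]})
update("211", {"score": [90, 90, 90]})
print(collection["211"])
# {'_key': '211', 'name': 'Chao', 'surname': 'Zhang', 'score': [90, 90, 90]}
replace("211", {"name": "Chao"})
print(collection["211"])  # {'_key': '211', 'name': 'Chao'}
remove("211")
print("211" in collection)  # False
```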
%% Cell type:code id: tags:
``` python
// Query 1: return the score document(s) of a given student.
For doc in scores Filter doc.name =="Leonida Lafond" return doc
```
%% Cell type:code id: tags:
``` python
// Query 2: (multiple conditions) combine two filter conditions with AND.
For doc in scores Filter doc.name =="Leonida Lafond" and doc._key=='266197464913' return doc
```
%% Cell type:code id: tags:
``` python
// Query 3: (array operator 1) list the score types of one document with the [*] expansion operator.
For doc in scores limit 1 return doc.scores[*].type
```
%% Cell type:code id: tags:
``` python
// Query 4: (array operator 2) find students whose exam scores are greater than 90.
For doc in scores
Filter LENGTH(doc.scores[* Filter CURRENT.type == "exam" AND CURRENT.score > 90]) > 0
return doc.name
```
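%% Cell type:markdown id: tags:
The inline filter `[* FILTER ...]` keeps only the array elements for which the condition on `CURRENT` (the element being inspected) holds. A rough Python analogy with hypothetical documents, answering the question of which students have an exam score above 90; `inline_filter` is a hypothetical helper, not an ArangoDB API.

```python
# Hypothetical documents in the shape of the scores collection.
scores = [
    {"name": "Ann", "scores": [{"score": 95.0, "type": "exam"},
                               {"score": 40.0, "type": "quiz"}]},
    {"name": "Bob", "scores": [{"score": 60.0, "type": "exam"},
                               {"score": 99.0, "type": "homework"}]},
]

def inline_filter(items, predicate):
    """Mimic arr[* FILTER cond]: keep elements for which cond(CURRENT) holds."""
    return [current for current in items if predicate(current)]

names = [
    doc["name"]
    for doc in scores
    if len(inline_filter(doc["scores"],
                         lambda c: c["type"] == "exam" and c["score"] > 90)) > 0
]
print(names)  # ['Ann']
```

Note that Bob is excluded even though he has a score of 99: it is a homework score, not an exam score.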
%% Cell type:code id: tags:
``` python
// Query 5: (array operator 3) compute the average score of one document.
For doc in scores limit 1 return AVERAGE(doc.scores[*].score)
```
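%% Cell type:markdown id: tags:
AQL's `AVERAGE()` ignores `null` values and returns `null` for an empty array. A minimal Python analogy (the helper `aql_average` and the sample document are hypothetical); `s.get("score")` plays the role of `[*].score`, yielding `None` where an element lacks the attribute.

```python
from statistics import fmean  # Python 3.8+

def aql_average(values):
    """Rough analogy of AQL AVERAGE(): skip None (null); None if nothing is left."""
    nums = [v for v in values if v is not None]
    return fmean(nums) if nums else None

# Hypothetical document; the last element has no score attribute.
doc = {"scores": [{"score": 60, "type": "exam"},
                  {"score": 80, "type": "quiz"},
                  {"score": 100, "type": "homework"},
                  {"type": "project"}]}

expanded = [s.get("score") for s in doc["scores"]]  # like doc.scores[*].score
print(aql_average(expanded))  # 80.0
print(aql_average([]))        # None
```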
%% Cell type:code id: tags:
``` python
// Query 6: flatten nested arrays (FLATTEN uses a default depth of 1)
Return FLATTEN([ 1, 2, [ 3, 4 ], 5, [ 6, 7 ], [ 8, [ 9, 10 ] ] ])
```
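%% Cell type:markdown id: tags:
`FLATTEN(anyArray, depth)` only unnests sub-arrays up to `depth` levels, and `depth` defaults to 1, so the innermost `[9, 10]` in Query 6 survives as a nested array. A small recursive Python sketch of that behaviour (the function is hypothetical, not ArangoDB code):

```python
def flatten(values, depth=1):
    """Rough analogy of AQL FLATTEN(anyArray, depth): unnest lists up to `depth` levels."""
    out = []
    for v in values:
        if isinstance(v, list) and depth > 0:
            out.extend(flatten(v, depth - 1))  # one level consumed
        else:
            out.append(v)                      # scalar, or depth exhausted
    return out

data = [1, 2, [3, 4], 5, [6, 7], [8, [9, 10]]]
print(flatten(data))           # [1, 2, 3, 4, 5, 6, 7, 8, [9, 10]]
print(flatten(data, depth=2))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

To flatten Query 6 completely in AQL, pass the depth explicitly, e.g. `FLATTEN(..., 2)`.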
%% Cell type:code id: tags:
``` python
// Query 7: sorting documents by their first score, in descending order
For doc in scores
Sort first(doc.scores[*].score) DESC
Return doc
```
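%% Cell type:markdown id: tags:
`FIRST()` returns `null` for an empty array, and AQL orders `null` before every number, so with `DESC` such documents end up last. A Python sketch of that ordering rule, using hypothetical documents and helper names:

```python
# Hypothetical documents; Cid has an empty scores array.
docs = [
    {"name": "Ann", "scores": [{"score": 55.0}, {"score": 90.0}]},
    {"name": "Bob", "scores": [{"score": 81.5}]},
    {"name": "Cid", "scores": []},
]

def first(values):
    """Like AQL FIRST(): first element, or None (null) for an empty array."""
    return values[0] if values else None

def sort_key(doc):
    value = first([s["score"] for s in doc["scores"]])
    # (False, 0) sorts below (True, x), so None ranks lowest, like null in AQL.
    return (value is not None, value if value is not None else 0)

ordered = sorted(docs, key=sort_key, reverse=True)  # SORT ... DESC
print([d["name"] for d in ordered])  # ['Bob', 'Ann', 'Cid']
```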
%% Cell type:code id: tags:
``` python
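// Query 8: grouping without a count (INTO stores each group's members in g)
For doc in scores
COLLECT name=doc.name into g
return {name,g}
// grouping with a count (run as a separate query)
For doc in scores
COLLECT name=doc.name WITH COUNT INTO cnt
return {name,cnt}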
```
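%% Cell type:markdown id: tags:
A rough Python analogy of the two grouping forms, with hypothetical documents. Without a projection, `INTO g` stores, for each group member, an object holding the variables in scope (here just `doc`); `WITH COUNT INTO` keeps only the group size. The sketch sorts by the group key, which is what `COLLECT` typically returns.

```python
from collections import defaultdict

# Hypothetical documents with a repeated name.
scores = [
    {"name": "Ann", "total": 70},
    {"name": "Bob", "total": 80},
    {"name": "Ann", "total": 90},
]

# COLLECT name = doc.name INTO g
groups = defaultdict(list)
for doc in scores:
    groups[doc["name"]].append({"doc": doc})  # one entry per member: {doc: ...}

grouped = [{"name": name, "g": members} for name, members in sorted(groups.items())]

# COLLECT name = doc.name WITH COUNT INTO cnt
counted = [{"name": name, "cnt": len(members)} for name, members in sorted(groups.items())]

print(counted)  # [{'name': 'Ann', 'cnt': 2}, {'name': 'Bob', 'cnt': 1}]
```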
%% Cell type:code id: tags:
``` python
// Query 9: define a variable with LET and sort by it
FOR doc in scores
LET average_score=AVERAGE(doc.scores[*].score)
SORT average_score DESC
RETURN { name:doc.name,average_score:average_score}
```
%% Cell type:code id: tags:
``` python
// Query 10: inner join between two collections (collection1 and collection2 are placeholder names)
FOR doc1 in collection1
FOR doc2 in collection2
Filter doc1.id==doc2.id
return {doc1:doc1,doc2:doc2}
```
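%% Cell type:markdown id: tags:
Read literally, Query 10 is a nested-loop join that compares every pair of documents. When the inner collection has an index on the join attribute, the optimizer can typically use it instead of scanning all of `collection2` for each `doc1`. The Python sketch below contrasts the literal reading with a dict-based lookup that plays a similar role; it is an analogy with hypothetical data, not ArangoDB's actual join algorithm.

```python
from collections import defaultdict

# Hypothetical stand-ins for collection1 and collection2.
collection1 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
collection2 = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"},
               {"id": 3, "b": "r"}, {"id": 4, "b": "s"}]

def nested_loop_join(left, right, key):
    """Literal reading of Query 10: compare every pair (O(n * m))."""
    return [{"doc1": d1, "doc2": d2}
            for d1 in left for d2 in right if d1[key] == d2[key]]

def hash_join(left, right, key):
    """Same result: build a lookup table over `right`, then probe it."""
    index = defaultdict(list)
    for d2 in right:
        index[d2[key]].append(d2)
    return [{"doc1": d1, "doc2": d2}
            for d1 in left for d2 in index.get(d1[key], [])]

print(len(hash_join(collection1, collection2, "id")))  # 3
```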
%% Cell type:markdown id: tags:
## 2. Graph store: nodes and edges
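An ArangoDB graph consists of vertex (node) collections and edge collections. Each edge is a document whose `_from` and `_to` attributes hold the `_id`s of the two vertices it connects.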
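
Because each edge document names its endpoints in `_from` and `_to`, the three traversal directions used below simply choose which endpoint to follow. A minimal Python sketch with hypothetical edges (the vertex ids mimic the `collection/key` form):

```python
from collections import defaultdict

# Hypothetical edge documents of a "knows" edge collection.
edges = [
    {"_from": "persons/alice", "_to": "persons/bob"},
    {"_from": "persons/bob", "_to": "persons/charlie"},
    {"_from": "persons/bob", "_to": "persons/dave"},
    {"_from": "persons/eve", "_to": "persons/alice"},
]

def build_adjacency(edge_docs):
    """Index edges by endpoint, once per direction."""
    out_adj, in_adj = defaultdict(list), defaultdict(list)
    for e in edge_docs:
        out_adj[e["_from"]].append(e["_to"])
        in_adj[e["_to"]].append(e["_from"])
    return out_adj, in_adj

def neighbors(vertex, out_adj, in_adj, direction="ANY"):
    """OUTBOUND follows _from -> _to, INBOUND the reverse, ANY both."""
    if direction == "OUTBOUND":
        return list(out_adj[vertex])
    if direction == "INBOUND":
        return list(in_adj[vertex])
    return out_adj[vertex] + in_adj[vertex]

out_adj, in_adj = build_adjacency(edges)
print(neighbors("persons/bob", out_adj, in_adj, "OUTBOUND"))  # ['persons/charlie', 'persons/dave']
print(neighbors("persons/bob", out_adj, in_adj, "INBOUND"))   # ['persons/alice']
print(neighbors("persons/bob", out_adj, in_adj, "ANY"))
# ['persons/charlie', 'persons/dave', 'persons/alice']
```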
%% Cell type:markdown id: tags:
### 2.1 Loading the example graphs
%% Cell type:code id: tags:
``` python
arangosh> var examples = require("@arangodb/graph-examples/example-graph.js");
arangosh> var g = examples.loadGraph("knows_graph");
```
%% Cell type:markdown id: tags:
### 2.2 Traversing the graphs
The general syntax of a graph traversal is shown below; square brackets mark optional parts.

    FOR vertex[, edge[, path]]
      IN [min[..max]]
      OUTBOUND|INBOUND|ANY startVertex
      GRAPH graphName
      [OPTIONS options]
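
A traversal enumerates every path from `startVertex` whose length lies in `min..max` and emits the reached vertex, the last edge, and the whole path. The Python sketch below does this depth-first and never reuses an edge within one path, which, as far as we know, matches ArangoDB's default uniqueness settings; the edges and the `traverse` helper are hypothetical.

```python
# Hypothetical edge documents (direction: _from -> _to).
edges = [
    {"_key": "1", "_from": "persons/alice", "_to": "persons/bob"},
    {"_key": "2", "_from": "persons/bob", "_to": "persons/charlie"},
    {"_key": "3", "_from": "persons/bob", "_to": "persons/dave"},
    {"_key": "4", "_from": "persons/eve", "_to": "persons/alice"},
    {"_key": "5", "_from": "persons/eve", "_to": "persons/bob"},
]

def traverse(start, edge_docs, min_depth, max_depth, direction="ANY"):
    """Depth-first enumeration of {v, e, path} for path lengths min..max,
    never reusing an edge within a single path."""
    results = []

    def expand(vertex, path_vertices, path_edges):
        depth = len(path_edges)
        if depth >= min_depth:
            results.append({
                "v": vertex,
                "e": path_edges[-1] if path_edges else None,
                "path": {"vertices": path_vertices, "edges": path_edges},
            })
        if depth == max_depth:
            return
        for e in edge_docs:
            if any(e is used for used in path_edges):
                continue  # this edge is already on the current path
            if direction in ("OUTBOUND", "ANY") and e["_from"] == vertex:
                nxt = e["_to"]
            elif direction in ("INBOUND", "ANY") and e["_to"] == vertex:
                nxt = e["_from"]
            else:
                continue
            expand(nxt, path_vertices + [nxt], path_edges + [e])

    expand(start, [start], [])
    return results

# Like: FOR v, e IN 1..1 ANY "persons/bob" ...
friends = [r["v"] for r in traverse("persons/bob", edges, 1, 1)]
# Like: FOR v, e, path IN 1..2 ANY "persons/bob" ... FILTER LENGTH(path.edges) > 1
two_hops = [r["v"] for r in traverse("persons/bob", edges, 1, 2)
            if len(r["path"]["edges"]) > 1]
print(friends)   # ['persons/alice', 'persons/charlie', 'persons/dave', 'persons/eve']
print(two_hops)  # ['persons/eve', 'persons/alice']
```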
%% Cell type:code id: tags:
``` python
// Query 11: find the friends of a given person.
// get a random person p
Let p= (For person in persons Sort rand() limit 1 return person)
// find the friends of p
FOR v,e,path
IN 1..1 any p[0]._id
GRAPH "knows_graph"
RETURN {p,v,e}
```
%% Cell type:code id: tags:
``` python
// Query 12: filtering
// (a) filtering on the vertex
// get person bob
Let p= (For person in persons Filter person._key=='bob' return person)
// find the friends of p
FOR v,e
IN 1..1 any p[0]._id
GRAPH "knows_graph"
Filter v._key=='alice'
RETURN {p,v,e}
// (b) filtering on the path (run as a separate query)
// get person bob
Let p= (For person in persons Filter person._key=='bob' return person)
// find people up to two hops from p; keep only the two-edge paths (friends of friends)
FOR v,e,path
IN 1..2 any p[0]._id
GRAPH "knows_graph"
Filter length(path.edges)>1
RETURN {p,v,e,path}
```
%% Cell type:code id: tags:
``` python
// Query 13: graph functions: shortest path
// find the shortest path from charlie to alice, ignoring edge direction (ANY)
FOR v,e
IN Any SHORTEST_PATH
'persons/charlie' to 'persons/alice'
GRAPH "knows_graph"
RETURN {v,e}
```
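%% Cell type:markdown id: tags:
On an unweighted graph, a shortest path can be found with breadth-first search. The sketch below treats edges as undirected (like `ANY`) and reconstructs the path from parent links. The edge list is hypothetical, and this is the textbook BFS, not necessarily the algorithm ArangoDB uses internally.

```python
from collections import deque

# Hypothetical edge documents.
edges = [
    {"_from": "persons/alice", "_to": "persons/bob"},
    {"_from": "persons/bob", "_to": "persons/charlie"},
    {"_from": "persons/bob", "_to": "persons/dave"},
    {"_from": "persons/eve", "_to": "persons/alice"},
    {"_from": "persons/eve", "_to": "persons/bob"},
]

def shortest_path(edge_docs, source, target):
    """Unweighted shortest path ignoring direction, via BFS.
    Returns the vertex list from source to target, or None if unreachable."""
    adj = {}
    for e in edge_docs:
        adj.setdefault(e["_from"], []).append(e["_to"])
        adj.setdefault(e["_to"], []).append(e["_from"])
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:  # walk parent links back to the source
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj.get(v, []):
            if w not in parent:   # first visit = shortest distance
                parent[w] = v
                queue.append(w)
    return None

print(shortest_path(edges, "persons/charlie", "persons/alice"))
# ['persons/charlie', 'persons/bob', 'persons/alice']
```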
%% Cell type:markdown id: tags:
### 2.3 Visualization
%% Cell type:markdown id: tags:
## 3. Your turn - exploring the movie datasets
Download the IMDB dataset from the Dump folder and import it.
%% Cell type:code id: tags:
``` python
# import the IMDB dataset (run in a bash shell; dump is the directory holding the dump files)
arangorestore --input-directory dump --server.database handson
```
%% Cell type:markdown id: tags:
#### Questions
(1) How many unique vertex types and unique edge labels are there in the two collections, respectively? HINT: UNIQUE function

(2) Some documents in collection imdb_vertices have a "releaseDate" attribute. What is the newest movie in the collection? HINT: MAX function

(3) Update the edge between "imdb_vertices/crime" and "imdb_vertices/5541" in collection imdb_edges so that it has the label "has_movie"; if the edge does not exist, create it and insert it into the edge collection. HINT: UPSERT

(4) For documents in collection imdb_vertices, find the ids that don't contain any digit and save them, with a label, into a new collection named "genre". HINTs: use a regular expression such as SUBSTRING(doc._id, 14) !~ "[0-9]", and create the genre collection beforehand.

(5) Find actors whose names include "David" and return the documents that have a "birthplace" attribute. HINT: LIKE operator and HAS function

(6) Find the actor who has acted in the most movies. HINT: COLLECT

(7) Find the top 5 movie genres with the most movies of all time. HINT: COLLECT

(8) Return the number of persons who are both actors and directors. HINT: self-join on imdb_edges

(9) Given the movie "Forrest Gump", find all of its actors and return their real names and role names. HINT: graph traversal

(10) Given the actor "Tom Hanks", find the directors who have worked with him more than twice. HINT: graph traversal and COLLECT

(11) Pick a movie or actor you are interested in, visualize it in the ArangoDB Web UI, and present some insights from the visualization.