Commit 3c7b500f authored by Chao Zhang

Update hands-on.ipynb

parent 5e5730e6
%% Cell type:markdown id: tags:
# Hands-On Session for the Tutorial: Multi-model Data Query Languages and Processing Paradigms at CIKM 2020
%% Cell type:markdown id: tags:
## 1. ArangoDB Installation
%% Cell type:markdown id: tags:
To follow this hands-on session, please install ArangoDB in advance.
You can install the latest version by following the official instructions, provided your computer meets the requirements of v3.7.0:
* https://www.arangodb.com/docs/stable/installation.html
Alternatively, you can download and install an earlier community build of ArangoDB (e.g., v3.4.0: https://download.arangodb.com/arangodb34/index.html).
We will use the ArangoDB web UI to run the queries. The default URL is *localhost:8529*, and the default username is root with an empty password.
%% Cell type:markdown id: tags:
## 2. Document store in ArangoDB: collections and documents
*Relational databases* contain *tables* of *records* (as *rows*).
An **ArangoDB document database** contains **collections** that contain **documents**. The documents follow the JSON format, and are usually stored in a binary format.
Below is an example JSON document containing a student's information and the corresponding scores.
**Score Document**
```
{"_id":0,"name":"aimee Zank",
"scores":[{"score":1.463179736705023,"type":"exam"},
{"score":11.78273309957772,"type":"quiz"},
{"score":35.8740349954354,"type":"homework"}]
}
```
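To make the nested structure concrete, here is a small sketch in plain Python (standard `json` module only, not ArangoDB code) that parses the score document above and reads its nested `scores` array. The variable names are illustrative.

```python
import json

# The score document shown above, as a JSON string.
raw = """
{"_id": 0, "name": "aimee Zank",
 "scores": [{"score": 1.463179736705023, "type": "exam"},
            {"score": 11.78273309957772, "type": "quiz"},
            {"score": 35.8740349954354, "type": "homework"}]}
"""

doc = json.loads(raw)

# Attributes are reached by name, array elements by position.
student_name = doc["name"]
first_score_type = doc["scores"][0]["type"]
number_of_scores = len(doc["scores"])

print(student_name, first_score_type, number_of_scores)
```

The same attribute paths (`doc.name`, `doc.scores[0].type`) are what the AQL queries in this section navigate.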
%% Cell type:markdown id: tags:
### 2.1 Warm-up: Loading the score documents
The score file is available here: https://version.helsinki.fi/chzhang/cikm-2020-hands-on-session-for-multi-model-queries/-/blob/master/scores.json
Please import the file into the server using arangoimport as follows:
./arangoimport --file PATH-TO/scores.json --collection scores --create-collection true
Notes:
* arangoimport options: https://www.arangodb.com/docs/stable/programs-arangoimport-options.html
* I recommend adding the directory that contains arangoimport to the system PATH variable.
* Default path on Mac for arangoimport: /Applications/ArangoDB3-CLI.app/Contents/Resources
* Default path on Linux for arangoimport: /usr/local/Cellar/arangodb
* Default path on Windows for arangoimport: C:\Program Files\ArangoDB
%% Cell type:markdown id: tags:
### 2.2 Querying the documents
%% Cell type:code id: tags:
``` python
// Query 1: return a score document in the collection.
FOR doc IN scores FILTER doc.name == "Leonida Lafond" RETURN doc
```
%% Cell type:code id: tags:
``` python
// Query 2: (multiple conditions) return a score document in the collection.
// @key is a bind parameter; set its value in the web UI before running the query.
FOR doc IN scores FILTER doc.name == "Leonida Lafond" AND doc._key == @key RETURN doc
```
%% Cell type:code id: tags:
``` python
// Query 3: (array operator 1) find the types of scores in one document.
FOR doc IN scores LIMIT 1 RETURN doc.scores[*].type
```
%% Cell type:code id: tags:
``` python
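// Query 4: (array operator 2) inline filter: return the scores greater than 90 in one document.
// LIMIT 1 restricts the output to a single document; the result is empty if it has no such score.
FOR doc IN scores LIMIT 1 RETURN doc.scores[* FILTER CURRENT.score > 90].score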
```
%% Cell type:code id: tags:
``` python
// Query 5: (array operator 3) compute the average score of one document.
FOR doc IN scores LIMIT 1 RETURN AVERAGE(doc.scores[*].score)
```
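%% Cell type:markdown id: tags:
The three array-operator queries above can be mirrored in plain Python to see what each one computes. This is only a sketch of the semantics, not ArangoDB code: it uses the sample score document from Section 2, and a threshold of 10 instead of 90 so that the filter returns something for that document.

```python
from statistics import mean

doc = {
    "name": "aimee Zank",
    "scores": [
        {"score": 1.463179736705023, "type": "exam"},
        {"score": 11.78273309957772, "type": "quiz"},
        {"score": 35.8740349954354, "type": "homework"},
    ],
}

# doc.scores[*].type : project one attribute from every array element.
types = [s["type"] for s in doc["scores"]]

# doc.scores[* FILTER CURRENT.score > 10].score : filter the elements, then project.
above_ten = [s["score"] for s in doc["scores"] if s["score"] > 10]

# AVERAGE(doc.scores[*].score) : aggregate the projected values.
average_score = mean(s["score"] for s in doc["scores"])

print(types, above_ten, average_score)
```

In the AQL inline filter, `CURRENT` plays the role of the loop variable `s` in the comprehensions.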
%% Cell type:code id: tags:
``` python
// Query 6: flatten a nested array (by default only one level deep).
RETURN FLATTEN([ 1, 2, [ 3, 4 ], 5, [ 6, 7 ], [ 8, [ 9, 10 ] ] ])
```
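%% Cell type:markdown id: tags:
FLATTEN takes an optional second argument, the depth, which defaults to 1. The Python function below is a hypothetical stand-in that sketches this behaviour on the array from Query 6; it is not how ArangoDB implements the function.

```python
def flatten(values, depth=1):
    """Flatten nested lists up to `depth` levels, in the spirit of AQL's FLATTEN(array, depth)."""
    result = []
    for item in values:
        if isinstance(item, list) and depth > 0:
            # Descend one level and spend one unit of depth.
            result.extend(flatten(item, depth - 1))
        else:
            result.append(item)
    return result


nested = [1, 2, [3, 4], 5, [6, 7], [8, [9, 10]]]
one_level = flatten(nested)            # [9, 10] stays nested
two_levels = flatten(nested, depth=2)  # fully flat

print(one_level)
print(two_levels)
```

With the default depth, the innermost array `[9, 10]` survives as a single element, which is why Query 6 does not return ten plain numbers.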
%% Cell type:code id: tags:
``` python
// Query 7: sorting by the first score of each document, in descending order.
FOR doc IN scores
    SORT FIRST(doc.scores[*].score) DESC
    RETURN doc
```
%% Cell type:code id: tags:
``` python
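// Query 8: grouping; INTO g collects the members of each group.
// To get only the group sizes, use: COLLECT name = doc.name WITH COUNT INTO n
FOR doc IN scores
    COLLECT name = doc.name INTO g
    RETURN {name, g}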
```
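%% Cell type:markdown id: tags:
As a rough mental model of `COLLECT ... INTO`, the sketch below groups a few hypothetical documents (invented for illustration, not taken from scores.json) by name in plain Python. Each group member is wrapped as `{"doc": ...}` because `INTO` keeps the variables that are in scope, here `doc`.

```python
from collections import defaultdict

# Hypothetical documents, only for illustration.
docs = [
    {"name": "Ann", "scores": [{"type": "exam", "score": 80.0}]},
    {"name": "Bob", "scores": [{"type": "exam", "score": 55.0}]},
    {"name": "Ann", "scores": [{"type": "exam", "score": 91.0}]},
]

groups = defaultdict(list)
for doc in docs:
    # COLLECT name = doc.name INTO g
    groups[doc["name"]].append({"doc": doc})

# RETURN {name, g}; sorted here only to make the output deterministic.
result = [{"name": name, "g": members} for name, members in sorted(groups.items())]

# COLLECT name = doc.name WITH COUNT INTO n
counts = {name: len(members) for name, members in groups.items()}

print(counts)
```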
%% Cell type:code id: tags:
``` python
// Query 9: define a variable using LET and reuse it for sorting and output.
FOR doc IN scores
    LET average_score = AVERAGE(doc.scores[*].score)
    SORT average_score DESC
    RETURN { name: doc.name, average_score: average_score }
```
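%% Cell type:markdown id: tags:
The LET/SORT/RETURN pipeline of Query 9 corresponds to computing a derived value per document and ordering by it. A plain-Python sketch on hypothetical documents (the names and scores are made up):

```python
from statistics import mean

# Hypothetical documents, only for illustration.
docs = [
    {"name": "Ann", "scores": [{"score": 80.0}, {"score": 90.0}]},
    {"name": "Bob", "scores": [{"score": 55.0}, {"score": 65.0}]},
    {"name": "Cid", "scores": [{"score": 95.0}, {"score": 99.0}]},
]

# LET average_score = AVERAGE(doc.scores[*].score)
rows = [
    {"name": d["name"], "average_score": mean(s["score"] for s in d["scores"])}
    for d in docs
]

# SORT average_score DESC
ranked = sorted(rows, key=lambda row: row["average_score"], reverse=True)

for row in ranked:
    print(row["name"], row["average_score"])
```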
%% Cell type:code id: tags:
``` python
// Query 10: inner join between two collections (here a self-join of scores on name).
FOR doc1 IN scores
    FOR doc2 IN scores
        FILTER doc1.name == doc2.name
        RETURN { doc1: doc1, doc2: doc2 }
```
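%% Cell type:markdown id: tags:
Two nested FOR loops with a FILTER on matching attributes express an inner join. The plain-Python sketch below shows the same pattern on two hypothetical lists (`students` and `results` are invented for illustration); documents without a partner on the other side are dropped, as in any inner join.

```python
# Hypothetical collections, only for illustration.
students = [{"_key": "1", "name": "Ann"}, {"_key": "2", "name": "Bob"}]
results = [
    {"name": "Ann", "course": "DB"},
    {"name": "Ann", "course": "ML"},
    {"name": "Cid", "course": "DB"},
]

# FOR doc1 IN students FOR doc2 IN results FILTER doc1.name == doc2.name RETURN {doc1, doc2}
joined = [
    {"doc1": s, "doc2": r}
    for s in students
    for r in results
    if s["name"] == r["name"]
]

for pair in joined:
    print(pair["doc1"]["name"], pair["doc2"]["course"])
```

Bob has no result and Cid has no student record, so neither appears in `joined`.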
%% Cell type:markdown id: tags:
## 3. Graph store: nodes and edges
An ArangoDB graph database contains a set of node collections and edge collections.
%% Cell type:markdown id: tags:
### 3.1 Loading the example graphs
%% Cell type:markdown id: tags:
Run the following commands in arangosh to load the example graph knows_graph:
arangosh> var examples = require("@arangodb/graph-examples/example-graph.js");
arangosh> var g = examples.loadGraph("knows_graph");
%% Cell type:markdown id: tags:
### 3.2 Traversing the graphs
The general syntax of an AQL graph traversal is:
FOR vertex[, edge[, path]]
IN [min[..max]]
OUTBOUND|INBOUND|ANY startVertex
GRAPH graphName
[OPTIONS options]
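To illustrate what `min..max` and the direction keywords mean, here is a plain-Python sketch of a depth-bounded traversal. It is a simplified model under stated assumptions, not ArangoDB's implementation: the function `traverse` is hypothetical, the edge list is a small toy graph modeled on knows_graph, and an edge may be used only once per path.

```python
def traverse(edges, start, min_depth=1, max_depth=1, direction="outbound"):
    """Yield (vertex, edge, path) for every path of min_depth..max_depth edges from start."""

    def neighbours(vertex):
        for edge in edges:
            src, dst = edge
            if direction in ("outbound", "any") and src == vertex:
                yield dst, edge
            if direction in ("inbound", "any") and dst == vertex:
                yield src, edge

    def walk(vertex, path_vertices, path_edges):
        depth = len(path_edges)
        if depth >= min_depth:
            last_edge = path_edges[-1] if path_edges else None
            yield vertex, last_edge, {"vertices": path_vertices, "edges": path_edges}
        if depth == max_depth:
            return
        for nxt, edge in neighbours(vertex):
            if edge in path_edges:  # do not reuse an edge within one path
                continue
            yield from walk(nxt, path_vertices + [nxt], path_edges + [edge])

    yield from walk(start, [start], [])


# Toy (source, target) edges, assumed for illustration.
knows = [("alice", "bob"), ("bob", "charlie"), ("bob", "dave"),
         ("eve", "alice"), ("eve", "bob")]

friends_any = sorted(v for v, e, p in traverse(knows, "bob", 1, 1, "any"))
friends_out = sorted(v for v, e, p in traverse(knows, "bob", 1, 1, "outbound"))
two_hop_paths = [p["vertices"] for v, e, p in traverse(knows, "bob", 1, 2, "any")
                 if len(p["edges"]) > 1]

print(friends_any, friends_out, two_hop_paths)
```

The three returned values correspond to the `vertex`, `edge`, and `path` variables of the FOR statement; `path` carries the `vertices` and `edges` visited so far, which is what a filter such as `LENGTH(path.edges) > 1` inspects.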
%% Cell type:code id: tags:
``` python
// Query 11: find the friends of a given person.
// get a random person p (the subquery returns an array, hence p[0] below)
LET p = (FOR person IN persons SORT RAND() LIMIT 1 RETURN person)
// find the friends of p
FOR v, e, path
    IN 1..1 ANY p[0]._id
    GRAPH "knows_graph"
    RETURN {p, v, e}
```
%% Cell type:code id: tags:
``` python
// Query 12: filtering (run the two queries below separately)
// Query 12a: filtering on the vertex
// get person bob
LET p = (FOR person IN persons FILTER person._key == 'bob' RETURN person)
// find the friends of p, keeping only alice
FOR v, e
    IN 1..1 ANY p[0]._id
    GRAPH "knows_graph"
    FILTER v._key == 'alice'
    RETURN {p, v, e}
// Query 12b: filtering on the path
// get person bob
LET p = (FOR person IN persons FILTER person._key == 'bob' RETURN person)
// find the persons reachable from p, keeping only paths longer than one edge
FOR v, e, path
    IN 1..2 ANY p[0]._id
    GRAPH "knows_graph"
    FILTER LENGTH(path.edges) > 1
    RETURN {p, v, e, path}
```
%% Cell type:code id: tags:
``` python
// Query 13: graph functions: shortest path
// find the shortest path from charlie to alice, ignoring edge direction
FOR v, e
    IN ANY SHORTEST_PATH
    'persons/charlie' TO 'persons/alice'
    GRAPH "knows_graph"
    RETURN {v, e}
```
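%% Cell type:markdown id: tags:
For an unweighted graph, a shortest path can be found with breadth-first search. The sketch below is plain Python, not ArangoDB's implementation; `shortest_path` is a hypothetical helper and the edge list is a toy graph modeled on knows_graph. Edge direction is ignored, as with `ANY` in Query 13.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Return the vertices of one shortest path, ignoring edge direction, or None."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)

    previous = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            # Walk the predecessor links back to the source.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbour in adjacency.get(vertex, []):
            if neighbour not in previous:
                previous[neighbour] = vertex
                queue.append(neighbour)
    return None


# Toy (source, target) edges, assumed for illustration.
knows = [("alice", "bob"), ("bob", "charlie"), ("bob", "dave"),
         ("eve", "alice"), ("eve", "bob")]

path = shortest_path(knows, "charlie", "alice")
print(path)
```

Like the AQL query, this returns a single path even if several paths of the same length exist.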
%% Cell type:markdown id: tags:
## 4. Your turn - Querying the multi-model datasets
%% Cell type:markdown id: tags:
### 4.1 Data description
The data consists of three files: KnowsGraph.csv, Order.json, and Person.csv. Person is tabular data with the fields (id, firstName, lastname, gender, birthday, creationDate, locationIp, browserUsed). KnowsGraph is linked data: each row is an edge from one PersonId to another PersonId. Order is JSON data with the fields (OrderId, PersonId, OrderDate, TotalPrice, Orderline: [productId, title, price]). Note that Orderline is an array that can contain more than one product.
Download the data here: https://version.helsinki.fi/chzhang/cikm-2020-hands-on-session-for-multi-model-queries/-/tree/master/Multi-model-data and import the files using arangoimport as follows:
./arangoimport --file PATH-TO/Multi-model-data/Person.csv --type csv --translate "id=_key" --collection "Person" --server.username root --create-collection true
./arangoimport --file PATH-TO/Multi-model-data/Order.json --type json --translate "id=_key" --collection "Order" --server.username root --create-collection true
./arangoimport --file PATH-TO/Multi-model-data/KnowsGraph.csv --type csv --translate "from=_from" --translate "to=_to" --collection "KnowsGraph" --from-collection-prefix Person --to-collection-prefix Person --server.username root --create-collection true --create-collection-type edge
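As a mental model of the last command, the plain-Python sketch below shows the effect of `--translate` together with the two collection-prefix options on a CSV row: the columns are renamed to `_from` and `_to`, and the collection name is prepended so that each value becomes a document id. The helper `to_edge_document` and the sample rows are hypothetical, and this is not how arangoimport is implemented.

```python
import csv
import io


def to_edge_document(row, translate, from_prefix, to_prefix):
    """Rename columns, then turn bare keys into 'Collection/key' document ids."""
    doc = {translate.get(field, field): value for field, value in row.items()}
    doc["_from"] = f"{from_prefix}/{doc['_from']}"
    doc["_to"] = f"{to_prefix}/{doc['_to']}"
    return doc


# Hypothetical rows; the real KnowsGraph.csv contains different ids.
sample_csv = "from,to\n1,2\n1,3\n"

rows = csv.DictReader(io.StringIO(sample_csv))
edges = [
    to_edge_document(row, {"from": "_from", "to": "_to"}, "Person", "Person")
    for row in rows
]

print(edges)
```

This is why the traversal start vertex in the exercises has to be given as a full id such as `Person/<key>` rather than a bare key.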
%% Cell type:markdown id: tags:
### 4.2 Hands-on exercises
There are five tasks for querying the multi-model data as follows:
**Q1: Get the top-10 best-selling products in all orders.** Hint: We assume the quantity of each product in the orderline is one. Use the wildcard [*] to access the Orderline array, use the FLATTEN function to expand the sub-arrays of productId, and finally collect the ids into groups.
**Q2: Calculate the total cost of the orders placed by female persons in 2008.** Hint: this involves a join between the Person collection and the Order collection. OrderDate contains the year information.
**Q3: Given a start person (_key='2199023262543'), return the number of orders made by this person’s friends in 2009.** Hint: friends are the outbound vertices of the start person.
**Q4: Given PersonX (_key='2199023259756') and PersonY (_key='26388279077535'), find the shortest path between them, and also return the top-5 best-selling products for all persons on that path, including PersonX and PersonY.** Hint: use the SHORTEST_PATH function; see Query 13 for an example.
**Q5: Find the top-2 persons who spent the highest amount of money in the JSON orders. Then, for each of them, traverse the knows graph within 3 hops to find the friends, and finally return the number of common friends of these two persons.** Hint: use the INTERSECTION function to find the common items of two lists.
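Before writing the AQL, it can help to try the hinted building blocks on a few made-up records. The plain-Python sketch below is not a solution to the tasks: the orders and friend lists are invented, and only the field names `Orderline` and `productId` come from the data description. It shows counting flattened product ids (the idea behind Q1) and intersecting two friend lists (the idea behind Q5).

```python
from collections import Counter

# Hypothetical orders, only for illustration.
orders = [
    {"OrderId": "o1", "Orderline": [{"productId": "p1"}, {"productId": "p2"}]},
    {"OrderId": "o2", "Orderline": [{"productId": "p1"}]},
    {"OrderId": "o3", "Orderline": [{"productId": "p3"}, {"productId": "p1"},
                                    {"productId": "p2"}]},
]

# Expand every Orderline array and flatten the product ids into one list.
product_ids = [line["productId"] for order in orders for line in order["Orderline"]]

# Group by product id, count, and keep the best sellers.
top_two = Counter(product_ids).most_common(2)

# Common friends of two persons, as an INTERSECTION of two lists would give.
friends_a = ["x", "y", "z"]
friends_b = ["y", "z", "w"]
common = sorted(set(friends_a) & set(friends_b))

print(top_two, common)
```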
%% Cell type:code id: tags:
``` python
```