" There are five tasks for querying the multi-model data as follows:\n",
" \n",
" **Q1: Get the top-10 best-selling products in all orders.** Hint: We assume the quantity of each product in the orderline is one. Use the wildcard [*] to access the Orderline array, then use the flatten operator to expand the sub-array of productId, finally collect the ids into groups.\n",
" **Q1: Get the top-10 best-selling products in all orders.** Hint: we assume the quantity of each product in the orderline is one. Use the wildcard [*] to access the Orderline array, then use the flatten operator to expand the sub-array of productId, finally collect the ids into groups.\n",
" \n",
" **Q2: Calculate the total cost of female’s orders in year 2008.** Hint: involve a join between the Person table and Order files. OrderDate contains the year information. \n",
" \n",
...
...
%% Cell type:markdown id: tags:
# Hands-On Session for the Tutorial "Multi-model Data Query Languages and Processing Paradigms" at CIKM 2020
%% Cell type:markdown id: tags:
## 1: ArangoDB Installation
%% Cell type:markdown id: tags:
To follow this hands-on session, please install ArangoDB in advance.
You can install the latest version by following the official instructions if your computer meets the requirements of v3.7.0:
Alternatively, you may download and install an earlier community build of ArangoDB (e.g., v3.4.0, https://download.arangodb.com/arangodb34/index.html).
We will use the ArangoDB WebUI to run the queries. The default URL is *localhost:8529*, and the default username is root with an empty password.
%% Cell type:markdown id: tags:
## 2: Document store in ArangoDB: collections and documents
*Relational databases* contain *tables* of *records* (as *rows*).
An **ArangoDB document database** contains **collections** that contain **documents**. The documents follow the JSON format, and are usually stored in a binary format.
Below is an example of a JSON document containing information about a student and the corresponding scores.
## 4: Your turn - Querying the multi-model datasets
%% Cell type:markdown id: tags:
### 4.1 Data description
The data consists of three files: KnowsGraph.csv, Order.json, and Person.csv. Person is tabular data with the fields (id, firstName, lastname, gender, birthday, creationDate, locationIp, browserUsed). KnowsGraph is linked data: each row is an edge from one PersonId to another PersonId. Order is JSON data with the fields (OrderId, PersonId, OrderDate, TotalPrice, Orderline: [productId, title, price]). Note that Orderline is an array that can contain more than one product.
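To make the three shapes concrete, the following Python sketch builds one toy record of each kind. Only the field names come from the description above. Every value is invented for illustration, and the ISO-style date strings and the `(source, target)` tuple layout for an edge are assumptions, not the actual file format.

```python
# Toy records mirroring the three files; every value below is made up.

# Person.csv: one flat row per person (tabular model).
person = {
    "id": "1", "firstName": "Ada", "lastname": "Example", "gender": "female",
    "birthday": "1990-01-01", "creationDate": "2008-05-01",
    "locationIp": "192.0.2.1", "browserUsed": "Firefox",
}

# KnowsGraph.csv: one directed edge per row (graph model).
# Assumed layout: (source PersonId, target PersonId).
knows_edge = ("1", "2")

# Order.json: a nested document with an embedded Orderline array (document model).
order = {
    "OrderId": "o-1", "PersonId": "1", "OrderDate": "2008-06-15",
    "TotalPrice": 30.0,
    "Orderline": [
        {"productId": "p1", "title": "Widget", "price": 10.0},
        {"productId": "p2", "title": "Gadget", "price": 20.0},
    ],
}

# The three models are connected through person ids:
# an order points at a person, and an edge points at two persons.
order_belongs_to_person = order["PersonId"] == person["id"]
edge_starts_at_person = knows_edge[0] == person["id"]
orderline_total = sum(line["price"] for line in order["Orderline"])
```

The point of the sketch is that PersonId is the join key across all three models, which is what the tasks below rely on.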
Download the data from https://version.helsinki.fi/chzhang/cikm-2020-hands-on-session-for-multi-model-queries/-/tree/master/Multi-model-data and import it using arangoimport as follows:
There are five tasks for querying the multi-model data as follows:
**Q1: Get the top-10 best-selling products in all orders.** Hint: we assume the quantity of each product in the orderline is one. Use the wildcard [*] to access the Orderline array, then use the flatten operator to expand the sub-arrays of productId, and finally collect the ids into groups.
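The three steps in the Q1 hint (wildcard access, flatten, group) can be sketched in plain Python on toy data. This is not the AQL solution, only an illustration of the logic; the `orders` list and the `top_products` helper are hypothetical.

```python
from collections import Counter
from itertools import chain

# Hypothetical toy orders; the real data lives in Order.json.
orders = [
    {"OrderId": "o1", "Orderline": [{"productId": "p1"}, {"productId": "p2"}]},
    {"OrderId": "o2", "Orderline": [{"productId": "p1"}]},
    {"OrderId": "o3", "Orderline": [{"productId": "p3"}, {"productId": "p1"},
                                    {"productId": "p2"}]},
]


def top_products(orders, n=10):
    # Step 1 (wildcard): take the productId of every entry in each Orderline,
    # which yields one list of ids per order.
    ids_per_order = [[line["productId"] for line in o["Orderline"]] for o in orders]
    # Step 2 (flatten): merge the per-order lists into one stream of ids.
    all_ids = chain.from_iterable(ids_per_order)
    # Step 3 (group): count occurrences per id; quantity is assumed to be one.
    return Counter(all_ids).most_common(n)


top = top_products(orders)
print(top)
```

In AQL the grouping step would typically be done with COLLECT, followed by a sort on the count and a LIMIT.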
**Q2: Calculate the total cost of the orders placed by female customers in 2008.** Hint: this involves a join between the Person table and the Order documents. OrderDate contains the year information.
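The join in Q2 can be illustrated in Python on toy data. This is a sketch of the logic only, under two assumptions that should be checked against the real files: that gender is stored as the string `"female"`, and that OrderDate is a string starting with the four-digit year.

```python
# Hypothetical toy rows standing in for Person.csv and Order.json.
persons = [
    {"id": "1", "gender": "female"},
    {"id": "2", "gender": "male"},
    {"id": "3", "gender": "female"},
]
orders = [
    {"PersonId": "1", "OrderDate": "2008-03-01", "TotalPrice": 10.0},
    {"PersonId": "1", "OrderDate": "2009-01-10", "TotalPrice": 99.0},
    {"PersonId": "2", "OrderDate": "2008-07-21", "TotalPrice": 50.0},
    {"PersonId": "3", "OrderDate": "2008-12-31", "TotalPrice": 5.5},
]


def total_cost(persons, orders, gender, year):
    # Select the ids on the Person side of the join.
    ids = {p["id"] for p in persons if p["gender"] == gender}
    # Join on PersonId, keep orders from the given year, and sum their totals.
    return sum(
        o["TotalPrice"]
        for o in orders
        if o["PersonId"] in ids and o["OrderDate"][:4] == str(year)
    )


total = total_cost(persons, orders, "female", 2008)
print(total)
```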
**Q3: Given a start person (_key='2199023262543'), return the number of orders made by this person’s friends in 2009.** Hint: friends are the outbound vertices of the start person.
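As a rough illustration of Q3, the Python sketch below treats the knows graph as a list of directed `(source, target)` pairs, collects the outbound neighbours of the start person, and counts their orders in a given year. The toy ids and the year-prefixed date strings are assumptions.

```python
# Hypothetical toy edges (source PersonId, target PersonId) and orders.
knows = [("a", "b"), ("a", "c"), ("b", "d"), ("d", "a")]
orders = [
    {"PersonId": "b", "OrderDate": "2009-02-01"},
    {"PersonId": "b", "OrderDate": "2008-02-01"},
    {"PersonId": "c", "OrderDate": "2009-11-30"},
    {"PersonId": "d", "OrderDate": "2009-05-05"},
    {"PersonId": "a", "OrderDate": "2009-05-05"},
]


def friend_order_count(start, knows, orders, year):
    # Friends are the outbound vertices: targets of edges leaving `start`.
    friends = {dst for src, dst in knows if src == start}
    # Count the orders placed by any friend in the requested year.
    return sum(
        1
        for o in orders
        if o["PersonId"] in friends and o["OrderDate"].startswith(str(year))
    )


count = friend_order_count("a", knows, orders, 2009)
print(count)
```

Note that person `d` knows `a`, but that edge is inbound for `a`, so the orders of `d` are not counted.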
**Q4: Given PersonX (_key='2199023259756') and PersonY (_key='26388279077535'), find the shortest path between them, and also return the top-5 best-selling products for all persons on that path, including PersonX and PersonY.** Hint: use the SHORTEST_PATH function; see Query 13 for an example.
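Q4 combines a graph step with a document step. The Python sketch below shows one way to think about it: a breadth-first search finds a shortest path, and the Q1-style counting is then restricted to the persons on that path. Following edges in the outbound direction is an assumption, and all data and helper names are hypothetical.

```python
from collections import Counter, deque

# Hypothetical toy edges and orders.
knows = [("x", "m"), ("m", "y"), ("x", "n"), ("n", "k"), ("k", "y")]
orders = [
    {"PersonId": "x", "Orderline": [{"productId": "p1"}, {"productId": "p2"}]},
    {"PersonId": "m", "Orderline": [{"productId": "p1"}]},
    {"PersonId": "y", "Orderline": [{"productId": "p3"}, {"productId": "p1"}]},
    {"PersonId": "n", "Orderline": [{"productId": "p9"}]},
]


def shortest_path(start, goal, edges):
    # Build an outbound adjacency list from the edge pairs.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    # Breadth-first search; `parent` doubles as the visited set.
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return []  # no path found


def top_products_on_path(path, orders, n=5):
    on_path = set(path)
    # Flatten the Orderline arrays of orders placed by persons on the path.
    counts = Counter(
        line["productId"]
        for o in orders
        if o["PersonId"] in on_path
        for line in o["Orderline"]
    )
    return counts.most_common(n)


path = shortest_path("x", "y", knows)
top = top_products_on_path(path, orders)
print(path, top)
```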
**Q5: Find the top-2 persons who spent the most money in the JSON orders. Then, for each of them, traverse the knows graph for 3 hops to find that person's friends, and finally return the number of common friends of these two persons.** Hint: use the INTERSECTION function to find the common items of two lists.
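The Q5 pipeline (aggregate, traverse, intersect) can be sketched in Python as follows. The sketch reads "3 hops" as "reachable within 1 to 3 outbound hops", which is an assumption about the intended traversal depth; the toy data and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy orders and edges.
orders = [
    {"PersonId": "a", "TotalPrice": 100.0},
    {"PersonId": "a", "TotalPrice": 50.0},
    {"PersonId": "b", "TotalPrice": 120.0},
    {"PersonId": "c", "TotalPrice": 10.0},
]
knows = [("a", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("b", "d"), ("b", "g")]


def top_spenders(orders, n=2):
    # Sum TotalPrice per person, then keep the n largest totals.
    spent = defaultdict(float)
    for o in orders:
        spent[o["PersonId"]] += o["TotalPrice"]
    return sorted(spent, key=spent.get, reverse=True)[:n]


def friends_within(start, edges, max_hops=3):
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
    seen = {start}
    frontier = {start}
    # Expand one hop at a time, never revisiting a vertex.
    for _ in range(max_hops):
        frontier = {nxt for node in frontier for nxt in adjacency[node]} - seen
        seen |= frontier
    return seen - {start}


first, second = top_spenders(orders)
# Set intersection plays the role of the INTERSECTION function.
common = friends_within(first, knows) & friends_within(second, knows)
print(first, second, len(common))
```

Here `f` is four hops away from `a`, so it is a friend of `b` only and does not count as a common friend.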