Skip to content
Snippets Groups Projects
hands-on.ipynb 12.1 KiB
Newer Older
Chao Zhang's avatar
Chao Zhang committed
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "# Hands-On Session for the tutorial: Multi-model Data query languages and processing paradigms in CIKM 2020\n"
Chao Zhang's avatar
Chao Zhang committed
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "## 1: ArangoDB Installation"
Chao Zhang's avatar
Chao Zhang committed
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "To better follow this hands-on session, please install the ArangoDB in advance. \n",
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "You can install the lateset version by following the official instructions if your computer satisfies the requirement of v3.7.0:\n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
    "* https://www.arangodb.com/docs/stable/installation.html\n",
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "Or you may download and install the previous community builds (e.g., v3.4.0 https://download.arangodb.com/arangodb34/index.html) of ArangoDB \n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
    "We will use the ArangoDB WebUI to perform the queries, the default url is *localhost:8529*, and the default username is root with the empty password."
Chao Zhang's avatar
Chao Zhang committed
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "## 2 Document store in ArangoDB : collections and documents\n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
    "*Relational databases* contain *tables* of *records* (as *rows*).\n",
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "An **ArangoDB document database** contains **collections** that contain **documents**. The documents follow the JSON format, and are usually stored in a binary format.\n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
    "Below is an example of json document containing information of a student and corresponding scores.\n",
    "\n",
    "** Score Document**\n",
    "```\n",
    "{\"_id\":0,\"name\":\"aimee Zank\",\n",
    " \"scores\":[{\"score\":1.463179736705023,\"type\":\"exam\"},\n",
    "           {\"score\":11.78273309957772,\"type\":\"quiz\"},\n",
    "           {\"score\":35.8740349954354,\"type\":\"homework\"}]\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "### 2.1  Warm-up: Loading the score documents\n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "### The score file is kept here. https://version.helsinki.fi/chzhang/cikm-2020-hands-on-session-for-multi-model-queries/-/blob/master/scores.json\n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "Please import the file to the server using arangoimport as follows:\n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "./arangoimport --file PATH-TO/scores.json --collection scores --create-collection true\n",
    "\n",
    "Notes:\n",
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "arangoimport options: https://www.arangodb.com/docs/stable/programs-arangoimport-options.html\n",
    "\n",
    "I recommend to add the arangoimport path to the system PATH variable:\n",
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "Default Path in Mac for arangoimport: /Applications/ArangoDB3-CLI.app/Contents/Resources\n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "Default Path in Linux for arangoimport: /usr/local/Cellar/arangodb\n",
    "\n",
    "Default Path in Windows for arangoimport: C:\\Program Files\\ArangoDB"
Chao Zhang's avatar
Chao Zhang committed
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "### 2.2 Querying the documents"
Chao Zhang's avatar
Chao Zhang committed
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 1: return a score document in the collection. \n",
    "\n",
    "For doc in scores Filter doc.name ==\"Leonida Lafond\" return doc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Query 2: (multiple conditions) return a score document in the collection. \n",
Chao Zhang's avatar
Chao Zhang committed
    "For doc in scores Filter doc.name ==\"Leonida Lafond\" and doc._key=='@key' return doc"
Chao Zhang's avatar
Chao Zhang committed
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 3: (array operator 1) find types of scores.\n",
    "For doc in scores limit 1 return doc.scores[*].type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 4: (array operator 2) find students whose exam scores are greater than 90.\n",
    "For doc in scores limit 1 return doc.scores[* Filter CURRENT.score>90].score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 5: (array operator 3) compute the average score.\n",
    "For doc in scores limit 1 return AVERAGE(doc.scores[*].score)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 6: flatten\n",
    "Return FLATTEN([ 1, 2, [ 3, 4 ], 5, [ 6, 7 ], [ 8, [ 9, 10 ] ] ])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 7: sorting \n",
    "For doc in scores\n",
    "    Sort first(doc.scores[*].score) DESC\n",
    "    Return doc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 8: grouping (with or without count)\n",
    "For doc in scores\n",
    "    COLLECT name=doc.name into g\n",
    "    return {name,g}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 9: define a variable using Let\n",
    "FOR doc in scores \n",
    "    LET average_score=AVERAGE(doc.scores[*].score)\n",
    "    SORT average_score DESC \n",
    "    RETURN { name:doc.name,average_score:average_score}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 10: Inner join between two collections\n",
Chao Zhang's avatar
Chao Zhang committed
    "    FOR doc1 in scores\n",
    "        FOR doc2 in scores\n",
    "            Filter doc1.name==doc2.name\n",
Chao Zhang's avatar
Chao Zhang committed
    "            return {doc1:doc1,doc2:doc2}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "## 3. Graph store : nodes and edges\n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
    "An ArangoDB graph database contains a set of node collections and edge collections."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "### 3.1 Loading the example graphs"
Chao Zhang's avatar
Chao Zhang committed
   ]
  },
  {
Chao Zhang's avatar
Chao Zhang committed
   "cell_type": "markdown",
Chao Zhang's avatar
Chao Zhang committed
   "metadata": {
Chao Zhang's avatar
Chao Zhang committed
    "collapsed": true
Chao Zhang's avatar
Chao Zhang committed
   },
   "source": [
    "arangosh> var examples = require(\"@arangodb/graph-examples/example-graph.js\");\n",
    "arangosh> var g = examples.loadGraph(\"knows_graph\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "### 3.2 Traversing the graphs\n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
    "FOR vertex[, edge[, path]]\n",
    "  IN [min[..max]]\n",
    "  OUTBOUND|INBOUND|ANY startVertex\n",
    "  GRAPH graphName\n",
    "  [OPTIONS options]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 11: find the friends of a given person. \n",
    "\n",
    "// get a random person p\n",
    "Let p= (For person in persons Sort rand() limit 1 return person)\n",
    "\n",
    "// find the friends of p\n",
    "FOR v,e,path\n",
    "IN  1..1 any p[0]._id\n",
    "GRAPH \"knows_graph\"\n",
    "RETURN {p,v,e}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 12: Filtering\n",
    "# Filtering vertex \n",
    "// get person bob\n",
    "Let p= (For person in persons Filter person._key=='bob' return person)\n",
    "\n",
    "// find the friends of p\n",
    "FOR v,e\n",
    "IN  1..1 any p[0]._id\n",
    "GRAPH \"knows_graph\"\n",
    "Filter v._key=='alice'\n",
    "RETURN {p,v,e}\n",
    "\n",
    "# Filtering path\n",
    "// get person bob\n",
    "Let p= (For person in persons Filter person._key=='bob' return person)\n",
    "\n",
    "// find the friends of p\n",
    "FOR v,e,path\n",
    "IN  1..2 any p[0]._id\n",
    "GRAPH \"knows_graph\"\n",
    "Filter length(path.edges)>1\n",
    "RETURN {p,v,e,path}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 13: Graph functions -- Shortest Path\n",
    "\n",
    "// find the friends of p\n",
    "FOR v,e\n",
    "IN Any SHORTEST_PATH\n",
    "'persons/charlie' to 'persons/alice'\n",
    "GRAPH \"knows_graph\"\n",
    "RETURN {v,e}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "## 4. Your turn - Querying the multi-model datasets"
Chao Zhang's avatar
Chao Zhang committed
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "### 4.1 Data description\n",
    "The data consists of three files: KnowsGraph.csv, Order.json, and Person.csv.  Specifically, Person is the tabular data with fields(id, firstName, lastname, gender, birthday, creationDate, locationIp, browserUsed). KnowsGraph is the linked data, each row is an edge starting from a PersonId to another PersonId. Order is the json data with fields (OrderId, PersonId, OrderDate,TotalPrice, [Orderline:[productId,title,price ]). Note that Orderline is an array including more than one product.\n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "Download the data here https://version.helsinki.fi/chzhang/cikm-2020-hands-on-session-for-multi-model-queries/-/tree/master/Multi-model-data and import them using arangoimport as follows: \n",
    "\n",
    "./arangoimport --file PATH-TO/Multi-model-data/Person.csv --type csv --translate \"id=_key\" --collection \"Person\" --server.username root --create-collection true\n",
    "\n",
    "./arangoimport --file PATH-TO/Multi-model-data/Order.json --type json --translate \"id=_key\" --collection \"Order\" --server.username root --create-collection true\n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "./arangoimport --file PATH-TO/Multi-model-data/KnowsGraph.csv --type csv --translate \"from=_from\" --translate \"to=_to\" --collection \"KnowsGraph\" --from-collection-prefix Person --to-collection-prefix Person --server.username root --create-collection true --create-collection-type edge"
Chao Zhang's avatar
Chao Zhang committed
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
Chao Zhang's avatar
Chao Zhang committed
    "### 4.2 Hands-on exercises\n",
Chao Zhang's avatar
Chao Zhang committed
    "  There are five tasks for querying the multi-model data as follows:\n",
    "  \n",
Chao Zhang's avatar
Chao Zhang committed
    "  **Q1: Get the top-10 best-selling products in all orders.** Hint: use the wildcard [*] to access the array and use the flatten operator to expand the sub-array.  Assume the quantity of each product in the orderline is one.\n",
Chao Zhang's avatar
Chao Zhang committed
    "  \n",
Chao Zhang's avatar
Chao Zhang committed
    "  **Q2: Calculate the total cost of female’s orders in year 2008.**  Hint: involve the Customer table and Order files, OrderDate contains the year information. \n",
Chao Zhang's avatar
Chao Zhang committed
    "  \n",
Chao Zhang's avatar
Chao Zhang committed
    "  **Q3: Given a start person (_key='2199023262543'), return the number of orders made by this person’s friends in 2009.** Hint: friends are the outbound vertices of the start person. \n",
Chao Zhang's avatar
Chao Zhang committed
    "  \n",
Chao Zhang's avatar
Chao Zhang committed
    "  **Q4: Given PersonX (_key='2199023259756') and PersonY (_key='26388279077535'), find the shortest path between them, and also return TOP-5 best-selling products for all persons in that path including PersonX and PersonY.** Hint: use SHORTEST_PATH function, see Query 13 as an example. \n",
Chao Zhang's avatar
Chao Zhang committed
    "  \n",
Chao Zhang's avatar
Chao Zhang committed
    "  **Q5: Find the top-2 persons who spend the highest amount of money in JSON orders. Then for each person, traverse her knows-graph with 3-hop to find the friends, and finally return the number of common friends for these two persons.** Hint: use INTERSECTION function to find the common items of two lists. \n",
Chao Zhang's avatar
Chao Zhang committed
    "\n",
Chao Zhang's avatar
Chao Zhang committed
    "  "
Chao Zhang's avatar
Chao Zhang committed
   ]
Chao Zhang's avatar
Chao Zhang committed
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
Chao Zhang's avatar
Chao Zhang committed
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}