{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hands-On Session for the tutorial: Multi-model Data query languages and processing paradigms in CIKM 2020\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1: ArangoDB Installation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "To better follow this hands-on session, please install the ArangoDB in advance. \n",
    "\n",
    "Please download and install the previous community builds (e.g., v3.4.0 https://download.arangodb.com/arangodb34/index.html) of ArangoDB \n",
    "\n",
    "Or you can install the lateset version by following the official instructions if your computer satisfies the requirement of v3.7.0:\n",
    "\n",
    "* https://www.arangodb.com/docs/stable/installation.html\n",
    "\n",
    "We recommend to use the ArangoDB WebUI to perform the queries, the default url is *localhost:8529*, and the default username is root with the empty password."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2 Document store in ArangoDB : collections and documents\n",
    "\n",
    "*Relational databases* contain *tables* of *records* (as *rows*).\n",
    "\n",
    "An **ArangoDB document database** contains **collections** that contain **documents**. The documents follow the JSON format, and are usually stored in a binary format.\n",
    "\n",
    "Below is an example of json document containing information of a student and corresponding scores.\n",
    "\n",
    "** Score Document**\n",
    "```\n",
    "{\"_id\":0,\"name\":\"aimee Zank\",\n",
    " \"scores\":[{\"score\":1.463179736705023,\"type\":\"exam\"},\n",
    "           {\"score\":11.78273309957772,\"type\":\"quiz\"},\n",
    "           {\"score\":35.8740349954354,\"type\":\"homework\"}]\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### 2.1  warm-up: Loading the score documents\n",
    "\n",
    "### The score files is kept here. https://version.helsinki.fi/chzhang/cikm-2020-hands-on-session-for-multi-model-queries/-/blob/master/scores.json\n",
    "\n",
    "### You may create a collection named scores in the webUI and upload the file manully\n",
    "\n",
    "### or import the file to the server using arangoimp as follows:\n",
    "arangoimp --file scores.json --collection scores --create-collection true"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Arango Query Language (AQL) basics on documents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Basically, AQL includes the following operations:\n",
    "\n",
    "  FOR: array iteration\n",
    "  \n",
    "  RETURN: results projection\n",
    "  \n",
    "  FILTER: results filtering\n",
    "  \n",
    "  SORT: result sorting\n",
    "  \n",
    "  LIMIT: result slicing\n",
    "  \n",
    "  LET: variable assignment\n",
    "  \n",
    "  COLLECT: result grouping\n",
    "  \n",
    "  INSERT: insertion of new documents\n",
    "  \n",
    "  UPDATE: (partial) update of existing documents\n",
    "  \n",
    "  REPLACE: replacement of existing documents\n",
    "  \n",
    "  REMOVE: removal of existing documents\n",
    "  \n",
    "  UPSERT: insertion or update of existing documents"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 Arango Query Language (AQL) on documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 1: return a score document in the collection. \n",
    "\n",
    "For doc in scores Filter doc.name ==\"Leonida Lafond\" return doc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Query 2: (multiple conditions) return a score document in the collection. \n",
    "For doc in scores Filter doc.name ==\"Leonida Lafond\" and doc._key=='266197464913' return doc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 3: (array operator 1) find types of scores.\n",
    "For doc in scores limit 1 return doc.scores[*].type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 4: (array operator 2) find students whose exam scores are greater than 90.\n",
    "For doc in scores limit 1 return doc.scores[* Filter CURRENT.score>90].score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 5: (array operator 3) compute the average score.\n",
    "For doc in scores limit 1 return AVERAGE(doc.scores[*].score)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 6: flatten\n",
    "Return FLATTEN([ 1, 2, [ 3, 4 ], 5, [ 6, 7 ], [ 8, [ 9, 10 ] ] ])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 7: sorting \n",
    "For doc in scores\n",
    "    Sort first(doc.scores[*].score) DESC\n",
    "    Return doc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 8: grouping (with or without count)\n",
    "For doc in scores\n",
    "    COLLECT name=doc.name into g\n",
    "    return {name,g}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 9: define a variable using Let\n",
    "FOR doc in scores \n",
    "    LET average_score=AVERAGE(doc.scores[*].score)\n",
    "    SORT average_score DESC \n",
    "    RETURN { name:doc.name,average_score:average_score}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 10: Inner join between two collections\n",
    "    FOR doc1 in collection1\n",
    "        FOR doc2 in collection2\n",
    "            Filter doc1.id==doc2.id\n",
    "            return {doc1:doc1,doc2:doc2}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Graph store : nodes and edges\n",
    "\n",
    "An ArangoDB graph database contains a set of node collections and edge collections."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1 Loading the example graphs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "arangosh> var examples = require(\"@arangodb/graph-examples/example-graph.js\");\n",
    "arangosh> var g = examples.loadGraph(\"knows_graph\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Traversing the graphs\n",
    "\n",
    "FOR vertex[, edge[, path]]\n",
    "  IN [min[..max]]\n",
    "  OUTBOUND|INBOUND|ANY startVertex\n",
    "  GRAPH graphName\n",
    "  [OPTIONS options]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 11: find the friends of a given person. \n",
    "\n",
    "// get a random person p\n",
    "Let p= (For person in persons Sort rand() limit 1 return person)\n",
    "\n",
    "// find the friends of p\n",
    "FOR v,e,path\n",
    "IN  1..1 any p[0]._id\n",
    "GRAPH \"knows_graph\"\n",
    "RETURN {p,v,e}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 12: Filtering\n",
    "# Filtering vertex \n",
    "// get person bob\n",
    "Let p= (For person in persons Filter person._key=='bob' return person)\n",
    "\n",
    "// find the friends of p\n",
    "FOR v,e\n",
    "IN  1..1 any p[0]._id\n",
    "GRAPH \"knows_graph\"\n",
    "Filter v._key=='alice'\n",
    "RETURN {p,v,e}\n",
    "\n",
    "# Filtering path\n",
    "// get person bob\n",
    "Let p= (For person in persons Filter person._key=='bob' return person)\n",
    "\n",
    "// find the friends of p\n",
    "FOR v,e,path\n",
    "IN  1..2 any p[0]._id\n",
    "GRAPH \"knows_graph\"\n",
    "Filter length(path.edges)>1\n",
    "RETURN {p,v,e,path}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Query 13: Graph functions -- Shortest Path\n",
    "\n",
    "// find the friends of p\n",
    "FOR v,e\n",
    "IN Any SHORTEST_PATH\n",
    "'persons/charlie' to 'persons/alice'\n",
    "GRAPH \"knows_graph\"\n",
    "RETURN {v,e}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### 3. Your turn - Querying the UniBench datasets\n",
    "\n",
    "Download the UniBench dataset here and import them as follows:\n",
    "\n",
    "arangorestore dump --server.database handson"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Questions\n",
    "(1) How many unique types of vertices and unique labels of edges are there in two collections respectively? HINT:  UNIQUE function\n",
    "\n",
    "(2) Some documents in collection imdb_vertices are associated with a \"releaseDate\" field. What is the newest movie in the collection? HINT: MAX function\n",
    "\n",
    "(3) Update a edge between \"imdb_vertices/crime\" and \"imdb_vertices/5541\" in collection imdb_edges with a label \"has_movie\", if the edge isn't exist, create one and insert it into the edge collection. HINT: keyword: UPSERT\n",
    "\n",
    "(4) For documents in collection imdb_vertices, find the ids that don't include any number, save them with a label  into a new collection named \"genre\". (HINTs: use regex expression SUBSTRING(doc._id,14)=~ \"[a-zA-Z]\", create the genre collections beforehand)\n",
    "\n",
    "(5) Find actors whose name include \"David\", return documents that have the \"birthplace\" attribute. HINT: keyword like and HAS function\n",
    "\n",
    "(6) Find the actor who have acted in the most number of movies. HINT: keyword COLLECT\n",
    "\n",
    "(7) Regarding different movie genres, find the Top-5 genres with most number of movies in all time. HINT: keyword COLLECT\n",
    "\n",
    "(8) Return the number of persons who are both actor and director. HINT: SELF-JOIN ON imdb_edges\n",
    "\n",
    "(9) Given a movie \"Forrest Gump\", check its all associated actors. return their real names and role names. HINT: graph traversal\n",
    "\n",
    "(10) Given a actor \"Tom Hanks\", find the directors who have cooperated with him more than twice. HINT: graph traversal\n",
    "HINT: graph traversal and COLLECT.\n",
    "\n",
    "(11) Think about a movie or actor you are interested in, visualize it in the ArangoDB and present some insights from the visulization."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}