{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Hands-On Session for the tutorial: Multi-model Data query languages and processing paradigms in CIKM 2020\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1: ArangoDB Installation" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "To better follow this hands-on session, please install the ArangoDB in advance. \n", "\n", "Please download and install the previous community builds (e.g., v3.4.0 https://download.arangodb.com/arangodb34/index.html) of ArangoDB \n", "\n", "Or you can install the lateset version by following the official instructions if your computer satisfies the requirement of v3.7.0:\n", "\n", "* https://www.arangodb.com/docs/stable/installation.html\n", "\n", "We recommend to use the ArangoDB WebUI to perform the queries, the default url is *localhost:8529*, and the default username is root with the empty password." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2 Document store in ArangoDB : collections and documents\n", "\n", "*Relational databases* contain *tables* of *records* (as *rows*).\n", "\n", "An **ArangoDB document database** contains **collections** that contain **documents**. The documents follow the JSON format, and are usually stored in a binary format.\n", "\n", "Below is an example of json document containing information of a student and corresponding scores.\n", "\n", "** Score Document**\n", "```\n", "{\"_id\":0,\"name\":\"aimee Zank\",\n", " \"scores\":[{\"score\":1.463179736705023,\"type\":\"exam\"},\n", " {\"score\":11.78273309957772,\"type\":\"quiz\"},\n", " {\"score\":35.8740349954354,\"type\":\"homework\"}]\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### 2.1 warm-up: Loading the score documents\n", "\n", "### The score files is kept here. https://version.helsinki.fi/chzhang/cikm-2020-hands-on-session-for-multi-model-queries/-/blob/master/scores.json\n", "\n", "### You may create a collection named scores in the webUI and upload the file manully\n", "\n", "### or import the file to the server using arangoimp as follows:\n", "arangoimp --file scores.json --collection scores --create-collection true" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 Arango Query Language (AQL) basics on documents" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Basically, AQL includes the following operations:\n", "\n", " FOR: array iteration\n", " \n", " RETURN: results projection\n", " \n", " FILTER: results filtering\n", " \n", " SORT: result sorting\n", " \n", " LIMIT: result slicing\n", " \n", " LET: variable assignment\n", " \n", " COLLECT: result grouping\n", " \n", " INSERT: insertion of new documents\n", " \n", " UPDATE: (partial) update of existing documents\n", " \n", " REPLACE: replacement of existing documents\n", " \n", " REMOVE: removal of existing documents\n", " \n", " UPSERT: insertion or update of existing documents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3 Arango Query Language (AQL) on documents" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 1: return a score document in the collection. \n", "\n", "For doc in scores Filter doc.name ==\"Leonida Lafond\" return doc" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Query 2: (multiple conditions) return a score document in the collection. \n", "For doc in scores Filter doc.name ==\"Leonida Lafond\" and doc._key=='266197464913' return doc" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 3: (array operator 1) find types of scores.\n", "For doc in scores limit 1 return doc.scores[*].type" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 4: (array operator 2) find students whose exam scores are greater than 90.\n", "For doc in scores limit 1 return doc.scores[* Filter CURRENT.score>90].score" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 5: (array operator 3) compute the average score.\n", "For doc in scores limit 1 return AVERAGE(doc.scores[*].score)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 6: flatten\n", "Return FLATTEN([ 1, 2, [ 3, 4 ], 5, [ 6, 7 ], [ 8, [ 9, 10 ] ] ])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 7: sorting \n", "For doc in scores\n", " Sort first(doc.scores[*].score) DESC\n", " Return doc" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 8: grouping (with or without count)\n", "For doc in scores\n", " COLLECT name=doc.name into g\n", " return {name,g}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 9: define a variable using Let\n", "FOR doc in scores \n", " LET average_score=AVERAGE(doc.scores[*].score)\n", " SORT average_score DESC \n", " RETURN { name:doc.name,average_score:average_score}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 10: Inner join between two collections\n", " FOR doc1 in collection1\n", " FOR doc2 in collection2\n", " Filter doc1.id==doc2.id\n", " return {doc1:doc1,doc2:doc2}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Graph store : nodes and edges\n", "\n", "An ArangoDB graph database contains a set of node collections and edge collections." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 Loading the example graphs" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "arangosh> var examples = require(\"@arangodb/graph-examples/example-graph.js\");\n", "arangosh> var g = examples.loadGraph(\"knows_graph\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Traversing the graphs\n", "\n", "FOR vertex[, edge[, path]]\n", " IN [min[..max]]\n", " OUTBOUND|INBOUND|ANY startVertex\n", " GRAPH graphName\n", " [OPTIONS options]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 11: find the friends of a given person. \n", "\n", "// get a random person p\n", "Let p= (For person in persons Sort rand() limit 1 return person)\n", "\n", "// find the friends of p\n", "FOR v,e,path\n", "IN 1..1 any p[0]._id\n", "GRAPH \"knows_graph\"\n", "RETURN {p,v,e}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 12: Filtering\n", "# Filtering vertex \n", "// get person bob\n", "Let p= (For person in persons Filter person._key=='bob' return person)\n", "\n", "// find the friends of p\n", "FOR v,e\n", "IN 1..1 any p[0]._id\n", "GRAPH \"knows_graph\"\n", "Filter v._key=='alice'\n", "RETURN {p,v,e}\n", "\n", "# Filtering path\n", "// get person bob\n", "Let p= (For person in persons Filter person._key=='bob' return person)\n", "\n", "// find the friends of p\n", "FOR v,e,path\n", "IN 1..2 any p[0]._id\n", "GRAPH \"knows_graph\"\n", "Filter length(path.edges)>1\n", "RETURN {p,v,e,path}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Query 13: Graph functions -- Shortest Path\n", "\n", "// find the friends of p\n", "FOR v,e\n", "IN Any SHORTEST_PATH\n", "'persons/charlie' to 'persons/alice'\n", "GRAPH \"knows_graph\"\n", "RETURN {v,e}" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### 3. Your turn - Querying the UniBench datasets\n", "\n", "Download the UniBench dataset here and import them as follows:\n", "\n", "arangorestore dump --server.database handson" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Questions\n", "(1) How many unique types of vertices and unique labels of edges are there in two collections respectively? HINT: UNIQUE function\n", "\n", "(2) Some documents in collection imdb_vertices are associated with a \"releaseDate\" field. What is the newest movie in the collection? HINT: MAX function\n", "\n", "(3) Update a edge between \"imdb_vertices/crime\" and \"imdb_vertices/5541\" in collection imdb_edges with a label \"has_movie\", if the edge isn't exist, create one and insert it into the edge collection. HINT: keyword: UPSERT\n", "\n", "(4) For documents in collection imdb_vertices, find the ids that don't include any number, save them with a label into a new collection named \"genre\". (HINTs: use regex expression SUBSTRING(doc._id,14)=~ \"[a-zA-Z]\", create the genre collections beforehand)\n", "\n", "(5) Find actors whose name include \"David\", return documents that have the \"birthplace\" attribute. HINT: keyword like and HAS function\n", "\n", "(6) Find the actor who have acted in the most number of movies. HINT: keyword COLLECT\n", "\n", "(7) Regarding different movie genres, find the Top-5 genres with most number of movies in all time. HINT: keyword COLLECT\n", "\n", "(8) Return the number of persons who are both actor and director. HINT: SELF-JOIN ON imdb_edges\n", "\n", "(9) Given a movie \"Forrest Gump\", check its all associated actors. return their real names and role names. HINT: graph traversal\n", "\n", "(10) Given a actor \"Tom Hanks\", find the directors who have cooperated with him more than twice. HINT: graph traversal\n", "HINT: graph traversal and COLLECT.\n", "\n", "(11) Think about a movie or actor you are interested in, visualize it in the ArangoDB and present some insights from the visulization." ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 1 }