{
"metadata": {
"name": "",
"signature": "sha256:6116fc753692e334327b47bf860e67da1a56d3d94315d831dccd09e1979b07bf"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"# Blaze Expressions\n",
"\n",
"Blaze expressions convey intent from the user.\n",
"\n",
"Blaze compute functions interpret from expressions to backend.\n",
"\n",
"The interface in this notebook is not intended for interactive use. You may find *interactive expressions* (like `Data`) more useful for data exploration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from blaze import symbol, compute, join"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bank = symbol('bank', '''1000 * {id: int, \n",
" name: string, \n",
" balance: int,\n",
" lastseen: datetime}''')\n",
"bank # no data to see here"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stderr",
"text": [
"/home/mrocklin/Software/anaconda/lib/python2.7/site-packages/IPython/core/formatters.py:239: FormatterWarning: Exception in text/html formatter: Expression does not contain data resources\n",
" FormatterWarning,\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"bank"
]
}
],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"deadbeats = bank[bank.balance < 0][['name', 'lastseen']]\n",
"deadbeats"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"bank[bank.balance < 0][['name', 'lastseen']]"
]
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"deadbeats.dshape"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"dshape(\"var * {name: string, lastseen: datetime}\")"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Compute recipes\n",
"\n",
"Blaze interprets expressions against backends by consulting a repository of small recipes.\n",
"\n",
"We look at some simple recipes for Python, Pandas, and Spark"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"L = [[1, 'Alice', 100],\n",
" [2, 'Bob', -200],\n",
" [3, 'Charlie', 300],\n",
" [4, 'Dennis', 400],\n",
" [5, 'Edith', -500]]\n",
"\n",
"from pandas import DataFrame\n",
"\n",
"df = DataFrame([[1, 'Alice', 100], \n",
" [2, 'Bob', -200],\n",
" [3, 'Charlie', 300],\n",
" [4, 'Denis', 400],\n",
" [5, 'Edith', -500]], columns=['id', 'name', 'balance'])\n",
"\n",
"import pyspark\n",
"\n",
"sc = pyspark.SparkContext('local', 'blaze-app')\n",
"rdd = sc.parallelize(L)\n",
"\n",
"bank = symbol('bank', '''1000 * {id: int, \n",
" name: string, \n",
" balance: int}''')\n",
"\n",
"deadbeats = bank[bank.balance < 0].name"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"compute(deadbeats, L)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
""
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"compute(deadbeats, df)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"1 Bob\n",
"4 Edith\n",
"Name: name, dtype: object"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"compute(deadbeats, rdd)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"PythonRDD[1] at RDD at PythonRDD.scala:43"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How Blaze handles numeric evaluation\n",
"\n",
"*or, how to stay sane while trying to engage the entire Python ecosystem*"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from blaze.compute.core import compute_up"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"compute_up.source(bank.head(), df)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"File: /home/mrocklin/workspace/blaze/blaze/compute/pandas.py\n",
"\n",
"@dispatch(Head, (Series, DataFrame))\n",
"def compute_up(t, df, **kwargs):\n",
" return df.head(t.n)\n",
"\n"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"compute_up.source(bank.head(), L)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"File: /home/mrocklin/workspace/blaze/blaze/compute/python.py\n",
"\n",
"@dispatch(Head, Sequence)\n",
"def compute_up(t, seq, **kwargs):\n",
" if t.n < 100:\n",
" return tuple(take(t.n, seq))\n",
" else:\n",
" return take(t.n, seq)\n",
"\n"
]
}
],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"compute_up.source(bank.head(), rdd)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"File: /home/mrocklin/workspace/blaze/blaze/compute/spark.py\n",
"\n",
"@dispatch(Head, RDD)\n",
"def compute_up(t, rdd, **kwargs):\n",
" return rdd.take(t.n)\n",
"\n"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### N-Dimensional example"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"x = symbol('x', '1000 * 1000 * {measurement: float32, timestamp: datetime}')\n",
"x"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"x"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"x"
]
}
],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"expr = x[:500].measurement.sum(axis=1)\n",
"expr"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"sum(x[:500].measurement, axis=(1,))"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"expr.dshape"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"dshape(\"500 * float32\")"
]
}
],
"prompt_number": 15
}
],
"metadata": {}
}
]
}