API

This page contains a comprehensive list of functionality within blaze. Docstrings should provide sufficient understanding for any individual function or class.

Interactive Use

_Data

Expressions

`Projection`	Select a subset of fields from data.
`Selection`	Filter elements of expression based on predicate
`Label`	An expression with a name.
`ReLabel`	Table with same content but with new labels
`Map`	Map an arbitrary Python function across elements in a collection
`Apply`	Apply an arbitrary Python function onto an expression
`Coerce`	Coerce an expression to a different type.
`Coalesce`	SQL like coalesce.
`Cast`	Cast an expression to a different type.

`Sort`	Table in sorted order
`Distinct`	Remove duplicate elements from an expression
`Head`	First n elements of collection
`Merge`	Merge many fields together
`Join`	Join two tables on common columns
`Concat`	Stack tables on common columns
`IsIn`	Check if an expression contains values from a set.

`By`	Split-Apply-Combine Operator

Blaze Server

Server

Client

Additional Server Utilities

`expr_md5`
`to_tree`
`from_tree`

`data_spider`
`from_yaml`

Definitions

class blaze.expr.collections.Concat¶

Stack tables on common columns

Parameters:

lhs, rhs : Expr: Collections to concatenate
axis : int, optional: The axis to concatenate on.

See also

blaze.expr.collections.Merge

Examples

>>> from blaze import symbol

Vertically stack tables:

>>> names = symbol('names', '5 * {name: string, id: int32}')
>>> more_names = symbol('more_names', '7 * {name: string, id: int32}')
>>> stacked = concat(names, more_names)
>>> stacked.dshape
dshape("12 * {name: string, id: int32}")

Vertically stack matrices:

>>> mat_a = symbol('a', '3 * 5 * int32')
>>> mat_b = symbol('b', '3 * 5 * int32')
>>> vstacked = concat(mat_a, mat_b, axis=0)
>>> vstacked.dshape
dshape("6 * 5 * int32")

Horizontally stack matrices:

>>> hstacked = concat(mat_a, mat_b, axis=1)
>>> hstacked.dshape
dshape("3 * 10 * int32")

blaze.expr.collections.concat(lhs, rhs, axis=0)¶

Stack tables on common columns

Parameters:

lhs, rhs : Expr: Collections to concatenate
axis : int, optional: The axis to concatenate on.

See also

blaze.expr.collections.Merge

Examples

>>> from blaze import symbol

Vertically stack tables:

>>> names = symbol('names', '5 * {name: string, id: int32}')
>>> more_names = symbol('more_names', '7 * {name: string, id: int32}')
>>> stacked = concat(names, more_names)
>>> stacked.dshape
dshape("12 * {name: string, id: int32}")

Vertically stack matrices:

>>> mat_a = symbol('a', '3 * 5 * int32')
>>> mat_b = symbol('b', '3 * 5 * int32')
>>> vstacked = concat(mat_a, mat_b, axis=0)
>>> vstacked.dshape
dshape("6 * 5 * int32")

Horizontally stack matrices:

>>> hstacked = concat(mat_a, mat_b, axis=1)
>>> hstacked.dshape
dshape("3 * 10 * int32")
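
The dshapes in these examples follow one rule: lengths along the concatenation axis add, and the remaining dimensions carry over. The plain-Python sketch below applies that rule to shape tuples. `concat_shape` is a hypothetical helper written for illustration, not a Blaze function, and it assumes the non-concatenated dimensions must agree, as the examples suggest.

```python
def concat_shape(lhs_shape, rhs_shape, axis=0):
    """Return the shape produced by stacking two arrays along ``axis``.

    Hypothetical helper for illustration only; not part of Blaze.
    """
    if len(lhs_shape) != len(rhs_shape):
        raise ValueError("both inputs need the same number of dimensions")
    for i, (left, right) in enumerate(zip(lhs_shape, rhs_shape)):
        # Assumption: every dimension except ``axis`` has to match.
        if i != axis and left != right:
            raise ValueError("dimension %d differs: %d != %d" % (i, left, right))
    out = list(lhs_shape)
    out[axis] = lhs_shape[axis] + rhs_shape[axis]
    return tuple(out)


print(concat_shape((5,), (7,)))              # (12,)
print(concat_shape((3, 5), (3, 5), axis=0))  # (6, 5)
print(concat_shape((3, 5), (3, 5), axis=1))  # (3, 10)
```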

class blaze.expr.collections.Distinct¶

Remove duplicate elements from an expression

Parameters:

on : tuple of `Field`: The subset of fields or names of fields to be distinct on.

Examples

>>> from blaze import symbol
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> e = distinct(t)

>>> data = [('Alice', 100, 1),
...         ('Bob', 200, 2),
...         ('Alice', 100, 1)]

>>> from blaze.compute.python import compute
>>> sorted(compute(e, data))
[('Alice', 100, 1), ('Bob', 200, 2)]

Use a subset by passing on:

>>> import pandas as pd
>>> e = distinct(t, 'name')
>>> data = pd.DataFrame([['Alice', 100, 1],
...                      ['Alice', 200, 2],
...                      ['Bob', 100, 1],
...                      ['Bob', 200, 2]],
...                     columns=['name', 'amount', 'id'])
>>> compute(e, data)
    name  amount  id
0  Alice     100   1
1    Bob     100   1

blaze.expr.collections.distinct(expr, *on)¶

Remove duplicate elements from an expression

Parameters:

on : tuple of `Field`: The subset of fields or names of fields to be distinct on.

Examples

>>> from blaze import symbol
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> e = distinct(t)

>>> data = [('Alice', 100, 1),
...         ('Bob', 200, 2),
...         ('Alice', 100, 1)]

>>> from blaze.compute.python import compute
>>> sorted(compute(e, data))
[('Alice', 100, 1), ('Bob', 200, 2)]

Use a subset by passing on:

>>> import pandas as pd
>>> e = distinct(t, 'name')
>>> data = pd.DataFrame([['Alice', 100, 1],
...                      ['Alice', 200, 2],
...                      ['Bob', 100, 1],
...                      ['Bob', 200, 2]],
...                     columns=['name', 'amount', 'id'])
>>> compute(e, data)
    name  amount  id
0  Alice     100   1
1    Bob     100   1
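
For intuition about distinct-on-a-subset, here is a plain-Python sketch that keeps the first row seen for each key. `distinct_rows` is a hypothetical helper, not Blaze's implementation. Which row survives for a given key may depend on the backend; the pandas output above keeps the first, and the sketch assumes the same.

```python
def distinct_rows(rows, fields, on=()):
    """Keep the first row seen for each distinct key.

    ``on`` names the fields that form the key; an empty ``on`` uses the
    whole row. Hypothetical helper for illustration only.
    """
    positions = [fields.index(name) for name in on] or range(len(fields))
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[i] for i in positions)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


fields = ['name', 'amount', 'id']
rows = [('Alice', 100, 1), ('Alice', 200, 2), ('Bob', 100, 1), ('Bob', 200, 2)]

print(distinct_rows(rows, fields, on=('name',)))
# [('Alice', 100, 1), ('Bob', 100, 1)]
```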

class blaze.expr.collections.Head¶

First n elements of collection

See also

blaze.expr.collections.Tail

Examples

>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.head(5).dshape
dshape("5 * {name: string, amount: int32}")

blaze.expr.collections.head(child, n=10)¶

First n elements of collection

See also

blaze.expr.collections.Tail

Examples

>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.head(5).dshape
dshape("5 * {name: string, amount: int32}")

class blaze.expr.collections.IsIn¶

Check if an expression contains values from a set.

Return a boolean expression indicating whether another expression contains values that are members of a collection.

Parameters:

expr : Expr: Expression whose elements to check for membership in keys
keys : Sequence: Elements to test against. Blaze stores this as a `frozenset`.

Examples

Check if a vector contains any of 1, 2 or 3:

>>> from blaze import symbol
>>> t = symbol('t', '10 * int64')
>>> expr = t.isin([1, 2, 3])
>>> expr.dshape
dshape("10 * bool")

blaze.expr.collections.isin(expr, keys)¶

Check if an expression contains values from a set.

Return a boolean expression indicating whether another expression contains values that are members of a collection.

Parameters:

expr : Expr: Expression whose elements to check for membership in keys
keys : Sequence: Elements to test against. Blaze stores this as a `frozenset`.

Examples

Check if a vector contains any of 1, 2 or 3:

>>> from blaze import symbol
>>> t = symbol('t', '10 * int64')
>>> expr = t.isin([1, 2, 3])
>>> expr.dshape
dshape("10 * bool")
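
The result has one boolean per element, which is why the dshape above is `10 * bool`. A plain-Python sketch of the same membership test follows. `isin_values` is a hypothetical name used only for illustration; converting the keys to a `frozenset` mirrors the storage choice described in the parameters.

```python
def isin_values(values, keys):
    """Return one boolean per element of ``values``.

    Hypothetical helper for illustration only; not part of Blaze.
    """
    keys = frozenset(keys)  # deduplicates and gives constant-time lookups
    return [value in keys for value in values]


print(isin_values([1, 5, 3, 7], [1, 2, 3]))  # [True, False, True, False]
```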

class blaze.expr.collections.Join¶

Join two tables on common columns

Parameters:

lhs, rhs : Expr: Expressions to join
on_left : str, optional: The fields from the left side to join on. If no on_right is passed, then these are the fields for both sides.
on_right : str, optional: The fields from the right side to join on.
how : {‘inner’, ‘outer’, ‘left’, ‘right’}: What type of join to perform.
suffixes: pair of str: The suffixes to be applied to the left and right sides in order to resolve duplicate field names.

See also

blaze.expr.collections.Merge

Examples

>>> from blaze import symbol
>>> names = symbol('names', 'var * {name: string, id: int}')
>>> amounts = symbol('amounts', 'var * {amount: int, id: int}')

Join tables based on shared column name

>>> joined = join(names, amounts, 'id')

Join based on different column names

>>> amounts = symbol('amounts', 'var * {amount: int, acctNumber: int}')
>>> joined = join(names, amounts, 'id', 'acctNumber')

blaze.expr.collections.join(lhs, rhs, on_left=None, on_right=None, how='inner', suffixes=('_left', '_right'))¶

Join two tables on common columns

Parameters:

lhs, rhs : Expr: Expressions to join
on_left : str, optional: The fields from the left side to join on. If no on_right is passed, then these are the fields for both sides.
on_right : str, optional: The fields from the right side to join on.
how : {‘inner’, ‘outer’, ‘left’, ‘right’}: What type of join to perform.
suffixes: pair of str: The suffixes to be applied to the left and right sides in order to resolve duplicate field names.

See also

blaze.expr.collections.Merge

Examples

>>> from blaze import symbol
>>> names = symbol('names', 'var * {name: string, id: int}')
>>> amounts = symbol('amounts', 'var * {amount: int, id: int}')

Join tables based on shared column name

>>> joined = join(names, amounts, 'id')

Join based on different column names

>>> amounts = symbol('amounts', 'var * {amount: int, acctNumber: int}')
>>> joined = join(names, amounts, 'id', 'acctNumber')
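
To make the `on_left`, `on_right` and `how` parameters concrete, the sketch below joins two lists of dicts in plain Python. `join_rows` is a hypothetical helper, not Blaze's implementation. It handles only `'inner'` and `'left'`, ignores `suffixes`, and assumes the two sides share no field names other than the join key.

```python
def join_rows(lhs, rhs, on_left, on_right=None, how='inner'):
    """Join two lists of dicts. Hypothetical helper for illustration only."""
    if how not in ('inner', 'left'):
        raise ValueError("this sketch handles only 'inner' and 'left'")
    if on_right is None:
        on_right = on_left  # same field name on both sides

    # Index the right side by its join key.
    index = {}
    for row in rhs:
        index.setdefault(row[on_right], []).append(row)
    # Non-key fields of the right side, used to pad unmatched left rows.
    extra = [name for name in (rhs[0] if rhs else {}) if name != on_right]

    joined = []
    for row in lhs:
        matches = index.get(row[on_left], [])
        for match in matches:
            merged = dict(row)
            merged.update((k, v) for k, v in match.items() if k != on_right)
            joined.append(merged)
        if not matches and how == 'left':
            merged = dict(row)
            merged.update((name, None) for name in extra)
            joined.append(merged)
    return joined


names = [{'name': 'Alice', 'id': 1}, {'name': 'Bob', 'id': 2}]
amounts = [{'amount': 100, 'acctNumber': 1}]

inner = join_rows(names, amounts, 'id', 'acctNumber')
left = join_rows(names, amounts, 'id', 'acctNumber', how='left')
print(inner)  # [{'name': 'Alice', 'id': 1, 'amount': 100}]
print(left)   # the inner row, then Bob's row with amount set to None
```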

class blaze.expr.collections.Merge¶

Merge many fields together

Parameters:

*labeled_exprs : iterable[Expr]: The positional expressions to merge. These will use the expression’s _name as the key in the resulting table.
**named_exprs : dict[str, Expr]: The named expressions to label and merge into the table.

See also

label

Notes

To control the ordering of the fields, use label:

>>> merge(label(accounts.name, 'NAME'), label(accounts.x, 'X')).dshape
dshape("var * {NAME: string, X: int32}")
>>> merge(label(accounts.x, 'X'), label(accounts.name, 'NAME')).dshape
dshape("var * {X: int32, NAME: string}")

Examples

>>> from blaze import symbol, label
>>> accounts = symbol('accounts', 'var * {name: string, x: int, y: real}')
>>> merge(accounts.name, z=accounts.x + accounts.y).fields
['name', 'z']

blaze.expr.collections.merge(*exprs, **kwargs)¶

Merge many fields together

Parameters:

*labeled_exprs : iterable[Expr]: The positional expressions to merge. These will use the expression’s _name as the key in the resulting table.
**named_exprs : dict[str, Expr]: The named expressions to label and merge into the table.

See also

label

Notes

To control the ordering of the fields, use label:

>>> merge(label(accounts.name, 'NAME'), label(accounts.x, 'X')).dshape
dshape("var * {NAME: string, X: int32}")
>>> merge(label(accounts.x, 'X'), label(accounts.name, 'NAME')).dshape
dshape("var * {X: int32, NAME: string}")

Examples

>>> from blaze import symbol, label
>>> accounts = symbol('accounts', 'var * {name: string, x: int, y: real}')
>>> merge(accounts.name, z=accounts.x + accounts.y).fields
['name', 'z']

class blaze.expr.collections.Sample¶

Random row-wise sample. Specify n for an absolute number of rows or frac for a fraction of the rows.

Examples

>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.sample(n=2).dshape
dshape("var * {name: string, amount: int32}")
>>> accounts.sample(frac=0.1).dshape
dshape("var * {name: string, amount: int32}")

blaze.expr.collections.sample(child, n=None, frac=None)¶

Random row-wise sample. Specify n for an absolute number of rows or frac for a fraction of the rows.

Examples

>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.sample(n=2).dshape
dshape("var * {name: string, amount: int32}")
>>> accounts.sample(frac=0.1).dshape
dshape("var * {name: string, amount: int32}")

class blaze.expr.collections.Shift¶

Shift a column backward or forward by N elements

Parameters:

expr : Expr: The expression to shift. This expression’s dshape should be columnar.
n : int: The number of elements to shift by. If n < 0 then shift backward, if n == 0 do nothing, else shift forward.

blaze.expr.collections.shift(expr, n)¶

Shift a column backward or forward by N elements

Parameters:

expr : Expr: The expression to shift. This expression’s dshape should be columnar.
n : int: The number of elements to shift by. If n < 0 then shift backward, if n == 0 do nothing, else shift forward.
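
This entry has no example, so here is a plain-Python sketch of the sign convention for `n`. It assumes that "forward" moves values toward higher positions and that vacated slots become null, as pandas' `Series.shift` does; actual backend behavior may differ. `shift_column` is a hypothetical helper, not a Blaze function.

```python
def shift_column(values, n, fill=None):
    """Shift ``values`` by ``n`` positions, padding with ``fill``.

    Hypothetical helper for illustration only. Assumes n > 0 moves values
    toward higher positions and n < 0 toward lower positions.
    """
    values = list(values)
    size = len(values)
    if n == 0 or size == 0:
        return values
    k = min(abs(n), size)  # shifting past the end leaves only padding
    pad = [fill] * k
    if n > 0:
        return pad + values[:size - k]
    return values[k:] + pad


print(shift_column([1, 2, 3, 4], 1))   # [None, 1, 2, 3]
print(shift_column([1, 2, 3, 4], -2))  # [3, 4, None, None]
print(shift_column([1, 2, 3, 4], 0))   # [1, 2, 3, 4]
```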

class blaze.expr.collections.Sort¶

Table in sorted order

Examples

>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.sort('amount', ascending=False).schema
dshape("{name: string, amount: int32}")

Some backends support sorting by arbitrary rowwise tables, e.g.

>>> accounts.sort(-accounts.amount) 

blaze.expr.collections.sort(child, key=None, ascending=True)¶

Sort a collection

Parameters:

key : str, list of str, or Expr

Defines by what you want to sort.

A single column string: t.sort('amount')

A list of column strings: t.sort(['name', 'amount'])

An expression: t.sort(-t.amount)

If sorting a columnar dataset, the key is ignored, as it is not necessary:

t.amount.sort()

t.amount.sort('amount')

t.amount.sort('foobar')

are all equivalent.

ascending : bool, optional

Determines order of the sort
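
The three kinds of key can be imitated in plain Python with `sorted`. In this sketch a callable stands in for a Blaze expression such as `-t.amount`. `sort_rows` is a hypothetical helper for illustration, not Blaze's implementation.

```python
from operator import itemgetter


def sort_rows(rows, key, ascending=True):
    """Sort a list of dicts by a field name, a list of names, or a callable.

    Hypothetical helper for illustration only.
    """
    if callable(key):
        keyfunc = key                # stands in for an expression key
    elif isinstance(key, str):
        keyfunc = itemgetter(key)    # a single column
    else:
        keyfunc = itemgetter(*key)   # several columns, compared in order
    return sorted(rows, key=keyfunc, reverse=not ascending)


rows = [{'name': 'Bob', 'amount': 200},
        {'name': 'Alice', 'amount': 100},
        {'name': 'Alice', 'amount': 50}]

print([r['amount'] for r in sort_rows(rows, 'amount')])                   # [50, 100, 200]
print([r['amount'] for r in sort_rows(rows, ['name', 'amount'])])         # [50, 100, 200]
print([r['amount'] for r in sort_rows(rows, lambda r: -r['amount'])])     # [200, 100, 50]
print([r['amount'] for r in sort_rows(rows, 'amount', ascending=False)])  # [200, 100, 50]
```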

class blaze.expr.collections.Tail¶

Last n elements of collection

See also

blaze.expr.collections.Head

Examples

>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.tail(5).dshape
dshape("5 * {name: string, amount: int32}")

blaze.expr.collections.tail(child, n=10)¶

Last n elements of collection

See also

blaze.expr.collections.Head

Examples

>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.tail(5).dshape
dshape("5 * {name: string, amount: int32}")

blaze.expr.collections.transform(expr, replace=True, **kwargs)¶

Add named columns to table

Parameters:

expr : Expr: A tabular expression.
replace : bool, optional: Should new columns be allowed to replace old columns?
**kwargs: The new columns to add to the table.

Returns:

merged : Merge: A new tabular expression with the new columns merged into the table.

See also

merge

Examples

>>> from blaze import symbol
>>> t = symbol('t', 'var * {x: int, y: int}')
>>> transform(t, z=t.x + t.y).fields
['x', 'y', 'z']

class blaze.expr.expressions.Apply¶

Apply an arbitrary Python function onto an expression

See also

blaze.expr.expressions.Map

Examples

>>> t = symbol('t', 'var * {name: string, amount: int}')
>>> h = t.apply(hash, dshape='int64')  # Hash value of resultant dataset

You must provide the datashape of the result with the dshape= keyword. For datashape examples see http://datashape.pydata.org/grammar.html#some-simple-examples

If using a chunking backend and your operation may be safely split and concatenated then add the splittable=True keyword argument

>>> t.apply(f, dshape='...', splittable=True) 

class blaze.expr.expressions.Cast¶

Cast an expression to a different type.

This is only an expression time operation.

Examples

>>> s = symbol('s', '?int64')
>>> s.cast('?int32').dshape
dshape("?int32")

Cast to correct mislabeled optionals:

>>> s.cast('int64').dshape
dshape("int64")

Cast to give concrete dimension length:

>>> t = symbol('t', 'var * float32')
>>> t.cast('10 * float32').dshape
dshape("10 * float32")

class blaze.expr.expressions.Coalesce¶

SQL like coalesce.

coalesce(a, b) = {
    a if a is not NULL
    b otherwise
}

Examples

>>> coalesce(1, 2)
1

>>> coalesce(1, None)
1

>>> coalesce(None, 2)
2

>>> coalesce(None, None) is None
True

class blaze.expr.expressions.Coerce¶

Coerce an expression to a different type.

Examples

>>> t = symbol('t', '100 * float64')
>>> t.coerce(to='int64')
t.coerce(to='int64')
>>> t.coerce('float32')
t.coerce(to='float32')
>>> t.coerce('int8').dshape
dshape("100 * int8")

class blaze.expr.expressions.ElemWise¶

Elementwise operation.

The shape of this expression matches the shape of the child.

class blaze.expr.expressions.Expr¶

Symbolic expression of a computation

All Blaze expressions (Join, By, Sort, …) descend from this class. It contains shared logic and syntax. It in turn inherits from Node, which holds all tree traversal logic.

cast(to)¶

Cast an expression to a different type.

This is only an expression time operation.

Examples

>>> s = symbol('s', '?int64')
>>> s.cast('?int32').dshape
dshape("?int32")

Cast to correct mislabeled optionals:

>>> s.cast('int64').dshape
dshape("int64")

Cast to give concrete dimension length:

>>> t = symbol('t', 'var * float32')
>>> t.cast('10 * float32').dshape
dshape("10 * float32")

map(func, schema=None, name=None)¶

Map an arbitrary Python function across elements in a collection

See also

blaze.expr.expressions.Apply

Examples

>>> from datetime import datetime

>>> t = symbol('t', 'var * {price: real, time: int64}')  # times as integers
>>> datetimes = t.time.map(datetime.utcfromtimestamp)

Optionally provide extra schema information

>>> datetimes = t.time.map(datetime.utcfromtimestamp,
...                           schema='{time: datetime}')

class blaze.expr.expressions.Field¶

A single field from an expression.

Get a single field from an expression with record-type schema. We store the name of the field in the _name attribute.

Examples

>>> points = symbol('points', '5 * 3 * {x: int32, y: int32}')
>>> points.x.dshape
dshape("5 * 3 * int32")

For fields that aren’t valid Python identifiers, use [] syntax:

>>> points = symbol('points', '5 * 3 * {"space station": float64}')
>>> points['space station'].dshape
dshape("5 * 3 * float64")

class blaze.expr.expressions.Label¶

An expression with a name.

See also

blaze.expr.expressions.ReLabel

Examples

>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> expr = accounts.amount * 100
>>> expr._name
'amount'
>>> expr.label('new_amount')._name
'new_amount'

class blaze.expr.expressions.Map¶

Map an arbitrary Python function across elements in a collection

See also

blaze.expr.expressions.Apply

Examples

>>> from datetime import datetime

>>> t = symbol('t', 'var * {price: real, time: int64}')  # times as integers
>>> datetimes = t.time.map(datetime.utcfromtimestamp)

Optionally provide extra schema information

>>> datetimes = t.time.map(datetime.utcfromtimestamp,
...                           schema='{time: datetime}')

class blaze.expr.expressions.Projection¶

Select a subset of fields from data.

See also

blaze.expr.expressions.Field

Examples

>>> accounts = symbol('accounts',
...                   'var * {name: string, amount: int, id: int}')
>>> accounts[['name', 'amount']].schema
dshape("{name: string, amount: int32}")
>>> accounts[['name', 'amount']]
accounts[['name', 'amount']]

class blaze.expr.expressions.ReLabel¶

Table with same content but with new labels

See also

blaze.expr.expressions.Label

Notes

When names are not valid Python names, such as integers or string with spaces, you must pass a dictionary to relabel. For example

>>> s = symbol('s', 'var * {"0": int64}')
>>> s.relabel({'0': 'foo'})
s.relabel({'0': 'foo'})
>>> t = symbol('t', 'var * {"whoo hoo": ?float32}')
>>> t.relabel({"whoo hoo": 'foo'})
t.relabel({'whoo hoo': 'foo'})

Examples

>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.schema
dshape("{name: string, amount: int32}")
>>> accounts.relabel(amount='balance').schema
dshape("{name: string, balance: int32}")
>>> accounts.relabel(not_a_column='definitely_not_a_column')
Traceback (most recent call last):
    ...
ValueError: Cannot relabel non-existent child fields: {'not_a_column'}
>>> s = symbol('s', 'var * {"0": int64}')
>>> s.relabel({'0': 'foo'})
s.relabel({'0': 'foo'})
>>> s.relabel(0='foo') 
Traceback (most recent call last):
    ...
SyntaxError: keyword can't be an expression

class blaze.expr.expressions.Selection¶

Filter elements of expression based on predicate

Examples

>>> accounts = symbol('accounts',
...                   'var * {name: string, amount: int, id: int}')
>>> deadbeats = accounts[accounts.amount < 0]

class blaze.expr.expressions.SimpleSelection¶

Internal selection class that does not treat the predicate as an input.

class blaze.expr.expressions.Slice¶

Elements from start until stop. On many backends, a step parameter is also allowed.

Examples

>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts[2:7].dshape
dshape("5 * {name: string, amount: int32}")
>>> accounts[2:7:2].dshape
dshape("3 * {name: string, amount: int32}")
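
The leading dimension of a sliced dshape is the number of elements the slice selects, which matches the length of the equivalent Python `range`. The sketch below reproduces the 5 and 3 shown above. `slice_length` is a hypothetical helper, and it assumes the collection holds at least `stop` elements.

```python
def slice_length(start, stop, step=1):
    """Number of elements selected by ``[start:stop:step]``.

    Hypothetical helper for illustration only; assumes the collection
    has at least ``stop`` elements.
    """
    return len(range(start, stop, step))


print(slice_length(2, 7))     # 5
print(slice_length(2, 7, 2))  # 3
```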

class blaze.expr.expressions.Symbol¶

Symbolic data. The leaf of a Blaze expression

Examples

>>> points = symbol('points', '5 * 3 * {x: int, y: int}')
>>> points
<`points` symbol; dshape='5 * 3 * {x: int32, y: int32}'>
>>> points.dshape
dshape("5 * 3 * {x: int32, y: int32}")

blaze.expr.expressions.apply(expr, func, dshape, splittable=False)¶

Apply an arbitrary Python function onto an expression

See also

blaze.expr.expressions.Map

Examples

>>> t = symbol('t', 'var * {name: string, amount: int}')
>>> h = t.apply(hash, dshape='int64')  # Hash value of resultant dataset

You must provide the datashape of the result with the dshape= keyword. For datashape examples see http://datashape.pydata.org/grammar.html#some-simple-examples

If using a chunking backend and your operation may be safely split and concatenated then add the splittable=True keyword argument

>>> t.apply(f, dshape='...', splittable=True) 

blaze.expr.expressions.cast(expr, to)¶

Cast an expression to a different type.

This is only an expression time operation.

Examples

>>> s = symbol('s', '?int64')
>>> s.cast('?int32').dshape
dshape("?int32")

Cast to correct mislabeled optionals:

>>> s.cast('int64').dshape
dshape("int64")

Cast to give concrete dimension length:

>>> t = symbol('t', 'var * float32')
>>> t.cast('10 * float32').dshape
dshape("10 * float32")

blaze.expr.expressions.coalesce(a, b)¶

SQL like coalesce.

coalesce(a, b) = {
    a if a is not NULL
    b otherwise
}

Examples

>>> coalesce(1, 2)
1

>>> coalesce(1, None)
1

>>> coalesce(None, 2)
2

>>> coalesce(None, None) is None
True
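
Applied to columns, coalesce fills the nulls in one column with values from another. The plain-Python sketch below shows that element-wise behavior, treating `None` as NULL as the examples above do. Both function names are hypothetical stand-ins for illustration, not the Blaze objects.

```python
def coalesce_scalar(a, b):
    """Return ``a`` unless it is null (None), otherwise ``b``."""
    return a if a is not None else b


def coalesce_columns(primary, fallback):
    """Element-wise coalesce over two equal-length columns.

    Hypothetical helper for illustration only.
    """
    return [coalesce_scalar(a, b) for a, b in zip(primary, fallback)]


print(coalesce_columns([1, None, 3, None], [10, 20, None, None]))
# [1, 20, 3, None]
print(coalesce_scalar(0, 5))  # 0, because 0 is falsy but not null
```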

blaze.expr.expressions.coerce(expr, to)¶

Coerce an expression to a different type.

Examples

>>> t = symbol('t', '100 * float64')
>>> t.coerce(to='int64')
t.coerce(to='int64')
>>> t.coerce('float32')
t.coerce(to='float32')
>>> t.coerce('int8').dshape
dshape("100 * int8")

blaze.expr.expressions.drop_field(expr, field, *fields)¶

Drop a field or fields from a tabular expression.

Parameters:

expr : Expr: A tabular expression to drop columns from.
*fields: The names of the fields to drop.

Returns:

dropped : Expr: The new tabular expression with some columns missing.

Raises:

TypeError: Raised when `expr` is not tabular.
ValueError: Raised when a column is not in the fields of `expr`.

See also

blaze.expr.expressions.projection()
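
This entry has no example. The sketch below models the documented behavior on a plain list of field names, including the `ValueError` for an unknown column. `drop_fields` is a hypothetical helper for illustration; it does not operate on Blaze expressions, and its error message is invented.

```python
def drop_fields(fields, *to_drop):
    """Return ``fields`` without the names in ``to_drop``.

    Hypothetical helper for illustration only. Raises ValueError when a
    requested name is not present, mirroring the documented behavior.
    """
    missing = [name for name in to_drop if name not in fields]
    if missing:
        raise ValueError("fields not found: %s" % ", ".join(missing))
    return [name for name in fields if name not in to_drop]


print(drop_fields(['name', 'amount', 'id'], 'id'))            # ['name', 'amount']
print(drop_fields(['name', 'amount', 'id'], 'id', 'amount'))  # ['name']
```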

blaze.expr.expressions.label(expr, lab)¶

An expression with a name.

See also

blaze.expr.expressions.ReLabel

Examples

>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> expr = accounts.amount * 100
>>> expr._name
'amount'
>>> expr.label('new_amount')._name
'new_amount'

blaze.expr.expressions.ndim(expr)¶

Number of dimensions of expression

>>> symbol('s', '3 * var * int32').ndim
2

blaze.expr.expressions.projection(expr, names)¶

Select a subset of fields from data.

See also

blaze.expr.expressions.Field

Examples

>>> accounts = symbol('accounts',
...                   'var * {name: string, amount: int, id: int}')
>>> accounts[['name', 'amount']].schema
dshape("{name: string, amount: int32}")
>>> accounts[['name', 'amount']]
accounts[['name', 'amount']]

blaze.expr.expressions.relabel(child, labels=None, **kwargs)¶

Table with same content but with new labels

See also

blaze.expr.expressions.Label

Notes

When names are not valid Python names, such as integers or string with spaces, you must pass a dictionary to relabel. For example

>>> s = symbol('s', 'var * {"0": int64}')
>>> s.relabel({'0': 'foo'})
s.relabel({'0': 'foo'})
>>> t = symbol('t', 'var * {"whoo hoo": ?float32}')
>>> t.relabel({"whoo hoo": 'foo'})
t.relabel({'whoo hoo': 'foo'})

Examples

>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.schema
dshape("{name: string, amount: int32}")
>>> accounts.relabel(amount='balance').schema
dshape("{name: string, balance: int32}")
>>> accounts.relabel(not_a_column='definitely_not_a_column')
Traceback (most recent call last):
    ...
ValueError: Cannot relabel non-existent child fields: {'not_a_column'}
>>> s = symbol('s', 'var * {"0": int64}')
>>> s.relabel({'0': 'foo'})
s.relabel({'0': 'foo'})
>>> s.relabel(0='foo') 
Traceback (most recent call last):
    ...
SyntaxError: keyword can't be an expression

blaze.expr.expressions.selection(table, predicate)¶

Filter elements of expression based on predicate

Examples

>>> accounts = symbol('accounts',
...                   'var * {name: string, amount: int, id: int}')
>>> deadbeats = accounts[accounts.amount < 0]

blaze.expr.expressions.symbol(name, dshape, token=None)¶

Symbolic data. The leaf of a Blaze expression

Examples

>>> points = symbol('points', '5 * 3 * {x: int, y: int}')
>>> points
<`points` symbol; dshape='5 * 3 * {x: int32, y: int32}'>
>>> points.dshape
dshape("5 * 3 * {x: int32, y: int32}")

class blaze.expr.reductions.FloatingReduction¶

class blaze.expr.reductions.Reduction¶

A column-wise reduction

Blaze supports the same class of reductions as NumPy and Pandas.

sum, min, max, any, all, mean, var, std, count, nunique

Examples

>>> from blaze import symbol
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> e = t['amount'].sum()

>>> data = [['Alice', 100, 1],
...         ['Bob', 200, 2],
...         ['Alice', 50, 3]]

>>> from blaze.compute.python import compute
>>> compute(e, data)
350

class blaze.expr.reductions.Summary¶

A collection of named reductions

Examples

>>> from blaze import symbol
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> expr = summary(number=t.id.nunique(), sum=t.amount.sum())

>>> data = [['Alice', 100, 1],
...         ['Bob', 200, 2],
...         ['Alice', 50, 1]]

>>> from blaze import compute
>>> compute(expr, data)
(2, 350)

class blaze.expr.reductions.all¶

class blaze.expr.reductions.any¶

class blaze.expr.reductions.count¶

The number of non-null elements

class blaze.expr.reductions.max¶

class blaze.expr.reductions.mean¶

class blaze.expr.reductions.min¶

class blaze.expr.reductions.nelements¶

Compute the number of elements in a collection, including missing values.

See also

blaze.expr.reductions.count: compute the number of non-null elements

Examples

>>> from blaze import symbol
>>> t = symbol('t', 'var * {name: string, amount: float64}')
>>> t[t.amount < 1].nelements()
nelements(t[t.amount < 1])
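
The difference between `nelements` and `count` shows up only when values are missing. A plain-Python sketch of the two definitions, treating `None` as a missing value, follows. The function names are hypothetical stand-ins for illustration, not the Blaze reductions.

```python
def nelements_of(values):
    """Number of elements, including missing values."""
    return len(values)


def count_of(values):
    """Number of non-null elements."""
    return sum(1 for value in values if value is not None)


amounts = [1.5, None, 0.2, None, 3.0]
print(nelements_of(amounts))  # 5
print(count_of(amounts))      # 3
```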

class blaze.expr.reductions.nunique¶

class blaze.expr.reductions.std¶

Standard Deviation

Parameters:

child : Expr: An expression
unbiased : bool, optional: Compute the square root of an unbiased estimate of the population variance if this is `True`.

Warning: This does not return an unbiased estimate of the population standard deviation.

See also

var

class blaze.expr.reductions.sum¶

blaze.expr.reductions.summary(keepdims=False, axis=None, **kwargs)¶

A collection of named reductions

Examples

>>> from blaze import symbol
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> expr = summary(number=t.id.nunique(), sum=t.amount.sum())

>>> data = [['Alice', 100, 1],
...         ['Bob', 200, 2],
...         ['Alice', 50, 1]]

>>> from blaze import compute
>>> compute(expr, data)
(2, 350)

class blaze.expr.reductions.var¶

Variance

Parameters:

child : Expr: An expression
unbiased : bool, optional: Compute an unbiased estimate of the population variance if this is `True`. In NumPy and pandas, this parameter is called `ddof` (delta degrees of freedom) and is equal to 1 for unbiased and 0 for biased.
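
The effect of `unbiased` is a change of divisor from n to n - 1. The plain-Python sketch below spells that out for both the variance and the standard deviation. The function names and the `unbiased=False` default are assumptions made for illustration, not taken from Blaze.

```python
import math


def variance(values, unbiased=False):
    """Variance with divisor n (biased) or n - 1 (unbiased).

    Hypothetical helper for illustration only.
    """
    n = len(values)
    ddof = 1 if unbiased else 0  # delta degrees of freedom
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


def standard_deviation(values, unbiased=False):
    """Square root of the variance computed with the same divisor."""
    return math.sqrt(variance(values, unbiased=unbiased))


data = [1.0, 2.0, 3.0, 4.0]
print(variance(data))                 # 1.25
print(variance(data, unbiased=True))  # 5/3, about 1.667
print(standard_deviation(data, unbiased=True))
```

The square root of an unbiased variance estimate is still a biased estimate of the standard deviation, which is the point of the warning in the `std` entry.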

blaze.expr.reductions.vnorm(expr, ord=None, axis=None, keepdims=False)¶

Vector norm

See np.linalg.norm

class blaze.expr.arrays.Transpose¶

Transpose dimensions in an N-Dimensional array

Examples

>>> x = symbol('x', '10 * 20 * int32')
>>> x.T
transpose(x)
>>> x.T.shape
(20, 10)

Specify axis ordering with axes keyword argument

>>> x = symbol('x', '10 * 20 * 30 * int32')
>>> x.transpose([2, 0, 1])
transpose(x, axes=[2, 0, 1])
>>> x.transpose([2, 0, 1]).shape
(30, 10, 20)

class blaze.expr.arrays.TensorDot¶

Dot Product: Contract and sum dimensions of two arrays

>>> x = symbol('x', '20 * 20 * int32')
>>> y = symbol('y', '20 * 30 * int32')

>>> x.dot(y)
tensordot(x, y)

>>> tensordot(x, y, axes=[0, 0])
tensordot(x, y, axes=[0, 0])

blaze.expr.arrays.dot(lhs, rhs)¶

Dot Product: Contract and sum dimensions of two arrays

>>> x = symbol('x', '20 * 20 * int32')
>>> y = symbol('y', '20 * 30 * int32')

>>> x.dot(y)
tensordot(x, y)

>>> tensordot(x, y, axes=[0, 0])
tensordot(x, y, axes=[0, 0])

blaze.expr.arrays.transpose(expr, axes=None)¶

Transpose dimensions in an N-Dimensional array

Examples

>>> x = symbol('x', '10 * 20 * int32')
>>> x.T
transpose(x)
>>> x.T.shape
(20, 10)

Specify axis ordering with axes keyword argument

>>> x = symbol('x', '10 * 20 * 30 * int32')
>>> x.transpose([2, 0, 1])
transpose(x, axes=[2, 0, 1])
>>> x.transpose([2, 0, 1]).shape
(30, 10, 20)
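
The shapes above follow from permuting the input shape by `axes`, with the default reversing the dimension order. A plain-Python sketch on shape tuples is shown below. `transpose_shape` is a hypothetical helper for illustration, not a Blaze function.

```python
def transpose_shape(shape, axes=None):
    """Shape after transposing; ``axes=None`` reverses the dimensions.

    Hypothetical helper for illustration only.
    """
    if axes is None:
        axes = range(len(shape) - 1, -1, -1)
    return tuple(shape[i] for i in axes)


print(transpose_shape((10, 20)))                # (20, 10)
print(transpose_shape((10, 20, 30), [2, 0, 1]))  # (30, 10, 20)
```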

blaze.expr.arrays.tensordot(lhs, rhs, axes=None)¶

Dot Product: Contract and sum dimensions of two arrays

>>> x = symbol('x', '20 * 20 * int32')
>>> y = symbol('y', '20 * 30 * int32')

>>> x.dot(y)
tensordot(x, y)

>>> tensordot(x, y, axes=[0, 0])
tensordot(x, y, axes=[0, 0])

class blaze.expr.arithmetic.BinOp¶

class blaze.expr.arithmetic.UnaryOp¶

class blaze.expr.arithmetic.Arithmetic¶

Super class for arithmetic operators like add or mul

class blaze.expr.arithmetic.Add¶

op()¶: add(a, b) – Same as a + b.

class blaze.expr.arithmetic.Mult¶

op()¶: mul(a, b) – Same as a * b.

class blaze.expr.arithmetic.Repeat¶

op()¶: mul(a, b) – Same as a * b.

class blaze.expr.arithmetic.Sub¶

op()¶: sub(a, b) – Same as a - b.

class blaze.expr.arithmetic.Div¶

op()¶: truediv(a, b) – Same as a / b when __future__.division is in effect.

class blaze.expr.arithmetic.FloorDiv¶

op()¶: floordiv(a, b) – Same as a // b.

class blaze.expr.arithmetic.Pow¶

op()¶: pow(a, b) – Same as a ** b.

class blaze.expr.arithmetic.Mod¶

op()¶: mod(a, b) – Same as a % b.

class blaze.expr.arithmetic.Interp¶

op()¶: mod(a, b) – Same as a % b.

class blaze.expr.arithmetic.USub¶

op()¶: neg(a) – Same as -a.

class blaze.expr.arithmetic.Relational¶

class blaze.expr.arithmetic.Eq¶

op()¶: eq(a, b) – Same as a==b.

class blaze.expr.arithmetic.Ne¶

op()¶: ne(a, b) – Same as a!=b.

class blaze.expr.arithmetic.Ge¶

op()¶: ge(a, b) – Same as a>=b.

class blaze.expr.arithmetic.Lt¶

op()¶: lt(a, b) – Same as a<b.

class blaze.expr.arithmetic.Le¶

op()¶: le(a, b) – Same as a<=b.

class blaze.expr.arithmetic.Gt¶

op()¶: gt(a, b) – Same as a>b.

class blaze.expr.arithmetic.And¶

op()¶: and_(a, b) – Same as a & b.

class blaze.expr.arithmetic.Or¶

op()¶: or_(a, b) – Same as a | b.

class blaze.expr.arithmetic.Not¶

op()¶: invert(a) – Same as ~a.

class blaze.expr.math.abs¶

class blaze.expr.math.sqrt¶

class blaze.expr.math.sin¶

class blaze.expr.math.sinh¶

class blaze.expr.math.cos¶

class blaze.expr.math.cosh¶

class blaze.expr.math.tan¶

class blaze.expr.math.tanh¶

class blaze.expr.math.exp¶

class blaze.expr.math.expm1¶

class blaze.expr.math.log¶

class blaze.expr.math.log10¶

class blaze.expr.math.log1p¶

class blaze.expr.math.acos¶

class blaze.expr.math.acosh¶

class blaze.expr.math.asin¶

class blaze.expr.math.asinh¶

class blaze.expr.math.atan¶

class blaze.expr.math.atanh¶

class blaze.expr.math.radians¶

class blaze.expr.math.degrees¶

class blaze.expr.math.atan2¶

class blaze.expr.math.ceil¶

class blaze.expr.math.floor¶

class blaze.expr.math.trunc¶

class blaze.expr.math.isnan¶

class blaze.expr.math.notnull¶

Return whether an expression is not null

Examples

>>> from blaze import symbol, compute
>>> s = symbol('s', 'var * int64')
>>> expr = notnull(s)
>>> expr.dshape
dshape("var * bool")
>>> list(compute(expr, [1, 2, None, 3]))
[True, True, False, True]

class blaze.expr.math.UnaryMath¶

Mathematical unary operator with real valued dshape like sin, or exp

class blaze.expr.math.BinaryMath¶

class blaze.expr.math.greatest¶

op()¶

max(iterable[, key=func]) -> value
max(a, b, c, …[, key=func]) -> value

With a single iterable argument, return its largest item. With two or more arguments, return the largest argument.

class blaze.expr.math.least¶

op()¶

min(iterable[, key=func]) -> value
min(a, b, c, …[, key=func]) -> value

With a single iterable argument, return its smallest item. With two or more arguments, return the smallest argument.

class blaze.expr.broadcast.Broadcast¶

Fuse scalar expressions over collections

Given elementwise operations on collections, e.g.

>>> from blaze import sin
>>> a = symbol('a', '100 * int')
>>> t = symbol('t', '100 * {x: int, y: int}')

>>> expr = sin(a) + t.y**2

It may be best to represent this as a scalar expression mapped over a collection

>>> sa = symbol('a', 'int')
>>> st = symbol('t', '{x: int, y: int}')

>>> sexpr = sin(sa) + st.y**2

>>> expr = Broadcast((a, t), (sa, st), sexpr)

This provides opportunities for optimized computation.

In practice, expressions are often collected into Broadcast expressions automatically. This class is mainly intended for internal use.
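
To illustrate why fusing helps, the plain-Python sketch below evaluates `sin(a) + t.y**2` two ways: operation by operation with intermediate lists, and as one scalar function applied in a single pass. It only mimics the idea on Python lists; the data and the `scalar_expr` function are made up for illustration and say nothing about how a particular backend executes a Broadcast.

```python
import math

a_values = [0, 1, 2]
t_rows = [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}, {'x': 5, 'y': 6}]

# Unfused: every operation makes its own pass and its own intermediate list.
sin_a = [math.sin(a) for a in a_values]
y_squared = [row['y'] ** 2 for row in t_rows]
unfused = [s + y for s, y in zip(sin_a, y_squared)]


# Fused: one scalar function, one pass, no intermediate collections.
def scalar_expr(a, row):
    return math.sin(a) + row['y'] ** 2


fused = [scalar_expr(a, row) for a, row in zip(a_values, t_rows)]

print(fused == unfused)  # True
```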

blaze.expr.broadcast.scalar_symbols(exprs)¶

Gives a sequence of scalar symbols to mirror these expressions

Examples

>>> x = symbol('x', '5 * 3 * int32')
>>> y = symbol('y', '5 * 3 * int32')

>>> xx, yy = scalar_symbols([x, y])

>>> xx._name, xx.dshape
('x', dshape("int32"))
>>> yy._name, yy.dshape
('y', dshape("int32"))

blaze.expr.broadcast.broadcast_collect(expr, broadcastable=(<class 'blaze.expr.expressions.Map'>, <class 'blaze.expr.expressions.Field'>, <class 'blaze.expr.datetime.DateTime'>, <class 'blaze.expr.arithmetic.UnaryOp'>, <class 'blaze.expr.arithmetic.BinOp'>, <class 'blaze.expr.expressions.Coerce'>, <class 'blaze.expr.collections.Shift'>, <class 'blaze.expr.strings.Like'>, <class 'blaze.expr.strings.StrCat'>), want_to_broadcast=(<class 'blaze.expr.expressions.Map'>, <class 'blaze.expr.datetime.DateTime'>, <class 'blaze.expr.arithmetic.UnaryOp'>, <class 'blaze.expr.arithmetic.BinOp'>, <class 'blaze.expr.expressions.Coerce'>, <class 'blaze.expr.collections.Shift'>, <class 'blaze.expr.strings.Like'>, <class 'blaze.expr.strings.StrCat'>), no_recurse=None)¶

Collapse expression down using Broadcast - Tabular cases only

Expressions whose type appears in broadcastable are swallowed into Broadcast operations.

>>> t = symbol('t', 'var * {x: int, y: int, z: int, when: datetime}')
>>> expr = (t.x + 2*t.y).distinct()

>>> broadcast_collect(expr)
distinct(Broadcast(_children=(t,), _scalars=(t,), _scalar_expr=t.x + (2 * t.y)))

>>> from blaze import exp
>>> expr = t.x + 2 * exp(-(t.x - 1.3) ** 2)
>>> broadcast_collect(expr)
Broadcast(_children=(t,), _scalars=(t,), _scalar_expr=t.x + (2 * (exp(-((t.x - 1.3) ** 2)))))

class blaze.expr.datetime.DateTime¶

Superclass for datetime accessors

class blaze.expr.datetime.Date¶

class blaze.expr.datetime.Year¶

class blaze.expr.datetime.Month¶

class blaze.expr.datetime.Day¶

class blaze.expr.datetime.days¶

class blaze.expr.datetime.Hour¶

class blaze.expr.datetime.Minute¶

class blaze.expr.datetime.Second¶

class blaze.expr.datetime.Millisecond¶

class blaze.expr.datetime.Microsecond¶

class blaze.expr.datetime.nanosecond¶

class blaze.expr.datetime.Time¶

class blaze.expr.datetime.week¶

class blaze.expr.datetime.nanoseconds¶

class blaze.expr.datetime.seconds¶

class blaze.expr.datetime.total_seconds¶

class blaze.expr.datetime.UTCFromTimestamp¶

class blaze.expr.datetime.DateTimeTruncate¶

class blaze.expr.datetime.Ceil¶

class blaze.expr.datetime.Floor¶

class blaze.expr.datetime.Round¶

class blaze.expr.datetime.strftime¶

class blaze.expr.split_apply_combine.By¶

Split-Apply-Combine Operator

Examples

>>> from blaze import symbol
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> e = by(t['name'], total=t['amount'].sum())

>>> data = [['Alice', 100, 1],
...         ['Bob', 200, 2],
...         ['Alice', 50, 3]]

>>> from blaze.compute.python import compute
>>> sorted(compute(e, data))
[('Alice', 150), ('Bob', 200)]
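
The same result can be reproduced step by step in plain Python, which makes the split, apply and combine phases explicit. `by_sum` is a hypothetical helper for illustration, not Blaze's implementation.

```python
from collections import defaultdict


def by_sum(rows, key_index, value_index):
    """Group rows by one column and sum another.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for row in rows:
        # Split: route each value to the group named by its key.
        groups[row[key_index]].append(row[value_index])
    # Apply the reduction per group, then combine into (key, total) pairs.
    return sorted((key, sum(values)) for key, values in groups.items())


data = [['Alice', 100, 1],
        ['Bob', 200, 2],
        ['Alice', 50, 3]]

print(by_sum(data, key_index=0, value_index=1))
# [('Alice', 150), ('Bob', 200)]
```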

blaze.expr.split_apply_combine.count_values(expr, sort=True)¶

Count occurrences of elements in this column

Sort by counts by default. Add the sort=False keyword to avoid this behavior.
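
This entry has no example. A plain-Python sketch of the idea using `collections.Counter` is shown below. It assumes that sorting puts the most frequent value first, which the description above does not state. `count_values_sketch` is a hypothetical stand-in, not the Blaze function.

```python
from collections import Counter


def count_values_sketch(values, sort=True):
    """Return (value, count) pairs for a column.

    Hypothetical helper for illustration only. With sort=True the most
    frequent value comes first (an assumption); with sort=False the pairs
    keep first-seen order.
    """
    counts = Counter(values)
    if sort:
        return counts.most_common()
    return list(counts.items())


names = ['Carol', 'Alice', 'Bob', 'Alice', 'Bob', 'Alice']
print(count_values_sketch(names))
# [('Alice', 3), ('Bob', 2), ('Carol', 1)]
print(count_values_sketch(names, sort=False))
# [('Carol', 1), ('Alice', 3), ('Bob', 2)]
```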