Release Notes

Release 0.11.0

Release: 0.11.0

New Expressions

Improved Expressions

None

New Backends

None

Improved Backends

None

Experimental Features

None

API Changes

  • The following functions were deprecated in favor of equivalent functions without the str_ name prefix:

    deprecated function    replacement function
    str_len()              len()
    str_upper()            upper()
    str_lower()            lower()
    str_cat()              cat()
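
A common way to deprecate a prefixed name in favor of a shorter one is to keep the old name as a thin wrapper that warns and then forwards the call. The sketch below is plain Python and is not Blaze's implementation; deprecated_alias and the toy upper function are hypothetical names used only for illustration.

```python
import functools
import warnings


def deprecated_alias(old_name, replacement):
    """Return a wrapper that warns about old_name, then calls replacement."""
    @functools.wraps(replacement)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "%s() is deprecated, use %s() instead"
            % (old_name, replacement.__name__),
            DeprecationWarning,
            stacklevel=2,
        )
        return replacement(*args, **kwargs)
    return wrapper


def upper(text):
    # Toy stand-in for the new, unprefixed function.
    return text.upper()


# The old prefixed name keeps working but emits a DeprecationWarning.
str_upper = deprecated_alias("str_upper", upper)
```

Calling str_upper("abc") returns the same result as upper("abc") while emitting a DeprecationWarning that names the replacement.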

Bug Fixes

None

Miscellaneous

None

Release 0.10.2

Release: 0.10.2

New Expressions

None

Improved Expressions

  • Adds support for any and all to the sql backend (#1511).

New Backends

None

Improved Backends

  • To allow access to the map and apply expressions in client / server interactions when in a trusted environment, new _trusted versions of several default SerializationFormat instances were added. These trusted variants allow (de)serialization of builtin functions, NumPy functions, and Pandas functions. They are intentionally kept separate from the default versions to ensure they are not accidentally enabled in untrusted environments (#1497 #1504).

Experimental Features

None

API Changes

None

Bug Fixes

  • Fixed a bug with to_tree() and slice objects. The order of cases in to_tree() was changed to ensure slice objects are handled before lookups inside the names namespace (#1516).
  • Perform more input validation for sort() expression arguments (#1517).
  • Fixes issue with string and datetime coercions on Pandas objects (#1519 #1524).
  • Fixed a bug with isin and Selections on sql selectables (#1528).

Miscellaneous

Expression Identity Rework

Expressions are now memoized by their inputs. This means that two identical expressions will always be the same object, so a.isidentical(b) is equivalent to a is b. isidentical is called hundreds of thousands of times in a normal blaze workload. Moving more work to expression construction time has been shown to dramatically improve compute times when the expressions grow in complexity or size. In the past, blaze spent time linear in the expression size comparing expressions because it needed to recurse through the entire expression tree; now isidentical runs in constant time.

Users should still use a.isidentical(b) instead of a is b because we reserve the right to add more arguments or change the implementation of isidentical in the future.
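
The memoization idea can be sketched in a few lines of plain Python. This is an illustrative toy, not Blaze's actual code: the Expr, Symbol, and Add classes here are simplified stand-ins, and the weak-value cache is an assumption about how one might avoid keeping unused expressions alive.

```python
import weakref


class Expr:
    """Toy expression node that is memoized by its constructor inputs."""

    # Maps (class, args) -> live instance; entries vanish when unused.
    _cache = weakref.WeakValueDictionary()

    def __new__(cls, *args):
        key = (cls, args)
        try:
            return cls._cache[key]
        except KeyError:
            self = super().__new__(cls)
            self._args = args
            cls._cache[key] = self
            return self

    def isidentical(self, other):
        # With memoization, structural identity reduces to object identity,
        # so no recursion through the expression tree is needed.
        return self is other


class Symbol(Expr):
    pass


class Add(Expr):
    pass
```

Constructing Add(Symbol("x", "int32"), Symbol("x", "int32")) twice yields the same object both times, so the comparison costs a single pointer check regardless of how deep the tree is.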

Release 0.10.1

Release: 0.10.1
Date: TBD

New Expressions

None

Improved Expressions

None

New Backends

None

Improved Backends

  • Blaze server’s /add endpoint was enhanced to take a more general payload (#1481).
  • Adds consistency check to blaze server at startup for YAML file and dynamic addition options (#1491).

Experimental Features

  • The str_cat() expression was added, mirroring Pandas’ Series.str.cat() API (#1496).

API Changes

None

Bug Fixes

  • The content type specification parsing was improved to accept more elaborate headers (#1490).
  • The discoverability consistency check is done before a dataset is dynamically added to the server (#1498).
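
To illustrate what "more elaborate headers" can look like, here is a minimal sketch of splitting a Content-Type value into its mimetype and parameters using only the Python standard library. It does not show how Blaze server parses the header; parse_content_type is a hypothetical helper.

```python
from email.message import Message


def parse_content_type(header):
    """Split a Content-Type header value into (mimetype, params)."""
    msg = Message()
    msg["Content-Type"] = header
    mimetype = msg.get_content_type()
    # get_params() returns [(mimetype, ''), (name, value), ...];
    # skip the first entry to keep only the parameters.
    params = dict(msg.get_params()[1:])
    return mimetype, params
```

A header such as "application/json; charset=utf-8" is split into the mimetype "application/json" and a parameter dictionary containing the charset.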

Miscellaneous

None

Release 0.10.0

Release: 0.10.0
Date: TBD

New Expressions

  • The sample expression allows random sampling of rows to facilitate interactive data exploration (#1410). It is implemented for the Pandas, Dask, SQL, and Python backends.

  • Adds coalesce() expression which takes two arguments and returns the first non-missing value. If both are missing then the result is missing. For example: coalesce(1, 2) == 1, coalesce(None, 1) == 1, and coalesce(None, None) == None. This is inspired by the sql function of the same name (#1409).

  • Adds cast() expression to reinterpret an expression’s dshape. This is modeled on C++ reinterpret_cast, or ordinary C casts. For example: symbol('s', 'int32').cast('uint32').dshape == dshape('uint32'). This expression has no effect on the computation; it merely tells blaze to treat the result of the expression as the new dshape. The compute definition for cast is simply:

    @dispatch(Cast, object)
    def compute_up(expr, data, **kwargs):
        return data
    

    (#1409).
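
The coalesce() semantics described above can be sketched on plain Python scalars, treating None as the missing value. This is only an illustration of the rule; the Blaze expression operates on expressions and columns rather than on scalars like this.

```python
def coalesce(a, b):
    """Return a unless it is missing (None), in which case return b."""
    return a if a is not None else b
```

Note that only None counts as missing here, so a falsy value such as 0 is still returned as the first argument.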

Improved Expressions

  • The test suite was expanded to validate proper expression input error handling (#1420).
  • The truncate() function was refactored to raise an exception for incorrect inputs, rather than using assertions (#1443).
  • The docstring for Merge was expanded to include examples using Label to control the ordering of the columns in the result (#1447).

New Backends

None

Improved Backends

  • Adds greatest and least support to the sql backend (#1428).
  • Generalize Field to support collections.Mapping object (#1467).

Experimental Features

  • The str_upper and str_lower expressions were added for the Pandas and SQL backends (#1462). These are marked experimental since their names are subject to change. More string methods will be added in coming versions.

API Changes

  • The strlen expression was deprecated in favor of str_len (#1462).
  • Long deprecated Table() and TableSymbol() were removed (#1441). The TableSymbol tests in test_table.py were migrated to test_symbol.py.
  • Data() has been deprecated in favor of data(). InteractiveSymbol has been deprecated and temporarily replaced by _Data. These deprecations will be in place for the 0.10 release. In the 0.11 release, _Data will be renamed to Data, calls to data() will create Data instances, and InteractiveSymbol will be removed (#1431 and #1421).
  • compute() has a new keyword argument return_type which defaults to 'native' (#1401, #1411, #1417), which preserves existing behavior. In the 0.11 release, return_type will be changed to default to 'core', which will odo non-core backends into core backends as the final step in a call to compute.
  • Due to API instability and on the recommendation of DyND developers, we removed the DyND dependency temporarily (#1379). When DyND achieves its 1.0 release, DyND will be re-incorporated into Blaze. The existing DyND support in Blaze was rudimentary and based on an egregiously outdated and buggy version of DyND. We are aware of no actual use of DyND via Blaze in practice.
  • The Expr __repr__ method’s triggering of implicit computation has been deprecated. Using this aspect of Blaze will trigger a DeprecationWarning in version 0.10, and this behavior will be replaced by a standard (boring) __repr__ implementation in version 0.11. Users can explicitly trigger a computation to see a quick view of the results of an interactive expression by means of the peek() method. By setting the use_new_repr flag to True, users can use the new (boring) __repr__ implementation in version 0.10 (#1414 and #1395).

Bug Fixes

  • The str_upper and str_lower schemas were fixed to pass through their underlying _child’s schema to ensure option types are handled correctly (#1472).
  • Fixed a bug with Pandas’ implementation of compute_up on Broadcast expressions (#1442). Added tests for Pandas frame and series and dask dataframes on Broadcast expressions.
  • Fixed a bug with Sample on SQL backends (#1452 #1423 #1424 #1425).
  • Fixed several bugs relating to adding new datasets to blaze server instances (#1459). Blaze server will make a best effort to ensure that the added dataset is valid and loadable; if not, it will return appropriate HTTP status codes.

Miscellaneous

  • Adds logging to server compute endpoint. Includes expression being computed and total time to compute. (#1436)
  • Merged the core and all conda recipes (#1451). This simplifies the build process and makes it consistent with the single blaze package provided by the Anaconda distribution.
  • Adds a --yaml-dir option to blaze-server to indicate the server should load path-based yaml resources relative to the yaml file’s directory, not the CWD of the process (#1460).
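
The difference between resolving a resource relative to the YAML file's directory and relative to the process CWD can be sketched as follows. This is a hypothetical helper for illustration, not the blaze-server code; resolve_source and its parameters are assumed names.

```python
from pathlib import PurePosixPath


def resolve_source(source, yaml_file, cwd, yaml_dir=False):
    """Resolve a path-based resource named in a YAML file.

    With yaml_dir=True, relative paths are taken relative to the directory
    containing the YAML file; otherwise relative to the process CWD.
    Absolute paths are returned unchanged.
    """
    source = PurePosixPath(source)
    if source.is_absolute():
        return source
    base = PurePosixPath(yaml_file).parent if yaml_dir else PurePosixPath(cwd)
    return base / source
```

With a YAML file at /etc/blaze/server.yaml and a CWD of /home/user, the entry data/iris.csv resolves under /home/user by default and under /etc/blaze when yaml_dir is enabled.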

Release 0.9.1

Release: 0.9.1
Date: December 17th, 2015

New Expressions

Improved Expressions

  • The Like expression was improved to support more general Select queries that result from Join operations rather than solely ColumnElement queries (#1371 #1373).
  • Adds std and var reductions for timedelta types for sql and pandas backends (#1382).

New Backends

None

Improved Backends

  • Blaze Server no longer depends on Bokeh for CORS handling, and now uses the flask-cors third-party package (#1378).

Experimental Features

None

API Changes

None

Bug Fixes

  • Fixed a blaze-server entry point bug regarding an ambiguity between the spider() function and the blaze.server.spider module (#1385).
  • Fixed blaze.expr.datetime.truncate() handling for the sql backend (#1393).
  • Fix blaze.expr.core.isidentical() to check the _hashargs instead of the _args. This fixes a case that caused objects that hashed the same to not compare equal when somewhere in the tree of _args was a non hashable structure (#1387).
  • Fixed a type issue where datetime - datetime :: datetime instead of timedelta (#1382).
  • Fixed a bug that caused coerce() to fail when computing against ColumnElements. This would break coerce for many sql operations (#1382).
  • Fixed reductions over timedelta returning float (#1382).
  • Fixed interactive repr for timedelta not coercing to timedelta objects (#1382).
  • Fixed weakkeydict cache failures that were causing .dshape lookups to fail sometimes (#1399).
  • Fixed relabeling columns over selects by using reconstruct_select (#1471).

Miscellaneous

  • Removed support for Spark 1.3 (#1386) based on community consensus.
  • Added blaze.utils.literal_compile() for converting sqlalchemy expressions into sql strings with bind parameters inlined as sql literals. blaze.utils.normalize() now accepts a sqlalchemy selectable and uses literal_compile to convert into a string first (#1386).

Release 0.9.0

Release: 0.9.0
Date: December 17th, 2015

New Expressions

  • Add a shift() expression for shifting data backwards or forwards by N rows (#1266).
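
As a rough illustration of what shifting by N rows means, the sketch below works on a plain Python list. It assumes that positive N moves values toward later rows and that vacated slots are filled with None, in the style of pandas' shift; these conventions are assumptions and the release note does not specify them.

```python
def shift(values, n):
    """Shift values by n rows, filling vacated slots with None."""
    values = list(values)
    if n == 0:
        return values
    if abs(n) >= len(values):
        # Everything is shifted out of range.
        return [None] * len(values)
    if n > 0:
        # Move data toward later rows.
        return [None] * n + values[:-n]
    # Negative n moves data toward earlier rows.
    return values[-n:] + [None] * (-n)
```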

Improved Expressions

New Backends

  • Initial support for dask.dataframe has been added, see (#1317). Please send feedback via an issue or pull request if we missed any expressions you need.

Improved Backends

  • Adds support for tail() in the sql backend (#1289).

  • Blaze Server now supports dynamically adding datasets (#1329).

  • Two new keyword only arguments are added to compute() for use when computing against a Client object:

    1. compute_kwargs: This is a dictionary to send to the server to use as keyword arguments when calling compute on the server.
    2. odo_kwargs: This is a dictionary to send to the server to use as keyword arguments when calling odo on the server.

    This extra information is completely optional and will have different meanings based on the backend of the data on the server (#1342).

  • Can now point Data() to URLs of CSVs (#1336).

Experimental Features

  • There is now support for joining tables from multiple sources. This is very experimental right now, so use it at your own risk. It currently only works with things that fit in memory (#1282).
  • Foreign columns in database tables that have foreign key relationships can now be accessed with a more concise syntax (#1192).

API Changes

  • Removed support for Python 2.6 (#1267).
  • Removed support for Python 3.3 (#1270).
  • When a CSV file consists of all strings, you must pass has_header=True when using the Data constructor (#1254).
  • Comparing date and datetime datashaped things to the empty string now raises a TypeError (#1308).
  • Like expressions behave like a predicate, and operate on columns, rather than performing the selection for you on a table (#1333, #1340).
  • blaze.server.Server.run() no longer retries binding to a new port by default. Also, positional arguments are no longer forwarded to the inner flask app’s run method. All keyword arguments not consumed by the blaze server run are still forwarded (#1316).
  • Server represents datashapes in a canonical form with consistent linebreaks for use by non-Python clients (#1361).

Bug Fixes

  • Fixed a bug where Merge expressions would unpack option types in their fields. This could cause you to have a table where expr::{a: int32} but expr.a::?int32. Note that the dotted access is an option (#1262).
  • Explicitly set Node.__slots__ and Expr.__slots__ to (). This ensures instances of slotted subclasses like Join do not have a useless empty __dict__ attribute (#1274 and #1268).
  • Fixed a bug that prevented creating an InteractiveSymbol that wrapped nan if the dshape was datetime. This now correctly coerces to NaT (#1272).
  • Fixed an issue where blaze client/server could not use isin expressions because the frozenset failed to serialize. This also added support for rich serialization over json for things like datetimes (#1255).
  • Fixed a bug where len would fail on an interactive expression whose resources were sqlalchemy objects (#1273).
  • Use aliases instead of common table expressions (CTEs) because MySQL doesn’t have CTEs (#1278).
  • Fixed a bug where we couldn’t display an empty string identifier in interactive mode (#1279).
  • Fixed a bug where comparisons with optionals that should have resulted in optionals did not (#1292).
  • Fixed a bug where Join.schema would not always be instantiated (#1288).
  • Fixed a bug where comparisons to an empty string magically converted the empty string to None (#1308).
  • Fix the retry kwarg to the blaze server. When retry is False, an exception is now properly raised if the port is in use. (#1316).
  • Fixed a bug where leaves that only appeared in the predicate of a selection would not be in scope in time to compute the predicate. This would cause whole expressions like a[a > b] to fail because b was not in scope (#1275).
  • Fix a broken test on machines that don’t allow postgres to read from the local filesystem (#1323).
  • Updated a test to reflect changes from odo #366 (#1323).
  • Fixed pickling of blaze expressions with interactive symbols (#1319).
  • Fixed the repr of partials in blaze expressions to show keyword arguments (#1319).
  • Fixed a memory leak that would preserve the lifetime of any blaze expression that had cached an attribute access (#1335).
  • Fixed a bug where common_subexpression() gave the wrong answer (#1325, #1338).
  • BinaryMath operations without numba installed were failing (#1343).
  • win32 tests were failing for hypot and atan2 due to slight differences in numpy vs numba implementations of those functions (#1343).
  • Only start up a ThreadPool when using the h5py backend (#1347, #1331).
  • Fix return type for sum and mean reductions whose children have a Decimal dshape.

Miscellaneous

  • blaze.server.Server.run() now uses warnings.warn() instead of print when it fails to bind to a port and is retrying (#1316).
  • Make expressions (subclasses of Expr) weakly referenceable (#1319).
  • Memoize dshape and schema methods (#1319).
  • Use pandas.DataFrame.sort_values() with pandas version >= 0.17.0 (#1321).

Release 0.8.3

Release: 0.8.3
Date: September 15, 2015

New Expressions

  • Adds Tail which acts as an opposite to head. This is exposed through the tail() function. This returns the last n elements from a collection. (#1187)
  • Adds notnull returning an indicator of whether values in a field are null (#697, #733)

Improved Expressions

  • Distinct expressions now support an on parameter to allow distinct on a subset of columns (#1159)
  • Reduction instances are now named as their class name if their _child attribute is named '_' (#1198)
  • Join expressions now promote the types of the fields being joined on. This allows us to join things like int32 and int64 and have the result be an int64. This also allows us to join any type a with ?a. (#1193, #1218).

New Backends

Improved Backends

  • Blaze now tries a bit harder to avoid generating ScalarSelects when using the SQL backend (#1201, #1205)
  • ReLabel expressions on the SQL backend are now flattened (#1217).

API Changes

  • Serialization format in blaze server is now passed in as a mimetype (#1176)

  • We only allow and use HTTP POST requests when sending a computation to Blaze server for consistency with the HTTP spec (#1172)

  • Allow Client objects to explicitly disable verification of ssl certificates by passing verify_ssl=False. (#1170)

  • Enable basic auth for the blaze server. The server now accepts an authorization keyword which must be a callable that accepts an object holding the username and password, or None if no auth was given and returns a bool indicating if the request should be allowed. Client objects can pass an optional auth keyword which should be a tuple of (username, password) to send to the server. (#1175)

  • We now allow Distinct expressions on ColumnElement to be more general and let things like sa.sql.elements.Label objects through (#1212)

  • Methods now take priority over field names when using attribute access for Field instances to fix a bug that prevented accessing the method at all (#1204). Here’s an example of how this works:

    >>> from blaze import symbol
    >>> t = symbol('t', 'var * {max: float64, isin: int64, count: int64}')
    >>> t['count'].max()
    t.count.max()
    >>> t.count()  # calls the count method on t
    t.count()
    >>> t.count.max()  # AttributeError
    Traceback (most recent call last):
       ...
    AttributeError: ...
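
The authorization callable described in the basic auth item above can be sketched as follows. Everything here is hypothetical and for illustration only: Credentials stands in for whatever object the server passes (assumed to expose username and password attributes), and USERS is a toy in-memory table.

```python
import hmac
from collections import namedtuple

# Hypothetical stand-in for the object the server passes to the callable.
Credentials = namedtuple("Credentials", ["username", "password"])

# Hypothetical in-memory user table; a real deployment would not store
# plaintext passwords.
USERS = {"alice": "s3cret"}


def authorization(auth):
    """Return True if the request should be allowed."""
    if auth is None:
        # No credentials were sent with the request.
        return False
    expected = USERS.get(auth.username)
    if expected is None:
        return False
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected.encode(), auth.password.encode())
```

On the client side, the matching credentials would be supplied as the (username, password) tuple mentioned above.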
    

Bug Fixes

  • Upgrade versioneer so that our version string is now PEP 440 compliant (#1171)
  • Computed columns (e.g., the result of a call to transform()) can now be accessed via standard attribute access when using the SQL backend (#1201)
  • Fixed a bug where blaze server was depending on an implementation detail of CPython regarding builtins (#1196)
  • Fixed incorrect SQL generated by count on a subquery (#1202).
  • Fixed an ImportError generated by an API change in dask.
  • Fixed an issue where columns were getting trampled if there were column name collisions in a sql join. (#1208)
  • Fixed an issue where arithmetic in a Merge expression wouldn’t work because arithmetic wasn’t defined on sa.sql.Select objects (#1207)
  • Fixed a bug where the wrong value was being passed into time() (#1213)
  • Fixed a bug in sql relabel that prevented relabeling anything that generated a subselect. (#1216)
  • Fixed a bug where methods weren’t accessible on fields with the same name (#1204)
  • Fixed a bug where optimized expressions going into a pandas group by were incorrectly assigning extra values to the child DataFrame (#1221)
  • Fixed a bug where multiple same-named operands caused incorrect scope to be constructed ultimately resulting in incorrect results on expressions like x + x + x (#1227). Thanks to @llllllllll and @jcrist for discussion around solving the issue.
  • Fixed a bug where minute() and Minute were not being exported which made them unusable from the blaze server (#1232).
  • Fixed a bug where repr was being called on data resources rather than string, which caused massive slowdowns on largish expressions running against blaze server (#1240, #1247).
  • Skip a test on Win32 + Python 3.4 and PyTables until this gets sorted out on the library side (#1251).

Miscellaneous

  • We now run tests against pandas master to catch incompatibility issues (#1243).

Release 0.8.2

Release: 0.8.2
Date: July 9th, 2015

Bug Fixes

  • Fix broken sdist tarball

Release 0.8.1

Release: 0.8.1
Date: July 7th, 2015

New Expressions

  • String arithmetic is now possible across the numpy and pandas backends via the + (concatenation) and * (repeat) operators (#1058).
  • Datetime arithmetic is now available (#1112).
  • Add a Concat expression that implements Union-style operations (#1128).
  • Add a Coerce expression that casts expressions to a different datashape. This maps to astype in numpy and cast in SQL (#1137).

Improved Expressions

  • ReLabel expressions repr differently depending on whether the existing column names are valid Python variable names (#1070).

New Backends

None

Improved Backends

  • In-memory merges of CSV files are now possible (#1121).
  • Tie blueprint registration to data registration (#1061).
  • Don’t catch import error when flask doesn’t exist, since blaze does this in its __init__.py (#1087).
  • Multiple serialization formats including JSON, pickle, and msgpack are now available. Additionally, one can add custom serialization formats with this implementation (#1102, #1122).
  • Add a 'names' field to the response of the compute.<format> route for Bokeh compatibility (#1129).
  • Add cross origin resource sharing for Bokeh compatibility (#1134).
  • Add a command line interface (#1115).
  • Add a way to tell the blaze server command line interface what to serve via a YAML file (#1115).
  • Use aliases to allow expressions on the SQL backend that involve a multiple step reduction operation (#1066, #1126).
  • Fix unary not operator ~ (#1091).
  • Postgres uses == to compare NaN so we do it that way as well for the postgresql backend (#1123).
  • Find table inside non-default schema when serving off a SQLAlchemy MetaData object (#1145).
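
To give a feel for what a pluggable serialization format might look like, the sketch below models a format as a name plus a pair of loads/dumps callables. This shape is an assumption made for illustration and may not match Blaze's actual interface; Format and json_zlib are hypothetical names.

```python
import json
import zlib
from collections import namedtuple

# Hypothetical shape for a pluggable format: a name plus two callables.
Format = namedtuple("Format", ["name", "loads", "dumps"])


def _dumps(obj):
    # Serialize to JSON text, then compress the UTF-8 bytes.
    return zlib.compress(json.dumps(obj).encode("utf-8"))


def _loads(payload):
    # Reverse of _dumps: decompress, decode, and parse the JSON.
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# A custom "compressed JSON" format built from standard-library pieces.
json_zlib = Format(name="json+zlib", loads=_loads, dumps=_dumps)
```

Any object that round-trips through json also round-trips through json_zlib.dumps followed by json_zlib.loads.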

API Changes

  • Remove old ExprClient(). Use Client instead (#1154).

  • Make sort + slice semantics of the SQL backend reflect those of numpy (#1125).

  • The following syntax is no longer allowed for Blaze server (#1154):

    >>> Data('blaze://localhost::accounts')  # raises an error 
    

    Use this syntax instead:

    >>> Data('blaze://localhost').accounts  # valid 
    

Bug Fixes

  • Handle SQLAlchemy API churn around reference of ColumnElement objects in the 1.0.x series (#1071, #1076).
  • Fixed an obscure hashing bug when passing in both a pandas Timestamp and a datetime.datetime object. Both objects hash to the same value but don’t necessarily compare equal; this makes Python call __eq__, which caused an Eq expression to be constructed (#1097).
  • Properly handle And expressions that involve the same field in MongoDB (#1099).
  • Handle Dask API changes (#1114).
  • Use the date function in SQLAlchemy when getting the date attribute of a datetime dshaped expression. Previously this was calling extract, which is incorrect for the postgres backend (#1120).
  • Fix API compatibility with different versions of psutil (#1136).
  • Use explicit int64 comparisons on Windows, since the default values may be different (#1148).
  • Fix name attribute propagation in pandas Series objects (#1152).
  • Raise a more informative error when trying to subset with an unsupported expression in the MongoDB backend (#1155).

Release 0.7.3

  • General maturation of many backends through use.
  • Renamed the into project to odo

Release 0.7.0

  • Pull out data migration utilities to the into project
  • Out-of-core CSV support now depends on chunked pandas computation
  • h5py and bcolz backends support multi-threading/processing
  • Remove data directory including SQL, HDF5 objects. Depend on standard types within other projects instead (e.g. sqlalchemy.Table, h5py.Dataset, ...)
  • Better support SQL nested queries for complex queries
  • Support databases, h5py files, servers as first class datasets

Release 0.6.6

  • Not intended for public use, mostly for internal build systems
  • Bugfix

Release 0.6.5

  • Improve uri string handling #715
  • Various bug fixes #715

Release 0.6.4

  • Back CSV with pandas.read_csv. Better performance and more robust unicode support but less robust missing value support (some regressions) #597
  • Much improved SQL support #626 #650 #652 #662
  • Server supports remote execution of computations, not just indexing #631
  • Better PyTables and datetime support #608 #639
  • Support SparkSQL #592

Release 0.6.3

  • by takes only two arguments, the grouper and apply; the child is inferred using common_subexpression
  • Better handling of pandas Series object
  • Better printing of empty results in interactive mode
  • Regex dispatched resource function bound to Table, e.g.
    Table('/path/to/file.csv')

Release 0.6.2

  • Efficient CSV to SQL migration using native tools #454
  • Dispatched drop and create_index functions #495
  • DPlyr interface at blaze.api.dplyr. #484
  • Various bits borrowed from that interface
    • transform function adopted to main namespace
    • Summary object for named reductions
    • Keyword syntax in by and merge e.g. by(t, t.col, label=t.col2.max(), label2=t.col2.min())
  • New Computation Server #527
  • Better PyTables support #487 #496 #526

Release 0.6.1

  • More consistent behavior of into
  • bcolz backend
  • Control namespace leakage

Release 0.6

  • Nearly complete rewrite
  • Add abstract table expression system
  • Translate expressions onto a variety of backends
  • Support Python, NumPy, Pandas, h5py, sqlalchemy, pyspark, PyTables, pymongo

Release 0.5

  • HDF5 in catalog.
  • Reductions like any, all, sum, product, min, max.
  • Datetime design and some initial functionality.
  • Change how Storage and ddesc works.
  • Some preliminary rolling window code.
  • Python 3.4 now in the test harness.

Release 0.4.2

  • Fix bug for compatibility with numba 0.12
  • Add sql formats
  • Add hdf5 formats
  • Add support for numpy ufunc operators

Release 0.4.1

  • Fix bug with compatibility for numba 0.12

Release 0.4

  • Split the datashape and blz modules out.
  • Add catalog and server for blaze arrays.
  • Add remote arrays.
  • Add csv and json persistence formats.
  • Add python3 support
  • Add scidb interface

Release 0.3

  • Solidifies the execution subsystem around an IR based on the pykit project, as well as a ckernel abstraction at the ABI level.
  • Supports ufuncs running on ragged array data.
  • Cleans out previous low level data descriptor code, the data descriptor will have a higher level focus.
  • Example out of core groupby operation using BLZ.

Release 0.2

  • Brings in dynd as a required dependency for in-memory data.

Release 0.1

  • Initial preview release