SQL to Blaze

This page maps SQL expressions to blaze expressions.

Note

The following SQL expressions are somewhat specific to PostgreSQL, but blaze itself works with any database for which a SQLAlchemy dialect exists.

Prerequisites

If you’re interested in testing these against a PostgreSQL database, make sure you’ve executed the following code in psql session:

CREATE TABLE df (
    id BIGINT,
    amount DOUBLE PRECISION,
    name TEXT
);

On the blaze side of things, the table below assumes the following code has been executed:

>>> from blaze import symbol, by, join, concat
>>> df = symbol('df', 'var * {id: int64, amount: float64, name: string}')

Note

Certain SQL constructs such as window functions don’t directly correspond to a particular Blaze expression. Map expressions are the closest representation of window functions in Blaze.

Computation SQL Blaze
Column Arithmetic
select amount * 2 from df
df.amount * 2
Multiple Columns
select id, amount from df
df[['id', 'amount']]
Selection
select * from df where amount > 300
df[df.amount > 300]
Group By
select avg(amount) from df group by name
by(df.name, amount=df.amount.mean())
select avg(amount) from df group by name, id
by(merge(df.name, df.id),
   amount=df.amount.mean())
Join
select * from
    df inner join df2
on df.name = df2.name
join(df, df2, 'name')
Map
select amount + 1 over () from df
df.amount.map(lambda x: x + 1,
              'int64')
Relabel Columns
select
    id,
    name as alias,
    amount as dollars
 from df
df.relabel(name='alias',
           amount='dollars')
Drop duplicates
select distinct * from df
df.distinct()
select distinct(name) from df
df.name.distinct()
/* postgresql only */
select distinct on (name) * from
df order by name
Reductions
select avg(amount) from df
df.amount.mean()
select amount, count(amount)
from df group by amount
df.amount.count_values()
Concatenate
select * from df
union all
select * from df
concat(df, df)
Column Type Information
select
    column_name,
    data_type,
    character_maximum_length
from
    information_schema.columns
where
    table_name = 'df'
df.dshape
select
    column_name,
    data_type,
    character_maximum_length
from
    information_schema.columns
where
    table_name = 'df'
        and
    column_name = 'amount'
df.amount.dshape