https://fugue-tutorials.readthedocs.io/en/latest/tutorials/fugue_sql/index.html

FugueSQL

All questions are welcome in the Slack channel.

FugueSQL is designed for heavy SQL users who want to extend the
boundaries of traditional SQL workflows. It lets users express the
logic of end-to-end distributed computing workflows in SQL. It can
also be combined with Python code, so custom functions can be used
alongside the SQL commands. It provides a unified interface: the same
SQL code runs on Pandas, Dask, and Spark.

The SQL code is parsed with ANTLR and mapped to the equivalent
functions in the Fugue programming interface.

1. Installation

To use FugueSQL, first make sure the sql extra is installed:

pip install fugue[sql]

To run on the Spark or Dask execution engines, install the
corresponding extras. Alternatively, the all extra installs
everything at once.

pip install fugue[sql,spark]
pip install fugue[sql,dask]
pip install fugue[all]
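
After installing, it can be useful to confirm which backends are
actually importable in the current environment. The snippet below is
a minimal sketch using only the standard library; the
BACKEND_MODULES mapping and the available_backends helper are
illustrative names, not part of Fugue.

```python
from importlib.util import find_spec

# Map each execution backend to the top-level module it needs.
# This mapping is an illustrative assumption, not part of Fugue's API.
BACKEND_MODULES = {
    "pandas": "pandas",
    "dask": "dask",
    "spark": "pyspark",
}


def available_backends(modules=BACKEND_MODULES):
    """Return the names of backends whose module can be imported."""
    return [
        name
        for name, module in modules.items()
        if find_spec(module) is not None
    ]


if __name__ == "__main__":
    print("fugue installed:", find_spec("fugue") is not None)
    print("backends available:", available_backends())
```

An empty list for the backends means that only the extras were
skipped; FugueSQL itself depends on the fugue package being present.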

2. FugueSQL Syntax

Get started with FugueSQL. This section covers data input and output,
enhancements over standard SQL, and how to use SQL to describe
computation logic. Afterwards, users will be able to use FugueSQL
with familiar SQL keywords to perform operations on top of Pandas,
Spark, and Dask.

3. Additional SQL Operators

Go over the operations that Fugue implements on top of those provided
by standard SQL. FugueSQL is extensible with Python code, but the
most common functions are included as built-ins. These include
filling NULL values, dropping NULL values, renaming columns, and
changing the schema. This section covers the most frequently used
additional keywords.
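
For readers coming from Pandas, it may help to see the Pandas calls
that these built-in operations roughly correspond to. The snippet
below is only an analogy, not FugueSQL syntax, and the column names
are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, None, 1.5]})

# Fill NULL values in one column with a constant.
filled = df.fillna({"score": 0.0})

# Drop rows that contain a NULL in the given column.
dropped = df.dropna(subset=["score"])

# Rename a column.
renamed = df.rename(columns={"score": "s"})

# Change the schema by casting a column to another type.
retyped = df.astype({"id": "float64"})

print(filled, dropped, renamed, retyped, sep="\n\n")
```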

4. Integrating Python

Explore Jinja templating for passing variables, and using a Python
function as a Transformer in a %%fsql cell.
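
As a rough sketch of the Transformer idea, the function below is
plain Python that operates on a Pandas DataFrame, with a schema hint
comment describing its output. The function name add_total and the
price and quantity columns are hypothetical. The query string shows
the approximate shape of the FugueSQL that would call it; it is not
executed here, so the example runs without Fugue installed.

```python
import pandas as pd


# schema: *, total:double
def add_total(df: pd.DataFrame) -> pd.DataFrame:
    """Add a column holding price * quantity (hypothetical columns)."""
    return df.assign(total=df["price"] * df["quantity"])


# A FugueSQL query of roughly this shape would apply the function.
# In a notebook it would be the body of a %%fsql cell.
QUERY = """
TRANSFORM df USING add_total
PRINT
"""

if __name__ == "__main__":
    df = pd.DataFrame({"price": [2.0, 3.5], "quantity": [3, 2]})
    # The function can be tested directly on Pandas, without any engine.
    print(add_total(df))
```

Keeping the logic in an ordinary function means it can be unit tested
on its own before it is used from SQL.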

5. Using Other Fugue Extensions

The Transformer is just one of many possible Fugue extensions. In
this section we'll explore the syntax of all the other Fugue
extensions: Creator, Processor, Outputter, and CoTransformer.

6. FugueSQL with Pandas

%%fsql uses the NativeExecutionEngine by default. This engine runs on
Pandas. All of the SQL operations have equivalents in Pandas, but
Pandas behavior does not always match SQL. For example, a Pandas
groupby operation drops rows whose group key is NULL by default. The
NativeExecutionEngine was designed to make operations mostly
consistent with Spark and SQL.
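
The groupby difference can be seen in plain Pandas. This is a small
illustration with made-up data; it shows Pandas behavior only and
does not call Fugue.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", None, "a", None], "value": [1, 2, 3, 4]}
)

# Default behavior: rows whose group key is NULL are silently dropped.
default_sums = df.groupby("key")["value"].sum()

# dropna=False keeps the NULL key as its own group, which matches
# how GROUP BY behaves in SQL and Spark.
sql_like_sums = df.groupby("key", dropna=False)["value"].sum()

print(default_sums)
print(sql_like_sums)
```

The first result has a single group, while the second also contains a
group for the NULL key.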

7. FugueSQL with Dask

Fugue and dask-sql are collaborating so that the two solutions
converge and bring a SQL interface to Dask. Currently, dask-sql is
faster on average, while FugueSQL implements more SQL keywords.
Conveniently, the two can be used together to get the best of both
worlds. This is done by using dask-sql as the underlying execution
engine of the FugueSQLWorkflow context manager.

8. FugueSQL with Spark

FugueSQL also works on Spark by passing in the execution engine, for
example %%fsql spark. The operations are mapped to Spark and Spark
SQL operations. The difference is that FugueSQL adds syntax on top of
SparkSQL, as shown in the syntax tutorial. Additionally, the same
FugueSQL code executes on Pandas and Dask without modification. This
allows for quick testing without having to spin up a cluster. Users
can prototype with the NativeExecutionEngine, and then move to a
Spark cluster by changing only the execution engine.

Copyright (c) 2021, The Fugue Development Team