# Demo - Exploring the TPC-DS Dataset

This demo uses the popular TPC-DS dataset for transactional database benchmarking. You can read more about Trilogy and this benchmark tool here. No benchmarking right now, though - we just care that DuckDB can natively produce a nice representative data warehouse for TPC-DS with a built-in extension, and that this dataset gives us a good way to explore the language syntax.

TIP This demo uses an in-memory DuckDB database that may take some time to start for your first query. Trilogy is database agnostic; the same syntax will work on any backend such as Postgres, BigQuery, or Snowflake.

# Our First Queries

TPC-DS creates a nice small warehouse: 17 dimension tables and 7 fact tables.

# Dimensions

call_center, catalog_page, customer, customer_address, customer_demographics, date_dim, household_demographics, income_band, item, promotion, reason, ship_mode, store, time_dim, warehouse, web_page, web_site

# Fact Tables

catalog_sales, catalog_returns, inventory, store_sales, store_returns, web_sales, web_returns

That's a lot to work with! Fortunately, someone has already defined a Trilogy model for TPC-DS here; we'll be using that directly. We'll discuss models in more detail in a little bit. To start out, let's focus on these Trilogy imports. You'll notice that these map closely to fact tables; that's not a coincidence.
It's typical for a user query to be centered on one or more facts, so those are natural places to start a modeling hierarchy. Trilogy imports let you expose many entrypoints into the same set of table relationships to support intuitive query patterns.

TIP We're going to hide the import statements from all the queries we run here. We'll work in an environment with all of them loaded already: customer, store_sales, web_sales, catalog_sales.

Basic Select

A basic select from our customer import to see what our names and IDs look like. Click the SQL tab to see the generated SQL.

```
select
    customer.id,
    customer.first_name,
    customer.last_name
limit 5;
```

Another View, Similar Data

The semantic model encodes relationships and supports imports; a hierarchical selection like `store_sales.customer` references the customer model imported into and related to the store_sales model.

```
select
    store_sales.customer.id,
    store_sales.customer.first_name,
    store_sales.customer.last_name,
    store_sales.ticket_number
limit 5;
```

Sales By Customer

This becomes more relevant as we start to pull in fact information. Just like SQL, you can define new information in a query, such as creating a total_sales calculation. All derived values must have an explicit name and are then reusable further on in the query.

```
select
    store_sales.customer.id,
    store_sales.customer.first_name,
    sum(store_sales.sales_price) as total_sales
order by
    total_sales desc
limit 10;
```

# The Model

We mentioned earlier that we had a model defined that was letting us run these queries. Let's peek under the hood. Modeling is the heart and "art" of Trilogy: defining names and relationships that enable intuitive access to data. Models will define some concepts, bind some concepts, and often import other models. While models can be 1-1 to tables and often are, they don't need to be - and this can be useful when refactoring, extending, or simplifying models.
TIP A model describes the relationships between Trilogy concepts and the underlying data. It is a contract that can be used to generate queries.

TIP A model definition is Trilogy - the access language and the modeling language are the same. You can define and extend a model inline.

This definition is for the store_sales references we've been querying above. The full TPC-DS model contains multiple files like this.

```
import item as item;
import date as date;
import date as return_date;
import time as time;
import time as return_time;
import customer as customer;
import customer as return_customer;
import promotion as promotion;
import customer_demographic as customer_demographic;
import store as store;
import store as return_store;

key ticket_number int;
property <ticket_number, item.id>.quantity int;
property <ticket_number, item.id>.sales_price float;
property <ticket_number, item.id>.list_price float;
property <ticket_number, item.id>.ext_sales_price float;
property <ticket_number, item.id>.ext_wholesale_cost float;
property <ticket_number, item.id>.ext_list_price float;
property <ticket_number, item.id>.ext_discount_amount float;
property <ticket_number, item.id>.coupon_amt float;
property <ticket_number, item.id>.net_profit float;
property <ticket_number, item.id>.is_returned bool;
property <ticket_number, item.id>.net_paid float;
property <ticket_number, item.id>.return_amount float?;
property <ticket_number, item.id>.return_net_loss float?;

auto profit <- ext_list_price - ext_wholesale_cost - ext_discount_amount + ext_sales_price;

datasource store_sales (
    SS_SOLD_DATE_SK: date.id,
    SS_SOLD_TIME_SK: time.id,
    SS_CUSTOMER_SK: customer.id,
    SS_CDEMO_SK: customer_demographic.id,
    SS_TICKET_NUMBER: ticket_number,
    SS_ITEM_SK: item.id,
    SS_SALES_PRICE: sales_price,
    SS_LIST_PRICE: list_price,
    SS_EXT_SALES_PRICE: ext_sales_price,
    SS_EXT_LIST_PRICE: ext_list_price,
    SS_EXT_WHOLESALE_COST: ext_wholesale_cost,
    SS_EXT_DISCOUNT_AMT: ext_discount_amount,
    SS_NET_PROFIT: net_profit,
    SS_PROMO_SK: promotion.id,
    SS_QUANTITY: quantity,
    SS_COUPON_AMT: coupon_amt,
    SS_STORE_SK: store.id,
    SS_NET_PAID: net_paid,
)
grain (ticket_number, item.id)
address memory.store_sales;

datasource store_returns (
    SR_RETURNED_DATE_SK: return_date.id,
    SR_RETURN_TIME_SK: return_time.id,
    SR_ITEM_SK: ~item.id,
    SR_CUSTOMER_SK: return_customer.id,
    SR_RETURN_AMT: return_amount,
    SR_TICKET_NUMBER: ~ticket_number,
    SR_STORE_SK: return_store.id,
    SR_NET_LOSS: return_net_loss,
    # capture if there's a row in this table
    bool(return_time.id): is_returned,
)
grain (ticket_number, item.id)
address memory.store_returns;
```

# Derived Concepts and Filtering

Now that we understand the basics of selection, let's get into the world of derivation, multi-level aggregation, and filtering. We're going to stick with store_sales data for now.

# Let's run through some examples

State Aggregation

As an example, if we move our aggregation up to the state level, we can reuse our total_sales and a new customer_count to calculate average sales.

```
select
    store_sales.customer.state,
    sum(store_sales.sales_price) as total_sales,
    count(store_sales.customer.id) as customer_count,
    total_sales / customer_count as average_sales_per_customer
order by
    average_sales_per_customer desc
limit 10;
```

Mixed aggregate

Often, we need to compare concepts at different levels of aggregation. Our queries so far have not specified a level, which means they are implicitly grouped by all dimensions in the query. We can combine different levels by explicitly aggregating "by" a concept. Tip: we've commented out some output fields with -- to streamline the result set; these are included in the projection but not returned in the final query.

```
select
    --store_sales.customer.id,
    store_sales.customer.first_name,
    --store_sales.customer.state,
    sum(store_sales.sales_price) as total_sales,
    --sum(store_sales.sales_price) by store_sales.customer.state as total_state_sales,
    total_sales / total_state_sales as fraction_of_total_state_sales
order by
    fraction_of_total_state_sales desc
limit 10;
```

Filtering

The where clause comes first in idiomatic Trilogy, reducing the size of the query space before other operations are applied. (It'll accept a where clause after the select, if you prefer that.) A where condition applies to the entire query.
```
WHERE
    store_sales.customer_demographic.education_status = 'College'
SELECT
    --store_sales.customer.id,
    store_sales.customer.first_name,
    store_sales.customer.state,
    sum(store_sales.sales_price) as total_sales,
    --sum(store_sales.sales_price) by store_sales.customer.state as total_state_sales,
    total_sales / total_state_sales as fraction_of_total_state_sales
order by
    store_sales.customer.state asc,
    fraction_of_total_state_sales desc
limit 10;
```

Nuanced Filter

When that's too blunt an instrument, selection can be further refined in two ways. A filtered concept can be created using a where clause - `filter store_sales.sales_price where store_sales.customer_demographic.education_status = 'College'`, for example - and the output of a selection can be filtered via the HAVING clause (just like SQL). So to see the % of all state sales that is coming from college-educated people, we can leave the global space unfiltered, filter sales inline, and then further refine our output rows. Tip: the "?" syntax, e.g. `store_sales.sales_price ? store_sales.customer_demographic.education_status = 'College'`, is a more concise inline filter if you're okay with shorthand!

```
SELECT
    store_sales.customer.state,
    store_sales.customer_demographic.education_status,
    sum(filter store_sales.sales_price where store_sales.customer_demographic.education_status = 'College') as college_sales,
    sum(store_sales.sales_price ? store_sales.customer_demographic.education_status = 'College') as college_sales_alt,
    --sum(store_sales.sales_price) by store_sales.customer.state as total_state_sales,
    college_sales / total_state_sales as fraction_of_total_state_sales
HAVING
    store_sales.customer_demographic.education_status = 'College'
order by
    store_sales.customer.state asc,
    fraction_of_total_state_sales desc
limit 10;
```

Find the average per-state rank by total store sales of items

More complicated derivations can be done by creating a virtual collection of rows (like a CTE in SQL), called a rowset.
A rowset is defined by prefixing `with <name> as` before a normal select query; the outputs can then be queried directly under the alias of the rowset name. Rowsets can be thought of as reusable CTEs, though there are some differences - defined properties that are not used may never be materialized, and rowsets can be used in more places than CTEs. You shouldn't need to use rowsets as often as you need to use CTEs, but they're an option.

```
with ranked_states as
select
    store_sales.item.name,
    store_sales.customer.state,
    rank store_sales.item.name
        over store_sales.customer.state
        order by sum(store_sales.sales_price) by store_sales.item.name, store_sales.customer.state desc
    as sales_rank;

select
    ranked_states.store_sales.item.name,
    avg(cast(ranked_states.sales_rank as int)) -> avg_sales_rank,
    max(cast(ranked_states.sales_rank as int)) -> max_sales_rank,
    min(cast(ranked_states.sales_rank as int)) -> min_sales_rank
order by
    avg_sales_rank desc
limit 10;
```

# Sandbox

Ready? Let's have you try out some queries!

TIP For basic queries, Trilogy should be almost identical to SQL. When in doubt, try the SQL syntax!

# Can you answer these questions? (click to show a possible answer)

* What were the sales by year in the state of CA?
* What was the average yearly sales for the state of CA?
* Which customer demographics had the most store sales in 2001, and what were the sales within that demographic in Hawaii?
* What was the most popular item bought in Massachusetts and Kentucky by customers with more than 5 orders?

The following concepts are predefined for you and can be referenced by name.

# Concept Search

There are more concepts available than we can reasonably show you. Search this box for inspiration.

WARNING Stick to querying concepts with the same root for now - e.g. store_sales.x and store_sales.y. Or, if you're feeling adventurous, read on to find out how to merge models to enable cross-namespace querying.
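If the two-step logic in the rowset example earlier is unfamiliar, here's a rough Python sketch of the same computation - rank items within each state by total sales, then aggregate each item's per-state ranks. This is an illustration of the semantics only, not how Trilogy executes; the toy data is invented.

```python
from collections import defaultdict

# Invented toy data: (state, item, sales_price)
rows = [
    ("CA", "widget", 10.0), ("CA", "gadget", 25.0),
    ("TX", "widget", 30.0), ("TX", "gadget", 5.0),
]

# Step 1 (the rowset): total sales per (state, item), then rank items within each state.
totals = defaultdict(float)
for state, item, price in rows:
    totals[(state, item)] += price

by_state = defaultdict(list)
for (state, item), total in totals.items():
    by_state[state].append((total, item))

ranks = {}  # (state, item) -> rank; 1 = highest sales in that state
for state, entries in by_state.items():
    for rank, (total, item) in enumerate(sorted(entries, reverse=True), start=1):
        ranks[(state, item)] = rank

# Step 2 (the outer select): aggregate each item's ranks across states.
per_item = defaultdict(list)
for (state, item), rank in ranks.items():
    per_item[item].append(rank)

summary = {item: (sum(r) / len(r), max(r), min(r)) for item, r in per_item.items()}
print(summary)  # item -> (avg_sales_rank, max_sales_rank, min_sales_rank)
```

Here "widget" ranks 2nd in CA and 1st in TX, so its average rank is 1.5 - the same shape of answer the Trilogy query returns per item.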
# Multiple Models

We've had some fun with store sales, but what about the rest of the dataset? A typical Trilogy script will be based on importing existing models - and these models may themselves have imports. But if you have two models you want to connect, how can you do that? There are two tools. Let's start with the one you generally don't want to use - a query merge.

# Query Merges

Query merges look like selects, but with more than one select clause plus an align statement that defines how to merge the selects. Think of this as a full outer join between the listed columns to merge the select outputs. The where, order by, and other clauses are identical. This can be useful to combine disparate concepts into a single result set row, such as showing orders by ship date and delivery date in the same row.

TIP Only use query merges when you are aligning conceptually unrelated topics into a tabular dataset.

```
SELECT <select 1>
MERGE
SELECT <select 2>
ALIGN <alias>: <select 1 concept> AND <select 2 concept>
```

# Model Merges

The preferred method is to merge the models on one or more concepts. This tells Trilogy that these two fields are the "same". For example, if you have a sales dataset and a holidays dataset, they might have the following fields:

Sales: 'order_date', 'ship_date', 'returned_date', 'order_id'
Holidays: 'date', 'holiday_name'

You want to know what sales were ordered on holidays - you would merge the holidays date into sales.order_date, and you could now easily query select holiday_name, count(order_id). If you wanted to see orders that shipped on a holiday, you'd merge them on ship date - and if you wanted to be able to query both, you could import the holidays dataset under two different names and merge them independently.

Merge on one concept: MERGE <concept 1> into <concept 2>
Merge on many: MERGE <model 1>.* into <model 2>.*

Let's try both out:

Query Merge

A MERGE clause can be used to combine multiple select statements.
When present, there must be an align clause that contains one or more named 'alignment' concepts from each sub-select. The output of each select will be merged together based on these keys, and the coalesced value will be in the newly named column. This can be used both to merge models and to combine results of multiple queries in the same model, such as showing product orders for a month by both ship date and order date in the same row.

```
SELECT
    store_sales.date.year,
    count(store_sales.ticket_number) as store_order_count
MERGE
SELECT
    web_sales.date.year,
    count(web_sales.order_number) as web_order_count
ALIGN
    report_date: store_sales.date.year, web_sales.date.year
ORDER BY
    report_date asc;
```

Model Merge

A standalone MERGE clause can be used to combine two (or all) concepts across two models, enabling them to be queried seamlessly. The syntax is `merge <concept 1> into <concept 2>`, where the left and right are the concepts to be merged. The ~ symbol in this example is an optional modifier to indicate that the values being merged in might not be complete - e.g. not every date might be in store_sales.date - and so querying for the root concept should left outer join across to the other models.

```
MERGE store_sales.date.* into ~date.*;
MERGE web_sales.date.* into ~date.*;

SELECT
    date.year,
    count(web_sales.order_number) as web_order_count,
    count(store_sales.ticket_number) as store_order_count
HAVING
    web_order_count > 0 or store_order_count > 0
ORDER BY
    date.year asc
LIMIT 100;
```

TIP Multiple merge statements can be defined between two models; queries will merge across as many concepts as are referenced in the query.

# Saving Results / ETL

Imagine you want to create tables or save the outputs of a query to power a dashboard. Trilogy supports this through the persist keyword. This keyword signifies that any table created by Trilogy is effectively a cache of a given output.
You've already defined canonical sources; if we copy that data into a new table, it's only valid until the sources change. In base Trilogy, this can be used to create or update a table, such as one powering a dashboard.

TIP In practice, most warehouses will have a finite set of 'roots' and a number of caches derived from them that are refreshed on some cadence to drive reporting, analytics, and performance. Trilogy allows you to explicitly define and manage those caches via persist statements.

Basic Persist

The first identifier after persist is the Trilogy name of the source; the second is the resource name used in the backing system (DuckDB, in this case).

```
MERGE store_sales.customer.* into customer.*;
MERGE web_sales.customer.* into customer.*;
MERGE store_sales.date.* into store_sales.date.*;
MERGE web_sales.date.* into web_sales.date.*;

PERSIST customer_fact into customer_fact from
WHERE
    date.year = 2000
    AND customer.state = 'VT'
SELECT
    customer.id,
    date.year,
    sum(store_sales.sales_price) as total_store_sales,
    sum(web_sales.sales_price) as total_web_sales,
    total_store_sales + total_web_sales as total_sales;
```

# TrilogyT

Interested in using Trilogy for data transformation as well as data access? TrilogyT is a Trilogy integration that adds additional keywords to support ETL workflows and integrates closely with dbt to support a full data warehouse workflow. You can read more in the TrilogyT docs.

# Benefits of A Semantic Layer

A semantic layer is a business representation of data that offers a unified, consolidated view of data across an organization. Trilogy is a semantic layer - the abstract expression of relationships between concepts enables a consistent and coherent view of the data, and the separation of the model from the underlying data enables the model to evolve independently of the underlying data.
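As a loose illustration of that separation (a hypothetical sketch, not Trilogy's actual implementation), a semantic layer can be thought of as a mapping from stable concept names to physical SQL expressions: queries are written against the names, and a schema change only requires updating the mapping, not every query. All names, table aliases, and join keys below are invented for the example.

```python
# Hypothetical concept registry: stable names on the left,
# physical bindings (which may change) on the right.
CONCEPTS = {
    "customer.state": "c.c_state",
    "store_sales.sales_price": "ss.ss_sales_price",
}

def generate_sql(select_concepts, aggregate):
    """Render a toy grouped query from concept names."""
    # Resolve each requested concept to its current physical expression.
    dims = [CONCEPTS[c] for c in select_concepts]
    measure = f"sum({CONCEPTS[aggregate]})"
    return (
        f"SELECT {', '.join(dims)}, {measure} AS total "
        f"FROM store_sales ss "
        f"JOIN customer c ON ss.ss_customer_sk = c.c_customer_sk "
        f"GROUP BY {', '.join(dims)}"
    )

print(generate_sql(["customer.state"], "store_sales.sales_price"))
```

If the physical column `c_state` is later renamed, only the CONCEPTS entry changes; every query written against `customer.state` keeps working - which is the property the paragraph above describes.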
This makes Trilogy a natural fit for any case where you want to operate at this higher level of abstraction - a popular one these days being GenAI. It's much easier for a GenAI model to identify the appropriate concepts for a query and let Trilogy map them to the underlying data model; and as with a human-written query, this makes the result robust to changes and evolution of your data model.

# Trilogy-NLP / GenAI

Trilogy-NLP is a Trilogy integration that adds natural language processing to Trilogy, enabling users to write queries in natural language and have them translated to SQL via the Trilogy semantic model. The simplified abstraction of a Trilogy model is a natural fit for generative AI models, bypassing potential hallucination issues and SQL correctness challenges by focusing the model solely on interpreting the user query and mapping it to the higher-level semantic layer.

WARNING Trilogy-NLP is in active development and may not always return correct results. As with other generative AI tools, experimenting to find which prompts produce the best results is recommended.

Trilogy-NLP can be used in an interactive analytics context, as a helper for users to write queries, or as a backend for a chatbot or other natural language interface to a database. You can experiment with querying the TPC-DS dataset with Trilogy-NLP below.

TIP You can view the generated query just like with Trilogy - Trilogy-NLP is just a different way to write the same queries, and it will always show its work.

# Query Comparisons

The TPC-DS benchmark comes with built-in queries - we can use these to explore how idiomatic SQL compares to Trilogy in practice.

WARNING This section is a work in progress and will be expanded over time.

# Query 3 - Brand Sales

This is a fairly straightforward query in both languages.
# SQL

```
SELECT dt.d_year,
       item.i_brand_id brand_id,
       item.i_brand brand,
       sum(ss_ext_sales_price) sum_agg
FROM date_dim dt,
     store_sales,
     item
WHERE dt.d_date_sk = store_sales.ss_sold_date_sk
  AND store_sales.ss_item_sk = item.i_item_sk
  AND item.i_manufact_id = 128
  AND dt.d_moy = 11
GROUP BY dt.d_year,
         item.i_brand,
         item.i_brand_id
ORDER BY dt.d_year,
         sum_agg DESC,
         brand_id
LIMIT 100;
```

# Trilogy

Trilogy looks very similar, without the group by.

```
import store_sales as store_sales;

WHERE
    store_sales.date.month_of_year = 11
    and store_sales.item.manufacturer_id = 128
SELECT
    store_sales.date.year,
    store_sales.item.brand_id,
    store_sales.item.brand_name,
    sum(store_sales.ext_sales_price) -> sum_agg
ORDER BY
    store_sales.date.year asc,
    sum_agg desc,
    store_sales.item.brand_id asc
limit 100;
```

# Query 1 - Customer Returns

This is a more complicated query.

# SQL

```
WITH customer_total_return AS
  (SELECT sr_customer_sk AS ctr_customer_sk,
          sr_store_sk AS ctr_store_sk,
          sum(sr_return_amt) AS ctr_total_return
   FROM store_returns,
        date_dim
   WHERE sr_returned_date_sk = d_date_sk
     AND d_year = 2000
   GROUP BY sr_customer_sk,
            sr_store_sk)
SELECT c_customer_id
FROM customer_total_return ctr1,
     store,
     customer
WHERE ctr1.ctr_total_return > (SELECT avg(ctr_total_return) * 1.2
                               FROM customer_total_return ctr2
                               WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
  AND s_store_sk = ctr1.ctr_store_sk
  AND s_state = 'TN'
  AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id
LIMIT 100;
```

# Trilogy

Here, we'll use an inline calculation of total returns by customer / store, and then reuse it immediately to calculate the average customer returns by store. Since these are newly derived aggregates, we'll filter them in the having clause.
```
import store_returns as returns;

WHERE
    returns.store.state = 'TN'
    and returns.return_date.year = 2000
SELECT
    returns.customer.text_id,
    --sum(returns.return_amount) -> total_returns, # returns by store
    --returns.store.id, # this query implicitly groups by text_id and store, but only returns text_id
    --avg(total_returns) by returns.store.id -> avg_store_returns # average of the previous calc
HAVING
    total_returns > (1.2 * avg_store_returns)
ORDER BY
    returns.customer.text_id asc
limit 100;
```
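To make the shared logic of both versions of Query 1 concrete, here is a rough Python sketch of what they compute: total returns per (customer, store), the average customer total per store, and then the customers exceeding 1.2x their store's average. The toy data is invented for illustration.

```python
from collections import defaultdict

# Invented toy data: (customer, store, return_amount)
rows = [
    ("alice", "s1", 120.0), ("bob", "s1", 40.0),
    ("carol", "s1", 50.0), ("dave", "s2", 10.0),
]

# The CTE / inline aggregate: total returns per (customer, store).
totals = defaultdict(float)
for customer, store, amount in rows:
    totals[(customer, store)] += amount

# The correlated subquery / "by store" aggregate: average customer total per store.
store_totals = defaultdict(list)
for (customer, store), total in totals.items():
    store_totals[store].append(total)
store_avg = {s: sum(v) / len(v) for s, v in store_totals.items()}

# The HAVING clause: customers whose total exceeds 1.2x their store's average.
flagged = sorted(
    customer
    for (customer, store), total in totals.items()
    if total > 1.2 * store_avg[store]
)
print(flagged)
```

Store s1's average customer return is 70.0, so only alice (120.0 > 84.0) is flagged - the same filter both queries express, one via a correlated subquery and one via a `by`-scoped aggregate in a having clause.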