# Demo - Exploring the TPC-DS Dataset

This demo uses the popular TPC-DS dataset for transactional database benchmarking. You can read more about Trilogy and this benchmark tool here. No benchmarking right now, though - we just care that DuckDB can natively produce a nice representative data warehouse for TPC-DS with a built-in extension, and that this dataset gives us a good way to explore the language syntax.

TIP This demo uses an in-memory DuckDB database that may take some time to start for your first query. Trilogy is database agnostic; the same syntax will work on any backend such as Postgres, BigQuery, or Snowflake.

# Our First Queries

TPC-DS creates a nice small warehouse: 17 dimension tables and 7 fact tables.

# Dimensions

call_center, catalog_page, customer, customer_address, customer_demographics, date_dim, household_demographics, income_band, item, promotion, reason, ship_mode, store, time_dim, warehouse, web_page, web_site

# Fact Tables

catalog_sales, catalog_returns, inventory, store_sales, store_returns, web_sales, web_returns

That's a lot to work with! Fortunately, someone has already defined a Trilogy model for TPC-DS here; we'll be using that directly. We'll discuss models in more detail in a little bit. To start out, let's focus on these Trilogy imports. You'll notice that these map closely to fact tables; that's not a coincidence.
It's typical for a user query to be centered on one or more facts, so those are natural places to start a modeling hierarchy. Trilogy imports let you expose many entrypoints into the same set of table relationships to support intuitive query patterns.

TIP We're going to hide the import statements from all the queries we run here. We'll work in an environment with all of them loaded already: customer, store_sales, web_sales, catalog_sales.

Basic Select

A basic select from our customer import to see what our names and IDs look like. Click the SQL tab to see the generated SQL.

```
select
    customer.id,
    customer.first_name,
    customer.last_name
limit 5;
```

Another View, Similar Data

The semantic model encodes relationships and supports imports; a hierarchical selection like `store_sales.customer` references the customer model imported into and related to the store_sales model.

```
select
    store_sales.customer.id,
    store_sales.customer.first_name,
    store_sales.customer.last_name,
    store_sales.ticket_number
limit 5;
```

Sales By Customer

This becomes more relevant as we start to pull in fact information. Just like SQL, you can define new information in a query, such as creating a total_sales calculation. All derived values must have an explicit name and are then reusable further on in the query.

```
select
    store_sales.customer.id,
    store_sales.customer.first_name,
    sum(store_sales.sales_price) as total_sales
order by
    total_sales desc
limit 10;
```

# The Model

We mentioned earlier that we had a model defined that was letting us run these queries. Let's peek under the hood. Modeling is the heart and "art" of Trilogy: defining names and relationships that enable intuitive access to data. Models will define some concepts, bind some concepts, and often import other models. While models can be 1-1 to tables and often are, they don't need to be - and this can be useful when refactoring, extending, or simplifying models.
TIP A model describes the relationships between Trilogy concepts and the underlying data. It is a contract that can be used to generate queries.

TIP A model definition is Trilogy - the access language and the modeling language are the same. You can define and extend a model inline.

This definition is for the store_sales references we've been querying above. The full TPC-DS model contains multiple files like this.

```
import item as item;
import date as date;
import date as return_date;
import time as time;
import time as return_time;
import customer as customer;
import customer as return_customer;
import promotion as promotion;
import customer_demographic as customer_demographic;
import store as store;
import store as return_store;

key ticket_number int;
property <ticket_number, item.id>.quantity int;
property <ticket_number, item.id>.sales_price float;
property <ticket_number, item.id>.list_price float;
property <ticket_number, item.id>.ext_sales_price float;
property <ticket_number, item.id>.ext_wholesale_cost float;
property <ticket_number, item.id>.ext_list_price float;
property <ticket_number, item.id>.ext_discount_amount float;
property <ticket_number, item.id>.coupon_amt float;
property <ticket_number, item.id>.net_profit float;
property <ticket_number, item.id>.is_returned bool;
property <ticket_number, item.id>.net_paid float;
property <ticket_number, item.id>.return_amount float?;
property <ticket_number, item.id>.return_net_loss float?;

auto profit <- ext_list_price - ext_wholesale_cost - ext_discount_amount + ext_sales_price;

datasource store_sales (
    SS_SOLD_DATE_SK: date.id,
    SS_SOLD_TIME_SK: time.id,
    SS_CUSTOMER_SK: customer.id,
    SS_CDEMO_SK: customer_demographic.id,
    SS_TICKET_NUMBER: ticket_number,
    SS_ITEM_SK: item.id,
    SS_SALES_PRICE: sales_price,
    SS_LIST_PRICE: list_price,
    SS_EXT_SALES_PRICE: ext_sales_price,
    SS_EXT_LIST_PRICE: ext_list_price,
    SS_EXT_WHOLESALE_COST: ext_wholesale_cost,
    SS_EXT_DISCOUNT_AMT: ext_discount_amount,
    SS_NET_PROFIT: net_profit,
    SS_PROMO_SK: promotion.id,
    SS_QUANTITY: quantity,
    SS_COUPON_AMT: coupon_amt,
    SS_STORE_SK: store.id,
    SS_NET_PAID: net_paid,
)
grain (ticket_number, item.id)
address memory.store_sales;

datasource store_returns (
    SR_RETURNED_DATE_SK: return_date.id,
    SR_RETURN_TIME_SK: return_time.id,
    SR_ITEM_SK: ~item.id,
    SR_CUSTOMER_SK: return_customer.id,
    SR_RETURN_AMT: return_amount,
    SR_TICKET_NUMBER: ~ticket_number,
    SR_STORE_SK: return_store.id,
    SR_NET_LOSS: return_net_loss,
    # capture if there's a row in this table
    bool(return_time.id): is_returned,
)
grain (ticket_number, item.id)
address memory.store_returns;
```

# Derived Concepts and Filtering

Now that we understand the basics of selection, let's get into the world of derivation, multi-level aggregation, and filtering. We're going to stick with store_sales data for now.

# Let's run through some examples

State Aggregation

As an example, if we move our aggregation up to the state level, we can reuse our total_sales and a new customer_count to calculate average sales.

```
select
    store_sales.customer.state,
    sum(store_sales.sales_price) as total_sales,
    count(store_sales.customer.id) as customer_count,
    total_sales / customer_count as average_sales_per_customer
order by
    average_sales_per_customer desc
limit 10;
```

Mixed aggregate

Often, we need to compare concepts at different levels of aggregation. Our queries so far have not specified a level, which means they are implicitly grouped by all dimensions in the query. We can combine different levels by explicitly aggregating "by" a concept. Tip: we've commented out some output fields with -- to streamline the result set; these are included in the projection but not returned in the final query.

```
select
    --store_sales.customer.id,
    store_sales.customer.first_name,
    --store_sales.customer.state,
    sum(store_sales.sales_price) as total_sales,
    --sum(store_sales.sales_price) by store_sales.customer.state as total_state_sales,
    total_sales / total_state_sales as fraction_of_total_state_sales
order by
    fraction_of_total_state_sales desc
limit 10;
```

Filtering

The where clause comes first in idiomatic Trilogy, reducing the size of the query space before other operations are applied. (It'll accept a where clause after the select, if you prefer that.) A where condition applies to the entire query.
```
WHERE
    store_sales.customer_demographic.education_status = 'College'
SELECT
    --store_sales.customer.id,
    store_sales.customer.first_name,
    store_sales.customer.state,
    sum(store_sales.sales_price) as total_sales,
    --sum(store_sales.sales_price) by store_sales.customer.state as total_state_sales,
    total_sales / total_state_sales as fraction_of_total_state_sales
order by
    store_sales.customer.state asc,
    fraction_of_total_state_sales desc
limit 10;
```

Nuanced Filter

When that's too blunt an instrument, selection can be further refined in two ways. A filtered concept can be created using a where clause - `filter store_sales.sales_price where store_sales.customer_demographic.education_status = 'College'`, for example - and the output of a selection can be filtered via the HAVING clause (just like SQL). So to see the % of all state sales that is coming from college-educated people, we can leave the global space unfiltered, filter sales inline, and then further refine our output rows. Tip: the "?" syntax, e.g. `store_sales.sales_price ? store_sales.customer_demographic.education_status = 'College'`, is a more concise inline filter if you're okay with shorthand!

```
SELECT
    store_sales.customer.state,
    store_sales.customer_demographic.education_status,
    sum(filter store_sales.sales_price where store_sales.customer_demographic.education_status = 'College') as college_sales,
    sum(store_sales.sales_price ? store_sales.customer_demographic.education_status = 'College') as college_sales_alt,
    --sum(store_sales.sales_price) by store_sales.customer.state as total_state_sales,
    college_sales / total_state_sales as fraction_of_total_state_sales
HAVING
    store_sales.customer_demographic.education_status = 'College'
order by
    store_sales.customer.state asc,
    fraction_of_total_state_sales desc
limit 10;
```

Find the average per-state rank by total store sales of items

More complicated derivations can be done by creating a virtual collection of rows (like a CTE in SQL), called a rowset.
A rowset is defined by prefixing `with <name> as` before a normal select query; the outputs can then be queried directly under the alias of the rowset name. Rowsets can be thought of as reusable CTEs, though there are some differences - defined properties that are not used may never be materialized, and rowsets can be used in more places than CTEs. You shouldn't need to use rowsets as often as you need to use CTEs, but they're an option.

```
with ranked_states as
select
    store_sales.item.name,
    store_sales.customer.state,
    rank store_sales.item.name
        over store_sales.customer.state
        order by sum(store_sales.sales_price) by store_sales.item.name, store_sales.customer.state desc
    as sales_rank;

select
    ranked_states.store_sales.item.name,
    avg(cast(ranked_states.sales_rank as int)) -> avg_sales_rank,
    max(cast(ranked_states.sales_rank as int)) -> max_sales_rank,
    min(cast(ranked_states.sales_rank as int)) -> min_sales_rank
order by
    avg_sales_rank desc
limit 10;
```

# Sandbox

Ready? Let's have you try out some queries!

TIP For basic queries, Trilogy should be almost identical to SQL. When in doubt, try the SQL syntax!

# Can you answer these questions? (click to show a possible answer)

* What were the sales by year in the state of CA?
* What was the average yearly sales for the state of CA?
* Which customer demographics had the most store sales in 2001, and what were the sales within that demographic in Hawaii?
* What was the most popular item bought in Massachusetts and Kentucky by customers with more than 5 orders?

The following concepts are predefined for you and can be referenced by name.

# Concept Search

There are more concepts available than we can reasonably show you. Search this box for inspiration.

WARNING Stick to querying concepts with the same root for now - e.g. store_sales.x and store_sales.y. Or, if you're feeling adventurous, read on to find out how to merge models to enable cross-namespace querying.
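If the two-step logic in the rowset example earlier is unfamiliar, here's a rough Python sketch of the same computation - rank items within each state by total sales, then aggregate each item's per-state ranks. This is an illustration of the semantics only, not how Trilogy executes; the toy data is invented.

```python
from collections import defaultdict

# Invented toy data: (state, item, sales_price)
rows = [
    ("CA", "widget", 10.0), ("CA", "gadget", 25.0),
    ("TX", "widget", 30.0), ("TX", "gadget", 5.0),
]

# Step 1 (the rowset): total sales per (state, item), then rank items within each state.
totals = defaultdict(float)
for state, item, price in rows:
    totals[(state, item)] += price

by_state = defaultdict(list)
for (state, item), total in totals.items():
    by_state[state].append((total, item))

ranks = {}  # (state, item) -> rank; 1 = highest sales in that state
for state, entries in by_state.items():
    for rank, (total, item) in enumerate(sorted(entries, reverse=True), start=1):
        ranks[(state, item)] = rank

# Step 2 (the outer select): aggregate each item's ranks across states.
per_item = defaultdict(list)
for (state, item), rank in ranks.items():
    per_item[item].append(rank)

summary = {item: (sum(r) / len(r), max(r), min(r)) for item, r in per_item.items()}
print(summary)  # item -> (avg_sales_rank, max_sales_rank, min_sales_rank)
```

Here "widget" ranks 2nd in CA and 1st in TX, so its average rank is 1.5 - the same shape of answer the Trilogy query returns per item.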
# Multiple Models

We've had some fun with store sales, but what about the rest of the dataset? A typical Trilogy script will be based on importing existing models - and these models may themselves have imports. But if you have two models you want to connect, how can you do that? There are two tools. Let's start with the one you generally don't want to use - a query merge.

# Query Merges

Query merges look like selects, but with more than one select clause plus an align statement that defines how to merge the selects. Think of this as a full outer join between the listed columns to merge the select outputs. The where, order by, and other clauses are identical. This can be useful to combine disparate concepts into a single result set row, such as showing orders by ship date and delivery date in the same row.

TIP Only use query merges when you are aligning conceptually unrelated topics into a tabular dataset.

```
SELECT <select 1>
MERGE
SELECT <select 2>
ALIGN <alias>: <select 1 concept> AND <select 2 concept>
```

# Model Merges

The preferred method is to merge the models on one or more concepts. This tells Trilogy that these two fields are the "same". For example, if you have a sales dataset and a holidays dataset, they might have the following fields:

Sales: 'order_date', 'ship_date', 'returned_date', 'order_id'
Holidays: 'date', 'holiday_name'

You want to know what sales were ordered on holidays - you would merge the holidays date into sales.order_date, and you could now easily query select holiday_name, count(order_id). If you wanted to see orders that shipped on a holiday, you'd merge them on ship date - and if you wanted to be able to query both, you could import the holidays dataset under two different names and merge them independently.

Merge on one concept: MERGE <concept 1> into <concept 2>
Merge on many: MERGE <model 1>.* into <model 2>.*

Let's try both out:

Query Merge

A MERGE clause can be used to combine multiple select statements.
When present, there must be an align clause that contains one or more named 'alignment' concepts from each sub-select. The output of each select will be merged together based on these keys, and the coalesced value will be in the newly named column. This can be used both to merge models and to combine results of multiple queries in the same model, such as showing product orders for a month by both ship date and order date in the same row.

```
SELECT
    store_sales.date.year,
    count(store_sales.ticket_number) as store_order_count
MERGE
SELECT
    web_sales.date.year,
    count(web_sales.order_number) as web_order_count
ALIGN
    report_date: store_sales.date.year, web_sales.date.year
ORDER BY
    report_date asc;
```

Model Merge

A standalone MERGE clause can be used to combine two (or all) concepts across two models, enabling them to be queried seamlessly. The syntax is `merge <concept 1> into <concept 2>`, where the left and right are the concepts to be merged. The ~ symbol in this example is an optional modifier to indicate that the values being merged in might not be complete - e.g. not every date might be in store_sales.date - and so querying for the root concept should left outer join across to the other models.

```
MERGE store_sales.date.* into ~date.*;
MERGE web_sales.date.* into ~date.*;

SELECT
    date.year,
    count(web_sales.order_number) as web_order_count,
    count(store_sales.ticket_number) as store_order_count
HAVING
    web_order_count > 0 or store_order_count > 0
ORDER BY
    date.year asc
LIMIT 100;
```

TIP Multiple merge statements can be defined between two models; queries will merge across as many concepts as are referenced in the query.

# Saving Results / ETL

Imagine you want to create tables or save the outputs of a query to power a dashboard. Trilogy supports this through the persist keyword. This keyword signifies that any table created by Trilogy is effectively a cache of a given output.
You've already defined canonical sources; if we copy that data into a new table, it's only valid until the sources change. In base Trilogy, this can be used to create or update a table, such as one powering a dashboard.

TIP In practice, most warehouses will have a finite set of 'roots' and a number of caches derived from them that are refreshed on some cadence to drive reporting, analytics, and performance. Trilogy allows you to explicitly define and manage those caches via persist statements.

Basic Persist

The first identifier after persist is the Trilogy name of the source; the second is the resource name used in the backing system (DuckDB, in this case).

```
MERGE store_sales.customer.* into customer.*;
MERGE web_sales.customer.* into customer.*;
MERGE store_sales.date.* into store_sales.date.*;
MERGE web_sales.date.* into web_sales.date.*;

PERSIST customer_fact into customer_fact from
WHERE
    date.year = 2000
    AND customer.state = 'VT'
SELECT
    customer.id,
    date.year,
    sum(store_sales.sales_price) as total_store_sales,
    sum(web_sales.sales_price) as total_web_sales,
    total_store_sales + total_web_sales as total_sales;
```

# TrilogyT

Interested in using Trilogy for data transformation as well as data access? TrilogyT is a Trilogy integration that adds additional keywords to support ETL workflows and integrates closely with dbt to support a full data warehouse workflow. You can read more in the TrilogyT docs.

# Benefits of A Semantic Layer

A semantic layer is a business representation of data that offers a unified, consolidated view of data across an organization. Trilogy is a semantic layer - the abstract expression of relationships between concepts enables a consistent and coherent view of the data, and the separation of the model from the underlying data enables the model to evolve independently of the underlying data.
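As a loose illustration of that separation (a hypothetical sketch, not Trilogy's actual implementation), a semantic layer can be thought of as a mapping from stable concept names to physical SQL expressions: queries are written against the names, and a schema change only requires updating the mapping, not every query. All names, table aliases, and join keys below are invented for the example.

```python
# Hypothetical concept registry: stable names on the left,
# physical bindings (which may change) on the right.
CONCEPTS = {
    "customer.state": "c.c_state",
    "store_sales.sales_price": "ss.ss_sales_price",
}

def generate_sql(select_concepts, aggregate):
    """Render a toy grouped query from concept names."""
    # Resolve each requested concept to its current physical expression.
    dims = [CONCEPTS[c] for c in select_concepts]
    measure = f"sum({CONCEPTS[aggregate]})"
    return (
        f"SELECT {', '.join(dims)}, {measure} AS total "
        f"FROM store_sales ss "
        f"JOIN customer c ON ss.ss_customer_sk = c.c_customer_sk "
        f"GROUP BY {', '.join(dims)}"
    )

print(generate_sql(["customer.state"], "store_sales.sales_price"))
```

If the physical column `c_state` is later renamed, only the CONCEPTS entry changes; every query written against `customer.state` keeps working - which is the property the paragraph above describes.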
This makes Trilogy a natural fit for any case where you want to operate at this higher level of abstraction - a popular one these days being GenAI. It's much easier for a GenAI model to identify the appropriate concepts for a query and let Trilogy map them to the underlying data model; and as with a human-written query, this makes the result robust to changes and evolution of your data model.

# Trilogy-NLP / GenAI

Trilogy-NLP is a Trilogy integration that adds natural language processing to Trilogy, enabling users to write queries in natural language and have them translated to SQL via the Trilogy semantic model. The simplified abstraction of a Trilogy model is a natural fit for generative AI models, bypassing potential hallucination issues and SQL correctness challenges by focusing the model solely on interpreting the user query and mapping it to the higher-level semantic layer.

WARNING Trilogy-NLP is in active development and may not always return correct results. As with other generative AI tools, experimenting to find which prompts produce the best results is recommended.

Trilogy-NLP can be used in an interactive analytics context, as a helper for users to write queries, or as a backend for a chatbot or other natural language interface to a database. You can experiment with querying the TPC-DS dataset with Trilogy-NLP below.

TIP You can view the generated query just like with Trilogy - Trilogy-NLP is just a different way to write the same queries, and it will always show its work.

# Query Comparisons

The TPC-DS benchmark comes with built-in queries - we can use these to explore how idiomatic SQL compares to Trilogy in practice.

WARNING This section is a work in progress and will be expanded over time.

# Query 3 - Brand Sales

This is a fairly straightforward query in both languages.
# SQL

```
SELECT dt.d_year,
       item.i_brand_id brand_id,
       item.i_brand brand,
       sum(ss_ext_sales_price) sum_agg
FROM date_dim dt,
     store_sales,
     item
WHERE dt.d_date_sk = store_sales.ss_sold_date_sk
  AND store_sales.ss_item_sk = item.i_item_sk
  AND item.i_manufact_id = 128
  AND dt.d_moy = 11
GROUP BY dt.d_year,
         item.i_brand,
         item.i_brand_id
ORDER BY dt.d_year,
         sum_agg DESC,
         brand_id
LIMIT 100;
```

# Trilogy

Trilogy looks very similar, without the group by.

```
import store_sales as store_sales;

WHERE
    store_sales.date.month_of_year = 11
    and store_sales.item.manufacturer_id = 128
SELECT
    store_sales.date.year,
    store_sales.item.brand_id,
    store_sales.item.brand_name,
    sum(store_sales.ext_sales_price) -> sum_agg
ORDER BY
    store_sales.date.year asc,
    sum_agg desc,
    store_sales.item.brand_id asc
limit 100;
```

# Query 1 - Customer Returns

This is a more complicated query.

# SQL

```
WITH customer_total_return AS
  (SELECT sr_customer_sk AS ctr_customer_sk,
          sr_store_sk AS ctr_store_sk,
          sum(sr_return_amt) AS ctr_total_return
   FROM store_returns,
        date_dim
   WHERE sr_returned_date_sk = d_date_sk
     AND d_year = 2000
   GROUP BY sr_customer_sk,
            sr_store_sk)
SELECT c_customer_id
FROM customer_total_return ctr1,
     store,
     customer
WHERE ctr1.ctr_total_return > (SELECT avg(ctr_total_return) * 1.2
                               FROM customer_total_return ctr2
                               WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
  AND s_store_sk = ctr1.ctr_store_sk
  AND s_state = 'TN'
  AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id
LIMIT 100;
```

# Trilogy

Here, we'll use an inline calculation of total returns by customer / store, and then reuse it immediately to calculate the average customer returns by store. Since these are newly derived aggregates, we'll filter them in the having clause.
```
import store_returns as returns;

WHERE
    returns.store.state = 'TN'
    and returns.return_date.year = 2000
SELECT
    returns.customer.text_id,
    --sum(returns.return_amount) -> total_returns, # returns by store
    --returns.store.id, # this query implicitly groups by text_id and store, but only returns text_id
    --avg(total_returns) by returns.store.id -> avg_store_returns # average of the previous calc
HAVING
    total_returns > (1.2 * avg_store_returns)
ORDER BY
    returns.customer.text_id asc
limit 100;
```
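To make the shared logic of both versions of Query 1 concrete, here is a rough Python sketch of what they compute: total returns per (customer, store), the average customer total per store, and then the customers exceeding 1.2x their store's average. The toy data is invented for illustration.

```python
from collections import defaultdict

# Invented toy data: (customer, store, return_amount)
rows = [
    ("alice", "s1", 120.0), ("bob", "s1", 40.0),
    ("carol", "s1", 50.0), ("dave", "s2", 10.0),
]

# The CTE / inline aggregate: total returns per (customer, store).
totals = defaultdict(float)
for customer, store, amount in rows:
    totals[(customer, store)] += amount

# The correlated subquery / "by store" aggregate: average customer total per store.
store_totals = defaultdict(list)
for (customer, store), total in totals.items():
    store_totals[store].append(total)
store_avg = {s: sum(v) / len(v) for s, v in store_totals.items()}

# The HAVING clause: customers whose total exceeds 1.2x their store's average.
flagged = sorted(
    customer
    for (customer, store), total in totals.items()
    if total > 1.2 * store_avg[store]
)
print(flagged)
```

Store s1's average customer return is 70.0, so only alice (120.0 > 84.0) is flagged - the same filter both queries express, one via a correlated subquery and one via a `by`-scoped aggregate in a having clause.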