Queries¶
Queries define what data to fetch from your data sources. Dataface supports four types of queries; most projects start with SQL. Semantic layer (MetricFlow) queries and dbt model queries are optional enhancements for teams that already use dbt.
- SQL queries - Direct SQL against database tables
- HTTP / API queries - Fetch JSON data from REST endpoints
- Semantic layer queries - Use dbt MetricFlow metrics and dimensions (optional, advanced)
- dbt model queries - Query dbt models directly (optional, no MetricFlow required)
Basic Query Structure¶
Every query has a unique name and defines what data to fetch. Query names must be unique within a file and across all imported files. Here is the SQL-first example that most users start with:
Important: SQL queries must explicitly specify both sql and profile fields. The profile field tells Dataface which dbt profile to use for the database connection.
```yaml
queries:
  sales:
    sql: |
      SELECT
        DATE_TRUNC('month', created_at) AS month,
        region,
        SUM(revenue) AS total_revenue,
        COUNT(*) AS order_count
      FROM orders
      WHERE status = 'completed'
        AND {{ filter('region', region) }}
        AND {{ filter_date_range('created_at', date_range) }}
      GROUP BY 1, 2
    profile: my_postgres  # Required: dbt profile name
```
Query Types¶
1. SQL Queries (Baseline)¶
SQL queries require two fields:
- sql: The SQL query string (supports Jinja templating)
- profile: The dbt profile name to use for database connection
Use the pipe (|) after sql: to start a YAML literal block. Everything you indent under it becomes one multiline string, so you can paste your SQL verbatim and keep the formatting, line breaks, and indentation intact.
```yaml
queries:
  sales:
    sql: |
      SELECT
        DATE_TRUNC('month', created_at) as month,
        region,
        SUM(revenue) as total_revenue,
        COUNT(*) as order_count
      FROM orders
      WHERE status = 'completed'
        AND {{ filter('region', region) }}
        AND {{ filter_date_range('created_at', date_range) }}
      GROUP BY 1, 2
      ORDER BY 1 DESC
    profile: my_postgres  # Required: dbt profile name
    target: prod          # Optional: dbt target name (defaults to 'dev')
```
Database Connection via dbt Profiles¶
Dataface uses dbt's profile system for SQL database connections. This means:
- ✅ No separate connection configuration needed - Dataface reads your existing `profiles.yml` file
- ✅ Works with all dbt-supported databases - Postgres, Snowflake, BigQuery, Redshift, DuckDB, etc.
- ✅ Leverages dbt's adapter system - Handles database-specific quirks automatically
- ✅ Same credentials as dbt - Uses your existing database connections
Setting up profiles:
- Create or edit `~/.dbt/profiles.yml` (or `profiles.yml` in your project directory)
- Define your database connection using dbt's profile format:
```yaml
# ~/.dbt/profiles.yml
my_postgres:
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: myuser
      password: mypass
      dbname: mydb
      schema: public
    prod:
      type: postgres
      host: prod.example.com
      port: 5432
      user: prod_user
      password: "{{ env_var('DB_PASSWORD') }}"
      dbname: analytics
      schema: public
  target: dev  # Default target
```
- Reference the profile in your queries using the `profile` field (and optionally `target`):
```yaml
queries:
  sales:
    sql: SELECT * FROM orders
    profile: my_postgres  # Uses 'my_postgres' profile
    target: prod          # Uses 'prod' target (defaults to 'dev' if not specified)
```
For more information:

- See Database Connections for Dataface-specific details
- See dbt's profile documentation for complete profile configuration options
- See dbt adapter documentation for database-specific connection parameters
Dataface also supports dbt MetricFlow semantic layer queries and dbt model queries as optional extras. Use them when your dbt project already exposes metrics or models.
Type Inference: Query types are automatically inferred from the keys present:
- sql: → SQL Query (requires profile field)
- metrics: and/or dimensions: → Semantic Layer Query
- model: and columns: → dbt Model Query
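For example, the three queries below would be inferred as a SQL query, a semantic layer query, and a dbt model query respectively (the metric, dimension, and model names are reused from examples elsewhere on this page):

```yaml
queries:
  raw_sales:                    # sql: present → SQL query (profile required)
    sql: SELECT * FROM orders
    profile: my_postgres
  semantic_sales:               # metrics:/dimensions: present → semantic layer query
    metrics: [total_revenue]
    dimensions: [month]
  model_sales:                  # model: + columns: present → dbt model query
    model: "ref('fct_orders')"
    columns: [month, revenue]
```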
Use cases:
- Most common use case - direct SQL queries
- Full control over query logic
- Works with any dbt-supported database
- Requires dbt profile configuration (minimal setup - just profiles.yml)
2. HTTP / API Queries¶
Fetch data from any JSON API. This is useful for integrating external data sources (weather, stock prices) or calling your own services (ML models, Jupyter notebooks exposed as endpoints).
```yaml
queries:
  prediction:
    type: http
    url: "https://api.model-service.com/predict"
    method: POST
    headers:
      Authorization: "Bearer {{ env.MODEL_API_KEY }}"
    body:
      features:
        region: "{{ region }}"
        date: "{{ date_range.start }}"
```
Type Inference: Detected by the presence of url: or explicit type: http.
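That means `type: http` can be omitted when `url:` is present. A minimal sketch (the query name and endpoint are hypothetical); add an explicit `type: http` if you prefer to be unambiguous:

```yaml
queries:
  exchange_rates:
    # No explicit type: needed - the url: key marks this as an HTTP query
    url: "https://api.example.com/rates"
```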
Use cases:

- Calling ML inference endpoints
- Fetching data from 3rd party APIs
- Triggering server-side workflows (if read-only)
3. Semantic Layer Queries (MetricFlow) - Optional¶
Use dbt MetricFlow metrics and dimensions (requires dbt Semantic Layer setup):
```yaml
queries:
  semantic_sales:
    metrics: [total_revenue, order_count]  # Semantic layer metrics
    dimensions: [month, region]            # Semantic layer dimensions
    filters:
      region: region
      order_date: "{{ date_range }}"
```
Use cases:

- Teams already using dbt's semantic layer
- Consistent metric definitions across dashboards
- Automatic metric calculation logic
- Leveraging existing MetricFlow setup
Note:

- Requires dbt Semantic Layer (MetricFlow) setup
- All metric names must exist in your dbt Semantic Layer
- See MetricFlow Guide for details on how it works
4. dbt Model Queries - Optional¶
Query dbt models directly (requires dbt, but no MetricFlow):
```yaml
queries:
  model_sales:
    model: "ref('fct_orders')"  # Reference dbt model
    columns: [month, region, revenue, order_count]
    filters:
      region: region
      created_at: "{{ date_range }}"
```
Use cases:

- Teams using dbt models but not MetricFlow semantic layer
- Direct column access without metric definitions
- Simpler queries that don't need semantic layer abstraction
Note: Requires dbt setup, but not MetricFlow
Field Reference¶
HTTP / API Queries¶
```yaml
queries:
  <query_name>:
    type: http                       # Required
    url: string                      # Required: URL (supports Jinja)
    method: GET | POST               # Optional: Default GET
    headers: Record<string, string>  # Optional: HTTP headers
    params: Record<string, string>   # Optional: Query parameters
    body: any                        # Optional: JSON body (for POST)
```
Semantic Layer Queries¶
```yaml
queries:
  <query_name>:              # Unique identifier
    metrics: [string]        # Required
    dimensions: [string]     # Optional: for grouping
    filters:                 # Optional
      <field>: <expression>
    time_grain: day | week | month | quarter | year
    limit: number            # Optional: max rows
```
dbt Model Queries¶
```yaml
queries:
  <query_name>:
    model: string            # Required: e.g., "ref('orders')"
    columns: [string]        # Required
    filters:                 # Optional
      <field>: <expression>
    limit: number            # Optional: max rows
```
Raw SQL Queries¶
```yaml
queries:
  <query_name>:
    sql: string      # Required: SQL query (supports Jinja)
    profile: string  # Required: dbt profile name (from profiles.yml)
    target: string   # Optional: dbt target name (defaults to 'dev')
```
Example:
```yaml
queries:
  sales:
    sql: |
      SELECT * FROM orders
      WHERE status = 'completed'
      LIMIT 100
    profile: my_postgres  # References profile in ~/.dbt/profiles.yml
    target: prod          # Uses 'prod' target from the profile
```
Database Connection via dbt Profiles:
Dataface uses dbt's profile system for SQL database connections. The profile field must match a profile name defined in your dbt profiles.yml file (typically located at ~/.dbt/profiles.yml or in your project directory).
Profile Configuration:
- Create or edit `~/.dbt/profiles.yml` with your database connection:
```yaml
my_postgres:
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: myuser
      password: mypass
      dbname: mydb
      schema: public
    prod:
      type: postgres
      host: prod.example.com
      port: 5432
      user: prod_user
      password: "{{ env_var('DB_PASSWORD') }}"
      dbname: analytics
      schema: public
  target: dev  # Default target
```
- Reference the profile in your queries:
```yaml
queries:
  sales:
    sql: SELECT * FROM orders
    profile: my_postgres  # Uses 'my_postgres' profile
    target: prod          # Uses 'prod' target (defaults to 'dev' if not specified)
```
For more information:

- See dbt's profile documentation for complete profile configuration options
- See dbt adapter documentation for database-specific connection parameters (Postgres, Snowflake, BigQuery, Redshift, etc.)
- See Database Connections for Dataface-specific details
Metrics (Semantic Layer Only)¶
Metrics are the measures you want to calculate. They come from your dbt Semantic Layer:
```yaml
queries:
  revenue:
    metrics: [total_revenue]  # Single metric
  multiple:
    metrics: [total_revenue, order_count, avg_order_value]  # Multiple metrics
```
Note: Metrics only work with semantic layer queries. All metric names must exist in your dbt Semantic Layer (MetricFlow).
Dimensions¶
Dimensions are the categories you want to group by:
```yaml
queries:
  sales_by_region:
    metrics: [total_revenue]
    dimensions: [region]         # Group by region
  sales_by_time_and_region:
    metrics: [total_revenue]
    dimensions: [month, region]  # Group by month AND region
```
Filters¶
Filters limit which data is included. You can reference variables or use literal values:
```yaml
queries:
  filtered_sales:
    metrics: [total_revenue]
    dimensions: [month]
    filters:
      # Reference a variable directly
      region: region
      # Use a variable with Jinja expression
      order_date: "{{ date_range }}"
      # Literal value
      status: "completed"
      # Complex expression
      revenue: "{{ min_revenue }}"
```
See the Expressions guide for more on using variables in filters.
Time Grain¶
Time grain automatically groups time-based dimensions:
```yaml
queries:
  monthly:
    metrics: [total_revenue]
    dimensions: [order_date]
    time_grain: month  # Groups by month automatically
```
Available time grains: day, week, month, quarter, year
Query Examples¶
Simple Revenue Query¶
```yaml
queries:
  revenue:
    metrics: [total_revenue]
    dimensions: [month]
```
Filtered by Variable¶
```yaml
queries:
  regional_sales:
    metrics: [total_revenue, order_count]
    dimensions: [month, region]
    filters:
      region: region  # Uses 'region' variable
      order_date: "{{ date_range }}"
```
With Time Grain¶
```yaml
queries:
  quarterly:
    metrics: [total_revenue]
    dimensions: [order_date]
    time_grain: quarter
    filters:
      region: region
```
Multiple Metrics and Dimensions¶
```yaml
queries:
  comprehensive:
    metrics: [total_revenue, order_count, avg_order_value]
    dimensions: [month, region, product_category]
    filters:
      order_date: "{{ date_range }}"
      region: region
```
Best Practices¶
Reuse Queries Across Charts¶
Multiple charts can reference the same query, which is more efficient:
```yaml
queries:
  sales:
    metrics: [total_revenue]
    dimensions: [month, region]

rows:
  - cols:
      - chart1:
          query: queries.sales  # Same query
          type: bar
          x: month
          y: total_revenue
      - chart2:
          query: queries.sales  # Same query
          type: line
          x: month
          y: total_revenue
          color: region
```
Use Query References for DRY SQL¶
Instead of repeating SQL, reference other queries:
```yaml
# ❌ Bad: Repetitive SQL
queries:
  north_sales:
    sql: |
      SELECT
        DATE_TRUNC('month', created_at) as month,
        SUM(revenue) as total_revenue
      FROM orders
      WHERE region = 'North'
      GROUP BY 1
    profile: my_postgres
  south_sales:
    sql: |
      SELECT
        DATE_TRUNC('month', created_at) as month,  -- Repeated
        SUM(revenue) as total_revenue              -- Repeated
      FROM orders
      WHERE region = 'South'                       -- Only difference
      GROUP BY 1
    profile: my_postgres
```

```yaml
# ✅ Good: DRY with query references
queries:
  base_sales:
    sql: |
      SELECT
        DATE_TRUNC('month', created_at) as month,
        region,  -- Keep region so downstream queries can filter on it
        SUM(revenue) as total_revenue
      FROM orders
      GROUP BY 1, 2
    profile: my_postgres
  north_sales:
    sql: |
      SELECT * FROM {{ queries.base_sales }}
      WHERE region = 'North'
    profile: my_postgres
  south_sales:
    sql: |
      SELECT * FROM {{ queries.base_sales }}
      WHERE region = 'South'
    profile: my_postgres
```
Use Semantic Layer When Possible¶
Semantic layer queries provide:

- Consistent metric definitions
- Automatic calculation logic
- Better maintainability
- Reusability across dashboards
Performance Considerations¶
- Use `limit` when you don't need all rows
- Filter early in queries (use the `filters` field)
- Reuse queries across multiple charts
- Use `time_grain` for automatic time grouping (see the sketch after this list)
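A minimal sketch combining these points, reusing the metric and dimension names from earlier examples (the query name is illustrative):

```yaml
queries:
  recent_regional_sales:
    metrics: [total_revenue]
    dimensions: [order_date]
    time_grain: month                 # Automatic time grouping
    filters:
      region: region                  # Filter early
      order_date: "{{ date_range }}"
    limit: 100                        # Cap rows when you don't need them all
```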
Naming Conventions¶
- Use descriptive names that indicate what data the query fetches (e.g., `sales`, `products`)
- Reference queries with explicit namespacing (`queries.sales`)
- Keep names lowercase with underscores
Testing and Validation¶
Validate Your Dashboard¶
After writing queries, validate your dashboard to catch errors early.
See the CLI Reference for validation options.
Compile to Check Structure¶
Compile your dashboard to see the normalized query structure.
See the CLI Reference for compilation options.
Query References¶
You can reference other queries within your SQL using Jinja template syntax. This enables CTE-style query composition where complex queries can be built from simpler building blocks.
```yaml
queries:
  # Base query - sales by month
  sales_by_month:
    sql: |
      SELECT
        DATE_TRUNC('month', order_date) as month,
        SUM(amount) as sales
      FROM orders
      WHERE status = 'completed'
      GROUP BY 1
    profile: my_postgres

  # Reference the base query
  current_month_sales:
    sql: |
      SELECT sales as current_sales
      FROM {{ queries.sales_by_month }}
      ORDER BY month DESC
      LIMIT 1
    profile: my_postgres
```
How It Works¶
When you use {{ queries.query_name }}, Dataface:
1. Wraps the referenced query's SQL in parentheses
2. Substitutes it inline as a subquery
3. Detects circular dependencies
The example above becomes:
```sql
SELECT sales as current_sales
FROM (
  SELECT
    DATE_TRUNC('month', order_date) as month,
    SUM(amount) as sales
  FROM orders
  WHERE status = 'completed'
  GROUP BY 1
)
ORDER BY month DESC
LIMIT 1
```
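Circular references are rejected rather than expanded forever. A sketch of a configuration that would fail dependency detection (the query names are hypothetical; the exact error output depends on your setup):

```yaml
queries:
  query_a:
    sql: SELECT * FROM {{ queries.query_b }}  # query_a depends on query_b...
    profile: my_postgres
  query_b:
    sql: SELECT * FROM {{ queries.query_a }}  # ...and query_b depends on query_a: a cycle
    profile: my_postgres
```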
Chained References¶
You can chain multiple query references:
```yaml
queries:
  raw_orders:
    sql: |
      SELECT * FROM orders WHERE status = 'completed'
    profile: my_postgres

  monthly_sales:
    sql: |
      SELECT
        DATE_TRUNC('month', order_date) as month,
        SUM(amount) as sales
      FROM {{ queries.raw_orders }}
      GROUP BY 1
    profile: my_postgres

  current_month:
    sql: |
      SELECT sales
      FROM {{ queries.monthly_sales }}
      ORDER BY month DESC
      LIMIT 1
    profile: my_postgres
```
Subquery Aliases¶
Use SQL aliases when referencing queries as subqueries:
```yaml
queries:
  base_sales:
    sql: |
      SELECT product_id, amount FROM orders
    profile: my_postgres

  top_products:
    sql: |
      SELECT product_id, SUM(amount) as total
      FROM {{ queries.base_sales }} AS sales
      GROUP BY product_id
      HAVING SUM(amount) > 1000
    profile: my_postgres
```
Limitations & Notes¶
Query references are powerful - they let you build complex analyses by composing simpler queries. There are just two limitations:
1. Top-Level Only
All query references (`{{ queries.* }}`) MUST reference top-level queries in the current dashboard's queries: section:
```yaml
queries:
  base_query:  # ✅ Top-level - can be referenced
    sql: SELECT * FROM orders
    profile: my_postgres

  # Import from other files and add to top level:
  shared_sales: _shared_queries.queries.base_sales  # ✅ Top-level - can be referenced

  my_analysis:  # ✅ Top-level - can be referenced
    sql: |
      SELECT * FROM {{ queries.shared_sales }}
      WHERE amount > 100
      UNION ALL
      SELECT * FROM {{ queries.base_query }}
    profile: my_postgres

rows:
  - queries:  # ❌ NESTED - cannot be referenced anywhere!
      nested_query:
        sql: SELECT * FROM products
        profile: my_postgres
```
Important: Even if you import a query from another file that has internal references (like {{ queries.raw_data }}), you must import ALL referenced queries to the top level of your current dashboard for them to work.
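For example, if a shared file defines a `summary` query whose SQL contains `{{ queries.raw_data }}`, import both queries (the file and query names here are illustrative):

```yaml
# Assumes _shared_queries.yml defines both raw_data and summary,
# and summary's SQL references {{ queries.raw_data }}
queries:
  raw_data: _shared_queries.queries.raw_data  # Must be imported too...
  summary: _shared_queries.queries.summary    # ...so summary's internal reference resolves
```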
2. SQL Only
Only SQL queries can reference other queries. HTTP, CSV, and other query types cannot use {{ queries.* }}.
Importing Queries from Other Files¶
You can reuse queries across multiple dashboards by referencing queries from other files directly in your queries: section.
Import Syntax¶
Use the format: file_name.queries.query_name
```yaml
queries:
  # Import queries from other files
  base_sales: _shared_queries.queries.base_sales
  customer_data: _shared_queries.queries.customer_data

  # Use imported queries like any other query
  my_analysis:
    sql: |
      SELECT * FROM {{ queries.base_sales }}
      WHERE revenue > 1000
    profile: my_postgres

charts:
  sales_chart:
    query: my_analysis
    type: bar
    x: month
    y: revenue
```
Import Examples¶
Import from same directory:
```yaml
queries:
  # _shared_queries.yml is in the same directory (underscore prefix means not rendered)
  orders: _shared_queries.queries.base_orders
  customers: _shared_queries.queries.base_customers
```
Import from subdirectory:
```yaml
queries:
  # analytics/metrics.yml
  revenue: analytics/metrics.queries.total_revenue

  # ml/predictions.yml
  top_customers: ml/predictions.queries.predict_top_customers
```
Use imported queries:
```yaml
queries:
  base_sales: _shared.queries.sales

  filtered_sales:
    sql: |
      SELECT * FROM {{ queries.base_sales }}
      WHERE region = '{{ filter("region", "US") }}'
    profile: my_postgres

  summary:
    sql: |
      SELECT
        COUNT(*) as total_orders,
        SUM(amount) as total_revenue
      FROM {{ queries.filtered_sales }}
    profile: my_postgres

charts:
  summary_chart:
    query: summary
    type: number
    value: total_revenue
```
Best Practices¶
- Use imports for shared queries: Create `_shared_queries.yml` files (underscore prefix means not rendered as dashboards) for queries used across multiple dashboards
- Don't overdo it: If you find yourself creating many shared query files, consider whether you need better data modeling in dbt or a proper data mart instead. Query references are for composition, not data transformation pipelines.
- Use for composition: Build complex queries from simple, reusable building blocks
- Consider CTEs: For simple cases or one-off queries, standard SQL CTEs might be clearer (see the sketch after this list)
- Add aliases: Always alias your subqueries for clarity (`FROM {{ queries.base }} AS base`)
- Test separately: Ensure referenced queries work independently before chaining them
- Keep it simple: Avoid deep chains (A → B → C → D). Two levels is usually enough
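As a point of comparison, the `sales_by_month` / `current_month_sales` composition from the Query References section could be written as a single query with a standard CTE, trading reusability across queries for self-containment:

```yaml
queries:
  current_month_sales:
    sql: |
      WITH sales_by_month AS (
        SELECT
          DATE_TRUNC('month', order_date) as month,
          SUM(amount) as sales
        FROM orders
        WHERE status = 'completed'
        GROUP BY 1
      )
      SELECT sales as current_sales
      FROM sales_by_month
      ORDER BY month DESC
      LIMIT 1
    profile: my_postgres
```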
Related¶
- Variables - Using variables in filters
- Expressions - Variable references and Jinja expressions
- Charts - Visualizing query data
- CLI Reference - Validate and compile commands
- Field Reference - Complete field reference
- Troubleshooting Guide - Common query issues