
Queries

Queries define what data to fetch from your data sources. Dataface supports four types of queries; in most projects, teams start with SQL. MetricFlow semantic layer queries and dbt model queries are optional enhancements that integrate naturally with an existing dbt setup.

  1. SQL queries - Direct SQL against database tables
  2. HTTP / API queries - Fetch JSON data from REST endpoints
  3. Semantic layer queries - Use dbt MetricFlow metrics and dimensions (optional, advanced)
  4. dbt model queries - Query dbt models directly (optional, no MetricFlow required)

Basic Query Structure

Every query has a unique name and defines what data to fetch. Query names must be unique within a file and across all imported files. Here is the SQL-first example that most users start with:

Important: SQL queries must explicitly specify both sql and profile fields. The profile field tells Dataface which dbt profile to use for the database connection.

queries:
  sales:
    sql: |
      SELECT
        DATE_TRUNC('month', created_at) AS month,
        region,
        SUM(revenue) AS total_revenue,
        COUNT(*) AS order_count
      FROM orders
      WHERE status = 'completed'
        AND {{ filter('region', region) }}
        AND {{ filter_date_range('created_at', date_range) }}
      GROUP BY 1, 2
    profile: my_postgres  # Required: dbt profile name

Query Types

1. SQL Queries (Baseline)

SQL queries require two fields:

  • sql: The SQL query string (supports Jinja templating)
  • profile: The dbt profile name to use for the database connection

Use the pipe (|) after sql: to start a YAML literal block. Everything you indent under it becomes one multiline string, so you can paste your SQL verbatim and keep the formatting, line breaks, and indentation intact.

queries:
  sales:
    sql: |
      SELECT
        DATE_TRUNC('month', created_at) as month,
        region,
        SUM(revenue) as total_revenue,
        COUNT(*) as order_count
      FROM orders
      WHERE status = 'completed'
        AND {{ filter('region', region) }}
        AND {{ filter_date_range('created_at', date_range) }}
      GROUP BY 1, 2
      ORDER BY 1 DESC
    profile: my_postgres  # Required: dbt profile name
    target: prod           # Optional: dbt target name (defaults to 'dev')

Database Connection via dbt Profiles

Dataface uses dbt's profile system for SQL database connections. This means:

  • No separate connection configuration needed - Dataface reads your existing profiles.yml file
  • Works with all dbt-supported databases - Postgres, Snowflake, BigQuery, Redshift, DuckDB, etc.
  • Leverages dbt's adapter system - Handles database-specific quirks automatically
  • Same credentials as dbt - Uses your existing database connections

Setting up profiles:

  1. Create or edit ~/.dbt/profiles.yml (or profiles.yml in your project directory)
  2. Define your database connection using dbt's profile format:
# ~/.dbt/profiles.yml
my_postgres:
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: myuser
      password: mypass
      dbname: mydb
      schema: public
    prod:
      type: postgres
      host: prod.example.com
      port: 5432
      user: prod_user
      password: "{{ env_var('DB_PASSWORD') }}"
      dbname: analytics
      schema: public
  target: dev  # Default target
  3. Reference the profile in your queries using the profile field (and optionally target):
queries:
  sales:
    sql: SELECT * FROM orders
    profile: my_postgres  # Uses 'my_postgres' profile
    target: prod           # Uses 'prod' target (defaults to 'dev' if not specified)

For more information:

  • See Database Connections for Dataface-specific details
  • See dbt's profile documentation for complete profile configuration options
  • See dbt adapter documentation for database-specific connection parameters

Dataface also has direct support for dbt MetricFlow semantic layer queries and dbt model queries as optional extras. Use them when your dbt project already exposes metrics or models.

Type Inference: Query types are automatically inferred from the keys present:

  • sql: → SQL Query (requires profile field)
  • metrics: and/or dimensions: → Semantic Layer Query
  • model: and columns: → dbt Model Query

Use cases:

  • Most common use case - direct SQL queries
  • Full control over query logic
  • Works with any dbt-supported database
  • Requires dbt profile configuration (minimal setup - just profiles.yml)
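To see the inference rules side by side, here is a sketch with one query of each inferred type (query names, metrics, and model names are illustrative):

```yaml
queries:
  by_sql:                        # has sql: → inferred as a SQL query
    sql: SELECT * FROM orders
    profile: my_postgres

  by_metrics:                    # has metrics:/dimensions: → semantic layer query
    metrics: [total_revenue]
    dimensions: [month]

  by_model:                      # has model: + columns: → dbt model query
    model: "ref('fct_orders')"
    columns: [month, revenue]
```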

2. HTTP / API Queries

Fetch data from any JSON API. This is useful for integrating external data sources (weather, stock prices) or calling your own services (ML models, Jupyter notebooks exposed as endpoints).

queries:
  prediction:
    type: http
    url: "https://api.model-service.com/predict"
    method: POST
    headers:
      Authorization: "Bearer {{ env.MODEL_API_KEY }}"
    body:
      features:
        region: "{{ region }}"
        date: "{{ date_range.start }}"

Type Inference: Detected by the presence of url: or explicit type: http.

Use cases:

  • Calling ML inference endpoints
  • Fetching data from 3rd party APIs
  • Triggering server-side workflows (if read-only)
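The example above uses POST with a body; a GET request typically passes inputs as query parameters instead. Here is a hedged sketch (the endpoint URL and parameter names are placeholders, not a real API):

```yaml
queries:
  weather:
    type: http
    url: "https://api.example.com/v1/forecast"   # placeholder endpoint
    method: GET
    params:
      city: "{{ region }}"       # query string parameter from a variable
      units: "metric"            # literal query string parameter
    headers:
      Accept: "application/json"
```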

3. Semantic Layer Queries (MetricFlow) - Optional

Use dbt MetricFlow metrics and dimensions (requires dbt Semantic Layer setup):

queries:
  semantic_sales:
    metrics: [total_revenue, order_count]  # Semantic layer metrics
    dimensions: [month, region]            # Semantic layer dimensions
    filters:
      region: region
      order_date: "{{ date_range }}"

Use cases:

  • Teams already using dbt's semantic layer
  • Consistent metric definitions across dashboards
  • Automatic metric calculation logic
  • Leveraging existing MetricFlow setup

Note:

  • Requires dbt Semantic Layer (MetricFlow) setup
  • All metric names must exist in your dbt Semantic Layer
  • See MetricFlow Guide for details on how it works

4. dbt Model Queries - Optional

Query dbt models directly (requires dbt, but no MetricFlow):

queries:
  model_sales:
    model: "ref('fct_orders')"     # Reference dbt model
    columns: [month, region, revenue, order_count]
    filters:
      region: region
      created_at: "{{ date_range }}"

Use cases:

  • Teams using dbt models but not MetricFlow semantic layer
  • Direct column access without metric definitions
  • Simpler queries that don't need semantic layer abstraction

Note: Requires dbt setup, but not MetricFlow


Field Reference

HTTP / API Queries

queries:
  <query_name>:
    type: http                     # Required
    url: string                    # Required: URL (supports Jinja)
    method: GET | POST             # Optional: Default GET
    headers: Record<string, string> # Optional: HTTP headers
    params: Record<string, string>  # Optional: Query parameters
    body: any                      # Optional: JSON body (for POST)

Semantic Layer Queries

queries:
  <query_name>:             # Unique identifier
    metrics: [string]       # Required
    dimensions: [string]    # Optional: for grouping
    filters:                # Optional
      <field>: <expression>
    time_grain: day | week | month | quarter | year
    limit: number           # Optional: max rows

dbt Model Queries

queries:
  <query_name>:
    model: string         # Required: e.g., "ref('orders')"
    columns: [string]     # Required
    filters:              # Optional
      <field>: <expression>
    limit: number         # Optional: max rows

Raw SQL Queries

queries:
  <query_name>:
    sql: string           # Required: SQL query (supports Jinja)
    profile: string       # Required: dbt profile name (from profiles.yml)
    target: string        # Optional: dbt target name (defaults to 'dev')

Example:

queries:
  sales:
    sql: |
      SELECT * FROM orders
      WHERE status = 'completed'
      LIMIT 100
    profile: my_postgres  # References profile in ~/.dbt/profiles.yml
    target: prod           # Uses 'prod' target from the profile

Database Connection via dbt Profiles:

Dataface uses dbt's profile system for SQL database connections. The profile field must match a profile name defined in your dbt profiles.yml file (typically located at ~/.dbt/profiles.yml or in your project directory).

Profile Configuration:

  1. Create or edit ~/.dbt/profiles.yml with your database connection:
my_postgres:
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: myuser
      password: mypass
      dbname: mydb
      schema: public
    prod:
      type: postgres
      host: prod.example.com
      port: 5432
      user: prod_user
      password: "{{ env_var('DB_PASSWORD') }}"
      dbname: analytics
      schema: public
  target: dev  # Default target
  2. Reference the profile in your queries:
queries:
  sales:
    sql: SELECT * FROM orders
    profile: my_postgres  # Uses 'my_postgres' profile
    target: prod           # Uses 'prod' target (defaults to 'dev' if not specified)

For more information:

  • See dbt's profile documentation for complete profile configuration options
  • See dbt adapter documentation for database-specific connection parameters (Postgres, Snowflake, BigQuery, Redshift, etc.)
  • See Database Connections for Dataface-specific details


Metrics (Semantic Layer Only)

Metrics are the measures you want to calculate. They come from your dbt Semantic Layer:

queries:
  revenue:
    metrics: [total_revenue]       # Single metric

  multiple:
    metrics: [total_revenue, order_count, avg_order_value]  # Multiple metrics

Note: Metrics only work with semantic layer queries. All metric names must exist in your dbt Semantic Layer (MetricFlow).


Dimensions

Dimensions are the categories you want to group by:

queries:
  sales_by_region:
    metrics: [total_revenue]
    dimensions: [region]           # Group by region

  sales_by_time_and_region:
    metrics: [total_revenue]
    dimensions: [month, region]    # Group by month AND region

Filters

Filters limit which data is included. You can reference variables or use literal values:

queries:
  filtered_sales:
    metrics: [total_revenue]
    dimensions: [month]
    filters:
      # Reference a variable directly
      region: region

      # Use a variable with Jinja expression
      order_date: "{{ date_range }}"

      # Literal value
      status: "completed"

      # Complex expression
      revenue: "{{ min_revenue }}"

See the Expressions guide for more on using variables in filters.


Time Grain

Time grain automatically groups time-based dimensions:

queries:
  monthly:
    metrics: [total_revenue]
    dimensions: [order_date]
    time_grain: month              # Groups by month automatically

Available time grains: day, week, month, quarter, year


Query Examples

Simple Revenue Query

queries:
  revenue:
    metrics: [total_revenue]
    dimensions: [month]

Filtered by Variable

queries:
  regional_sales:
    metrics: [total_revenue, order_count]
    dimensions: [month, region]
    filters:
      region: region              # Uses 'region' variable
      order_date: "{{ date_range }}"

With Time Grain

queries:
  quarterly:
    metrics: [total_revenue]
    dimensions: [order_date]
    time_grain: quarter
    filters:
      region: region

Multiple Metrics and Dimensions

queries:
  comprehensive:
    metrics: [total_revenue, order_count, avg_order_value]
    dimensions: [month, region, product_category]
    filters:
      order_date: "{{ date_range }}"
      region: region

Best Practices

Reuse Queries Across Charts

Multiple charts can reference the same query, which is more efficient:

queries:
  sales:
    metrics: [total_revenue]
    dimensions: [month, region]

rows:
  - cols:
      - chart1:
          query: queries.sales  # Same query
          type: bar
          x: month
          y: total_revenue
      - chart2:
          query: queries.sales  # Same query
          type: line
          x: month
          y: total_revenue
          color: region

Use Query References for DRY SQL

Instead of repeating SQL, reference other queries:

# ❌ Bad: Repetitive SQL
queries:
  north_sales:
    sql: |
      SELECT
        DATE_TRUNC('month', created_at) as month,
        SUM(revenue) as total_revenue
      FROM orders
      WHERE region = 'North'
      GROUP BY 1
    profile: my_postgres

  south_sales:
    sql: |
      SELECT
        DATE_TRUNC('month', created_at) as month,   -- Repeated
        SUM(revenue) as total_revenue               -- Repeated
      FROM orders
      WHERE region = 'South'                        -- Only difference
      GROUP BY 1
    profile: my_postgres

# ✅ Good: DRY with query references
queries:
  base_sales:
    sql: |
      SELECT
        DATE_TRUNC('month', created_at) as month,
        SUM(revenue) as total_revenue
      FROM orders
      GROUP BY 1
    profile: my_postgres

  north_sales:
    sql: |
      SELECT * FROM {{ queries.base_sales }}
      WHERE region = 'North'
    profile: my_postgres

  south_sales:
    sql: |
      SELECT * FROM {{ queries.base_sales }}
      WHERE region = 'South'
    profile: my_postgres

Use Semantic Layer When Possible

Semantic layer queries provide:

  • Consistent metric definitions
  • Automatic calculation logic
  • Better maintainability
  • Reusability across dashboards

Performance Considerations

  • Use limit when you don't need all rows
  • Filter early in queries (use filters field)
  • Reuse queries across multiple charts
  • Use time_grain for automatic time grouping
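The recommendations above can be combined in a single semantic layer query; this sketch reuses metric and dimension names from earlier examples, and the limit value is arbitrary:

```yaml
queries:
  recent_sales:
    metrics: [total_revenue]
    dimensions: [month, region]
    filters:
      order_date: "{{ date_range }}"   # filter early to reduce scanned rows
    time_grain: month                  # automatic time grouping
    limit: 500                         # cap rows when you don't need them all
```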

Naming Conventions

  • Use descriptive names (e.g., sales, products) and reference with explicit namespacing (queries.sales)
  • Use descriptive names that indicate what data the query fetches
  • Keep names lowercase with underscores

Testing and Validation

Validate Your Dashboard

After writing queries, validate your dashboard to catch errors early:

face validate dashboards/sales.yml

See the CLI Reference for validation options.

Compile to Check Structure

Compile your dashboard to see the normalized query structure:

face compile dashboards/sales.yml

See the CLI Reference for compilation options.


Query References

You can reference other queries within your SQL using Jinja template syntax. This enables CTE-style query composition where complex queries can be built from simpler building blocks.

queries:
  # Base query - sales by month
  sales_by_month:
    sql: |
      SELECT
        DATE_TRUNC('month', order_date) as month,
        SUM(amount) as sales
      FROM orders
      WHERE status = 'completed'
      GROUP BY 1
    profile: my_postgres

  # Reference the base query
  current_month_sales:
    sql: |
      SELECT sales as current_sales
      FROM {{ queries.sales_by_month }}
      ORDER BY month DESC
      LIMIT 1
    profile: my_postgres

How It Works

When you use {{ queries.query_name }}, Dataface:

  1. Wraps the referenced query's SQL in parentheses
  2. Substitutes it inline as a subquery
  3. Detects circular dependencies

The example above becomes:

SELECT sales as current_sales
FROM (
  SELECT
    DATE_TRUNC('month', order_date) as month,
    SUM(amount) as sales
  FROM orders
  WHERE status = 'completed'
  GROUP BY 1
)
ORDER BY month DESC
LIMIT 1
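To make the substitution and cycle-detection steps concrete, here is a minimal Python sketch of how this kind of inline expansion could work. It is an illustration under simplifying assumptions, not Dataface's actual implementation:

```python
import re

def resolve(name, queries, stack=None):
    """Inline {{ queries.X }} references as parenthesized subqueries.

    `queries` maps query names to raw SQL strings. `stack` tracks the
    chain of references so circular dependencies raise an error.
    """
    stack = stack or []
    if name in stack:
        raise ValueError("circular query reference: " + " -> ".join(stack + [name]))
    sql = queries[name]

    def substitute(match):
        # Recursively resolve the referenced query, then wrap it in parentheses
        ref = match.group(1)
        return "(" + resolve(ref, queries, stack + [name]) + ")"

    return re.sub(r"\{\{\s*queries\.(\w+)\s*\}\}", substitute, sql)
```

Resolving a query that references another yields the referenced SQL inlined as a subquery, matching the expanded example above.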

Chained References

You can chain multiple query references:

queries:
  raw_orders:
    sql: |
      SELECT * FROM orders WHERE status = 'completed'
    profile: my_postgres

  monthly_sales:
    sql: |
      SELECT
        DATE_TRUNC('month', order_date) as month,
        SUM(amount) as sales
      FROM {{ queries.raw_orders }}
      GROUP BY 1
    profile: my_postgres

  current_month:
    sql: |
      SELECT sales
      FROM {{ queries.monthly_sales }}
      ORDER BY month DESC
      LIMIT 1
    profile: my_postgres

Subquery Aliases

Use SQL aliases when referencing queries as subqueries:

queries:
  base_sales:
    sql: |
      SELECT product_id, amount FROM orders
    profile: my_postgres

  top_products:
    sql: |
      SELECT product_id, SUM(amount) as total
      FROM {{ queries.base_sales }} AS sales
      GROUP BY product_id
      HAVING SUM(amount) > 1000
    profile: my_postgres

Limitations & Notes

Query references let you build complex analyses by composing simpler queries. They have two limitations:

1. Top-Level Only

All query references (`{{ queries.* }}`) MUST reference top-level queries in the current dashboard's queries: section:

queries:
  base_query:  # ✅ Top-level - can be referenced
    sql: SELECT * FROM orders
    profile: my_postgres

  # Import from other files and add to top level:
  shared_sales: _shared_queries.queries.base_sales  # ✅ Top-level - can be referenced

  my_analysis:  # ✅ Top-level - can be referenced
    sql: |
      SELECT * FROM {{ queries.shared_sales }} WHERE amount > 100
      UNION ALL
      SELECT * FROM {{ queries.base_query }}
    profile: my_postgres

rows:
  - queries:  # ❌ NESTED - cannot be referenced anywhere!
      nested_query:
        sql: SELECT * FROM products
        profile: my_postgres

Important: Even if you import a query from another file that has internal references (like {{ queries.raw_data }}), you must import ALL referenced queries to the top level of your current dashboard for them to work.
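For example, if an imported query internally references another query, both must be imported at the top level (file and query names below are illustrative):

```yaml
queries:
  # _shared.yml defines filtered_orders, which internally references
  # {{ queries.raw_data }} — so raw_data must be imported too:
  raw_data: _shared.queries.raw_data
  filtered_orders: _shared.queries.filtered_orders
```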

2. SQL Only

Only SQL queries can reference other queries. HTTP, CSV, and other query types cannot use {{ queries.* }}.

Importing Queries from Other Files

You can reuse queries across multiple dashboards by referencing queries from other files directly in your queries: section.

Import Syntax

Use the format: file_name.queries.query_name

queries:
  # Import queries from other files
  base_sales: _shared_queries.queries.base_sales
  customer_data: _shared_queries.queries.customer_data

  # Use imported queries like any other query
  my_analysis:
    sql: |
      SELECT * FROM {{ queries.base_sales }}
      WHERE revenue > 1000
    profile: my_postgres

charts:
  sales_chart:
    query: my_analysis
    type: bar
    x: month
    y: revenue

Import Examples

Import from same directory:

queries:
  # _shared_queries.yml is in the same directory (underscore prefix means not rendered)
  orders: _shared_queries.queries.base_orders
  customers: _shared_queries.queries.base_customers

Import from subdirectory:

queries:
  # analytics/metrics.yml
  revenue: analytics/metrics.queries.total_revenue

  # ml/predictions.yml
  top_customers: ml/predictions.queries.predict_top_customers

Use imported queries:

queries:
  base_sales: _shared.queries.sales

  filtered_sales:
    sql: |
      SELECT *
      FROM {{ queries.base_sales }}
      WHERE region = '{{ filter("region", "US") }}'
    profile: my_postgres

  summary:
    sql: |
      SELECT
        COUNT(*) as total_orders,
        SUM(amount) as total_revenue
      FROM {{ queries.filtered_sales }}
    profile: my_postgres

charts:
  summary_chart:
    query: summary
    type: number
    value: total_revenue

Best Practices

  1. Use imports for shared queries: Create _shared_queries.yml files (underscore prefix means not rendered as dashboards) for queries used across multiple dashboards
  2. Don't overdo it: If you find yourself creating many shared query files, consider whether you need better data modeling in dbt or a proper data mart instead. Query references are for composition, not data transformation pipelines.
  3. Use for composition: Build complex queries from simple, reusable building blocks
  4. Consider CTEs: For simple cases or one-off queries, standard SQL CTEs might be clearer
  5. Add aliases: Always alias your subqueries for clarity (FROM {{ queries.base }} AS base)
  6. Test separately: Ensure referenced queries work independently before chaining them
  7. Keep it simple: Avoid deep chains (A → B → C → D). Two levels is usually enough