Skip to content

Schema Reference

All Duck Hunt parsers output a standardized ValidationEvent schema with 39 fields organized into logical groups.

Field Summary

Group Fields
Core event_id, tool_name, event_type
Reference Location ref_file, ref_line, ref_column, function_name
Classification status, severity, category, error_code
Content message, suggestion, log_content, structured_data
Log Tracking log_file, log_line_start, log_line_end
Test test_name, execution_time
Identity principal, origin, target, actor_type
Temporal started_at
Correlation external_id
Hierarchy hierarchy_level, scope, scope_id, scope_status, group, group_id, group_status, unit, unit_id, unit_status, subunit, subunit_id
Pattern Analysis fingerprint, similarity_score, pattern_id

Core Fields

Field Type Description
event_id BIGINT Unique identifier for each event
tool_name VARCHAR Name of the tool (pytest, eslint, make, etc.)
event_type VARCHAR Category: TEST_RESULT, BUILD_ERROR, LINT_ISSUE, etc.

Reference Location Fields

For code locations referenced in events (lint issues, test failures, stack traces). These refer to positions in source code files mentioned in the log output.

Field Type Description
ref_file VARCHAR Path to referenced source code file
ref_line INTEGER Line number in source file (-1 if not applicable)
ref_column INTEGER Column number in source file (-1 if not applicable)
function_name VARCHAR Function/method name in source code

Classification Fields

Field Type Description
status VARCHAR PASS, FAIL, ERROR, WARNING, INFO, SKIP
severity VARCHAR debug, info, warning, error, critical
category VARCHAR Domain-specific classifier
error_code VARCHAR Tool-specific error code or rule ID

Severity Levels

Severity indicates the importance/urgency of an event, ordered from lowest to highest:

Level Description Examples
debug Trace/debug information Build command output, directory changes
info Informational events Passing tests, commands executed, summaries
warning Non-fatal issues Skipped tests, deprecations, compiler warnings
error Errors and failures Test failures, build errors, lint errors
critical Fatal/severe errors Crashes, security critical, internal errors

Severity Threshold Parameter

Control which events are emitted using severity_threshold:

-- Default (threshold = 'all') - includes everything, backwards compatible
SELECT * FROM read_duck_hunt_log('build.log', 'make_error');

-- Only warnings and above
SELECT * FROM read_duck_hunt_log('test.json', 'pytest_json', severity_threshold := 'warning');

-- Only errors and critical
SELECT * FROM read_duck_hunt_log('build.log', 'make_error', severity_threshold := 'error');

-- Explicit 'all' for clarity
SELECT * FROM read_duck_hunt_log('build.log', 'make_error', severity_threshold := 'all');

Events are emitted when event.severity >= threshold. The default is 'all' (alias for 'debug') for backwards compatibility.

Status vs Severity

These fields serve different purposes:

Field Purpose Example
status Semantic outcome PASS, FAIL, SKIP
severity Importance level info, warning, error

A passing test has status='PASS' but severity='info'. A skipped test has status='SKIP' but severity='warning'.

Content Fields

Field Type Description
message VARCHAR Detailed error/warning message
suggestion VARCHAR Suggested fix or improvement
log_content VARCHAR Original raw log content for this event
structured_data VARCHAR Additional metadata (JSON or format name for delegated events)

structured_data Usage

The structured_data field serves multiple purposes:

  • JSON Metadata: Additional structured information from the parser (e.g., test configuration)
  • Delegation Format: For workflow delegation, contains the format name of the delegated parser (e.g., make_error, pytest_text). This allows filtering delegated events by tool type.

Log Tracking Fields

Track where each event originated within the source log file.

Field Type Description
log_file VARCHAR Path to the log file being parsed
log_line_start INTEGER 1-indexed line where event starts in log
log_line_end INTEGER 1-indexed line where event ends in log

Note: ref_line refers to source code line (e.g., line 15 in main.c), while log_line_start/log_line_end track position within the log file itself.

Context Extraction

When using context := N parameter, an additional context column is added containing surrounding log lines:

-- Include 3 lines before/after each event
SELECT ref_file, message, context
FROM read_duck_hunt_log('build.log', 'make_error', context := 3);

Context column type: LIST(STRUCT(line_number INT, content VARCHAR, is_event BOOL))

Field Type Description
line_number INTEGER 1-indexed line number in original log file
content VARCHAR The line content
is_event BOOLEAN TRUE if this line is part of the parsed event

Example usage:

-- Access individual context lines
SELECT context[1].line_number, context[1].content
FROM parse_duck_hunt_log(log_text, 'make_error', context := 2);

-- Filter to just the event lines
SELECT list_filter(context, x -> x.is_event) as event_lines
FROM read_duck_hunt_log('build.log', context := 3);

-- Count context lines
SELECT len(context) as total_context_lines
FROM read_duck_hunt_log('build.log', context := 5);

Note: The context column is only added when context > 0. By default (context := 0), the column is not present.

Test Fields

Field Type Description
test_name VARCHAR Full test identifier
execution_time DOUBLE Duration in milliseconds

Identity & Network Fields

For tracking actors and network activity.

Field Type Description
principal VARCHAR Actor identity (ARN, email, username)
origin VARCHAR Source (IP address, hostname, runner)
target VARCHAR Destination (IP:port, HTTP path, resource ARN)
actor_type VARCHAR Type: user, service, system, anonymous

Temporal Fields

Field Type Description
started_at VARCHAR Event timestamp (ISO format)

Correlation Fields

Field Type Description
external_id VARCHAR External correlation ID (request ID, trace ID)

Hierarchy Fields

Generic 4-level hierarchy that maps to any domain (CI/CD, K8s, cloud, tests).

Field Type Description
hierarchy_level INTEGER Hierarchy depth: 1=scope, 2=group, 3=unit, 4=subunit/delegated
scope VARCHAR Level 1: Broadest context (workflow, cluster, account, suite)
scope_id VARCHAR Unique identifier for scope
scope_status VARCHAR Status at scope level
group VARCHAR Level 2: Middle grouping (job, namespace, region, class)
group_id VARCHAR Unique identifier for group
group_status VARCHAR Status at group level
unit VARCHAR Level 3: Specific unit (step, pod, service, method)
unit_id VARCHAR Unique identifier for unit
unit_status VARCHAR Status at unit level
subunit VARCHAR Level 4: Sub-unit when needed (container, resource)
subunit_id VARCHAR Unique identifier for subunit

Hierarchy Level Values

The hierarchy_level field indicates the depth of an event in the hierarchy:

Level Name CI/CD Example Description
1 Scope Workflow Top-level container
2 Group Job Major grouping
3 Unit Step Specific work unit
4 Subunit/Delegated Compiler error Sub-unit or delegated parser output

Workflow Delegation: When workflow parsers delegate to tool parsers (e.g., make_error), the delegated events have hierarchy_level = 4. See Workflow Delegation for details.

See Field Mappings for how these map to specific domains.

Pattern Analysis Fields

For error clustering and root cause analysis.

Field Type Description
fingerprint VARCHAR Normalized event signature for pattern detection
similarity_score DOUBLE Similarity to pattern cluster centroid (0.0-1.0)
pattern_id BIGINT Assigned pattern group ID (-1 if unassigned)

How Fingerprinting Works

The fingerprint is a hash of the normalized message. Normalization removes variable content so structurally identical errors cluster together:

  • Numbers: line 15line <N>
  • Quoted strings: 'myVar'<STR>
  • File paths: /home/user/src/main.c<PATH>
  • Hex values: 0x7fff5fbff8c0<HEX>

Example: These errors get the same fingerprint:

src/main.c:10: error: 'ptr' undeclared
src/utils.c:25: error: 'counter' undeclared

Both normalize to: error: <STR> undeclared

Pattern Clustering

-- Find most common error types
SELECT
    pattern_id,
    COUNT(*) as occurrences,
    ANY_VALUE(message) as example
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR'
GROUP BY pattern_id
ORDER BY occurrences DESC;

Workflow-Specific Fields

The read_duck_hunt_workflow_log() function outputs additional fields beyond the standard schema:

Field Type Description
workflow_type VARCHAR Workflow system: github_actions, gitlab_ci, jenkins, docker_build, spack
hierarchy_level INTEGER Depth: 1=workflow, 2=job, 3=step, 4=delegated
parent_id VARCHAR ID of parent element in hierarchy
job_order INTEGER Job execution order (ZIP format only, -1 otherwise)
job_name VARCHAR Job name from ZIP filename (ZIP format only)

The job_order and job_name fields are populated when using github_actions_zip format to parse downloaded workflow run ZIP archives. See Workflow Formats for details.


Event Types

Event Type Description
TEST_RESULT Test pass/fail/skip
LINT_ISSUE Linting warning or error
TYPE_ERROR Type checking error
BUILD_ERROR Compilation/build error
SECURITY_FINDING Security vulnerability
MEMORY_ERROR Memory access error
MEMORY_LEAK Memory leak detected
THREAD_ERROR Threading/concurrency error
PERFORMANCE_ISSUE Performance problem
CRASH_SIGNAL Program crash/signal
DEBUG_EVENT Debugger event
SUMMARY Summary/statistics

Summary Events

Parsers emit SUMMARY events to confirm tool execution and provide aggregate statistics. This is useful when using severity_threshold to filter events—you can still see that a tool ran successfully even when filtering out individual issues.

Parsers with summary events: - pytest - "2 passed, 1 failed in 1.23s" from text output or JSON summary object - ESLint - "3 problem(s) in 2 file(s)" with file/issue counts - Flake8 - "18 issue(s) found" with total count - MyPy - "Success: no issues found" or "Found 5 errors" - Pylint - "Your code has been rated at 8.50/10"

Summary event fields: - event_type = 'summary' - status = WARNING if issues found, INFO if clean - severity = 'warning' or 'info' - message = Human-readable summary - structured_data = JSON with counts (e.g., {"passed":5,"failed":1})

Example: Filtering with summaries

-- Show only errors, but keep summary events visible
SELECT event_type, status, message
FROM read_duck_hunt_log('pytest.json', 'pytest_json', severity_threshold := 'warning')
WHERE status = 'ERROR' OR event_type = 'summary';

Status Values

Status Description
PASS Test passed, check succeeded
FAIL Test failed, check failed
ERROR Error occurred
WARNING Warning issued
INFO Informational message
SKIP Test/check skipped

Example Queries

Get all fields for an event:

SELECT * FROM read_duck_hunt_log('build.log', 'auto') LIMIT 1;

Filter by status:

SELECT ref_file, ref_line, message
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR';

Aggregate by pattern:

SELECT fingerprint, COUNT(*) as count
FROM read_duck_hunt_log('logs/*.log', 'auto')
WHERE fingerprint IS NOT NULL
GROUP BY fingerprint;

Workflow hierarchy:

SELECT scope, "group", unit, unit_status
FROM read_duck_hunt_workflow_log('actions.log', 'github_actions')
ORDER BY event_id;