Schema Reference¶
All Duck Hunt parsers output a standardized ValidationEvent schema with 39 fields organized into logical groups.
Field Summary¶
| Group | Fields |
|---|---|
| Core | event_id, tool_name, event_type |
| Reference Location | ref_file, ref_line, ref_column, function_name |
| Classification | status, severity, category, error_code |
| Content | message, suggestion, log_content, structured_data |
| Log Tracking | log_file, log_line_start, log_line_end |
| Test | test_name, execution_time |
| Identity | principal, origin, target, actor_type |
| Temporal | started_at |
| Correlation | external_id |
| Hierarchy | hierarchy_level, scope, scope_id, scope_status, group, group_id, group_status, unit, unit_id, unit_status, subunit, subunit_id |
| Pattern Analysis | fingerprint, similarity_score, pattern_id |
Core Fields¶
| Field | Type | Description |
|---|---|---|
event_id |
BIGINT | Unique identifier for each event |
tool_name |
VARCHAR | Name of the tool (pytest, eslint, make, etc.) |
event_type |
VARCHAR | Category: TEST_RESULT, BUILD_ERROR, LINT_ISSUE, etc. |
Reference Location Fields¶
For code locations referenced in events (lint issues, test failures, stack traces). These refer to positions in source code files mentioned in the log output.
| Field | Type | Description |
|---|---|---|
ref_file |
VARCHAR | Path to referenced source code file |
ref_line |
INTEGER | Line number in source file (-1 if not applicable) |
ref_column |
INTEGER | Column number in source file (-1 if not applicable) |
function_name |
VARCHAR | Function/method name in source code |
Classification Fields¶
| Field | Type | Description |
|---|---|---|
status |
VARCHAR | PASS, FAIL, ERROR, WARNING, INFO, SKIP |
severity |
VARCHAR | debug, info, warning, error, critical |
category |
VARCHAR | Domain-specific classifier |
error_code |
VARCHAR | Tool-specific error code or rule ID |
Severity Levels¶
Severity indicates the importance/urgency of an event, ordered from lowest to highest:
| Level | Description | Examples |
|---|---|---|
debug |
Trace/debug information | Build command output, directory changes |
info |
Informational events | Passing tests, commands executed, summaries |
warning |
Non-fatal issues | Skipped tests, deprecations, compiler warnings |
error |
Errors and failures | Test failures, build errors, lint errors |
critical |
Fatal/severe errors | Crashes, security critical, internal errors |
Severity Threshold Parameter¶
Control which events are emitted using severity_threshold:
-- Default (threshold = 'all') - includes everything, backwards compatible
SELECT * FROM read_duck_hunt_log('build.log', 'make_error');
-- Only warnings and above
SELECT * FROM read_duck_hunt_log('test.json', 'pytest_json', severity_threshold := 'warning');
-- Only errors and critical
SELECT * FROM read_duck_hunt_log('build.log', 'make_error', severity_threshold := 'error');
-- Explicit 'all' for clarity
SELECT * FROM read_duck_hunt_log('build.log', 'make_error', severity_threshold := 'all');
Events are emitted when event.severity >= threshold. The default is 'all' (alias for 'debug') for backwards compatibility.
Status vs Severity¶
These fields serve different purposes:
| Field | Purpose | Example |
|---|---|---|
status |
Semantic outcome | PASS, FAIL, SKIP |
severity |
Importance level | info, warning, error |
A passing test has status='PASS' but severity='info'. A skipped test has status='SKIP' but severity='warning'.
Content Fields¶
| Field | Type | Description |
|---|---|---|
message |
VARCHAR | Detailed error/warning message |
suggestion |
VARCHAR | Suggested fix or improvement |
log_content |
VARCHAR | Original raw log content for this event |
structured_data |
VARCHAR | Additional metadata (JSON or format name for delegated events) |
structured_data Usage¶
The structured_data field serves multiple purposes:
- JSON Metadata: Additional structured information from the parser (e.g., test configuration)
- Delegation Format: For workflow delegation, contains the format name of the delegated parser (e.g.,
make_error,pytest_text). This allows filtering delegated events by tool type.
Log Tracking Fields¶
Track where each event originated within the source log file.
| Field | Type | Description |
|---|---|---|
log_file |
VARCHAR | Path to the log file being parsed |
log_line_start |
INTEGER | 1-indexed line where event starts in log |
log_line_end |
INTEGER | 1-indexed line where event ends in log |
Note: ref_line refers to source code line (e.g., line 15 in main.c), while log_line_start/log_line_end track position within the log file itself.
Context Extraction¶
When using context := N parameter, an additional context column is added containing surrounding log lines:
-- Include 3 lines before/after each event
SELECT ref_file, message, context
FROM read_duck_hunt_log('build.log', 'make_error', context := 3);
Context column type: LIST(STRUCT(line_number INT, content VARCHAR, is_event BOOL))
| Field | Type | Description |
|---|---|---|
line_number |
INTEGER | 1-indexed line number in original log file |
content |
VARCHAR | The line content |
is_event |
BOOLEAN | TRUE if this line is part of the parsed event |
Example usage:
-- Access individual context lines
SELECT context[1].line_number, context[1].content
FROM parse_duck_hunt_log(log_text, 'make_error', context := 2);
-- Filter to just the event lines
SELECT list_filter(context, x -> x.is_event) as event_lines
FROM read_duck_hunt_log('build.log', context := 3);
-- Count context lines
SELECT len(context) as total_context_lines
FROM read_duck_hunt_log('build.log', context := 5);
Note: The context column is only added when context > 0. By default (context := 0), the column is not present.
Test Fields¶
| Field | Type | Description |
|---|---|---|
test_name |
VARCHAR | Full test identifier |
execution_time |
DOUBLE | Duration in milliseconds |
Identity & Network Fields¶
For tracking actors and network activity.
| Field | Type | Description |
|---|---|---|
principal |
VARCHAR | Actor identity (ARN, email, username) |
origin |
VARCHAR | Source (IP address, hostname, runner) |
target |
VARCHAR | Destination (IP:port, HTTP path, resource ARN) |
actor_type |
VARCHAR | Type: user, service, system, anonymous |
Temporal Fields¶
| Field | Type | Description |
|---|---|---|
started_at |
VARCHAR | Event timestamp (ISO format) |
Correlation Fields¶
| Field | Type | Description |
|---|---|---|
external_id |
VARCHAR | External correlation ID (request ID, trace ID) |
Hierarchy Fields¶
Generic 4-level hierarchy that maps to any domain (CI/CD, K8s, cloud, tests).
| Field | Type | Description |
|---|---|---|
hierarchy_level |
INTEGER | Hierarchy depth: 1=scope, 2=group, 3=unit, 4=subunit/delegated |
scope |
VARCHAR | Level 1: Broadest context (workflow, cluster, account, suite) |
scope_id |
VARCHAR | Unique identifier for scope |
scope_status |
VARCHAR | Status at scope level |
group |
VARCHAR | Level 2: Middle grouping (job, namespace, region, class) |
group_id |
VARCHAR | Unique identifier for group |
group_status |
VARCHAR | Status at group level |
unit |
VARCHAR | Level 3: Specific unit (step, pod, service, method) |
unit_id |
VARCHAR | Unique identifier for unit |
unit_status |
VARCHAR | Status at unit level |
subunit |
VARCHAR | Level 4: Sub-unit when needed (container, resource) |
subunit_id |
VARCHAR | Unique identifier for subunit |
Hierarchy Level Values¶
The hierarchy_level field indicates the depth of an event in the hierarchy:
| Level | Name | CI/CD Example | Description |
|---|---|---|---|
| 1 | Scope | Workflow | Top-level container |
| 2 | Group | Job | Major grouping |
| 3 | Unit | Step | Specific work unit |
| 4 | Subunit/Delegated | Compiler error | Sub-unit or delegated parser output |
Workflow Delegation: When workflow parsers delegate to tool parsers (e.g., make_error), the delegated events have hierarchy_level = 4. See Workflow Delegation for details.
See Field Mappings for how these map to specific domains.
Pattern Analysis Fields¶
For error clustering and root cause analysis.
| Field | Type | Description |
|---|---|---|
fingerprint |
VARCHAR | Normalized event signature for pattern detection |
similarity_score |
DOUBLE | Similarity to pattern cluster centroid (0.0-1.0) |
pattern_id |
BIGINT | Assigned pattern group ID (-1 if unassigned) |
How Fingerprinting Works¶
The fingerprint is a hash of the normalized message. Normalization removes variable content so structurally identical errors cluster together:
- Numbers:
line 15→line <N> - Quoted strings:
'myVar'→<STR> - File paths:
/home/user/src/main.c→<PATH> - Hex values:
0x7fff5fbff8c0→<HEX>
Example: These errors get the same fingerprint:
Both normalize to: error: <STR> undeclared
Pattern Clustering¶
-- Find most common error types
SELECT
pattern_id,
COUNT(*) as occurrences,
ANY_VALUE(message) as example
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR'
GROUP BY pattern_id
ORDER BY occurrences DESC;
Workflow-Specific Fields¶
The read_duck_hunt_workflow_log() function outputs additional fields beyond the standard schema:
| Field | Type | Description |
|---|---|---|
workflow_type |
VARCHAR | Workflow system: github_actions, gitlab_ci, jenkins, docker_build, spack |
hierarchy_level |
INTEGER | Depth: 1=workflow, 2=job, 3=step, 4=delegated |
parent_id |
VARCHAR | ID of parent element in hierarchy |
job_order |
INTEGER | Job execution order (ZIP format only, -1 otherwise) |
job_name |
VARCHAR | Job name from ZIP filename (ZIP format only) |
The job_order and job_name fields are populated when using github_actions_zip format to parse downloaded workflow run ZIP archives. See Workflow Formats for details.
Event Types¶
| Event Type | Description |
|---|---|
TEST_RESULT |
Test pass/fail/skip |
LINT_ISSUE |
Linting warning or error |
TYPE_ERROR |
Type checking error |
BUILD_ERROR |
Compilation/build error |
SECURITY_FINDING |
Security vulnerability |
MEMORY_ERROR |
Memory access error |
MEMORY_LEAK |
Memory leak detected |
THREAD_ERROR |
Threading/concurrency error |
PERFORMANCE_ISSUE |
Performance problem |
CRASH_SIGNAL |
Program crash/signal |
DEBUG_EVENT |
Debugger event |
SUMMARY |
Summary/statistics |
Summary Events¶
Parsers emit SUMMARY events to confirm tool execution and provide aggregate statistics. This is useful when using severity_threshold to filter events—you can still see that a tool ran successfully even when filtering out individual issues.
Parsers with summary events:
- pytest - "2 passed, 1 failed in 1.23s" from text output or JSON summary object
- ESLint - "3 problem(s) in 2 file(s)" with file/issue counts
- Flake8 - "18 issue(s) found" with total count
- MyPy - "Success: no issues found" or "Found 5 errors"
- Pylint - "Your code has been rated at 8.50/10"
Summary event fields:
- event_type = 'summary'
- status = WARNING if issues found, INFO if clean
- severity = 'warning' or 'info'
- message = Human-readable summary
- structured_data = JSON with counts (e.g., {"passed":5,"failed":1})
Example: Filtering with summaries
-- Show only errors, but keep summary events visible
SELECT event_type, status, message
FROM read_duck_hunt_log('pytest.json', 'pytest_json', severity_threshold := 'warning')
WHERE status = 'ERROR' OR event_type = 'summary';
Status Values¶
| Status | Description |
|---|---|
PASS |
Test passed, check succeeded |
FAIL |
Test failed, check failed |
ERROR |
Error occurred |
WARNING |
Warning issued |
INFO |
Informational message |
SKIP |
Test/check skipped |
Example Queries¶
Get all fields for an event:
Filter by status:
SELECT ref_file, ref_line, message
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR';
Aggregate by pattern:
SELECT fingerprint, COUNT(*) as count
FROM read_duck_hunt_log('logs/*.log', 'auto')
WHERE fingerprint IS NOT NULL
GROUP BY fingerprint;
Workflow hierarchy: