Schema V2 Migration Guide¶
This guide helps you migrate from the previous schema to Schema V2. This is a breaking change - queries using removed or renamed fields will need to be updated.
Quick Reference¶
Renamed Fields¶
| Old Field | New Field | Notes |
|---|---|---|
file_path |
ref_file |
Clarifies this is a referenced source file location |
line_number |
ref_line |
Clarifies this is a line in referenced source file |
column_number |
ref_column |
Clarifies this is a column in referenced source file |
raw_output |
log_content |
Clarifies this is the log content, not raw output |
error_fingerprint |
fingerprint |
Same functionality, shorter name |
workflow_name |
scope |
Generic hierarchy level 1 |
job_name |
group |
Generic hierarchy level 2 |
step_name |
unit |
Generic hierarchy level 3 |
workflow_run_id |
scope_id |
ID for hierarchy level 1 |
job_id |
group_id |
ID for hierarchy level 2 |
step_id |
unit_id |
ID for hierarchy level 3 |
workflow_status |
scope_status |
Status for hierarchy level 1 |
job_status |
group_status |
Status for hierarchy level 2 |
step_status |
unit_status |
Status for hierarchy level 3 |
Removed Fields¶
| Old Field | Migration Path |
|---|---|
source_file |
Use ref_file or structured_data |
build_id |
Use scope_id or external_id |
environment |
Use scope or structured_data |
file_index |
Use structured_data if needed |
root_cause_category |
Use fingerprint + pattern_id for clustering |
completed_at |
Compute from started_at + execution_time |
duration |
Use execution_time |
New Fields¶
| Field | Purpose |
|---|---|
log_file |
Path to the log file being parsed |
target |
Destination (IP:port, HTTP path, resource ARN) |
actor_type |
Type of actor: user, service, system, anonymous |
external_id |
External correlation ID (request ID, trace ID) |
subunit |
Hierarchy level 4 (container, sub-resource) |
subunit_id |
ID for hierarchy level 4 |
Migration Examples¶
Test Result Queries¶
Before:
SELECT test_name, status, error_fingerprint
FROM read_duck_hunt_log('pytest.json', 'pytest_json')
WHERE status = 'FAIL';
After:
SELECT test_name, status, fingerprint
FROM read_duck_hunt_log('pytest.json', 'pytest_json')
WHERE status = 'FAIL';
Location Field Queries¶
Before:
SELECT file_path, line_number, column_number, message
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR';
After:
SELECT ref_file, ref_line, ref_column, message
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR';
Note: The
ref_prefix clarifies these are locations referenced in the log (e.g., source file mentioned in an error), not the log file itself. Uselog_filefor the actual log file path.
CI/CD Workflow Queries¶
Before:
SELECT workflow_name, job_name, step_name, step_status
FROM read_duck_hunt_workflow_log('actions.log', 'github_actions')
WHERE step_status = 'failure';
After:
SELECT scope as workflow, "group" as job, unit as step, unit_status
FROM read_duck_hunt_workflow_log('actions.log', 'github_actions')
WHERE unit_status = 'failure';
Note:
groupis a SQL reserved word in some contexts. Use"group"(quoted) or alias it immediately.
Error Pattern Analysis¶
Before:
SELECT
error_fingerprint,
root_cause_category,
COUNT(*) as occurrences
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR'
GROUP BY error_fingerprint, root_cause_category;
After:
SELECT
fingerprint,
pattern_id,
COUNT(*) as occurrences
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR'
GROUP BY fingerprint, pattern_id;
Note:
root_cause_categorywas removed. Usefingerprintclustering withpattern_idfor grouping similar errors.
Multi-file Processing¶
Before:
SELECT source_file, build_id, environment, COUNT(*)
FROM read_duck_hunt_log('logs/**/*.log', 'auto')
GROUP BY source_file, build_id, environment;
After:
-- source_file, build_id, environment are no longer available
-- Use tool_name or message content for grouping
SELECT tool_name, scope, COUNT(*)
FROM read_duck_hunt_log('logs/**/*.log', 'auto')
GROUP BY tool_name, scope;
Workflow Hierarchy Traversal¶
Before:
SELECT
workflow_name,
job_name,
step_name,
workflow_status,
job_status,
step_status
FROM read_duck_hunt_workflow_log('ci.log', 'github_actions');
After:
SELECT
scope as workflow_name,
"group" as job_name,
unit as step_name,
scope_status as workflow_status,
group_status as job_status,
unit_status as step_status
FROM read_duck_hunt_workflow_log('ci.log', 'github_actions');
Duration Calculations¶
Before:
SELECT step_name, duration, completed_at
FROM read_duck_hunt_workflow_log('ci.log', 'github_actions');
After:
-- duration and completed_at are removed
-- Use execution_time for duration (in milliseconds)
SELECT unit as step_name, execution_time, started_at
FROM read_duck_hunt_workflow_log('ci.log', 'github_actions');
Creating Compatibility Views¶
If you have many queries to migrate, you can create views that provide the old field names:
-- Compatibility view for read_duck_hunt_log
CREATE VIEW duck_hunt_log_compat AS
SELECT
*,
-- Renamed fields
fingerprint as error_fingerprint,
scope as workflow_name,
"group" as job_name,
unit as step_name,
scope_id as workflow_run_id,
group_id as job_id,
unit_id as step_id,
scope_status as workflow_status,
group_status as job_status,
unit_status as step_status,
-- Removed fields (NULL placeholders)
NULL as source_file,
NULL as build_id,
NULL as environment,
NULL as file_index,
NULL as root_cause_category,
NULL as completed_at,
execution_time as duration
FROM read_duck_hunt_log('your_file.log', 'auto');
Field Mapping by Domain¶
The new generic hierarchy maps differently depending on the log type:
| Domain | scope | group | unit | subunit |
|---|---|---|---|---|
| CI/CD | Workflow | Job | Step | - |
| Kubernetes | Cluster | Namespace | Pod | Container |
| Cloud Audit | Account | Region | Service | - |
| Tests | Suite | Class | Method | - |
| App Logs | Service | Component | Handler | - |
See Field Mappings for complete documentation.
Breaking Changes Summary¶
- Column count changed: 40 → 39 fields
- Reserved word:
groupmay need quoting in SQL - No more
root_cause_category: Usefingerprint+pattern_idinstead - No more
source_file: Multi-file metadata removed - No more
completed_at/duration: Usestarted_at+execution_time
Getting Help¶
If you encounter issues migrating:
- Check Schema Reference for complete field documentation
- Check Field Mappings for domain-specific usage
- Open an issue at https://github.com/teaguesterling/duck_hunt/issues