Schema V2 Migration Guide¶

This guide helps you migrate from the previous schema to Schema V2. This is a breaking change - queries using removed or renamed fields will need to be updated.

Quick Reference¶

Renamed Fields¶

Old Field	New Field	Notes
`file_path`	`ref_file`	Clarifies this is a referenced source file location
`line_number`	`ref_line`	Clarifies this is a line in referenced source file
`column_number`	`ref_column`	Clarifies this is a column in referenced source file
`raw_output`	`log_content`	Clarifies this is the log content, not raw output
`error_fingerprint`	`fingerprint`	Same functionality, shorter name
`workflow_name`	`scope`	Generic hierarchy level 1
`job_name`	`group`	Generic hierarchy level 2
`step_name`	`unit`	Generic hierarchy level 3
`workflow_run_id`	`scope_id`	ID for hierarchy level 1
`job_id`	`group_id`	ID for hierarchy level 2
`step_id`	`unit_id`	ID for hierarchy level 3
`workflow_status`	`scope_status`	Status for hierarchy level 1
`job_status`	`group_status`	Status for hierarchy level 2
`step_status`	`unit_status`	Status for hierarchy level 3

Removed Fields¶

Old Field	Migration Path
`source_file`	Use `ref_file` or `structured_data`
`build_id`	Use `scope_id` or `external_id`
`environment`	Use `scope` or `structured_data`
`file_index`	Use `structured_data` if needed
`root_cause_category`	Use `fingerprint` + `pattern_id` for clustering
`completed_at`	Compute from `started_at` + `execution_time`
`duration`	Use `execution_time`

New Fields¶

Field	Purpose
`log_file`	Path to the log file being parsed
`target`	Destination (IP:port, HTTP path, resource ARN)
`actor_type`	Type of actor: user, service, system, anonymous
`external_id`	External correlation ID (request ID, trace ID)
`subunit`	Hierarchy level 4 (container, sub-resource)
`subunit_id`	ID for hierarchy level 4

Migration Examples¶

Test Result Queries¶

Before:

SELECT test_name, status, error_fingerprint
FROM read_duck_hunt_log('pytest.json', 'pytest_json')
WHERE status = 'FAIL';

After:

SELECT test_name, status, fingerprint
FROM read_duck_hunt_log('pytest.json', 'pytest_json')
WHERE status = 'FAIL';

Location Field Queries¶

Before:

SELECT file_path, line_number, column_number, message
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR';

After:

SELECT ref_file, ref_line, ref_column, message
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR';

Note: The ref_ prefix clarifies these are locations referenced in the log (e.g., source file mentioned in an error), not the log file itself. Use log_file for the actual log file path.

CI/CD Workflow Queries¶

Before:

SELECT workflow_name, job_name, step_name, step_status
FROM read_duck_hunt_workflow_log('actions.log', 'github_actions')
WHERE step_status = 'failure';

After:

SELECT scope as workflow, "group" as job, unit as step, unit_status
FROM read_duck_hunt_workflow_log('actions.log', 'github_actions')
WHERE unit_status = 'failure';

Note: group is a SQL reserved word in some contexts. Use "group" (quoted) or alias it immediately.

Error Pattern Analysis¶

Before:

SELECT
    error_fingerprint,
    root_cause_category,
    COUNT(*) as occurrences
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR'
GROUP BY error_fingerprint, root_cause_category;

After:

SELECT
    fingerprint,
    pattern_id,
    COUNT(*) as occurrences
FROM read_duck_hunt_log('build.log', 'auto')
WHERE status = 'ERROR'
GROUP BY fingerprint, pattern_id;

Note: root_cause_category was removed. Use fingerprint clustering with pattern_id for grouping similar errors.

Multi-file Processing¶

Before:

SELECT source_file, build_id, environment, COUNT(*)
FROM read_duck_hunt_log('logs/**/*.log', 'auto')
GROUP BY source_file, build_id, environment;

After:

-- source_file, build_id, environment are no longer available
-- Use tool_name or message content for grouping
SELECT tool_name, scope, COUNT(*)
FROM read_duck_hunt_log('logs/**/*.log', 'auto')
GROUP BY tool_name, scope;

Workflow Hierarchy Traversal¶

Before:

SELECT
    workflow_name,
    job_name,
    step_name,
    workflow_status,
    job_status,
    step_status
FROM read_duck_hunt_workflow_log('ci.log', 'github_actions');

After:

SELECT
    scope as workflow_name,
    "group" as job_name,
    unit as step_name,
    scope_status as workflow_status,
    group_status as job_status,
    unit_status as step_status
FROM read_duck_hunt_workflow_log('ci.log', 'github_actions');

Duration Calculations¶

Before:

SELECT step_name, duration, completed_at
FROM read_duck_hunt_workflow_log('ci.log', 'github_actions');

After:

-- duration and completed_at are removed
-- Use execution_time for duration (in milliseconds)
SELECT unit as step_name, execution_time, started_at
FROM read_duck_hunt_workflow_log('ci.log', 'github_actions');

Creating Compatibility Views¶

If you have many queries to migrate, you can create views that provide the old field names:

-- Compatibility view for read_duck_hunt_log
CREATE VIEW duck_hunt_log_compat AS
SELECT
    *,
    -- Renamed fields
    fingerprint as error_fingerprint,
    scope as workflow_name,
    "group" as job_name,
    unit as step_name,
    scope_id as workflow_run_id,
    group_id as job_id,
    unit_id as step_id,
    scope_status as workflow_status,
    group_status as job_status,
    unit_status as step_status,
    -- Removed fields (NULL placeholders)
    NULL as source_file,
    NULL as build_id,
    NULL as environment,
    NULL as file_index,
    NULL as root_cause_category,
    NULL as completed_at,
    execution_time as duration
FROM read_duck_hunt_log('your_file.log', 'auto');

Field Mapping by Domain¶

The new generic hierarchy maps differently depending on the log type:

Domain	scope	group	unit	subunit
CI/CD	Workflow	Job	Step	-
Kubernetes	Cluster	Namespace	Pod	Container
Cloud Audit	Account	Region	Service	-
Tests	Suite	Class	Method	-
App Logs	Service	Component	Handler	-

See Field Mappings for complete documentation.

Breaking Changes Summary¶

Column count changed: 40 → 39 fields
Reserved word: group may need quoting in SQL
No more root_cause_category: Use fingerprint + pattern_id instead
No more source_file: Multi-file metadata removed
No more completed_at/duration: Use started_at + execution_time

Getting Help¶

If you encounter issues migrating:

Check Schema Reference for complete field documentation
Check Field Mappings for domain-specific usage
Open an issue at https://github.com/teaguesterling/duck_hunt/issues