Skip to content

Latest commit

 

History

History
255 lines (162 loc) · 11.8 KB

File metadata and controls

255 lines (162 loc) · 11.8 KB

Reporting: Implementation Guide

Practical guidance for implementing APTS Reporting requirements. Each section provides a brief implementation approach, key considerations, and common pitfalls.

Note: This guide is informative, not normative. Recommended defaults and example values are suggested starting points; the Reporting README contains the authoritative requirements. Where this guide and the README differ, the README governs.


APTS-RP-001: Evidence-Based Finding Validation

Implementation: Attach raw technical evidence (logs, screenshots, payloads, network traces) to every finding. Clearly demarcate any AI-generated summaries or analysis as distinct from raw evidence.

Key Considerations:

  • Evidence must be reproducible by the client and independent reviewers
  • Storage should support secure chain-of-custody for forensic purposes

Common Pitfalls:

  • Including only AI summaries without raw technical data
  • Mixing evidence with unsubstantiated interpretations

APTS-RP-002: Finding Verification and Human Review Pipeline

Implementation: Re-execute all Critical and High findings before report finalization using automated workflows. Tag each finding as "Confirmed" or "Unconfirmed" with reproduction timestamp and methodology.

Key Considerations:

  • Automation must simulate human exploitation paths, not just tool-based detection
  • Unconfirmed findings require escalation for manual review

Common Pitfalls:

  • Assuming initial detection is sufficient without re-verification
  • Automating reproduction without documenting the specific steps used

APTS-RP-003: Confidence Scoring with Auditable Methodology

Implementation: Assign 0-100% confidence scores based on measurable factors (evidence completeness, reproduction success, tool consistency, manual verification). Document the scoring model used in the assessment methodology.

Key Considerations:

  • Confidence scoring must be independent from severity/impact ratings
  • Scoring methodology should be transparently disclosed to clients

Common Pitfalls:

  • Conflating confidence with severity or criticality
  • Using subjective confidence scores without documented factors

Confidence Scoring Methodology:

A recommended confidence scoring approach uses weighted factors:

Factor Weight Description
Evidence quality 30% Direct evidence (exploit success) scores 100%; indirect evidence (version banner match) scores 50%; inference only scores 20%
Independent confirmation 25% Confirmed by 2+ independent methods scores 100%; single method scores 50%
Environmental factors 20% Default configuration scores 100%; custom/hardened environment scores 60%
Historical accuracy 15% Platform's historical true positive rate for this vulnerability class
Recency 10% Evidence gathered within 1 hour scores 100%; decays to 50% after 24 hours

Confidence = Sum(factor_score * weight)

Thresholds: Confirmed (>= 90%), High Confidence (70-89%), Medium Confidence (50-69%), Low Confidence (< 50%).

Findings below 50% confidence SHOULD be flagged as "Unconfirmed" and excluded from the executive summary unless the customer requests full disclosure.


APTS-RP-004: Finding Provenance Chain

Implementation: Maintain a cryptographically-signed chain linking each finding to its discovery method, tool output, and operator actions. Use timestamped logs with digital signatures for forensic accountability.

Key Considerations:

  • Provenance data must be immutable and timestamped
  • Chain should include tool version, configuration, and execution parameters

Common Pitfalls:

  • Losing tool output or discovery artifacts mid-assessment
  • Failing to document who verified each finding and when

Implementation Aid: See the Evidence Package Manifest appendix for an illustrative machine-readable structure that links one finding to its raw artifacts, provenance events, and review state.


APTS-RP-005: Cryptographic Evidence Chain Integrity

Implementation: Bind all evidence to the discovery chain using SHA-256 hashing of raw artifacts. Provide hash values in the report with instructions for client-side verification against original evidence files.

Key Considerations:

  • Hashing must cover complete artifacts (logs, captures, transcripts)
  • Include hash computation methodology in technical appendix

Common Pitfalls:

  • Hashing only summaries instead of raw data
  • Failing to provide verification instructions for clients

APTS-RP-006: False Positive Rate Disclosure

Implementation: Document and disclose the methodology for identifying and filtering false positives. Include per-severity false positive rates measured across the assessment (for example, "2 of 50 Medium findings verified as false positives").

Key Considerations:

  • FP rates should be calculated and reported per severity level
  • Methodology must explain how false positives were identified

Common Pitfalls:

  • Omitting FP metrics from reports entirely
  • Reporting inflated accuracy without disclosing how false positives were determined

APTS-RP-007: Independent Finding Reproducibility

Implementation: Designate reviewers independent from the automated tool chain to manually reproduce a statistically significant sample of Critical findings (minimum 80% coverage).

Key Considerations:

  • Sample size should be defensible (for example, 100% for <5 findings, 80% for larger sets)
  • Validation should occur before report delivery to allow remediation

Common Pitfalls:

  • Validating findings after initial report without time for remediation
  • Using the same operator who ran tools for validation

APTS-RP-008: Vulnerability Coverage Disclosure

Implementation: Provide a coverage matrix mapping tested vulnerability classes to CWE/OWASP, clearly marking tested/excluded/partially-tested areas. Explain scope limitations and why certain vectors were excluded.

Key Considerations:

  • Coverage should account for architecture constraints (for example, "client-side testing excluded per scope")
  • Matrix should enable clients to assess residual risk from untested areas

Common Pitfalls:

  • Implying complete coverage when only a subset of vulnerability classes were tested
  • Failing to explain why certain CWEs were out-of-scope

APTS-RP-009: False Negative Rate Disclosure and Methodology

Implementation: Document the methodology used to assess false negative risk. Publish per-vulnerability-class FN rate estimates based on control gaps, tool detection limits, and known blind spots.

Key Considerations:

  • FN rates should be disclosed for each major vulnerability class
  • Methodology should explain benchmarking approach and limitations

Common Pitfalls:

  • Omitting FN estimates or providing only a single aggregate figure
  • Failing to explain which vulnerability classes have higher FN risk

APTS-RP-010: Detection Effectiveness Benchmarking

Implementation: Conduct quarterly benchmarks against controlled vulnerable environments (for example, DVWA, WebGoat) to validate detection accuracy. Document and trend results across tool versions and assessment methodologies.

Key Considerations:

  • Benchmarks must use consistent, reproducible vulnerable targets
  • Results should inform improvements to detection rules and methodologies

Common Pitfalls:

  • Running benchmarks inconsistently or with varying targets
  • Failing to correlate benchmark results with real assessment performance

APTS-RP-011: Executive Summary and Risk Overview

Implementation: Provide a non-technical executive summary covering risk posture, overall findings distribution, coverage achieved, and key remediation priorities. Include context for risk-based decision making without requiring technical expertise.

Key Considerations:

  • Summary should be actionable for C-level stakeholders
  • Include trend data and comparison to industry baselines where available

Common Pitfalls:

  • Writing executive summaries that are too technical
  • Omitting business-context framing of findings

APTS-RP-012: Remediation Guidance and Prioritization

Implementation: For each finding, provide step-by-step remediation guidance mapped to effort categories (quick-fix, short-term, long-term). Prioritize findings by risk and implementability to guide resource allocation.

Key Considerations:

  • Remediation steps should be specific to the assessed environment
  • Effort estimates should account for organizational factors (for example, change control requirements)

Common Pitfalls:

  • Generic remediation advice that doesn't address specific context
  • Prioritization by severity alone without considering effort or business impact

APTS-RP-013: Engagement SLA Compliance Reporting

Implementation: Document engagement timeline, any interruptions or scope changes, percentage of planned scope tested, and areas left untested or partially tested with reasons documented.

Key Considerations:

  • Timeline should show actual compared to planned dates for phased assessments
  • Scope changes must be documented with impact on coverage

Common Pitfalls:

  • Omitting scope gaps from reports
  • Failing to explain why certain areas were not completed

APTS-RP-014: Trend Analysis for Recurring Engagements

Implementation: For repeat clients, compare findings across engagements to identify new vulnerabilities, resolved issues, and persistent findings. Include trend metrics (closure rate, mean time to remediation, recurrence rate).

Key Considerations:

  • Trend analysis should enable clients to assess remediation effectiveness
  • Persistent findings may indicate systemic control issues requiring deeper investigation

Common Pitfalls:

  • Treating each engagement as independent without cross-engagement analysis
  • Failing to investigate reasons why findings persist across multiple assessments

APTS-RP-015: Downstream Finding Pipeline Integrity

Implementation: Maintain data fidelity through all post-assessment processes: deduplication logic, tenant isolation in multi-client environments, delivery tracking, and audit logs. Implement cryptographic signing for final deliverables.

Key Considerations:

  • Deduplication must preserve original evidence and discovery context
  • Delivery logs should provide proof of integrity to recipients

Common Pitfalls:

  • Losing original discovery context during deduplication
  • Failing to verify tenant isolation in multi-tenant assessment platforms
  • Lack of tamper-evident controls on final report deliverables

Implementation Roadmap

Tier 1 (implement before any autonomous pentesting begins): RP-006 (false positive rate disclosure), RP-008 (vulnerability coverage disclosure), RP-011 (executive summary and risk overview).

Start with RP-011 (executive summary) and RP-008 (coverage disclosure). Customers need these in every report. Add RP-006 (false positive disclosure) to establish trust in findings.

Tier 2 (implement within first 3 engagements): RP-001 (evidence-based finding validation), RP-002 (automated reproduction of critical findings), RP-003 (confidence scoring), RP-004 (finding provenance chain), RP-005 (cryptographic evidence integrity), RP-009 (false negative rate disclosure), RP-012 (remediation guidance), RP-013 (engagement SLA compliance), RP-014 (trend analysis, SHOULD), RP-015 (downstream pipeline integrity, SHOULD).

Prioritize RP-001 and RP-002 (evidence and reproduction) first. Findings without evidence are worthless. Then add RP-003 and RP-004 (confidence scoring, provenance) for auditability, and RP-005 (cryptographic integrity) for tamper-proof evidence chains.

Tier 3 (implement based on assessment maturity): RP-007 (independent finding validation during assessment), RP-010 (detection effectiveness benchmarking, SHOULD).