Safe AI reporting, proven before use.
FinSight Assurance is a multi-agent certification system for finance teams adopting AI-assisted financial reporting workflows.
The product helps a finance manager answer a high-risk operational question:
Which team members are ready to use AI in financial reporting, which ones need supervision, which ones must be blocked, and what evidence is needed next?
The submission is built for the Agents League Hackathon - Reasoning Agents track. It is a Foundry-aligned agentic product build with Foundry validation evidence, a Foundry IQ-ready synthetic source set, a seven-specialist-agent workflow record, a deterministic multi-agent orchestration core, a synthetic enterprise tool gateway, a dual-role review workspace, and an evaluation pack.
All source data is synthetic. Learners are represented by anonymous synthetic IDs such as EMP-001; no human names, real employee records, financial data, company records, customer data, supplier details, credentials, or confidential information are included.
- Jako Xie (@Jako0309) - project owner, product direction, implementation, validation, and submission packaging.
- Lingyu Gu (@guccilucyg97-wq) - team member, finance-domain review, responsible AI framing, and learner-facing communication review.
FinSight Assurance is not a generic dashboard. It is a controlled reasoning workflow for deciding whether AI-generated work is safe enough to enter financial reporting.
- Problem: finance teams need to know who can safely use AI for variance explanations, Power BI narratives, and reporting commentary.
- Why agents: the decision requires role mapping, evidence retrieval, policy gating, workload-aware remediation, final certification, and manager rollout insight. A single chatbot or scorecard would hide which of these steps caused a case to fail.
- Workflow: seven specialist agents produce a verdict, blocked scope, missing evidence, remediation plan, and manager action.
- Tools: read-only synthetic tools ground the agents in learner profiles, role requirements, module evidence, policy signals, workload calendars, thresholds, and team readiness.
- Safety: the Policy and Privacy Gatekeeper can restrict or block use even when other evidence is positive.
- Human control: the system recommends certification status; managers approve, supervise, or block AI use.
- Reviewability: every case includes source files, tool-call trace, policy gate result, and audit metadata.
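
The safety rule above, where the gatekeeper can restrict or block a case even when other evidence is positive, can be sketched as a "most restrictive verdict wins" merge. This is a minimal illustration, not the repository's implementation: the verdict labels are taken from the expected-case table in this README, while the `SEVERITY` ordering and the `combine_verdicts` helper are assumptions made for the example.

```python
# Verdict labels as they appear in this README, ordered from least to most
# restrictive. The numeric ordering is an assumption made for this sketch.
SEVERITY = {
    "Certified": 0,
    "Conditionally ready": 1,
    "Not certified yet": 2,
}


def combine_verdicts(evidence_verdict: str, gate_verdict: str) -> str:
    """Return whichever of the two verdicts is more restrictive."""
    return max(evidence_verdict, gate_verdict, key=lambda verdict: SEVERITY[verdict])


# Strong module evidence does not override a failed policy gate.
recommended = combine_verdicts("Certified", "Not certified yet")
print(recommended)
```

The result is still only a recommendation; the manager's approve, supervise, or block decision stays a separate human step.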
The project decomposes a real enterprise decision into specialist responsibilities:
- Case Intake Router identifies the synthetic staff case and reporting scope.
- Certification Requirement Mapper maps the role to mandatory modules, thresholds, and fail conditions.
- Evidence Curator checks module scores and workflow proof through approved synthetic sources.
- Policy and Privacy Gatekeeper applies reporting, privacy, and AI-output verification gates.
- Workload-Aware Remediation Planner schedules realistic next steps around close and review windows.
- Certification Assessment Agent returns the readiness verdict and practice checks.
- Manager Insight Agent converts individual decisions into team rollout guidance.
Each agent has a visible input, tool call, finding, and decision impact in the product workspace.
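
One way to picture that per-stage record is a small data structure with one entry per specialist. The sketch below is illustrative only: `StageTrace`, its field names, `run_pipeline`, and `placeholder_stage` are hypothetical and are not taken from the repository's source.

```python
from dataclasses import dataclass
from typing import Callable, List

# The seven specialist stages, in the order listed above.
STAGES = [
    "Case Intake Router",
    "Certification Requirement Mapper",
    "Evidence Curator",
    "Policy and Privacy Gatekeeper",
    "Workload-Aware Remediation Planner",
    "Certification Assessment Agent",
    "Manager Insight Agent",
]


@dataclass
class StageTrace:
    """Hypothetical record of what one specialist stage saw and decided."""

    agent: str
    input_summary: str
    tool_call: str
    finding: str
    decision_impact: str


def run_pipeline(case_id: str, stage_fn: Callable[[str, str], StageTrace]) -> List[StageTrace]:
    """Run every stage in order and collect one trace entry per stage."""
    return [stage_fn(agent, case_id) for agent in STAGES]


def placeholder_stage(agent: str, case_id: str) -> StageTrace:
    # Stand-in logic: a real stage would call a tool and reason over its result.
    return StageTrace(
        agent=agent,
        input_summary=f"case {case_id}",
        tool_call="(tool name goes here)",
        finding="(finding goes here)",
        decision_impact="(impact goes here)",
    )


trace = run_pipeline("EMP-001", placeholder_stage)
print(len(trace), trace[0].agent)
```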
Use this path to verify the project quickly:
- Run the local synthetic tool API, then open `app/index.html` to review the live manager and learner workspace.
- Run `PYTHONPATH=src python3 -m finsight.evaluation.runner` and confirm `PASS 5/5 evaluation cases`.
- Open `foundry/foundry_workflow_saved_v2.yaml` to verify the seven-agent Foundry workflow record.
- Open `docs/architecture_diagram.md` or `docs/assets/architecture_diagram.svg` to review the submission architecture diagram.
- Open `screenshots/` to review the clean seven-screen evidence set for the product, trace, controls, architecture, and live agent workbench.
- Open `docs/rubric_alignment.md` to map the project to the public judging rubric.
- Open `docs/tool_api_demo.md` and `evaluations/tool_gateway_results.md` to verify the synthetic OpenAPI tool layer.
- Open `docs/user_journey_and_accessibility.md` to review user stories, use cases, internal communication, and accessibility coverage.
- Use the root README, architecture diagram, screenshots, demo video, and public GitHub URL for the Innovation Studio project form.
The repository is demoable without a continuously running Azure environment. Foundry validation evidence is preserved in evidence/foundry-validation/, and the local orchestration core makes the same reasoning flow repeatable for review.
Open the review workspace:
app/index.html
The repository also includes a root index.html redirect so the product demo opens cleanly from the project root.
The review workspace shows:
- whether each learner is approved, supervised only, or not allowed yet
- chart-led readiness, gate, risk, and workflow coverage cards
- queue filters for ready, review-only, and blocked cases
- an interactive assistant for plain-language case questions, using the live local Python reasoning engine when it is running and packaged synthetic evidence only as the offline fallback
- guided reading, high-contrast, keyboard, and read-aloud controls for accessible review
- privacy, AI verification, and workflow evidence gates
- visible multi-agent reasoning stages
- seven-day workload-aware remediation plans
- learning support cards for newer staff, guided explanations, and module-level training gaps
- Microsoft IQ layer mapping
- source-backed dashboard definitions and readiness metrics
- grounded practice checks for the learner
- manager action queue, approval metadata, and cohort rollout controls
- learner safe-use guidance focused on what can be done now and what is blocked
Run the orchestration core:
PYTHONPATH=src python3 -m finsight.cli assess EMP-001
PYTHONPATH=src python3 -m finsight.cli assess EMP-002
PYTHONPATH=src python3 -m finsight.cli assess EMP-004
PYTHONPATH=src python3 -m finsight.cli team
PYTHONPATH=src python3 -m finsight.cli workflow
Run the evaluation suite:
PYTHONPATH=src python3 -m finsight.evaluation.runner
Expected result:
PASS 5/5 evaluation cases
The evaluation runner checks verdict accuracy, required safety terms, seven specialist stages, tool-call evidence on every learner decision stage, and approved synthetic source citations.
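
The checks listed above could be expressed roughly as follows. This is a hedged sketch rather than the actual runner: the result dictionary shape, the field names (`verdict`, `summary`, `stages`, `tool_call`, `sources`), and the `check_case` helper are assumptions, and for simplicity it requires a tool call on all seven stages.

```python
from typing import Dict, List

# The six approved synthetic sources named in this README.
APPROVED_SOURCES = {
    "data/01_certification_requirements.md",
    "data/02_role_profiles.md",
    "data/03_reporting_policy.md",
    "data/04_ai_usage_and_privacy_policy.md",
    "data/05_learner_progress_and_workload.md",
    "data/06_team_readiness_dataset.md",
}


def check_case(result: dict, expected_verdict: str, required_terms: List[str]) -> Dict[str, bool]:
    """Evaluate one case result against the five kinds of check."""
    stages = result["stages"]
    text = result["summary"].lower()
    cited = [source for stage in stages for source in stage.get("sources", [])]
    return {
        "verdict": result["verdict"] == expected_verdict,
        "safety_terms": all(term.lower() in text for term in required_terms),
        "seven_stages": len(stages) == 7,
        "tool_calls": all(bool(stage.get("tool_call")) for stage in stages),
        # At least one citation, and every citation from the approved set.
        "sources": bool(cited) and all(source in APPROVED_SOURCES for source in cited),
    }


# Illustrative input only; the tool/source pairing here is made up.
sample = {
    "verdict": "Not certified yet",
    "summary": "Privacy gate failed; AI use is blocked pending manager review.",
    "stages": [
        {"tool_call": "get_learner_profile", "sources": ["data/05_learner_progress_and_workload.md"]}
        for _ in range(7)
    ],
}
checks = check_case(sample, "Not certified yet", ["privacy", "blocked"])
print(checks)
```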
The tool gateway validation in evaluations/tool_gateway_results.md confirms that all seven synthetic enterprise tools return source-bounded results and that the privacy-critical blocking case is enforced.
Regenerate the review workspace from the multi-agent engine:
PYTHONPATH=src python3 -m finsight.cli export-workspace
This writes app/workspace-data.js. The browser workspace reads this exported trace, so the interface is not a separate static screen; it is generated from the same seven-agent workflow used by the CLI and evaluation runner.
Generate the OpenAPI schema for the synthetic tool gateway:
PYTHONPATH=src python3 -m finsight.cli openapi-tools
Run the local synthetic tool API:
PYTHONPATH=src python3 -m finsight.cli serve-tools --host 127.0.0.1 --port 8787
Open http://127.0.0.1:8787/health to check the API health status, or http://127.0.0.1:8787/ for the API landing JSON. The product workspace remains http://127.0.0.1:8790/ when served locally, or app/index.html when opened from the file system.
Use the same local service as the live data engine for the product workspace. When app/index.html opens, it automatically checks GET /workspace on http://127.0.0.1:8787. If the service is running, the interface refreshes from the current seven-agent reasoning payload and the floating Ask FinSight assistant calls POST /chat for scoped, server-side answers about the active synthetic case. If the service is not running, the workspace clearly shows an offline fallback badge and keeps the checked-in review data so the submission remains easy to open and use.
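
The browser performs this live-or-fallback check in JavaScript; the Python sketch below only mirrors the logic so it can be tried from a terminal. The `GET /workspace` URL comes from this README, but `load_workspace`, the returned `mode` labels, and the assumption that the endpoint returns JSON are illustrative.

```python
import json
import urllib.error
import urllib.request

WORKSPACE_URL = "http://127.0.0.1:8787/workspace"


def load_workspace(fallback: dict, url: str = WORKSPACE_URL, timeout: float = 1.0) -> dict:
    """Return the live payload when the service answers, else the fallback."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
        return {"mode": "live", "data": payload}
    except (urllib.error.URLError, OSError, ValueError):
        # Service not running, timed out, or returned non-JSON: stay offline.
        return {"mode": "offline-fallback", "data": fallback}


if __name__ == "__main__":
    # With serve-tools stopped, this should report the offline fallback.
    state = load_workspace({"cases": []})
    print(state["mode"])
```

Catching the error and returning packaged data, instead of raising, is what keeps the workspace usable when the service is not running.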
Reviewers do not need to provide their own API key to run the repository. The local review service runs without a key by default so the product workspace, CLI evaluation, and synthetic tool gateway can be checked immediately.
Optional live-model testing can be enabled by the project owner without changing repository files. Set these environment variables in the terminal that starts serve-tools:
bash scripts/start_live_model.sh
The helper tries to list Azure deployments through Azure CLI if it is available. If Azure CLI is not available, paste the full chat-completions endpoint from the Foundry or Azure deployment page. For Azure OpenAI, the endpoint should include the deployment path and api-version query string. The key stays in the local Python process, is not sent to the browser, is not written to disk, and is not needed by reviewers. A live model call happens only when the floating Ask FinSight assistant sends a chat request while those environment variables are configured.
Review the local API demo and Foundry attachment path:
docs/tool_api_demo.md
If the OpenAPI tool service is hosted publicly later, set FINSIGHT_TOOL_API_KEY as a platform environment variable and configure the same value in Foundry as the OpenAPI tool credential. Do not commit the key.
That key is a project-side hosted-service credential, not a reviewer requirement. It exists only for optional hosted OpenAPI deployment and credential-boundary validation.
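
A hosted deployment could enforce that credential boundary along these lines. This is a sketch under stated assumptions: `FINSIGHT_TOOL_API_KEY` is the variable named above, but the `x-api-key` header name and the `is_authorized` helper are hypothetical and may not match how `src/finsight/tool_api.py` handles credentials.

```python
import hmac
import os
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], env: Optional[Mapping[str, str]] = None) -> bool:
    """Allow the request if no key is configured, or if the supplied key matches."""
    env = os.environ if env is None else env
    expected = env.get("FINSIGHT_TOOL_API_KEY", "")
    if not expected:
        # No key configured: local review mode, no credential required.
        return True
    supplied = headers.get("x-api-key", "")  # hypothetical header name
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

Reading the key from the environment at request time keeps it out of the repository and out of any file the service writes.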
| Case | Expected Verdict | Why It Matters |
|---|---|---|
| `EMP-001` | Conditionally ready | Shows supervised-use logic when AI verification and Power BI evidence are incomplete. |
| `EMP-002` | Not certified yet | Shows privacy-critical blocking logic for unsafe supplier payment prompting. |
| `EMP-004` | Certified | Shows the positive path for a strong finance analyst with manager approval. |
| `EMP-005` | Not certified yet | Shows that missing workflow evidence cannot be invented. |
| Team briefing | Manager rollout blocked | Shows cohort-level readiness, risk, and training priorities. |
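
The expected verdicts in this table can be compared against actual results with a short tally. The sketch below is illustrative: the `EXPECTED` mapping copies the table, while `summarize` and the `FAIL` wording are assumptions (only the `PASS 5/5 evaluation cases` line is documented in this README).

```python
from typing import Dict

# Expected verdict per case, copied from the table above.
EXPECTED: Dict[str, str] = {
    "EMP-001": "Conditionally ready",
    "EMP-002": "Not certified yet",
    "EMP-004": "Certified",
    "EMP-005": "Not certified yet",
    "Team briefing": "Manager rollout blocked",
}


def summarize(actual: Dict[str, str]) -> str:
    """Count matching verdicts and format a one-line summary."""
    passed = sum(1 for case, verdict in EXPECTED.items() if actual.get(case) == verdict)
    status = "PASS" if passed == len(EXPECTED) else "FAIL"  # FAIL label is assumed
    return f"{status} {passed}/{len(EXPECTED)} evaluation cases"


print(summarize(dict(EXPECTED)))
```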
Approved synthetic source set
(prepared for Foundry IQ knowledge grounding)
|
v
Foundry seven-agent workflow record
|
v
Local synthetic enterprise tool gateway
|
+--> get_learner_profile
+--> resolve_role_requirements
+--> evaluate_module_evidence
+--> classify_policy_signals
+--> get_workload_calendar
+--> read_certification_thresholds
+--> summarize_team_readiness
|
+--> Case Intake Router
+--> Certification Requirement Mapper
+--> Evidence Curator
+--> Policy and Privacy Gatekeeper
+--> Workload-Aware Remediation Planner
+--> Certification Assessment Agent
+--> Manager Insight Agent
|
v
Semantic model
|
v
FinSight multi-agent orchestration core
|
+--> Case Intake Router
+--> Certification Requirement Mapper
+--> Evidence Curator
+--> Policy and Privacy Gatekeeper
+--> Workload-Aware Remediation Planner
+--> Certification Assessment Agent
+--> Manager Insight Agent
|
v
Certification decision + role-specific briefing
|
+--> Foundry validation transcript
+--> Dual-role review workspace
+--> Evaluation results
- Microsoft Foundry: agent build, workflow design, and validation environment.
- Foundry Agent Service concepts: portal-first agent implementation path represented through agent instructions, workflow records, and validation evidence.
- Foundry IQ: Microsoft IQ target for approved synthetic certification sources. The same six sources are also embedded in the Foundry agent instructions when the selected model or region does not expose file search or knowledge attachment.
- Foundry Workflow: validated seven-specialist-agent workflow record in `foundry/foundry_workflow_saved_v2.yaml` plus the reusable blueprint in `foundry/workflow_blueprint.yaml`.
- Azure validation environment: used for Foundry validation evidence.
- GitHub: public project submission.
The Foundry validation build used the Japan East environment and gpt-oss-120b, which was the usable model available during validation. The Foundry agent instructions embed the approved synthetic source set when file search or uploaded knowledge attachment is unavailable. The repository preserves the Foundry instructions, workflow record, and validation evidence, and it also includes a local orchestration core to make the multi-agent workflow explicit, repeatable, and evaluable without requiring a continuously running Azure environment.
Foundry IQ is the Microsoft IQ layer targeted by the submission build. The approved source set contains six synthetic documents:
- `data/01_certification_requirements.md`
- `data/02_role_profiles.md`
- `data/03_reporting_policy.md`
- `data/04_ai_usage_and_privacy_policy.md`
- `data/05_learner_progress_and_workload.md`
- `data/06_team_readiness_dataset.md`
The system models work context through workload calendars, month-end close pressure, reporting pack deadlines, manager review windows, and available training capacity.
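
To illustrate what workload-aware planning can mean in practice, the sketch below places remediation modules on the earliest days that still have training capacity. It is a simple greedy scheduler written for this README, not the planner's actual algorithm; `plan_remediation`, the module names, and the example calendar are all hypothetical.

```python
from typing import Dict, List, Tuple


def plan_remediation(
    modules: List[str],
    free_hours: Dict[str, float],
    hours_per_module: float = 1.0,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Greedily assign modules to days with capacity; return (plan, leftover)."""
    plan: Dict[str, List[str]] = {day: [] for day in free_hours}
    queue = list(modules)
    for day, capacity in free_hours.items():  # dicts keep insertion order
        while queue and capacity >= hours_per_module:
            plan[day].append(queue.pop(0))
            capacity -= hours_per_module
    return plan, queue


# Hypothetical week: Monday and Tuesday are lost to month-end close.
calendar = {"Mon": 0.0, "Tue": 0.0, "Wed": 1.0, "Thu": 2.0, "Fri": 0.0, "Sat": 0.0, "Sun": 0.0}
modules = ["AI output verification", "Privacy refresher", "Power BI narrative review", "Extra practice"]
plan, leftover = plan_remediation(modules, calendar)
print(plan, leftover)
```

Returning the leftover modules explicitly makes it visible when a seven-day window cannot absorb the full plan.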
The system includes a semantic model in data/semantic_model.json covering learners, roles, modules, risks, thresholds, managers, workload slots, and certification decisions.
FinSight Assurance includes a local synthetic enterprise tool gateway rather than a paid third-party API. The tools are intentionally limited to approved synthetic sources:
- `get_learner_profile`
- `resolve_role_requirements`
- `evaluate_module_evidence`
- `classify_policy_signals`
- `get_workload_calendar`
- `read_certification_thresholds`
- `summarize_team_readiness`
The implemented local tool gateway is in src/finsight/tools.py. The local OpenAPI-compatible service is in src/finsight/tool_api.py, and the generated OpenAPI 3.0 schema is in foundry/finsight_tools_openapi.json. Every local agent trace records the tool call used by that specialist stage.
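
The idea of a bounded gateway that records each call can be sketched as below. This is not the code in `src/finsight/tools.py`: the `ToolGateway` class, its trace format, and the placeholder handler are assumptions used only to show the pattern of allow-listing tool names and logging calls.

```python
from typing import Any, Callable, Dict, List

# The seven tool names listed in this README.
TOOL_NAMES = (
    "get_learner_profile",
    "resolve_role_requirements",
    "evaluate_module_evidence",
    "classify_policy_signals",
    "get_workload_calendar",
    "read_certification_thresholds",
    "summarize_team_readiness",
)


class ToolGateway:
    """Hypothetical read-only gateway that logs every tool call it serves."""

    def __init__(self, handlers: Dict[str, Callable[..., Any]]) -> None:
        unknown = set(handlers) - set(TOOL_NAMES)
        if unknown:
            raise ValueError(f"unapproved tools: {sorted(unknown)}")
        self._handlers = dict(handlers)
        self.trace: List[dict] = []

    def call(self, name: str, **arguments: Any) -> Any:
        if name not in self._handlers:
            raise KeyError(f"tool not registered: {name}")
        result = self._handlers[name](**arguments)
        # Record the call so a reviewer can see which tool backed the stage.
        self.trace.append({"tool": name, "arguments": arguments})
        return result


def get_learner_profile(learner_id: str) -> dict:
    # Placeholder data; a real handler would read the approved synthetic sources.
    return {"learner_id": learner_id, "synthetic": True}


gateway = ToolGateway({"get_learner_profile": get_learner_profile})
profile = gateway.call("get_learner_profile", learner_id="EMP-001")
print(profile, gateway.trace)
```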
The Foundry Tools UI exposes custom tool connections through OpenAPI, MCP, or A2A endpoints. Because those routes require an externally hosted endpoint, this submission does not claim that the local Python functions are attached inside the Foundry portal. Instead, foundry/finsight_tools_openapi.json documents the endpoint contract that can be deployed later without changing the reasoning design. foundry/tool_definitions.json keeps the same tool contract in function-definition form for agent-framework style runtimes.
app/
  index.html
  styles.css
  app.js
  workspace-data.js
index.html
DISCLAIMER.md
LICENSE
SECURITY.md
data/
  01_certification_requirements.md
  02_role_profiles.md
  03_reporting_policy.md
  04_ai_usage_and_privacy_policy.md
  05_learner_progress_and_workload.md
  06_team_readiness_dataset.md
  semantic_model.json
docs/
  agent_instructions.md
  architecture_and_orchestration.md
  architecture_diagram.md
  assets/
    architecture_diagram.svg
  challenge_alignment_checklist.md
  evaluation_plan.md
  foundry_alignment.md
  logic_walkthrough.md
  repository_safety_notes.md
  rubric_alignment.md
  source_manifest.md
  tool_api_demo.md
  user_journey_and_accessibility.md
foundry/
  README.md
  agent_cards.md
  finsight_tools_openapi.json
  foundry_workflow_saved_v2.yaml
  specialist_agent_instructions.md
  tool_definitions.json
  workflow_blueprint.yaml
evaluations/
  evaluation_cases.json
  evaluation_results.md
  tool_gateway_results.md
evidence/
  foundry-validation/
screenshots/
src/
  finsight/
tests/
  test_tool_api.py
The Reasoning Agents starter kit asks for a multi-agent learning and certification-readiness system: understand requirements, build study plans, generate practice checks, assess readiness, provide feedback, and explain the agent workflow. FinSight Assurance adapts that pattern to a high-stakes internal AI-use certification programme for finance teams:
- maps certification requirements to organisational roles
- creates role-based and workload-aware study plans
- generates grounded practice questions from approved sources
- evaluates individual readiness with conservative fail conditions
- provides manager-level insight across cohort readiness and rollout risk
- uses Microsoft Foundry validation evidence, Foundry Agent Service implementation patterns, a Foundry IQ-ready source set, and a documented Foundry Workflow design
- uses seven read-only synthetic tools for learner profiles, requirements, evidence, policy signals, workload, thresholds, and team readiness
- uses synthetic data only
- includes evaluation and responsible AI controls
The project is intentionally framed as a certification-readiness system rather than a generic finance assistant. The domain is financial reporting AI-use approval, but the workflow remains the challenge workflow: requirements, learning evidence, study plan, practice checks, readiness verdict, feedback, and manager guidance.
FinSight Assurance does not replace finance managers, accounting judgment, reporting controls, or human review. AI-assisted commentary is treated as draft material. The system blocks certification when privacy, source traceability, AI verification, workflow evidence, or manager approval is insufficient.
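
The blocking rule in the paragraph above is a fail-closed check: certification needs every control to be explicitly satisfied. The sketch below is a minimal illustration; the control keys, `certification_blockers`, and `can_certify` are hypothetical names, not identifiers from the repository.

```python
from typing import Dict, List

# The five conditions named in the paragraph above, as hypothetical keys.
REQUIRED_CONTROLS = (
    "privacy",
    "source_traceability",
    "ai_verification",
    "workflow_evidence",
    "manager_approval",
)


def certification_blockers(controls: Dict[str, bool]) -> List[str]:
    """Return every control that is missing or not explicitly satisfied."""
    return [name for name in REQUIRED_CONTROLS if controls.get(name) is not True]


def can_certify(controls: Dict[str, bool]) -> bool:
    """Fail closed: certification requires all five controls to be True."""
    return not certification_blockers(controls)


# Missing workflow evidence is treated as a blocker, never assumed.
partial = {"privacy": True, "source_traceability": True, "ai_verification": True, "manager_approval": True}
print(certification_blockers(partial))
```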
The submitted solution, architecture, code, data, UI, documentation, and demo materials are original team work for this challenge. No copied third-party repository, private data, credentials, or unlicensed assets are included.
The final package is intentionally limited to the product, source code, synthetic data, Foundry workflow evidence, screenshots, and verification assets.
- Product walkthrough: `app/index.html`
- Foundry validation evidence: `evidence/foundry-validation/`
- Full screenshot evidence set: `screenshots/`
- Tool/API demo guide: `docs/tool_api_demo.md`
- Architecture diagram: `docs/architecture_diagram.md`
- Uploadable architecture diagram asset: `docs/assets/architecture_diagram.svg`
- Foundry workflow blueprint: `foundry/workflow_blueprint.yaml`
- Saved Foundry workflow shape: `foundry/foundry_workflow_saved_v2.yaml`
- Judging rubric alignment: `docs/rubric_alignment.md`
- Evaluation result table: `evaluations/evaluation_results.md`
Public team attribution is limited to names, GitHub handles, and project roles. Do not publish private contact details, student IDs, tenant details, subscription details, account screenshots, or credentials in the repository.