---
license: apache-2.0
tags:
- knowledge-graph
- cybersecurity
- mitre-attack
- capec
- cwe
- cve
- cpe
- d3fend
- atlas
- car
- engage
- epss
- kev
- vulnrichment
- ghsa
- sigma
- exploitdb
- misp-galaxy
- stix
- threat-intelligence
- triples
pretty_name: "Security Knowledge Graph Triples (ATT&CK / CAPEC / CWE / CVE / CPE / D3FEND / ATLAS / CAR / ENGAGE / EPSS / KEV / Vulnrichment / GHSA / Sigma / ExploitDB / MISP Galaxies)"
configs:
- config_name: enterprise
  data_files:
  - split: train
    path: data/enterprise.parquet
  default: true
- config_name: mobile
  data_files:
  - split: train
    path: data/mobile.parquet
- config_name: ics
  data_files:
  - split: train
    path: data/ics.parquet
- config_name: attack-all
  data_files:
  - split: train
    path: data/attack-all.parquet
- config_name: capec
  data_files:
  - split: train
    path: data/capec.parquet
- config_name: cwe
  data_files:
  - split: train
    path: data/cwe.parquet
- config_name: cve
  data_files:
  - split: train
    path: data/cve.parquet
- config_name: cpe
  data_files:
  - split: train
    path: data/cpe.parquet
- config_name: d3fend
  data_files:
  - split: train
    path: data/d3fend.parquet
- config_name: atlas
  data_files:
  - split: train
    path: data/atlas.parquet
- config_name: car
  data_files:
  - split: train
    path: data/car.parquet
- config_name: engage
  data_files:
  - split: train
    path: data/engage.parquet
- config_name: epss
  data_files:
  - split: train
    path: data/epss.parquet
- config_name: kev
  data_files:
  - split: train
    path: data/kev.parquet
- config_name: vulnrichment
  data_files:
  - split: train
    path: data/vulnrichment.parquet
- config_name: ghsa
  data_files:
  - split: train
    path: data/ghsa.parquet
- config_name: sigma
  data_files:
  - split: train
    path: data/sigma.parquet
- config_name: exploitdb
  data_files:
  - split: train
    path: data/exploitdb.parquet
- config_name: misp_galaxy
  data_files:
  - split: train
    path: data/misp_galaxy.parquet
- config_name: combined
  data_files:
  - split: train
    path: data/combined.parquet
dataset_info:
  features:
  - name: subject
    dtype: string
  - name: predicate
    dtype: string
  - name: object
    dtype: string
---
# Security Knowledge Graph Triples
Security data from 16 sources represented as Subject-Predicate-Object (SPO) triples in Parquet format, ready for knowledge-graph construction, graph-ML, RAG pipelines, and threat-intelligence analysis.
Sources: ATT&CK · CAPEC · CWE · CVE · CPE · D3FEND · ATLAS · CAR · ENGAGE · EPSS · KEV · Vulnrichment · GHSA · Sigma · ExploitDB · MISP Galaxies
Last updated: 2026-04-06T13:23:46Z
## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("s0u9ata/security-kg", "enterprise")
print(ds["train"][0])
# {'subject': 'T1059.001', 'predicate': 'rdf:type', 'object': 'Technique'}
```
## Available Configs

| Config | Description | Est. Triples | Status |
|---|---|---:|---|
| `enterprise` (default) | Enterprise ATT&CK | 42,041 | Current |
| `mobile` | Mobile ATT&CK | 5,307 | Current |
| `ics` | ICS ATT&CK | 3,756 | Current |
| `attack-all` | ATT&CK combined (deduplicated) | 49,622 | Current |
| `capec` | CAPEC attack patterns | 8,114 | Current |
| `cwe` | CWE weaknesses | 14,565 | Current |
| `cve` | CVE vulnerabilities | 3,546,666 | Current |
| `cpe` | CPE platform enumeration | 12,399,534 | Current |
| `d3fend` | D3FEND defensive techniques | 8,154 | Current |
| `atlas` | ATLAS AI/ML techniques | 1,420 | Current |
| `car` | CAR analytics | 1,617 | Current |
| `engage` | ENGAGE adversary engagement | 1,464 | Current |
| `epss` | EPSS exploit prediction scores | 649,788 | Current |
| `kev` | KEV known exploited vulnerabilities | 17,054 | Current |
| `vulnrichment` | CISA Vulnrichment (SSVC, CVSS, CWE enrichment) | 656,237 | Current |
| `ghsa` | GitHub Security Advisories | 327,142 | Current |
| `sigma` | Sigma detection rules | 32,750 | Current |
| `exploitdb` | ExploitDB public exploits | 346,303 | Current |
| `misp_galaxy` | MISP Galaxy threat intelligence clusters | 177,294 | Current |
| `combined` | All sources merged (deduplicated) | 18,237,724 | Current |
## Knowledge Graph Structure

```text
        Group   Campaign
            \   /
             uses
              |
              v
          TECHNIQUE -----> Tactic
           ^  ^  ^
           |  |  |
           |  |  +-- D3FEND (counters)
           |  |  +-- CAR (detects)
           |  |  +-- Sigma (detects)
           |  |  +-- ENGAGE (engages)
           |  |  +-- ATLAS (related)
           |  |  +-- MISP Galaxies (cross-refs)
           |  |
           |  +-- Mitigation (mitigates)
           |  +-- DataComponent (detects)
           |
           +-- maps-to -- CAPEC
                            |
                    related-weakness
                            |
                            v
                           CWE
                            ^
                            |
                    related-weakness
                            |
                           CVE ----> CPE
                            ^
                            |
                      EPSS (score)
                      KEV (exploited)
                      GHSA (advisory)
                      Vulnrichment (SSVC)
                      ExploitDB (exploit)
```
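Read as a graph, the relationship predicates in the diagram are directed edges between entity IDs. The sketch below is a minimal illustration, assuming triples are held as `(subject, predicate, object)` tuples (for example built with `zip(df.subject, df.predicate, df.object)` from a loaded config): it indexes outgoing edges for a chosen set of relationship predicates and runs a breadth-first search between two entities. The sample triples, including the CAPEC-66 → T1190 link, are made up for illustration, and `EDGE_PREDICATES`, `build_index` and `find_path` are hypothetical names, not part of this dataset or its converter.

```python
from collections import deque

# Hypothetical sample triples; real rows come from the Parquet files.
TRIPLES = [
    ("CVE-2024-1234", "related-weakness", "CWE-89"),
    ("CWE-89", "related-attack-pattern", "CAPEC-66"),
    ("CAPEC-66", "maps-to-technique", "T1190"),
    ("CAPEC-66", "name", "SQL Injection"),  # property triple, not an edge
    ("T1190", "belongs-to-tactic", "TA0001"),
]

# Relationship predicates treated as directed edges (an assumed subset).
EDGE_PREDICATES = {"related-weakness", "related-attack-pattern",
                   "maps-to-technique", "belongs-to-tactic"}


def build_index(triples, edge_predicates):
    """Map each subject to its outgoing (predicate, object) edges."""
    index = {}
    for s, p, o in triples:
        if p in edge_predicates:
            index.setdefault(s, []).append((p, o))
    return index


def find_path(index, start, goal):
    """Breadth-first search; returns a list of (subject, predicate, object) hops, or None."""
    queue = deque([start])
    parent = {start: None}  # node -> (previous node, predicate used to reach it)
    while queue:
        node = queue.popleft()
        if node == goal:
            hops = []
            while parent[node] is not None:
                prev, pred = parent[node]
                hops.append((prev, pred, node))
                node = prev
            return list(reversed(hops))
        for pred, nxt in index.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, pred)
                queue.append(nxt)
    return None


index = build_index(TRIPLES, EDGE_PREDICATES)
for hop in find_path(index, "CVE-2024-1234", "TA0001"):
    print(hop)
```

Edges are followed only in the direction they are stored; to walk from a CWE back to its CVEs, also index the reversed `(object, predicate, subject)` pairs.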
## Schema

Each row is a single triple with three string columns:

| Column | Description | Examples |
|---|---|---|
| `subject` | Entity ID | T1059.001, G0016, CAPEC-66, CWE-79, CVE-2024-1234, `cpe:2.3:a:apache:httpd:*`, D3-FE, AML.T0000, CAR-2024-01-001, EAC0001, GHSA-xxxx-yyyy-zzzz, EDB-16929 |
| `predicate` | Property name or relationship type | rdf:type, name, uses, mitigates, epss-score, counters, ssvc-exploitation, exploits-cve, detects-technique |
| `object` | Value or target entity ID | Technique, PowerShell, T1059, CWE-89, 0.97500, SecurityAdvisory, SigmaRule, Exploit |
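Because all three columns are strings, numeric and boolean values such as `epss-score` or `is-subtechnique` arrive as text, and one entity is spread over many rows. As a minimal sketch (stdlib only, assuming rows are `(subject, predicate, object)` tuples), the function below folds rows into one record per subject, keeps repeated predicates such as `platform` as lists, and parses a few value types. The `NUMERIC` and `BOOLEAN` sets are assumptions drawn from the predicate tables in this card, not a complete schema, and the sample rows are illustrative.

```python
from collections import defaultdict

# Predicates whose string objects get parsed (assumed subset, from the tables below).
NUMERIC = {"epss-score", "epss-percentile", "cvss-base-score", "adp-cvss-base-score"}
BOOLEAN = {"is-subtechnique", "verified"}


def parse_value(predicate, value):
    """Convert a string object to float or bool where the predicate implies it."""
    if predicate in NUMERIC:
        return float(value)
    if predicate in BOOLEAN:
        return value == "True"
    return value


def to_records(triples):
    """Group triples by subject; each predicate maps to a list of parsed values."""
    records = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        records[s][p].append(parse_value(p, o))
    return {s: dict(props) for s, props in records.items()}


rows = [
    ("T1059", "rdf:type", "Technique"),
    ("T1059", "name", "Command and Scripting Interpreter"),
    ("T1059", "is-subtechnique", "False"),
    ("T1059", "platform", "Windows"),
    ("T1059", "platform", "Linux"),
    ("CVE-2024-1234", "epss-score", "0.97500"),
]
records = to_records(rows)
print(records["T1059"]["platform"])            # ['Windows', 'Linux']
print(records["CVE-2024-1234"]["epss-score"])  # [0.975]
```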
### ATT&CK Property Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type | Technique, Group, Malware, Tool, Tactic, Mitigation, Campaign, DataSource, DataComponent |
| `name` | Display name | PowerShell |
| `description` | Full description text | Adversaries may abuse PowerShell... |
| `platform` | Applicable platform | Windows, Linux, macOS |
| `domain` | ATT&CK domain | enterprise-attack |
| `alias` | Alternative name | Cozy Bear |
| `is-subtechnique` | Whether the entity is a sub-technique | True, False |
| `belongs-to-tactic` | ATT&CK ID of the tactic | TA0002 |
| `shortname` | Tactic shortname | credential-access |
| `url` | ATT&CK website URL | https://attack.mitre.org/techniques/T1059/001 |
| `created` / `modified` | Timestamps | 2020-01-14 17:18:32... |
### ATT&CK Relationship Predicates

| Predicate | Typical subject / object | Example |
|---|---|---|
| `uses` | Group/Campaign/Software / Technique | G0016 / T1059.001 |
| `mitigates` | Mitigation / Technique | M1049 / T1059.001 |
| `subtechnique-of` | Sub-technique / Parent technique | T1059.001 / T1059 |
| `detects` | DataComponent / Technique | DC0001 / T1059.001 |
| `attributed-to` | Campaign / Group | C0018 / G0016 |
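Group `uses` edges frequently target sub-techniques (e.g. `G0016 / T1059.001`), so per-technique counts end up split between a parent and its children. A minimal sketch, assuming triples as `(subject, predicate, object)` tuples: it builds a sub-technique → parent map from `subtechnique-of` and counts distinct groups per parent technique. Apart from the rows shown in the table above, the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample; real rows come from the enterprise or attack-all configs.
triples = [
    ("T1059.001", "subtechnique-of", "T1059"),
    ("T1059.003", "subtechnique-of", "T1059"),
    ("G0016", "uses", "T1059.001"),
    ("G0007", "uses", "T1059.003"),
    ("G0007", "uses", "T1059"),
    ("M1049", "mitigates", "T1059.001"),
]

# Sub-technique -> parent technique.
parent_of = {s: o for s, p, o in triples if p == "subtechnique-of"}

# Distinct groups per parent technique; sub-technique use counts toward the parent.
groups_by_technique = defaultdict(set)
for s, p, o in triples:
    if p == "uses" and s.startswith("G"):
        groups_by_technique[parent_of.get(o, o)].add(s)

print({t: sorted(g) for t, g in groups_by_technique.items()})
# {'T1059': ['G0007', 'G0016']}
```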
### CAPEC Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type | AttackPattern |
| `name` / `description` | Display name / full text | SQL Injection |
| `abstraction` / `status` | Abstraction level / status | Standard, Stable |
| `likelihood` / `severity` | Attack likelihood / severity | High |
| `child-of` | Parent attack pattern | CAPEC-248 |
| `related-weakness` | Related CWE | CWE-89 |
| `maps-to-technique` | Mapped ATT&CK technique | T1190 |
### CWE Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type | Weakness |
| `name` / `description` | Display name / full text | Cross-site Scripting (XSS) |
| `abstraction` / `status` | Abstraction level / status | Base, Stable |
| `likelihood-of-exploit` | Exploitation likelihood | High |
| `child-of` | Parent weakness | CWE-74 |
| `related-attack-pattern` | Related CAPEC | CAPEC-86 |
| `platform` | Applicable platform | JavaScript |
| `consequence-scope` / `consequence-impact` | Consequence scope / impact | Confidentiality, Read Data |
| `introduction-phase` | Introduction phase | Implementation |
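CWE entries form a hierarchy through `child-of` (for example CWE-79 → CWE-74), and a weakness can have more than one parent. A minimal sketch that collects every ancestor of a CWE with a breadth-first walk and a visited set, which is handy when rolling CVE counts up to broader weakness classes. The sample edges are illustrative, and `ancestors` is a hypothetical helper name.

```python
from collections import deque


def ancestors(triples, cwe_id):
    """Return every entity reachable from cwe_id by following child-of edges upward."""
    parents = {}
    for s, p, o in triples:
        if p == "child-of":
            parents.setdefault(s, []).append(o)
    seen, queue = set(), deque([cwe_id])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Illustrative edges; load real ones from the cwe config.
cwe_triples = [
    ("CWE-79", "child-of", "CWE-74"),
    ("CWE-89", "child-of", "CWE-74"),
    ("CWE-89", "child-of", "CWE-943"),
    ("CWE-74", "child-of", "CWE-707"),
    ("CWE-79", "name", "Cross-site Scripting (XSS)"),
]
print(sorted(ancestors(cwe_triples, "CWE-89")))  # ['CWE-707', 'CWE-74', 'CWE-943']
```

CAPEC and D3FEND also use `child-of`, so running this on the `cwe` config (rather than `combined`) keeps the hierarchies from mixing.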
### CVE Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type | Vulnerability |
| `state` | CVE record state | PUBLISHED |
| `description` | English description | A remote code execution... |
| `date-published` / `date-updated` | Timestamps | 2024-01-15T00:00:00.000Z |
| `assigner` | Assigning organization | microsoft |
| `vendor` / `product` | Affected vendor / product | Microsoft, Windows |
| `affects-cpe` | Affected CPE string | `cpe:2.3:o:microsoft:windows_10:*` |
| `platform` | Affected platform | x64 |
| `related-weakness` | Related CWE | CWE-79 |
| `cvss-base-score` / `cvss-severity` | CVSS metrics | 9.8, CRITICAL |
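`cvss-base-score` and `cvss-severity` are both stored as strings. The CVSS v3.x specification derives a qualitative rating from the base score (None 0.0, Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0). A minimal sketch that applies those bands, for example to flag rows where the stored score and severity disagree; it assumes the score is a CVSS v3.x base score, since CVSS v2 used different bands. The `cvss3_severity` helper and sample pairs are hypothetical.

```python
def cvss3_severity(score: str) -> str:
    """Map a CVSS v3.x base score (stored as a string) to its qualitative rating."""
    value = float(score)
    if not 0.0 <= value <= 10.0:
        raise ValueError(f"CVSS base score out of range: {score}")
    if value == 0.0:
        return "NONE"
    if value < 4.0:
        return "LOW"
    if value < 7.0:
        return "MEDIUM"
    if value < 9.0:
        return "HIGH"
    return "CRITICAL"


# Hypothetical (cvss-base-score, cvss-severity) pairs joined on the CVE subject.
pairs = [("9.8", "CRITICAL"), ("5.4", "MEDIUM"), ("7.5", "MEDIUM")]
mismatches = [(s, sev) for s, sev in pairs if cvss3_severity(s) != sev]
print(mismatches)  # [('7.5', 'MEDIUM')]
```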
### CPE Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type | Platform |
| `part` | CPE part type | application, operating_system, hardware |
| `vendor` / `product` / `version` | Components | apache, httpd, 2.4.51 |
| `title` | English display name | Apache HTTP Server 2.4.51 |
| `created` / `modified` | Timestamps | 2021-10-07 |
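CPE subjects and `affects-cpe` objects use the CPE 2.3 formatted-string binding, `cpe:2.3:part:vendor:product:version:update:edition:language:sw_edition:target_sw:target_hw:other`, where `part` is `a`, `o` or `h`. A minimal sketch that splits such a string into named fields while respecting backslash-escaped colons. Padding abbreviated strings (like the `cpe:2.3:o:microsoft:windows_10:*` examples in this card) with the `*` (ANY) wildcard is a convenience assumption, not part of the specification, and `parse_cpe23` is a hypothetical helper.

```python
FIELDS = ["part", "vendor", "product", "version", "update", "edition",
          "language", "sw_edition", "target_sw", "target_hw", "other"]
PARTS = {"a": "application", "o": "operating_system", "h": "hardware"}


def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into its named components."""
    if not cpe.startswith("cpe:2.3:"):
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe}")
    values, current, escaped = [], [], False
    for ch in cpe[len("cpe:2.3:"):]:
        if escaped:
            current.append(ch)  # escaped character is literal, even ':'
            escaped = False
        elif ch == "\\":
            current.append(ch)  # keep the backslash so the value round-trips
            escaped = True
        elif ch == ":":
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
    values.append("".join(current))
    if len(values) > len(FIELDS):
        raise ValueError(f"too many components: {cpe}")
    # Assumption: pad abbreviated strings with the ANY wildcard "*".
    values += ["*"] * (len(FIELDS) - len(values))
    record = dict(zip(FIELDS, values))
    record["part_name"] = PARTS.get(record["part"], record["part"])
    return record


cpe = parse_cpe23("cpe:2.3:o:microsoft:windows_10:*")
print(cpe["part_name"], cpe["vendor"], cpe["product"], cpe["version"])
# operating_system microsoft windows_10 *
```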
### D3FEND Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type (DefensiveTechnique or OffensiveTechnique) | DefensiveTechnique |
| `name` / `definition` | Display name / definition | File Encryption |
| `synonym` | Alternative name | Disk Encryption |
| `child-of` | Parent technique | PlatformHardening |
| `counters` | Countered offensive technique | T1059 |
### ATLAS Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type (Tactic, Technique, CaseStudy, Mitigation) | Technique |
| `name` / `description` | Display name / full text | ML Supply Chain Compromise |
| `maturity` | Technique maturity | Reviewed |
| `belongs-to-tactic` | Parent tactic | AML.TA0001 |
| `subtechnique-of` | Parent technique | AML.T0000 |
| `related-attack-technique` | Linked ATT&CK technique | T1195 |
| `related-attack-tactic` | Linked ATT&CK tactic | TA0001 |
| `uses-technique` | Technique used in a case study | AML.T0000 |
| `mitigates` | Mitigated technique | AML.T0000 |
### CAR Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type | Analytic |
| `title` / `description` | Analytic name / full text | Suspicious PowerShell Commands |
| `platform` | Applicable platform | Windows |
| `information-domain` | Information domain | Host |
| `analytic-type` | Type of analytic | Situational Awareness |
| `detects-technique` | Detected ATT&CK technique | T1059 |
| `detects-subtechnique` | Detected ATT&CK sub-technique | T1059.001 |
| `covers-tactic` | Covered ATT&CK tactic | Execution |
| `maps-to-d3fend` | Linked D3FEND technique | D3-PSA |
### ENGAGE Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type (EngagementActivity or AdversaryVulnerability) | EngagementActivity |
| `name` / `description` | Display name / full text | Software Manipulation |
| `engages-technique` | Engaged ATT&CK technique | T1001 |
| `vulnerability-of` | ATT&CK technique this adversary vulnerability applies to | T1001 |
| `addresses-vulnerability` | Addressed adversary vulnerability | EAV0001 |
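### EPSS Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `epss-score` | Probability of exploitation activity in the next 30 days (0-1) | 0.97500 |
| `epss-percentile` | Percentile rank of the score among all scored CVEs (0-1) | 0.99900 |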
### KEV Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type | KnownExploitedVulnerability |
| `kev-vendor` / `kev-product` | Affected vendor / product | Microsoft, Windows |
| `kev-name` / `kev-description` | Vulnerability name / description | Windows Privilege Escalation |
| `kev-date-added` / `kev-due-date` | Date added to the catalog / remediation due date | 2024-01-15 |
| `kev-required-action` | Required remediation action | Apply updates per vendor instructions. |
| `kev-ransomware-use` | Known use in ransomware campaigns | Known, Unknown |
| `related-weakness` | Related CWE | CWE-269 |
### Vulnrichment Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `ssvc-exploitation` | SSVC exploitation status | active, poc, none |
| `ssvc-automatable` | Whether exploitation is automatable | yes, no |
| `ssvc-technical-impact` | Technical impact level | total, partial |
| `adp-cvss-base-score` | CISA-analyzed CVSS base score | 9.8 |
| `adp-cvss-severity` | CISA-analyzed CVSS severity | CRITICAL |
| `adp-related-weakness` | CISA-assigned CWE | CWE-79 |
| `adp-affects-cpe` | CISA-assigned CPE | `cpe:2.3:o:microsoft:windows_10:*` |
### GHSA Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type | SecurityAdvisory |
| `summary` | Advisory summary | XSS vulnerability in example-package |
| `date-published` / `date-modified` | Timestamps | 2024-01-15T00:00:00Z |
| `severity` | Severity level | HIGH, MODERATE, LOW, CRITICAL |
| `related-cve` | Associated CVE | CVE-2024-1234 |
| `related-weakness` | Associated CWE | CWE-79 |
| `cvss-vector` | CVSS v3 vector string | `CVSS:3.1/AV:N/AC:L/...` |
| `affects-package` | Affected package (`ecosystem/name`) | `npm/example-package` |
| `fixed-in` | Fixed version for a package (`ecosystem/name@version`) | `npm/example-package@2.0.1` |
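`affects-package` and `fixed-in` pack the ecosystem, package name and (for `fixed-in`) version into a single string. Package names can themselves contain `/` or `@` (npm scoped packages such as `@scope/pkg`, Go module paths), so a minimal sketch splits on the first `/` for the ecosystem and on the last `@` for the version. That the converter writes scoped names verbatim in this form is an assumption, and `split_package` is a hypothetical helper.

```python
def split_package(value: str):
    """Split 'ecosystem/name[@version]' into (ecosystem, name, version or None)."""
    ecosystem, _, rest = value.partition("/")
    name, sep, version = rest.rpartition("@")
    if not sep or not name:  # no version suffix, or the only '@' is a leading npm scope
        return ecosystem, rest, None
    return ecosystem, name, version


print(split_package("npm/example-package@2.0.1"))  # ('npm', 'example-package', '2.0.1')
print(split_package("npm/@scope/pkg@1.2.3"))       # ('npm', '@scope/pkg', '1.2.3')
print(split_package("npm/@scope/pkg"))             # ('npm', '@scope/pkg', None)
```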
### Sigma Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type | SigmaRule |
| `title` / `description` | Rule name / full text | Suspicious PowerShell Download |
| `status` | Rule maturity | stable, test, experimental |
| `level` | Detection severity | critical, high, medium, low, informational |
| `author` / `date` | Rule author / creation date | Security Researcher, 2024-01-15 |
| `logsource-category` | Log source category | process_creation, network_connection |
| `logsource-product` | Log source product | windows, linux |
| `logsource-service` | Log source service | sshd, sysmon |
| `detects-technique` | Detected ATT&CK technique | T1059.001 |
| `related-cve` | Related CVE | CVE-2024-1234 |
### ExploitDB Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Entity type | Exploit |
| `description` | Exploit description | Apache HTTP Server RCE |
| `date-published` | Publication date | 2024-01-15 |
| `author` | Exploit author | Metasploit |
| `exploit-type` | Exploit category | remote, local, dos, webapps |
| `platform` | Target platform | linux, windows, aix |
| `verified` | Whether the exploit is verified by OffSec | True |
| `exploits-cve` | Exploited CVE | CVE-2024-1234 |
### MISP Galaxy Predicates

| Predicate | Description | Example object value |
|---|---|---|
| `rdf:type` | Galaxy entity type | ThreatActor, Ransomware, Botnet, RAT |
| `name` | Display name | APT1 |
| `description` | Full description | (text) |
| `galaxy` | Galaxy cluster type | threat-actor, ransomware |
| `synonym` | Alternative name | Comment Crew |
| `country` | Country code (ISO 3166-1) | CN |
| `cfr-suspected-state-sponsor` | Suspected state sponsor | China |
| `targets-country` | Targeted country | United States |
| `targets-sector` | Targeted sector | Government |
| `attribution-confidence` | Confidence level | 50 |
| `similar-to` | Similar/duplicate entity | `misp:<uuid>` |
| `uses` | Uses technique/tool | `misp:<uuid>` |
| `used-by` | Used by actor | `misp:<uuid>` |
| `variant-of` | Variant relationship | `misp:<uuid>` |
| `targets` | Targets entity | `misp:<uuid>` |
| `attributed-to` | Attributed to entity | `misp:<uuid>` |
| `misp-related` | Generic relationship | `misp:<uuid>` |
| `related-attack-id` | Cross-link to ATT&CK | T1059.001, G0006 |
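MISP galaxies often describe the same actor in several clusters and link them with `similar-to`. To count distinct actors rather than cluster entries, one option is to merge `similar-to` pairs with a union-find (disjoint-set) structure so that each resulting set is one group of near-duplicates. This is a minimal sketch: treating `similar-to` as symmetric and transitive is an assumption, the `misp:` IDs below are placeholders rather than real UUIDs, and `cluster_similar` is a hypothetical helper.

```python
def cluster_similar(triples):
    """Group entities linked by similar-to edges with a union-find (disjoint-set)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for s, p, o in triples:
        if p == "similar-to":
            union(s, o)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    # Sort members and groups so the result is deterministic
    return sorted(sorted(members) for members in groups.values())


misp_triples = [
    ("misp:a", "similar-to", "misp:b"),
    ("misp:b", "similar-to", "misp:c"),
    ("misp:d", "similar-to", "misp:e"),
    ("misp:a", "name", "APT1"),
]
print(cluster_similar(misp_triples))
# [['misp:a', 'misp:b', 'misp:c'], ['misp:d', 'misp:e']]
```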
## Data Generation

The converter downloads the source data, extracts entity property triples and relationship triples, and writes them as Parquet files. The source code and full documentation are at [github.com/S0UGATA/security-kg](https://github.com/S0UGATA/security-kg).

To regenerate or update this dataset:

```bash
git clone https://github.com/S0UGATA/security-kg.git
cd security-kg
pip install -r requirements.txt
python src/convert.py
```

This produces fresh Parquet files in `output/` from the latest data across all 16 sources.

Explore the Parquet files interactively at security-kg-viz.
## Use Cases

- **Knowledge Graph Construction:** Load triples into Neo4j, RDFLib, or NetworkX for graph queries
- **Graph ML:** Train graph neural networks (GNNs) on security data structure for link prediction
- **RAG / LLM Grounding:** Use triples as structured context for retrieval-augmented generation
- **Threat Intelligence:** Query relationships between groups, techniques, vulnerabilities, and mitigations
- **Vulnerability Prioritization:** Combine SSVC, EPSS, KEV, and ExploitDB data for risk-based triage
- **Defensive Gap Analysis:** Find heavily used ATT&CK techniques with insufficient detection coverage
- **Supply Chain Risk:** Score open-source packages by linking GHSA advisories to CVE/EPSS/KEV enrichment
- **Security Automation:** Programmatically map detections to techniques and techniques to tactics
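For the RAG / LLM grounding use case, one simple approach (an assumption, not a method prescribed by this dataset) is to render the triples around an entity as short text facts and pass them to the model as context. A minimal sketch: it collects the entity's outgoing and incoming triples and appends each linked entity's `name` so IDs such as `M1049` stay readable. `entity_context` and the sample rows are illustrative.

```python
def entity_context(triples, entity, max_lines=20):
    """Render an entity's outgoing and incoming triples as plain-text facts."""
    names = {s: o for s, p, o in triples if p == "name"}

    def label(x):
        return f"{x} ({names[x]})" if x in names else x

    lines = []
    for s, p, o in triples:
        if s == entity and p != "name":
            lines.append(f"{label(s)} {p} {label(o)}")  # outgoing edge or property
        elif o == entity:
            lines.append(f"{label(s)} {p} {label(o)}")  # incoming edge
    return "\n".join(lines[:max_lines])


sample = [
    ("T1059.001", "name", "PowerShell"),
    ("T1059.001", "subtechnique-of", "T1059"),
    ("T1059", "name", "Command and Scripting Interpreter"),
    ("M1049", "name", "Antivirus/Antimalware"),
    ("M1049", "mitigates", "T1059.001"),
]
print(entity_context(sample, "T1059.001"))
```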
## Cross-Source Analysis Notebook

The repository includes a Jupyter notebook with 16 cross-source analyses and visualizations built on `combined.parquet`. They cover SSVC patch prioritization, defensive gap analysis, kill chain tactic coverage, exploit weaponization timelines, ransomware CWE pipelines, supply chain package risk, and more.
### SSVC Patch Prioritization (Vulnrichment + EPSS + KEV)

```python
import pandas as pd
from datasets import load_dataset

# Load the combined graph for cross-source queries
ds = load_dataset("s0u9ata/security-kg", "combined")
df = ds["train"].to_pandas()

# Build an SSVC triage matrix: exploitation status × automatable × EPSS score
ssvc = df[df.predicate == "ssvc-exploitation"][["subject", "object"]].rename(columns={"object": "exploitation"})
auto = df[df.predicate == "ssvc-automatable"][["subject", "object"]].rename(columns={"object": "automatable"})
epss = df[df.predicate == "epss-score"][["subject", "object"]].copy()
epss["epss"] = epss.object.astype(float)
triage = ssvc.merge(auto, on="subject").merge(epss[["subject", "epss"]], on="subject")

# Highest priority: actively exploited + automatable + high EPSS
critical = triage[(triage.exploitation == "active") & (triage.automatable == "yes") & (triage.epss > 0.9)]
# A CVE can appear on several rows after the merges, so count distinct subjects
print(f"Immediate action: {critical.subject.nunique()} CVEs")
```
### Defensive Gap Analysis (ATT&CK + Sigma + D3FEND + CAR)

```python
# Reuses `df` (combined config) and `pd` from the SSVC example above.
# Find ATT&CK techniques heavily used by groups but covered by few detections
uses = df[(df.predicate == "uses") & df.subject.str.startswith("G")]
group_usage = uses.groupby("object").subject.nunique().rename("groups_using")

# Detection rules per technique (Sigma rules and CAR analytics both use detects-technique)
detections = df[df.predicate == "detects-technique"].groupby("object").subject.nunique().rename("detections")
# D3FEND defensive techniques that counter each ATT&CK technique
defenses = df[df.predicate == "counters"].groupby("object").subject.nunique().rename("defenses")

coverage = pd.DataFrame(group_usage).join(detections).join(defenses).fillna(0)
gaps = coverage[(coverage.groups_using > 10) & (coverage.detections < 5)]
print(f"High-usage, low-detection techniques: {len(gaps)}")
```
### Supply Chain Risk (GHSA + CVE + EPSS + KEV + ExploitDB)

```python
# Reuses `df` (combined config) from the SSVC example above.
# Score open-source packages by aggregating risk from linked CVEs
ghsa_cve = df[df.predicate == "related-cve"][["subject", "object"]].rename(columns={"subject": "ghsa", "object": "cve"})
packages = df[df.predicate == "affects-package"][["subject", "object"]].rename(columns={"subject": "ghsa", "object": "pkg"})
epss_scores = df[df.predicate == "epss-score"][["subject", "object"]].copy()
epss_scores["epss"] = epss_scores.object.astype(float)
kev_cves = set(df[(df.predicate == "rdf:type") & (df.object == "KnownExploitedVulnerability")].subject)
exploit_cves = set(df[df.predicate == "exploits-cve"].object)

# Join package → GHSA → CVE → enrichment.
# Sigma rules also emit related-cve; the inner merge with `packages` drops them.
# The left join on EPSS keeps KEV/ExploitDB CVEs that have no EPSS score.
risk = packages.merge(ghsa_cve, on="ghsa").merge(
    epss_scores[["subject", "epss"]], left_on="cve", right_on="subject", how="left"
)
risk["in_kev"] = risk.cve.isin(kev_cves)
risk["has_exploit"] = risk.cve.isin(exploit_cves)
risk["ecosystem"] = risk.pkg.str.split("/").str[0]

# Top ecosystems by high-risk CVE count (a missing EPSS score compares as False)
high_risk = risk[(risk.epss > 0.5) | risk.in_kev | risk.has_exploit]
print(high_risk.groupby("ecosystem").cve.nunique().sort_values(ascending=False).head(10))
```
### CAPEC → CWE → CVE (Attack Pattern Chain)

```python
from datasets import load_dataset

capec = load_dataset("s0u9ata/security-kg", "capec")["train"].to_pandas()
cve = load_dataset("s0u9ata/security-kg", "cve")["train"].to_pandas()

# Find CWEs related to SQL Injection (CAPEC-66)
cwe_ids = capec[(capec.subject == "CAPEC-66") & (capec.predicate == "related-weakness")].object.tolist()

# Find CVEs with those CWEs
for cwe_id in cwe_ids:
    related_cves = cve[(cve.predicate == "related-weakness") & (cve.object == cwe_id)].subject.unique()
    print(f"{cwe_id}: {len(related_cves)} CVEs")
```
### D3FEND (Defensive Taxonomy)

```python
from datasets import load_dataset

ds = load_dataset("s0u9ata/security-kg", "d3fend")
df = ds["train"].to_pandas()

# All 497 defensive techniques in the D3FEND taxonomy
defenses = df[(df.predicate == "rdf:type") & (df.object == "DefensiveTechnique")]
print(f"Defensive techniques: {len(defenses)}")

# Find children of a category (e.g., all techniques under Network Traffic Analysis)
children = df[(df.predicate == "child-of") & (df.object == "NetworkTrafficAnalysis")].subject.tolist()

# Get their names
names = df[df.predicate == "name"][["subject", "object"]]
print(names[names.subject.isin(children)].to_string(index=False))
```
## Source Licensing & Attribution

This dataset is published under the Apache 2.0 license. The underlying source data is provided under various licenses, as detailed below. By using this dataset, you agree to comply with each source's respective terms.
| Source | License | Attribution |
|---|---|---|
| ATT&CK | Custom royalty-free (MITRE) | © The MITRE Corporation. Reproduced and distributed with the permission of The MITRE Corporation. |
| CAPEC | Custom royalty-free (MITRE) | © The MITRE Corporation. Reproduced and distributed with the permission of The MITRE Corporation. |
| CWE | Custom royalty-free (MITRE) | © The MITRE Corporation. Reproduced and distributed with the permission of The MITRE Corporation. |
| CVE | Custom permissive (MITRE) | © The MITRE Corporation. CVE® is a registered trademark of The MITRE Corporation. |
| CPE / NVD | Public domain (NIST) | This product uses data from the NVD API but is not endorsed or certified by the NVD. |
| D3FEND | MIT License | © The MITRE Corporation. MITRE D3FEND™ is a trademark of The MITRE Corporation. |
| ATLAS | Apache 2.0 | © MITRE. |
| CAR | Apache 2.0 | © The MITRE Corporation. |
| ENGAGE | Apache 2.0 (GitHub repo) / Custom restrictive (website ToU) | © The MITRE Corporation. Reproduced and distributed with the permission of The MITRE Corporation. Note: the GitHub repo is licensed Apache 2.0, but the website terms restrict use to internal/non-commercial purposes. Clarification pending with MITRE. |
| EPSS | Custom permissive (FIRST) | Jacobs, Romanosky, Edwards, Roytman, Adjerid (2021), "Exploit Prediction Scoring System", *Digital Threats: Research and Practice*, 2(3). See first.org/epss. |
| KEV | Public domain (U.S. Gov) | Source: CISA Known Exploited Vulnerabilities Catalog. |
| Vulnrichment | CC0 1.0 Universal | Source: CISA Vulnrichment. |
| GHSA | CC BY 4.0 | Source: GitHub Advisory Database. Licensed under CC BY 4.0. |
| Sigma | Detection Rule License 1.1 | Source: SigmaHQ. Licensed under DRL 1.1. Rule author attribution is preserved in triples. |
| ExploitDB | GPLv2+ | Source: OffSec ExploitDB. Derived factual metadata (IDs, CVE mappings, dates) extracted under GPLv2+. |
| MISP Galaxies | CC0 1.0 / BSD 2-Clause | Source: MISP Project. Dual-licensed under CC0 1.0 and BSD 2-Clause. |
## License

Apache 2.0. See Source Licensing & Attribution for the terms of each individual source.