Synthetic multiplex terrorist networks + multi-task GNN baselines for HVT detection, role inference, importance scoring, and layer-aware link prediction.
Purpose & Safety Notice (Defensive CT Research Only)
This repository is intended for defensive, lawful counter-terrorism (CT) and criminal-network analysis research, e.g., disruption strategy evaluation, risk scoring, and resilience planning on multiplex networks. All data are 100% synthetic. This code is not intended for operational targeting, real-world surveillance, or analysis of real social networks. See Ethical Considerations for allowed and prohibited uses.
A purpose-built research sandbox for disruption and risk analysis on purely synthetic multiplex terrorist networks, designed to be configurable, leakage-aware, and reproducible.
- Generate multi-layer multiplex graphs (e.g., hierarchy, finance, communication, operation, ideology) with explicit knobs for structure strength, randomness, missingness/observability, false edges, and cross-layer copied edges (provenance-aware noise).
- Train a multi-task GNN (R-GCN / Transformer-style encoder options) for:
- HVT detection (high-value target classification)
- Role inference (courier, financier, leader, operative, support)
- Node-level importance regression (continuous criticality score)
- Benchmark layer-aware link prediction on finance & communication edges with:
  - `uniform` negatives
  - `hard_region` hard-negative sampling (negatives constrained by region)
- Reproduce end-to-end results via CLI (generator → PyG dataset → diagnostics → train → summarize) with seeds + configs tracked for clean comparisons.
Operational terrorist/extremist networks are difficult to study rigorously because they are often:
- Multiplex: interactions span hierarchy, financing, communication, operations, and ideology (with meaningful cross-layer dependencies).
- Noisy & incomplete: partial observability, layer-dependent missingness, spurious links (false edges), and systematic sampling/visibility bias.
- Risk-sensitive: analysts care about actionability (who to prioritize/disrupt), not only generic centrality, while models must avoid evaluation artifacts (e.g., edge/label leakage).
This repository provides a safe, reproducible sandbox that captures these realities without any real-world data:
- Config-driven multiplex generator (v3) with explicit knobs for structure strength, randomness, missingness/observability, false-edge injection, and cross-layer copied edges (provenance-aware noise).
- Multi-task GNN baselines (v3) for node-level objectives: HVT, role, and importance (continuous criticality).
- Layer-wise link prediction benchmarks (v3) (e.g., finance/communication) with uniform vs. hard-region negatives and a leakage-safe message-passing protocol to quantify per-layer signal as difficulty/noise increases.
- 5+ layers (relation types): `hierarchy`, `finance`, `communication`, `operation`, `ideology` (extensible)
- Node attributes: `region`, `group`, `role`, plus continuous feature vectors
- Config-driven difficulty knobs (examples):
- Layer structure strength / homophily (community tightness)
- Layer randomness (esp. communication)
- Missingness / observability (layer-dependent edge drops + visibility bias)
- False-edge injection (spurious links)
- Cross-layer copied edges (provenance-aware noise / leakage-like effects)
- `hvt_ratio`, role priors, burstiness / activity gating (if enabled in config)
- Presets:
`configs/generator_easy.json`, `configs/generator_baseline.json`, `configs/generator_hard.json`
Output is a single manifest:
`multiplex.json` (the single source of truth for build/diagnostics/training).
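For orientation, here is a toy sketch of what such a manifest might contain. Only the fields referenced elsewhere in this README (contiguous node IDs, `meta.num_nodes`, attributed nodes, per-layer edges, optional provenance flags) are grounded; the exact key names in the real `multiplex.json` may differ.

```python
import json

# Illustrative only: the real schema is defined by multiplex_generator_v3.py /
# src.validation.schema. Key names not mentioned in the README are guesses.
toy_manifest = {
    "meta": {"num_nodes": 4, "layers": ["hierarchy", "finance", "communication"]},
    "nodes": [
        {"id": 0, "region": "R1", "group": "G1", "role": "leader",    "hvt": 1},
        {"id": 1, "region": "R1", "group": "G1", "role": "financier", "hvt": 0},
        {"id": 2, "region": "R2", "group": "G2", "role": "courier",   "hvt": 0},
        {"id": 3, "region": "R2", "group": "G2", "role": "operative", "hvt": 0},
    ],
    "edges": [
        {"source": 0, "target": 1, "layer": "hierarchy"},
        {"source": 1, "target": 2, "layer": "finance", "is_false": False, "is_copied": False},
        {"source": 2, "target": 3, "layer": "communication", "is_false": True, "is_copied": False},
    ],
}

with open("toy_multiplex.json", "w") as f:
    json.dump(toy_manifest, f, indent=2)
```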
Produces a single `torch_geometric.data.Data` object containing:
- Graph
  - `x`, `edge_index`, `edge_type`
  - `edge_attr` (optional, if built/used)
  - Optional provenance flags (if present in manifest): `edge_is_false`, `edge_is_copied`
- Labels
  - `y_role` (multi-class)
  - `y_hvt` (binary)
  - `y_imp` (continuous importance / criticality)
- Splits
  - `train_mask`, `val_mask`, `test_mask`
- Metadata
  - `role_mapping`, `region_mapping`, `group_mapping` (where applicable)
  - Optional normalization stats for importance/regression targets
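A quick way to sanity-check a built dataset is to load it and print these fields. The sketch below assumes the attribute names listed above; optional fields may be absent on a given build.

```python
import torch

# Load the PyG Data object produced by build_pyg_dataset_v3.py.
# On recent PyTorch versions, weights_only=False is needed to unpickle Data objects.
data = torch.load("data/multiplex_baseline/pyg_data.pt", weights_only=False)

print(data)  # summary of tensor names and shapes
print("nodes:", data.num_nodes, "| edges:", data.edge_index.size(1))
print("relation types:", int(data.edge_type.max()) + 1)
print("role classes:", int(data.y_role.max()) + 1)
print("HVT positives in train split:", int(data.y_hvt[data.train_mask].sum()))

# Optional provenance flags, only present if the manifest carried them
for flag in ("edge_is_false", "edge_is_copied"):
    if hasattr(data, flag):
        print(flag, "rate:", float(getattr(data, flag).float().mean()))
```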
- Multi-task node model: shared encoder + task heads (role, HVT, importance); see the sketch below
  - Implemented in `train_multitask_gnn_v3.py` (R-GCN / Transformer-style encoder options depending on flags)
- Single-task HVT baseline: `train_hvt_gnn_v3.py`
- Layer-wise link prediction: `train_linkpred_layer_v3.py`
  - Negative sampling modes: `uniform`, `hard_region` (region-constrained hard negatives)
  - Leakage-safe message passing: held-out positives for the target layer are removed from the encoder graph
  - Optional edge-signal usage: `--edge_attr_agg` (aggregate edge attributes into node signals), `--include_edge_flags` (aggregate provenance flags like `edge_is_false` / `edge_is_copied`)
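The actual architectures live in `train_multitask_gnn_v3.py`; the snippet below is only a minimal sketch of the shared-encoder + three-head pattern described above, using an R-GCN encoder and an illustrative, unweighted joint loss.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import RGCNConv


class MultiTaskRGCN(nn.Module):
    """Shared R-GCN encoder with role / HVT / importance heads (illustrative sketch)."""

    def __init__(self, in_dim, hidden_dim, num_relations, num_roles):
        super().__init__()
        self.conv1 = RGCNConv(in_dim, hidden_dim, num_relations)
        self.conv2 = RGCNConv(hidden_dim, hidden_dim, num_relations)
        self.role_head = nn.Linear(hidden_dim, num_roles)  # multi-class role logits
        self.hvt_head = nn.Linear(hidden_dim, 1)            # binary HVT logit
        self.imp_head = nn.Linear(hidden_dim, 1)             # continuous importance

    def forward(self, x, edge_index, edge_type):
        h = torch.relu(self.conv1(x, edge_index, edge_type))
        h = torch.relu(self.conv2(h, edge_index, edge_type))
        return self.role_head(h), self.hvt_head(h).squeeze(-1), self.imp_head(h).squeeze(-1)


def multitask_loss(outputs, data, mask):
    """Illustrative joint objective: CE (role) + BCE (HVT) + MSE (importance)."""
    role_logits, hvt_logits, imp_pred = outputs
    loss = nn.functional.cross_entropy(role_logits[mask], data.y_role[mask])
    loss = loss + nn.functional.binary_cross_entropy_with_logits(
        hvt_logits[mask], data.y_hvt[mask].float()
    )
    loss = loss + nn.functional.mse_loss(imp_pred[mask], data.y_imp[mask].float())
    return loss
```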
- Shell-friendly scripts for end-to-end runs: generate → build → diagnostics → train → evaluate
- Diagnostics for knob validation: `basic_diagnostics_v3.py`
- Aggregation + plots across run folders: `plot_multitask_linkpred_summary.py` (merges multi-task + link-pred outputs into a compact summary)
Recommended layout:
multiplex-terror-network-gnn/
├── README.md
├── requirements.txt
├── .gitignore
│
├── configs/
│   ├── generator_easy.json
│   ├── generator_baseline.json
│   └── generator_hard.json
│
├── src/
│   ├── __init__.py
│   ├── data/
│   │   ├── multiplex_generator_v3.py
│   │   ├── build_pyg_dataset_v3.py
│   │   └── basic_diagnostics_v3.py
│   ├── models/
│   │   ├── train_multitask_gnn_v3.py
│   │   ├── train_hvt_gnn_v3.py
│   │   └── train_linkpred_layer_v3.py
│   └── analysis/
│       └── plot_multitask_linkpred_summary.py
│
├── data/
│   ├── multiplex_easy/
│   ├── multiplex_baseline/
│   ├── multiplex_hard/
│   └── analysis/
│
└── results/
    └── summary_all/
git clone https://github.com/Navy10021/multiplex-terror-network-gnn.git
cd multiplex-terror-network-gnn

conda create -n terror-gnn python=3.10 -y
conda activate terror-gnn

Install PyTorch / PyG based on your OS + CUDA setup. Example (CUDA 12.x):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Then install PyG following the official instructions for your exact PyTorch/CUDA combination.

pip install -r requirements.lock   # fully pinned
# or
pip install -r requirements.txt    # if you need to adjust torch/pyg for your CUDA

One command (auto-managed run folder + metadata):
python -m src.run_all \
--config configs/generator_baseline.json \
--size 1500 \
--seed 2025 \
--out_root results/

This creates a UTC-timestamped run directory (pattern: run_<date>_<config-hash>_seed<seed>) containing:

- `multiplex.json` (validated manifest)
- `pyg_data.pt` (PyG dataset)
- `diagnostics/` (plots + CSVs)
- `run_metadata.json` (config hash, git commit, CLI command)
- `DATASET_CARD.md` (layer-wise edge noise/copied rates + artifact paths)
Manifest validation enforces that node IDs are contiguous (0..N-1), that `meta.num_nodes` matches the node list, and that every edge/event endpoint exists, catching malformed inputs early before building datasets or training models.
To validate an existing manifest on its own (e.g., after editing or before sharing), run:
python -m src.validation.schema data/multiplex_baseline/multiplex.json --summary

This exits non-zero on any schema error and prints node/edge/layer/event counts when --summary is provided.
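For intuition, a minimal re-implementation of those three checks might look like the following. The real validator is `src.validation.schema`; the `source`/`target` key names here are assumptions carried over from the toy manifest sketch above.

```python
import json


def check_manifest(path):
    """Minimal illustration of the validation rules: contiguous node IDs,
    meta.num_nodes consistency, and edge endpoints that actually exist."""
    with open(path) as f:
        m = json.load(f)

    node_ids = [n["id"] for n in m["nodes"]]
    assert sorted(node_ids) == list(range(len(node_ids))), "node IDs must be contiguous 0..N-1"
    assert m["meta"]["num_nodes"] == len(node_ids), "meta.num_nodes must match node list"

    valid = set(node_ids)
    for e in m["edges"]:
        assert e["source"] in valid and e["target"] in valid, f"dangling endpoint in edge {e}"
    # Event endpoints (if the manifest carries events) would be checked the same way.

    print(f"OK: {len(node_ids)} nodes, {len(m['edges'])} edges")


check_manifest("toy_multiplex.json")  # or data/multiplex_baseline/multiplex.json
```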
End-to-end in four steps (works for easy, baseline, or hard).
python src/data/multiplex_generator_v3.py \
--size 1500 \
--seed 2025 \
--out_dir data/multiplex_baseline \
--config configs/generator_baseline.json

This will create data/multiplex_baseline/multiplex.json.
python src/data/build_pyg_dataset_v3.py \
--manifest data/multiplex_baseline/multiplex.json \
--out_path data/multiplex_baseline/pyg_data.pt

python src/data/basic_diagnostics_v3.py \
--manifest data/multiplex_baseline/multiplex.json \
--out_dir data/analysis/multiplex_baseline

Multi-task HVT + role + importance:
python src/models/train_multitask_gnn_v3.py \
--data_path data/multiplex_baseline/pyg_data.pt \
--hidden_dim 64 --num_layers 3 --lr 1e-3 --epochs 500

Layer-wise link prediction (finance or communication):
python src/models/train_linkpred_layer_v3.py \
--data_path data/multiplex_baseline/pyg_data.pt \
--layer finance \
--hidden_dim 64 --num_layers 3 --lr 1e-3 \
--neg_mode hard_region \
--epochs 500

Repeat with configs/generator_easy.json or configs/generator_hard.json to sweep difficulty.
This repo supports an optional notebook workflow for rapid prototyping, visualization, and debugging. A Jupyter notebook is provided (e.g., ./notebooks/multiplex-terror-network-gnn.ipynb) that:
- Generates configurable synthetic multiplex network data
- Trains and evaluates the GNN-based models on that data
Below are representative results from the current v3 summary export (multitask_linkpred_summary.csv) on the synthetic multiplex benchmarks (n=1500, seed=2025, HVT ratio=0.07).
| Difficulty | HVT F1 | HVT AUC | Role F1 (macro) | Importance RΒ² |
|---|---|---|---|---|
| baseline | 0.619 | 0.977 | 0.568 | 0.732 |
| hard | 0.611 | 0.959 | 0.647 | 0.704 |
Note: In the current `multitask_linkpred_summary.csv` export, the link-prediction columns are empty (NaN), which usually means the corresponding `linkpred_*_v3.json` artifacts were not found/merged. Run `train_linkpred_layer_v3.py` for each layer and re-run `plot_multitask_linkpred_summary.py` to populate this table.
| Difficulty | Finance LP AUC/AP | Comm LP AUC/AP |
|---|---|---|
| baseline | – | – |
| hard | – | – |
For link prediction, we evaluate two negative-sampling protocols and (optionally) report the better AUC/AP per difficulty setting:
- `uniform`: negatives sampled uniformly at random
- `hard_region`: negatives sampled from the same region (harder discrimination)
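For intuition, here is a minimal sketch of the two modes plus the leakage-safe edge removal used for link prediction. The real logic lives in `train_linkpred_layer_v3.py`; the tensor names and the per-node `region` vector are illustrative.

```python
import torch


def sample_negatives(pos_edge_index, num_nodes, region, mode="uniform"):
    """Sample one negative per positive edge.

    uniform:     both endpoints drawn uniformly at random
    hard_region: the corrupted endpoint is drawn from the source node's region,
                 which makes negatives harder to separate from true edges
    """
    num_neg = pos_edge_index.size(1)
    src = pos_edge_index[0]

    if mode == "uniform":
        neg_src = torch.randint(num_nodes, (num_neg,))
        neg_dst = torch.randint(num_nodes, (num_neg,))
    elif mode == "hard_region":
        neg_src = src
        neg_dst = torch.empty(num_neg, dtype=torch.long)
        for i, s in enumerate(src.tolist()):
            same_region = (region == region[s]).nonzero(as_tuple=True)[0]
            neg_dst[i] = same_region[torch.randint(len(same_region), (1,))]
    else:
        raise ValueError(mode)
    return torch.stack([neg_src, neg_dst])


def encoder_graph(edge_index, edge_type, target_layer_id, heldout_mask):
    """Leakage-safe message passing (sketch): drop held-out positives of the
    target layer so the encoder cannot simply read off evaluation edges."""
    keep = ~((edge_type == target_layer_id) & heldout_mask)
    return edge_index[:, keep], edge_type[keep]
```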
Quick read (from the node tasks above): baseline is slightly better on HVT AUC and importance RΒ², while hard improves role macro-F1.
By default, training artifacts are saved next to the dataset (--data_path directory).
Example after running the commands above:
data/multiplex_baseline/
├── multiplex.json
├── pyg_data.pt
├── multitask_metrics.json
├── hvt_metrics.json                          # if you run train_hvt_gnn_v3.py
├── linkpred_finance_uniform.json
├── linkpred_finance_hard_region.json
├── linkpred_communication_uniform.json
├── linkpred_communication_hard_region.json
└── multitask_plots/
    ├── loss_curves.png
    ├── hvt_auc_curve.png
    └── ...
If you prefer a results/<difficulty>/<run_name>/... layout, the simplest option is:
- create the run directory
- place (or copy) `pyg_data.pt` there
- train using `--data_path` pointing at that run directory

This works because the scripts write `*_metrics.json` and plot folders to `os.path.dirname(data_path)`.
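A minimal sketch of that workflow (paths are illustrative):

```python
import shutil
from pathlib import Path

# Create an illustrative results/<difficulty>/<run_name>/ folder and copy the dataset in.
run_dir = Path("results/baseline/run_seed2025")
run_dir.mkdir(parents=True, exist_ok=True)
shutil.copy("data/multiplex_baseline/pyg_data.pt", run_dir / "pyg_data.pt")

# Then train with --data_path pointing at the copy, e.g.:
#   python src/models/train_multitask_gnn_v3.py --data_path results/baseline/run_seed2025/pyg_data.pt
# Metrics JSONs and plot folders are written next to that file.
```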
- Custom difficulty: copy a config under `configs/` and tweak `finance_structure_strength`, `comm_structure_strength`, `comm_randomness`, and `hvt_ratio`. Pass it via `--config` to `multiplex_generator_v3.py` (see the sketch below).
- New node/edge features: extend `generate_multiplex_with_config` in `src/data/multiplex_generator_v3.py` and ensure they are preserved in `build_pyg_dataset_v3.py`.
- Model variants:
  - Add heads or encoders in `src/models/train_multitask_gnn_v3.py` for alternative loss balancing or architectures.
  - Swap decoders or negative sampling in `src/models/train_linkpred_layer_v3.py` to test other link-prediction strategies.
- Reporting: regenerate summary plots with `src/analysis/plot_multitask_linkpred_summary.py` after adding new runs.
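A minimal sketch of the custom-difficulty workflow; it assumes the knobs are top-level JSON keys, so check the preset files for the actual structure and value ranges.

```python
import json

# Start from a preset and tweak a few knobs; values below are arbitrary examples.
with open("configs/generator_baseline.json") as f:
    cfg = json.load(f)

cfg["comm_randomness"] = 0.35            # noisier communication layer
cfg["finance_structure_strength"] = 0.6  # weaker finance community structure
cfg["hvt_ratio"] = 0.05                  # fewer high-value targets

with open("configs/generator_custom.json", "w") as f:
    json.dump(cfg, f, indent=2)

# Then generate with the new config:
#   python src/data/multiplex_generator_v3.py --size 1500 --seed 2025 \
#       --out_dir data/multiplex_custom --config configs/generator_custom.json
```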
This repository is provided for defensive and lawful research only. The following principles apply.
- Synthetic Data Only: All networks are 100% synthetic and created solely for experimentation. No real persons, organizations, communications, or operational datasets are included or required.
- Defensive CT / Criminal Network Research: Appropriate use includes method development, benchmarking, and robustness studies (e.g., disruption simulation, HVT scoring as a research task, and resilience analysis under noise/partial observability).
- Transparency & Reproducibility: The code is intended to support peer review and methodological comparison.
Do NOT use this codebase to:
- Conduct or support operational targeting of real individuals or groups.
- Perform unauthorized surveillance, doxxing, or collection/analysis of real social network data without explicit legal authority and ethical approval.
- Target or suppress legitimate political groups, civil society organizations, journalists, activists, or lawful protest.
- Enable discrimination, harassment, intimidation, or violations of privacy, due process, or human rights.
If you adapt ideas from this repo to any real-world context, you are responsible for:
- Obtaining appropriate legal authorization, ethics/IRB review, and data governance approvals.
- Applying minimization, access control, audit logging, and security controls to protect sensitive data.
- Ensuring outputs are used for defensive decision support, with human oversight and accountability.
A license has not been selected yet.
- For personal or academic experimentation, you may use the code as-is.
- For any commercial, operational, or redistributed use, please contact the maintainer.
If you plan to make this repository broadly reusable, consider adding an OSI-approved license (e.g., MIT, Apache-2.0) and including a LICENSE file at the project root.
If you use this repository in academic work, please cite it as:
Lee, Yoon-seop. (2025). Multiplex Terror Network GNN (GitHub repository).
https://github.com/Navy10021/multiplex-terror-network-gnn
BibTeX (optional):
@misc{lee2025multiplexterror,
author = {Lee, Yoon-seop},
title = {Multiplex Terror Network GNN},
year = {2025},
howpublished = {GitHub repository},
url = {https://github.com/Navy10021/multiplex-terror-network-gnn}
}

For questions, issues, or collaboration:
- GitHub Issues: please open an issue in this repository.
- Email: iyunseob4@gmail.com
Contributions, bug reports, and ideas for new experiments are welcome.