A differentiable, liquid-democracy–inspired ensemble layer for PyTorch. Liquid Ensemble (LE) lets each expert partially delegate a sample to other experts via a learned delegation head.
LE generalizes majority voting and relates to Mixture-of-Experts (MoE), but replaces hard routing/top‑k with soft, learnable vote delegation that is optimized end‑to‑end. Unlike hard-routing MoE, (i) it lets ensemble methods be applied to the delegation problem, (ii) it allocates experts dynamically (harder samples receive more experts), and (iii) it can operate under partial information about the models, making it useful for federated learning.
- Learned delegation: Every expert outputs both a prediction and a normalized delegation vector $d_i(x)$ over experts.
- Differentiable resolution: Two solvers convert the delegation graph into final expert powers $p(x)$:
  - `sink_one`: closed‑form solution with a single absorbing sink; fast and stable.
  - `sink_many`: iterative diffusion with per‑expert sinks; encourages specialization.
- Uncertainty from structure: Confidence measures derived from the delegation matrix $D(x)$, power entropy, self‑delegation, and inter‑expert disagreement.
- Federated/partial-observability friendly: Experts can learn to delegate toward experts with access to complementary views.
- Auxiliary loss for sample load balancing and sample specialization.
- Drop‑in layer: Works with regression, classification, or arbitrary intermediate (embedding) layers. Tested on MLPs and CNNs, but easily adapted to transformers or RNNs.
Here’s how to define simple MLP and CNN experts that are compatible with `LiquidEnsembleLayer`.
Each expert must return `(prediction, delegation)`, where `delegation` is a probability distribution over experts.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from liquid.layers.liquid_ensemble_layer import LiquidEnsembleLayer


class MLPExperts(nn.Module):
    def __init__(self, in_dim, out_dim, n_experts, hidden=64):
        super().__init__()
        self.predictor = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim)
        )
        self.router = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_experts), nn.Softmax(dim=-1)
        )

    def forward(self, x):
        y = self.predictor(x)
        d = self.router(x)
        return y, d


class CNNExperts(nn.Module):
    def __init__(self, in_channels, n_experts, out_dim=10):
        super().__init__()
        self.features = nn.Sequential(...)  # e.g. conv layers + pooling + flatten
        feat_dim = ...                      # match output of self.features
        self.predictor = nn.Linear(feat_dim, out_dim)
        self.router = nn.Sequential(
            nn.Linear(feat_dim, n_experts),
            nn.Softmax(dim=-1)
        )

    def forward(self, x):
        h = self.features(x)
        y = self.predictor(h)
        d = self.router(h)
        return y, d


class LiquidEnsembleModel(nn.Module):
    def __init__(self, experts, solver="sink_one"):
        super().__init__()
        self.le = LiquidEnsembleLayer(experts, solver=solver)

    def forward(self, x):
        return self.le(x)

    def loss(self, y_pred, y_true, aux_weight=1e-3):
        task_loss = F.mse_loss(y_pred, y_true)
        aux_loss = self.le.auxiliary_loss()
        return task_loss + aux_weight * aux_loss
```
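Putting the pieces together, training is a standard PyTorch loop. The sketch below is illustrative only: the data is synthetic, and it assumes `experts` can be passed as an `nn.ModuleList` of per-expert modules and that the layer returns the fused prediction directly; check the actual `LiquidEnsembleLayer` signature.

```python
# Illustrative training sketch (synthetic data; how `experts` is passed and
# what the layer returns are assumptions, not confirmed API).
import torch
import torch.nn as nn

n_experts, in_dim, out_dim = 4, 16, 1
experts = nn.ModuleList(
    [MLPExperts(in_dim, out_dim, n_experts) for _ in range(n_experts)]
)
model = LiquidEnsembleModel(experts, solver="sink_one")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(256, in_dim)   # placeholder regression inputs
y = torch.randn(256, out_dim)  # placeholder targets

for epoch in range(10):
    optimizer.zero_grad()
    y_pred = model(x)                              # assumed: fused ensemble prediction
    loss = model.loss(y_pred, y, aux_weight=1e-3)  # task loss + auxiliary balancing loss
    loss.backward()
    optimizer.step()
```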
For a batch of inputs, each expert returns:

- a prediction $y_i(x)$
- a delegation $d_i(x)$ (a probability distribution over experts)
Stacking the per-expert delegation vectors yields the per-sample delegation matrix $D(x)$, with row $i$ equal to $d_i(x)$; the solvers below turn $D(x)$ into expert powers $p(x)$.
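For intuition about the shapes involved, here is a standalone sketch with random placeholder outputs (this is not the layer's internal code):

```python
# Standalone shape sketch: stack per-expert outputs for a batch (placeholders only).
import torch

B, E, out_dim = 32, 4, 1                                               # batch, experts, output dim
preds = [torch.randn(B, out_dim) for _ in range(E)]                    # y_i(x), one per expert
delegs = [torch.softmax(torch.randn(B, E), dim=-1) for _ in range(E)]  # d_i(x), rows sum to 1

Y = torch.stack(preds, dim=1)    # (B, E, out_dim): stacked predictions
D = torch.stack(delegs, dim=1)   # (B, E, E): delegation matrix D(x), row i = d_i(x)
```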
Solvers
- `solve_delegation_one_sink(D, epsilon=0.01)`: closed‑form via $(I - \tilde D)^{-1}$ with a single absorbing node; redistributes residual power uniformly.
- `solve_delegation_many_sinks(D, epsilon=0.01, long_delegation_penalty=0.90, threshold=0.05)`: iterative diffusion solution that resolves residual power correctly.
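A usage sketch, continuing from the stacked `D` above. The import path is an assumption (only the function names and default arguments listed here are documented), as is the shape of the returned power vector:

```python
# Usage sketch: the import path and return shape are assumptions.
import torch
from liquid.solvers import solve_delegation_one_sink, solve_delegation_many_sinks  # hypothetical path

p_one = solve_delegation_one_sink(D, epsilon=0.01)   # assumed shape (B, E): expert powers p(x)
p_many = solve_delegation_many_sinks(
    D, epsilon=0.01, long_delegation_penalty=0.90, threshold=0.05
)

# "Power entropy" as a structural confidence signal: low entropy means power
# concentrated on a few experts.
power_entropy = -(p_one * (p_one + 1e-12).log()).sum(dim=-1)
```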
- Evaluate RMSE and the quality of the confidence $c$ via Kendall's $\tau$ between $c$ and the error (see the sketch after this list).
- Study the impact of hyperparameters on RMSE and $\tau$ using shapley-iq analysis.
- Compare block vs. long designs for LE and MoE under fixed compute.
- Report scaling laws for each.
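The confidence-quality metric from the first item can be computed with `scipy.stats.kendalltau`; the sketch below uses placeholder arrays in place of real model outputs:

```python
# Kendall's tau between per-sample confidence and per-sample error
# (placeholder arrays stand in for real predictions and confidences).
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
y_true = rng.normal(size=256)
y_pred = y_true + 0.1 * rng.normal(size=256)
confidence = rng.random(256)                 # e.g. derived from power entropy / self-delegation

errors = np.abs(y_pred - y_true)
tau, p_value = kendalltau(confidence, errors)
# A strongly negative tau means higher confidence tends to coincide with lower error.
print(f"Kendall's tau: {tau:.3f} (p = {p_value:.3g})")
```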
If you use Liquid Ensemble in academic work, please cite:
```bibtex
@misc{vesely2025liquidensemble,
  title={Liquid Ensemble Learning},
  author={Viktor Vesel\'y},
  year={2025},
  howpublished={\url{https://github.com/viktorvesely/liquid}}
}
```

Issues and PRs are welcome.