
TabTune Resampling: Context Engineering for Zero-Shot Imbalanced Tabular Learning


Real-world tabular problems (fraud, anomaly detection, credit risk, medical screening) are rarely balanced. Models therefore optimize a loss dominated by the majority class, post strong aggregate metrics such as AUC, and still fail where it matters: minority recall, stable F1, and rare-event calibration.
In the extreme, models can collapse into a “zero recall” regime where they essentially learn the majority class.

TabTune’s update makes resampling native and treats it as the primary lever under imbalance: instead of changing architectures or losses, TabTune reshapes the training context distribution by controlling which rows enter the model’s conditioning set, changing gradient exposure without changing model design.

Implementation · Resampling as the “context control plane”
Drop-in pipeline configuration (inference-time tuning)

Nothing about the model architecture changes. You control which examples enter the context, and that changes exposure.

tuning_strategy="inference" · context control via context_sampling_strategy · switchable strategy suite: uniform · stratified · balanced · hybrid · …
from tabtune.TabularPipeline.pipeline import TabularPipeline

pipeline = TabularPipeline(
    model_name="TabICL",           # TabPFN, TabDPT, OrionMSP, OrionBIX, Limix, etc.
    task_type="classification",
    tuning_strategy="inference",
    tuning_params={
        "context_size": 10000,
        "context_sampling_strategy": "balanced",   # ← switch strategies here
        "strat_set": 10,
        "hybrid_ratio": 0.7,
        "kmeans_centers": 2000,
        "min_pos": 50,
        "oversample_weight": 5.0,
        "sampling_seed": 42,
        "allow_replacement": True,
    },
)

pipeline.fit(X_train, y_train)
y_proba = pipeline.predict_proba(X_test)[:, 1]

Why context is the critical object (and why this boosts zero-shot)

For tabular foundation models that learn from a conditioning set (context), the effective “training signal” is not just the dataset distribution but the context distribution. TabTune calls this out explicitly: when model capacity is sufficient, “the context distribution often dominates downstream performance.”

Formally, let \( D = \{(x_i, y_i)\}_{i=1}^{n} \) be your training pool and \( C \) be a context of size \( m \) sampled from \( D \). A resampler defines a distribution \( q(i) \) over indices (or a constrained sampling procedure). Training or inference-time conditioning then depends on:

\[ C \sim q, \quad \text{not } \mathrm{Uniform}(D) \]

On imbalanced data, uniform \( q \) yields majority-class dominated gradients and conservative boundaries.

Resampling intervenes at the data selection layer to increase signal in underrepresented regions without adding architectural complexity.
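A back-of-the-envelope illustration of why the choice of \( q \) matters, using pool numbers rounded from the Home Credit case study below (~9% positives, context size 10,000):

```python
# Expected minority count in a context of size m under uniform vs
# balanced q, for a pool with ~9% positives (rounded illustrative numbers).
n, n1, m = 300_000, 27_000, 10_000    # pool size, minority size, context size

expected_pos_uniform = m * n1 / n     # uniform q mirrors the pool
expected_pos_balanced = m / 2         # balanced q: half the mass per class

print(expected_pos_uniform)           # 900.0
print(expected_pos_balanced)          # 5000.0
```

Under uniform sampling the model conditions on roughly 900 minority rows per 10,000; under balanced sampling, 5,000. Same model, very different gradient exposure.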

Fine-tuning vs resampling (pragmatic view)

  • Fine-tuning can make sense when you have “clean” datasets (large, well-labeled, stable distribution) and you can afford gradient updates and careful calibration.
  • Resampling is the compute-efficient alternative when you want adaptation without retraining: you change \( q \), not the model weights \( \theta \). In TabTune this is wired as inference-time tuning (tuning_strategy="inference") so you can keep a strong zero-shot baseline and still adapt by context selection.

Implementation surface (one switch, multiple samplers)

Resampling is configured in tuning_params via context_sampling_strategy, plus sampler-specific knobs (e.g., hybrid_ratio, kmeans_centers, min_pos, oversample_weight).

Mathematical definitions for each sampling method

Let \( m = \) context_size. For classification with \( K \) classes, define class sets \( S_c = \{ i : y_i = c \} \) and counts \( n_c = |S_c| \).

1) Uniform sampling

Preserves original distribution.

\[ q_{\text{uni}}(i) = \frac{1}{n} \]

Sample \( m \) indices i.i.d. from \( q_{\text{uni}} \) (or without replacement if allow_replacement=False).
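A minimal numpy sketch of this sampler (the function name and signature are illustrative, not TabTune's internal API):

```python
import numpy as np

def uniform_context(n, m, rng, allow_replacement=True):
    """q_uni(i) = 1/n: every pool index is equally likely."""
    return rng.choice(n, size=m, replace=allow_replacement)

rng = np.random.default_rng(42)
idx = uniform_context(n=1_000, m=128, rng=rng, allow_replacement=False)
```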

2) Stratified sampling

Stabilizes representation without rebalancing; classification keeps class proportions, regression uses quantile bins.

Classification (fixed proportions):

\[ m_c = \left\lfloor m \cdot \frac{n_c}{n} \right\rfloor \]

with the rounding remainder \( m - \sum_c m_c \) topped up (one extra draw per class, largest first) so that \( \sum_{c=1}^{K} m_c = m \).

Then sample \( m_c \) uniformly from each \( S_c \).

Regression (binned targets):

Bin \( y \) into \( B \) bins \( b(i) \in \{1, \dots, B\} \) using quantiles (qcut; fallback cut).

\[ m_b = \left\lfloor m \cdot \frac{n_b}{n} \right\rfloor \]

Sample \( m_b \) uniformly from each bin.
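The classification case can be sketched in a few lines of numpy (illustrative code, assuming remainders are topped up largest-class-first):

```python
import numpy as np

def stratified_context(y, m, rng):
    """Per-class quotas m_c = floor(m * n_c / n); any rounding remainder
    goes to the largest classes so the quotas sum to exactly m."""
    classes, counts = np.unique(y, return_counts=True)
    quotas = (m * counts) // len(y)
    for c in np.argsort(-counts)[: m - quotas.sum()]:
        quotas[c] += 1
    picks = [rng.choice(np.flatnonzero(y == cls), size=q, replace=False)
             for cls, q in zip(classes, quotas)]
    return np.concatenate(picks)

# A 90/10 pool keeps its proportions in a context of 20: 18 vs 2.
y = np.array([0] * 90 + [1] * 10)
idx = stratified_context(y, m=20, rng=np.random.default_rng(0))
```

The regression variant is the same loop run over quantile-bin labels \( b(i) \) instead of class labels.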

3) Balanced sampling

Forces equal representation across classes (or bins); strong recall, but shifts training vs inference distribution and increases threshold sensitivity.

Classification (equal mass per class):

\[ q_{\text{bal}}(i) = \frac{1}{K} \cdot \frac{1}{n_{y_i}} \]

Equivalently, set \( m_c \approx m / K \) and sample within each class.

Regression (equal mass per bin):

\[ q_{\text{bal-reg}}(i) = \frac{1}{B} \cdot \frac{1}{n_{b(i)}} \]
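A direct implementation of \( q_{\text{bal}} \) as per-index sampling probabilities (an illustrative sketch, not TabTune's internal code):

```python
import numpy as np

def balanced_context(y, m, rng):
    """q_bal(i) = (1/K) * (1/n_{y_i}): equal expected context mass per
    class, regardless of how skewed the pool is."""
    classes, counts = np.unique(y, return_counts=True)
    class_w = 1.0 / (len(classes) * counts)
    q = class_w[np.searchsorted(classes, y)]   # per-index probability
    return rng.choice(len(y), size=m, replace=True, p=q)

# 95/5 pool, but roughly half of the sampled context is minority.
y = np.array([0] * 950 + [1] * 50)
idx = balanced_context(y, m=1_000, rng=np.random.default_rng(0))
```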

4) Weighted minority oversampling

Inverse-frequency weighted sampling with replacement plus a boost multiplier (oversample_weight) and a minimum enforced minority count (min_pos).

Binary case \( y \in \{0,1\} \), minority \( y = 1 \). Let \( \beta = \) oversample_weight.

\[ w(i) = \begin{cases} \beta \cdot \frac{1}{n_1} & y_i = 1 \\ \frac{1}{n_0} & y_i = 0 \end{cases} \qquad q_{\text{wos}}(i) = \frac{w(i)}{\sum_{j=1}^{n} w(j)} \]

Min-pos constraint

Enforce at least \( m_1 = \) min_pos positives by construction:

  • Sample \( m_1 \) indices from \( S_1 \)
  • Sample \( m - m_1 \) remaining indices from \( q_{\text{wos}} \) (or a background sampler)

This is explicitly not naïve duplication; it is intentional signal amplification.
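The two-stage construction (forced minimum, then weighted draws) can be sketched as follows; parameter defaults mirror the config above, the function itself is illustrative:

```python
import numpy as np

def weighted_oversample_context(y, m, rng, beta=5.0, min_pos=50):
    """Inverse-frequency weights boosted by beta on the minority class,
    with a hard floor of min_pos positives placed in the context first."""
    pos = np.flatnonzero(y == 1)
    w = np.where(y == 1, beta / len(pos), 1.0 / (len(y) - len(pos)))
    q = w / w.sum()
    forced = rng.choice(pos, size=min_pos, replace=True)       # min_pos floor
    rest = rng.choice(len(y), size=m - min_pos, replace=True, p=q)
    return np.concatenate([forced, rest])

y = np.array([0] * 990 + [1] * 10)   # 1% positives
idx = weighted_oversample_context(y, m=200, rng=np.random.default_rng(0))
```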

5) SMOTE / SMOTENC (synthetic minority)

Used when imblearn is available. Pipeline: temporary NaN imputation → SMOTE (numerical) / SMOTENC (categorical) → subsample back to context size.

Numerical SMOTE generation

Pick minority anchor \( i \in S_1 \), neighbor \( j \in \mathcal{N}_k(i) \), sample \( \lambda \sim U(0,1) \):

\[ \tilde{x} = x_i + \lambda (x_j - x_i), \qquad \tilde{y} = 1 \]

SMOTENC abstraction

Interpolate numeric features as above; set categorical features via discrete operator over neighbors (e.g., per-feature mode):

\[ \tilde{x}^{(c)}_t = \mathrm{mode} \left\{ x^{(c)}_{r,t} : r \in \{i\} \cup \mathcal{N}_k(i) \right\} \]

Build an augmented pool and sample \( m \) points from it.
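TabTune delegates the full pipeline to imblearn; the core numeric interpolation step alone looks like this (a sketch with a hypothetical helper, using a brute-force neighbour search):

```python
import numpy as np

def smote_point(X_min, anchor, rng, k=5):
    """One synthetic minority point: linear interpolation between an
    anchor row and a random one of its k nearest minority neighbours."""
    d = np.linalg.norm(X_min - X_min[anchor], axis=1)
    neighbours = np.argsort(d)[1 : k + 1]   # index 0 is the anchor itself
    j = rng.choice(neighbours)
    lam = rng.uniform()                     # lambda ~ U(0, 1)
    return X_min[anchor] + lam * (X_min[j] - X_min[anchor])

rng = np.random.default_rng(0)
X_min = rng.normal(size=(30, 4))            # minority-class rows only
x_new = smote_point(X_min, anchor=0, rng=rng)
```

Because the new point is a convex combination of two minority rows, it always lies inside the minority pool's bounding box.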

6) Diversity-based sampling (MiniBatch KMeans)

Focus is coverage, not balance. Workflow: impute → one-hot encode → MiniBatch KMeans → pick one representative per cluster.

Let \( \phi(x) \) be the impute + one-hot map. Fit KMeans with \( K_c = \) kmeans_centers producing centroids \( \mu_1, \dots, \mu_{K_c} \).

\[ c(i) = \arg\min_{k} \left\| \phi(x_i) - \mu_k \right\|_2^2 \]

Representative per cluster

\[ i_k = \arg\min_{i : c(i)=k} \left\| \phi(x_i) - \mu_k \right\|_2^2 \]

Context is \( \{ i_1, \dots, i_{K_c} \} \), then trim or fill to size \( m \) as needed.
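A compact sketch of the cluster-then-pick-nearest step; a plain Lloyd's loop stands in here for MiniBatch KMeans, and the imputation/encoding map \( \phi \) is assumed to have been applied already:

```python
import numpy as np

def diverse_context(X, k_centers, rng, iters=10):
    """Coverage sketch: Lloyd's k-means (standing in for MiniBatchKMeans on
    encoded features); the point nearest each non-empty centroid becomes
    that cluster's representative."""
    centroids = X[rng.choice(len(X), size=k_centers, replace=False)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(k_centers):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    # final assignment, then one representative per non-empty cluster
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    reps = [np.flatnonzero(labels == k)[d[labels == k, k].argmin()]
            for k in range(k_centers) if np.any(labels == k)]
    return np.array(reps)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
reps = diverse_context(X, k_centers=2, rng=rng)
```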

7) Hybrid strategies (signal + coverage)

TabTune ships hybrid_balanced_diverse (classification) and hybrid_stratified_diverse (regression), mixing balanced/stratified sampling with diversity using hybrid_ratio.

Let \( \rho = \) hybrid_ratio.

\[ m_{\text{sig}} = \lfloor \rho m \rfloor, \qquad m_{\text{cov}} = m - m_{\text{sig}} \]
\[ C = C_{\text{sig}} \cup C_{\text{cov}} \]

where \( C_{\text{sig}} \) is sampled via balanced (classification) or stratified (regression), and \( C_{\text{cov}} \) via KMeans diversity.
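The budget split can be sketched as below; the "coverage" half here is a crude stand-in (spreading picks along one feature) for the KMeans representatives, and \(\rho = 0.7\) matches the hybrid_ratio in the config above:

```python
import numpy as np

def hybrid_context(X, y, m, rho, rng):
    """hybrid_balanced_diverse sketch: a rho share of the budget from
    balanced class sampling (signal), the remainder spread across feature
    space (stand-in for the KMeans coverage half)."""
    m_sig = int(rho * m)
    classes = np.unique(y)
    per = m_sig // len(classes)                 # equal draws per class
    sig = np.concatenate([rng.choice(np.flatnonzero(y == c), size=per, replace=True)
                          for c in classes])
    order = np.argsort(X[:, 0])                 # spread along one feature
    cov = order[np.linspace(0, len(order) - 1, m - len(sig)).astype(int)]
    return np.concatenate([sig, cov])

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 3))
y = (rng.uniform(size=1_000) < 0.05).astype(int)   # ~5% positives
idx = hybrid_context(X, y, m=100, rho=0.7, rng=rng)
```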

Case studies: resampling as the “context control plane”


Home Credit Default Risk (messy, highly imbalanced, production-like)

The core win wasn’t “a better model”. It was better context composition.

~307k rows · 120+ features · heavy missingness · ~9% default rate
Baseline failure mode
Classical baselines hit ~0.919 accuracy but fell into the “zero recall trap”: defaulter recall ≈ 0.0%.
Sampling intervention
7 resampling strategies × 4 TFMs, sweeping context sizes from 1,024 → 50,000.
Impact
  • Balanced best mean AUC = 0.745
  • Balanced / Hybrid +0.04 to +0.06 AUC over Uniform
  • TFMs reached ~0.75–0.77 AUC, surpassing classical (~0.739)
  • MCC ≈ 0.21 vs ≈ 0.00 naive
  • Matched baseline at ~5k–10k context instead of ~246k samples
Takeaway
Resampling prevented majority-only contexts and made zero-shot inference actually sensitive to defaulters. The control plane is which examples enter the context.

Lending Club Loan Default (large-scale, noisy, interaction-heavy)

When the dataset is big and messy, resampling improves the context diet TFMs learn from without needing retraining loops.

Hundreds of thousands of loans · 70+ features · ~15–20% defaults · signal-bearing missingness
Setup note
This is a single-table credit dataset with strong feature interactions and noisy labels. The goal is fast, data-efficient adaptation, not re-running full supervised training every time.
Sampling intervention
Same resampling strategy suite tested across multiple TFMs with varying context sizes.
Impact
  • Strategy leaderboard: Hybrid and Balanced dominate across TFM architectures, with Hybrid slightly ahead on aggregate (0.6863 vs 0.6833 mean Val AUC-ROC).
  • Head-to-head: Hybrid wins 70%+ of matchups against most strategies; Balanced is the closest competitor.
  • Scaling behavior: Balanced/Hybrid scale most consistently as context grows; SMOTE can become erratic at larger windows.
  • Practical ceiling reference: fully supervised XGBoost on the large dataset sits around 0.7177 AUC-ROC.
Takeaway
On a large, noisy credit dataset, the top strategies consistently improved the examples entering the context, delivering stronger validation AUC without needing full retraining pipelines.

Conclusion

This release turns context selection into a first-class control surface: instead of paying the full fine-tuning tax for every dataset, teams can adapt TFMs by sampling the right context (Balanced/Hybrid when imbalance is the bottleneck, Diversity/Hybrid when coverage is the bottleneck). In the Home Credit benchmark this translated into matching classical baselines at ~5k–10k context with no gradient retraining, i.e., “context selection replaces training”.

Net effect: faster, more flexible enterprise deployment of TFMs for heavy-use scenarios, including:

  • rapid “bring-your-own-table” predictive pipelines without multi-day retraining cycles,
  • dataset Q&A / analytical probing where you want strong zero-shot behavior but need quick, distribution-aware adaptation through context construction. 

Aditya Tanna
Research Scientist