"Vanilla LSTM borderline unusable" refers to standard, unmodified Long Short-Term Memory networks failing badly on FBA demand data.
Why Vanilla LSTMs Fail FBA Forecasting
Basic LSTMs (single layer, no attention/TCNs) crumble under Amazon's realities despite being designed for sequences:
Extreme Sparsity: New ASINs have <30 sales points—LSTMs need 2+ years for seasonality but overfit wildly on 5 sales/week, hitting 35%+ MAPE as hidden states memorize noise, not patterns.
Vanishing Gradients Persist: Even with gates, long Q4-to-July gaps (>90 timesteps) cause gradients to vanish; FBA's intermittent demand (zero-sales weeks) makes backprop ineffective beyond 30 days.
No Long-Range Dependencies: LSTMs struggle with FBA's multi-scale volatility—Black Friday spikes, fee changes, competitor surges 6 months apart. Sequential processing can't "remember" distant events like TCNs do via dilated convolutions.
1-Step Lag Trap: Default LSTMs learn trivial F(t+1)=F(t) solutions, ignoring true forecasting. FBA flash sales expose this—model predicts yesterday's velocity forever.
Code Evidence
# Vanilla LSTM backtest on sparse FBA data model = Sequential([LSTM(50, return_sequences=False), Dense(1)]) # After 100 epochs: val_loss plateaus at 0.45 (42% MAPE) # TCN equivalent: 0.22 (18% MAPE)