Mean-Reversion in Python: Bollinger Bands and RSI

Trend-following gets all the attention, but anyone who has run a moving-average strategy through a sideways market knows the other half of the story: in a range, a trend system buys every false breakout and sells every false breakdown until your account looks like it has been through a paper shredder.

A mean-reversion strategy does the opposite. It assumes that after an unusually sharp move, price tends to snap back toward its recent average. In this article we build one in Python using two of the most popular indicators in technical analysis — Bollinger Bands and the RSI — and, more importantly, we backtest it honestly so you can see when it works and when it quietly bleeds.

By the end you will have:

  • A clear, no-nonsense explanation of what Bollinger Bands and the RSI actually measure.
  • A runnable pandas implementation of both, from scratch.
  • A complete mean-reversion backtest on SPY, with a fair comparison to buy & hold.
  • A frank list of the ways this strategy can fool you.

1. Why mean-reversion, and when it works

Markets alternate between two behaviors. In a trending phase, today’s move predicts tomorrow’s: strength begets strength. In a mean-reverting phase, today’s move predicts the opposite: an overextended drop tends to be followed by a bounce.

A mean-reversion strategy is a bet on the second behavior. The logic:

  • Price has dropped sharply below its recent average.
  • We assume the drop is an overreaction.
  • We buy, expecting a snap back to the mean.
  • We sell once price has reverted.

This works beautifully in choppy, range-bound markets — exactly the environment where trend systems struggle. The flip side, which we will return to in the traps section, is that the same strategy is dangerous in a strong downtrend, where every dip looks like a buying opportunity right up until the bottom falls out. Knowing which regime you are in matters, which is why this pairs naturally with regime detection — see the HMM market regimes article.

We need two things: a way to measure “how far is price from its average” (Bollinger Bands) and a confirmation that the move is genuinely stretched (RSI).


2. Bollinger Bands and the RSI in plain English

Bollinger Bands wrap a price chart in a statistical envelope. Three lines:

  • A middle band — a simple moving average, typically 20 days.
  • An upper band — the middle band plus two standard deviations of price.
  • A lower band — the middle band minus two standard deviations.

Because the bands are built from a rolling standard deviation, they widen when volatility rises and contract when it falls. When price touches the lower band, it sits roughly two standard deviations below its recent mean — a statistically unusual place to be. That is our “stretched to the downside” signal.

The RSI (Relative Strength Index) is a momentum oscillator bounded between 0 and 100. It compares the size of recent up-moves to recent down-moves over a lookback window, classically 14 days. Convention:

  • RSI below 30 → “oversold” — selling pressure has been dominant and may be exhausted.
  • RSI above 70 → “overbought”.

Neither indicator is magic, and either one alone produces a lot of false signals. The point of combining them is confirmation: we only act when the price says stretched (lower Bollinger Band) and the momentum says exhausted (low RSI). Two weak signals that agree are stronger than one.


3. Setting up the environment

pip install yfinance pandas numpy matplotlib

Imports:

import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt

4. Getting the data and computing Bollinger Bands

We use the SPY ETF as a proxy for the S&P 500, on daily bars.

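data = yf.download("SPY", start="2005-01-01", end="2025-01-01", auto_adjust=True)
# Recent yfinance versions may return MultiIndex columns even for one ticker; flatten them
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)
data = data[["Close"]].dropna().copy()
data["log_return"] = np.log(data["Close"] / data["Close"].shift(1))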

Bollinger Bands are a few lines of pandas:

window = 20
n_std = 2

data["mid_band"] = data["Close"].rolling(window).mean()
rolling_std = data["Close"].rolling(window).std()
data["upper_band"] = data["mid_band"] + n_std * rolling_std
data["lower_band"] = data["mid_band"] - n_std * rolling_std

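A useful derived quantity is %B, which expresses where price sits relative to the bands (0 = on the lower band, 1 = on the upper band). Values below 0 or above 1 mean the close is outside the bands, so a close under the lower band corresponds to %B < 0: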

data["pct_b"] = (data["Close"] - data["lower_band"]) / (data["upper_band"] - data["lower_band"])
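
To make the %B scale concrete, here is a small sketch on a synthetic random-walk price series (the prices and the seed are assumptions for illustration, not SPY data). It rebuilds the bands with the same window and n_std as above and checks that %B falls below 0 on exactly the bars where the close is under the lower band.

```python
import numpy as np
import pandas as pd

# Synthetic prices: a geometric random walk, purely illustrative
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))

window, n_std = 20, 2
mid = close.rolling(window).mean()
std = close.rolling(window).std()
upper, lower = mid + n_std * std, mid - n_std * std
pct_b = (close - lower) / (upper - lower)

valid = pct_b.notna()                       # the first window - 1 rows have no bands yet
below_band = close[valid] < lower[valid]
agree = ((pct_b[valid] < 0) == below_band).all()

print("%B < 0 matches close < lower band:", agree)
print(f"share of bars with %B < 0: {(pct_b[valid] < 0).mean():.1%}")
```

Because the band width is strictly positive whenever the rolling standard deviation is, the sign of %B carries the same information as the band comparison used in the entry rule.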

5. Computing the RSI from scratch

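Plenty of libraries ship an RSI, but it is worth implementing once so you know exactly what you are trading. We use Wilder’s smoothing, which is an exponential moving average with alpha = 1 / period. One small difference from Wilder’s original recipe: he seeded the average with a simple mean of the first period values, while ewm seeds from the first observation, so the earliest readings differ slightly before the two converge: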

def rsi(series: pd.Series, period: int = 14) -> pd.Series:
    delta = series.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    avg_gain = gain.ewm(alpha=1 / period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - (100 / (1 + rs))

data["rsi"] = rsi(data["Close"], period=14)
data = data.dropna()

Quick sanity check — the RSI should spend most of its life between 30 and 70, with brief excursions to the extremes:

print(data["rsi"].describe())

If your RSI is pinned near 0 or 100, you have a bug — usually a sign you forgot to separate gains from losses correctly.
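
The sanity check can be made mechanical with inputs whose RSI is known in advance. The sketch below copies the rsi function from this section and feeds it three synthetic series (all assumed for illustration): a steadily rising one, a steadily falling one, and a noisy random walk.

```python
import numpy as np
import pandas as pd

def rsi(series: pd.Series, period: int = 14) -> pd.Series:
    # Same definition as in the article: Wilder-style smoothing via ewm
    delta = series.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    avg_gain = gain.ewm(alpha=1 / period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - (100 / (1 + rs))

rising = pd.Series(np.arange(1.0, 51.0))          # only gains
falling = pd.Series(np.arange(50.0, 0.0, -1.0))   # only losses
rng = np.random.default_rng(1)
noisy = pd.Series(100 + np.cumsum(rng.normal(0, 1, 1000)))

rsi_up = rsi(rising).dropna()
rsi_down = rsi(falling).dropna()
rsi_noisy = rsi(noisy).dropna()

print("rising series pinned at 100 :", bool((rsi_up == 100).all()))
print("falling series pinned at 0  :", bool((rsi_down == 0).all()))
print(f"noisy series inside 30-70   : {rsi_noisy.between(30, 70).mean():.0%} of bars")
```

A pinned RSI is therefore correct for a one-directional series and suspicious for anything that looks like real prices.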


6. From indicators to entry and exit rules

Mean-reversion is a stateful strategy: you are either in a position or you are not, and a single day’s data is not enough to know which. So we define an entry condition and a separate exit condition, then walk through time tracking the state.

The rules:

  • Enter long when the close is below the lower Bollinger Band and the RSI is below 30.
  • Exit when the close climbs back to the middle band (the mean has been reached) or the RSI rises above 50.
  • Otherwise, hold whatever position we already have.

entry = (data["Close"] < data["lower_band"]) & (data["rsi"] < 30)
exit_ = (data["Close"] >= data["mid_band"]) | (data["rsi"] > 50)

position = np.zeros(len(data))
in_position = False
for i in range(len(data)):
    if not in_position and entry.iloc[i]:
        in_position = True
    elif in_position and exit_.iloc[i]:
        in_position = False
    position[i] = 1.0 if in_position else 0.0

data["position"] = position

The explicit loop is not the fastest code in the world, but for 5,000 daily bars it runs instantly and it is impossible to misread. Clarity beats cleverness when a subtle bug means losing real money.
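
If the loop ever becomes a bottleneck, one possible vectorized equivalent is to mark entry bars with 1, exit bars with 0, and forward-fill the last marker. This is a sketch, not a drop-in replacement: it assumes entry and exit are never true on the same bar, which holds for the rules above (a close under the lower band cannot also be at or above the middle band, and an RSI under 30 cannot be above 50). The random masks are made up for the comparison.

```python
import numpy as np
import pandas as pd

def positions_loop(entry: pd.Series, exit_: pd.Series) -> np.ndarray:
    # Reference implementation: the explicit state machine from the article
    position = np.zeros(len(entry))
    in_position = False
    for i in range(len(entry)):
        if not in_position and entry.iloc[i]:
            in_position = True
        elif in_position and exit_.iloc[i]:
            in_position = False
        position[i] = 1.0 if in_position else 0.0
    return position

def positions_vectorized(entry: pd.Series, exit_: pd.Series) -> np.ndarray:
    # 1.0 on entry bars, 0.0 on exit bars, NaN elsewhere; then carry the last state forward.
    # Assumes entry and exit_ are never True on the same bar.
    markers = np.where(entry, 1.0, np.where(exit_, 0.0, np.nan))
    state = pd.Series(markers, index=entry.index)
    return state.ffill().fillna(0.0).to_numpy()

# Made-up, mutually exclusive entry/exit masks for the comparison
rng = np.random.default_rng(2)
raw = rng.integers(0, 10, 2000)
entry = pd.Series(raw == 0)    # about 10% of bars
exit_ = pd.Series(raw >= 7)    # about 30% of bars, never overlapping with entry

same = np.array_equal(positions_loop(entry, exit_),
                      positions_vectorized(entry, exit_))
print("loop and vectorized versions agree:", same)
```

Keeping the loop as the reference and testing the fast version against it preserves the readability argument while still allowing a speed-up.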

One non-negotiable step: shift the position by one bar before computing returns. The decision to be in the market today is made from yesterday’s close, because you cannot trade on a price that has not printed yet.

data["position"] = data["position"].shift(1).fillna(0)

Skip that line and you build look-ahead bias straight into the backtest — the single most common way a mean-reversion strategy looks brilliant on screen and fails live.
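
The size of the effect is easy to demonstrate on data with no edge at all. In this sketch the returns are pure noise (an assumption made on purpose) and the rule is deliberately useless: be long on any bar whose own return is positive. Without the shift the rule looks spectacular; with the shift it earns roughly nothing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
log_return = pd.Series(rng.normal(0, 0.01, 2500))   # pure noise, no edge by construction

# A deliberately useless rule: long on any bar whose own return is positive
position = (log_return > 0).astype(float)

# Unshifted: the position trades on the very bar it peeked at
cheating = (position * log_return).sum()
# Shifted: the position takes effect one bar after the signal
honest = (position.shift(1).fillna(0) * log_return).sum()

print(f"unshifted total log-return: {cheating:.2f}")
print(f"shifted total log-return  : {honest:.2f}")
```

The unshifted figure is simply the sum of all positive returns, which is why look-ahead bias tends to produce equity curves that are too smooth to be real.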


7. Backtesting the strategy

With a clean position series, the backtest is one multiplication:

data["strategy_ret"] = data["position"] * data["log_return"]
data["bh_ret"] = data["log_return"]

equity = np.exp(data[["strategy_ret", "bh_ret"]].cumsum())
equity.columns = ["Bollinger+RSI mean-reversion", "Buy & Hold"]

equity.plot(figsize=(14, 6),
            title="Mean-reversion strategy vs Buy & Hold — SPY")
plt.ylabel("Equity (cumulative, log-return based)")
plt.tight_layout()
plt.show()

Mean-reversion strategies are, by construction, out of the market most of the time — they only hold a position during the recovery from an oversold dip. So do not expect the equity curve to track buy & hold. Expect a flatter line that steps up occasionally and, crucially, sidesteps a chunk of the worst drawdowns.


8. Measuring performance honestly

A picture is not a result. We need numbers, and we need the same numbers for buy & hold so the comparison is fair.

def sharpe(r):
    r = r.dropna()
    return np.sqrt(252) * r.mean() / r.std()

def max_drawdown(equity_curve):
    peak = equity_curve.cummax()
    return (equity_curve / peak - 1).min()

def time_in_market(position):
    return position.mean()

strat = data["strategy_ret"]
bh = data["bh_ret"]

print(f"Sharpe strategy   : {sharpe(strat):.2f}")
print(f"Sharpe buy & hold : {sharpe(bh):.2f}")
print(f"Max DD strategy   : {max_drawdown(np.exp(strat.cumsum())):.2%}")
print(f"Max DD buy & hold : {max_drawdown(np.exp(bh.cumsum())):.2%}")
print(f"Time in market    : {time_in_market(data['position']):.1%}")
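
Before trusting these helpers on real data, it can help to run them on inputs small enough to check by hand. The equity curve and position series below are made up for that purpose.

```python
import numpy as np
import pandas as pd

def max_drawdown(equity_curve: pd.Series) -> float:
    # Same definition as above: worst drop from a running peak
    peak = equity_curve.cummax()
    return (equity_curve / peak - 1).min()

# Hand-made equity curve: peak of 1.2, trough of 0.9, partial recovery
equity = pd.Series([1.0, 1.2, 0.9, 1.1])
dd = max_drawdown(equity)
print(f"max drawdown  : {dd:.2%}")    # 0.9 / 1.2 - 1 = -25.00%

# A position held on 2 of 8 bars
position = pd.Series([0, 0, 1, 1, 0, 0, 0, 0], dtype=float)
tim = position.mean()
print(f"time in market: {tim:.1%}")   # 2 / 8 = 25.0%
```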

The pattern you will usually see: the strategy earns a fraction of buy & hold’s total return — because it is invested only a small share of the time — but its drawdowns are far shallower, and the return per day actually invested is high. That last point is what makes mean-reversion useful as a building block: it produces a stream of returns that is largely uncorrelated with a trend system, so the two combine well in a portfolio.

Judge the strategy on risk-adjusted terms and on what it adds to a blend, not on whether it beats the index outright. It will not, and it is not trying to.


9. The traps you must not ignore

Mean-reversion is genuinely useful, but it has sharp edges. Name them or they will find you.

  • It catches falling knives. The strategy buys oversold dips. In a sustained crash — 2008, March 2020 — “oversold” gets more oversold for weeks. Every backtest of a long-only mean-reversion system will show ugly losses in those windows. A regime filter or a hard stop-loss is not optional for live trading.
  • Look-ahead bias. Covered above, and worth repeating: the shift(1) is the difference between a real backtest and a fantasy. Any indicator computed on the same bar you act on must be lagged.
  • Parameter overfitting. The 20-day window, 2 standard deviations, RSI 14, the 30/50 thresholds — every one of those is a knob. Tune them all on the full history and you are curve-fitting. Validate with walk-forward optimization so the parameters are chosen out-of-sample.
  • Transaction costs. This strategy trades fairly often. Add a realistic per-trade cost — subtract cost * abs(data["position"].diff()) from the returns — and re-run. Edges that survive on paper sometimes do not survive 5 basis points of friction.
  • Survivorship bias on single stocks. SPY is an index and reasonably safe. Run the same logic on individual stocks and a name that mean-reverted nicely for years can also simply go to zero — the ultimate failed reversion.
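
The transaction-cost adjustment mentioned in the list can be sketched as follows. The helper name net_of_costs, the 5 bps figure, and the toy series are assumptions for illustration; subtracting a cost directly from log-returns is also an approximation that is reasonable only for small costs.

```python
import numpy as np
import pandas as pd

def net_of_costs(position: pd.Series, log_return: pd.Series,
                 cost: float = 0.0005) -> pd.Series:
    # cost is charged per unit of position change (5 bps here, an assumed figure)
    gross = position * log_return
    # The first bar has no predecessor, so charge for whatever position it opens
    turnover = position.diff().abs().fillna(position.abs())
    return gross - cost * turnover

# Toy inputs: four position changes over six bars
position = pd.Series([0.0, 1.0, 1.0, 0.0, 1.0, 0.0])
log_return = pd.Series([0.00, 0.01, -0.005, 0.002, 0.003, 0.001])

gross_total = (position * log_return).sum()
net_total = net_of_costs(position, log_return).sum()
print(f"gross: {gross_total:.4f}  net: {net_total:.4f}")
```

Here the four position changes cost 4 × 0.0005 = 0.002 in total, a quarter of the gross return of 0.008.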

A strategy whose failure modes you can recite is one you can manage. A strategy that “just works” in the backtest is one you do not understand yet.


10. Where to go next

A few directions to push this further:

  • Add a regime filter. Only take mean-reversion trades when a regime model says the market is ranging, not trending down. This directly addresses the falling-knife problem — a natural pairing with the HMM market regimes article.
  • Size by conviction. Instead of a binary 0/1 position, scale exposure with how stretched %B and the RSI are. A deeper dip gets more capital.
  • Add a short side. Mirror the rules — short when price pierces the upper band with RSI above 70. Be warned: shorting an index with a long-term upward drift is a structural headwind.
  • Test the indicators as features. Feed %B and the RSI into a classifier rather than hand-coding thresholds — see the feature selection article for how to do that without drowning in noise.

Conclusion

Bollinger Bands and the RSI are two of the most-Googled indicators in trading, and most of what is written about them is either hand-wavy chart astrology or an equity curve with the look-ahead bias quietly left in. Built carefully in Python and backtested honestly, they form a real, if modest, mean-reversion strategy: not a market-beater on total return, but a low-drawdown, low-correlation return stream that earns its place as a component of a larger book.

Trade safe, and remember: the market can stay oversold a lot longer than your stop-loss can stay untriggered.

Walk-Forward Optimization in Python: The Honest Way to Backtest a Trading Strategy


Every quant has been there. You build a strategy, sweep a few parameters on ten years of data, the equity curve looks beautiful, and then live trading turns it into a sawtooth of disappointment. The backtest wasn’t wrong — it was dishonest. It told you what the best parameters were with the benefit of hindsight. That’s not a strategy. That’s a memory.

The cure has a name: walk-forward optimization. It is boring to implement, slow to run, and the resulting equity curves are uglier — which is precisely why most tutorials skip it.

In this article we will:

  • See, with a concrete example, how an in-sample optimization lies.
  • Build a walk-forward optimizer in pure Python (pandas + numpy, no exotic dependencies).
  • Apply it to a tunable SMA-crossover strategy on SPY.
  • Compare the honest equity curve to the seductive one — and discuss what to do when the gap is wide.

1. The backtest that lied

Take the simplest tunable strategy on earth: a moving-average crossover. Go long when the fast SMA crosses above the slow SMA, flat otherwise. Two parameters: fast and slow.

The naive recipe:

  1. Download 20 years of SPY.
  2. Try every combination of fast in [5, 10, 20, 30] and slow in [50, 100, 150, 200].
  3. Pick the pair with the best Sharpe.
  4. Report that Sharpe as “the strategy’s Sharpe”.

That last step is where the lie lives. You have just searched a 16-cell grid for the cell that fits this specific history best. The Sharpe you report is the maximum of 16 random variables, not the expected performance of the strategy. On unseen data, the same parameters will almost always underperform — sometimes by a wide margin.
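
The "maximum of 16 random variables" effect can be simulated directly. The sketch below generates 16 strategies with zero true edge and compares the Sharpe of the best one with the Sharpe of a strategy fixed in advance. It assumes the 16 return streams are independent, which overstates the effect for a real SMA grid whose cells are highly correlated; the volatility and sample length are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n_trials, n_strategies, n_days = 200, 16, 2520   # about 10 years of daily bars

def annualized_sharpe(returns: np.ndarray) -> np.ndarray:
    # Sharpe along the time axis (axis 0), one value per strategy column
    return np.sqrt(252) * returns.mean(axis=0) / returns.std(axis=0, ddof=1)

best_of_grid, single_pick = [], []
for _ in range(n_trials):
    # 16 independent strategies with zero true edge
    returns = rng.normal(0.0, 0.01, size=(n_days, n_strategies))
    sharpes = annualized_sharpe(returns)
    best_of_grid.append(sharpes.max())   # what the naive recipe reports
    single_pick.append(sharpes[0])       # a strategy chosen before looking

best_of_grid = np.array(best_of_grid)
single_pick = np.array(single_pick)
print(f"mean Sharpe, fixed strategy: {single_pick.mean():.2f}")
print(f"mean Sharpe, best of 16    : {best_of_grid.mean():.2f}")
```

The fixed strategy averages out near zero, as it should, while the best-of-grid figure is clearly positive even though no strategy has any edge.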

The fix is conceptually trivial: never evaluate a strategy on data you used to choose its parameters.


2. Walk-forward in plain English

Walk-forward optimization slices the timeline into a sequence of (train, test) windows that march forward in time:

|===== train 1 =====|= test 1 =|
        |===== train 2 =====|= test 2 =|
                |===== train 3 =====|= test 3 =|
                       ...

In each train window you pick the best parameters by your chosen metric. You then apply those frozen parameters to the next test window — and only the returns from the test windows count toward the final equity curve.

Two common flavors:

  • Rolling window: the train window has fixed length and slides forward. Old data is forgotten.
  • Anchored (expanding) window: the train window grows; only the start is fixed.

Rolling is closer to how a real trader behaves (“forget what worked five years ago, the regime has changed”). Anchored is more statistically efficient when you believe the strategy edge is stable. We’ll use rolling here.
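
The two flavors differ only in where the training window starts. A minimal sketch of the index arithmetic, using a hypothetical helper walk_forward_windows and a toy series of 10 observations:

```python
from typing import Iterator, Tuple

def walk_forward_windows(n: int, train: int, test: int,
                         anchored: bool = False) -> Iterator[Tuple[range, range]]:
    # Yields (train_idx, test_idx) as positional ranges over n observations
    start = train
    while start + test <= n:
        # Anchored: training always starts at 0. Rolling: fixed-length window.
        train_start = 0 if anchored else start - train
        yield range(train_start, start), range(start, start + test)
        start += test

rolling = list(walk_forward_windows(n=10, train=4, test=2))
anchored = list(walk_forward_windows(n=10, train=4, test=2, anchored=True))

for tr, te in rolling:
    print("rolling ", list(tr), "->", list(te))
for tr, te in anchored:
    print("anchored", list(tr), "->", list(te))
```

Both variants produce the same test windows; only the amount of history seen by the optimizer changes.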


3. Setting up the environment

pip install yfinance pandas numpy matplotlib

Imports:

import itertools
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt

4. Data and the tunable strategy

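data = yf.download("SPY", start="2005-01-01", end="2025-01-01", auto_adjust=True)
# Recent yfinance versions may return MultiIndex columns even for one ticker; flatten them
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)
data = data[["Close"]].dropna().copy()
data["log_return"] = np.log(data["Close"] / data["Close"].shift(1))
data = data.dropna()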

The strategy as a pure function — given parameters and a price series, return the strategy’s daily log-returns:

def sma_crossover_returns(close: pd.Series, log_ret: pd.Series,
                          fast: int, slow: int) -> pd.Series:
    sma_fast = close.rolling(fast).mean()
    sma_slow = close.rolling(slow).mean()
    signal = (sma_fast > sma_slow).astype(int)
    # shift by 1 to use yesterday's signal for today's return → no look-ahead
    return signal.shift(1) * log_ret

And the scoring metric — annualized Sharpe of daily log-returns:

def sharpe(r: pd.Series) -> float:
    r = r.dropna()
    if r.std() == 0 or len(r) < 20:
        return -np.inf
    return np.sqrt(252) * r.mean() / r.std()


5. The naive full-sample optimization (the trap)

For reference, let’s compute the in-sample optimum the way most blog posts do it:

fast_grid = [5, 10, 20, 30]
slow_grid = [50, 100, 150, 200]
grid = [(f, s) for f, s in itertools.product(fast_grid, slow_grid) if f < s]

scores = {}
for f, s in grid:
    r = sma_crossover_returns(data["Close"], data["log_return"], f, s)
    scores[(f, s)] = sharpe(r)

best = max(scores, key=scores.get)
print("Best in-sample params:", best, "Sharpe:", scores[best])

You will get a Sharpe somewhere around 0.6–0.8, depending on the seed of history. Remember that number; we will see it shrink.


6. Building the walk-forward loop

Three knobs:

  • train_years: length of each training window.
  • test_months: length of each test window, which is also how often we re-optimize.
  • The parameter grid (kept identical to be fair).

def walk_forward(close: pd.Series, log_ret: pd.Series,
                 grid: list, train_years: int = 5,
                 test_months: int = 6) -> tuple[pd.Series, pd.DataFrame]:
    train_days = train_years * 252
    test_days = test_months * 21

    records = []
    oos_returns = pd.Series(index=log_ret.index, dtype="float64")

    start = train_days
    while start + test_days <= len(log_ret):
        train_slice = slice(start - train_days, start)
        test_slice = slice(start, start + test_days)

        # optimize on the training window
        best_params, best_score = None, -np.inf
        for params in grid:
            r = sma_crossover_returns(close.iloc[train_slice],
                                      log_ret.iloc[train_slice], *params)
            s = sharpe(r)
            if s > best_score:
                best_score, best_params = s, params

        # apply frozen params on the test window
        # Note: we need a small lookback into training so SMAs are warm
        lookback = max(p[1] for p in grid)
        eval_slice = slice(start - lookback, start + test_days)
        r_test = sma_crossover_returns(close.iloc[eval_slice],
                                       log_ret.iloc[eval_slice], *best_params)
        r_test = r_test.iloc[lookback:]  # drop the warm-up portion
        oos_returns.iloc[test_slice] = r_test.values

        records.append({
            "train_end": log_ret.index[start - 1],
            "test_end":  log_ret.index[start + test_days - 1],
            "params":    best_params,
            "is_sharpe": best_score,
        })
        start += test_days

    return oos_returns.dropna(), pd.DataFrame(records)

Two details worth slowing down on:

  1. The warm-up lookback. When you start a new test segment, the slow SMA needs the previous slow closes to even exist. If you compute SMAs only on the test slice, the first slow - 1 bars of every test window have no signal, and with slow = 200 that is longer than the entire 126-day test window. Including a lookback into the training data fixes this without leaking information: the prediction at day t still only uses prices up to t.
  2. Shift by one. The strategy already shifts the signal by one inside sma_crossover_returns, so today’s position is decided by yesterday’s close. This is non-negotiable. Forget it once and your beautiful walk-forward is just an elaborate look-ahead bias.
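
The warm-up issue is easy to see in isolation. This sketch uses a synthetic price series and illustrative values for slow, test_days, and start (all assumptions), and compares an SMA computed on the test slice alone with one computed on a slice that reaches back slow bars.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000))))

slow, test_days, start = 200, 126, 600   # illustrative values

# SMA computed on the test slice alone: never enough history
cold = close.iloc[start:start + test_days].rolling(slow).mean()

# SMA computed with a lookback into earlier data, then trimmed to the test window
warm = close.iloc[start - slow:start + test_days].rolling(slow).mean().iloc[slow:]

# Reference: the SMA computed on the full history, restricted to the test window
full = close.rolling(slow).mean().iloc[start:start + test_days]

print("valid SMA values without lookback:", int(cold.notna().sum()))
print("valid SMA values with lookback   :", int(warm.notna().sum()))
print("lookback version matches full-history SMA:",
      bool(np.allclose(warm.to_numpy(), full.to_numpy())))
```

The lookback version reproduces the full-history SMA on every test bar, and each of those values depends only on prices at or before that bar.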

7. Running it and stitching the out-of-sample curve

oos_returns, log = walk_forward(data["Close"], data["log_return"], grid,
                                train_years=5, test_months=6)

oos_equity = np.exp(oos_returns.cumsum())
naive_returns = sma_crossover_returns(data["Close"], data["log_return"], *best)
naive_returns = naive_returns.loc[oos_returns.index]
naive_equity = np.exp(naive_returns.cumsum())
bh_equity = np.exp(data["log_return"].loc[oos_returns.index].cumsum())

fig, ax = plt.subplots(figsize=(14, 6))
oos_equity.plot(ax=ax, label="Walk-forward (honest)")
naive_equity.plot(ax=ax, label="In-sample optimum (looks great)")
bh_equity.plot(ax=ax, label="Buy & Hold")
ax.set_title("SMA crossover on SPY — three views of the same strategy")
ax.set_ylabel("Equity (log-return cumulative)")
ax.legend()
plt.tight_layout()
plt.show()

The in-sample curve and the walk-forward curve will almost never agree. On SPY with this grid, the in-sample version reports a Sharpe near 0.7; the walk-forward usually lands between 0.2 and 0.4. That gap is your overfitting tax — it is what you were silently paying every time you took an in-sample backtest at face value.

print("Sharpe walk-forward :", sharpe(oos_returns))
print("Sharpe in-sample    :", sharpe(naive_returns))
print("Sharpe buy & hold   :", sharpe(data["log_return"].loc[oos_returns.index]))

8. Parameter stability — the diagnostic that matters most

A high walk-forward Sharpe means little if the optimizer is jumping wildly between parameter sets in adjacent windows. That’s a sign the “edge” you’re capturing is noise, and next month’s chosen parameters will be the wrong ones.

fig, axes = plt.subplots(2, 1, figsize=(14, 6), sharex=True)
log_plot = log.copy()
log_plot["fast"] = log_plot["params"].apply(lambda p: p[0])
log_plot["slow"] = log_plot["params"].apply(lambda p: p[1])
log_plot.set_index("test_end")[["fast"]].plot(ax=axes[0], marker="o")
log_plot.set_index("test_end")[["slow"]].plot(ax=axes[1], marker="o")
axes[0].set_title("Selected `fast` parameter over time")
axes[1].set_title("Selected `slow` parameter over time")
plt.tight_layout()
plt.show()

What you want to see: long flat stretches with occasional changes. What you don’t want to see: a different pair every window. If the picks look like white noise, the optimizer is fitting noise, typically because the grid is too large for the amount of training data. Increase train_years, shrink the grid, or accept that the strategy has no robust edge here.
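
Alongside the plot, the same diagnostic can be reduced to a couple of numbers. The sketch below uses a hypothetical five-window log in place of the real one; the two summary statistics are illustrative choices, not a standard metric.

```python
import pandas as pd

# Hypothetical walk-forward log; in practice use the DataFrame returned by walk_forward
log_example = pd.DataFrame({
    "params": [(10, 100), (10, 100), (20, 100), (20, 100), (20, 100)],
})

fast = log_example["params"].apply(lambda p: p[0])
slow = log_example["params"].apply(lambda p: p[1])

# A window counts as "changed" if either parameter differs from the previous window
changed = (fast != fast.shift(1)) | (slow != slow.shift(1))
change_rate = changed.iloc[1:].mean()   # skip the first window, which has no predecessor

# Share of windows that selected the single most frequent pair
most_common_share = log_example["params"].value_counts(normalize=True).iloc[0]

print(f"windows where the pick changed     : {change_rate:.0%}")
print(f"windows using the most common pair : {most_common_share:.0%}")
```

A change rate close to 100% is the numeric counterpart of the white-noise picture.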


9. Pitfalls that quietly ruin walk-forwards

Even when you do the basic loop right, a handful of subtler mistakes can re-inject the very bias you were trying to remove.

  • Window-length p-hacking. If you also tune train_years and test_months by looking at the final equity curve, you are back to overfitting — one level up. Decide on these knobs a priori and don’t touch them.
  • No transaction costs. SMA crossovers flip often. A round-trip cost of even 5 bps can knock 30% off the apparent Sharpe. Subtract cost * abs(signal.diff()) from returns and rerun.
  • Survivorship and look-ahead in the data itself. This matters less for SPY but matters a lot for stock universes — only use data that was available as of each rebalance date.
  • Multiple testing. If you try this strategy, then ten others, then pick the one with the best walk-forward Sharpe, you are again selecting the maximum of N. Walk-forward protects against parameter overfit, not against strategy-shopping.
  • Insufficient test data. A 6-month test window contains ~126 daily returns. Sharpe estimated on that is wildly noisy. Stack many such windows before taking the result seriously — and never trust a single segment.
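
How noisy is a Sharpe ratio estimated on about 126 daily returns? A quick simulation gives a feel for it. The daily mean and volatility below are assumed values chosen to give a true annualized Sharpe of roughly 0.5, and the returns are drawn as independent normals, which real returns are not.

```python
import numpy as np

rng = np.random.default_rng(6)
n_windows, n_days = 2000, 126                # many independent 6-month windows
true_daily_mean, daily_vol = 0.0003, 0.01    # assumed values

returns = rng.normal(true_daily_mean, daily_vol, size=(n_windows, n_days))
# One annualized Sharpe per simulated 6-month window
sharpes = np.sqrt(252) * returns.mean(axis=1) / returns.std(axis=1, ddof=1)

true_sharpe = np.sqrt(252) * true_daily_mean / daily_vol
p05, p95 = np.percentile(sharpes, [5, 95])
print(f"true Sharpe                  : {true_sharpe:.2f}")
print(f"std of 6-month Sharpe        : {sharpes.std():.2f}")
print(f"5th to 95th percentile range : {p05:.2f} to {p95:.2f}")
```

Even with a genuinely positive edge, a sizeable share of individual windows shows a negative Sharpe, which is why single segments say so little.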

10. Where to go next

A few directions if you want to push this further:

  • Combinatorial Purged Cross-Validation (López de Prado, Advances in Financial Machine Learning). A more rigorous successor to walk-forward that handles overlapping label horizons and gives a distribution of out-of-sample Sharpes instead of a single number.
  • vectorbt ships window-splitting helpers for walk-forward analysis (rolling_split in the open-source version) and runs vectorized backtests much faster on large grids, which is useful when you move from a 16-cell grid to a 10,000-cell one.
  • Bayesian optimization with scikit-optimize or optuna, instead of grid search, when each backtest is expensive.
  • Block bootstrap of the walk-forward returns to get a confidence interval on the final Sharpe — a 0.4 ± 0.3 reads very differently from a 0.4 ± 0.05.
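
As a starting point for the last item, here is a minimal moving-block bootstrap for the Sharpe ratio. The function name, the 21-day block length, the number of resamples, and the placeholder returns are all assumptions for illustration; the block length in particular deserves some thought for real data.

```python
import numpy as np

def block_bootstrap_sharpe(returns: np.ndarray, block: int = 21,
                           n_boot: int = 1000, seed: int = 0) -> tuple:
    # Moving-block bootstrap: resample contiguous blocks to keep short-range dependence
    rng = np.random.default_rng(seed)
    n = len(returns)
    n_blocks = int(np.ceil(n / block))
    sharpes = np.empty(n_boot)
    for b in range(n_boot):
        # Random block start positions, each block fully inside the sample
        starts = rng.integers(0, n - block + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block)).ravel()[:n]
        sample = returns[idx]
        sharpes[b] = np.sqrt(252) * sample.mean() / sample.std(ddof=1)
    # 95% percentile interval
    return tuple(np.percentile(sharpes, [2.5, 97.5]))

# Placeholder returns; in practice pass oos_returns.to_numpy()
rng = np.random.default_rng(7)
fake_oos = rng.normal(0.0003, 0.01, 2500)
low, high = block_bootstrap_sharpe(fake_oos)
print(f"95% bootstrap interval for the Sharpe: {low:.2f} to {high:.2f}")
```

The width of the interval, not just the point estimate, is what tells you whether the walk-forward Sharpe is distinguishable from zero.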

Conclusion

Walk-forward optimization will not make a bad strategy good. What it will do is stop a bad strategy from looking good, which is more valuable than it sounds: it ends the cycle of building, deploying, and being surprised. The first time you walk-forward a strategy you were proud of and watch its Sharpe halve, it stings. It also saves you from the much more expensive version of that lesson, the one the market teaches with real money.

Trade safe, and remember: a backtest that doesn’t disappoint you is probably lying to you.