Methodology
This page describes the framework shared by all three strategies. Details specific to each strategy live on its own page.
A backtest can lie. With enough adjustable parameters, you can always find a combination that produces a beautiful historical track record: that is the overfitting trap. The real challenge is separating strategies that have a chance of working in the future from those that merely got lucky in the past.
Frozen calibration, live observation
Each strategy displays a calibration date at the top of its page. Beyond that date, no parameter is modified: everything that happens afterwards is observed in live conditions. We call that zone "out-of-sample" (OOS). It is what actually validates the strategy, in contrast to the "in-sample" period where the rules were tuned.
A strategy that outperforms in-sample but disappoints OOS is probably overfit to past data. That is the main risk of backtesting. The performance shown here juxtaposes both periods so that you can judge for yourself.
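As a rough illustration of the in-sample / OOS split described above, the sketch below cuts a return series at a calibration date and reports each side separately. The function names, the sample returns and the choice to count the calibration date itself as in-sample are all assumptions made for the example, not the site's actual code.

```python
from datetime import date


def split_by_calibration(returns, calibration_date):
    """Split (date, return) pairs into in-sample and out-of-sample lists.

    Assumption: the calibration date itself counts as in-sample.
    """
    in_sample = [(d, r) for d, r in returns if d <= calibration_date]
    out_of_sample = [(d, r) for d, r in returns if d > calibration_date]
    return in_sample, out_of_sample


def cumulative_return(pairs):
    """Compound the periodic returns of a list of (date, return) pairs."""
    total = 1.0
    for _, r in pairs:
        total *= 1.0 + r
    return total - 1.0


# Hypothetical monthly returns, for illustration only.
returns = [
    (date(2023, 10, 31), 0.02),
    (date(2023, 11, 30), -0.01),
    (date(2023, 12, 31), 0.03),
    (date(2024, 1, 31), 0.01),
    (date(2024, 2, 29), -0.02),
]
calibration = date(2023, 12, 31)

ins, oos = split_by_calibration(returns, calibration)
print(f"in-sample: {cumulative_return(ins):+.4f}")
print(f"OOS:       {cumulative_return(oos):+.4f}")
```

Reporting the two figures side by side, rather than one blended number, is what lets a reader spot a strategy that shines in-sample and fades afterwards.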
SPX & Gold discipline
SPX & Gold is the centrepiece of the setup: it drives the allocation across equities, gold and bonds. Naturally, it has also been put through the most comprehensive battery of safeguards. Nine independent statistical tests each target a specific form of overfitting. No mechanism is adopted until it has passed all of them.
- Nested cross-validation. Rather than evaluating performance on the full dataset (which was used to choose the parameters in the first place), the history is split into five independent periods. For each period, the strategy is tuned on the other four and evaluated blind on this one. The reported performance is the average of the five blind evaluations, not the best fit.
- Crash-conditional evaluation. A risk-protective strategy must show its edge during crises, not just on average. Performance is measured separately across five historical stress regimes (dot-com 2000-2002, financial crisis 2007-2009, Q4 2018 volatility, COVID Q1 2020, 2022 bear market). To pass validation, at least four of five regimes must show outperformance vs a passive benchmark.
- Transaction costs included. All displayed performance includes a round-trip cost model of five basis points, weighted by effective turnover. This prevents optimising a strategy that would no longer hold once real brokerage fees apply.
- Multiple testing correction. When testing hundreds of candidates, some will pass statistical thresholds by pure chance. The Romano-Wolf step-down procedure controls the family-wise error rate at 5%: the probability of accepting even a single false positive across all candidates is capped at 5%.
- Multiple out-of-sample windows test. The candidate must perform across eight rolling non-overlapping out-of-sample windows (2003 to today, offset eighteen months apart). This avoids overfitting to a single OOS period: if the strategy only holds on one or two of these windows, it is rejected.
- Leave-one-crash-out test (LOCO). For each historical crash, the strategy is re-trained excluding that specific crash from the training set, then evaluated blind on the excluded window. If performance holds on unseen crashes, that is direct evidence the strategy isn't overfit to the five specific known crashes.
- Cross-market check on four foreign indices. The same signals are applied to four foreign indices (UK, Germany, France, Japan). If the strategy holds on markets it never saw during calibration, that is a strong signal that the mechanisms capture a general economic phenomenon, not a peculiarity of the S&P 500.
- Independent anti-overfitting probes. Several diagnostic tests are run in parallel to the main pipeline, for instance by generating candidates without access to the dates of known historical crises. If conclusions diverge between the probes and the main pipeline, the mechanism is treated as suspect.
- Blind vs canonical mode. Half of candidate theories are generated without reference to any named historical period ("blind" mode), the other half with full access. If both pools converge on similar mechanisms, that is a strong signal the strategy captures a real economic phenomenon rather than an ad hoc pattern.
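Two of the safeguards above lend themselves to a short sketch: the five-period blind evaluation and the "at least four of five regimes" crash rule. The code below is a minimal illustration under stated assumptions. The helper names (`contiguous_folds`, `blind_cv_score`, `passes_crash_rule`), the toy `tune` and `evaluate` functions and all the numbers are hypothetical and do not reflect the actual pipeline.

```python
from statistics import mean


def contiguous_folds(n_obs, n_folds=5):
    """Return (start, end) index pairs splitting range(n_obs) into contiguous periods."""
    base, extra = divmod(n_obs, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        end = start + base + (1 if i < extra else 0)
        folds.append((start, end))
        start = end
    return folds


def blind_cv_score(data, tune, evaluate, n_folds=5):
    """Tune on the other periods, score blind on the held-out one, average the scores."""
    scores = []
    for start, end in contiguous_folds(len(data), n_folds):
        train = data[:start] + data[end:]   # the other four periods
        test = data[start:end]              # the period evaluated blind
        params = tune(train)
        scores.append(evaluate(params, test))
    return mean(scores), scores


def passes_crash_rule(excess_by_regime, min_regimes=4):
    """True if the strategy beats the benchmark in at least `min_regimes` regimes."""
    wins = sum(1 for excess in excess_by_regime.values() if excess > 0)
    return wins >= min_regimes


# Toy strategy: a single exposure level chosen from a small grid (assumption).
def tune(train):
    return max((0.0, 0.5, 1.0), key=lambda w: w * mean(train))


def evaluate(exposure, test):
    return exposure * mean(test)


# Hypothetical periodic returns, for illustration only.
data = [0.01, -0.02, 0.03, 0.01, -0.01, 0.02, 0.00, 0.01, -0.03, 0.02]
average, per_fold = blind_cv_score(data, tune, evaluate)
print("per-period blind scores:", per_fold)
print("reported score (average):", average)

# Hypothetical excess returns vs a passive benchmark, one per stress regime.
excess = {"2000-2002": 0.04, "2007-2009": 0.06, "Q4 2018": -0.01,
          "Q1 2020": 0.03, "2022": 0.02}
print("passes crash rule:", passes_crash_rule(excess))
```

The point of the design is that the reported figure is the mean of scores obtained on data the tuning step never saw, so a parameter set that only fits one period cannot dominate the result.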
Stocks discipline
The Stocks strategy does not rely on a parametric backtest but on stock-by-stock fundamental analysis. Rigour comes from a narrow scope, strict separation of analysis steps, and continuous challenge. Here are the main safeguards.
- Defined scope. Selection focuses on US large caps (market capitalisation above $2 billion). No mid or small caps, no non-US stocks, no emerging markets. This scope guarantees liquidity, accounting transparency, and a universe deep enough to analyse thoroughly.
- Four separate analysis tracks. Each stock is evaluated across four distinct topics: business quality, valuation, risks and catalysts, falsification conditions. Each evaluation is produced independently of the others to keep judgments uncontaminated. A consolidation step then arbitrates.
- Business quality assessed without knowing the price. The step that judges business quality explicitly ignores the stock price, market cap, and all market ratios (P/E, P/FCF, EV/EBITDA). This separates "is this business good?" from "is it at a reasonable price?", and prevents a high price from rationalising a mediocre business or a low price from masking a failing one.
- Double evaluation on stocks under thematic pressure. When a stock is exposed to an unfavourable trend (technological substitution, sector disruption), business quality is evaluated twice: first without knowing the pressure, then with. The gap between the two scores measures the business's real robustness against the threat, independent of narrative intuition.
- Recommendations from triangulated review. When it is time to act, two parallel reviews with opposing objectives are produced: one on potential buys while ignoring current positions (prevents after-the-fact justification of what is already owned), the other reviewing each held position against strict capital-protection criteria. A synthesis step then arbitrates the conflicts, without being able to encroach on either review's scope.
- Explicit falsification conditions. Every file lists numeric conditions that would invalidate its thesis (e.g., "if revenue growth falls below 10% year-on-year"). These conditions are automatically re-tested at regular intervals against fresh numbers. This discipline guards against ex-post rationalisation of held positions.
- Conditional re-audit and automatic requalification. Files whose numbers no longer fit the original analysis are systematically re-examined. An automatic mechanism also periodically re-reads previously dismissed stocks: if converging positive signals appear (growth recovery, return to profitability, margin recovery), the file is reopened. No favouritism, no deliberate forgetting.
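The explicit falsification conditions described above can be pictured as small numeric rules re-checked against fresh figures. The sketch below is one possible shape for such a check. The `Condition` class, the metric names and the sample numbers are hypothetical, chosen only to mirror the "revenue growth below 10% year-on-year" example.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    metric: str       # key expected in the fresh-numbers dict (hypothetical name)
    comparison: str   # "<" or ">"
    threshold: float
    label: str        # human-readable statement of the invalidation rule


_OPS = {"<": operator.lt, ">": operator.gt}


def check_conditions(conditions, fresh_numbers):
    """Return (triggered, missing): rules that now invalidate the thesis,
    and rules that could not be tested because the metric is absent."""
    triggered, missing = [], []
    for cond in conditions:
        if cond.metric not in fresh_numbers:
            missing.append(cond)
        elif _OPS[cond.comparison](fresh_numbers[cond.metric], cond.threshold):
            triggered.append(cond)
    return triggered, missing


# Hypothetical file for one stock, for illustration only.
conditions = [
    Condition("revenue_growth_yoy", "<", 0.10, "revenue growth falls below 10% YoY"),
    Condition("net_debt_to_ebitda", ">", 3.0, "net debt exceeds 3x EBITDA"),
    Condition("operating_margin", "<", 0.15, "operating margin falls below 15%"),
]
fresh_numbers = {"revenue_growth_yoy": 0.08, "net_debt_to_ebitda": 1.2}

triggered, missing = check_conditions(conditions, fresh_numbers)
for cond in triggered:
    print("thesis invalidated:", cond.label)
for cond in missing:
    print("could not re-test:", cond.label)
```

Keeping untestable conditions in a separate list, rather than silently passing them, is a deliberate choice in this sketch: a missing number should prompt a look at the file, not count as a green light.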
Bitcoin discipline
This strategy rests on a single core mechanism whose quality only emerges through time and through variant testing. The discipline consists of stressing this mechanism across different time windows, measuring the degradation when a component is removed, and tracking how many alternatives have been tried without success.
- Mechanism centred on the notion of bands. The decision engine relies primarily on adaptive trend bands that follow the price and determine entries and exits. Everything else (macro filters, exit thresholds) is grafted around this skeleton.
- Walk-forward validation across 11 windows. The strategy is re-evaluated on 11 independent rolling windows (24-month train, 6-month test). A modification must improve cumulative return AND remain stable across at least 55% of these windows to be adopted.
- Systematic ablation tests. Each macro component grafted onto the strategy has been tested by removing it individually. All degrade results when removed. Every component is therefore justified by its measured contribution.
- Over 300 variants tested. More than 300 variants have been tested across hyperparameters, filters, exits, timeframes and other settings. None of these alternatives passed walk-forward stability validation, so none has replaced the current parametrisation. The current parametrisation is a robust local optimum, not the best fit of an exhaustive search.
- Dual-engine validation. Two independent implementations of the decision engine run in parallel. No modification is adopted until both engines converge on the same signals. Divergences flag a bug or a numerical artefact before it contaminates a decision.
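The walk-forward rule above (11 windows, 24-month train, 6-month test, improvement plus stability on at least 55% of windows) can be sketched as follows. Two things are assumptions of this example rather than facts from the text: the windows are taken to advance by one test period at a time, so that test segments do not overlap, and a window is counted as "stable" when the candidate does at least as well as the baseline on it. Function names and figures are hypothetical.

```python
def walk_forward_windows(n_windows=11, train_months=24, test_months=6):
    """Return [((train_start, train_end), (test_start, test_end)), ...] as month
    offsets from the start of the history. Assumption: each window advances by
    one test period, so test segments never overlap."""
    windows = []
    for i in range(n_windows):
        train_start = i * test_months
        train_end = train_start + train_months
        windows.append(((train_start, train_end),
                        (train_end, train_end + test_months)))
    return windows


def cumulative(returns):
    """Compound a sequence of per-window returns."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


def adopt(baseline, candidate, min_stable_share=0.55):
    """Adopt a modification only if it improves cumulative return AND is stable
    on a sufficient share of windows. Assumption: 'stable' means the candidate
    does at least as well as the baseline on that window."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("need one test return per window for both versions")
    improves = cumulative(candidate) > cumulative(baseline)
    stable = sum(1 for b, c in zip(baseline, candidate) if c >= b)
    return improves and stable / len(baseline) >= min_stable_share


windows = walk_forward_windows()
print("first window:", windows[0])
print("last window: ", windows[-1])

# Hypothetical per-window test returns, for illustration only.
baseline = [0.01] * 11
candidate = [0.02] * 6 + [0.0] * 5   # better overall, but only on 6 of 11 windows
print("adopt candidate:", adopt(baseline, candidate))
```

With 11 windows, a 55% threshold means at least 7 windows must hold up, since 6 of 11 is just under 55%. A modification that lifts the total through a handful of strong windows is therefore rejected.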
What these strategies are NOT
- They are not day-trading or scalping. Positions typically last several weeks to several months.
- No leverage, no shorts, no options, no margin. Cash spot buys and sells only.
- No short-term macro predictions. No position is taken by speculating on the next central-bank decision or an upcoming macro data release.
- No hidden "secret sauce". The method is openly articulated, without revealing the exact parameters that constitute the competitive edge.
- Not a performance guarantee. These are historical results published transparently, not a promise about the future.
- No personalised advice, no tax optimisation. This site presents strategies and their public results, not recommendations tailored to your situation.