Financial-ML:

Predict S&P 500 Outperformers with ML

A machine learning system for equity selection that predicts which S&P 500 stocks will outperform the market benchmark. The project combines market data, company fundamentals, and sentiment indicators to construct long-only portfolios through a rigorous ML pipeline: data ingestion → feature engineering → expanding-window cross-validation → portfolio construction → comprehensive backtesting against SPY.

Key Results

| Metric | Value |
| --- | --- |
| Sharpe Ratio | 0.93 |
| Annual Return | 20.2% |
| Max Drawdown | -22.9% |
| Alpha vs Random | 1.72% |
| Win Rate | 69.8% |

Figure: Cumulative Portfolio Returns vs SPY
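For reference, headline metrics like those above can be computed from a series of monthly portfolio returns roughly as follows. This is a minimal sketch, not the project's code: the function names, the zero risk-free rate, and the sqrt(12) annualization of monthly returns are assumptions, and win rate is taken here as the share of months in which the portfolio beats the benchmark, which may differ from the project's definition.

```python
import math
import statistics


def annualized_sharpe(monthly_returns, risk_free_monthly=0.0):
    """Annualized Sharpe ratio from monthly returns (assumes 12 periods per year)."""
    excess = [r - risk_free_monthly for r in monthly_returns]
    vol = statistics.stdev(excess)  # sample standard deviation
    if vol == 0:
        return float("nan")
    return statistics.mean(excess) / vol * math.sqrt(12)


def annualized_return(monthly_returns):
    """Geometric annualized return from monthly returns."""
    growth = math.prod(1.0 + r for r in monthly_returns)
    return growth ** (12 / len(monthly_returns)) - 1.0


def max_drawdown(monthly_returns):
    """Largest peak-to-trough decline of the cumulative equity curve (always <= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in monthly_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def win_rate(portfolio_returns, benchmark_returns):
    """Fraction of periods in which the portfolio return exceeds the benchmark's."""
    wins = sum(p > b for p, b in zip(portfolio_returns, benchmark_returns))
    return wins / len(portfolio_returns)
```

Max drawdown is reported as a negative number, matching the sign convention in the table.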

Data & Features

| Data | Source | Features | Purpose |
| --- | --- | --- | --- |
| Market Data | Yahoo Finance | r12 (12m return), mom121 (momentum), vol3, vol12 (volatility) | Momentum and risk regime signals |
| Fundamentals | SEC EDGAR | BookToMarket, ROE, ROA, NetMargin, Leverage, Asset Growth, Net Share Issuance | Value, quality, and financial health |
| Sentiment | VIX Index | VIX percentile (12-month rolling) | Market stress detection |
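To make the market and sentiment features concrete, here is a minimal sketch of how they might be derived from month-end data. The exact definitions are assumptions rather than the project's code: `mom121` is read as the common 12-1 momentum (the return from 12 months ago to 1 month ago, skipping the latest month), `vol3` and `vol12` as the sample standard deviation of the last 3 and 12 monthly returns, and the VIX percentile as the share of the trailing 12 readings at or below the current one. The function names are hypothetical.

```python
import statistics


def market_features(prices):
    """Features for the latest month from month-end prices (oldest first).

    Needs at least 13 prices so that 12 monthly returns are available.
    """
    if len(prices) < 13:
        raise ValueError("need at least 13 month-end prices")
    returns = [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]
    return {
        "r12": prices[-1] / prices[-13] - 1.0,     # trailing 12-month return
        "mom121": prices[-2] / prices[-13] - 1.0,  # same window, skipping the last month
        "vol3": statistics.stdev(returns[-3:]),    # volatility of last 3 monthly returns
        "vol12": statistics.stdev(returns[-12:]),  # volatility of last 12 monthly returns
    }


def vix_percentile(vix_levels, window=12):
    """Share of the trailing `window` VIX readings at or below the latest one."""
    recent = vix_levels[-window:]
    return sum(v <= recent[-1] for v in recent) / len(recent)
```

A percentile near 1.0 means the VIX is at the top of its recent range, which is the kind of reading a stress indicator would flag.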

Models Evaluated

Why Random Forest won: It produces a wider spread of confidence scores, which makes it better at ranking stocks, and ranking is what drives the top-10% portfolio selection. Boosting methods were too aggressive and didn't improve results.

How It Works

The Pipeline

  1. Collect Data: Download market prices (Yahoo Finance), fundamentals (SEC EDGAR filings), and VIX sentiment data
  2. Engineer Features: Extract 13 features from the data (momentum, volatility, value ratios, regime indicators)
  3. Train Model: Use 15 years of historical data with time series cross-validation to prevent overfitting
  4. Generate Predictions: The Random Forest model predicts which stocks will outperform the S&P 500 over the next month
  5. Construct Portfolio: Select the top 10% of stocks by confidence score, weight them equally, and rebalance monthly
  6. Validate Results: Backtest against the S&P 500 benchmark with real transaction costs
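Steps 5 and 6 can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the rounding rule for the 10% cut (round down, keep at least one stock), and the 10 basis point proportional cost are all hypothetical.

```python
import math


def top_decile_weights(scores, top_fraction=0.10):
    """Equal-weight the highest-scoring fraction of tickers.

    `scores` maps ticker -> model confidence that the stock will outperform.
    """
    n_select = max(1, math.floor(len(scores) * top_fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    selected = ranked[:n_select]
    return {ticker: 1.0 / n_select for ticker in selected}


def turnover(old_weights, new_weights):
    """One-way turnover: half the sum of absolute weight changes."""
    tickers = set(old_weights) | set(new_weights)
    return 0.5 * sum(
        abs(new_weights.get(t, 0.0) - old_weights.get(t, 0.0)) for t in tickers
    )


def net_return(gross_return, old_weights, new_weights, cost_per_unit=0.001):
    """Gross monthly return minus a proportional cost on traded volume."""
    traded = 2.0 * turnover(old_weights, new_weights)  # buys plus sells
    return gross_return - cost_per_unit * traded
```

Because selection depends only on the ordering of the scores, a model whose scores are well spread out gives a more stable top group than one whose scores cluster together.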

Model Validation

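The model uses expanding-window time series cross-validation: at each step it is trained only on data available up to a given month, and its predictions are evaluated on later months it has never seen. The training window then grows and the process repeats. This avoids look-ahead bias, gives an honest check against overfitting, and mimics real-world deployment.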
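A minimal sketch of an expanding-window split generator, to illustrate the idea. The function name and the default of 36 months of initial training data are assumptions, not values taken from the project.

```python
def expanding_window_splits(months, min_train=36, test_size=1):
    """Yield (train_months, test_months) pairs in chronological order.

    The training window always starts at the first month and grows by
    `test_size` each step; the test window is the block right after it.
    """
    start = min_train
    while start + test_size <= len(months):
        yield months[:start], months[start:start + test_size]
        start += test_size


# Example: six months of data, first three reserved for the initial fit.
months = [f"2020-{m:02d}" for m in range(1, 7)]
for train, test in expanding_window_splits(months, min_train=3):
    print(f"train through {train[-1]}, test on {test}")
```

Every training window ends strictly before its test window begins, so no future information reaches the model. scikit-learn's `TimeSeriesSplit` produces splits of a similar shape; whether the project uses it is not stated here.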


Quick Start

Installation

git clone https://github.com/pmatorras/financial-ml.git
cd financial-ml
python -m venv .venv
source .venv/bin/activate
pip install -e .

Run Full Pipeline

make data     # Collect market, fundamentals, sentiment
make train    # Train models
make backtest # Analyze and backtest

Quick Test

make test    # Run pipeline on subset with debug mode


For advanced usage, flags, and development modes, see the GitHub repository.
