5–15% ROR from Backtesting Intraday Low-Frequency Trading on 10 Fortune 500 Stocks Last Week.

--

A step-by-step beginner's guide to low-frequency intraday trading, with an end-to-end Python pipeline covering data acquisition, machine learning, and finally backtesting and strategy evaluation.

Data: one month of 1-minute bars from the Yahoo Finance Python API (about 6K minutes over the 9:30 am-4 pm ET trading window).
Train/test split: 0.5 without shuffling (the most recent ~3K minutes form the test set).
Feature processing: min-max scaling over an expanding window.
Feature engineering: 5 lags predicting 5 steps forward; lags cannot come from a previous day.
Trading strategy: compounding, reinvestment-based trading at a 5-minute trading frequency.

The pipeline was run manually for 10 Fortune 500 stocks, yielding a 5–15% rate of return (ROR) when backtested on the test data.

Outline

1- Data Acquisition

2- Feature Processing

3- Feature Engineering

4- Machine Learning

5- Backtesting

5.1- ML Output Processing

5.2- Extracting Test Data

5.3- Add Dummy Prediction

5.4- Trading Frequency

5.5- Trading Strategy and Backtesting

5.5.1- ML Performance

5.5.2- Ideal Performance

5.5.3- Random Performance

6- Conclusion

1- Data Acquisition

import yfinance as yf

# Define the ticker symbol
tickerSymbol = 'TSLA'

# Get data on this ticker
tickerData = yf.Ticker(tickerSymbol)

# Get the historical prices for this ticker.
# The '1m' interval gives minute-by-minute data; yfinance only serves 1m bars
# for roughly the last 30 days and at most ~7 days per request, hence one
# request per week. Adjust 'start' and 'end' as needed.
tickerDf1 = tickerData.history(start='2024-01-29', end='2024-02-02', interval='1m')
tickerDf2 = tickerData.history(start='2024-02-05', end='2024-02-09', interval='1m')
tickerDf3 = tickerData.history(start='2024-02-12', end='2024-02-19', interval='1m')
tickerDf4 = tickerData.history(start='2024-02-19', end='2024-02-23', interval='1m')

# Display the first rows of each week
print(tickerDf1[["Close"]].head())
print(tickerDf2[["Close"]].head())
print(tickerDf3[["Close"]].head())
print(tickerDf4[["Close"]].head())

# Display the last rows of each week
print(tickerDf1[["Close"]].tail())
print(tickerDf2[["Close"]].tail())
print(tickerDf3[["Close"]].tail())
print(tickerDf4[["Close"]].tail())

import pandas as pd
import numpy as np

data = pd.concat([tickerDf1["Close"], tickerDf2["Close"], tickerDf3["Close"], tickerDf4["Close"]]).reset_index()
data["Datetime"] = pd.to_datetime(data["Datetime"])
data = data.sort_values("Datetime").reset_index(drop=True)
data["time"] = data.index
data.head()
Datetime Close time
0 2024-01-29 09:30:00-05:00 185.639999 0
1 2024-01-29 09:31:00-05:00 185.098404 1
2 2024-01-29 09:32:00-05:00 184.977005 2
3 2024-01-29 09:33:00-05:00 185.017700 3
4 2024-01-29 09:34:00-05:00 185.553696 4
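
The pipeline was run manually once per stock. To repeat the same download for several symbols, a small loop like the following sketch would work; the ticker list here is illustrative, not the exact ten stocks used.

# Hypothetical helper: fetch the same four weekly windows for several tickers
weeks = [('2024-01-29', '2024-02-02'), ('2024-02-05', '2024-02-09'),
         ('2024-02-12', '2024-02-19'), ('2024-02-19', '2024-02-23')]

def fetch_minute_closes(symbol):
    ticker = yf.Ticker(symbol)
    frames = [ticker.history(start=s, end=e, interval='1m')["Close"] for s, e in weeks]
    return pd.concat(frames)

closes = {sym: fetch_minute_closes(sym) for sym in ['TSLA', 'AAPL', 'MSFT']}  # illustrative tickers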

2- Feature Processing

def normalize(window):
    # Scale the last value of the expanding window into [0, 1]
    # using the window's own min and max.
    min_val = window.min()
    max_val = window.max()
    normalized = (window - min_val) / (max_val - min_val)
    return normalized.values[-1]


def reverse_normalize(data):
    # Map the scaled series back to price space.
    # Note: this uses the full-series min/max, while the scaling itself was
    # done on expanding windows, so early rows are only approximately inverted.
    data["reverse_close_scaled"] = np.nan
    min_val = data["Close"].min()
    max_val = data["Close"].max()
    max_minus_min = max_val - min_val

    for i in range(4, data.shape[0]):
        data.loc[data.index[i], "reverse_close_scaled"] = data["close_scaled"].iloc[i] * max_minus_min + min_val
    return data


data["close_scaled"] = data["Close"].expanding(5).apply(normalize, raw=False)
data
Datetime Close time close_scaled
0 2024-01-29 09:30:00-05:00 185.639999 0 NaN
1 2024-01-29 09:31:00-05:00 185.098404 1 NaN
2 2024-01-29 09:32:00-05:00 184.977005 2 NaN
3 2024-01-29 09:33:00-05:00 185.017700 3 NaN
4 2024-01-29 09:34:00-05:00 185.553696 4 0.869827
... ... ... ... ...
6232 2024-02-22 15:55:00-05:00 197.720001 6232 0.812595
6233 2024-02-22 15:56:00-05:00 197.610199 6233 0.808640
6234 2024-02-22 15:57:00-05:00 197.470001 6234 0.803590
6235 2024-02-22 15:58:00-05:00 197.300003 6235 0.797467
6236 2024-02-22 15:59:00-05:00 197.380005 6236 0.800349
6237 rows × 4 columns


data = reverse_normalize(data)
data
Datetime Close time close_scaled reverse_close_scaled
0 2024-01-29 09:30:00-05:00 185.639999 0 NaN NaN
1 2024-01-29 09:31:00-05:00 185.098404 1 NaN NaN
2 2024-01-29 09:32:00-05:00 184.977005 2 NaN NaN
3 2024-01-29 09:33:00-05:00 185.017700 3 NaN NaN
4 2024-01-29 09:34:00-05:00 185.553696 4 0.869827 199.308970
... ... ... ... ... ...
6232 2024-02-22 15:55:00-05:00 197.720001 6232 0.812595 197.720001
6233 2024-02-22 15:56:00-05:00 197.610199 6233 0.808640 197.610199
6234 2024-02-22 15:57:00-05:00 197.470001 6234 0.803590 197.470001
6235 2024-02-22 15:58:00-05:00 197.300003 6235 0.797467 197.300003
6236 2024-02-22 15:59:00-05:00 197.380005 6236 0.800349 197.380005
6237 rows × 5 columns
data.plot("time", "close_scaled")
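
As a quick sanity check of the expanding-window scaling: each row's scaled value uses only the minimum and maximum seen so far, never future prices. A toy series (values illustrative) shows the same computation as normalize above:

import pandas as pd

s = pd.Series([10.0, 12.0, 11.0, 15.0, 13.0, 14.0])
scaled = s.expanding(5).apply(lambda w: (w.iloc[-1] - w.min()) / (w.max() - w.min()), raw=False)
print(scaled)
# Index 4: (13 - 10) / (15 - 10) = 0.6; index 5: (14 - 10) / (15 - 10) = 0.8.
# Earlier rows are NaN because the window holds fewer than 5 observations.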

3- Feature Engineering

df = data.copy()
df["target"] = df["close_scaled"]
lag_cols = {}
dist_cols = {}
lags = 5
gap = 5


for i in range(1, lags):

    # Lagged target, shifted at least `gap` steps back so the model predicts forward
    # (i runs 1..lags-1, producing lag_1..lag_4)
    lag_cols["lag_{}".format(i)] = df["target"].shift(i + gap)

    df["datetime_lag_{}".format(i)] = df["Datetime"].shift(i + gap)
    df["datetime_dist_{}".format(i)] = df["Datetime"] - df["datetime_lag_{}".format(i)]

    # Mask lags whose timestamps are too far back, i.e. from a previous trading day
    dist_cols["dist_{}".format(i)] = df["datetime_dist_{}".format(i)].apply(
        lambda x: x.seconds if x.seconds <= ((lags + gap) * 120) else np.nan)

    df = df.drop("datetime_dist_{}".format(i), axis=1)
    df = df.drop("datetime_lag_{}".format(i), axis=1)

lags_df = pd.DataFrame(lag_cols)
dist_df = pd.DataFrame(dist_cols).apply(pd.to_numeric)
dist_df_scaled = dist_df.copy()

# Scale the time-distance features into [0, 1]
for c in dist_df.columns:
    dist_df_scaled[c] = (dist_df[c] - dist_df.min().min()) / (((lags + gap) * 120) - dist_df.min().min())

df = pd.concat([df, lags_df, dist_df_scaled], axis=1)

# Interaction features: each lag value weighted by its scaled time distance
interact_cols = {}
for i in range(1, lags):
    interact_cols["interact_{}".format(i)] = df["lag_{}".format(i)] * df["dist_{}".format(i)]
inter_df = pd.DataFrame(interact_cols)
df = pd.concat([df, inter_df], axis=1)
df 
Datetime Close time close_scaled reverse_close_scaled target lag_1 lag_2 lag_3 lag_4 dist_1 dist_2 dist_3 dist_4 interact_1 interact_2 interact_3 interact_4
0 2024-01-29 09:30:00-05:00 185.639999 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2024-01-29 09:31:00-05:00 185.098404 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2024-01-29 09:32:00-05:00 184.977005 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 2024-01-29 09:33:00-05:00 185.017700 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 2024-01-29 09:34:00-05:00 185.553696 4 0.869827 199.308970 0.869827 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6232 2024-02-22 15:55:00-05:00 197.720001 6232 0.812595 197.720001 0.812595 0.800349 0.806472 0.806065 0.812595 0.0 0.071429 0.142857 0.214286 0.0 0.057605 0.115152 0.174127
6233 2024-02-22 15:56:00-05:00 197.610199 6233 0.808640 197.610199 0.808640 0.814936 0.800349 0.806472 0.806065 0.0 0.071429 0.142857 0.214286 0.0 0.057168 0.115210 0.172728
6234 2024-02-22 15:57:00-05:00 197.470001 6234 0.803590 197.470001 0.803590 0.808453 0.814936 0.800349 0.806472 0.0 0.071429 0.142857 0.214286 0.0 0.058210 0.114336 0.172815
6235 2024-02-22 15:58:00-05:00 197.300003 6235 0.797467 197.300003 0.797467 0.810073 0.808453 0.814936 0.800349 0.0 0.071429 0.142857 0.214286 0.0 0.057747 0.116419 0.171503
6236 2024-02-22 15:59:00-05:00 197.380005 6236 0.800349 197.380005 0.800349 0.809353 0.810073 0.808453 0.814936 0.0 0.071429 0.142857 0.214286 0.0 0.057862 0.115493 0.174629
6237 rows × 18 columns
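
Before dropping the incomplete rows, the lag alignment can be verified: lag_1 should equal the target shifted gap + 1 = 6 minutes back wherever both sides are defined.

# Sanity check (illustrative): the differences should be ~0 where both sides are defined
diff = (df["lag_1"] - df["close_scaled"].shift(6)).abs().max()
print(diff)  # expected ~0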
df = df.dropna()
df
Datetime Close time close_scaled reverse_close_scaled target lag_1 lag_2 lag_3 lag_4 dist_1 dist_2 dist_3 dist_4 interact_1 interact_2 interact_3 interact_4
13 2024-01-29 09:43:00-05:00 185.470001 13 0.403951 186.374660 0.403951 0.615211 0.235731 0.000000 0.869827 0.0 0.071429 0.142857 0.214286 0.0 0.016838 0.000000 0.186392
14 2024-01-29 09:44:00-05:00 184.935196 14 0.130492 178.782502 0.130492 0.965732 0.615211 0.235731 0.000000 0.0 0.071429 0.142857 0.214286 0.0 0.043944 0.033676 0.000000
15 2024-01-29 09:45:00-05:00 184.932098 15 0.128908 178.738529 0.128908 1.000000 0.965732 0.615211 0.235731 0.0 0.071429 0.142857 0.214286 0.0 0.068981 0.087887 0.050514
16 2024-01-29 09:46:00-05:00 184.927002 16 0.126302 178.666179 0.126302 0.255975 1.000000 0.965732 0.615211 0.0 0.071429 0.142857 0.214286 0.0 0.071429 0.137962 0.131831
17 2024-01-29 09:47:00-05:00 184.949997 17 0.138060 178.992619 0.138060 0.403896 0.255975 1.000000 0.965732 0.0 0.071429 0.142857 0.214286 0.0 0.018284 0.142857 0.206942
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6232 2024-02-22 15:55:00-05:00 197.720001 6232 0.812595 197.720001 0.812595 0.800349 0.806472 0.806065 0.812595 0.0 0.071429 0.142857 0.214286 0.0 0.057605 0.115152 0.174127
6233 2024-02-22 15:56:00-05:00 197.610199 6233 0.808640 197.610199 0.808640 0.814936 0.800349 0.806472 0.806065 0.0 0.071429 0.142857 0.214286 0.0 0.057168 0.115210 0.172728
6234 2024-02-22 15:57:00-05:00 197.470001 6234 0.803590 197.470001 0.803590 0.808453 0.814936 0.800349 0.806472 0.0 0.071429 0.142857 0.214286 0.0 0.058210 0.114336 0.172815
6235 2024-02-22 15:58:00-05:00 197.300003 6235 0.797467 197.300003 0.797467 0.810073 0.808453 0.814936 0.800349 0.0 0.071429 0.142857 0.214286 0.0 0.057747 0.116419 0.171503
6236 2024-02-22 15:59:00-05:00 197.380005 6236 0.800349 197.380005 0.800349 0.809353 0.810073 0.808453 0.814936 0.0 0.071429 0.142857 0.214286 0.0 0.057862 0.115493 0.174629
6089 rows × 18 columns
X = df.drop(["Datetime", "Close", "close_scaled", "target", "reverse_close_scaled", "time"], axis=1)
print(X.head())
print(X.tail())
print(X.shape)
y = df["target"]
print()
print(y.head())
print(y.shape)

lag_1 lag_2 lag_3 lag_4 dist_1 dist_2 dist_3 \
13 0.615211 0.235731 0.000000 0.869827 0.0 0.071429 0.142857
14 0.965732 0.615211 0.235731 0.000000 0.0 0.071429 0.142857
15 1.000000 0.965732 0.615211 0.235731 0.0 0.071429 0.142857
16 0.255975 1.000000 0.965732 0.615211 0.0 0.071429 0.142857
17 0.403896 0.255975 1.000000 0.965732 0.0 0.071429 0.142857

dist_4 interact_1 interact_2 interact_3 interact_4
13 0.214286 0.0 0.016838 0.000000 0.186392
14 0.214286 0.0 0.043944 0.033676 0.000000
15 0.214286 0.0 0.068981 0.087887 0.050514
16 0.214286 0.0 0.071429 0.137962 0.131831
17 0.214286 0.0 0.018284 0.142857 0.206942
lag_1 lag_2 lag_3 lag_4 dist_1 dist_2 dist_3 \
6232 0.800349 0.806472 0.806065 0.812595 0.0 0.071429 0.142857
6233 0.814936 0.800349 0.806472 0.806065 0.0 0.071429 0.142857
6234 0.808453 0.814936 0.800349 0.806472 0.0 0.071429 0.142857
6235 0.810073 0.808453 0.814936 0.800349 0.0 0.071429 0.142857
6236 0.809353 0.810073 0.808453 0.814936 0.0 0.071429 0.142857

dist_4 interact_1 interact_2 interact_3 interact_4
6232 0.214286 0.0 0.057605 0.115152 0.174127
6233 0.214286 0.0 0.057168 0.115210 0.172728
6234 0.214286 0.0 0.058210 0.114336 0.172815
6235 0.214286 0.0 0.057747 0.116419 0.171503
6236 0.214286 0.0 0.057862 0.115493 0.174629
(6089, 12)

13 0.403951
14 0.130492
15 0.128908
16 0.126302
17 0.138060
Name: target, dtype: float64
(6089,)
import matplotlib.pyplot as plt

plt.plot(range(y.shape[0]), y)

4- Machine Learning

import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

model_NN = MLPRegressor(
    hidden_layer_sizes=(250, 200, 100, 100, 100),  # five hidden layers
    activation='tanh',
    solver='sgd',                  # stochastic gradient descent
    learning_rate_init=0.01,       # initial learning rate
    learning_rate='adaptive',      # lower the rate when training stalls
    max_iter=1000,
    batch_size=64,
    alpha=0.0001,                  # L2 regularization
    early_stopping=True,           # hold out a validation fraction to stop early
    n_iter_no_change=20,           # patience for early stopping
    random_state=0
)
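
These hyperparameters were chosen by hand. A natural extension, not run here, is a time-series-aware random search using the RandomizedSearchCV and TimeSeriesSplit imports above; the parameter grid below is purely illustrative.

# Illustrative sketch: TimeSeriesSplit keeps folds in temporal order,
# so no future data leaks into the validation folds
param_dist = {
    "hidden_layer_sizes": [(100,), (250, 100), (250, 200, 100)],
    "learning_rate_init": [0.001, 0.01],
    "alpha": [1e-4, 1e-3],
}
search = RandomizedSearchCV(
    MLPRegressor(max_iter=500, early_stopping=True, random_state=0),
    param_distributions=param_dist,
    n_iter=5,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_squared_error",
    random_state=0,
)
# search.fit(X.iloc[:-3000, :], y.iloc[:-3000])  # would tune on the training half only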

c = -3000  # hold out the most recent 3000 minutes as the test set
model_NN.fit(X.iloc[:c, :], y.iloc[:c])
y_pred = model_NN.predict(X)
fig, ax = plt.subplots()

ax.plot(range(y.iloc[:c].shape[0]), y.iloc[:c])
ax.plot(range(y.iloc[:c].shape[0]), y_pred[:c], color="red")
plt.show()
Train data: actual target (blue) vs. model prediction (red)
fig, ax = plt.subplots()

ax.plot(range(y.iloc[c:].shape[0]), y.iloc[c:])
ax.plot(range(y.iloc[c:].shape[0]), y_pred[c:], color="red")
plt.show()
Test data: actual target (blue) vs. model prediction (red)
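
The plots give a visual sense of fit; a quick RMSE comparison on the same split (c = -3000) quantifies it:

# RMSE on the scaled target, train vs. test
rmse_train = np.sqrt(mean_squared_error(y.iloc[:c], y_pred[:c]))
rmse_test = np.sqrt(mean_squared_error(y.iloc[c:], y_pred[c:]))
print(f"Train RMSE: {rmse_train:.4f}, Test RMSE: {rmse_test:.4f}")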

5- Backtesting

5.1 ML Output Processing

# close_lag: the most recent price available at decision time (lag_1);
# close: the actual target; close_future_prediction: the model output
df1 = pd.DataFrame({"close_lag": X["lag_1"], "close": y, "close_future_prediction": y_pred})
df1
close_lag close close_future_prediction
13 0.615211 0.403951 0.437628
14 0.965732 0.130492 0.755104
15 1.000000 0.128908 0.881142
16 0.255975 0.126302 0.491102
17 0.403896 0.138060 0.498690
... ... ... ...
6232 0.800349 0.812595 0.786757
6233 0.814936 0.808640 0.795245
6234 0.808453 0.803590 0.791639
6235 0.810073 0.797467 0.794571
6236 0.809353 0.800349 0.792970
6089 rows × 3 columns

def reverse_normalize(df, df1):
    # Map the scaled columns back to price space using the full-series
    # min/max of Close. The forward scaling used expanding windows, so this
    # inverse is only approximate for early rows and converges for later ones.
    min_val = df["Close"].min()
    max_val = df["Close"].max()
    max_minus_min = max_val - min_val

    for col in df1.columns:
        df1[col] = df1[col] * max_minus_min + min_val
    return df1

df1 = reverse_normalize(df,df1)
df1.tail ()
close_lag close close_future_prediction
6232 197.380005 197.720001 197.002643
6233 197.785004 197.610199 197.238323
6234 197.604996 197.470001 197.138195
6235 197.649994 197.300003 197.219597
6236 197.630005 197.380005 197.175149
df1.shape
(6089, 3)

5.2 Extracting Test Data

c2 = -3000  # must match the split point c used in training, so these are exactly the test rows
df1 = df1.iloc[c2:, :]
df1
close_lag close close_future_prediction
6232 197.380005 197.720001 197.002643
6233 197.785004 197.610199 197.238323
6234 197.604996 197.470001 197.138195
6235 197.649994 197.300003 197.219597
6236 197.630005 197.380005 197.175149
df1.shape
(3000, 3)

5.3 Add Dummy Prediction for Performance Reference

df1 ["close_future_dummy_prediction"] = df1 ["close_lag"] + [np.random.randint(2000) for _ in range (df1.shape [0]) ]

5.4 Trading Frequency

trading_frequency = 5  # trade every 5 minutes

# Keep every 5th row, so the 3000 test minutes become 3000 / 5 = 600 trading decisions
df1 = df1.iloc[::trading_frequency]
df1.head()
close_lag close close_future_prediction close_future_dummy_prediction
3174 197.441387 197.500804 197.126962 1281.441387
3179 197.467793 198.486560 197.121485 1492.467793
3184 198.679534 198.286174 197.855233 723.679534
3189 198.130421 198.213585 197.769216 1187.130421
3194 198.365390 197.176207 197.836965 1971.365390
df1.tail ()
close_lag close close_future_prediction close_future_dummy_prediction
6212 197.660004 197.764999 197.242132 1082.660004
6217 197.690002 198.267593 197.184165 392.690002
6222 198.125793 197.710007 197.564658 2029.125793
6227 197.857193 197.785004 197.412925 1575.857193
6232 197.380005 197.720001 197.002643 431.380005
df1.shape
(600, 4)

5.5 Trading Strategy and Backtesting

# If the predicted future value is higher than the current price plus a threshold, buy; otherwise sell.
threshold = 0.1          # $0.10
initial_capital = 10000  # $10,000

5.5.1 ML Performance

def automatic_trading(df, signal_col, initial_capital=10000):
    # Backtest the compounding strategy. signal_col names the column used as
    # the "predicted future price": the ML prediction here, the true future
    # close in 5.5.2, or the dummy baseline in 5.5.3.
    capital = initial_capital
    shares = 0

    capital_list = [initial_capital]
    shares_list = [0]
    value_list = []

    for index, row in df.iterrows():
        if row[signal_col] > (row['close_lag'] + threshold):
            # Buy condition: invest all capital into shares
            if capital > 0:
                shares += capital / row['close_lag']
                capital = 0
        else:
            # Sell condition: convert all shares back to capital
            if shares > 0:
                capital += shares * row['close_lag']
                shares = 0
        shares_list.append(shares)
        capital_list.append(capital)

        # Mark-to-market portfolio value after this step
        value_list.append(capital + shares * row['close_lag'])

    # Compound reinvestment is implicit: capital/shares stay fully invested.
    # Final value = remaining capital + value of shares at the last price.
    final_value = capital + shares * df.iloc[-1]['close_lag']

    return final_value, capital_list, shares_list, value_list

automatic_trading(df1, "close_future_prediction")[0]
10647.997933121447
fig, ax = plt.subplots()

v = automatic_trading(df1, "close_future_prediction")[-1]
ax.plot(range(len(v)), v)
plt.show()
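
Expressed as a rate of return on the initial capital, this run lands near the low end of the 5-15% range quoted in the title:

final_value = automatic_trading(df1, "close_future_prediction")[0]
ror = (final_value - initial_capital) / initial_capital * 100
print(f"ROR: {ror:.1f}%")  # ~6.5% for this TSLA run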

5.5.2 Ideal Performance

If the future close value were known at trading time!

# Same strategy, but trading on the true future close instead of the prediction
automatic_trading(df1, "close")[0]
18256.368816967864
fig, ax = plt.subplots()
v = automatic_trading(df1, "close")[-1]
ax.plot(range(len(v)), v)
plt.show()

5.5.3 Random Performance


# Same strategy, but trading on the dummy (random-offset) prediction
automatic_trading(df1, "close_future_dummy_prediction")[0]
9996.891109188578
fig, ax = plt.subplots()

v = automatic_trading(df1, "close_future_dummy_prediction")[-1]
ax.plot(range(len(v)), v)
plt.show()

6- Conclusion

On the test data, the ML strategy grew the initial $10,000 to about $10,648, the ideal strategy reached about $18,256, and the random baseline ended at about $9,997. Normalizing out the baseline, (ML - Random) / (Ideal - Random) = (10,648 - 9,997) / (18,256 - 9,997) ≈ 7.9%, so the ML strategy captured roughly 8% of the maximum achievable performance.
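
The same calculation in code, using the final values printed above:

ml, ideal, rnd = 10648.0, 18256.4, 9996.9
capture = (ml - rnd) / (ideal - rnd)
print(f"Captured {capture:.1%} of the maximum achievable performance")  # ~7.9%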

--

Emad Ezzeldin, Sr. Data Scientist @ UnitedHealthGroup

Five years as a data scientist, with an MSc in Data Analytics from George Mason University. I enjoy experimenting with data science tools. emad.ezzeldin4@gmail.com