Quantitative Trading

A field of trading systems.

Quantitative trading is a field of trading that uses algorithms and mathematical models to generate and automate buy and sell signals. Computers can be used to find profitable trades in various ways.

These algorithms can be divided into two groups: Machine Learning algorithms and High-Frequency Trading arbitrage strategies.

Mathematicians and statisticians come up with various alphas (financial models), backtest them on historical data, and analyze the resulting performance of the strategy.

Some commonly used metrics to understand the efficiency of a strategy are the Sharpe ratio, Drawdown, Profit factor, and percentage of profitable trades. 
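As a minimal sketch of how these metrics are commonly computed, assume a list of per-period strategy returns and a list of per-trade profits and losses. The function names, the 252-period annualization factor, and the zero risk-free rate are illustrative assumptions, not the conventions of any particular platform.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough fall of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def profit_factor(trade_pnls):
    """Gross profit divided by gross loss."""
    gains = sum(p for p in trade_pnls if p > 0)
    losses = -sum(p for p in trade_pnls if p < 0)
    return math.inf if losses == 0 else gains / losses


def win_rate(trade_pnls):
    """Fraction of trades that closed with a profit."""
    return sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)
```

For example, `profit_factor([100, -50, 50, -25])` divides a gross profit of 150 by a gross loss of 75.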

To backtest a strategy, various platforms and software are used. Backtesting platforms help test simple strategies, portfolio optimization, futures strategies, etc.

This way the algorithms' hyperparameters can be tuned for maximum profit on historical data. Here are some of the most commonly used backtesting platforms.

  1. TradingView
  2. MetaTrader 5
  3. AmiBroker
  4. MetaStock
  5. NinjaTrader
  6. MATLAB
  7. Python
  8. Excel

Let us understand the difference between Machine Learning algorithms and High-Frequency strategies, and how they are used to generate profits.

Machine Learning Algorithms:

As the term suggests, the machine learns patterns in the price trend.

The researcher inputs historical data, also known as time-series data, and the machine tries to find a mathematical mapping between the date and the price of the stock. The price can be the closing, opening, low, or high; the closing price is used most frequently.

Based on what it has learned from past data, the machine tries to predict future outcomes. A machine learning model is always backtested before use. The model might need to be updated regularly, as seasonality and trend cycles may change over time.

High-Frequency Strategies:

High-frequency trading strategies use the concept of arbitrage. Let us first understand the term arbitrage before proceeding further. Arbitrage is an investment strategy that involves buying an asset on one exchange and simultaneously selling it on another where the price is slightly higher; that small price difference constitutes the profit for the investor.

Because these trades are executed at high frequency, the small profits can add up to millions for investors.
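The arithmetic behind a single arbitrage trade can be sketched as follows. The function name, the prices, and the flat per-share fee are hypothetical values chosen only for illustration.

```python
def arbitrage_profit(buy_price, sell_price, quantity, fee_per_share=0.0):
    """Net profit from buying on one exchange and selling on another."""
    gross = (sell_price - buy_price) * quantity
    fees = 2 * fee_per_share * quantity  # one buy leg and one sell leg
    return gross - fees


# Hypothetical example: a 5-cent price gap on 1,000 shares with a 1-cent fee per share per leg
profit = arbitrage_profit(100.00, 100.05, 1000, fee_per_share=0.01)
```

The sketch shows why costs matter: the fee on both legs consumes part of the price gap, so the gap has to exceed total costs for the trade to be profitable.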

Quantitative Trading: Machine Learning Algorithms

Following are the most commonly used Machine Learning Algorithms for time series analysis:

  1. Recurrent Neural Network
  2. Long Short-Term Memory

Classic Methods

  1. Multi-Layer Perceptron
  2. ARIMA
  3. Bayesian Neural Network
  4. Radial Basis Functions Neural Networks
  5. Generalized Regression Neural Network
  6. K-Nearest Neighbor
  7. CART Regression Trees
  8. Support Vector Regression
  9. Gaussian Processes

Topical Methods

  1. Convolutional Neural Network
  2. Attention Mechanism
  3. Transformer Neural Network
  4. Tree-based and boosting methods popular in Kaggle competitions: LightGBM, Decision Trees, XGBoost, AdaBoost

Challenges in Time Series Analysis:

  1. One of the major challenges from a mathematical point of view is the sample size available. For many kinds of data, several independent samples can be generated, but the price history of a particular stock is a single realized path, so the effective sample size is one.
  2. The other major challenge from a mathematical point of view is that a time series captures only a particular cycle or trend in the market and not the complete history from the very beginning.
  3. Many time series models work as a black box. Their mathematical formulation and the correlations they rely on are hard to interpret, which makes it difficult to supervise the results.

Recurrent Neural Network:

This network maintains an internal state from one time step to the next and hence can handle temporal dependence in the data.
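The idea of a state carried across time steps can be sketched with a single recurrent unit. This is an untrained toy with arbitrary weights (`w_x`, `w_h`, and `b` are assumptions for illustration), not a usable forecasting model.

```python
import math


def rnn_states(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run one recurrent unit over a sequence and return the state after each step."""
    h, states = 0.0, []
    for x in inputs:
        # The new state depends on the current input and on the previous state
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


states = rnn_states([1.0, 0.0, 0.0])
```

Although only the first input is non-zero, the later states are non-zero as well, because each state feeds into the next one.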

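Long Short-Term Memory:

LSTMs are a modified version of RNNs. They use gates to control what is kept in or dropped from an internal memory cell, which mitigates the vanishing gradient problem of plain RNNs and lets them capture long-range temporal dynamics in sequential data. This gating also helps the model avoid being dominated by an outlier (or a short-term change) in the time series data.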

The above two algorithms are among the most important, as they model the data from a temporal perspective, which often gives better results.

The time steps can also be converted into a single input vector of lagged values of one variable. This is how a Multi-Layer Perceptron is applied to time series. In some autoregression problems, an MLP outperforms an LSTM.
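Converting time steps into input vectors is usually done with a sliding window. The sketch below assumes a plain list of prices; `make_windows` and `n_lags` are illustrative names.

```python
def make_windows(series, n_lags):
    """Turn a series into (lagged-values, next-value) training pairs."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])  # the n_lags values before time t
        targets.append(series[t])            # the value to predict
    return inputs, targets


X, y = make_windows([1, 2, 3, 4, 5], n_lags=2)
# X == [[1, 2], [2, 3], [3, 4]], y == [3, 4, 5]
```

Each row of `X` can then be fed to an MLP as an ordinary fixed-length feature vector.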

ARIMA

ARIMA stands for Auto-Regressive Integrated Moving Average. The model expresses the time series as an explicit mathematical formula, which is one reason it is among the most popular time series models.

Looking at the mathematical formula of the trend, one can understand the market at a deeper level.
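To make the "formula" concrete, here is a deliberately simplified sketch of an ARIMA(1,1,0) model without intercept: the series is differenced once (the "integrated" part), and each difference is modeled as `phi` times the previous difference (the "auto-regressive" part). There is no moving-average term, and the least-squares estimate is an assumption made for brevity; statistical libraries typically use more elaborate estimation methods.

```python
def fit_arima_110(series):
    """Least-squares estimate of phi in d_t = phi * d_(t-1), where d is the first difference."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den  # raises ZeroDivisionError if all earlier differences are zero


def forecast_next(series, phi):
    """One-step forecast: last value plus phi times the last difference."""
    last_diff = series[-1] - series[-2]
    return series[-1] + phi * last_diff


prices = [0, 8, 12, 14, 15]          # differences are 8, 4, 2, 1
phi = fit_arima_110(prices)          # each difference is half the previous one
next_price = forecast_next(prices, phi)
```

The fitted coefficient is directly readable: a `phi` of 0.5 says that each move is expected to be half the size of the previous one.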

For forecasting a non-linear time series, there is a family of algorithms such as the Generalized Regression Neural Network, RBF Neural Network, Bayesian Neural Network, etc.

These algorithms draw on function approximation theory, which approximates an unknown function with basis functions such as polynomials, Fourier series, and finite elements. Support Vector Machine algorithms can also model non-linear time series.

CART Regression Trees:

CART Regression Trees can produce accurate results, and pruning helps them control common ML issues such as overfitting. However, because a tree predicts only values within the range seen in training, they cannot capture steep trends or drifts.

Random Forest and Decision Trees:

Random Forest is an ensemble method consisting of many decision trees. Each decision tree is trained on a random subset of the data and a random subset of the factors (features) considered for modeling, and the predictions of the trees are averaged. Like CART regression trees, Random Forest can handle common machine learning issues such as overfitting.

High-Frequency Trading Strategies

There are various types of arbitrage strategies, such as Latency Arbitrage, Statistical Arbitrage, and Index Arbitrage.

These arbitrage strategies rely on speed and on price inefficiencies in the market. Let us understand these strategies in detail.

What is Latency?

Latency is the delay in a network or transmission. It measures the time taken for data to travel from one point to another. A major factor determining latency is distance.
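The link between distance and latency can be sketched with a simple propagation-delay calculation. The default signal speed of roughly 200,000 km/s (about two thirds of the speed of light, a common approximation for optical fiber) is an assumption; real latency also includes switching and processing delays that this sketch ignores.

```python
def one_way_latency_ms(distance_km, signal_speed_km_per_s=200_000.0):
    """Propagation delay in milliseconds over a given distance."""
    return distance_km / signal_speed_km_per_s * 1000.0


# Hypothetical example: a server 1,000 km from the exchange
delay = one_way_latency_ms(1000)
```

Halving the distance halves the propagation delay, which is why firms pay to place servers as close to the exchange as possible.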

What is Latency Arbitrage?

Now that we have understood what latency is, let us look at how HFT firms use it to their advantage. In trading, there is a physical distance between the firm's servers and the servers of the broker or exchange, and that distance causes a time delay.

To minimize this delay, firms have spent huge amounts of money on better, faster infrastructure and reduced latency as far as possible. With lower latency, these firms are able to trade on new prices before retail investors or other non-quant investors see them.

This time lead in seeing prices helps firms profit within a fraction of a second. This type of strategy relies on the small profits gained from the time disparity between quant investors and non-quant investors.

What is Statistical Arbitrage?

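Statistical arbitrage, also known as stat arb, is a technique based on cointegrated pairs trading. Two price series are cointegrated when some combination of them (the spread) stays stable around a long-run mean even though each series wanders on its own; this is what allows mean reversion principles to be combined with hedging strategies.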

For pairs trading, stocks of similar companies (i.e., of the same sector/industry) are taken and hedged against each other. 

If one company is expected to gain market share, a quant will open a long position in that company and a short position in its competitors. Mean reversion analysis is applied over several stocks in diverse portfolios, for a short period of time, to reduce exposure.

This strategy uses a predetermined stock portfolio with the aim of keeping risk minimal.

For statistical Arbitrage, correlation values between various stocks are calculated, and highly correlated stocks are taken as pairs.
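Pair selection by correlation can be sketched as below. The tickers, the return values, and the 0.8 threshold are hypothetical; the Pearson correlation is written out by hand so the sketch needs only the standard library.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def correlated_pairs(returns_by_ticker, threshold=0.8):
    """Return (ticker_a, ticker_b, correlation) for every pair at or above the threshold."""
    pairs = []
    for a, b in combinations(sorted(returns_by_ticker), 2):
        rho = pearson(returns_by_ticker[a], returns_by_ticker[b])
        if rho >= threshold:
            pairs.append((a, b, rho))
    return pairs


# Hypothetical daily returns for three made-up tickers
returns = {
    "AAA": [0.01, 0.02, -0.01, 0.03],
    "BBB": [0.02, 0.04, -0.02, 0.06],
    "CCC": [0.03, -0.01, 0.02, -0.02],
}
pairs = correlated_pairs(returns)
```

Here only AAA and BBB qualify, since BBB's returns are exactly twice AAA's, while CCC moves against both.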

These strategies are not risk-free, hence they are combined with other high-frequency trading algorithms. This way HFT firms can take advantage of small price deviations.
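A common way to turn a price deviation between two paired stocks into a trading signal is the z-score of their spread. The sketch below is one such approach under stated assumptions: the hedge ratio of 1.0 and the entry threshold of 2.0 are illustrative defaults, and the function names are hypothetical.

```python
from statistics import mean, stdev


def spread_zscore(prices_a, prices_b, hedge_ratio=1.0):
    """Z-score of the latest spread relative to the spread's own history."""
    spread = [a - hedge_ratio * b for a, b in zip(prices_a, prices_b)]
    return (spread[-1] - mean(spread)) / stdev(spread)


def pair_signal(z, entry=2.0):
    """Bet on mean reversion when the spread is unusually wide."""
    if z > entry:
        return "short A / long B"   # A looks expensive relative to B
    if z < -entry:
        return "long A / short B"   # A looks cheap relative to B
    return "no trade"


# Hypothetical prices: A jumps on the last day while B stays flat
z = spread_zscore([10] * 9 + [20], [10] * 10)
signal = pair_signal(z)
```

Because the position is long one stock and short the other, it profits if the spread returns to its mean regardless of the overall market direction.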

Statistical arbitrage often includes CVaR portfolio optimization. CVaR stands for conditional value-at-risk. It helps measure the risk of the complete portfolio and extends value-at-risk (VaR): at a given confidence level, VaR is the loss threshold exceeded only in the worst cases, and CVaR is the average loss across those worst cases.
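A minimal historical (non-parametric) estimate of VaR and CVaR can be sketched as follows, assuming a list of past portfolio returns. The function name and the simple tail-counting rule are assumptions; other estimators and interpolation conventions exist.

```python
def historical_var_cvar(returns, confidence=0.95):
    """Return (VaR, CVaR) as positive loss fractions from historical returns."""
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    # Number of observations in the worst (1 - confidence) tail;
    # the small epsilon guards against floating-point rounding
    n_tail = max(1, int(len(losses) * (1 - confidence) + 1e-9))
    tail = losses[:n_tail]
    var = tail[-1]                # smallest loss inside the tail
    cvar = sum(tail) / len(tail)  # average loss inside the tail
    return var, cvar


# Hypothetical sample: four losing days and sixteen small gains
sample = [-0.10, -0.08, -0.05, -0.02] + [0.01] * 16
var_90, cvar_90 = historical_var_cvar(sample, confidence=0.90)
```

With 20 observations and 90% confidence the tail holds the two worst days, so VaR is the 8% loss and CVaR is the average of the 10% and 8% losses. CVaR is never smaller than VaR.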

What is Index Arbitrage?

As the term suggests, these strategies exploit price discrepancies between market indices. The same market index traded on two different exchanges will generally show a small price difference; this is where index arbitrage comes into play.

The other possibility involves two different market indices that normally trade at similar values. If the prices of the two deviate, index arbitrage can be used.

One of the fastest-emerging forms of analysis is sentiment analysis of news articles. Quants have been trying to figure out the public's reaction to various types of news articles and how it is going to affect the future trend.

What is Sentiment Analysis?

Sentiment analysis is also known as opinion mining. It uses natural language processing, text analysis, and computational linguistics to understand and predict the public's emotions.

A passage is fed to a pre-built model, and the model calculates a sentiment score. Generally, a higher sentiment score indicates a better mood or positive reaction, and a lower (possibly negative) score indicates a negative or bearish reaction.

Besides sentiment scores, these models also provide separate scores for various emotions, for a deeper understanding of the public reaction towards the market. One pre-built model for this kind of analysis is VADER (Valence Aware Dictionary and sEntiment Reasoner).

Along with the words themselves, this model takes into consideration punctuation, capitalization, degree modifiers, and conjunctions to interpret the passage.
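The general idea of a lexicon-based scorer can be illustrated with a toy example. This is not VADER and does not use its lexicon or rules: the word lists, weights, and the 1.5 capitalization boost below are made up purely to show how word valence, degree modifiers, and capitalization can each move a score.

```python
# Hypothetical mini-lexicon: word -> valence
LEXICON = {"gain": 1.0, "profit": 1.0, "strong": 0.5,
           "loss": -1.0, "weak": -0.5, "crash": -1.5}
# Hypothetical degree modifiers that scale the next word
BOOSTERS = {"very": 1.5, "slightly": 0.5}


def sentiment_score(text):
    """Sum word valences, scaled by a preceding modifier and by ALL-CAPS emphasis."""
    score, multiplier = 0.0, 1.0
    for raw in text.split():
        word = raw.strip(".,!?;:")
        key = word.lower()
        if key in BOOSTERS:
            multiplier = BOOSTERS[key]  # applies to the next word only
            continue
        if key in LEXICON:
            value = LEXICON[key] * multiplier
            if word.isupper() and len(word) > 1:
                value *= 1.5            # capitalization adds emphasis
            score += value
        multiplier = 1.0
    return score


bullish = sentiment_score("Very strong profit")
bearish = sentiment_score("Shares CRASH after weak earnings")
```

In practice one would rely on an established library rather than a hand-written lexicon; the sketch only shows why the same word can contribute different amounts depending on its context.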

To automate news reading, sentiment analysis is typically combined with web scraping.

Alternative Data Trading Strategies:

  1. Brand Value Factor: Contrary to the popular belief in investing in big brands for stable income, many investors invest in unpopular brands to earn an unpopularity premium.

  2. Google Search Strategy: Google search volume can be used as a measure of investor attention, for example to find the stocks receiving the least attention and to gauge their volatility.

  3. Newspaper Picture and Text Pessimism: Like sentiment analysis, these strategies use newspapers to gauge investors' pessimism.

  4. Technology Momentum: This strategy involves technology stocks that have the potential to innovate, which can be judged by the number of patents the company holds. These intangible assets lead to pricing inefficiencies in the market, creating opportunities for quant traders to take advantage of the deviations.

  5. Management Diversity: This strategy is similar to sentiment analysis, but instead of emotions it measures diversity in upper management. NLP is applied to sources such as annual reports and webpages to determine the diversity. The premise is that more diversity leads to better returns.

  6. Lexical Density of Filings: This strategy again uses Natural Language Processing to determine the amount and quality of information the company disseminates. The determining factors are the verbal and financial content of the company filings.

Hence, the higher the lexical score, the more likely the company is expected to perform better.

Quant traders also target obscure and small markets. What are small and obscure markets in trading, and how do quants benefit from them?

Stocks and funds in obscure and small markets tend to trade at a premium or a heavy discount, which opens up big opportunities for investors. One example is closed-end funds: pooled investment funds whose shares trade on an exchange like stocks, often at a premium or discount to their Net Asset Value.

There are two ways of managing a portfolio or fund: active and passive management. Closed-end funds are actively managed, as opposed to ETFs, which are mostly passively managed. Active management carries higher risk but can deliver higher returns over a short period of time.

Hence, active management combined with a heavy discount to Net Asset Value can lead to high returns (profits) on investment.
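The premium or discount to Net Asset Value is a simple ratio. In the sketch below the function name and the prices are hypothetical.

```python
def premium_or_discount(market_price, nav_per_share):
    """Positive result = premium to NAV, negative result = discount to NAV."""
    return (market_price - nav_per_share) / nav_per_share


# Hypothetical fund: shares trade at 9.00 while the underlying assets are worth 10.00 per share
discount = premium_or_discount(9.0, 10.0)
```

A result of -0.10 means the fund can be bought for 10% less than the value of the assets it holds.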

Take note that active management tends to provide higher returns only over a short period of time; over longer periods, passive management has been able to produce higher returns.

One of the most important parts of machine learning modeling is cross-validation. For time-series analysis too, cross-validation is very important before moving on to the analysis of financial metrics.

One of the most commonly used cross-validation methods is k-fold cross-validation. The k in k-fold represents the number of parts (folds) the dataset is divided into.

Suppose we use 5-fold cross-validation: we divide the training dataset into 5 parts, train the model on 4 of them, and validate it on the remaining part, repeating this 5 times so that each part serves once as the validation set. This helps detect overfitting and lets us evaluate the model in a more robust way.
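A minimal sketch of the standard k-fold procedure, in which each fold serves once as the validation set while the model trains on the remaining folds. The function name is illustrative, and the folds here are contiguous and unshuffled.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, validation_indices), one pair per fold."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        splits.append((train, validation))
        start = stop
    return splits


splits = k_fold_indices(10, 5)
# First fold: validate on [0, 1], train on [2, ..., 9]
```

The final score is usually the average of the k validation scores. Note that in most folds the training set contains observations that come after the validation set, which is a problem for time series.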

Let us understand another frequently used cross-validation method for backtesting HFT strategies:

Roll forward cross-validation: 

As the term suggests, this method rolls a training window forward through time, similar to a moving average. The only requirement is that the test dataset should always come after the training dataset.

Two things need to be kept in mind for cross-validation on a time series. First, one shouldn't shuffle the dataset. Second, the test dataset should always come after the training dataset: one cannot train on future data and predict the past.
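Both rules can be sketched with a fixed-size rolling window. The function name and the window sizes are illustrative assumptions; expanding-window variants, where the training set grows instead of sliding, are also common.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Return (train_indices, test_indices) pairs with each test block right after its training block."""
    splits, start = [], 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        splits.append((train, test))
        start += test_size  # roll the window forward by one test block
    return splits


wf_splits = walk_forward_splits(10, train_size=4, test_size=2)
# [([0, 1, 2, 3], [4, 5]), ([2, 3, 4, 5], [6, 7]), ([4, 5, 6, 7], [8, 9])]
```

The indices are never shuffled, and every test index is later than every training index in the same split.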

Researched and authored by Punit Manjani | LinkedIn