Jul 04, 2023

how many X variables do typical quant models use

Hey yall,

Sorry in advance if it's a stupid question, but I'm a recent grad w no experience (applying to grad programs) and I like creating predictive models for fun and practice. BUT A) my models suck and B) cuz comparing that to like RenTech's annual performance, there's definitely a correct way to do it, I'm just clueless. Since quant HFs are known for having massive IT architecture & computing power, I was wondering if anyone from the industry can tell me how many variables a firm might use for a model. Like if done manually I figure it could be up to a hundred, but maybe blackbox ML methods use +1000 variables anywhere from basic macro release data to search engine data, etc of "strange" but statistically significant variables. Because with enough computing power, I'd guess even a million of the strangest variables could be used so long as it's proven to be significant.

I can't find the answer online - HFs being pretty secretive to begin with - and I don't have any connections in the industry yet so I'm hoping you could help out. Just super curious.

Thanks!

23 Comments

activism_no_not_ESG

Most Helpful

Just a student, not in the industry, but I am familiar with a few funds and their models. When you speak of variables included in models, I assume you're referring to the factor-model approach to asset management. Essentially, when a fund (say BlackRock) has something like $20bn to manage using a quantitative strategy, what they'll generally do is build a model that first and foremost calculates the expected return of a stock. This expected return is how much they expect that stock's price to increase or decrease. They often determine this output using a multivariable linear model of variables that have been shown to do a good job of predicting a stock's return. For example, BlackRock may find that the expected return of a stock can generally be predicted by a company's leverage ratio and its earnings yield. In that case, the formula would look something like this, where x is any given stock:

expected return(x) = leverage_ratio * coefficient_1 + earnings_yield * coefficient_2

These coefficients, coefficient_1 and coefficient_2, are called factor loadings and represent how sensitive a stock's expected return is to each of these variables. Say that through a simple regression we determine that coefficient_1 and coefficient_2 are -0.005 and 4, and we take Apple, which has a leverage ratio of 4.7 and an earnings yield of 3.1%. We would then calculate the expected return of Apple over the next year as:

expected return(AAPL) = -0.005(4.7) + 4(0.031)

expected return(AAPL) = 10.05%
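As a rough check of the arithmetic above, here is a minimal Python sketch of the same two-variable linear model. The function name `expected_return` and the dictionary layout are assumptions made for illustration, not anything a particular fund uses, and the coefficients and Apple inputs are just the made-up numbers from this example.

```python
def expected_return(characteristics, coefficients):
    """Linear model: sum of characteristic * coefficient over all variables."""
    return sum(characteristics[name] * coefficients[name] for name in coefficients)

# Illustrative coefficients from the example above (not real estimates).
coefficients = {"leverage_ratio": -0.005, "earnings_yield": 4.0}

# Illustrative inputs for AAPL from the example above.
aapl = {"leverage_ratio": 4.7, "earnings_yield": 0.031}

aapl_expected = expected_return(aapl, coefficients)
print(f"expected return(AAPL) = {aapl_expected:.2%}")
```

Adding a third variable only means adding one more key to both dictionaries, which is part of why linear models scale so easily to a few dozen factors.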

The model does not have to be linear at all; this was just an example. Still, many hedge funds and asset managers find the robustness of linear models quite useful. The number of variables really depends on the approach and philosophy of the manager, but for teams that carefully research each factor, you generally see anywhere from single digits to the low or mid double digits.

Once you have all these factors, you apply your formula for expected returns to the universe of stocks you look at, whether that be just North American equities or worldwide. You then reconcile the expected return forecasts with expectations of risk (which is a whole different discussion) to create a diversified portfolio of stocks, each with a different weighting (which is also a whole different discussion). Folks in the industry, please correct me if any of this is wrong, I am still learning too :). Hope this helps.

junior2012

I think the correct answer to how many variables depends entirely on the nature of the problem you are trying to solve.

Generally the more independent (effective) data points you have, the more parameters you can reasonably fit.

Ex: Deep learning (with thousands of parameters) might work well if you are predicting minute-by-minute returns, but would never work for predicting annual returns.

A linear regression with one RHS variable might work better for predicting annual returns.
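To put rough numbers on the data-points-versus-parameters argument, here is a back-of-the-envelope sketch. The 20-year history, 252 trading days per year, and 390-minute regular US equity session are assumed conventions chosen for illustration, and `obs_per_parameter` is a hypothetical helper. Raw counts overstate the effective sample, since minute returns are far from independent.

```python
# Assumed calendar conventions, for illustration only.
YEARS_OF_HISTORY = 20
TRADING_DAYS_PER_YEAR = 252   # common approximation
MINUTES_PER_SESSION = 390     # 6.5-hour regular US equity session

annual_obs = YEARS_OF_HISTORY
minute_obs = YEARS_OF_HISTORY * TRADING_DAYS_PER_YEAR * MINUTES_PER_SESSION

def obs_per_parameter(n_obs, n_params):
    """Raw observations available per fitted parameter."""
    return n_obs / n_params

print("annual returns, 1 parameter:    ", obs_per_parameter(annual_obs, 1))
print("annual returns, 1000 parameters:", obs_per_parameter(annual_obs, 1000))
print("minute returns, 1000 parameters:", obs_per_parameter(minute_obs, 1000))
```

With annual data, a thousand-parameter model has a tiny fraction of an observation per parameter, while the same model on minute bars still has well over a thousand.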

trying_my_best

maybe blackbox ML methods use +1000 variables

I'd guess even a million of the strangest variables could be used so long as it's proven to be significant.

There's a thing called overfitting. We're not trying to predict the past, we're trying to predict the future
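A small simulation can make the overfitting point concrete. This is only a sketch on synthetic data: the target here is pure noise by construction, so no variable has any real predictive power, and the sample sizes and seed are arbitrary assumptions. It uses NumPy's `np.linalg.lstsq` for ordinary least squares.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def r_squared(X, y, beta):
    """R^2 of a fitted model on (X, y); can be negative out of sample."""
    design = np.column_stack([np.ones(len(X)), X])
    resid = y - design @ beta
    return 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))

rng = np.random.default_rng(0)
n_train, n_test, max_vars = 60, 60, 50

X_train = rng.normal(size=(n_train, max_vars))
X_test = rng.normal(size=(n_test, max_vars))
y_train = rng.normal(size=n_train)   # target is pure noise
y_test = rng.normal(size=n_test)

results = {}
for k in (2, 10, 50):
    beta = fit_ols(X_train[:, :k], y_train)
    results[k] = (
        r_squared(X_train[:, :k], y_train, beta),  # in-sample ("the past")
        r_squared(X_test[:, :k], y_test, beta),    # out-of-sample ("the future")
    )
    print(f"{k:>2} variables: in-sample R^2 = {results[k][0]:.2f}, "
          f"out-of-sample R^2 = {results[k][1]:.2f}")
```

In-sample fit can only improve as variables are added, even though every one of them is junk, while the out-of-sample fit falls apart.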

comparing that to like RenTech's annual performance, there's definitely a correct way to do it

Mind elaborating what you mean by this? Cuz there's no singular one 'correct' way to do things

VP in PE - LBOs

I took the rentech comment to mean that, by virtue of their performance, there’s evidence that a successful solution (not the only solution or necessarily always correct) can be found, hence its worth asking the question about number of variables (vs rentech having THE solution). I could be wrong though

trying_my_best

That's fair

sunny-fz

Nailed it, thank you.

sunny-fz

Thanks for asking for clarification. I misused the word "correct" - I get that it's not like there's a single discrete correct way (or at least one reasonably achievable), but some people's models and modeling are closer to flawless prediction than others, and RenTech's ways are among the best ones developed yet. It seems like RenTech's ~60% annual return is the gold standard benchmark and I'm nowhere near that, but if I improve my skills, my returns should more and more closely resemble that 60% (barring any freak anomaly events) or exceed it, which is improbable anytime soon. I figured the secret to their success is probably not something as universally done as factoring in FRED release data, so maybe they just throw a jillion variables into an HFT algo and let it run, which is doable with a supercomputer's worth of in-house computing power. But I also get that a really good backtested model with so many variables might just be overfitting, hence my question of how many variables typical HFs use, cuz right now my idea of a very successful model's range of Xs is from single digits to infinity...

trying_my_best

I get you. In that case, why not just do what's comfortable for you? And use the textbook vanilla econometrics best practices for building your models. When I was first starting out - I was overwhelmed, so I took it one step at a time.

1) Use economic intuition and fundamental research to have an initial guess on which variables would be the drivers of my target

2) Plug them in ONE AT A TIME. Use vanilla fine-tuning methods to see how each variable adds to or takes away from explanatory power and goodness of fit. There are plenty of very vanilla metrics out there: adjusted R-squared, RMSE, AIC and BIC, joint tests of coefficients against restricted models, you name it. Imo, metrics that punish overfitting and reward parsimony are more important than blind R-squared

3) Check for omitted variable bias and model error (heteroskedasticity, autocorrelation etc)
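The steps above can be sketched on synthetic data as follows. Everything here is an illustrative assumption: the variable names (`driver`, `noise_a`, `noise_b`), the simulated data, and the helper `ols_diagnostics`. AIC and BIC use the usual Gaussian form with constants dropped, and the Durbin-Watson statistic stands in for the autocorrelation check in step 3 (values near 2 suggest no first-order autocorrelation). Heteroskedasticity tests are left out for brevity.

```python
import numpy as np

def ols_diagnostics(columns, y):
    """Fit y ~ intercept + columns; return parsimony-aware fit metrics."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + list(columns))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = float(resid @ resid)
    tss = float(np.sum((y - y.mean()) ** 2))
    k = design.shape[1]  # number of parameters, including the intercept
    r2 = 1.0 - rss / tss
    return {
        "adj_r2": 1.0 - (1.0 - r2) * (n - 1) / (n - k),
        "rmse": (rss / n) ** 0.5,
        "aic": n * np.log(rss / n) + 2 * k,
        "bic": n * np.log(rss / n) + k * np.log(n),
        "dw": float(np.sum(np.diff(resid) ** 2)) / rss,
    }

rng = np.random.default_rng(1)
n = 200
candidates = {
    "driver": rng.normal(size=n),   # the one variable that truly matters
    "noise_a": rng.normal(size=n),  # irrelevant by construction
    "noise_b": rng.normal(size=n),  # irrelevant by construction
}
y = 2.0 * candidates["driver"] + rng.normal(size=n)

# Step 2: add candidates one at a time and watch the penalized metrics.
names = list(candidates)
report = []
for k in range(len(names) + 1):
    diag = ols_diagnostics([candidates[name] for name in names[:k]], y)
    report.append(diag)
    label = " + ".join(names[:k]) or "intercept only"
    print(f"{label:<28} adj R^2 = {diag['adj_r2']:.3f}  "
          f"AIC = {diag['aic']:.1f}  BIC = {diag['bic']:.1f}  DW = {diag['dw']:.2f}")
```

A variable earns its place only if the penalized metrics improve when it enters; plain R-squared rises with every added variable and so cannot make that call.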

I'd like to chip in my opinion on RenTech too - first of all, literally no one knows what they do. And secondly, they're known to be applying a next-level type of 'ghost pattern' discrete math that econ guys like us can't even comprehend - meaning that it might not be just plugging variables into multi-factor models, the type of econometric analysis that we can 'get'. I suspect that if it were something as tangible as multi-factor modelling, the best quants in the world that MIT and Stanford and other quant firms pump out each year might be able to catch on relatively quickly.

Quant is a spectrum, and I don't think RenTech lies anywhere near our side of the spectrum

PM in HF - Other

You probably will not like my answer, but it seems you are overwhelmed, and quoting RenTech's performance is a sure sign of it.

The way to start is to look at something that is proven out. Take Black-Scholes, for example: read the actual theory behind it and start to play with the variables to see what happens. ML, deep learning, neural networks, etc. are just tools in the toolkit; you need to see the actual logic behind building a strong model that can then be applied to the future.
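For anyone who wants to play with the variables as suggested, here is a minimal sketch of the Black-Scholes price of a European call on a non-dividend-paying stock, using only the Python standard library. The parameter names and the sample inputs (spot 100, strike 100, 5% rate, one year) are illustrative assumptions.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call, no dividends."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)

base_price = bs_call(spot=100.0, strike=100.0, rate=0.05, vol=0.20, maturity=1.0)

# Vary one input at a time to see how the price responds.
for vol in (0.10, 0.20, 0.30):
    price = bs_call(spot=100.0, strike=100.0, rate=0.05, vol=vol, maturity=1.0)
    print(f"vol = {vol:.0%}: call price = {price:.4f}")
```

Changing one input while holding the others fixed, as the loop does for volatility, is a quick way to build intuition for what each variable contributes.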

sunny-fz

Actually, that's incredibly helpful, thank you. You assumed correctly I am clueless where to begin - and I have a better understanding of how the tools do their thing than how to apply them to finance - so Black-Scholes sounds like a good place to start learning. Any other subjects you suggest I familiarize myself with?

activism_no_not_ESG

Hahaha Black-Scholes would definitely not be the place to start learning quant. I would start with modern portfolio theory and basic statistics. I'd recommend Active Portfolio Management by Grinold and Kahn or Asset Management by Andrew Ang.

morgantire

I was wondering if anyone from the industry can tell me how many variables a firm might use for a model.

Depends on what you call a variable. Lots of quant strategies (especially in high frequency) work with only one or two variables, like price and volume. But they'll feature engineer that into many different features, like volume-weighted-price, or the moving average price over various time periods, etc. Do you count that as 1-2 variables or as many variables?
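To illustrate how two raw inputs fan out into many features, here is a small pure-Python sketch. The toy price and volume series, the window length, and the function names are all assumptions made for illustration; real pipelines would run many windows over far more data.

```python
def moving_average(values, window):
    """Simple moving average; one value per full window."""
    return [sum(values[i - window:i]) / window
            for i in range(window, len(values) + 1)]

def rolling_vwap(prices, volumes, window):
    """Volume-weighted average price over each full window."""
    out = []
    for i in range(window, len(prices) + 1):
        p, v = prices[i - window:i], volumes[i - window:i]
        out.append(sum(pi * vi for pi, vi in zip(p, v)) / sum(v))
    return out

def simple_returns(prices):
    """One-period simple returns."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]

# Toy data: only two raw variables, price and volume.
prices = [10.0, 11.0, 12.0, 11.0, 13.0]
volumes = [100, 200, 100, 300, 100]

features = {
    "ma_3": moving_average(prices, 3),
    "vwap_3": rolling_vwap(prices, volumes, 3),
    "ret_1": simple_returns(prices),
}

for name, series in features.items():
    print(name, [round(x, 4) for x in series])
```

Repeating this over several window lengths quickly turns two raw variables into dozens of features, which is why the count depends on what you call a variable.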

sunny-fz

Ahh I see. I would count each distinct price or volume metric as a variable, so many variables. I forgot HFT primarily uses price and volume, stuff intrinsic to a security, so macroeconomic conditions and data wouldn't be factored in (or are they...?).

activism_no_not_ESG

At the HFT level, it is pretty much only volume and price because they trade on such short time horizons. For example, knowing a company's PE ratio won't help because it is a data point you often only get on a quarterly basis whereas you can get price and volume by the millisecond

coffecoffeecoffee

I don't think anyone can quote you a number of variables used. The number of variables and their quality are balanced against each other through the bias-variance tradeoff. Quant funds don't really use any blackbox models to trade on. It's essentially just linear regression. If you read Active Portfolio Management by Grinold and Kahn, it will provide a decent understanding of the framework that quant funds work under.
