Regression/ML Modeling in Commodities
I'm currently working on a Python project to build a fully automatable U.S. S&D model for crude. I'm using the EIA API to quickly pull and filter data, but I'm struggling with what data to actually use as inputs for the model. Should I use both supply and demand data as inputs, or is just inventories fine? I guess I'm struggling with what the best practice actually is. I know rolling regressions are somewhat commonplace in S&T at banks, but can any traders or analysts comment on what kind of inputs I should be using, what kind of ML model makes the most sense, key things to keep in mind when creating such a model, etc.? I don't want to create anything overly complicated; I'm just a bit lost on what sort of analysis is actually considered valuable on the trade floor. Thanks!
Start with the variable you are interested in predicting, do some EDA, and find out which of your supply-side and demand-side variables correlate with it. This will give you a sense of the linear relationships between your target and the data you have.
If you want to look at non-linear relationships in your data, then look at mutual information.
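To make the difference between the two checks concrete, here is a rough sketch on synthetic data (not EIA data). The function name `binned_mutual_information`, the bin count, and the series are all my own assumptions for illustration; the histogram estimator is only one crude way to estimate mutual information.

```python
import numpy as np

def binned_mutual_information(x, y, bins=10):
    """Crude histogram estimate of mutual information (in nats) between x and y."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities per cell
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y, shape (1, bins)
    nonzero = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

# Synthetic series, purely illustrative.
rng = np.random.default_rng(0)
n = 2000
driver = rng.normal(size=n)
linear_target = 2.0 * driver + rng.normal(scale=0.5, size=n)
nonlinear_target = driver ** 2 + rng.normal(scale=0.5, size=n)

# Pearson correlation picks up the linear link but largely misses the quadratic one.
corr_linear = np.corrcoef(driver, linear_target)[0, 1]
corr_nonlinear = np.corrcoef(driver, nonlinear_target)[0, 1]

# Mutual information still registers the quadratic dependence.
mi_nonlinear = binned_mutual_information(driver, nonlinear_target)
mi_noise = binned_mutual_information(driver, rng.normal(size=n))

print(f"corr (linear target):    {corr_linear:.2f}")
print(f"corr (quadratic target): {corr_nonlinear:.2f}")
print(f"MI (quadratic target):   {mi_nonlinear:.3f}")
print(f"MI (pure noise):         {mi_noise:.3f}")
```

The pure-noise value is there as a reference point, since binned estimates are biased slightly above zero even for independent series.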
In terms of the type of model, start with a simple linear regression using multiple variables and use this as a benchmark score once you have identified the right variables to include. Then, if you want to and have time, explore using more complex models like trees (CatBoost is a great start).
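A minimal sketch of that benchmark step, assuming made-up weekly features (`production`, `net_imports`, `refinery_runs`) and a synthetic `stock_change` target rather than real EIA series. It fits a multiple linear regression with plain NumPy least squares and scores it on a chronological hold-out; any more complex model would then be judged against the same hold-out number.

```python
import numpy as np

rng = np.random.default_rng(1)
n_weeks = 300

# Hypothetical standardized weekly features (synthetic, for illustration only).
production = rng.normal(size=n_weeks)
net_imports = rng.normal(size=n_weeks)
refinery_runs = rng.normal(size=n_weeks)

# Synthetic target: stock change rises with supply and falls with refinery demand.
stock_change = (production + net_imports - refinery_runs
                + rng.normal(scale=0.3, size=n_weeks))

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n_weeks), production, net_imports, refinery_runs])

# Chronological split: never shuffle time series before splitting.
split = int(n_weeks * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = stock_change[:split], stock_change[split:]

# Ordinary least squares fit on the training window only.
coefs, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
pred = X_test @ coefs

def rmse(actual, predicted):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

benchmark_rmse = rmse(y_test, pred)
# Naive reference: always predict the training-window mean.
naive_rmse = rmse(y_test, np.full_like(y_test, y_train.mean()))

print(f"linear benchmark RMSE: {benchmark_rmse:.3f}")
print(f"naive mean RMSE:       {naive_rmse:.3f}")
```

If a tree model cannot beat `benchmark_rmse` out of sample, the extra complexity is probably not earning its keep.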
This is how I would approach this from a data science perspective, where I know nothing of the relationships between target and data. Obviously, with experienced traders around mentoring you, there is the benefit of their input on model creation, but when doing so blindly or when working in a new domain, adopt this type of systematic approach.
Hope this helps and Merry Christmas.
Crude doesn’t really lend itself to a lot of convenient statistical modeling. There are some pockets where there are decent input-output relationships that you can model purely statistically. But much of it involves understanding how oil moves around physically, the internal logic of market participants, and modelling the supply chain by adapting what you know about the commodity. And this can also be done in a systematic way, but not necessarily an ML way.
Do you think it's more worthwhile to focus on building a model to predict each specific variable (US oil demand for example) and throwing that into a larger model to project the forward crude balance or would that be pretty worthless?
That’s generally the standard practice. Try to solve for each component. Add the components up. Then see if the totality makes sense or not. And if not, find what has to change to make the balance feasible.
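A toy sketch of that workflow, under my own assumptions: the `WeeklyBalance` class, the component list, the numbers, and the storage floor are all hypothetical and not real forecasts. Each component would come from its own sub-model; here they are just typed in. The components are summed into an implied stock change, rolled forward, and checked against a plausibility bound.

```python
from dataclasses import dataclass

@dataclass
class WeeklyBalance:
    """One week of component forecasts, all in million barrels per day (mb/d)."""
    production: float
    imports: float
    refinery_runs: float
    exports: float
    adjustment: float = 0.0  # unaccounted-for balancing item

    def implied_stock_change(self) -> float:
        """Implied weekly stock change in million barrels (mb)."""
        supply = self.production + self.imports
        demand = self.refinery_runs + self.exports
        return (supply - demand + self.adjustment) * 7

def project_stocks(start_stocks, balances):
    """Roll the implied stock changes forward from a starting inventory level."""
    path, level = [], start_stocks
    for week in balances:
        level += week.implied_stock_change()
        path.append(level)
    return path

def infeasible_weeks(path, floor, ceiling):
    """Indices of weeks where projected stocks leave the plausible range."""
    return [i for i, level in enumerate(path) if not floor <= level <= ceiling]

# Hypothetical component forecasts (illustrative numbers only).
forecasts = [
    WeeklyBalance(production=13.0, imports=6.5, refinery_runs=16.0, exports=4.0),
    WeeklyBalance(production=13.1, imports=6.0, refinery_runs=16.5, exports=4.2),
    WeeklyBalance(production=13.2, imports=6.2, refinery_runs=16.2, exports=3.8),
]

path = project_stocks(start_stocks=430.0, balances=forecasts)
flagged = infeasible_weeks(path, floor=420.0, ceiling=500.0)  # assumed bounds

print("projected stocks (mb):", [round(level, 1) for level in path])
print("weeks needing a rethink:", flagged)
```

Any flagged week is the prompt to go back and ask which component has to change for the balance to be feasible.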