Improving Multi-Class Classification with Stacking Ensembles and Feature Engineering: Need Insights
Hi everyone,
I am working on a machine learning task involving a multi-class classification problem with tabular, imbalanced data (no time series or categorical variables).
The goal is to predict class probabilities for a test set (150,000 rows x 9 classes) using models trained on the provided training data. To achieve lower log loss scores, I am exploring a multi-layered approach with stacking ensembles.
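For reference, the metric itself is easy to compute by hand. This is a minimal sketch of multi-class log loss, assuming integer labels 0..8 and one probability row per sample; the function name and the clipping constant `eps` are illustrative choices, not something from the task description.

```python
import math

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0), then renormalise so the row sums to 1.
        clipped = [min(max(p, eps), 1 - eps) for p in row]
        norm = sum(clipped)
        total += -math.log(clipped[label] / norm)
    return total / len(y_true)

# A uniform guess over 9 classes scores ln(9), a useful sanity baseline.
uniform = [[1 / 9] * 9 for _ in range(4)]
baseline = multiclass_log_loss([0, 3, 5, 8], uniform)
```

Any model that scores worse than the uniform baseline is assigning very low probability to some true classes, which usually points to overconfident predictions.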
The first layer generates meta-features (out-of-fold class probabilities) from diverse models (e.g., Random Forest, Extra Trees, KNN), while the second layer combines these predictions with a meta-learner such as LightGBM, an SVM, or a neural network.
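To make the two layers concrete, here is a rough sketch using scikit-learn, assuming synthetic data from `make_classification` in place of the real dataset and a plain logistic regression as the second layer (swapped in for LightGBM/SVM/NN to keep the example small). The key point is that the meta-features for training come from out-of-fold predictions, so the second layer never sees predictions made on rows the base model was fitted on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, imbalanced 3-class stand-in for the real data (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    n_classes=3, weights=[0.6, 0.3, 0.1], random_state=0,
)
X_train, y_train = X[:450], y[:450]
X_test, y_test = X[450:], y[450:]

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    ExtraTreesClassifier(n_estimators=50, random_state=0),
    KNeighborsClassifier(n_neighbors=15),
]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Layer 1: out-of-fold probabilities become the training meta-features.
oof = np.hstack([
    cross_val_predict(m, X_train, y_train, cv=cv, method="predict_proba")
    for m in base_models
])
# For the test set, refit each base model on all training rows.
test_meta = np.hstack([
    m.fit(X_train, y_train).predict_proba(X_test) for m in base_models
])

# Layer 2: a simple meta-learner on the stacked probabilities.
meta = LogisticRegression(max_iter=1000)
meta.fit(oof, y_train)
stacked = meta.predict_proba(test_meta)
score = log_loss(y_test, stacked, labels=[0, 1, 2])
```

scikit-learn also ships `StackingClassifier`, which wraps the same idea; the manual version is shown because it makes it easier to add or cache individual first-layer models.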
I am also experimenting with feature engineering (e.g., clustering, distance metrics, and embedding-based methods like UMAP and t-SNE) and with Bayesian optimization for hyperparameter search. Given the class imbalance, I am considering resampling techniques or class-weight adjustments.
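As an illustration of the clustering/distance features and the class-weight option, here is a small sketch, again assuming synthetic data and scikit-learn. It uses KMeans distances to cluster centres as extra columns; the cluster count of 8 is an arbitrary placeholder, not a tuned value.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Synthetic, imbalanced 3-class stand-in for the real data (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    n_classes=3, weights=[0.6, 0.3, 0.1], random_state=0,
)
X_train, y_train = X[:450], y[:450]
X_test = X[450:]

# Fit clusters on training rows only, then append each row's distance
# to every cluster centre as new features.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X_train)
X_train_aug = np.hstack([X_train, kmeans.transform(X_train)])
X_test_aug = np.hstack([X_test, kmeans.transform(X_test)])

# "balanced" weights are n_samples / (n_classes * class_count).
class_weights = compute_class_weight(
    class_weight="balanced", classes=np.unique(y_train), y=y_train
)

# The same weighting can be passed straight to many estimators.
clf = RandomForestClassifier(
    n_estimators=50, class_weight="balanced", random_state=0
)
clf.fit(X_train_aug, y_train)
probs = clf.predict_proba(X_test_aug)
```

Two caveats worth checking on the real data: scikit-learn's `TSNE` only offers `fit_transform`, so it cannot embed unseen test rows the way `KMeans.transform` can, and class weighting or resampling shifts the predicted probabilities, so it may hurt log loss unless the outputs are recalibrated afterwards.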
Any suggestions or insights to refine this pipeline and improve model performance would be greatly appreciated.
It looks like this may be out of my ability to answer... maybe some of the links below might help?
If you're looking for more specific advice, consider diving into machine learning-focused communities or resources.