My data set has about 40 features and 500,000 instances, and it is sparse. I want to fit an SVM model to it, which means I first need to scale the data. However, if the data contains many outliers, scaling is unlikely to work well, because a few extreme values would determine the scale for everything else. So my question is: how can I find the outliers in this data?
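To make the concern concrete, here is a minimal sketch of what I mean. It assumes max-abs scaling (dividing each feature by its largest absolute value, which keeps zeros at zero and so preserves sparsity); the helper `max_abs_scale` and the toy values are made up for illustration and are not my real data.

```python
import numpy as np

def max_abs_scale(column):
    """Divide by the largest absolute value; zeros stay zero, so sparsity is kept."""
    peak = np.max(np.abs(column))
    return column / peak if peak > 0 else column

# Hypothetical sparse feature: mostly zeros, a few ordinary values.
feature = np.array([0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 3.0, 0.0])
# The same feature with one extreme value appended.
with_outlier = np.append(feature, 1000.0)

scaled_clean = max_abs_scale(feature)
scaled_outlier = max_abs_scale(with_outlier)

# Largest ordinary value after scaling, without and with the outlier.
print(scaled_clean.max())
print(scaled_outlier[:8].max())
```

Without the outlier the ordinary values spread over the full range up to 1; with it, they are all squeezed close to zero, which is why I would like to identify the outliers before scaling.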