I’m using the Adult dataset that can be found here: http://archive.ics.uci.edu/ml/datasets/Adult
After taking a sample of the dataset, I use the svm function from the e1071 package to estimate the accuracy of a linear-kernel classifier.
adult.df = read.csv("sample_adult.csv")
adult.df$X = NULL                   # drop the extra index column
Income = adult.df$Income...50k      # pull the label out into its own vector
summary(svm(formula = factor(Income) ~ ., data = adult.df,
            type = "C-classification", cost = 1, kernel = "linear", cross = 10))
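For reference, here are a couple of quick sanity checks on the sample after loading (Income...50k is presumably the mangled name of the label column in my CSV; output omitted):

str(adult.df)                    # column types, including the Income...50k label
table(adult.df$Income...50k)     # class balance of <=50K vs. >50K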
This returns:
Number of Classes: 2
Levels:
<=50K >50K
10-fold cross-validation on training data:
Total Accuracy: 100
Single Accuracies:
100 100 100 100 100 100 100 100 100 100
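(As a side note, the same cross-validation numbers can be read off the fitted object directly, via the accuracies and tot.accuracy components that e1071 documents for cross-validated fits:

model = svm(formula = factor(Income) ~ ., data = adult.df,
            type = "C-classification", cost = 1, kernel = "linear", cross = 10)
model$tot.accuracy   # total 10-fold cross-validation accuracy
model$accuracies     # the ten per-fold accuracies
)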
However, I’ve also implemented a holdout method for estimating the accuracy (this is rather tailored to the dataset):
holdout <- function(data, params) {
  # shuffle the rows of the dataset
  data <- data[sample(1:nrow(data)), ]
  # 50/50 split: train on the first half, test on the second half
  training.set = data[1:(nrow(data)/2), ]
  t.income = training.set$Income
  testing.set = data[(nrow(data)/2 + 1):nrow(data), ]
  # train a model on the training set
  model = NULL
  if (is.null(params$degree)) {
    model = svm(formula = t.income ~ ., data = training.set,
                type = params$type, cost = params$cost,
                kernel = params$kernel, cross = 10)
  } else {
    model = svm(formula = t.income ~ ., data = training.set,
                type = params$type, cost = params$cost,
                kernel = params$kernel, degree = params$degree, cross = 10)
  }
  print(summary(model))
  # classify each point in the testing set and count the misclassifications
  wrong = 0
  for (i in 1:nrow(testing.set)) {
    prediction = predict(model, testing.set[i, ])
    if (prediction != training.set[i, length(training.set)]) {
      wrong = wrong + 1
    }
  }
  # return the error rate on the held-out half
  return(wrong / nrow(testing.set))
}
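To be explicit, the loop at the end is meant to compute the misclassification rate on the held-out half, i.e. essentially this one-liner (assuming the label column is named Income...50k as above):

mean(predict(model, testing.set) != testing.set$Income...50k)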
If I run the holdout with the same SVM parameters:
> holdout(adult.df, list(type = "C-classification", cost = 1, kernel = "linear"))
...
10-fold cross-validation on training data:
Total Accuracy: 100
Single Accuracies:
100 100 100 100 100 100 100 100 100 100
[1] 0.39
As you can see, the two estimates are entirely different: cross-validation reports 100% accuracy, while the holdout reports a 39% error rate. I think my holdout code is correct, so I suspect the problem is in the way I’m calling svm. Any help would be appreciated.
Thank you!