
The classical solver used to train kernel SVMs, implemented in libsvm [1], has a time complexity between O(n^2) and O(n^3), where n is the number of labeled samples in the training set. In practice, training a non-linear kernel SVM becomes intractable as soon as the training set is larger than a few tens of thousands of labeled samples.
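One way to see where the scaling wall comes from, even before the O(n^2)-O(n^3) solver runs: the kernel trick materializes (implicitly or explicitly) an n x n Gram matrix. A small illustrative sketch (the `rbf_gram` helper and the random data are my own, not from libsvm):

```python
import numpy as np

def rbf_gram(X, gamma=0.1):
    """Full RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.clip(d2, 0.0, None))  # clip guards tiny negatives

rng = np.random.default_rng(0)
for n in (500, 2000):
    K = rbf_gram(rng.standard_normal((n, 2)))
    # Memory alone grows quadratically in n; 4x the samples -> 16x the matrix.
    print(n, K.shape, f"{K.nbytes / 1e6:.1f} MB")
```

At n = 100,000 the same matrix would already need ~80 GB, which is why the quadratic regime bites long before the cubic one.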

Deep Neural Networks trained with variants of Stochastic Gradient Descent, on the other hand, have no problem scaling to training sets with millions of labeled samples, which makes them suitable for large industrial-scale problems (e.g. speech recognition on mobile phones, or computer vision to help moderate photos posted on social networks).
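The reason SGD scales is that the per-step cost depends on the minibatch size, not on n, so you can stream over millions of samples. A minimal sketch for logistic regression (the `sgd_epoch` helper, learning rate, and batch size are illustrative choices, not from any particular library):

```python
import numpy as np

def sgd_epoch(w, X, y, lr=0.1, batch=32):
    """One epoch of minibatch SGD on the logistic loss.

    Each step touches only `batch` rows, so per-step cost is O(batch * dim)
    regardless of how large the full training set is.
    """
    idx = np.random.default_rng(0).permutation(len(X))
    for start in range(0, len(X), batch):
        b = idx[start:start + batch]
        p = 1.0 / (1.0 + np.exp(-X[b] @ w))        # sigmoid predictions
        w = w - lr * X[b].T @ (p - y[b]) / len(b)  # average gradient step
    return w

# Toy linearly separable data to exercise the loop.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w = np.zeros(3)
for _ in range(5):
    w = sgd_epoch(w, X, y)
acc = np.mean((X @ w > 0) == (y > 0.5))
print(f"train accuracy: {acc:.2f}")
```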

SVMs can be useful in the small-training-set regime (fewer than 10,000 training examples). But for that class of problems, it's also perfectly reasonable to train an equally powerful fully connected neural network with 1 or 2 hidden layers on a single CPU (with 2 or 4 cores) backed by a good linear algebra library such as OpenBLAS or MKL. Hyper-parameter tuning can be easier for beginners with SVMs and their default kernels (e.g. RBF or polynomial), but with modern optimizers like Adam, implemented in well-designed and well-documented high-level libraries like Keras, it has become very easy to train neural networks that just work.
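To illustrate how little code a small fully connected network takes these days, here is a sketch using scikit-learn's MLPClassifier with the Adam solver (I'm using scikit-learn rather than Keras to keep the example self-contained; the dataset, layer size, and iteration count are arbitrary choices):

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# A small non-linearly-separable toy problem, well inside the <10k regime.
X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)

# One hidden layer, Adam optimizer: the "just works" setup for small tabular data.
clf = MLPClassifier(hidden_layer_sizes=(32,), solver="adam",
                    max_iter=1000, random_state=0)
clf.fit(X, y)
print(f"train accuracy: {clf.score(X, y):.2f}")
```

On a problem this size the whole thing trains in well under a second on one CPU core.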

Also, for many small- to medium-scale problems that are not signal-style problems [2], Random Forests and Gradient Boosted Trees tend to perform better than SVMs. Most Kaggle competitions are won with either linear models (e.g. logistic regression), Gradient Boosting, neural networks, or a mix of those. Very few competitors have used kernel-based SVMs in a winning entry, AFAIK.
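A quick way to run that comparison yourself on tabular data with scikit-learn (the synthetic dataset and default-ish hyper-parameters below are illustrative; on real problems you'd tune each model before comparing):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A generic tabular (non-signal) classification problem.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    "rbf svm": SVC(kernel="rbf"),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Which model wins depends on the dataset; the point is only that the tree ensembles are competitive out of the box on tabular problems.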

[1] https://en.wikipedia.org/wiki/Sequential_minimal_optimizatio...

[2] By "signal-style" I mean problems such as image or audio processing.


