
Took me a while to get back to you, but essentially dplyr is fantastic for readability and reproducibility. Reading through someone else's analysis, or even my own long after the fact, is far easier with dplyr than it typically is with base R, data.table, or pandas.

data.table's advantage lies in its speed. It is by far the fastest of the three options. In just about every benchmark it is either significantly faster than pandas or, at the very least, roughly equal.

Pandas is lauded by people who use only Python, and it really is fantastic considering how ridiculous data manipulation would be in Python without it. But it's also the only option a Python user really has, so they've become married to the idea that it is the best.

Basically, if you are using Python, use pandas. If you have a choice, go for data.table for speed, dplyr for clarity, or a mix of the two if desired.


