
I think many of the gotchas and annoying parts of base R are solved by using tools from the tidyverse: http://github.com/tidyverse. For example, the pain of needing to specify `stringsAsFactors=FALSE` is solved in the tibble package by setting a sensible default.

At any rate, at least it's not Pandas and matplotlib...



> At any rate, at least it's not Pandas and matplotlib...

There's a certain yin and yang to the space, isn't there? You get to choose between a hacky language that has pretty good tooling built on top of it, or a pretty good language with hacky tooling built on top of it.

I think that Python is probably winning because being a decent language gives you a decent escape hatch, whereas no amount of great libraries can save you from having to go through the bizarro language.

That said, R may be bizarro, but at least, once you learn it, it's predictable. Whereas I'm not sure even Pandas really knows whether a given call to .loc will copy or refer to the original data.
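
To make the `.loc` point concrete, here is a minimal sketch (the frame and the column names `a` and `b` are made up for illustration). It doesn't settle when Pandas copies on its own; it just shows two patterns that, as far as I know, behave the same way regardless: take an explicit `.copy()` when you want an independent subset, and assign through a single `.loc` call on the original frame when you want to modify it.

```python
import pandas as pd

# Hypothetical toy data, not from the thread.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Select rows with a boolean mask, then copy explicitly so the result
# is independent of df no matter what .loc would otherwise return.
subset = df.loc[df["a"] > 1].copy()
subset["b"] = 0  # changes only the copy

# To change the original, assign through one .loc call on df itself.
df.loc[df["a"] > 1, "b"] = -1
```

After this runs, `df["b"]` holds 10, -1, -1 and `subset["b"]` holds 0, 0.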


Pandas has a lot of defensive copying. I love Pandas, but I think it demands a lot from the user. When I started using it I was new to traditional coding (i.e. I knew nothing about data structures) and had come from R. Over time, as I've learned a real amount about legitimate data structures, I've become far, far better with Pandas.


What data structures do you recommend learning about?


I literally did not understand the fundamentals when I started Python (e.g. dictionaries, tuples).


I can't stand the non-standard evaluation of the tidyverse. It works great for writing one-off scripts, but as soon as you start trying to put it into functions or your own package it's just not worth the pain of quosures and the tidyeval nonsense that changes every 6 months.


I used to feel similarly, but I think it's much more stable than it was even a year ago; `!!`, `enquo`, and `:=` are good enough for the vast majority of users who want to write their own NSE functions now.


I hate it as well. And then you get people going on about "functional style" while throwing referential transparency out the window.


That particular issue can also be solved by setting options(stringsAsFactors = FALSE). Anyway, the default behaviour was the sensible thing to do at the time: https://simplystatistics.org/2015/07/24/stringsasfactors-an-...


Beware of adding that to your .Rprofile though. The author of the page linked below makes a good argument for why that's a bad idea: your code will execute differently on someone else's machine.

http://arrgh.tim-smith.us/factors.html#fn:options


That’s true. It may create more problems than it solves. I don’t change the default value myself; setting stringsAsFactors=FALSE when required is good enough for me.



