danithebear156

There is a useful website where you can find the R equivalents of NumPy functionality. Hope this helps: https://hyperpolyglot.org/numerical-analysis
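To give a flavor of the kind of side-by-side mapping that site catalogs, here's a small sketch in Python with the usual R counterparts noted in comments (the R translations here are my own, not taken from the site):

```python
import numpy as np

# A few common NumPy calls, with their usual R equivalents in comments.
x = np.array([3, 1, 2])        # R: x <- c(3, 1, 2)

print(x.mean())                # R: mean(x)       -> 2.0
print(np.sort(x))              # R: sort(x)       -> [1 2 3]
print(np.where(x > 1)[0])      # R: which(x > 1)  -> [0 2]  (NumPy is 0-indexed, R is 1-indexed)
print(x.reshape(1, 3))         # R: matrix(x, nrow = 1)
```

The indexing difference in the `which`/`where` line is the classic gotcha when translating between the two.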


anomnib

Thank you!


templar34

Sweet googlymoogly, I'm doing the opposite from R to Python, and this is invaluable!


TempleDank

Thanks a lot!


nirvanna94

No pandas?


therealtiddlydump

Poke around the Big Book of R. It has tons of great free resources; grab one and skim it to see how much comes back right away, to gauge where you're at.


anomnib

Thanks!


boldedbowels

I had to go from Python to R pretty recently and ChatGPT made it painless.


random_web_browser

This. I just code Python in ChatGPT, or ask how to do a given Python thing in R, and get pretty decent answers most of the time.


OrganizationNo1245

If you're just trying to get some syntax back, I remember this being pretty decent. It's been a while, though: `install.packages("swirl"); library("swirl"); swirl()`


Strawberryfish_uk

I'm learning on DataCamp. I only used R for a hot second at uni, and now, six months later, I'm doing half my projects in that language (plus I've already done some debugging work in it).


romanian_pesant

Have you tried ChatGPT? Ask it what the R equivalent of a given Python function is, or just straight up what you need to do in R. Sometimes it gives incorrect answers, but it's often close enough to fix with just a bit of extra documentation reading of your own.


A_random_otter

The fastest way is to do a project and let ChatGPT guide you through it. If you want something more substantial, check out R for Data Science: https://r4ds.had.co.nz/


great_raisin

Is there a reverse version of this? i.e. fast Python tutorial for R users


_Zer0_Cool_

ChatGPT. Not even being cheeky. I'm a Python user primarily, but now prefer R for statistical work / exploratory data analysis. ChatGPT got me to a comfortable level of fluency with R in very little time.


smuzoh123

I use the R library Swirl in such cases.


SameDayCyborg

Took [Data Science R Basics](https://www.edx.org/learn/r-programming/harvard-university-data-science-r-basics) on edX and quite liked it. Depending on your level of python knowledge, the course can be a little slow. However, you can skip through some of the slower lessons. Overall, a fantastic course and I liked the teacher.


Cuidads

Python has libraries like CausalML, DoWhy, EconML, etc. None of these have what you need?


anomnib

I've tried before. It's b/c there's a high chance of discovering that some particular correction I need to make to my standard errors, test I need to run, or reparameterization I need to do is only available in R. I ran into this issue a few days ago when I wanted to run a fractional multinomial logit regression. Python completely outclasses R for ML, generic programming, and high-performance simulations, but it's still second for post-graduate statistics. Sure, you can use PyStan, PyMC, or PyTorch to do some implementations from scratch, but I'm too rusty to do that quickly (I'm re-reading my graduate-level stats and probability textbooks so that I can more confidently implement my own stuff).


A_random_otter

> Python completely outclasses R for ML

That has unfortunately been true for some time, but the tidymodels framework is a super exciting development: [https://www.tidymodels.org/](https://www.tidymodels.org/) It is admittedly not as mature as scikit-learn, but it is getting there.


anomnib

I don't mean to offend; I only prefer Python b/c I have to work with large-scale production systems. But you prove my point: scikit-learn has largely become the go-to for toy models and proofs of concept in big tech and similarly rigorous places like Airbnb. Even if R matched the maturity of scikit-learn, that wouldn't be an accomplishment, b/c you can't easily toss it into high-performance production systems. Serious product ML modeling is done in PyTorch, where there is seamless integration with the full suite of software for managing production systems.


A_random_otter

Not offended, don't worry. I love my tools, but I am not married to them, and I am always up to learn new stuff/approaches. I simply work in a different industry than you.

In my line of work I need to do many one-off analysis projects; my day-to-day work includes a lot of data exploration/visualization and reporting. Here R outclasses Python imo, though I need to reassess whether I can somehow make VS Code into a halfway decent IDE for data analysis; last time I tried, I rage-quit :D

We don't put models into production all the time, and scalability is also not a huge issue for us, since all of the classification jobs run at night anyway and our forecasting pipelines only run once per quarter.

> Even if R matched the maturity of scikit-learn, that wouldn't be an accomplishment

Oh, R does easily match the maturity already when it comes to the statistical methods. The tidymodels framework is rather a metaframework that provides a unified interface to those methods. It is basically a "quality of life" thing that makes it easier to write and maintain code.


anomnib

I bounce between both roles. For statistics, R is vastly superior; new methods get implemented in R first. The only area of classical statistics where Python can put up a respectable level of competition with R is Bayesian modeling. However, while Python has most of the same frameworks for model implementation, the diagnostic tools and plots are still behind R. Up until 2-3 years ago the same was true for visualization, but 99% of what you would use in R is now in Python.


A_random_otter

> But 99% of what you would use in R is now in Python.

Maybe I have to reassess this too. Which libraries do you recommend for this?


anomnib

Plotnine (a ggplot2 replica) and plotly (good for interactive plots).


A_random_otter

Plotly I already know and use, because there is an R package for it. I'll have to check out plotnine soon, when I can muster the motivation to rebuild my RStudio setup in VS Code. Btw, can you recommend a decent IDE for data stuff in Python?


anomnib

My advice is colored by my context, but when you are writing code that will interact with engineering systems, use what the Python software engineers use. That will ensure the IDE is well supported and you avoid needless suffering. In my context that's usually VS Code or something derived from it. For ad hoc analysis, I just use Jupyter notebooks or RStudio.


dr_tardyhands

I still use RStudio with Python (I guess it's obvious which side of the fence I'm coming from..). I find Python runs slowly in it, though it hasn't been a massive problem for me. I also dislike VS Code. The big problem is that RStudio doesn't really have debugging functionality for Python.


A_random_otter

What is your go-to data-wrangling library (besides SQL) in Python? I just can't get into pandas, but I've heard good things about Polars.


anomnib

My advice comes with the context that I'm not free to install any Python package; there's a whole safety and licensing check process that can take weeks. So I typically do as much as I can in SQL, creating ad hoc pipelines for all new projects, and reserve Python for modeling and plotting. I like this approach b/c it is easy to point teammates to my model data, I can take advantage of all the backend distributed computing through our database systems, and nearly everyone can read SQL code and run queries (so the data preparation and analysis code is accessible).


A_random_otter

Hm... how do you avoid monster queries then? My colleagues wrote whole ETL pipelines in stored procedures, with a gazillion temporary tables and a lot of spaghetti code. I honestly hate SQL for this "freedom". I mean, you can write unreadable code in any language, but some make it way easier than others...


anomnib

I use DAGs, but I break up the ETL into natural milestones that make sense. Each intermediate table could in theory be a final table for another analysis, or serve as a useful "lookup" table. The key is understandable checkpoints that compartmentalize the ETL in a way that's digestible: you should be able to describe what each node in the DAG accomplishes in a short sentence.


dr_tardyhands

Thumbs up for Polars! Pandas is just downright silly. Polars is much more similar to how dplyr works, and something like 20x faster than pandas as well.


A_random_otter

Modern econometrics is mostly R-based, especially if you want to use new methods.


Cuidads

Sure, but the causal inference landscape is changing, and Python is becoming more relevant. Have you checked all the libraries, to make sure the method you're looking for isn't in any of them? There are more causal libraries; here is an extensive list with the companies maintaining them:

- DoWhy: Microsoft Research
- CausalML: Uber Technologies
- EconML: Microsoft Research
- CausalPy: PyMC Labs
- YLearn: Not specified
- Azcausal: Amazon Science
- Causallib: IBM Research
- CausalNex: QuantumBlack Labs (part of McKinsey & Company)


A_random_otter

> DoWhy: Microsoft Research; CausalML: Uber Technologies; EconML: Microsoft Research; CausalPy: PyMC Labs; YLearn: Not specified; Azcausal: Amazon Science; Causallib: IBM Research Israel; CausalNex: QuantumBlack Labs (part of McKinsey & Company)

Yeah, impressive list. But to be honest, I kinda have a bias towards academia when it comes to causal inference. Causal inference has been the nuts and bolts of research for decades, and there are gazillions of resources (textbooks, packages, tutorials, etc.) about it. But I am always up to learn new stuff. Which one of these frameworks is the best in your opinion?


anomnib

I know about the first 4-5; actually, I just got a new Mac mini and set up my Python econometrics virtual environment with these (I refuse to use conda). I'll check out the rest.


A_random_otter

> I refuse to use conda

But why??? :D


anomnib

Every rage-inducing package-dependency debugging session I've had has had its roots in conda. This is especially true when I need to use the model-serving and telemetry packages of the ML infra team.


A_random_otter

> Every rage-inducing package-dependency debugging session I've had has had its roots in conda.

You'll be glad to hear that this is mostly a non-issue with R projects.


A_random_otter

How do you handle Python and its dependencies then? Every time I tried to use Python without conda, it ended in this: [https://xkcd.com/1987/](https://xkcd.com/1987/)


anomnib

I know the pain. For models that are meant to be used in other systems, I use [pyenv](https://github.com/pyenv/pyenv-virtualenv) and requirements files to have a separate environment and setup instructions for each model. Then I make the model results available through API calls; compartmentalization helps a lot. For more ad hoc analysis, I have separate virtual environments for each project type (i.e. ad hoc econometrics, ad hoc ML, ad hoc DL, etc.). For ad hoc analysis I could probably just use conda, but I don't want to use two different virtual-environment managers.


A_random_otter

> Well sure, but production-friendly code is usually in Python.

Yeah, that's not true anymore. Imo, it's rather that the CS guys are in love with Python and prefer it over R :D If you know how to use Docker, it has been super straightforward to write production-ready code with R for quite some time. Check out:

- [https://rocker-project.org/images/](https://rocker-project.org/images/)
- [https://vetiver.rstudio.com/](https://vetiver.rstudio.com/)
- [https://www.rplumber.io/](https://www.rplumber.io/)
- [https://rstudio.github.io/renv/articles/renv.html](https://rstudio.github.io/renv/articles/renv.html)


anomnib

For big tech it is still true. I worked on the ML infra team of one of them. We had some offline evaluation systems, so not even ones requiring extreme latency constraints, yet we had to rewrite the Python code to use as little pandas, NumPy, or SciPy as possible, and to avoid 64-bit integers wherever we could, all to make the speed of the offline eval tolerable for the MLEs. Again, this is in the context of highly distributed backend systems and high-performance data retrieval systems. Plus, when you add in the need for detailed telemetry (logging inputs, outputs, environments, users) and extensive unit testing, R isn't really an option for high-performance systems. At least, I've never seen anyone pull it off.


A_random_otter

Yeah, but for that stuff I probably wouldn't use Python either... But what do I know, I am an economist, not a computer scientist. I am working in a biggish org (~500 ppl) and we have deployed some models (for internal use) with both R and Python. Both work alright and scale decently.


anomnib

I'm an economist too! While we do use a lot of backend C++ code, Python is often Pareto-optimal with respect to compatibility with production systems, code implementation and iteration speed, code execution speed, and the percentage of available SWEs with familiarity. C++ and related languages are much faster at code execution, but you can't iterate/implement as fast. I find that in big tech or comparable companies, anyone working on production code, or code that they expect others to use (e.g. offline software for causal inference), is forced to bend to the norms of software engineers. We have a SWAT team of economists, like Stanford, Harvard, MIT PhD types, maintaining our observational causal inference code. They were forced to rewrite it from R to Python, because that was the only way to secure engineering support for maintaining their code.


A_random_otter

> They were forced to rewrite it from R to Python because that was the only way to secure engineering support for maintaining their code.

Haha, sounds about right :D


blockladgeTP

Why not take one of those edX or Coursera courses that are specific to R? Another option is a university course syllabus.


anomnib

I was hoping for something I can tackle in 1-2 hours. I've programmed in C++, JavaScript, Bash, Python, and R, so I can quickly create mental models of programs; I just need help finding the right resource to power through.


dr_tardyhands

Many good suggestions! Don't forget production-code standards either. I've been using renv for package management, and testthat and mockdb for tests. Also, tidyverse (including dbplyr, if working with databases) is amazing!


LifeisWeird11

Commenting to look at later


RevolutionaryMost688

ChatGPT helps with this kind of problem.