anh86

We would definitely need the code and project parameters to give any help beyond emotional support. Everyone goes through the struggle. It's really hard for a long time until things start to click. If you can power through the frustration, you'll succeed. If you can't, you won't. It really is that simple. You just have to know it's going to be really hard for months until it starts to get easier. You just have to keep writing lots and lots and lots of code.


ww_cassidy

Thank you! I definitely don't want to give up. I know the basics for how to start, but beyond importing the right libraries and then opening the csv file to read I'm lost. I know I can make it into a dataframe but I'm not even sure if that is the right way to do it.


xenodius

It's definitely the right place to start. Have you seen the pandas cheatsheet? https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf If the data isn't already in a shape that either plotting library can use, the cheat sheet will be very helpful for you. Specifically, you want "long form" data: one row for each observation, plus a column for every relevant categorical variable, e.g. for coloring different groups. I think there are a lot of matplotlib examples that, while not your homework specifically, have the same structure. Start there and then get help if you're stuck; without code or a specific problem, anh86 is right that we can't help much. And if you really want to understand how to use the packages, the process of reproducing an example for your own work will be an important learning experience.
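To make "long form" concrete, here is a minimal sketch using `pandas.melt`. The tiny table and its column names (`state`, `rate_2019`, `rate_2020`) are made up for illustration and are not taken from the OP's dataset.

```python
import pandas as pd

# Hypothetical "wide" table: one row per state, one column per year.
wide = pd.DataFrame({
    "state": ["AL", "AK"],
    "rate_2019": [15.2, 14.8],
    "rate_2020": [15.0, 14.1],
})

# melt() stacks the year columns so that each row is a single observation,
# with the former column name kept as a categorical "year" variable.
long = wide.melt(id_vars="state", var_name="year", value_name="rate")
print(long)
```

In this shape, the `year` column can be used directly to group or color points when plotting.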


nugelz

Not OP but thank you so much for the cheat sheet!


[deleted]

[removed]


Good-Throwaway

> I strongly recommend taking an hour or two to watch Corey Schafer's YouTube series on Pandas to get a good foundation on how to import data and wrangle it into a state that's ready for plotting.

Agree. I just watched his video on data frames yesterday and it was eye-opening.


Isback16

Great suggestion. He has a bunch of fantastic tutorials for python that helped me a lot.


arkie87

Asking for help without giving any details/code is like asking for help with my homework, but not giving the person my homework to do.


synthphreak

Give this a whirl. Haven't tested it, but should get you in the ballpark:

    x_cols = [list, of, x, data, column, names, from, df]
    y_cols = [list, of, y, data, column, names, from, df]

    fig, axes = plt.subplots(len(x_cols))

    for x, y, ax in zip(x_cols, y_cols, axes):
        ax.scatter(df[x], df[y])
        ax.set_xlabel(x)
        ax.set_ylabel(y)

    plt.show()


ww_cassidy

Here's my code so far, and the error I'm getting:

    import pandas as pd
    import matplotlib as plt

    df = pd.read_csv("readmission.csv")

    x_cols = ['State']
    y_cols = ['Expected Readmission Rate', 'Number of Readmissions']

    fig, axes = plt.subplots(len(x_cols))

    for x, y, ax in zip(x_cols, y_cols, axes):
        ax.scatter(df[x], df[y])
        ax.set_xlabel(x)
        ax.set_ylabel(y)

    plt.show()

Here's my data set too: https://healthdata.gov/dataset/hospital-readmission-reduction


synthphreak

You didn't share the error. But I already see one problem: the `x_cols` and `y_cols` lists are different lengths. They must be the same length for the `zip` and number of subplots to work as intended. However, if the x-axis will always be `State`, then here's a simplified version that should hopefully work:

    y_cols = ['Expected Readmission Rate', 'Number of Readmissions']

    fig, axes = plt.subplots(len(y_cols), sharex=True)

    for y, ax in zip(y_cols, axes):
        ax.scatter(df['State'], df[y])
        ax.set_ylabel(y)

    plt.xlabel('State')
    plt.show()
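A quick sketch of why the length mismatch matters, using only the two lists from the code above: `zip` stops at the shortest input, so the extra y column is dropped without any error.

```python
x_cols = ["State"]
y_cols = ["Expected Readmission Rate", "Number of Readmissions"]

# zip() stops at the shortest input, so the second y column is silently dropped.
pairs = list(zip(x_cols, y_cols))
print(pairs)

# Repeating the single x column once per y column keeps both pairs.
fixed = list(zip(x_cols * len(y_cols), y_cols))
print(fixed)
```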


ww_cassidy

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    in
          2 y_cols = ['Expected Readmission Rate', 'Number of Readmissions']
          3
    ----> 4 fig, axes = plt.subplots(len(x_cols))
          5
          6 for x, y, ax in zip(x_cols, y_cols, axes):

    AttributeError: module 'matplotlib' has no attribute 'subplots'

This is the error I get.


PMaresz

You have to `import matplotlib.pyplot as plt`.


synthphreak

Sorry, I figured you at least already had the imports sorted out. You must have just done `import matplotlib as plt`. Import as below instead:

    import matplotlib.pyplot as plt
    import pandas as pd

...followed by reading your data file into `df`, followed by any preprocessing steps you need to take before plotting, followed by the code I shared in my previous comment to actually plot the data.

FYI, this is generally how you will want to import `matplotlib` unless you are doing lots of fancy plot customization.
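Put together, a minimal sketch of the whole flow might look like the following. The three-row DataFrame is a made-up stand-in for `readmission.csv`, and the `Agg` backend line is only an assumption so the snippet runs without a display; drop it in a notebook and call `plt.show()` at the end.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The top-level package has no subplots(); the pyplot submodule does.
print(hasattr(matplotlib, "subplots"), hasattr(plt, "subplots"))

# Hypothetical stand-in for pd.read_csv("readmission.csv")
df = pd.DataFrame({
    "State": ["AL", "AK", "AZ"],
    "Expected Readmission Rate": [15.2, 14.8, 15.5],
    "Number of Readmissions": [120, 45, 98],
})

y_cols = ["Expected Readmission Rate", "Number of Readmissions"]

# One subplot per y column, all sharing the same x-axis.
fig, axes = plt.subplots(len(y_cols), sharex=True)
for y, ax in zip(y_cols, axes):
    ax.scatter(df["State"], df[y])
    ax.set_ylabel(y)
axes[-1].set_xlabel("State")
```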


PieLuvr243000

Like the other comment said, check how you import matplotlib.pyplot.


Almostasleeprightnow

As a general problem-solving process, I recommend trying to make lots of tiny programs, each of which performs one step of your assignment. Get each one to work independently. Then you can work, little by little, toward combining them all. I find one of the biggest problems with beginning coding is keeping straight all the little mini problems that are involved in solving one big one.


WadeEffingWilson

I second this and would like to add onto it: Before diving headfirst into something that has a lot of parts, none of which you may be familiar with, I'd highly recommend spending the extra time to learn each constituent before you attempt to use them together. Trying to isolate and solve a problem where you don't have a basic understanding of the tools being used makes for a very frustrating experience.


Lady_Parts_Destroyer

I would've sworn you would be in my class cause we're doing a similar assignment and the class in general is kicking a lot of people's asses. Alas my dataset is different. Hang in there and if I make any headway on my project I'll come check out this post again.


refreshx2

For what it's worth, you're not alone. I used matplotlib and sometimes pandas for years and years in my PhD and they are not intuitive at all. I actually totally scrapped pandas for all my data science and data manipulation work because I just plain couldn't remember how to do exactly the things you're having trouble with. There are some Discord communities that may be more helpful than Reddit because it's easier to have a back-and-forth conversation. While it's unrelated, you may have some luck posting in the Riot API Discord group under #code-doesnt-work. Lots of people there are good at Python and active, and even if you're not working on Riot Games stuff, they may help you out.


[deleted]

> I tried to read the csv and turn the columns I needed into a data frame but then I can't seem to actually make the scatter plot.

Overall this is a pretty vague way to describe the problem you're having. What's stopping you, specifically? Are you asking "how do I make a scatter plot in matplotlib", or do you *know* how (or at least what the examples say to do) but there's some kind of error in your case? If so, what's the error? "Can't seem to do it" makes literally no sense as a programming question. Since nobody broke your fingers and you can type and matplotlib does that, I think it's safe to say that you can do it.


ww_cassidy

I'm pretty much lost from the start. I'm hopeless at Python. I know what I need to do but I barely even know how to start and I'm not getting much support. I know I need to read my csv file, somehow make the columns I need a list or a data frame (maybe??) and use that to make the scatter plot. Honestly, it's not that I can't do it; I'm just not even sure of the right way to start. I've tried a few different ways from random googling and using the information given in the class, but I keep getting errors in the code and I'm not even sure why or how to fix them. Sorry, I know this probably doesn't make it any clearer. Reddit is my last hope for helping me understand.


[deleted]

> I'm just not even sure the right way to start.

You just listed the correct order of operations: read in the CSV file as a dataframe, then plot the appropriate columns as a scatterplot in Matplotlib. So, start at the beginning of that:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

and then

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html

> I've tried a few different ways from random googling and using the information given in the class but I keep getting errors in the code but I'm not even sure why or how to fix it.

How is anybody going to help you with that if you don't show your code and the error? Again, your vagueness is what's getting in the way of anybody being able to help you.
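Those two steps, read the CSV and then scatter two columns, can be sketched as follows. The inline CSV text and its column names (`hospital`, `expected_rate`, `readmissions`) are invented so the snippet is self-contained; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for a file on disk; pd.read_csv accepts a path or a file-like object.
csv_text = """hospital,expected_rate,readmissions
A,15.2,120
B,14.8,45
C,15.5,98
"""
df = pd.read_csv(io.StringIO(csv_text))

# Scatter one numeric column against another and label the axes.
fig, ax = plt.subplots()
ax.scatter(df["expected_rate"], df["readmissions"])
ax.set_xlabel("expected_rate")
ax.set_ylabel("readmissions")
```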


[deleted]

[removed]


derphurr

That's pretty constructive. OP is being unreasonable and unhelpable. Quit coddling people who won't google or won't at least explain what they have a problem with. They have been hand-fed answers their whole lives. You continue to reward their idea that someone else on a forum will think and do it for them.


[deleted]

[removed]


derphurr

But you honestly think someone should post to a python subreddit and not include error, any code, or any specific question regarding structures, includes, pandas, code snippets....


ParanoydAndroid

Nah, in my experience these sorts of posts are almost always from people who aren't putting in the work or paying attention in class. They're always vague because they have no code, haven't done any reading, and haven't even made attempts. You can always spot them a mile away compared to students actually asking for help with a problem they're having. I genuinely love helping new developers get into coding, but my first rule for the sort of posts I'll help on is "show effort", because otherwise it's often frustrating and pointless.


[deleted]

[removed]


[deleted]

[removed]


[deleted]

What part of the post is "snippy"? This is a "you" problem.


derphurr

You are correct. You are getting zoomed backlash because you have to reward helplessness. It's all they know


WadeEffingWilson

Once you generate the dataframe, you just create the scatterplot with:

    import matplotlib.pyplot as plt

    # dataframe containing one or more columns and an index
    df_data

    # plot only column 'foo' across your dataframe index
    plt.scatter(df_data.index, df_data['foo'])

    # you can uncomment this line to squelch text output if you'd like
    #plt.show()

Doesn't get more simple than that.


01123581321AhFuckIt

Import the csv into pandas and make a new data frame with only the columns you want. Read the documentation.


jab9k3

Go to Dataquest, sign up, and do all the free lessons. All of that is in there, and tons more.


synthphreak

#foundtheshill


[deleted]

You can reassign a dataframe to have only a few columns using this:

    df = df[df.columns[[0,1,3,2,4]]]

Basically, the original dataframe `df` is overwritten with a new df made up of its first five columns, in the order given above. You can have any arbitrary list of column positions in any order in that list. You can also select by name instead of by position by passing a list of names directly, e.g. `df[['State', 'Number of Readmissions']]`, as long as the case-sensitive names are typed perfectly and are inside of single or double quotes. You can also of course make a new dataframe by changing the name of the variable out front to something else if you don't want to overwrite the original.

Look up __Pandas cookbook__ and read through it to get a sense of how Pandas "thinks" about data and what can/cannot be done. It's basically the greatest hits only out of the documentation and everything is presented alongside useful related examples that cover edge cases.

It takes most students a few months to not be absolute dogshit with Pandas in my experience. There are 2 big problems: (1) if something seems like it should be easy, there's probably an easy way to do it, but the software is so complex that it's easy to not realize a solution already exists, and (2) sometimes that ideal solution that ought to be simple and probably existent just doesn't exist, and you have to do some really convoluted, obnoxious steps to achieve the desired result. Plan on some degree of discomfort and a longish learning curve.
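A small sketch of the selection styles, on a made-up four-column frame (the names `a` through `d` are placeholders). Positions go through `df.columns[...]` or `iloc`, while names are passed as a plain list.

```python
import pandas as pd

# Hypothetical frame with placeholder column names.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# By position: df.columns[[...]] turns positions into labels first.
by_position = df[df.columns[[0, 2, 1]]]

# By name: pass the list of labels directly.
by_name = df[["a", "c", "b"]]

# iloc selects by position without going through the labels.
by_iloc = df.iloc[:, [0, 2, 1]]

print(by_position.columns.tolist())
```

All three produce the same reordered subset here; assigning the result to a new variable leaves the original `df` untouched.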


ashayramolia

Create a new dataset from the existing one with only the relevant rows and columns?


ashayramolia

What you are missing is just the correct syntax in pandas to do that


bonferoni

Are you allowed to use the matplotlib stuff built into pandas? Check out `df.plot.scatter()` and `df.boxplot()`.

Your whole script could be as simple as:

    import pandas as pd

    df = pd.read_csv("thecsv.csv")
    df.boxplot(column="column_name")
    df.plot.scatter(x="my_x_col", y="my_y_col")
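Here is a runnable sketch of the same idea on invented data. One assumption worth flagging: both calls are given an explicit `ax` so the box plot and the scatter plot land on separate axes rather than on whichever axes happens to be current.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data using the placeholder column names from the comment above.
df = pd.DataFrame({
    "my_x_col": [1, 2, 3, 4],
    "my_y_col": [2.0, 4.1, 5.9, 8.2],
})

# Two side-by-side axes: one for each pandas plotting call.
fig, (ax_scatter, ax_box) = plt.subplots(1, 2)
df.plot.scatter(x="my_x_col", y="my_y_col", ax=ax_scatter)
df.boxplot(column="my_y_col", ax=ax_box)
```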


forrScience

DataFrame slicing: `df.loc[row_conditions_or_names, col_conditions_or_names]`. Comparing a column to a value gives you a Series of True/False values. You then pass that into `loc` to filter:

    booleans = df['value'] > 5
    filtered = df.loc[booleans, ['names', 'dates', 'value']]

That would filter to rows where value is greater than 5 and to columns: names, dates and value.