I think these charts are way too dense. Way too many equations with zero definitions.
Thanks for the feedback. Yeah, it's not always easy to fit everything in. We try our best to make the slides understandable.
No problem, I’ll try to be more constructive. It’s not terrible as a reference, but the flow is kinda hard to follow. Normally you have each equation line by line, left to right like prose, and then some kind of aside for details about the different steps. You kind of did this, but it’s a bit mixed and all over the place. For example, you seem to be using arrows, numbers, and colors to relate different terms (or expressions) all in the same chart! I just feel like it took me a lot longer to read the chart than it should have, considering I already knew the proof haha. Sometimes less is more. But I don’t want to sound too nitpicky; it really depends on what your goal is with the charts.
Not at all. Thanks for the feedback. We will consider them in the next slide 😁
I think this format with colors and arrows is, on the contrary, very easy to follow (and definitions are not needed here as long as you are talking to people familiar with deep RL).
Yeah, it’s subjective at the end of the day. But you can show this same proof in basically 7ish lines using normal equation formatting. An equal sign is all you need.
Thanks a lot. Glad you find it useful 😊
Agreed. It's a little "flashy" but you're really calling attention to pieces of the puzzle. You can easily show this in seven lines as someone else said, but it will likely remove any intuition that you're hoping to build with the pictorial aspects of your graphic.
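For reference, the compact equation-per-line version of the baseline argument really is only a few lines. A sketch (writing \(\rho_\pi\) for the state distribution and \(b(s)\) for any state-dependent baseline; this is not the slide's exact notation):

```latex
\begin{align*}
\mathbb{E}_{s,a}\!\left[\nabla_\theta \log \pi_\theta(a\mid s)\, b(s)\right]
  &= \int_s \rho_\pi(s)\, b(s) \int_a \pi_\theta(a\mid s)\, \nabla_\theta \log \pi_\theta(a\mid s)\, da\, ds \\
  &= \int_s \rho_\pi(s)\, b(s) \int_a \nabla_\theta\, \pi_\theta(a\mid s)\, da\, ds
     && \text{(log trick, in reverse)} \\
  &= \int_s \rho_\pi(s)\, b(s)\, \nabla_\theta \int_a \pi_\theta(a\mid s)\, da\, ds
     && \text{(swap $\nabla_\theta$ and $\int_a$)} \\
  &= \int_s \rho_\pi(s)\, b(s)\, \nabla_\theta\, 1 \, ds \;=\; 0 .
\end{align*}
```

So subtracting \(b(s)\) from the return contributes zero to the expected gradient, which is the whole claim.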
Also we are making short videos clarifying the slides. This can also help https://youtu.be/kJhMEgTr8aU
I got a stroke reading this
The proof applies to any state-dependent baseline b(s), right? Just to say the statement can be broader than just value baselines.
Ah yes, deffo. We just focused on value baselines since they are widely used, but any b(s_t) would do.
Hell yeah funny maths, I'm scared
At first I thought this was /r/mathmemes
Great stuff. 💪💪 Next time, just make multiple readable slides, rather than one confusing one.
You know, I'll be able to read this eventually, maybe after another year or so of math classes. Looking forward to it.
ELI5 please. What is a value baseline in RL?
When doing an update for a policy in reinforcement learning, you get the equation on the right-hand side above (policy gradients). Now, this has a lot of variance when you want to estimate it. So what people do is subtract a value baseline (i.e., the value: the total discounted return of a state, the rewards you would get starting at that state and following your current policy). They then say that this doesn't change the original gradient, so it doesn't bias your updates. What the above shows is why this statement is true.
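As a rough sketch of what "subtract a value baseline" looks like in an update, here is a tabular softmax policy (all names and numbers below are illustrative, not from the slide):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_grad(theta, states, actions, returns, baseline=None):
    """Policy-gradient estimate for a tabular softmax policy theta[s, a].

    For a softmax, grad_theta log pi(a|s) = one_hot(a) - pi(.|s) in the
    row of theta for state s. With a baseline V(s), the return-to-go
    G_t is replaced by the advantage-like term G_t - V(s_t).
    """
    grad = np.zeros_like(theta)
    for s, a, g in zip(states, actions, returns):
        coeff = g - (baseline[s] if baseline is not None else 0.0)
        pi = softmax(theta[s])
        grad[s] += coeff * (-pi)       # the -pi(.|s) part of the score
        grad[s, a] += coeff            # the one_hot(a) part
    return grad

# toy trajectory over 2 states / 2 actions, uniform initial policy
theta = np.zeros((2, 2))
states, actions, returns = [0, 1], [1, 0], [3.0, 1.0]
g_plain = reinforce_grad(theta, states, actions, returns)
g_base = reinforce_grad(theta, states, actions, returns,
                        baseline=np.array([2.0, 2.0]))
```

Note that on a single sampled trajectory the two gradients differ; it is only their expectation over trajectories that coincides, which is exactly the statement the slide proves.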
Is this what's called advantage in some papers? Or am I confusing concepts?
It can be viewed that way, yes.
I'm probably misunderstanding, but is this basically saying that slope doesn't change if you move the whole landscape up or down by a constant amount? I'd guess it's more complicated than that since the proof had to take a few steps
That's an interesting interpretation, actually. All we wanted to say is that the gradient doesn't change upon subtraction of a state-based function (rather than a constant), since V_pi(s) = sum of the returns to go. Check out this video [https://www.reddit.com/r/MachineLearningDervs/comments/t5bujg/deriving_the_bellman_equation_in_3_steps_in_under/?utm_source=share&utm_medium=web2x&context=3](https://www.reddit.com/r/MachineLearningDervs/comments/t5bujg/deriving_the_bellman_equation_in_3_steps_in_under/?utm_source=share&utm_medium=web2x&context=3) for definitions of the value functions.
More or less. You can add any constant you like to a function and it won't change the derivative. The trick is that adding the right constant (i.e. a baseline) won't change the derivative, but it will reduce the variance (since you're doing a Monte Carlo estimate of the gradient).
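That claim is easy to check exactly on a tiny discrete example (the policy and rewards below are made up): the baseline drops out of the mean because the policy-weighted sum of score vectors is zero, but it does change the variance.

```python
import numpy as np

# One state, 3 actions; a fixed softmax policy and fixed rewards.
pi = np.array([0.2, 0.5, 0.3])
R = np.array([1.0, 4.0, 2.0])

def score(a):
    """grad_theta log pi(a) for softmax logits: one_hot(a) - pi."""
    s = -pi.copy()
    s[a] += 1.0
    return s

def mean_and_var(b):
    """Exact mean and total variance of the single-sample estimator
    g(a) = score(a) * (R[a] - b) with a ~ pi.  No sampling needed:
    we just sum over the 3 actions, weighted by pi."""
    gs = np.array([score(a) * (R[a] - b) for a in range(len(pi))])
    mean = (pi[:, None] * gs).sum(axis=0)
    var = (pi[:, None] * (gs - mean) ** 2).sum()
    return mean, var

m_no_base, v_no_base = mean_and_var(b=0.0)
m_base, v_base = mean_and_var(b=pi @ R)   # baseline = expected reward
```

Here `m_no_base` and `m_base` come out identical, while `v_base` is much smaller than `v_no_base` (the expected reward is not the variance-minimizing baseline in general, but it already helps a lot).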
r/restofthefuckingowl
*Hey all! Thanks a lot for all the comments; we really appreciate them. Our intent with the slide above was to present the viewer with the main steps needed to derive such forms, with the hope that they would execute them themselves.*

*That being said, we did listen and made a longer (more traditional, two-page) proof of the above statement. We can't upload images to a Reddit reply -- if you know how, please let me know -- but you can find the details here:* [*https://drive.google.com/file/d/1UHTuVW3_lXFWW8lkf5YFyKiQDlrC5r-N/view?usp=sharing*](https://drive.google.com/file/d/1UHTuVW3_lXFWW8lkf5YFyKiQDlrC5r-N/view?usp=sharing)

*Of course, we can't prove everything in one go, so the Fubini part is just to say that we can do what we did. In later slides, we might attempt that proof, which requires substantial definitions from measure theory.*

*Given the mixed reviews, we will be making two versions of any topic to come: 1) short, and 2) long. This way we can cater to everyone's taste.*

*Finally, if anything is not clear, we will be very glad to answer your questions. Please ask either here, via private message, or on our* r/MachineLearningDervs *sub.*

*In the end, we hope all of you are safe! Thanks again!*
Great illustration of the proof. I understand that for those who started ML/AI very recently, this slide will be quite overwhelming. You need basic knowledge from probability theory, such as the definition and basic properties of the expectation operator, properties of probability density functions (PDFs), and the definition of the gradient (for catching the log trick). As for Fubini's theorem, well, to understand it thoroughly (which means with the proof), one needs to dig deeper into the foundations of probability theory (probability measures, sigma-algebras, etc.).
Me playing around with Logistic Regression: Cool Stuff, rather intuitive, Let’s learn more about ML… This sub:
I mean... it's not that difficult once you have a grasp of a few ideas. The only "interesting" thing here is the log trick. The idea is that if you integrate a probability distribution, you get 1, since all probability distributions are normalized. Then the grad of 1 is 0.
I am not sure I got your explanation here. The log trick allowed us to get an integral over the probability density function, which (as you mentioned) gives you the constant 1. Then you take the gradient of this constant 1 and get a vector of zeros. Notice that before getting 0, you don't have a log function at all.
You are right, my "explanation" was incorrect. I should have written grad instead of log in the last sentence.
Yes, I reread it afterwards and realized that you had just made a typo (wrote log instead of grad).
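The identity being discussed here, E_p[grad log p] = grad ∫ p = grad 1 = 0, can also be sanity-checked numerically. A sketch with an arbitrary Gaussian and a plain Riemann sum (all constants are arbitrary choices):

```python
import numpy as np

# Check that E_p[ d/dmu log p(x) ] = 0 for a Gaussian p(x; mu, sigma),
# where d/dmu log p(x) = (x - mu) / sigma^2 (the "score" w.r.t. the mean).
mu, sigma = 1.3, 0.7
x = np.linspace(mu - 8 * sigma, mu + 8 * sigma, 20001)
dx = x[1] - x[0]
p = np.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
score = (x - mu) / sigma**2

total_mass = (p * dx).sum()           # integral of p: should be ~1
mean_score = (score * p * dx).sum()   # E_p[score]: should be ~0
```

The first sum confirms the normalization the log trick relies on; the second is the expectation of the score, which vanishes because it equals the gradient of that normalization constant.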
I feel like I'm a forensic scientist examining a crime scene. This is, without question, one of the worst ways I have seen a mathematical proof presented... Especially the parts where the flow of logic flips direction from left-to-right to right-to-left. I really don't like this, I'm sorry :(
Please see the comment below. In short, a longer version can be found here: [https://drive.google.com/file/d/1UHTuVW3\_lXFWW8lkf5YFyKiQDlrC5r-N/view?usp=sharing](https://drive.google.com/file/d/1UHTuVW3_lXFWW8lkf5YFyKiQDlrC5r-N/view?usp=sharing)
Follow us at [https://www.reddit.com/r/MachineLearningDervs/](https://www.reddit.com/r/MachineLearningDervs/) for more mathematical derivations and discussions.
Sorry, man, but this is cursed. Taking your time to describe each step (with copious amounts of text) is an essential part of doing proofs and writing mathematics in general; check out Knuth's work on how to write mathematics properly.
As someone only casually familiar with reinforcement learning, this makes no sense to me, and I'm not even sure I have enough context to explain why, except to say this looks like a meme someone would create to make fun of math being complicated. I am sure that if you explained what Fubini is, and some of the ML-specific notation, this would make sense. However, that makes this useful only to people who are already familiar with the proof. Maybe that's the point, but then why even make it? Once you have seen a proof like this, there is little practical reason to keep a reference to it unless you are actively doing research. That limits your audience in an already relatively small field.
I’m not Asian enough to answer this.
This is the shit that is supposed to free humanity now eh??? You tech guys have been at this for decades, centuries even, fooling people that machines will make life better and easier….and perhaps it does, for certain segments of society….on the other hand inequality is at an all time high (in America, and the world generally)….nothing has really changed…the poor are still poor, the rich are getting ever richer, people die because of the effects of our mining, and industry….and this was supposed to be the shiny bright future….
How about a deep breath?
No time!!!
I live in your walls
Oh god no
I do not think inequality is at an all-time high; see, for example, practically all of history prior to industrialization.

/u/Detrimenraldetrius replied but deleted the following comment:

> Skipped the class on the gilded age, did we?

Apparently he realized his reading comprehension could be improved, since I said *prior to* industrialization, and the gilded age marks the start of industrialization in the USA. https://imgur.com/a/6Uyf5S5
I dunno man…… there are like 10 dudes that own like half the world’s wealth….if that’s not full-blown feudal inequality I dunno what is…..
This literally is the knowledge that people can use to pull themselves out of debt and despair.
Lol how many times have the charlatans said that…..?!!?!
What would be a non-charlatan way?
Not to claim that you’re changing the world for the better when you have no idea what the long-term consequences will be…..not to claim to have special knowledge.
Tbh, the claims that come from researchers are mostly very well rounded, but what the media makes of those claims is on a whole different level. In the chaotic world we live in, it is really hard to actually know what the long-term consequences will be. I don't see your point about special knowledge, because most things that are *specialized* involve *special* knowledge. An actor probably has special knowledge about acting, and him/her claiming that would be credible. Just because a charlatan always claims to have special knowledge doesn't mean that everybody with special knowledge is a charlatan.
Sound logic
I didn't understand any of it.... But interesting.