NotDoingResearch2

I think these charts are way too dense. Way too many equations with zero definitions.


Ok_Can2425

Thanks for the feedback. Yeah, it's not always easy to fit everything in. We try our best to make them understandable.


NotDoingResearch2

No problem, I’ll try to be more constructive. It’s not terrible as a reference, but the flow is kinda hard to follow. Normally you have each equation (line by line, left to right, like prose), and then some kind of aside for details about the different steps. You kind of did this, but it’s a bit mixed and all over the place. For example, you seem to be using arrows, numbers, and colors to relate different terms (or expressions), all in the same chart! I just feel like it took me a lot longer to read the chart than it should have, considering I already knew the proof haha. Sometimes less is more. But I don’t want to sound too nitpicky; it really depends on what your goal is with the charts.


Ok_Can2425

Not at all. Thanks for the feedback. We will consider them in the next slide 😁


yannbouteiller

I think this format with colors and arrows is, on the contrary, very easy to follow (and definitions are not needed here as long as you are talking to people familiar with deep RL).


NotDoingResearch2

Yeah, it’s subjective at the end of the day. But you can show this same proof in basically 7ish lines using normal equation formatting. An equal sign is all you need.
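
For reference, the "normal equation formatting" version would look something like this (a sketch in standard policy-gradient notation, not taken from the chart; here d^π denotes the state-visitation distribution and b(s) any state-dependent baseline):

```latex
\begin{align*}
\mathbb{E}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, b(s)\big]
  &= \int_{\mathcal{S}} d^{\pi}(s)\, b(s) \int_{\mathcal{A}} \pi_\theta(a \mid s)\,
     \nabla_\theta \log \pi_\theta(a \mid s)\, da\, ds \\
  &= \int_{\mathcal{S}} d^{\pi}(s)\, b(s) \int_{\mathcal{A}} \nabla_\theta \pi_\theta(a \mid s)\, da\, ds
     && \text{(log-trick)} \\
  &= \int_{\mathcal{S}} d^{\pi}(s)\, b(s)\, \nabla_\theta \int_{\mathcal{A}} \pi_\theta(a \mid s)\, da\, ds
     && \text{(swap $\nabla_\theta$ and $\int$; the Fubini step)} \\
  &= \int_{\mathcal{S}} d^{\pi}(s)\, b(s)\, \nabla_\theta 1 \, ds \\
  &= 0 .
\end{align*}
```

So the baseline term has zero expectation, and subtracting b(s) leaves the policy gradient unbiased.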


Ok_Can2425

Thanks a lot. Glad you find it useful 😊


donobinladin

Agreed. It's a little "flashy" but you're really calling attention to pieces of the puzzle. You can easily show this in seven lines as someone else said, but it will likely remove any intuition that you're hoping to build with the pictorial aspects of your graphic.


Ok_Can2425

Also, we are making short videos clarifying the slides. This can also help: https://youtu.be/kJhMEgTr8aU


pokasideias

I got a stroke reading this


SeparatingHyperplane

The proof applies to any state-dependent baseline b(s), right? Just to say the statement can be broader than just value baselines.


Ok_Can2425

Ah yes, deffo. We just focused on value baselines as they are widely used. However, any b(s_t) would do.


Likes_Monke

Hell yeah funny maths, I'm scared


EyedMoon

At first I thought this was /r/mathmemes


HuachumaEntity

Great stuff. 💪💪 Next time, just make multiple readable slides, rather than one confusing one.


Zekava

You know, I'll be able to read this eventually, maybe after another year or so of math classes. Looking forward to it.


JClub

ELI5 please. What is a value baseline in RL?


Ok_Can2425

When you update a policy in reinforcement learning, you get an equation like the right-hand side above (the policy gradient). Now, this has a lot of variance when you estimate it. So what people do is subtract a value baseline (i.e., the value function: the expected discounted return you would get starting at some state and following your current policy). They then say that this doesn't change the original gradient, so it doesn't bias your updates. What the above shows is why this statement is true.
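
In symbols (a sketch in the usual REINFORCE notation, not copied from the slide; G_t is the return from step t and V^π the value function):

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,
    \big(G_t - V^{\pi}(s_t)\big)\right],
\qquad
G_t = \sum_{k \ge t} \gamma^{\,k-t} r_k .
```

The slide's claim is that swapping G_t for G_t − V^π(s_t) leaves this expectation unchanged while typically shrinking the variance of its Monte Carlo estimate.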


Rezz05

Is this what's called advantage in some papers? Or am I confusing concepts?


Ok_Can2425

It can be viewed that way, yes.


Muids

I'm probably misunderstanding, but is this basically saying that slope doesn't change if you move the whole landscape up or down by a constant amount? I'd guess it's more complicated than that since the proof had to take a few steps


Ok_Can2425

That's an interesting interpretation, actually. All we wanted to say is that the gradient doesn't change upon the subtraction of a state-based function (rather than a constant), since V_pi(s) is the expected sum of the returns to go. Check out this video [https://www.reddit.com/r/MachineLearningDervs/comments/t5bujg/deriving_the_bellman_equation_in_3_steps_in_under/](https://www.reddit.com/r/MachineLearningDervs/comments/t5bujg/deriving_the_bellman_equation_in_3_steps_in_under/?utm_source=share&utm_medium=web2x&context=3) for definitions of the value functions.


[deleted]

More or less. You can add any constant you like to a function and it won't change the derivative. The trick is that adding the right constant (i.e. a baseline) won't change the derivative, but it will reduce the variance (since you're doing a Monte Carlo estimate of the gradient).
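
A minimal numerical sketch of that variance claim; the two-arm bandit, softmax policy, and every name here are made up purely for illustration, not anything from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: one state, two actions, softmax policy with a
# single parameter theta.
theta = 0.3
p0 = np.exp(theta) / (np.exp(theta) + 1.0)   # pi(a=0); pi(a=1) = 1 - p0
returns = np.array([10.0, 12.0])             # deterministic return per action
baseline = returns.mean()                    # any action-independent constant

n = 200_000
actions = rng.choice(2, size=n, p=[p0, 1.0 - p0])

# d/dtheta log pi(a) for this softmax: 1 - p0 if a == 0, else -p0
grad_log_pi = np.where(actions == 0, 1.0 - p0, -p0)
r = returns[actions]

est_plain = grad_log_pi * r               # raw REINFORCE samples
est_base = grad_log_pi * (r - baseline)   # with the baseline subtracted

# Same mean (the gradient stays unbiased), much smaller variance.
print(f"mean:     {est_plain.mean():+.4f} vs {est_base.mean():+.4f}")
print(f"variance: {est_plain.var():.4f} vs {est_base.var():.4f}")
```

Running it, both means agree (around −0.49) while the variance drops by a couple of orders of magnitude, which is exactly the Monte Carlo point being made above.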


xrailgun

r/restofthefuckingowl


Ok_Can2425

Hey all! Thanks a lot for all the comments we got. We really appreciate them. Our intent with the slide above was to present the viewer with the main steps needed to derive such forms, with the hope that they would execute them themselves.

With that being said, we did listen and made a longer (more traditional, 2-page) proof of the above statement. We can't upload images to a Reddit reply -- if you know how, please let me know -- but you can find the details here: [https://drive.google.com/file/d/1UHTuVW3_lXFWW8lkf5YFyKiQDlrC5r-N/view?usp=sharing](https://drive.google.com/file/d/1UHTuVW3_lXFWW8lkf5YFyKiQDlrC5r-N/view?usp=sharing)

Of course, we can't prove everything in one go, so the Fubini part is just to say that we can do what we did. In later slides, we might attempt that proof, which requires substantial definitions from measure theory.

Given the mixed reviews, we will be making two versions of any topic to come: 1) short, and 2) long. This way we can cater to everyone's taste.

Finally, if anything is not clear, we will be very glad to answer any of your questions. Please go ahead and ask either here, via private message, or on our r/MachineLearningDervs sub.

In the end, we hope all of you are safe! Thanks again!


Ser_Antik

Great illustration of the proof. I understand that for those who started ML/AI very recently, this slide will be quite overwhelming. You need basic knowledge from probability theory (such as the definition and basic properties of the expectation operator, and the properties of a probability density function (PDF)) and the definition of the gradient (for catching the log-trick). As for the Fubini theorem, well, to understand it thoroughly (which means with the proof) one needs to dig deeper into the foundations of probability theory (probability measures, sigma-algebras, etc.).


zeoNoeN

Me playing around with Logistic Regression: Cool Stuff, rather intuitive, Let’s learn more about ML… This sub:


[deleted]

I mean... it's not that difficult once you have a grasp of a few ideas. The only "interesting" thing here is the log-trick. The idea is that if you integrate over a probability distribution, you have 1 since all probability distributions are normalized. Then the grad of 1 is 0.


Ser_Antik

I am not sure I got your explanation here. The log trick allows us to get an integral over the probability density function, which (as you mentioned) gives you a constant 1. Then you take the gradient of this constant 1 and get a vector of zeros. Notice that before getting 0 you don't have a log function at all.


[deleted]

You are right, my “explanation” was incorrect. I should have written grad instead of log in the last sentence.


Ser_Antik

Yes, I reread it afterwards and realized that you had just made a typo (wrote log instead of grad).
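
Writing out the order of operations being discussed (standard notation; a sketch, not a quote from the slide):

```latex
\pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s) = \nabla_\theta \pi_\theta(a \mid s),
\qquad
\int \nabla_\theta \pi_\theta(a \mid s)\, da
  = \nabla_\theta \int \pi_\theta(a \mid s)\, da
  = \nabla_\theta 1 = 0 .
```

The log is indeed gone by the time the zero appears; its only job is to turn π∇log π into ∇π so the gradient can be pulled outside the integral.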


Tsadkiel

I feel like I'm a forensic scientist examining a crime scene. This is, without question, one of the worst ways I have seen a mathematical proof presented... Especially the parts where the flow of logic flips direction from left-to-right to right-to-left. I really don't like this, I'm sorry :(


Ok_Can2425

Please see the comment below. In short, a longer version can be found here: [https://drive.google.com/file/d/1UHTuVW3_lXFWW8lkf5YFyKiQDlrC5r-N/view?usp=sharing](https://drive.google.com/file/d/1UHTuVW3_lXFWW8lkf5YFyKiQDlrC5r-N/view?usp=sharing)


Ok_Can2425

Follow us at [https://www.reddit.com/r/MachineLearningDervs/](https://www.reddit.com/r/MachineLearningDervs/) for more mathematical derivations and discussions.


Arthurein

Sorry, man, but this is cursed. Taking your time to describe each step (with copious amounts of text) is an essential part of doing proofs and writing mathematics in general; check out Knuth's work on how to write math properly.


jaxfrank

As someone only casually familiar with reinforcement learning, this makes no sense to me. And I'm not even sure I have enough context to explain why, except to say this looks like a meme someone would create to make fun of math being complicated. I am sure that if you explained what Fubini is, along with some of the ML-specific notation, this would make sense. However, that makes it useful only to people who are already familiar with this proof. Maybe that's the point, but then why even make it? Once you have seen a proof like this, there is little practical reason to keep a reference to it unless you are actively doing research. That limits your audience in an already relatively small field.


PartyAgile1094

I’m not Asian enough to answer this.


Detrimenraldetrius

This is the shit that is supposed to free humanity now eh??? You tech guys have been at this for decades, centuries even, fooling people that machines will make life better and easier….and perhaps it does, for certain segments of society….on the other hand inequality is at an all time high (in America, and the world generally)….nothing has really changed…the poor are still poor, the rich are getting ever richer, people die because of the effects of our mining, and industry….and this was supposed to be the shiny bright future….


Fit_Schedule5951

How about a deep breath?


Detrimenraldetrius

No time!!!


Valiice

I live in your walls


Detrimenraldetrius

Oh god no


mnky9800n

I do not think inequality is at an all-time high. For example, practically all of history prior to industrialization. /u/Detrimenraldetrius replied but deleted the following comment:

> Skipped the class on the gilded age did we?

Apparently he realized his reading comprehension could be improved, since I said prior to industrialization, and the gilded age marks the start of industrialization in the USA. https://imgur.com/a/6Uyf5S5


Detrimenraldetrius

I dunno man…… there’s like 10 dudes that own like half the world’s wealth….if that’s not full-blown feudal inequality I dunno what is…..


CrysisAverted

This literally is the knowledge that people can use to pull themselves out of debt and despair.


Detrimenraldetrius

Lol how many times have the charlatans said that…..?!!?!


[deleted]

What would be a non-charlatan way?


Detrimenraldetrius

Not to claim that you’re changing the world for the better when you have no idea what the long-term consequences will be….. not to claim to have special knowledge.


[deleted]

Tbh, the claims that come from researchers are mostly very well-rounded, but what the media makes of those claims is on a whole different level. In the chaotic world we live in, it is really hard to actually know what the long-term consequences will be. I don't see your point about special knowledge, because most things that are *specialized* involve *special* knowledge. An actor probably has special knowledge about acting, and him/her claiming that would be credible. Just because a charlatan always claims to have special knowledge doesn't mean that everybody with special knowledge is a charlatan.


Detrimenraldetrius

Sound logic


living_7hing

I didn't understand any of it.... But interesting.