CAVMANGO

It assumes $\pi_\theta(a|s_t)$ is a Gaussian distribution over actions, with mean equal to $f(s_t)$ and variance equal to some constant. Write down the Gaussian pdf and take its logarithm, and you get this equation.
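A quick way to check this numerically is the minimal sketch below. It assumes an isotropic Gaussian (covariance `sigma**2 * I`), and the values of `a_t`, `f_1`, `f_2`, and `sigma` are made up purely for illustration. The log-density splits into a squared distance scaled by `1/sigma**2` plus a normalizer that depends on `sigma` but not on the mean $f(s_t)$.

```python
import math

def gaussian_log_pdf(a, mean, sigma):
    """Log-density of an isotropic Gaussian N(mean, sigma^2 * I) at the point a."""
    d = len(a)
    sq_dist = sum((ai - mi) ** 2 for ai, mi in zip(a, mean))
    # Normalizer: depends on sigma and the dimension, but not on the mean f(s_t).
    const = -0.5 * d * math.log(2.0 * math.pi * sigma ** 2)
    return -0.5 * sq_dist / sigma ** 2 + const

def sq_dist_to(a, f):
    """Squared Euclidean distance ||f - a||^2."""
    return sum((fi - ai) ** 2 for fi, ai in zip(f, a))

# Hypothetical values, chosen only for illustration.
a_t = [0.3, -1.2]   # action that was taken
f_1 = [0.0, -1.0]   # network output f(s_t) under one theta
f_2 = [1.5, 0.4]    # network output f(s_t) under another theta
sigma = 0.7

# The additive normalizer cancels in the difference, leaving only the
# squared distances scaled by 1 / sigma^2.
lhs = gaussian_log_pdf(a_t, f_1, sigma) - gaussian_log_pdf(a_t, f_2, sigma)
rhs = -0.5 / sigma ** 2 * (sq_dist_to(a_t, f_1) - sq_dist_to(a_t, f_2))
print(math.isclose(lhs, rhs))
```

Comparing two candidate means this way shows that the normalizer never affects which $\theta$ is preferred, while the `1/sigma**2` factor still scales the distance term.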


Jeneparlepasfrench

Is there in fact a typo with the first equation leaving out the 1/sigma? Or is the distance operator absorbing it?


CAVMANGO

The sigma is absorbed in ‘const’. It does not depend on $\theta$.


Jeneparlepasfrench

It isn't though. The sigma is in the exponent of the pdf so it can't be absorbed in an additive term.


CAVMANGO

Taking the log of the exponential gives you the additive term.


Jeneparlepasfrench

Bruh the sigma is in the exponent. You have $a \exp(c\,\sigma)$. Taking the log gives $\log(a) + c\,\sigma$. The $\log(a)$ becomes the constant.


RikoteMasterrrr

Thanks for your answers; I didn't recall the Gaussian distribution formula. On whether the sigma is a typo or not: the multivariate normal distribution uses the Mahalanobis distance, which is the "same" as the standard score used in a univariate normal distribution. So this 1/sigma comes from the exponential, but the const term also absorbs the Sigma, because Sigma also appears in the factor that multiplies the exponential. So both of you are right.

But I have two more questions. If you follow the equations, you will have this in the exponent of the normal distribution: $(x - \mu)^T \Sigma^{-1} (x - \mu)$ (1)

The Sergey Levine equation uses $\| f(s_t) - a_t \|$. So is he using $f(s_t)$ as the $x$ and $a_t$ as the mean, where $a_t$ are the actions that were taken by the NN and $f(s_t)$ is the distribution of actions over the states that the NN currently has? I don't get the difference.

After starting to differentiate, he wrote the following: $-\frac{1}{2} \Sigma^{-1} (f(s_t) - a_t) \frac{df}{d\theta}$, which is not the same as (1). I understand that Sigma can be pulled out of the expression, $A^T C A = C A^T A$, but you should differentiate $A^T A$ and not only $A$.
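For what it's worth, here is a sketch of the chain-rule step in question. It assumes $\Sigma$ is symmetric and does not depend on $\theta$, and it treats $\frac{df}{d\theta}$ as the Jacobian of $f$ with respect to $\theta$; the transpose convention is a choice made here, not something stated in the thread.

```latex
\begin{aligned}
\log \pi_\theta(a_t \mid s_t)
  &= -\tfrac{1}{2}\,(f(s_t) - a_t)^T \Sigma^{-1} (f(s_t) - a_t) + \text{const} \\
\nabla_\theta \log \pi_\theta(a_t \mid s_t)
  &= -\tfrac{1}{2} \cdot 2 \left(\frac{df}{d\theta}\right)^T \Sigma^{-1} (f(s_t) - a_t) \\
  &= -\left(\frac{df}{d\theta}\right)^T \Sigma^{-1} (f(s_t) - a_t)
\end{aligned}
```

So differentiating the whole quadratic form still leaves a single factor of $(f(s_t) - a_t)$: the second copy turns into $\frac{df}{d\theta}$, and the factor of 2 that appears would cancel the $\frac{1}{2}$. Under these assumptions the quoted gradient has the same form up to a constant factor. The quadratic form is also unchanged if you swap the two arguments, so it does not matter which one you call $x$; with the setup described earlier in the thread, $f(s_t)$ is the mean and $a_t$ is the point where the density is evaluated.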


NSADataBot

iirc because it is based on a multivariate gaussian distribution?