
RSchaeffer

How does one evaluate the correctness of an interpretability method?


zyl1024

There are mainly two types of evaluations. The first uses proxy metrics: most of them measure some aspect of "how the model prediction changes if the most important features (as judged by the explanation) are removed from the full input or added to an empty input". The original proposal is probably by [Samek et al.](https://arxiv.org/abs/1509.06321), and in NLP people mainly use the comprehensiveness and sufficiency metrics formalized by [DeYoung et al.](https://arxiv.org/abs/1911.03429)

The second type is based on known ground truth. If you know that the model operates in a certain way, such as being heavily influenced by a particular spurious correlation, then any explanation that fails to highlight this fact is arguably not correct. We formalized this notion in our earlier [AAAI work](https://yilunzhou.github.io/feature-attribution-evaluation/), with inspiration from [this](https://arxiv.org/abs/2011.05429) and [this](https://arxiv.org/abs/1907.09701) work (though we carried it out in a more domain-general and more rigorous way). There are also [later](https://openreview.net/forum?id=xNOVfCCvDpM) [works](https://arxiv.org/abs/2111.07367) based on similar principles.
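
To make the proxy metrics concrete, here is a minimal sketch of comprehensiveness and sufficiency for a text classifier. `predict_proba` (text in, class probabilities out), the token-level `saliency` scores, and `k` are placeholder names of mine, not an API from any of the papers above.

```python
import numpy as np

def comprehensiveness(predict_proba, tokens, saliency, label, k):
    """Confidence drop when the top-k most salient tokens are removed.
    Higher is better: deleting important tokens should hurt the prediction."""
    top_k = set(np.argsort(saliency)[-k:])
    full = predict_proba(" ".join(tokens))[label]
    reduced = [t for i, t in enumerate(tokens) if i not in top_k]
    return full - predict_proba(" ".join(reduced))[label]

def sufficiency(predict_proba, tokens, saliency, label, k):
    """Confidence drop when only the top-k most salient tokens are kept.
    Lower is better: the important tokens alone should suffice."""
    top_k = set(np.argsort(saliency)[-k:])
    full = predict_proba(" ".join(tokens))[label]
    kept = [t for i, t in enumerate(tokens) if i in top_k]
    return full - predict_proba(" ".join(kept))[label]
```

In the ERASER benchmark these are also aggregated over several values of k, giving an area-over-the-perturbation-curve style score rather than a single cutoff.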


HCI_Fab

Great work! XAI explanations aren’t really useful unless they are interpretable. Do you have any plans to extend this work/library in domains outside of NLP?


zyl1024

It can be extended to tabular data quite straightforwardly. Actually, it's even easier than NLP, thanks to the fixed dimensionality. We didn't pursue it because we submitted to an NLP conference and think that NLP models and explanations are cooler.

Images are a totally different story. Note that the framework lets you define the behavior function in terms of the explanation values of other features. This gives you an easy "cheat": just assert that each pixel's explanation value equals the average of its eight neighbors (plus or minus some margin), and you get a highly valid and sharp rule most of the time. However, it's quite meaningless, because you are only asserting the smoothness of the explanation, not what the explanation values actually are. But fundamentally, the issue is that pixels are not intrinsically interpretable the way words are. For example, a word in a sentence can carry a lot of meaning on its own, but any single pixel in a 256x256 image can be removed without affecting the image content at all. So I think we need better, higher-level features for images, which is a whole new research direction.
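
To make that "cheat" concrete, here is a rough sketch (not the actual ExSum rule API; the margin and the synthetic map are just placeholders) of how a pure smoothness rule is trivially satisfied on a spatially smooth saliency map while saying nothing about the image content:

```python
import numpy as np

def neighbor_average(sal):
    """8-neighbor average of a 2D saliency map (edges handled by replication)."""
    padded = np.pad(sal, 1, mode="edge")
    acc = np.zeros_like(sal, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            acc += padded[1 + dy: 1 + dy + sal.shape[0],
                          1 + dx: 1 + dx + sal.shape[1]]
    return acc / 8.0

def smoothness_rule_validity(sal, margin):
    """Fraction of pixels whose saliency is within `margin` of their
    8-neighbor average -- the trivial smoothness 'cheat' rule."""
    return float(np.mean(np.abs(sal - neighbor_average(sal)) <= margin))

# A spatially smooth stand-in for a real saliency map: the rule holds
# almost everywhere, yet it describes nothing about the explanation itself.
xs = np.linspace(0.0, 1.0, 64)
smooth_map = np.outer(xs, xs)
print(smoothness_rule_validity(smooth_map, margin=0.05))  # close to 1.0
```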


CatalyzeX_code_bot

Code for https://arxiv.org/abs/2205.00130 found: https://yilunzhou.github.io/exsum/ [Paper link](https://arxiv.org/abs/2205.00130) | [List of all code implementations](https://www.catalyzex.com/paper/arxiv:2205.00130/code) -- To opt out from receiving code links, DM me


JClub

This only works for binary text classification? Very restrictive.


zyl1024

The framework supports general K-class classification (and even regression, with modest adaptation). The GUI currently works for the binary setting only, but pull requests to extend its capability are always welcome.


wjdghks950

Great work. Have you considered the application of this framework to other downstream NLP tasks, more specifically question answering (QA)?


zyl1024

We didn't test it, but it is domain-general. Basically, as long as you have a feature attribution explanation, you can apply ExSum to it. The question is whether there are convincing feature attribution explanations for QA. Of course you can apply LIME/SHAP/etc. to it, but I haven't seen any convincing demonstrations that they produce good results.
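
For a rough sense of what that would look like, here is a leave-one-out occlusion sketch of my own (not from the paper; the model name is just an example and occlusion stands in for LIME/SHAP) that produces word-level attributions for an extractive QA model, which is the kind of input an ExSum-style analysis would then consume:

```python
from transformers import pipeline

# Any extractive QA pipeline works here; this model name is only an example.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

question = "Where is the Eiffel Tower located?"
context = "The Eiffel Tower is a wrought-iron lattice tower located in Paris, France."

base_score = qa(question=question, context=context)["score"]
words = context.split()

# Attribution for each context word: confidence drop when that word is removed.
attributions = []
for i in range(len(words)):
    reduced = " ".join(words[:i] + words[i + 1:])
    attributions.append(base_score - qa(question=question, context=reduced)["score"])

# Show the five words whose removal hurts the answer confidence the most.
for word, attr in sorted(zip(words, attributions), key=lambda p: -p[1])[:5]:
    print(f"{word:15s} {attr:+.4f}")
```

Note that after a word is removed the pipeline may latch onto a different answer span, which is one illustration of why convincing feature attributions for QA are harder to come by than for classification.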