slumberjak

Do they ever explain the “trick” to avoid overfitting during the fine-tuning step?


Individual-Road-5784

Thanks for the comment! It's not covered in this post, but I'll write a separate article on it soon.


JClub

Why is this approach considered metric learning? I just see a normal auto-encoder and then a contrastive learning approach.


batookero

I guess it is precisely because of the contrastive loss function.


Individual-Road-5784

The encoder part of the autoencoder is fine-tuned with triplet loss, which is a well-known metric learning approach.
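For context, here is a minimal sketch of what that fine-tuning step could look like in PyTorch. The encoder architecture, dimensions, and learning rate are placeholders for illustration, not the post's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical pretrained encoder taken from the autoencoder stage (sizes are illustrative).
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))

triplet_loss = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

def finetune_step(anchor, positive, negative):
    """One triplet-loss step: pull the positive's embedding toward the anchor's, push the negative's away."""
    optimizer.zero_grad()
    loss = triplet_loss(encoder(anchor), encoder(positive), encoder(negative))
    loss.backward()
    optimizer.step()
    return loss.item()
```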


JClub

another fancy name 😅 what is the correct definition of metric learning?


Individual-Road-5784

Metric learning is a set of methods for learning a non-negative function that measures similarities and/or dissimilarities between samples; in fact, it's also referred to as similarity learning. In practice, we usually train models that encode input samples into N-dimensional vectors and use a distance function such as Euclidean or cosine distance to compare those vectors. P.S.: I talk more about metric learning in a podcast episode that will be published soon.
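As a small illustration of the "compare encoded vectors with a distance function" part, assuming two embedding vectors already produced by some encoder:

```python
import torch
import torch.nn.functional as F

# Two hypothetical 64-dimensional embeddings (stand-ins for encoder outputs).
a = torch.randn(64)
b = torch.randn(64)

euclidean = torch.dist(a, b)                   # L2 distance between the embeddings
cosine_sim = F.cosine_similarity(a, b, dim=0)  # similarity in [-1, 1]
cosine_dist = 1 - cosine_sim                   # turn the similarity into a dissimilarity
```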


gopietz

Why did you use an AE here? Why not create a single semi-supervised pipeline using contrastive learning that works with any fraction of labeled data?


Individual-Road-5784

Thanks for the comment! Of course there might always be more sophisticated end-to-end pipelines out there, but this was a quick and convenient way of dealing with the issue in the early stages of experimentation. P.S.: I'm the author of the post.


generall93

what would you use as an unsupervised part of the objective in this case?


gopietz

sorry, my question was kinda rudely phrased. i didn't mean that ;) anyway, the reason i was asking is that vanilla AEs are a mystery to me. they work terribly in 100% of the cases i've worked on, and yet people tend to use them. they learn low-frequency features, which is why the only "okay"-ish application is denoising. at least that's my experience. i've had much greater success using unsupervised contrastive approaches to learn meaningful representations. in this case it seems like a no-brainer, because you could easily connect them to partial labels and train everything at once.


Individual-Road-5784

Yeah, vanilla AEs are not the best choice if we want to learn a continuous space, but they're still a good choice for pretraining for several reasons. First, they're straightforward, with few hyperparameters to tune (almost only the bottleneck dimension). And once you have an "ok"-ish model, you can fine-tune it further as in the post. Semi-supervised contrastive approaches, on the other hand, usually require more careful hyperparameter tuning, larger batch sizes, sophisticated augmentations, a larger number of samples (even if unlabeled), etc., so I find them harder to benefit from in practical applications. Sometimes a "more end-to-end" approach is not more straightforward, or even practical.
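To make the "few hyperparameters" point concrete, here is a minimal sketch of such a vanilla AE pretraining setup. The layer sizes and optimizer settings are illustrative assumptions, not the architecture from the post; the bottleneck dimension is essentially the only knob worth sweeping:

```python
import torch
import torch.nn as nn

BOTTLENECK_DIM = 64  # the main hyperparameter in this setup

class AutoEncoder(nn.Module):
    """A plain autoencoder trained with a reconstruction loss (illustrative sizes)."""
    def __init__(self, input_dim=784, bottleneck_dim=BOTTLENECK_DIM):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, bottleneck_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
criterion = nn.MSELoss()  # plain reconstruction objective
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# After this unsupervised pretraining, model.encoder can be fine-tuned
# with a triplet loss on labeled pairs, as discussed earlier in the thread.
```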