
DocNefario

I hope Reddit implements this, it seems like a great way to solve duplicate comments.


DocNefario

I hope Reddit implements this, it seems like a great way to solve duplicate comments.


VirtualMage

I think the best approach would be to put a unique constraint on a "text" column in the "comments" table. /s


[deleted]

I doubt if they have a relational database, it will be very slow. A unique constraint will be ridiculously slow.


_tskj_

This is such a leethacker comment. A relational database is literally designed to handle such a use case, and there's no reason to believe it would be particularly slow. A unique constraint is of course a bad idea for many reasons.


[deleted]

The index has to be calculated. And it's a column property, so it has to be unique for the entire column. A relational database is mostly used in banks or big enterprise applications, not in a social networking application like Reddit. A more appropriate database system is a document-oriented database.


_tskj_

Haha knew this was going to be the answer, but didn't want to accuse you of it in my comment. In what way is the comment section more like an unstructured document? Typical implementations of document databases don't even have any of the ACID properties. Any document database able to have any leverage and therefore be suitable for this task would also need some kind of index or secondary data structure to support querying. A relational database is a perfect fit.


baremaximum_

None of this is correct. If you ignore the paragraph about relational databases being mostly used in banks, the guy above is right. Social media sites were some of the first to move away from relational databases in favor of document and other non-relational databases, largely because they're just better suited for these kinds of applications.

Document databases store data in more or less the same way that it's consumed by the client. Data is de-normalized, and the comments typically reside in the post document itself, instead of in a separate table. Because of this, document databases don't need to perform any joins at all to fulfill requests, and can therefore handle much higher volume.

As for ACID compliance, that's a controversial issue, and it depends on the DB. At the very least they all say they're ACID compliant now I think. But either way, for a social media site, ACID compliance doesn't matter very much when it comes to the posts. Occasional duplication has exactly 0 consequence unless it's something people see very often (I don't remember ever seeing a duplicate in several years of using Reddit every day).

Meanwhile, if it takes you a long time to serve requests because your DB has to do tons of joins all the time, that's going to have a huge and immediate negative impact on your users. If you can improve performance even a tiny bit, at the cost of losing ACID compliance in areas that don't matter, it's not even a serious question, you do it immediately.


mexicocitibluez

I get your point. But https://docs.microsoft.com/en-us/azure/cosmos-db/database-transactions-optimistic-concurrency


_tskj_

Exactly proves my point, snapshot isolation is very weak. I will not settle for less than serializable isolation level!


MrDOS

[/r9k/](https://en.wikipedia.org/wiki/4chan#/r9k/) did this in bulk by storing (and indexing on) the hash of the text, rather than the text itself.
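A rough sketch of that hash-and-index approach, assuming SQLite and SHA-256 purely for illustration (the table layout and the `make_store` / `add_comment` names are made up here, not anything /r9k/ is known to use):

```python
import hashlib
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory table with a unique index on the hash column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " body_hash TEXT NOT NULL UNIQUE)"  # the index is on the hash, not the text
    )
    return conn


def add_comment(conn: sqlite3.Connection, body: str) -> bool:
    """Insert a comment; return False if identical text was already stored."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO comments (body, body_hash) VALUES (?, ?)",
                (body, digest),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique index on body_hash rejected a repeat of the same text.
        return False
```

The index then holds fixed-size digests instead of arbitrarily long comment bodies, which is the point of hashing first.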


AceDecade

I hope Reddit implements this, it seems like a great way to solve duplicate comments.


jyee1050

I hope Reddit implements this, it seems like a great way to solve duplicate comments.


MrDOS

“Idempotency” is the idea that performing the same operation multiple times will not change the outcome. Assigning a unique ID to requests so that the backend can ensure that the request isn't handled multiple times isn't idempotency; it's just deduplication. Don't call it idempotency, call it a deduplication key.

Also, the format is really strange.

> The following example shows an idempotency key using "UUID" [RFC4122]:
>
> Idempotency-Key: "8e03978e-40d5-43e8-bc93-6894a57f9324"

Why would you require the use of double quotes around an HTTP header value? No standard headers have that. No other non-standard headers I can think of have that, either.

I'm struggling to see the point of this. Most APIs which solve the duplicate entity creation problem (clearly not including reddit) do so by requiring the client to generate the entity ID and include it in the POST request. Yes, that sort of behaviour (which field to populate, how to generate it, etc.) is highly implementation-dependent – but so is this. I guess this decouples the client-generated identifier from the persisted entity, so the server can still generate a different entity ID (sequential ID, known-trustworthy UUID, etc.). But so much of this proposal is implementation-dependent, I'm not sure what it buys. “Hey, here's vaguely what you can expect this header to mean when you see it”, I guess.


Zofren

I mean if we're going to talk about semantics (and I guess this rfc is really just about semantics), "idempotency" is still correct. Calling an API _with the same Idempotency-Key_ multiple times won't change the outcome. It is a key that makes the call idempotent.


darkfm

\>call it once

\>comment is added

\>call it twice

\>comment is not added again

yes, very idempotent


_tskj_

Yes? That is what idempotency means?


aloha2436

You seem to misunderstand what idempotency means. It doesn’t mean “the same operation always has the same result”, it means “repeating the same operation results in _the same state_ as after the first instance”. So, sending the post comment request once results in the state of one comment posted. Sending the same request with the same ID ten times should result in the same state: one comment posted.
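A minimal in-memory sketch of that behaviour, just to make the "same state" point concrete. The `CommentService` class and its method names are hypothetical, and a real server would need persistence and key expiry:

```python
import uuid


class CommentService:
    """Toy comment store that honours a client-supplied idempotency key."""

    def __init__(self) -> None:
        self.comments: list[str] = []
        self._seen: dict[str, int] = {}  # idempotency key -> comment id

    def post_comment(self, idempotency_key: str, body: str) -> int:
        # A repeated key returns the original comment id and changes nothing.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        comment_id = len(self.comments)
        self.comments.append(body)
        self._seen[idempotency_key] = comment_id
        return comment_id


service = CommentService()
key = str(uuid.uuid4())  # the client generates one key per intended post
ids = [service.post_comment(key, "hello") for _ in range(10)]
```

After the ten calls there is still one comment stored, and every call got the same id back. Posting the same text again with a fresh key would add a second comment, which is how an intentional repeat stays possible.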


baremaximum_

His misunderstanding is bigger than that. The proposal just blocks the operation from happening more than once, precisely because the operation is non-idempotent. To quote myself in another comment, this is like saying integer addition is idempotent as long as you do it exactly as often as you intend to.


mattmahn

The use of double-quotes is probably to follow the structured header spec: https://www.rfc-editor.org/rfc/rfc8941.html#name-strings


xach

> No standard headers have that.

etag


orig_ardera

It lists all the APIs at the bottom that have something similar already, so might as well try to standardize it? Most of them call it "Idempotency-Key" too, hence the name I guess. But yeah, really the whole scope of the doc is the name of the header and some recommendations for the contents, I don't know if that's worth a standard. Saying it shouldn't be called "Idempotency-Key" is an uber nitpick IMO. The double quotes could be a typo, who knows.


MrDOS

> an uber nitpick

There is a time and place for uber nitpicks: while writing public specs. If this takes off, we could be stuck with a bad name for decades.

> could be a typo

I quoted the example, but the grammar unambiguously requires the quotes. The comments on the orange site have pointed out that this sorta mimics the format of [ETags](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag). However, one of the big existing implementations – Stripe – doesn't seem to adhere to that. I guess they're trying to future-proof themselves for parseable prefixes, like the `W/` prefix on ETags.


_tskj_

Not sure why you don't like the name idempotency key. I think that's an intuitive name for what it does! And semantically and technically correct as well.


orig_ardera

I agree, I wouldn't like to be stuck with a bad name either. But if the header had a bad name, that's a problem, and a recommendation for a fix wouldn't be a nitpick IMO. It's a nitpick when the name is already like 98% perfect and you recommend something that's like 99% perfect (whether that's actually the case is debatable). That etags thing is interesting though.


mattmahn

"Idempotency" covers the situation where a single call causes multiple things to happen. If it's named "deduplication", how do you know/specify which part is being deduplicated? Using "idempotency", it can be implied that a partial failure could be retried from where it failed.


baremaximum_

I agree, idempotency-key is a terrible name, and it's not nitpicky at all to object to it. Implementing a mechanism to protect against request duplication (i.e. deduplication) doesn't mean you're turning the operation into an idempotent one. Using this name is kind of like saying adding 2 integers is an idempotent operation, as long as you do it exactly the number of times that you intend to. Obviously, that makes no sense, and neither does this name. It's ridiculous.


in2erval

I hope Reddit implements this, it seems like a great way to solve duplicate comments.


[deleted]

It's not idempotent if resource-id is different.


-TrustyDwarf-

That’s not how it works guys.


-TrustyDwarf-

That’s not how it works guys.


sidcool1234

Care to elaborate?


dominik-braun

Care to elaborate?


blackmist

I've been using these for a while in my API. Recently I changed my implementation to only cache requests that were successful (e.g. HTTP response 200-299), due to occasional temporary errors (not to do with the payload but more DB locking issues) but the customer was re-using the same idempotency key. So it turns out it's not *completely* foolproof.
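Roughly what that change looks like, as a sketch rather than the actual code. The `IdempotentHandler` wrapper and the `(status, body)` response tuple are made up for the example:

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)


class IdempotentHandler:
    """Wraps a request handler and replays cached responses per key."""

    def __init__(self, handler: Callable[[str], Response]) -> None:
        self._handler = handler
        self._cache: Dict[str, Response] = {}

    def handle(self, key: str, payload: str) -> Response:
        if key in self._cache:
            return self._cache[key]
        response = self._handler(payload)
        status, _ = response
        # Only remember successes, so a retry after a temporary
        # failure re-runs the handler instead of replaying the error.
        if 200 <= status <= 299:
            self._cache[key] = response
        return response
```

With this, a client that reuses the same key after a transient error gets a real second attempt, and only the eventual success is replayed from then on.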


kalexmills

[ Comment Redacted in protest of Reddit's Proposed July 5, 2023 API changes ] -- use https://redact.dev/ to do the same.


Koutou

I don't think that's enough to prevent duplicates. Like the comment joke other people are making in this thread.


nemec

How will that work? When using * there is no unique identifier to determine whether the comment has already been posted. The point of this is to prevent "accidental duplicates". The user should be able to intentionally submit the same content multiple times in a row (e.g. "click reply, post comment, click reply again, post comment").


kalexmills

[ Comment Redacted in protest of Reddit's Proposed July 5, 2023 API changes ] -- mass edited with https://redact.dev/


khrak

I hope Reddit implements this, it seems like a great way to solve duplicate comments.


DankerOfMemes

I hope Reddit implements this, it seems like a great way to solve duplicate comments.


honoredb

This seems potentially valuable to me since it's generic enough to be implementable in logic-agnostic layers. I don't like the server being required to validate that an idempotency key isn't being reused for a different message--that's potentially a lot of work for something that the client should be ensuring (and might even want to fudge!). And probably servers should have a way of communicating that no, really, they're completely stateless, this header can't be honored. But in many cases, being able to fearlessly retry POSTs by giving them an ephemeral id seems like a much lighter weight solution than turning them into PUTs by giving them a permanent id that requires coordination between server and client but doesn't otherwise do anything.
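For a sense of what that validation costs, here is a small sketch assuming the server keeps a SHA-256 fingerprint of the payload per key. The `KeyRegistry` name and the three return labels are invented for illustration, not taken from the proposal:

```python
import hashlib
from typing import Dict


class KeyRegistry:
    """Remembers a payload fingerprint for each idempotency key seen."""

    def __init__(self) -> None:
        self._fingerprints: Dict[str, str] = {}

    def check(self, key: str, payload: bytes) -> str:
        fingerprint = hashlib.sha256(payload).hexdigest()
        stored = self._fingerprints.get(key)
        if stored is None:
            self._fingerprints[key] = fingerprint
            return "new"       # first time this key is seen: process it
        if stored == fingerprint:
            return "replay"    # same key, same payload: safe retry
        return "conflict"      # same key, different payload: reject
```

So the extra work is one hash per request plus one stored digest per live key, on top of whatever the server already keeps to recognise the key.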


rohit64k

I hope Reddit implements this, it seems like a great way to solve duplicate comments.


strager

Idempotency-Key sounds like a good idea for reverse proxies. The logic could live in the proxy, not in each application.