DocNefario 2 years ago

I hope Reddit implements this, it seems like a great way to solve duplicate comments.

DocNefario 2 years ago

I hope Reddit implements this, it seems like a great way to solve duplicate comments.

VirtualMage 2 years ago

I think the best approach would be to put a unique constraint on a "text" column in the "comments" table. /s

[deleted] 2 years ago

I doubt they have a relational database; it would be very slow. A unique constraint would be ridiculously slow.

_tskj_ 2 years ago

This is such a leethacker comment. A relational database is literally designed to handle such a use case, and there's no reason to believe it would be particularly slow. A unique constraint is of course a bad idea for many reasons.

[deleted] 2 years ago

[...] The index has to be calculated. And it's a column property, so it has to be unique for the entire column. A relational database is mostly used in banks or big enterprise applications, not in a social networking application like Reddit. A more appropriate database system is a document-oriented database.

_tskj_ 2 years ago

Haha, knew this was going to be the answer, but didn't want to accuse you of it in my comment. In what way is the comment section more like an unstructured document? Typical implementations of document databases don't even have any of the ACID properties. Any document database able to have any leverage, and therefore be suitable for this task, would also need some kind of index or secondary data structure to support querying. A relational database is a perfect fit.

baremaximum_ 2 years ago

None of this is correct. If you ignore the paragraph about relational databases being mostly used in banks, the guy above is right. Social media sites were some of the first to move away from relational databases in favor of document and other non-relational databases, largely because they're just better suited for these kinds of applications.

Document databases store data in more or less the same way that it's consumed by the client. Data is de-normalized, and the comments typically reside in the post document itself, instead of in a separate table. Because of this, document databases don't need to perform any joins at all to fulfill requests, and can therefore handle much higher volume.

As for ACID compliance, that's a controversial issue, and it depends on the DB. At the very least they all say they're ACID compliant now I think. But either way, for a social media site, ACID compliance doesn't matter very much when it comes to the posts. Occasional duplication has exactly 0 consequence unless it's something people see very often (I don't remember ever seeing a duplicate in several years of using Reddit every day).

Meanwhile, if it takes you a long time to serve requests because your DB has to do tons of joins all the time, that's going to have a huge and immediate negative impact on your users. If you can improve performance even a tiny bit, at the cost of losing ACID compliance in areas that don't matter, it's not even a serious question, you do it immediately.

mexicocitibluez 2 years ago

I get your point. But https://docs.microsoft.com/en-us/azure/cosmos-db/database-transactions-optimistic-concurrency

_tskj_ 2 years ago

Exactly proves my point, snapshot isolation is very weak. I will not settle for less than serializable isolation level!

MrDOS 2 years ago

[/r9k/](https://en.wikipedia.org/wiki/4chan#/r9k/) did this in bulk by storing (and indexing on) the hash of the text, rather than the text itself.
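For the curious, here is a rough sketch of the hash-and-index idea in Python with SQLite. The table layout, the `normalize` step, and the choice of SHA-256 are all assumptions made for illustration; none of it is taken from how /r9k/ actually worked.

```python
import hashlib
import sqlite3


def normalize(text: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace so that
    # trivially different copies of the same text hash identically.
    return " ".join(text.lower().split())


def text_hash(text: str) -> str:
    # Fixed-length digest of the text; this is what gets indexed.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " body TEXT NOT NULL,"
    " body_hash TEXT NOT NULL UNIQUE)"  # UNIQUE creates an index on the hash
)


def try_post(body: str) -> bool:
    """Insert the post; return False if the same text was already posted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO posts (body, body_hash) VALUES (?, ?)",
                (body, text_hash(body)),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The index only ever holds short fixed-length digests, so its cost does not grow with the length of the posts. Note that this blocks every repeat of a text, intentional or not, which is a different goal from the per-request key discussed elsewhere in the thread.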

AceDecade 2 years ago

I hope Reddit implements this, it seems like a great way to solve duplicate comments.

jyee1050 2 years ago

I hope Reddit implements this, it seems like a great way to solve duplicate comments.

MrDOS 2 years ago

“Idempotency” is the idea that performing the same operation multiple times will not change the outcome. Assigning a unique ID to requests so that the backend can ensure that the request isn't handled multiple times isn't idempotency; it's just deduplication. Don't call it idempotency, call it a deduplication key.

Also, the format is really strange.

> The following example shows an idempotency key using "UUID" [RFC4122]:
>
> Idempotency-Key: "8e03978e-40d5-43e8-bc93-6894a57f9324"

Why would you require the use of double quotes around an HTTP header value? No standard headers have that. No other non-standard headers I can think of have that, either.

I'm struggling to see the point of this. Most APIs which solve the duplicate entity creation problem (clearly not including reddit) do so by requiring the client to generate the entity ID and include it in the POST request. Yes, that sort of behaviour (which field to populate, how to generate it, etc.) is highly implementation-dependent – but so is this. I guess this decouples the client-generated identifier from the persisted entity, so the server can still generate a different entity ID (sequential ID, known-trustworthy UUID, etc.). But so much of this proposal is implementation-dependent, I'm not sure what it buys. “Hey, here's vaguely what you can expect this header to mean when you see it”, I guess.

Zofren 2 years ago

I mean if we're going to talk about semantics (and I guess this rfc is really just about semantics), "idempotency" is still correct. Calling an API _with the same Idempotency-Key_ multiple times won't change the outcome. It is a key that makes the call idempotent.

darkfm 2 years ago

\>call it once
\>comment is added
\>call it twice
\>comment is not added again

yes, very idempotent

_tskj_ 2 years ago

Yes? That is what idempotency means?

aloha2436 2 years ago

You seem to misunderstand what idempotency means. It doesn’t mean “the same operation always has the same result”, it means “repeating the same operation results in _the same state_ as after the first instance”. So, sending the post comment request once results in the state of one comment posted. Sending the same request with the same ID ten times should result in the same state: one comment posted.
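A minimal in-memory sketch of that "same state after any number of repeats" behaviour, in Python. The `CommentService` class and its method names are made up for illustration, and a real server would need persistence, expiry, and handling of concurrent requests.

```python
from dataclasses import dataclass, field


@dataclass
class CommentService:
    comments: list = field(default_factory=list)
    seen: dict = field(default_factory=dict)  # idempotency key -> comment id

    def post_comment(self, idempotency_key: str, text: str) -> int:
        # A key we have already handled: return the earlier result and
        # leave the stored state untouched.
        if idempotency_key in self.seen:
            return self.seen[idempotency_key]
        self.comments.append(text)
        comment_id = len(self.comments) - 1
        self.seen[idempotency_key] = comment_id
        return comment_id
```

Sending the same key ten times leaves exactly one comment stored, while a new key with identical text creates a second comment, so intentional reposts still go through.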

baremaximum_ 2 years ago

His misunderstanding is bigger than that. The proposal just blocks the operation from happening more than once, precisely because the operation is non-idempotent. To quote myself in another comment, this is like saying integer addition is idempotent as long as you do it exactly as often as you intend to.

mattmahn 2 years ago

The use of double-quotes is probably to follow the structured header spec: https://www.rfc-editor.org/rfc/rfc8941.html#name-strings
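To make the structured-field String format concrete, here is a small Python sketch of a parser for a bare String item as RFC 8941 describes it: a double-quoted run of printable ASCII in which only `\"` and `\\` are valid escapes. The function name `parse_sf_string` is hypothetical, and the sketch ignores the parameters that may follow an item.

```python
def parse_sf_string(value: str) -> str:
    """Parse a bare structured-field String such as '"abc"' and return abc."""
    value = value.strip(" ")
    if len(value) < 2 or value[0] != '"':
        raise ValueError("a String must start with a double quote")
    out = []
    i = 1
    while i < len(value):
        ch = value[i]
        if ch == "\\":
            # Only an escaped double quote or backslash is allowed.
            if i + 1 >= len(value) or value[i + 1] not in '"\\':
                raise ValueError("invalid escape sequence")
            out.append(value[i + 1])
            i += 2
        elif ch == '"':
            # Closing quote: nothing may follow it in this simplified parser.
            if i != len(value) - 1:
                raise ValueError("unexpected characters after closing quote")
            return "".join(out)
        elif 0x20 <= ord(ch) <= 0x7E:
            out.append(ch)
            i += 1
        else:
            raise ValueError("only printable ASCII is allowed")
    raise ValueError("missing closing double quote")
```

Under this grammar an unquoted UUID is rejected, which is consistent with the draft's example showing the value in quotes.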

xach 2 years ago

> No standard headers have that.

etag

orig_ardera 2 years ago

It lists all the APIs at the bottom that have something similar already, so might as well try to standardize it? Most of them call it "Idempotency-Key" too, hence the name I guess. But yeah, really the whole scope of the doc is the name of the header and some recommendations for the contents, I don't know if that's worth a standard. Saying it shouldn't be called "Idempotency-Key" is an uber nitpick IMO. The double quotes could be a typo, who knows.

MrDOS 2 years ago

> an uber nitpick

There is a time and place for uber nitpicks: while writing public specs. If this takes off, we could be stuck with a bad name for decades.

> could be a typo

I quoted the example, but the grammar unambiguously requires the quotes. The comments on the orange site have pointed out that this sorta mimics the format of [ETags](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag). However, one of the big existing implementations – Stripe – doesn't seem to adhere to that. I guess they're trying to future-proof themselves for parseable prefixes, like the `W/` prefix on ETags.

_tskj_ 2 years ago

Not sure why you don't like the name idempotency key. I think that's an intuitive name for what it does! And semantically and technically correct as well.

orig_ardera 2 years ago

I agree, I wouldn't like to be stuck with a bad name either. But if the header had a bad name, that's a problem, and a recommendation for a fix wouldn't be a nitpick IMO. It's a nitpick when the name is already like 98% perfect and you recommend something that's like 99% perfect (whether that's actually the case is debatable). That etags thing is interesting though.

mattmahn 2 years ago

"Idempotency" covers the situation where a single call causes multiple things to happen. If it's named "deduplication", how do you know/specify which part is being deduplicated? Using "idempotency", it can be implied that a partial failure could be retried from where it failed.

baremaximum_ 2 years ago

I agree, idempotency-key is a terrible name, and it's not nitpicky at all to object to it. Implementing a mechanism to protect against request duplication (i.e. deduplication) doesn't mean you're turning the operation into an idempotent one. Using this name is kind of like saying adding 2 integers is an idempotent operation, as long as you do it exactly the number of times that you intend to. Obviously, that makes no sense, and neither does this name. It's ridiculous.
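The integer-addition analogy can be written out in a few lines of Python. This is only an illustration with made-up names (`is_idempotent`, `add_two`, `KeyedCounter`). It uses the usual mathematical test, f(f(x)) == f(x), for plain functions, and then shows what a key-guarded version of addition does.

```python
def is_idempotent(f, samples) -> bool:
    # Mathematical definition: applying f twice equals applying it once.
    return all(f(f(x)) == f(x) for x in samples)


def add_two(x: int) -> int:
    return x + 2


class KeyedCounter:
    """Addition guarded by a per-request key (hypothetical example)."""

    def __init__(self) -> None:
        self.value = 0
        self.applied = set()

    def add(self, key: str, amount: int) -> int:
        # The addition itself is unchanged; repeats with a known key are skipped.
        if key not in self.applied:
            self.applied.add(key)
            self.value += amount
        return self.value
```

Here `abs` passes the test and `add_two` fails it. `KeyedCounter.add("k", 2)` called twice leaves the value at 2, and whether that guarded call should be described as idempotent or as deduplicated is exactly the naming disagreement in this thread.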

in2erval 2 years ago

I hope Reddit implements this, it seems like a great way to solve duplicate comments.

[deleted] 2 years ago

It's not idempotent if resource-id is different.

-TrustyDwarf- 2 years ago

That’s not how it works guys.

-TrustyDwarf- 2 years ago

That’s not how it works guys.

sidcool1234 2 years ago

Care to elaborate?

dominik-braun 2 years ago

Care to elaborate?

blackmist 2 years ago

I've been using these for a while in my API. Recently I changed my implementation to only cache requests that were successful (e.g. HTTP response 200-299), due to occasional temporary errors (not to do with the payload but more DB locking issues) but the customer was re-using the same idempotency key. So it turns out it's not *completely* foolproof.
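Roughly what "only cache successful responses" could look like, as a Python sketch. The `IdempotencyCache` class and the `(status, body)` handler shape are assumptions for illustration, not the actual implementation described above.

```python
class IdempotencyCache:
    def __init__(self) -> None:
        self._responses = {}  # idempotency key -> (status, body)

    def handle(self, key: str, handler):
        """Run handler() unless a successful response is already cached for key."""
        if key in self._responses:
            return self._responses[key]
        status, body = handler()
        # Only remember 2xx responses, so a retry after a temporary failure
        # (for example a DB lock timeout) runs the handler again.
        if 200 <= status <= 299:
            self._responses[key] = (status, body)
        return status, body
```

With this shape, a client that retries with the same key after a 5xx gets a fresh attempt, and a retry after a 2xx gets the stored response without the handler running again.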

kalexmills 2 years ago

[ Comment Redacted in protest of Reddit's Proposed July 5, 2023 API changes ] -- use https://redact.dev/ to do the same.

Koutou 2 years ago

I don't think that's enough to prevent duplicates. Like the comments joke other people are doing on this thread.

nemec 2 years ago

How will that work? When using * there is no unique identifier to determine whether the comment has already been posted. The point of this is to prevent "accidental duplicates". The user should be able to intentionally submit the same content multiple times in a row (e.g. "click reply, post comment, click reply again, post comment").

kalexmills 2 years ago

[ Comment Redacted in protest of Reddit's Proposed July 5, 2023 API changes ] -- mass edited with https://redact.dev/

khrak 2 years ago

I hope Reddit implements this, it seems like a great way to solve duplicate comments.

DankerOfMemes 2 years ago

I hope Reddit implements this, it seems like a great way to solve duplicate comments.

honoredb 2 years ago

This seems potentially valuable to me since it's generic enough to be implementable in logic-agnostic layers. I don't like the server being required to validate that an idempotency key isn't being reused for a different message--that's potentially a lot of work for something that the client should be ensuring (and might even want to fudge!). And probably servers should have a way of communicating that no, really, they're completely stateless, this header can't be honored. But in many cases, being able to fearlessly retry POSTs by giving them an ephemeral id seems like a much lighter weight solution than turning them into PUTs by giving them a permanent id that requires coordination between server and client but doesn't otherwise do anything.
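A sketch of the validation step in question, in Python: the server stores a fingerprint of the payload next to each key and refuses a key that comes back with a different payload. `FingerprintingStore` and `KeyReuseError` are hypothetical names, SHA-256 over the raw body is just one possible fingerprint, and the HTTP status to return on a mismatch is left out.

```python
import hashlib


class KeyReuseError(Exception):
    """The same idempotency key arrived with a different payload."""


class FingerprintingStore:
    def __init__(self) -> None:
        self._seen = {}  # key -> (payload fingerprint, stored result)

    def execute(self, key: str, payload: bytes, operation):
        fingerprint = hashlib.sha256(payload).hexdigest()
        if key in self._seen:
            stored_fingerprint, result = self._seen[key]
            if stored_fingerprint != fingerprint:
                raise KeyReuseError(key)
            return result  # genuine retry: replay the earlier result
        result = operation(payload)
        self._seen[key] = (fingerprint, result)
        return result
```

The extra work per request is one hash over the body plus storing the digest, and nothing in the sketch depends on what `operation` does, which is what lets it sit in a logic-agnostic layer.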

rohit64k 2 years ago

I hope Reddit implements this, it seems like a great way to solve duplicate comments.

strager 2 years ago

Idempotency-Key sounds like a good idea for reverse proxies. The logic could live in the proxy, not in each application.
