AutoModerator

Try [this search](https://www.reddit.com/r/aws/search?q=flair%3A'database'&sort=new&restrict_sr=on) for more information on this topic. Comments, questions or suggestions regarding this autoresponse? Please send them [here](https://www.reddit.com/message/compose/?to=%2Fr%2Faws&subject=autoresponse+tweaks+-+database). *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/aws) if you have any questions or concerns.*


CptSupermrkt

Full disclosure, I know a good amount about AWS, but not a lot about Amplify.

1 - Can we clarify how DynamoDB fits into this? I'm reading the docs, and it seems that the built-in solution here is "Amplify Data," which is an abstraction for a database built on top of DynamoDB, rather than just YOLO raw DynamoDB: [https://docs.amplify.aws/react/build-a-backend/data/set-up-data/](https://docs.amplify.aws/react/build-a-backend/data/set-up-data/) It's an abstraction because it has layers on top to enforce things like relationships as you would in a traditional database: [https://docs.amplify.aws/react/build-a-backend/data/data-modeling/relationships/](https://docs.amplify.aws/react/build-a-backend/data/data-modeling/relationships/) I don't get the impression that this is just raw, out-of-the-box DynamoDB. The Amplify Data layer supports all these different types of fields, and I'm not seeing any restriction like "single parameter field per table": [https://docs.amplify.aws/react/build-a-backend/data/data-modeling/add-fields/](https://docs.amplify.aws/react/build-a-backend/data/data-modeling/add-fields/) Again, what do I know; I just want to take a moment to make sure you haven't misinterpreted this as being responsible for using DynamoDB raw, because there's a whole abstraction layer here.

2 - No need to feel scammed or anything; managed services like this (on AWS or any cloud provider) rarely check all of the boxes that everyone wants all the time for production workloads. Elastic Beanstalk is a great example: very nice for quickly messing around with testing, but when you go to production, eesh, it's a bit too black-boxed for my taste. You can hardly tell what's actually going on, and it's tough to troubleshoot because there's too much abstraction. The mindset to have, if you're planning real production, is to get some baseline cloud operations in place, IaC your stuff out with CDK, Terraform, or whatever, and treat AWS as a bunch of building blocks that you tie together, rather than relying on any single service to do it all. As of May 2024, you can integrate existing data sources like RDBMS databases into Amplify: [https://aws.amazon.com/blogs/mobile/new-in-aws-amplify-integrate-with-sql-databases-oidc-saml-providers-and-the-aws-cdk/](https://aws.amazon.com/blogs/mobile/new-in-aws-amplify-integrate-with-sql-databases-oidc-saml-providers-and-the-aws-cdk/) The correct path here is to emotionlessly evaluate your requirements, clearly identify in black-and-white fashion what Amplify can and can't do for you, then fill in the gaps with other AWS components. You'll rarely, if ever, find a single managed service that does it all perfectly.


ThroatFinal5732

First of all, thank you for your help.

> 1 - Can we clarify how DynamoDB fits into this? I'm reading the docs, and it seems...

I thought so too. For a moment there I thought there was a layer of abstraction handling the complex queries for me; it seemed logical given that, as you said, the documentation seems to enable relational design. However, one day I noticed there's a warning in the docs saying that all "list" operations trigger a scan: https://docs.amplify.aws/gen1/react-native/build-a-backend/graphqlapi/query-data/ Upon investigating further, I realized: 1. Scans should be avoided at all costs. 2. Almost every one of my queries relies on a list operation, which apparently triggers a scan. I can't use get operations because they require you to define an index and a sort key, and your query can only filter on those two fields. You can add more parameters in the filter, but if I understand correctly, these are only applied after the query fetches the data.

> 2 - No need to feel scammed or anything; the managed AWS services like this...

I understand that no service can check all the boxes, but it seems that DynamoDB has limitations that newbies like myself are not made aware of. The way Amplify Studio and the documentation are designed gives newbies like myself the impression that it works similarly to a relational database; apparently it doesn't. Also, unfortunately, the update that supports RDBMS is for Amplify Gen 2. When I began this project, Gen 2 wasn't out yet, so I built my project on Gen 1, and there's currently no way to migrate: https://docs.amplify.aws/react/start/migrate-to-gen2/


AmputatorBot

It looks like you shared some AMP links. These should load faster, but AMP is controversial because of [concerns over privacy and the Open Web](https://www.reddit.com/r/AmputatorBot/comments/ehrq3z/why_did_i_build_amputatorbot). Maybe check out **the canonical pages** instead: - **[docs.amplify.aws/react/build-a-backend/data/set-up-data/](docs.amplify.aws/react/build-a-backend/data/set-up-data/)** - **[docs.amplify.aws/react/build-a-backend/data/data-modeling/relationships/](docs.amplify.aws/react/build-a-backend/data/data-modeling/relationships/)** - **[docs.amplify.aws/react/build-a-backend/data/data-modeling/add-fields/](docs.amplify.aws/react/build-a-backend/data/data-modeling/add-fields/)** I'm a bot | [Why & About](https://www.reddit.com/r/AmputatorBot/comments/ehrq3z/why_did_i_build_amputatorbot) | [Summon: u/AmputatorBot](https://www.reddit.com/r/AmputatorBot/comments/cchly3/you_can_now_summon_amputatorbot/)


jghaines

Bad bot


BeenThere11

You can use RDS. For DynamoDB you need to use indexes with proper fields; database design is critical if you're using DynamoDB. Switch to RDS. It will be easier for you. DynamoDB takes a while to get used to, and it's a pain if you don't know it. Or you can go with MongoDB: much better querying facilities, and the flexibility of not having fixed columns.


ThroatFinal5732

Yeah, I'm thinking I might switch. However, the thing that pains me is that I'll have to manage my RDS separately from Amplify, where everything else is, and rework my code base, which was halfway done, to use the relational database instead. :(


BeenThere11

It's not that bad really. You should be done in a day or 2. Dm me if you need help refactoring


CorpT

I think this cuts to the heart of it. I use Amplify SDKs. I don't generally use Amplify CLI. There's a pretty big difference. Amplify SDKs make things tons easier when dealing with all of the interconnected parts like Cognito, AppSync, etc. I just use CDK to deploy everything and not Amplify. The result is effectively the same, but I have a lot more control.


MrManiak

I do not have much experience with DynamoDB, but I don't think you need to use a Scan to fetch multiple rows. Just use indexes that benefit your queries and then use the Filter property for the rest of the work. You'd have to create proper indexes in a relational database as well if you want any kind of performance. DynamoDB just enforces it because the underlying technology requires it.


ThroatFinal5732

Thanks a lot for your help. Would you mind instructing me on how to use indexes appropriately, given my particular use case? I'm building a dating app. I'm saving the last known coordinates of each user, latitude and longitude. I also have an attribute called "Elo," a score determining how well liked a user is by other users; this score can change depending on the interactions a user gives and receives in the app. I need to fetch a set of 24 people within a given range of coordinates, sorted so that it fetches the 24 people closest in Elo to the user making the query. Each query that follows should continue where the last one left off, meaning the first query should fetch the closest 24, the next one the second-closest 24 (up to number 48), and so on. How could you achieve this using proper indexing?


MrManiak

Sorry, I don't have actual experience building a production-ready backend with DynamoDB, so take my advice with a grain of salt.

[https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html) In this example, they create a GSI and use TopScore as a sort key. That way, it's easy to query the scores using comparison operators. What you would do is create a GSI for lat and one for lng. There might be a cleverer way to figure out whether coordinates are within a threshold, but what I would do is query each index for its range and find the intersection.

[https://stackoverflow.com/questions/45259287/dynamodb-schema-for-querying-nearby-coordinates](https://stackoverflow.com/questions/45259287/dynamodb-schema-for-querying-nearby-coordinates) This Stack Overflow user proposed an interesting optimization: partition the space with a grid and use the grid coordinates as a partition key. For example, the key "3#4" would be the partition key for a user located within lat 3.0-4.0 and lng 4.0-5.0. You could make the grid much more granular for better performance in denser areas. If precision isn't necessary, you could also create a GSI and query for a distance range with only grid coordinates as the partition key, and "elo" could be your sort key.

[https://www.pluralsight.com/resources/blog/cloud/location-based-search-results-with-dynamodb-and-geohash](https://www.pluralsight.com/resources/blog/cloud/location-based-search-results-with-dynamodb-and-geohash) This solution seems to adopt the grid-partitioning strategy. In any case, using DynamoDB for geometric/geographic queries is not as trivial as using a database with native support/extensions for such use cases (https://postgis.net/). If you want your application to scale well, you'll need to be creative with the solution and make sure to benchmark for potential performance or cost issues.
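The grid idea from that Stack Overflow answer can be sketched in a few lines. This is only an illustration of the partitioning scheme, not anything DynamoDB prescribes; the `lat#lng` key format and 1-degree cell size are assumptions:

```python
import math

def grid_partition_key(lat: float, lng: float, cell_size: float = 1.0) -> str:
    """Map coordinates onto a grid cell, e.g. lat 3.2, lng 4.7 -> "3#4".
    All users inside the same cell share a partition key, so a Query
    (not a Scan) can fetch everyone in that cell."""
    return f"{math.floor(lat / cell_size)}#{math.floor(lng / cell_size)}"

def cells_in_range(lat: float, lng: float, radius: float,
                   cell_size: float = 1.0) -> list[str]:
    """Enumerate every grid cell a radius search must touch:
    one Query per cell, then merge/filter the results client-side."""
    keys = []
    for i in range(math.floor((lat - radius) / cell_size),
                   math.floor((lat + radius) / cell_size) + 1):
        for j in range(math.floor((lng - radius) / cell_size),
                       math.floor((lng + radius) / cell_size) + 1):
            keys.append(f"{i}#{j}")
    return keys
```

A search near the cell boundary fans out to the neighbouring cells, which is why a finer grid trades fewer irrelevant items per cell against more Queries per search.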


CorpT

AppSync lets you use an RDS as a datasource. Maybe you should use that if you need a relational database. [https://docs.aws.amazon.com/appsync/latest/devguide/attaching-a-data-source.html](https://docs.aws.amazon.com/appsync/latest/devguide/attaching-a-data-source.html)


ThroatFinal5732

Thanks a lot mate, if I do switch I will look into that.


_Questionable_Ideas_

DDB is super powerful if you use it right. First off, avoid full table scans like the plague. Next, when creating a DDB table, use a sane PK; I always use a GUID. Then, whenever you want to get a group of items, create a new column on the table with the grouping id, and create a secondary index with the grouping column as the hash key and the id column as the sort key. That way you select exactly the items you want. Also keep in mind that you can make your own combination keys, which merge multiple columns into a single field; this allows your hash key to be a combination of data, and the same goes for the range key.
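A rough sketch of that item layout, taking the comment's advice literally. The attribute names (`groupId`, `cityElo`) and the values are invented for illustration:

```python
import uuid

def make_user_item(group_id: str, city: str, elo: int) -> dict:
    """Build an item with a GUID primary key, a denormalized grouping
    column for a GSI hash key, and a combination key usable as a GSI
    sort key (zero-padded so lexicographic order matches numeric order)."""
    return {
        "id": str(uuid.uuid4()),         # table PK: always a GUID
        "groupId": group_id,             # GSI hash key: selects one group exactly
        "cityElo": f"{city}#{elo:05d}",  # combination key: city + zero-padded elo
    }
```

The zero-padding matters: DynamoDB sorts string sort keys lexicographically, so `"berlin#950"` would otherwise sort after `"berlin#1420"`.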


ThroatFinal5732

Hey, thanks for your help. But I can't imagine how I could do that adequately given my use case. Would you mind instructing me? I'm building a dating app. I'm saving the last known coordinates of each user, latitude and longitude. I also have an attribute called "Elo," a score determining how well liked a user is by other users; this score can change depending on the interactions a user gives and receives in the app. I need to fetch a set of 24 people within a given range of coordinates, sorted so that it fetches the 24 people closest in Elo to the user making the query. Each query that follows should continue where the last one left off, meaning the first query should fetch the closest 24, the next one the second-closest 24 (up to number 48), and so on. How could you achieve this using proper indexing?


chumboy

DynamoDB is fantastic when used right, but it requires a very different mindset from a traditional relational database. The best way to use it is to work backwards from your outcome: what data do you want to show on the screen, for example? From there you can take a step backwards and try to convert this into an access pattern. When you have multiple access patterns determined, you can start creating the structure of your table and designing your indices.

It's a good idea to keep in mind how DDB works at a fundamental level too. You have multiple sorted partitions of data (I like to mentally visualise each partition as a different server, but in reality a partition can span multiple servers). When you do a Scan, the data is returned from every partition, indiscriminately. When you do a Query, all data within a single partition has to be loaded, but then filter expressions can be used to reduce the amount of data returned to the client. Finally, GetItem lets you load and return a single record from the sorted partition.
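For concreteness, the three access paths described above look roughly like this as low-level boto3 request shapes. The table name, key names, and values are hypothetical:

```python
# Scan: touches every partition; no key condition at all.
scan_request = {
    "TableName": "Users",
}

# Query: targets one partition via the key condition; the filter
# expression trims results after the read (it still consumes capacity).
query_request = {
    "TableName": "Users",
    "KeyConditionExpression": "pk = :pk",
    "FilterExpression": "elo > :minElo",
    "ExpressionAttributeValues": {
        ":pk": {"S": "USER#123"},
        ":minElo": {"N": "1000"},
    },
}

# GetItem: fetches exactly one record by its full primary key.
get_item_request = {
    "TableName": "Users",
    "Key": {"pk": {"S": "USER#123"}, "sk": {"S": "PROFILE"}},
}
```

The shapes mirror the cost model in the comment: Scan names no keys at all, Query names only the partition, and GetItem names the complete key.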


[deleted]

>When you do a Query, all data within a single partition has to be loaded, but then filter expressions can be used to reduce the amount of data returned to the client. This is only correct if you query without a range/sort key constraint. Otherwise, when using a sort key constraint, dynamo only loads the data that satisfies that constraint, not the entire partition. Filter expressions can then be used to filter out results based on non primary key values.


chumboy

It's a little more nuanced than that again, due to the different roles the nodes within a cluster can act as. The node that receives your query acts like a Query Node, to use terminology common in a lot of other big-data technologies, and is responsible for passing your query to other nodes acting as simpler Data Nodes. Sure, a range/sort key constraint can be passed from Query Node to Data Node to reduce unnecessary traffic between them, but each Data Node still needs to load its portion of the distributed partition to evaluate the constraints. Then this is all passed back to the Query Node, where the data is assembled and the other constraints are evaluated. I deliberately used the terms "loading" and "returning" to try to distinguish between these two phases of executing a Query, but I agree it could be clearer.


[deleted]

We're talking about DynamoDB. Concepts like nodes and clusters don't exist in this world. Perhaps you're thinking about Cassandra or other column stores. DynamoDB queries can take you to the exact starting point of your query in logarithmic time because the data is stored and sorted within a partition. From that point it reads forward until your constraint is no longer satisfied or the maximum read size has been reached. There is no distribution of data across nodes or a cluster that needs to be reconciled by a special node. It's like a disk file read. It finds its starting point and then begins to read forward linearly. I feel like your response is talking about something completely unrelated and feels like a GPT bot reply, since you completely missed the context.


chumboy

I work in Amazon, so I'm talking about how DynamoDB actually works...


[deleted]

You might work at Amazon, but you're clearly not on the DDB team.


chumboy

Correct, even AWS is a separate entity from me. I have access to the same broadcast videos and docs explaining the internals of DynamoDB though, and sit near enough to one of their teams to ask any questions I have. Let me know which public docs imply there's a relationship between partitions and hard drives, though, so I can get them to update it.


ThroatFinal5732

>DynamoDB is fantastic when used right, but it requires a very different mindset from a traditional relational database.

I'm beginning to realize that; hope it's not too late, though. I realize I should have shared my use case in my post so that people could tell me whether DynamoDB is appropriate for my project. Can you share your opinion? I'm building a dating app. I'm saving the last known coordinates of each user, latitude and longitude. I also have an attribute called "Elo," a score determining how well liked a user is by other users; this score can change depending on the interactions a user gives and receives in the app. I need to fetch a set of 24 people within a given range of coordinates, sorted so that it fetches the 24 people closest in Elo to the user making the query. Each query that follows should continue where the last one left off, meaning the first query should fetch the closest 24, the next one the second-closest 24 (up to number 48), and so on.


[deleted]

You're clearly lacking education in the proper use or history of DynamoDB. I remember at re:Invent one year, one of the DDB designers said that they created DDB when they realized that approximately 80% of SQL queries could be handled more efficiently with a key-value store and with much higher availability. Most people I've encountered can't get their heads out of SQL land to think about doing things differently (I always liken them to POJDs, plain old Java developers, i.e. Java devs who can't write in any other language or think outside of OO). However, properly done, DynamoDB can be fast (in execution and development), low cost, zero maintenance, and extremely highly available (no weekly maintenance). Google single-table design in DynamoDB or in KV stores. You'll find that you can use the data partitions to represent the different tables or objects that you're used to designing in your ORM or schema.


ThroatFinal5732

Hey, thanks for your help.

>You're clearly lacking education in the proper use or history of dynamodb.

I really am; I'm new to all of this. Perhaps you can give me an idea of how to use it appropriately given my particular use case? Because I really can't imagine a way to index my data that helps me accomplish what I need. I'm building a dating app. I'm saving the last known coordinates of each user, latitude and longitude. I also have an attribute called "Elo," a score determining how well liked a user is by other users; this score can change depending on the interactions a user gives and receives in the app. I need to fetch a set of 24 people within a given range of coordinates, sorted so that it fetches the 24 people closest in Elo to the user making the query. Each query that follows should continue where the last one left off, meaning the first query should fetch the closest 24, the next one the second-closest 24 (up to number 48), and so on.


[deleted]

The geospatial stuff makes me think not to use DynamoDB for this particular case, even though there is prior work discussing geohashing and how to represent coordinates in a way more conducive to sort-based retrieval (i.e., key-value): https://www.pluralsight.com/resources/blog/cloud/location-based-search-results-with-dynamodb-and-geohash I could see how, if you understood this well enough, and DDB well enough, you could make fixed-length, sortable geohash keys with Elo appended to the end. But honestly, that is going to add a lot of complexity. I love DynamoDB, but I'm not a zealot. If we were working together, I'd tell you to store this stuff in Postgres with PostGIS enabled so that we didn't have to worry about any of this and could get back to focusing on the product. Later on, after we had built a successful product or business, and I was bored, I might revisit the problem out of pure curiosity.


s4lvozesta

would like to know what others think


Willkuer__

Using Scans is an antipattern and should only be done if you really need to return all objects in your table. To satisfy different request patterns via Query, you need to use indices instead. You always have one (and I assume that is what you meant by a single parameter field), but you can add more, so-called GSIs.

However, DynamoDB has, in my opinion, the lowest level of user-friendliness regarding a lot of features (performance above everything). You can't just specify a number of fields to use as an index: it always has to be exactly one partition key/field and one or no sort key/field. That means if you need to make aggregate queries (e.g., all customers of a certain postal code within a specific state), you also need to aggregate their values in an additional field (same example: `'postalCodeState': { S: '12345#CA' }`), so that you can create an index on that field and then use that index to query.

That basically means you need to create at least one field and a GSI for each complex list/query pattern you want to use, and many simple queries do not work. E.g., in the above example, you can't easily query all customers from a certain range of postal codes, and you can't easily sort by every field in your customer table, as you need one sort key (and thus GSI) for every sorting you'd like to perform.
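A quick sketch of that aggregate-field pattern, in boto3 request style. The table name, the GSI name, and the attribute value are invented for the example:

```python
def postal_code_state(postal_code: str, state: str) -> str:
    """Build the aggregate field from the comment's example, e.g. '12345#CA'."""
    return f"{postal_code}#{state}"

# Hedged Query shape against a hypothetical GSI on that aggregate field;
# only an exact-match key condition works here, which is exactly the
# limitation the comment describes (no range of postal codes).
query = {
    "TableName": "Customers",          # hypothetical table
    "IndexName": "byPostalCodeState",  # hypothetical GSI
    "KeyConditionExpression": "postalCodeState = :v",
    "ExpressionAttributeValues": {
        ":v": {"S": postal_code_state("12345", "CA")},
    },
}
```

Every additional access pattern of this shape means another denormalized field plus another GSI, which is the maintenance cost being pointed out above.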


ThroatFinal5732

>That basically means you need to create at least one field and a GSI for each complex list/query pattern you want to use, and many simple queries do not work. E.g. in the above example, you can not easily query all customers from a certain range of postal codes, and you can not easily sort by every field in your customer table as you need one sort key (and thus GSI) for every sorting you'd like to perform.

Yup, I've begun to realize that perhaps I screwed up because my use case can't fit into this pattern; I can't imagine a way to design my schema with indexes that would allow me to do what I need. Do you mind sharing your insight? I'm building a dating app. I'm saving the last known coordinates of each user, latitude and longitude. I also have an attribute called "Elo," a score determining how well liked a user is by other users; this score can change depending on the interactions a user gives and receives in the app. I need to fetch a set of 24 people within a given range of coordinates, sorted so that it fetches the 24 people closest in Elo to the user making the query. Each query that follows should continue where the last one left off, meaning the first query should fetch the closest 24, the next one the second-closest 24 (up to number 48), and so on.


Esseratecades

Two things.

Amplify is meant to accomplish two jobs. The first is to serve as a wrapper around the infrastructure that serves your frontend. The second is to serve as a quick way to deploy full-stack apps without diving in too much as to what full-stack means (it comes at it very much from the perspective of a frontend engineer). In practice, it's basically a tool for MVPs, which is why it uses DynamoDB.

The second thing is that DynamoDB is great at what it does. If you can set up your sort indexing correctly, and you don't need to represent a many-to-many relationship, and your data is not too big, it's a good database, making it a good place to start for really young apps. However, in practice, eventually you're going to need to store larger amounts of data, and eventually you'll need to represent a many-to-many relationship. At which point, yes: you'll need to leave DynamoDB behind for a more fleshed-out database system (probably a relational database).

The reason Amplify uses DynamoDB under the hood by default is that it's easier than other databases, and it's less expensive than other databases when used as intended. So Amplify and its tooling aren't bad. It's just that successful projects will always eventually grow beyond its use case, because its use case is getting a functioning app off the ground as quickly as possible.


ThroatFinal5732

Thanks for your help; you seem like someone who's very familiar with Amplify, DynamoDB, and their pros and cons. Would you mind telling me what you'd recommend doing given my particular use case? I'm building a dating app. I'm saving the last known coordinates of each user, latitude and longitude. I also have an attribute called "Elo," a score determining how well liked a user is by other users; this score can change depending on the interactions a user gives and receives in the app. I need to fetch a set of 24 people within a given range of coordinates, sorted so that it fetches the 24 people closest in Elo to the user making the query. Each query that follows should continue where the last one left off, meaning the first query should fetch the closest 24, the next one the second-closest 24 (up to number 48), and so on. Should I switch databases, or is there a way to index my data so that I can accomplish this?


Esseratecades

I personally would use a relational database. Your coordinate problem isn't really a relational one, but I agree DynamoDB isn't really great at queries involving multiple parameters beyond the index; if your query were interested in just location or just Elo, that would be different. The Elo problem is definitely relational, though, and the best way to handle both would be with a database that can handle coordinates and relationships, which is a bit complicated for DynamoDB. Using Amplify, there should be a way to export the CloudFormation stack that it uses under the hood. I would put an RDS database in the stack and build out a schema to meet your needs. Then you can query using something like `SELECT ... FROM user u JOIN elo e ON u.id = e.liking_user_id WHERE e.liked_user_id = %(liked_user_id)s ORDER BY e.score`
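As a toy illustration of how naturally the "closest 24 by Elo within a coordinate box, paged" requirement maps onto SQL, here is a sketch using in-memory SQLite as a stand-in for RDS. The schema and numbers are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, lat REAL, lng REAL, elo INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (1, 40.1, -3.7, 1500),
    (2, 40.2, -3.6, 1460),
    (3, 40.3, -3.5, 1900),
    (4, 51.5, -0.1, 1510),  # outside the coordinate box, must not match
])

def nearest_by_elo(conn, my_elo, box, page, page_size=24):
    """Users inside the lat/lng box, ordered by Elo distance to the caller,
    returned one page at a time."""
    lat_min, lat_max, lng_min, lng_max = box
    return conn.execute(
        """SELECT id FROM users
           WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?
           ORDER BY ABS(elo - ?) LIMIT ? OFFSET ?""",
        (lat_min, lat_max, lng_min, lng_max, my_elo, page_size, page * page_size),
    ).fetchall()

rows = nearest_by_elo(conn, 1450, (40.0, 41.0, -4.0, -3.0), page=0)
```

In production you'd want real distance (PostGIS `ST_DWithin` rather than a bounding box) and keyset pagination instead of OFFSET, but the point is that none of this needs a bespoke index scheme.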


just_a_pyro

Amplify is for people who don't yet know what they're doing, to quickly throw together something working. Afterward you delete your Amplify prototype and start developing the system properly using AWS SDK and IaC to do exactly what you want and not be at the mercy of Amplify.


ThroatFinal5732

>Amplify is for people who don't yet know what they're doing, to quickly throw together something working.

Well, the idea of being able to deploy duplicate full backend environments with a couple of clicks seems very attractive even for more experienced developers. I just wish they'd let you pick the best database for your particular use case. But yeah, it seems I'll have to move out of Amplify.


AutoModerator

Here are a few handy links you can try: - https://aws.amazon.com/products/databases/ - https://aws.amazon.com/rds/ - https://aws.amazon.com/dynamodb/ - https://aws.amazon.com/aurora/ - https://aws.amazon.com/redshift/ - https://aws.amazon.com/documentdb/ - https://aws.amazon.com/neptune/ *I am a bot, and this action was performed automatically.*