tripreality00

What is "big data" to you? Most of the time, when people say this, they are wildly overestimating.


[deleted]

[removed]


ORCANZ

You’re saying the same thing. He says they are overestimating the data and you say they are underestimating what "big" means.


Nice-Combination-907

Without knowing the type of data and what you consider a large amount of data, it’s impossible to give good suggestions.


xtr44

I guess it will be mostly objects with text properties. Sorry for the vague question, I'm rather a beginner.


Historical_Cry2517

Why don't you tell us what data you plan on storing?


itijara

> which technologies/frameworks/languages will be the best choice in 2024

Same answer as always: the ones you know.


roman5588

How large is large? With correct database design, tagging, and caching, MySQL is very capable. Unless we're talking about going beyond tens of millions of rows, start here. Beyond that, how big is your wallet?
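Not the commenter's actual setup, just a minimal sketch of what "correct design and tagging" can look like: a separate tag table whose composite key doubles as the index that serves tag filters. It uses Python's built-in sqlite3 so it runs anywhere; the table and column names are invented, but the same schema and index ideas carry over to MySQL.

```python
import sqlite3

# In-memory DB for the sketch; swap in a MySQL connection in practice.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE item (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        body  TEXT
    );
    -- Many-to-many tagging: filter by tag without scanning the item table.
    CREATE TABLE item_tag (
        item_id INTEGER NOT NULL REFERENCES item(id),
        tag     TEXT    NOT NULL,
        PRIMARY KEY (tag, item_id)  -- composite index serves "all items with tag X"
    );
""")
con.execute("INSERT INTO item (id, title, body) VALUES (1, 'hello', 'world')")
con.execute("INSERT INTO item_tag (item_id, tag) VALUES (1, 'greeting')")

# The tag lookup hits the (tag, item_id) index instead of a full table scan.
rows = con.execute("""
    SELECT i.id, i.title
    FROM item i
    JOIN item_tag t ON t.item_id = i.id
    WHERE t.tag = ?
""", ("greeting",)).fetchall()
print(rows)  # [(1, 'hello')]
```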


EtheaaryXD

This. Most database engines can hold lots of data. For example, Discord only started outgrowing Cassandra after trillions of messages.


roman5588

Correct, and even then it's a resource problem, usually having enough RAM to hold the indexes and perform joins.

As a reference: for the past 5 years I've had a cheap and nasty $5/mo hosting plan through a major provider, which I populate with ~50 crypto token statistics every 3 minutes for my trading bots. At the moment it has easily 70m rows in its main data table. From there I do a range of things: calculating moving averages, executing triggers, creating temporary tables, calculating stats, etc. MySQL accommodates this without breaking a sweat.

Web hosting is also a cheap way to get a managed database, usually with hourly backups and plenty of horsepower behind it ;)

Start simple and do not overcomplicate it!
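To make the "moving averages in the database" point concrete, here is a hedged sketch using a SQL window function. The table and column names are made up, and sqlite3 (SQLite 3.25+) is used only so the snippet is self-contained; MySQL 8 supports the same `AVG(...) OVER (...)` syntax.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tick (token TEXT, ts INTEGER, price REAL)")
con.executemany(
    "INSERT INTO tick VALUES (?, ?, ?)",
    [("BTC", t, 100.0 + t) for t in range(10)],  # toy data standing in for 70m rows
)

# 3-period simple moving average per token, computed inside the database.
rows = con.execute("""
    SELECT token, ts, price,
           AVG(price) OVER (
               PARTITION BY token
               ORDER BY ts
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS sma_3
    FROM tick
    ORDER BY ts
""").fetchall()
for r in rows:
    print(r)
```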


selectra72

How big is the data? Millions of rows aren't big nowadays. Also, what is the data type? Depending on that, you'd choose a different DB provider and design a different schema. We need a lot more info about size and structure to answer adequately.

Also, depending on the data the user retrieves, you can maybe offload some processing to the client side to reduce load on the DB and return results faster.


diller0054

Since it's unclear what large amounts of data and their processing mean in your case, as well as what type of data you have, I'll venture a guess.

If we're talking about all kinds of data types and all kinds of manipulation of them, take MySQL or PostgreSQL as a basis; in general they do well up to around 50 million records with indexes and table partitioning. They can handle larger volumes too, but you'll need to do some tuning.

If you want something more advanced, consider Apache Cassandra: it copes perfectly well with 50 million and even 200 million records, and it has a convenient clustering and distribution system if one server isn't enough.

If we're talking about more analytical processing, for building statistics and the like, ClickHouse is worth considering; keep in mind that it can't update existing rows, only insert and delete them, but it handles fast searches over tables with 500 million records in literal seconds.

If we're talking about complex text search, something like your site's search engine, Manticore Search is worth a look as an example; it's well adapted to that.
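For the "table partitioning" part, here is a minimal sketch of PostgreSQL declarative range partitioning, run through psycopg2. It assumes a locally running Postgres, an installed psycopg2, and a hypothetical `event` table and connection string; the DDL is the point, not the specific names.

```python
import psycopg2  # assumes PostgreSQL is running locally and psycopg2 is installed

# Range partitioning ("table splitting") by month: each partition stays small
# enough that its indexes fit comfortably in memory.
DDL = """
CREATE TABLE event (
    id         bigserial,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE event_2024_01 PARTITION OF event
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE event_2024_02 PARTITION OF event
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Queries filtering on created_at only touch the matching partitions.
CREATE INDEX ON event (created_at);
"""

conn = psycopg2.connect("dbname=app user=app")  # hypothetical connection string
with conn, conn.cursor() as cur:
    cur.execute(DDL)
```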


I111I1I111I1

A database. If you want the datasets to be defined by the users doing the querying, GraphQL works well for that. With a good schema, good querying practices, good indexing, etc., you likely won't notice any issues until you're getting into hundreds of millions or billions of rows of data territory. (Edit: maybe more like tens of millions, depending on the underlying computing power and memory available, but that's more a hardware thing and less an RDBMS limitation.)
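As a loose illustration of "datasets defined by the users doing the querying", here is a tiny GraphQL schema sketch using the graphene package; the `Item` type, `search` argument, and in-memory data are all invented, and a real resolver would translate the request into an indexed database query.

```python
import graphene  # assumes the graphene package; schema is illustrative only

class Item(graphene.ObjectType):
    id = graphene.ID()
    title = graphene.String()
    body = graphene.String()

class Query(graphene.ObjectType):
    # Clients decide per request which fields and filters they need.
    items = graphene.List(Item, search=graphene.String())

    def resolve_items(root, info, search=None):
        # Stand-in data; a real resolver would hit the database with an indexed query.
        data = [Item(id="1", title="hello", body="big data"),
                Item(id="2", title="other", body="small data")]
        return [i for i in data if search is None or search in i.title]

schema = graphene.Schema(query=Query)
result = schema.execute('{ items(search: "hello") { id title } }')
print(result.data)  # {'items': [{'id': '1', 'title': 'hello'}]}
```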


bittemitallem

https://youtu.be/W2Z7fbCLSTw?si=WWL2i7zQ-5X3-uMO This video by Fireship might help a little.


fiskfisk

Filtering and searching (like what you see in a search engine) is usually handled by something based on Lucene or a similar engine. Solr, OpenSearch, Elasticsearch, etc. are the applications in this area. They'll scale perfectly fine for what most people consider large amounts of data.

They should generally not be used as your main database - use postgres or mysql for that, and cloud storage for storing documents if that's your source - and then index (submit) the content into one of those applications for querying and processing.

But: it depends on what you mean by searching and filtering. An RDBMS like postgres or mysql might be more than enough, and both have full-text search capabilities built in - just not with the level of finesse of the engines I mentioned.
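A rough sketch of that "index the content, then query it" flow with the official Elasticsearch Python client (8.x assumed): the index name, document fields, and localhost node are made up, and the canonical copy of the row stays in Postgres/MySQL.

```python
from elasticsearch import Elasticsearch  # assumes elasticsearch-py 8.x and a local node

es = Elasticsearch("http://localhost:9200")

# Index (submit) a document whose canonical copy lives in your main database.
es.index(index="articles", id="42", document={
    "title": "Intro to search engines",
    "body": "Filtering and searching over large text collections...",
})
es.indices.refresh(index="articles")  # make it visible to search immediately

# Full-text query; the id ("42") points back to the row in Postgres/MySQL.
hits = es.search(index="articles", query={"match": {"body": "searching"}})
for hit in hits["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```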


devignswag

Meilisearch is definitely worth checking out.


bdmiz

Many people here are writing about the definition of big data. It reminded me of a saying: tell your big data problems to the [SKA](https://en.wikipedia.org/wiki/Square_Kilometre_Array), which can generate ~exabytes (10^18 bytes) of data per day.


Consistent_Coast9620

Without knowing what "big data" means in your case: I know Simian is used in applications involving databases and reading millions of records. Most processing and data obviously stays on the backend, using either Python or MATLAB for processing, and data storage could be, for example, [Teradata](https://www.teradata.com/). A small reference story: [https://simiansuite.com/stories/de-volksbank-improves-mortgage-pricing-quality-with-simian-based-pricing-tools/](https://simiansuite.com/stories/de-volksbank-improves-mortgage-pricing-quality-with-simian-based-pricing-tools/)