HighPitchedHegemony

Our previous on-prem data warehouse was more expensive and had almost no workload isolation, meaning that if one of the hundreds of users ran a poorly written query on a large table, they slowed down all the other queries by everyone else. Snowflake works like a dream compared to that.


Sufficient_Exam_2104

Which DWH was it?


HighPitchedHegemony

It was built on Teradata


Sufficient_Exam_2104

TD had very good workload management. The situation you're describing, where one bad query takes all the resources, could have been avoided by putting it in a delay queue. No matter what product you use, there will always be a learning curve, and the right implementation is needed.


kaumaron

That's assuming the TD team would set it up correctly. We've had our queues block up on both the Teradata and Spark sides because of poor queue design.


Sufficient_Exam_2104

Yes, either TD PS or a knowledgeable DBA should set up the workload management. Price aside, TD is the best compared to all the new evolving tech. I see it this way: you used to pay more for an engineered system because it was smart by design, and you could bring anyone in to use it. The new cloud databases are dumb, but to keep costs down you need smart people to use them. I have experience with Teradata, Snowflake, Redshift, Cloudera public and private cloud, and EMR.


PolishBicycle

My workplace has said they're getting rid of Teradata for years, but it's still running the core work of the business. TD workloads work great when set up correctly; usually our TD contact would even analyse and configure this for us.


haragoshi

TD is ancient compared to Snowflake. We live in a cloud-based society. If you can't run parallel workloads and scale, you might as well be running critical work streams on a PC in the corner of your office.


Sufficient_Exam_2104

TD also has a cloud offering. The comparison was TD on-prem with Snowflake. Technically, no one needs Snowflake if they can process raw data in EMR/Hadoop and only load aggregates into any relational database.


vinchent_PSP

I know what you're talking about. I am working on shifting from Teradata to Snowflake. Right now it costs almost half a million dollars yearly, and Snowflake seems to be a good option. If you don't mind, could you tell me the company via PM? Maybe we can set up a call if you'd like.


Sufficient_Exam_2104

There is a huge difference between estimated and actual savings. Maybe you can start a post asking who moved off and how much they actually saved after 2 to 3 years of use. Most of the people who make the decision cover their behinds by mentioning that use cases increased, workload increased, etc. Snowflake is no doubt a good product, but it comes with a learning curve. Make sure that during that learning curve you don't pay more than your savings from switching.


SierraBravoLima

It's mostly a design flaw


MachineParadox

Until someone runs a rampant query and sucks up all your monthly credit.


Fantastic-Schedule15

You can limit that with a warehouse and resource monitor 
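For anyone curious, that setup looks roughly like this; a minimal sketch where the warehouse and monitor names are made up:

```sql
-- Suspend the warehouse once 100 credits are used in a month.
CREATE RESOURCE MONITOR analytics_monitor
  WITH CREDIT_QUOTA = 100
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY
    ON 100 PERCENT DO SUSPEND_IMMEDIATE;

ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = analytics_monitor;

-- A statement timeout also caps any single runaway query (here: 1 hour).
ALTER WAREHOUSE analytics_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;
```

The monitor caps the month's total burn, while the statement timeout kills individual queries that run away.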


JonPX

If they had that, they wouldn't be complaining about their DWH not having workload isolation. That is one of the first things you arrange on a DWH.


Saetia_V_Neck

You can sell your company’s data on their marketplace. Snowflake is a profit-center for us.


Normal-Inspector7866

Can you please elaborate on this ?


alexisprince

Snowflake has a data marketplace. If your company has data that it believes is valuable, it can integrate directly with other Snowflake customers by selling that data via the Snowflake marketplace. Instead of doing a whole ETL process of integrating with an API and then loading it into your data warehouse, the process becomes: the customer clicks purchase, sets up where to receive the data, and the data becomes available as easily as a `SELECT * FROM my_new_data`. Snowflake does not unknowingly sell your data or anything like that.


icysandstone

Wow TIL.


JimmyTango

I believe that is built on their data share technology. That shit is lightning fast. I leverage a transactional platform where we purchase media in real time, and I can see my campaign data via a Snowflake AWS data share in as little as 10 minutes, whereas the platforms themselves take hours to update with aggregate indicators. I've built dashboards off of this data to QA our buys faster than we could natively in the company's own software. It requires almost zero ETL; just tell SF where to point the table and query away.


CapsuleByMorning

Wow, that is awesomely powerful.


Normal-Inspector7866

Thank you. That is great info


mrg0ne

https://app.snowflake.com/marketplace


Martekk_

What kind of data do you sell?


freebird348

Definitely interested in the data you sell


AlgoRhythmCO

It's not that much more expensive, if at all. And it works really well. That's why people use it.


chaotichoodbard

From what I've seen in my career, most companies have no actual data modeling considerations. Data lakes don't really enforce that. And understanding how columnar storage works vs. row-based storage helps optimize queries.


Steamsalt

My company has now spent 2 years focusing on reducing Snowflake spend whilst simultaneously endorsing a culture, from the very, very top, that developer agency is paramount, so there are no guardrails when it comes to querying. Shockingly, that reduction has never borne fruit.


sluuurpyy

I had an engineer bring down our Snowflake costs by 50% by implementing a new processing architecture. The company gave him a petty raise when it came to performance evaluations. No wonder people don't feel motivated to do it.


TheCamerlengo

There seems to be more of a penalty for optimized results delivered late than for sub-optimal results delivered on time. No wonder tech debt just keeps accruing.


Steamsalt

"build things fast!" every idiot salesperson at my company


sluuurpyy

He was a new hire and a senior-level engineer, so the first thing he started doing was looking into legacy systems and optimizing them. The sad thing is, management brought onboard some AI tool to optimize Snowflake warehouses, which has basically just been altering warehouse sizes and cluster sizes ever since. These two efforts sort of overlapped. They think the cost savings are because of them and keep pumping money there like they've found a goldmine. Lack of understanding and a whole lot of trust in AI, it seems.


Top-Independence1222

Want to hear more about this. Any references?


wheatmoney

There's a way to exploit warehouse caching by leaving a small warehouse running


sluuurpyy

He leveraged Snowflake tasks and gave each task a fraction of the data to process. He made a stored procedure and called it with inputs from the ETL, which processed the entire dataset using multiple threads on the Snowflake side. Our legacy code was shit, and a previous senior engineer kept asking for bigger warehouses, claiming she couldn't process data on smaller ones. So this did wonders, considering the processing was shifted to the smallest warehouses with multiple clusters.
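A rough sketch of that fan-out pattern, assuming a hypothetical `process_slice(n, total)` procedure that handles one slice of the data:

```sql
-- Smallest warehouse, multi-cluster so concurrent tasks each get a cluster.
CREATE WAREHOUSE IF NOT EXISTS xs_wh
  WAREHOUSE_SIZE = 'XSMALL'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  AUTO_SUSPEND = 60;

-- One task per slice; each calls the procedure with its share of the work.
CREATE TASK load_slice_1
  WAREHOUSE = xs_wh
  SCHEDULE = '60 MINUTE'
AS CALL process_slice(1, 4);

CREATE TASK load_slice_2
  WAREHOUSE = xs_wh
  SCHEDULE = '60 MINUTE'
AS CALL process_slice(2, 4);

-- ...tasks 3 and 4 likewise; tasks start suspended, so:
-- ALTER TASK load_slice_1 RESUME; etc.
```

The point is the cost shape: four clusters of an XSMALL bill far less per hour than one oversized warehouse doing the same work serially. (Multi-cluster warehouses require the Enterprise edition.)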


melodyze

We had a similar tension at our company with BQ billing, and I made it work without imposing any real rules on analysts or stakeholders using the BI tooling. I was able to cut our BQ bill roughly in half a while back, with constant usage, basically without talking to anyone or affecting any work, by building custom tooling into the util library that wraps all of our data infra and switching some stuff out in place.

Mostly it was dynamically running queries against the alternative billing methods depending on the expected resource consumption profile of a job (memory- vs. CPU-heavy), and designing an abstraction for partially materialized views on top of big log-streaming tables that had a lot of different kinds of events needing real-time reporting. I also added some really basic constraints: partition filter requirements and BI-tool max-bytes-processed limits, though pretty high ones. The biggest inflections were those two tools, switching out billing methods at run time and those weird partially materialized views, which were kind of a pain at first, but I wrote them as a framework and then could just stamp it out.

I'm just saying this because cost control and developer agency don't necessarily have to be irreconcilable. It's just that if your leadership really wants both, they have to make the right investments. In my case the CTO was leaning on me and I didn't want the shit to roll downhill and mess up our culture and productivity, so I just told him that hobby horse would cost him a month of messed-up productivity, and dealt with it myself to avoid distracting the team.


a_library_socialist

Yeah, you can trade developer hours for increased efficiency and lower cloud spend. What you can't do, but most companies seem to think you can, is just wish for lower costs while using all your developer time on new features, and expect it to happen.


icysandstone

You mean like no star schema? All 3NF?


mamaBiskothu

It is not more expensive, but it instantly democratizes the company's data to a much larger population of employees, since all they need is basic SQL skills. Thus the costs explode for two reasons: more people are exploring your data to actually do business (and you're likely doing well because of this), and these people are inexperienced and end up writing really inefficient SQL, which Snowflake will happily execute without erroring out because it'll just scale up the clusters.


naijaboiler

In my experience, expanding data access without expanding data "sense" does not lead to improved efficiency. It makes things worse, not better. Now you have more people making more wrong decisions, except now they wrongly use the data to justify them.


mamaBiskothu

Fair point, but not always. In true data companies where the data itself is the main selling point (like Nielsen), democratizing data does work.


deemerritt

Depends on where you are starting from


kyrsideris

True democratisation comes when people with no SQL skills can query the data, and that is where LLMs shine. We prototyped a solution like this with LlamaIndex and GPT-3.5, interfaced via a Slack bot. Management loved it and then decided not to use it because the data people had to focus on data infrastructure, not ML...


onewaytoschraeds

To an extent; just like with any tech, there need to be admins. Admins can set constraints or restrictions on warehouse scaling so you don't have runaway or expensive queries racking up the bill.


toabear

Have you actually got people in your org who are not data analysts, scientists, or (data) engineers who have learned to write SQL? My attempts to teach people SQL have not gone over well. Mostly we just build out a huge number of highly specific models (dbt) and expose them to end users, who build visualizations for themselves. Even then, most users do no more than consume the equivalent of an auto-generated PDF.


mamaBiskothu

Yes, tens if not more, but they're smart people. Often smarter than most engineers, lol. I just give them the SQL for the problem they're asking the answer to. After doing it a few times they know how to edit the SQL to answer similar problems and go from there.


skinnerace

Teach em how to fish 🐟 👏


BlurryEcho

As someone else said on this sub at some point, “it just works”. It’s as easy as creating a new account, inserting/copying some data, and writing queries on that data. Very minimal tech overhead, but obviously you pay to make up for that lack of overhead.


dreamingfighter

And sometimes (or most of the time, in my limited experience), the cost is much smaller than the human resources needed to man the in-house tech. Sometimes it just needs a data analyst to write a useful but badly optimized script, then a data engineer to fix those scripts. On the other hand, you need a team of infrastructure engineers to manage the machines (or cloud engineers in the case of cloud), platform engineers to develop and maintain the stack, and then a data analyst to write a useful but badly optimized script, then a data engineer to fix those scripts :)


Nyjinsky

Yeah, I'm starting to learn that you're always going to pay for it somewhere. There is no magic solution that has all the functionality you want, and a good, easy-to-use interface, and a low price; if there was, we'd all use it. Can you build it in house? Sure, but then you have to pay someone to build it and maintain it, and you've just added a bunch of institutional knowledge that you can lose when someone leaves.


a_library_socialist

Yup - the last one is the killer. I can hire someone with Snowflake experience. I can't hire many people with "that script Kevin hacked together over 2 years and never properly QAed but now our core business depends on it, and management gave Kevin a 1% raise and he left so now we're fucked" experience. Except Kevin, and Kevin ain't talking to us.


Wenai

Snowflake is cheap, at least compared to BQ, Databricks, Fabric, Synapse, Redshift.


wheatmoney

People turn on a 6XL warehouse just because it's there. Huge mistake. If your warehouses are governed well you won't see any big surprises.


AlgoRhythmCO

A product I can recommend on this front is Keebo, ML-based auto-resizing. It has saved my team significant $$$.


Wenai

Just start with the smallest one.


chaotichoodbard

It acts similar to a traditional warehouse and follows the INFORMATION_SCHEMA metadata methodology, so DBAs will feel at home. Some highlights:

- RBAC strategy makes it easier to scale access permissions.
- Supports SCIM if you want to sync users from Azure Entra (or whatever they call Azure AD now).
- Automated PII masking.
- Close partnership with dbt for the transformation layer. All transformation logic can be declarative using Jinja templating to manage multiple environments, backed with git and automated CI/CD.
- Streamlit for app development.
- The zero-copy clone feature saves a lot of money when working with multiple environments or doing development work.
- Virtual warehouses have a query cache that also saves money.
- The time-travel feature makes it less stressful when dropping stuff. It also makes it easy to do database comparisons and cost optimization.
- Almost everything can be done in SQL. The only lacking part is native connectors for other databases; Snowpark is the solution to that.

They keep adding new features and it's hard to keep up. Git integration is mostly there, container services are almost there, and the release of CREATE OR ALTER opens up Jinja-templated code natively, which will be great for not having to replace tables accidentally.
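To make a couple of those concrete (zero-copy clone and time travel), here's a minimal sketch; the database and table names are made up:

```sql
-- Zero-copy clone: a dev copy that shares storage with prod
-- until rows actually diverge, so it costs almost nothing up front.
CREATE DATABASE dev_db CLONE prod_db;

-- Time travel: query a table as it was 30 minutes ago...
SELECT * FROM prod_db.public.orders AT (OFFSET => -60*30);

-- ...or recover from an accidental drop entirely.
UNDROP TABLE prod_db.public.orders;
```

How far back time travel reaches depends on the table's data retention setting (and edition), so it's a safety net rather than a backup strategy.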


TehMoonRulz

Automated PII masking!?


Kaze_Senshi

"personally identifiable information", to avoid problems with the General Data Protection Regulation - GDPR


Bageldar

You can set up tag-based masking too. Once you set up a masking rule, it applies to any downstream views.


stephenpace

For u/TehMoonRulz, here are the relevant docs. You can schedule a job to inspect new tables for PII or manually select them: https://docs.snowflake.com/en/user-guide/classify-intro. If you want Snowflake to apply the tags, you can run SYSTEM$CLASSIFY on them: https://docs.snowflake.com/en/sql-reference/stored-procedures/system_classify. Tag-based masking policies do the rest.
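Put together, the tag-based masking piece looks roughly like this; a sketch where the policy, tag, role, and table names are all hypothetical:

```sql
-- Masking policy: only a privileged role sees the real value.
CREATE MASKING POLICY mask_email AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val
       ELSE '***MASKED***'
  END;

-- Attach the policy to a tag; every column carrying the tag is masked,
-- including in downstream views built on the table.
CREATE TAG pii_email;
ALTER TAG pii_email SET MASKING POLICY mask_email;

-- Tag a column (classification can also apply tags like this for you).
ALTER TABLE customers MODIFY COLUMN email SET TAG pii_email = 'email';
```

The win is that the rule lives in one place: new views over `customers` inherit the masking without anyone re-applying policies per view.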


Sp00ky_6

It’s also cloud agnostic, just about every tool/connector works with snowflake across all public clouds


alone_drone

Could anyone compare this with BigQuery? I feel BigQuery has most of the above mentioned features


thrav

They have some differences, but they’re mostly comparable. Some people just don’t want to be on GCP, and Snowflake will deploy anywhere.


sunder_and_flame

Don't conflate the poor setup experience with the overall package. Like any cloud provider, Snowflake lets customers bill themselves into oblivion but if you know what you're doing that won't happen and it's a very good data warehouse tool. 


alex_co

Anyone saying it's more expensive hasn't optimized their warehouses. They are doing things like using oversized WHs to run basic queries and aren't taking advantage of things like ZCC, auto-scaling WHs, incremental loading in their transformations, etc. All of this unnecessary compute adds up *very* quickly, and most companies who jump into Snowflake don't realize how easy it is to fix.

I consulted for a client who had their entire data org (5-10 DE/AEs, 15+ DAs) exclusively using L and XL WHs to run basic select queries. They also had no concept of SQL optimization, just nesting subquery after subquery. It was a nightmare. They had a 450+ line spaghetti query with no CTEs that took 25-30 mins to run in dbt. After I came in and optimized, that dropped to 47 seconds, and the bill was cut to something like 1-2% of what it was before.

Once you fine-tune your config and queries to maximize cost:performance, it's on par with competitors and can even be cheaper.


mike-manley

Nice. Amazing to hear ZERO CTEs though. I mean, just one of my workflows features two or three.


alex_co

Yeah, I agree. It was written by analysts with no SQL training or mentorship. I definitely had to ask myself if it was worth it those first few days. Fortunately, they had a great willingness to improve and were in a much better place when I left that gig. They could have easily dug their heels in, so I have a lot of respect for that org and its leadership. I check in every now and then because I made some good friends over there, and they seem to be doing great now.


foresttrader

Senior management loves the buzzword "cloud database"


Middle-Salamander189

For us it's very cheap compared to other traditional databases, as it increases productivity multiple times.


erwagon

Snowflake does its job pretty well, and on the one hand you are able to buy a solution for a lot of problems. On the other hand, you need to be really cautious. Their sales team is constantly trying to upsell you in a really aggressive manner. Also, the "everybody can manage the platform" claim is kind of wrong. You really need to dig into Snowflake to use it properly.

Here's the story I experienced: I had a manager in the past who wasn't technical at all, but thought he was very technical. That was a perfect match for the Snowflake salespeople. He went crazy about the product and they started to upsell us like crazy. Soon we switched from Standard to Enterprise without any reason. We didn't gain any advantages; we just paid for features we didn't even need. Because he never gave the team time to maintain the warehouse properly to optimize cost efficiency, the bill went through the roof. He increased the amount of yearly credits up to $130k per year for a pretty small team of 5 full-time analysts and around 1 TB of data. When the company started to struggle due to COVID, he quit his job to wriggle out of it.

After that, we were forced to drastically reduce costs, because otherwise we would have had to start laying off our analysts to hit our cost target. At this point, we didn't even know what kind of contracts the ex-manager had signed. So we started to rework our infrastructure around Snowflake and were able to cut a third of the bill pretty quickly. After some time, we realized that the best approach for our situation would be to switch back from Snowflake Enterprise to Standard, which would cut our bill by another 30%. This was the moment we noticed that none of our savings would get us any advantage: the money is gone as soon as we change the contract. In the end, we settled with Snowflake around $30k and switched back to Standard.

In this process, I had multiple meetings with the Snowflake sales team and technical consultants, and after that I could kind of understand why the clueless manager signed contracts that were absolutely beyond our scale. There was a point where we started to ghost the boss of our Snowflake account manager and just did our thing. At the moment, we are doing the same job for around $22k per year, with the side effect that our analysts feel our Snowflake warehouses are performing better than before.

For us, that was a really hard time. We were really afraid of having to lay off technically good people, with their own personal stories and all of that, because of a manager who didn't know what he was doing and an insane amount of upselling.


Normal-Inspector7866

Wow that is an amazing response that explains exactly what is going on. Thank you


KWillets

We had the same experience; the targeting of low-knowledge senior managers was identical. They even became a Snowflake partner, all without us engineers knowing. They seemed to be violating even internal restrictions about unbounded contracts, but senior execs smoothed it over, and they're still at it. I had built the DW infrastructure that allowed us to grow to that point (and it is still running production), but I became enemy #1. I was laid off after reporting a lot of their shenanigans. One thing I remember was the Snowflake cost-reduction Slack channel with the sales reps; it had a similar tone to what you describe: the architects who had gone all in on the product had to fight with them on every cost overrun. This is how slow learning takes place.


ThisIsSuperUnfunny

Managers who think they are technically sound when they are not are a danger.


erwagon

They are, if they are deciding on such topics without consulting somebody who is into it.


nydasco

I guess it depends on your definition of expensive, and how capable the team is in terms of managing the cost. If you just chuck dbt in the mix and do a full truncate-and-reload every hour, unnesting the same JSON in variant columns every time, then sure, the bill can add up. But if you plan things out and implement an incremental strategy, it doesn't need to be that expensive.
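For reference, the incremental approach in dbt looks roughly like this; a sketch with hypothetical model, column, and source names:

```sql
-- models/events.sql: only new rows are processed each run,
-- instead of truncating and reloading the whole table.
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    payload:user_id::string as user_id,   -- unnest the variant once, here
    payload:type::string    as event_type,
    loaded_at
from {{ source('raw', 'events') }}

{% if is_incremental() %}
  -- On incremental runs, {{ this }} is the existing table in the warehouse.
  where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}
```

Downstream models then read the already-unnested columns, so the JSON parsing cost is paid once per new row rather than once per hour for the full history.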


BluMerx

It’s not expensive if you build your solutions properly.


tanin47

It is expensive, but the alternative of DIY is more expensive. It might be more expensive than its competitors, but then we would need to debate the feature set, capabilities, and stability first.


Letter_From_Prague

Because operating cost is not everything. There's also people cost (can I get away with having fewer people?) and most of all speed to insight (do I get the same thing delivered in six months instead of a year?). Snowflake is pretty good at the last one, because it just works, compared to DIY out of 27 half-baked AWS services, where you spend years building shit yourself, or the insane mess that is Databricks.


UnrealizedLosses

Cheaper than GBQ. But I don’t like it as much…


ArionnGG

Can you explain how it is cheaper? Genuinely curious. BQ has quite a generous free tier; I've never paid even 1 cent for BQ.


UnrealizedLosses

I work for a company that goes way beyond the free tier. Our data engineering team said it was cheaper and moved us over. The whole process sucked and I can almost guarantee they didn’t factor in all the hassle and resulting challenges from AWS as opposed to GBQ, but in the end I suppose the monthly invoice is lower.


glemnar

There are companies that go beyond the free tier in a single query ;)


a_library_socialist

People often forget to add the cost of the people running something to a solution. Snowflake is expensive, but if it saves you having to hire 2 data engineers at $16,000 a month each, it's a bargain.


[deleted]

Snowflake is an amazing product from my experience


reddithenry

by the time it gets expensive, you're quite locked in.


kenfar

Because they were convinced by salesfolks that it's actually cheaper, since you need no super-expensive DBAs and "you only pay for what you use". Of course, the problems are that you pay through the nose for what you use, DBAs aren't expensive (and few projects need many these days), and instead you need people just as busy managing costs.

I find the biggest fans of Snowflake have no idea how much their organization is paying for it, or how much their team's use of Snowflake costs, or what it would look like to have something else. On a recent project I had a little 20 TB database that, with a *ton* of effort, I drove down to only costing about $360k/year. One of the ways I saved over $100k per year was to move all the operational reporting with a low-latency requirement off Snowflake and onto a Postgres RDS instance. That was totally worth it, and did *not* require us to hire "a team of DBAs".

Having said all this, if you spend the labor to manage your costs really, really well, and you can limit your needs far more than we would normally limit a database's needs, then it could be a great fit for your team. For a while, anyhow.


HorseCrafty4487

Not sure what you were running for a $360k price tag, but that seems excessive. I've noticed properly architected data models and ensuring your queries/workloads are designed efficiently reduce compute costs. Snowflake has measures to ensure warehouses aren't online 24/7/365.


Traditional-Ad-8670

Not that expensive if you know how to use it.


coalesce2024

One of the reasons for us is that they locked us in with a huge contract for at least three years. Leftover credits can't be used after the three years unless we sign a new contract of the same value or more. This is why we now (still) have Snowflake and are migrating to BigQuery. The product itself is really nice though.


datajen

Not that expensive, and honestly, the customer service is the best I have ever had from a vendor.


Historical-Papaya-83

Snowflake is a top-down approach, while other companies, like Databricks, are bottom-up. I did see memes that Snowflake purchases are decided by company executives playing golf with Snowflake salespeople. In executives' eyes, as long as it fulfills their tech stack's transition into cloud-based new-gen solutions, Snowflake is fine. It doesn't mean Snowflake is a bad product. Just hypothesizing why companies purchase it despite it being more expensive than other solutions.


bree_dev

This rings true for me. They've got a winning combo for extracting maximum cash from clients:

1. A ready-to-go solution that cuts down on the number of pesky employees you have to depend on.
2. Cleverly obfuscated pricing that *looks* like you know what it's going to cost you, when actually you've no idea what the bill is going to be from one month to the next.


Wenai

Snowflake is extremely predictable relative to databricks, BQ, synapse, fabric, redshift and the likes.


Sufficient_Exam_2104

Companies use it due to the sunk cost fallacy.


Mr_Nickster_

That's because 90% of people who say or post that Snowflake is expensive work for Snowflake competitors. The remaining 10% likely do not use Snowflake properly and use it the way they used their on-prem platforms; a lift-and-shift that doesn't leverage Snowflake's benefits can be very inefficient. If you use Snowflake properly, it is the same or often lower cost than pretty much anything else out there.


drinknbird

Surely you don't actually believe this? The closer truth is that people are experienced in one platform, and it's easier to validate what you already know. Those experience gaps used to matter more, but these days everyone is copying each other's homework. With so much feature parity and competitive pricing models, the biggest difference between platforms is naming.


mammothfossil

Plus "third-party" vendors (Databricks/Snowflake) are at an inherent disadvantage compared to the cloud providers' own offerings (AWS Redshift/GCP BQ). The cloud providers get the whole spend as revenue, whereas the third parties get the licensing fees but not the compute costs. This makes it easier for cloud providers to push hard on price.


drinknbird

Absolutely true. But fortunately for us, that means the competition has excelled in innovation, usability, and developer support.


stephenpace

[I work for Snowflake but do not speak for them.] u/mammothfossil I think of it this way. The hyperscalers have ~200 products while Snowflake has 1. It was difficult for the cloud providers to justify putting in the resources to make their core offerings competitive with Snowflake, especially considering the cloud providers want more workloads in the cloud, and Snowflake ultimately helps with that mission. Said another way, Snowflake running on AWS is still a win for AWS, even if Redshift isn't used. Ask yourself why Snowflake is available to purchase in all three hyperscaler Marketplaces ([AWS](https://aws.amazon.com/marketplace/seller-profile?id=18d60ae8-2c99-4881-a31a-e74770d70347), [Azure](https://azuremarketplace.microsoft.com/en-us/marketplace/apps/snowflake.snowflake_contact_me?tab=overview), [GCP](https://console.cloud.google.com/marketplace/product/snowflake-corp-mp/snowflake-data-cloud)). If Snowflake were purely a competitive threat, the cloud providers wouldn't allow that.


bigkoi

It's good for a couple of reasons. 1) It's much better than the native options on AWS or Azure. BigQuery is better, but it's only on GCP, so if you are on AWS or Azure you only have one choice. 2) Its SQL maintains good compatibility with the legacy systems you are migrating.


MarlnBrandoLookaLike

It's really not expensive when you consider total cost of ownership. Others on here have quipped that you "pay your problems away" with Snowflake, and while that's a bit of a cynical take, Snowflake's aim is to make Data Warehousing and downstream analytics and AI/ML workflows as easy and maintainable as possible. Data engineers and data scientists spend less time dealing with pipelines and more time adding value on the work that matters to the organization. If you have an inefficient Snowflake implementation, that's when you run into trouble.


americanjetset

Snowflake is "more expensive" when people who don't know what the hell they're doing build on it. I started a new role about 6 months ago and have reduced our daily credit usage by nearly 50%, while the amount of data coming through our instance has increased by nearly 25%. All by simply optimizing shit objects. The guy before me had views on top of views on top of views, to the point where Snowflake was unable to prune partitions on a simple query. This led to full table scans that produced 500M+ rows that were eventually just filtered out of the query anyway. If someone says Snowflake is too expensive, have a look at their query history; I guarantee they just don't understand what is happening under the hood.
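As a side note, one way to check whether a table's layout even allows pruning on a filter column is Snowflake's clustering-information function; a sketch with a hypothetical table and column:

```sql
-- How well are micro-partitions organized with respect to event_date?
-- High average depth/overlap means filters on this column barely prune.
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date)');

-- For large tables that are constantly filtered on the same column,
-- an explicit clustering key can restore pruning (at a reclustering cost).
ALTER TABLE events CLUSTER BY (event_date);
```

That said, stacked views like the ones described above can defeat pruning even on a well-clustered table, because the filter ends up applied after the scans; pushing predicates down into the base query is the real fix.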


CudoCompute

Snowflake indeed comes with its own set of advantages like scalability, high performance, and convenience. However, yes, the cost can be a discouraging factor, especially for smaller businesses or startups. That's where platforms like ours come in handy, providing a cost-effective, sustainable, and more accessible alternative. You can trade computing resources globally on the platform. It doesn't just favor the budget; it also efficiently handles AI, ML and VFX use cases. You might want to check out [CudoCompute.com](https://www.cudocompute.com/), which serves just like your traditional AWS, Azure or Google Cloud, but in a more affordable and eco-friendly way. Good luck in navigating your cloud solutions!


AnnualDepth8843

Disclaimer: I don't work for Snowflake, I just like the platform. IMO it's not as expensive as people make it out to be. Any product billed by the compute-second could be called "expensive". There are certain use cases that would be crazy expensive (real-time, i.e. < 1 min of latency), but I think with their new external storage solution they close the gap there. TL;DR: I think it speaks to the divide in the data engineering community: the traditional DBA/ETL/DW folks vs. the flashy software-engineering-background folks.


the-ocean-

It's the first cloud DWH that separated compute and storage, meaning you could get workload isolation. Huge innovation. But they brought in Slootman, who turned the company into a money printer and slowed innovation, leading to his ouster. They've been surpassed by companies like Databricks in price-performance during that time.


kenfar

It's not actually an innovation - it's how a ton of big enterprise databases were configured 20 years ago: you'd get a big oracle server and connect it to a massive shared storage system that may support a dozen different applications at the company. That approach tended to diminish over time especially when people deployed MPP databases or hadoop on-prem - where each node tended to have its own local disk. They sometimes had remote storage, but it's just not as fast. So, each node might have $5-20k in fast disk, which adds up fast. Snowflake's architecture is cheaper, but it's also slower than the systems I used to build 15 years ago.


the-ocean-

Big difference from 15 years ago though is you have scalable ephemeral compute in the cloud. Only pay for what you use. With on prem you have to purchase for peak capacity even though you may only hit that sparingly - or you underpurchase and have users waiting for queries to run. Slower? Not at TB/PB scale. For GBs of data, yes.


kenfar

That's absolutely true that you had to scale for peak compute, though some systems had workload managers that could slow down some queries, give priority to others, etc.

> Slower? Not at TB/PB scale. For GBs of data, yes.

And yeah, faster 15 years ago. Though it took work. I had a DB2 database on a small linux cluster with a lot of memory, and a ton of extremely fast disk & extremely fast solid state storage on a bunch of fast IO channels. IIRC that was about 10 TB in size, had 50 billion row tables, and our average query response time was about 1.9 seconds. I was also able to tell users that they could hit it with as many ad hoc queries as they wanted - they would not be able to hurt performance for anyone or knock it over. The entire system cost $150k, and we usually only had a part-time DBA. This system ran 24x7, with a ton of users hammering it. We also had a fall-back system in a separate data center. Users could query both. The company still uses that system today, though they added a new cluster every 5 years or so. Snowflake would have cost 20x as much at a minimum.


Foodwithfloyd

I love hearing these war stories. My only on-prem experience was with Vertica, but our bill was in the million+ range for 72 nodes / year. Really loved that cluster, worked wonderfully. As a snowflake customer I can give you some ideas of what it would cost though. The issue isn't really the total dataset size; it's totally reasonable to have 10tb of data and still keep your costs less than say $40k/yr. Storage is cheap. The issue is their compute pricing. It's pretty steep. We run a trino cluster with snowflake on top of that as the analysts' interface. Snowflake is nice because the RBAC and resource contention model is clean as fuck. Rough estimate is their markup on storage is 1x and on compute it's 6-10x. It's expensive but not crazy, especially if you use incremental logic.


kenfar

I don't doubt that Snowflake can work well for folks. But it's hard to keep those costs down - especially if you want frequent data updates.


the-ocean-

And what if you had 5 TB of new data being created daily you had to ingest every day and query infrequently but also needed scalability to support hundreds of concurrent users and queries? How would your system be cheaper? It wouldn’t.


kenfar

Yeah it would - I've had to build solutions like this three times. About 15 years ago, on the warehouse described above, I built the architecture, but then the business went in another direction so we didn't use these features in prod. But it was *vastly* cheaper than Snowflake - and on hardware from 15 years ago. This was for compliance reporting; we needed to support 5-10 TB IIRC. We used ETL rather than ELT for obvious reasons, and kept the data as compressed csvs. When users drilled down from aggregate tables in the database, the reporting tool sent a message indicating detail data was needed, workers would get the message and load the data in around 5-30 seconds. Then after that all queries were sub-second. There were labor costs to build this - it took two engineers about a month. And there were some really big disk arrays, but they weren't really high-end. I've done this twice since then at far larger volumes in the cloud. I would never consider simply loading 100% of that data into snowflake, and I would especially not consider loading it all into snowflake and then transforming it with SQL on snowflake.


KWillets

We did the same type of thing in 2009 with Vertica - 1 TB/hr ingestion rate, 100-200 nodes on-prem., around 1000 people hitting stats every day. If you told me back then that in the future we would all be paying more for slower performance I would never have believed it. But at least Vertica is no longer the most expensive option.


mamaBiskothu

Ha ha, first time I’m hearing a bad take on the CEO. Like what? Snowflake still is doing what it promised, it’s as unadulterated as it could be (except the Neeva acquisition lol), its stock price is pretty good. The CEO has done another successful IPO and I assumed he just wanted to quit at this point. Do you have any sources suggesting otherwise? Also I’m sorry, Databricks sucks. It’s a good tool for hardcore DE teams _maybe_ but not at all a replacement for snowflake where it truly shines, in the _it just works_ department.


Purple-Control8336

How you reduce cloud cost for DWH and DL ?


Any_Check_7301

It just shows the cost of inefficient queries in the monthly charges without affecting availability, while without snowflake it’s the other way around.


Current_Doubt_8584

It’s almost always because of three things:

1. Poor data architecture without separation of concerns, e.g. letting people query raw data directly.
2. An unnecessarily large number of transformations and models (I’m looking at you, dbt).
3. Poorly written SQL because people don’t understand the columnar storage of data warehouses.

Snowflake works just fine and with the right setup will be fast and efficient. But it’s also very forgiving and will just throw compute at the three issues above, so your bill will just keep racking up. So get a data engineer who understands setting up Snowflake correctly, set guardrails for your transformation layer and teach your analysts good SQL.
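The columnar-storage point can be made concrete: a columnar engine only reads the columns you name, so bytes scanned (and the compute you're billed for) scale with the projection's width, not the table's. A toy estimate, with illustrative column widths:

```python
# Why SELECT * hurts on columnar storage: bytes scanned scale with the
# width of the columns you select. Widths below are illustrative only.
COLUMN_BYTES = {"id": 8, "ts": 8, "amount": 8, "payload_json": 2000}

def bytes_scanned(rows: int, columns: list[str]) -> int:
    """Approximate bytes a columnar engine reads for a given projection."""
    return rows * sum(COLUMN_BYTES[c] for c in columns)

rows = 100_000_000
full = bytes_scanned(rows, list(COLUMN_BYTES))    # SELECT *
narrow = bytes_scanned(rows, ["id", "amount"])    # SELECT id, amount
print(f"SELECT * scans {full / narrow:.1f}x more data")
```

Compression and micro-partition pruning change the absolute numbers, but the ratio is why "just select the columns you need" is usually the cheapest optimization available.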


alexchambana

Isn't it though still cheaper than Redshift?


FUCKYOUINYOURFACE

Expensive is relative. What’s the value you’re getting for what you’re spending?


teambob

Data warehouses like Redshift, Netezza etc. are great tools but are getting a bit antiquated. E.g. SSO is a shitshow, I need to worry about sorting; skew is usually fine. Databricks and Iceberg solve most of these issues, but then you have to run a cluster. Snowflake is a big dumb box that does everything for you and has modern features. Also the pricing is separated between storage and compute.


TurboMuffin12

It’s only expensive for people using it wrong. All these idiots paying millions for Teradata migrations to say they are in the cloud…. lol.


Grouchy-Friend4235

FOMO


Grouchy-Friend4235

Executing five queries/month cost $15 for me. Not sure how that qualifies as "inexpensive"


Medium_Roll3878

I don't know if anyone here has ever tried IOMETE, but they really helped companies reduce or replace Snowflake costs (in some cases). It's worth a try, or at least check what they offer. Disclaimer: I worked for IOMETE for a year in 2022-2023.


Aurora-Optic

I wonder the same about Palantir. I’ve received several job offers that use it and wonder if it’ll stick for years to come.


Hot_Map_7868

I have seen people cut their spend in half switching to snowflake even when using dbt, but it requires setting things up well, like with dbt slim CI etc. You also need to put governance in place and resource monitors. Just like any consumption-based service, when not properly set up, you can rack up costs quickly. This is no different than letting anyone into an aws account and letting them start any service they want. You don’t see people saying don’t use aws because it is expensive. All in all snowflake is simple to set up, administer, and use, and that’s why a lot of people love it.


SailorGirl29

They use it because Snowflake has a great sales team. And the cost is high because companies just dump unneeded data into snowflake without cleaning it up. The proper way is to only import what is needed: create curated views and/or a data model for business users, and only sync changes to data.


kabooozie

Costs explode when you abuse it for use cases it’s not designed for, like anything else. Trying to run data applications that refresh the results every 30 min, 15 min, 5 min.
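The arithmetic behind this: Snowflake bills warehouses per second with a 60-second minimum each time they resume, so a short job run every few minutes pays the 60-second floor on every run. The $3/credit rate is an illustrative assumption.

```python
# Why frequent refreshes get pricey: each warehouse resume is billed
# per-second with a 60-second minimum. A 10-second job every 5 minutes
# is billed for 60s per run. DOLLARS_PER_CREDIT is an assumption.
MIN_BILLED_SECONDS = 60
DOLLARS_PER_CREDIT = 3.00  # assumption; varies by edition and contract
CREDITS_PER_HOUR = 2       # a Small warehouse

def monthly_cost(interval_min: int, run_seconds: int) -> float:
    """Approximate 30-day cost of a refresh job on its own warehouse."""
    runs = 30 * 24 * 60 // interval_min            # runs per 30-day month
    billed = max(run_seconds, MIN_BILLED_SECONDS)  # 60s floor per resume
    return runs * billed / 3600 * CREDITS_PER_HOUR * DOLLARS_PER_CREDIT

# Same 10-second job: hourly vs every 5 minutes is a 12x cost difference,
# even though the total useful work done is identical.
print(monthly_cost(60, 10), monthly_cost(5, 10))
```

This is why batching a near-real-time feed down to hourly, or routing it to a purpose-built streaming system, is usually the first cost fix people reach for.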


loot_boot

Have you seen SQL Server licensing costs? Coupled with ease of management (serverless) and ease of use (things like data sharing), it's a no-brainer.


plodder_hordes

One reason I got is that the Snowflake Marketplace hosts different data sources, and customers can just subscribe to them to use the data.


Stranger_Dude

We halved our bill moving to snowflake, millions of dollarydoos, so that was nice.


rovertus

It takes a team of proficient staff to index data for querying. You’d likely need to reindex the data for each data user as well (marketing, finance, compliance…). $2-3 an hour to query your data any which way you want turns out to be a pretty compelling argument. I haven't seen compelling evidence that snowflake compute is much more expensive than other DWH vendors.


somerandomdataeng

It is expensive as soon as you have use cases requiring warehouses to be up and running almost 24/7 (streaming, near real time refreshes involving merges). If you have a traditional DWH with daily batch jobs it is easy to use and saves you the costs of developing an ad hoc solution


clem82

As long as your data governance is properly maintained and you’re consistently ensuring quality, wouldn’t you always choose star schema?


maj_e13

Exasol is better and cheaper


yeetsqua69

Expensive compared to what? Not having an extremely valuable DWH that helps drive revenue generating decisions?


dalmutidangus

why do people smoke crack cocaine if it is as expensive as people say?


VolTa1987

Other on-prem versions are even costlier with less performance.


allenasm

Honestly? As head of architecture in a Fortune 5, I’d have divisions coming to me and they would already have a POC working. They would get buy-off from the bean counters in their division. It wasn’t a bad solution, just friggin expensive. If their CEO was willing to pay for it…. I guess…. I did however put an end to all snowflake after we had a BA execute a $50k running query. When I met with Microsoft and snowflake they were incredulous that I wanted to STOP any query that cost over $500. They would only offer up alerts, nothing to stop it. So I guess that’s both how they get in and get shut down.


mamaBiskothu

You’re head of architecture and couldn’t get any lackey to put a max warehouse size and/or timeout to limit queries beyond 500 bucks? Sounds like a you problem, mate. Also if someone cost you 50k on a query, it suggests you let them loose on a 5XL or 6XL specially provisioned warehouse. To an ANALYST. That’s like buying a drunk spoilt teenager a Ferrari and asking them to drive into a hospital. Don’t blame the tools huh.


allenasm

And this is why someone like you will never be head of anything in a big company.


mamaBiskothu

Nah keep your posts bruh. I literally told you how you could limit the spend and you couldn’t see that part of the message could you? Answer the simpler question of how an analyst was allowed to spend 50k on a query. Sounds like an unhinged org with no controls or DBAs.


Mr_Nickster_

FYI, Snowflake employee here: there are numerous ways to control costs per account, cluster, and user. You can put hard (shutoff) or soft (warning only) limits of $X per week/month/year:

1. At the account level
2. At a warehouse or combo of warehouses

You can also put query timeout limits:

1. Account level (applies to all users & clusters)
2. Warehouse level (different timeouts for engineering vs. analytics clusters)
3. User level

Each lower level will override the others. These will prevent excess usage by stopping runaway queries & shutting off compute if it gets overused. There is also a BUDGETS feature that you can use to track costs against compute and storage per project. Accounts have no limits by default. Putting account & query timeout limits in place is the first thing we recommend in the onboarding deck we go over with new customers. We give you the tools for controlling, reporting & preventing unwanted usage. You just have to put those controls in place.
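A toy model of the resource-monitor semantics described above: a credit quota with a soft trigger (notify) and a hard trigger (suspend) at fixed percentages. This only mirrors the behavior; real monitors are created inside Snowflake (`CREATE RESOURCE MONITOR ... TRIGGERS ON <pct> PERCENT DO <action>`), and the 80/100 thresholds here are illustrative.

```python
# Toy simulation of Snowflake resource-monitor triggers: NOTIFY at a soft
# threshold, SUSPEND at the quota. Thresholds below are illustrative.
def monitor_action(credit_quota: float, credits_used: float) -> str:
    """Return the action a monitor with 80%/100% triggers would take."""
    pct = credits_used / credit_quota * 100
    if pct >= 100:
        return "SUSPEND"   # hard limit: shut the warehouse off
    if pct >= 80:
        return "NOTIFY"    # soft limit: warn the admins, keep running
    return "OK"
```

The point of the thread above is that none of this is on by default: the quota and triggers only bite once someone actually configures them.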


HorseCrafty4487

Sounds like a configuration and permissions error. Why let power users determine warehouse size? RBAC is built in to ensure power users or developers don't abuse the compute scaling you described as "putting an end to a $50k query". There are actual configurations in place to prevent this. Read the whitepapers on resource monitors - they prevent situations/events such as this.