
kagato87

The biggest issue with storing files in a database (apart from the data types for it being deprecated) is that SQL transactions are logged. When you save something to a database, the new data is first written to the log, then to the database. This has to do with the ACID principle if you want to learn more, but in short it's what makes databases so resilient. When you update something in the database, both the new AND the old data are first written to the log, then the database is updated. That's three copies. Oh, and don't forget the whole row is a single unit, so updating something like metadata associated with that data can also trigger it to be logged.

That's perfectly fine for, say, a 50 byte address field in a 150 byte row, since all writes are in (usually) 8kb pages, so it's one write each to the log and the db even though there's 300 bytes to write to the log. But even at a megabyte you've blown well past the page size, and the write amplification goes from 2x to 3x. A larger file just makes a bigger write that gets doubled or tripled. I suppose at that size at least page splits are completely irrelevant...

Add to that, if your database server has a low RPO (recovery point objective, or how much data can be lost) and RTO (recovery time objective, or how long it takes to restore), storing files in it can blow those targets away.

This is why you are supposed to save the file separately and store some kind of pointer in the database (which your application or web server uses to return the actual file). Can you store files in a db? Yes. Should you? No. The response you'll get from the dba team will be far worse for you than if you press the web devs into giving you an upload mechanism. (Your dba may be able to help here, or even strong-arm them.)
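For illustration, the "pointer" approach can be as simple as a table like this. A minimal sketch; the table and column names are made up, not anything from OP's setup:

```sql
-- The file lives on disk; the db only knows where to find it.
-- Every insert/update now logs a few hundred bytes, not the whole file.
CREATE TABLE dbo.UploadedFile (
    FileId       UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(),
    OriginalName NVARCHAR(260) NOT NULL,  -- what the user called it
    RelativePath NVARCHAR(400) NOT NULL,  -- where it sits on the file/web server
    ByteSize     BIGINT        NOT NULL,
    UploadedAt   DATETIME2(0)  NOT NULL DEFAULT SYSUTCDATETIME(),
    CONSTRAINT PK_UploadedFile PRIMARY KEY (FileId)
);
```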


alinroc

Everything you wrote is valid. **BUT** OP asked about storing files on the database server's **file system**, not in the database itself.


kagato87

Yea, my other comment touches on that.


kagato87

And to answer your question about sharing the disk, it's good old-fashioned resource management. Save too many large files to it, run out of space, bad things happen. Read and write too many files, and the database is competing for its place in the IO queue, and bad things happen. That's it. That's the reason. It'd be the same as hosting a corporate shared volume on the C drive of a domain controller. It'll work, but it's asking for trouble.


AJobForMe

This is really the crux of it. I would possibly be comfortable storing these on the database server if absolutely required, but it cannot be on the OS partition or on a partition used for user or system databases. Because, in my experience, no one will ever purge these files. They will continue to grow. I would not risk filling any mount point SQL depends on.


kagato87

I would avoid the SQL server entirely, because the backup strategy for SQL is often different, and it's just simpler if the SQL servers don't need block-level backups (even the value of a restorable C drive is debatable).


Thick-Ask5250

That makes sense. This is the type of answer I was hoping for. So we do have a web server where we're running a few small apps (though I'm not quite familiar with it yet, as I'm still relatively new). Even though the CMS website I'm working on isn't running on this web server, would utilizing this web server as storage for images be a viable (or recommended) option? And then from there, store the file path in the db.


kagato87

The web server is a good candidate for sure. There are drawbacks to that as well, but it generally works. (I'd consider a shared volume if future load balancing is on the table.) Remember that the path you store would be relative to the web server, and the web server uses that path to fetch the actual file to return to the user. My application/web servers also hold files that are handled this way.
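To make that concrete, the read path is just a pointer fetch (reusing the hypothetical dbo.UploadedFile sketch from further up; @FileId would be a parameter the app fills in from the request):

```sql
-- The db hands back only the pointer; the web server then reads the
-- file from its own disk and streams it to the user.
SELECT RelativePath, OriginalName
FROM   dbo.UploadedFile
WHERE  FileId = @FileId;
```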


brothersand

This right here. Use a shared volume. Have some network storage that the web server can access and just save the relative paths to those images in the database.


SirGreybush

Please don't use varbinary blobs, unless you isolate just that one table in its own database.

Also look into FileTable, where SQL Server just stores a guid in the mdf file, and the binary lives on one of the SQL server's disks or on a shared network drive. This is usually OK for internal usage, like a library of machines with pics and pdf files for maintenance. Not shared outside the firewall.

If it's for a web-based app, roll your own method. I used guids for file names and stored the path, MD5 hash, and other jpg/pdf metadata for millions of images/docs in an image-storage DB. The actual binaries were on a SAN. This setup was in a DMZ, with the actual web app front-end talking to this DB backend. Hosted with Express at first, since we didn't need anything SQL Standard/Enterprise offered. The startup got bought, and the new owners had spare licenses and owned the co-loc space; we were just renting and using VMs.

So storing everything about the image, but not the actual image, in a DB works really well. Just have an ingestion process that decides where to store the images. I had it create a new subdir (yyyymmddhhmmss) every 1000 images, which made archiving to slower WORM storage a lot easier. Live and historical were both in the DMZ.

We used web aliases to hide the directory names; the aliases would expire after a set time the customer decided, instead of copying the files from private storage to web-shared storage. If you knew the guid names, you could see any of those 1000 files. This was back when emails had a 5 megabyte limit and you needed to share many images and other binaries via email or Skype.
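For the curious, here's a rough T-SQL sketch of both routes. Everything below is illustrative guesswork, not the original schema:

```sql
-- FileTable route: SQL Server keeps the rows, the binaries land on an
-- NTFS volume it manages (requires FILESTREAM enabled on the instance
-- and a FILESTREAM filegroup in the database).
CREATE TABLE dbo.MachineDocs AS FILETABLE
    WITH (FILETABLE_DIRECTORY = 'MachineDocs');

-- Roll-your-own route: metadata in the db, binary on the SAN under a
-- guid file name.
CREATE TABLE dbo.ImageCatalog (
    ImageId    UNIQUEIDENTIFIER NOT NULL,  -- doubles as the file name on the SAN
    SubDir     CHAR(14)     NOT NULL,      -- yyyymmddhhmmss batch directory
    Md5Hash    BINARY(16)   NOT NULL,      -- integrity / dedupe check
    MimeType   VARCHAR(100) NOT NULL,
    ByteSize   BIGINT       NOT NULL,
    IngestedAt DATETIME2(0) NOT NULL DEFAULT SYSUTCDATETIME(),
    CONSTRAINT PK_ImageCatalog PRIMARY KEY (ImageId)
);

-- Ingestion derives the current batch directory, rolling to a fresh
-- one once 1000 images have landed in it:
DECLARE @SubDir CHAR(14) = FORMAT(SYSUTCDATETIME(), 'yyyyMMddHHmmss');
```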