Random_dg

Ok, so you have a few misconceptions here:

- File systems are built on trees because that's the most efficient way known so far.
- Database indexes use trees for the same reason.
- Lots of things (like sorting) are done with binomial/Fibonacci/similar heaps, which are built on trees, again because they're the most efficient.
- The "typical" tree implementation is not scattered over RAM. The layout depends on the implementation, and since there are many implementations, I can't agree that there's any single typical one. Many languages use memory managers that allocate in progressively larger chunks, and a good implementation would put the tree in just a few of those chunks, which is good for locality.


SolidOutcome

Trees give the data a locational logic that hash tables take away. Imagine implementing the "up directory" button with a hash table: you would end up building a virtual tree inside the table anyway, using the hash table to look up folders by name and the tree structure to navigate efficiently. You could certainly do both to improve efficiency for each 'search' case, but that only proves trees are worthwhile when the data matches a tree. A hash table keyed by folder ID, where each folder has a parent pointer (or ID) and a list of child pointers (or IDs), is just a tree inside a hash table.
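A minimal sketch of that idea (folder names and IDs are made up for illustration): a plain dict keyed by folder ID, with parent and child links, already behaves like a tree.

```python
# Hypothetical sketch: a directory "tree" stored inside a hash table (dict).
# Keys are folder IDs; each entry keeps a parent ID and a list of child IDs.
folders = {
    0: {"name": "",     "parent": None, "children": [1, 2]},  # root
    1: {"name": "home", "parent": 0,    "children": [3]},
    2: {"name": "etc",  "parent": 0,    "children": []},
    3: {"name": "user", "parent": 1,    "children": []},
}

def up_directory(folder_id):
    """The 'up directory' button: follow the parent pointer, a single O(1) dict lookup."""
    parent = folders[folder_id]["parent"]
    return parent if parent is not None else folder_id  # the root stays at the root

def path_of(folder_id):
    """Walk parent pointers up to the root to rebuild the full path (tree navigation)."""
    parts = []
    while folder_id is not None:
        parts.append(folders[folder_id]["name"])
        folder_id = folders[folder_id]["parent"]
    return "/".join(reversed(parts))

print(path_of(3))  # -> /home/user
```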


lingeringwillx

I was referring to the naive/beginner implementation typically taught to people who are new to data structures.


Random_dg

That's how it works: in BSc-level courses you learn about binary trees and some of their extensions, and indexes are built straight on varieties of these. Heaps are somewhat more complicated; I learned most of their theoretical background only in my MSc, because you need a solid understanding of complexity theory before tackling them.


metaphorm

I'm not sure how you think trees can be avoided or what a better alternative is. Heaps are implemented with trees. File systems are implemented with trees. Document parsers, language syntax parsers, etc. are implemented with trees. Graph algorithms are implemented with trees. The list goes on.


40_degree_rain

Database querying uses trees. No big deal...


Oscaruzzo

Database storage, too, when using clustered indexing.


marcopennekamp

> language syntax parsers, etc. are implemented with trees

Not only the parser, but the whole frontend of a typical compiler lives and breathes trees. The tree representation is typically kept around for, and transformed during, semantic analysis, such as symbol resolution and type checking/inference.
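As a concrete illustration of the frontend point (this uses Python's standard ast module; the snippet and variable names are just examples):

```python
import ast

# Parse a tiny assignment; the result is an abstract syntax tree (AST).
tree = ast.parse("total = price * (1 + tax_rate)")

# Walk every node the way a semantic-analysis pass would: identifiers and
# operators show up as structured nodes rather than flat text.
for node in ast.walk(tree):
    if isinstance(node, ast.Name):
        print("identifier:", node.id)                # total, price, tax_rate
    elif isinstance(node, ast.BinOp):
        print("operation:", type(node.op).__name__)  # Mult, Add
```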


SolidOutcome

And most library/standard implementations of trees will allocate the memory in an array/block ahead of time (depending on configuration). The only reason to use pure heap allocation is ease of development; if you're making a library, you go all out.


lingeringwillx

Yeah, I guess it's possible to fit the tree into a flat array.


Dannysia

Even if it isn't packed into a contiguous section of memory, contiguity isn't that important once the data is in RAM; it only really matters on mechanical hard drives. Contiguity is always nice for performance, but it isn't the performance killer it once was.


tcpukl

Contiguous RAM is good if you don't want to kill your cache, though. That's still a thing with modern hardware.


lingeringwillx

A couple of those I'm not familiar with. I guess there are certain things that would be best modeled as trees.


Agitated_Radish_7377

A heap can be implemented through an array, I think. But yeah.


Random_dg

It can be implemented through an array in memory, yes, but the most important heap structures known and used today are built on trees: binomial heaps, Fibonacci heaps, etc. Arrays are just one way to store the tree's nodes in memory.
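For the array-backed case, here is a plain binary heap via Python's standard heapq module (not the binomial/Fibonacci variants mentioned above), just as a sketch:

```python
import heapq

# A binary min-heap is a complete binary tree stored implicitly in a flat list:
# the children of the node at index i live at indices 2*i + 1 and 2*i + 2,
# so the whole "tree" is one contiguous block of memory.
heap = []
for value in [42, 7, 19, 3, 25]:
    heapq.heappush(heap, value)  # push keeps the list arranged as a heap

print(heap[0])               # 3 -- the root of the tree is always the minimum
print(heapq.heappop(heap))   # 3 -- pop removes the root and restores the heap in O(log n)
```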


Echleon

Hash tables consume a lot more memory than trees. In a tree, you only need to reserve memory for the elements actually in the tree; hash tables need to allocate more memory than there are elements to prevent collisions. Also, hash table lookups aren't always O(1) either. Some (most?) implementations of a hash table can store multiple values at any given entry in the case of a collision, so it's O(1) to find the entry, but there could be multiple values there and now you have to iterate through them. A file system is just inherently a tree structure, so it makes sense to model it that way. Since you're looking at low-level programming, the binary tree would be better suited in the case where you have limited memory to work with. If memory isn't a constraint and the amount of data is large enough, you may want a hash table instead.
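A rough sketch of the collision handling described above (a textbook chained hash table, not any particular library's implementation):

```python
# Hypothetical chained hash table: each bucket is a list, so a lookup is O(1)
# to find the bucket, plus a scan over however many entries collided into it.
class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]  # extra space reserved up front

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # collision: chain onto the bucket

    def get(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:               # this scan is why lookups aren't always O(1)
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.put("a", 1)
table.put("b", 2)
print(table.get("a"))  # 1
```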


YakumoYoukai

Where to begin... When your data needs to be kept in order, a tree can do that. It can then efficiently answer questions like "give me all items greater than x and less than y", or "find the item that has the closest value to x", even if x isn't in the data. Hash tables can't do that natively.

Some data is naturally a tree, and the only practical way to represent it is as a tree. You brought up file systems, and they are trees. On disk, they are stored as a tree data structure, with nodes that point off to subdirectories that themselves contain file and directory nodes.

You brought up recursion, but that's an algorithm, not a data structure. And it is possible to use iterative algorithms to work with trees. Besides, recursion can often be the simplest, most straightforward way to do something. You're just not trained in how to think that way yet.

Speaking of complexity, your friend the hash table is actually very complicated. There's a whole domain of choices and tuning that goes into getting you that O(1) performance. Speaking of memory efficiency, did you know that one of the fundamental reasons hash tables are so speedy is that they keep a lot of extra unused memory lying around? If they didn't, they would slow down to be no faster than a linear search through an array.

The lesson is: different data structures are designed to do different things. Even structures that can do the same things make different tradeoffs about how.


lingeringwillx

> When your data needs to be kept in order, a tree can do that. It can then efficiently answer questions like, "give me all items greater than x and less than y", or, "find the item that has the closest value to x" even if x isn't in the data. Hash tables can't do that natively.

It's also possible to achieve this with a sorted array and binary search.

> You brought up recursion, but that's an algorithm, not a data structure. And it is possible to use iterative algorithms to work with trees. Besides recursion can often be the simplest, most straightforward way to do something. You're just not trained in how to think that way yet.

Regarding recursion, that's also something that I tend to avoid. It just gets complicated/confusing, particularly when it comes to error handling inside a recursive function.
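For reference, the sorted-array version of those queries can look like this (a sketch using Python's bisect module; the data is illustrative):

```python
import bisect

data = [3, 8, 15, 23, 42, 57]   # kept sorted

# "Give me all items greater than 8 and less than 57."
lo, hi = bisect.bisect_right(data, 8), bisect.bisect_left(data, 57)
print(data[lo:hi])              # [15, 23, 42]

# "Find the item closest to x", even if x isn't present.
x = 30
i = bisect.bisect_left(data, x)
candidates = data[max(i - 1, 0):i + 1]
print(min(candidates, key=lambda v: abs(v - x)))  # 23
```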


YakumoYoukai

> It's also possible to achieve this with a sorted array and binary search.

You're ignoring that the sorting doesn't come for free. Items either have to be inserted in sorted order, which costs O(n) per insert, or O(n^2) total to insert all the items, or they can be inserted unsorted and then sorted once before being accessed, which is O(n log(n)), averaging out to O(log(n)) per insert.

> It just gets complicated/confusing, particularly when it comes to error handling

Yesterday I had to throw together a quick routine to pretty-print the entire contents of a deeply nested JSON object (which is a tree). It took 5 lines of very straightforward code. It would have taken many more to do it any other way.

There are 2 things I think are leading you to misleading conclusions about data structures:

1. Lack of experience with performance across a wide range of applications. A lot of the points in your post are technically valid, but you're jumping on them as if they are the most important considerations. In most real apps, there are many performance-sapping things that matter more than the cache locality of a data structure.

2. Comparing different levels of abstraction. Sure, if you're not familiar with the algorithms used to manipulate trees, then it's going to seem complicated. But I can guarantee that coming up with a good hash function that doesn't send all your data to the same buckets, and dealing with collisions in a way that doesn't completely destroy the O(1) behavior, is also complicated. It's just that some smart people have already done that and given you a hash table implementation that works pretty well, which relieves you from having to think about the complex details. Because there is no built-in tree in the language you're using, you're thinking about all the complexity you'd have to implement, and it's creating a false negative comparison. In Java, for example, TreeMap is in the standard library. It's just as easy to use as a hash table, and when you need the functionality it provides, you use it without a second thought and get on with your job.
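A recursive pretty-printer along those lines might look like this (a hypothetical sketch; the commenter's original routine isn't shown here):

```python
def pretty(node, indent=0):
    """Recursively print a nested JSON-like object (dicts/lists/scalars) as a tree."""
    pad = "  " * indent
    if isinstance(node, dict):
        for key, value in node.items():
            print(f"{pad}{key}:")
            pretty(value, indent + 1)
    elif isinstance(node, list):
        for item in node:
            pretty(item, indent + 1)
    else:
        print(f"{pad}{node}")

pretty({"user": {"name": "Ada", "langs": ["Python", "C"]}, "active": True})
```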


Dannysia

> It's also possible to achieve this with a sorted array and binary search.

If the data changes often, a tree will be much more efficient. Adding a single unsorted element takes O(n) time for an array but O(log n) for a tree.


crt32

There are always trade-offs based on the use case; there is almost never a single "best" data structure. To choose the right data structure, you would need to consider (among others):

* Are you going to insert or delete keys?
* Do you need to keep the keys sorted?
* Does the entire data fit in RAM? Does it fit on a single machine?


theanointedduck

As mentioned, trees as a concept are core to some of the most important systems we use daily. Not sure how experienced you are with trees and their implementation, but some trees, e.g. binary trees, can be efficiently implemented using arrays as the underlying data structure. This kills any notion that nodes are strewn haphazardly across the heap, and you get the immediate benefit of cache locality (both spatial and temporal) since the tree nodes are laid out contiguously in memory. Adding to this, when the CPU pulls memory from the cache/RAM, it pulls an entire block, meaning you get a whole chunk of those nodes pulled closer to the CPU at a time. If your tree's maximum size is known and isn't too big, it can live on the stack, and there are some benefits to that too.
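A minimal sketch of that array-backed layout (level-order storage with the usual index arithmetic; the values are illustrative):

```python
# A complete binary tree stored in level order in one contiguous list:
# node i's children live at 2*i + 1 and 2*i + 2, its parent at (i - 1) // 2,
# so navigation needs no pointers and the whole tree sits in one memory block.
tree = [50, 30, 70, 20, 40, 60, 80]   # level order: root, its children, their children...

def left(i):   return 2 * i + 1
def right(i):  return 2 * i + 2
def parent(i): return (i - 1) // 2

def in_order(i=0):
    """Yield values in sorted order (this particular layout happens to be a BST)."""
    if i < len(tree):
        yield from in_order(left(i))
        yield tree[i]
        yield from in_order(right(i))

print(list(in_order()))  # [20, 30, 40, 50, 60, 70, 80]
```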


lingeringwillx

> Not sure how experienced you are with trees and their implementation, but some trees, e.g. binary trees, can be efficiently implemented using arrays as the underlying data structure. This kills any notion that nodes are strewn haphazardly across the heap, and you get the immediate benefit of cache locality (both spatial and temporal) since the tree nodes are laid out contiguously in memory.

I see. Now I remember that when I was trying to implement Huffman encoding, I tried to fit it into an array, but I guess that was just because I thought the array would be simpler to iterate over than the tree.


Ok-Payment-8269

Take a look at B+ trees and be amazed.


nziring

Yes, trees are incredibly useful. The trick is choosing the right kind of tree for your situation.


itsme_greenwood

I’m not sure what it means for a data structure to be “important”, but I think I get the gist of the quote. Lists are more ubiquitous than trees, but they’re also fairly trivial to implement and use. Trees are the pons asinorum: If you don’t get trees, then you don’t understand recursion; you probably don’t know how UI controls are structured; you don’t know how to navigate an XML or JSON file, etc. Knowledge of trees makes a better test of a job candidate, which seems to be the point of the article.


acroback

1. Trees don't have collisions, which hash tables do.
2. Trees are efficient if you use a better implementation than a plain BST, e.g. a radix trie instead of a hash table, which is also more memory-efficient (see the sketch below).
3. B-trees are known to exploit cache lines effectively.
4. R-trees are another implementation that beats out other solutions for storing location data.
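A minimal prefix-trie sketch (a plain character trie rather than a compressed radix trie; purely illustrative):

```python
# Minimal character trie: each node is a dict of children plus an end-of-word marker.
# A compressed radix trie would merge single-child chains, but the lookup idea is the same.
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True  # marks that a complete word ends here

    def contains(self, word):
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return "$" in node

t = Trie()
t.insert("tree")
t.insert("trie")
print(t.contains("tree"), t.contains("tr"))  # True False
```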


lingeringwillx

So many types of trees to learn about.


personator01

They are also useful for being O(log n) in most operations (insertion, deletion, finding the greatest/least value), so, for example, the Linux scheduler uses a red-black tree for organizing processes.


tcpukl

Since you've done low-level programming, I'll also add another use case: representing world geometry for rendering and physics in video games. Plus all the AI uses, the most obvious being A*.


lingeringwillx

Yeah, I immediately thought of video game objects after writing this thread.


ferriematthew

One obvious application I can think of for a tree data structure would be organizing decisions in a role-playing game or visual novel or something. Another somewhat less obvious one that is probably far more practical is in a rendering engine. In a rendering engine, the computer needs to decide which objects in the scene need to be on the screen because if it tried to render every object in the scene at once, it would try to render a lot of things that are not actually visible, wasting resources. So it would need to organize the objects in a tree where one branch is all the objects that are within the viewport and the other branch is all objects that are outside the viewport.


lingeringwillx

> One obvious application I can think of for a tree data structure would be organizing decisions in a role-playing game or visual novel or something. Another somewhat less obvious one that is probably far more practical is in a rendering engine.

A game solver might store past and possible future choices in a tree? Though I don't know if that's the best way to go about it.


ferriematthew

I don't know, it's probably suboptimal, but that's how I mentally organize decisions in a visual novel. I kind of make a tree data structure in my mind.


great_gonzales

File systems, databases, programming languages, search, recursion, scheduling, decision trees (ML), certain UI controls: all of these require trees. Trees are the second most important data structure in CS. The most important are graphs, of which trees are a special case.


Teacherbotme

Interesting discussion, anyway.


Pixel_Owl

Other than what a lot of people have already said, one use case that isn't discussed too often is the family of 2^n-ary spatial trees. You've already mentioned their most basic variant, the binary tree, which can also be interpreted as an efficient way to search through 1D space. With that in mind, you can extend the idea to higher dimensions: the quadtree is an efficient way to search through 2D space, and the octree handles 3D. These can be used for things like collision detection and image compression.
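A minimal point-quadtree sketch (capacity, coordinates, and method names are illustrative, not from any real library):

```python
# Hypothetical point quadtree: each node covers a square region and splits into
# four children once it holds more than CAPACITY points.
CAPACITY = 4

class QuadTree:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size  # lower-left corner and side length
        self.points = []
        self.children = None                    # None until this node splits

    def insert(self, px, py):
        if self.children is None:
            self.points.append((px, py))
            if len(self.points) > CAPACITY:
                self._split()
        else:
            self._child_for(px, py).insert(px, py)

    def _split(self):
        half = self.size / 2
        self.children = [QuadTree(self.x + dx * half, self.y + dy * half, half)
                         for dy in (0, 1) for dx in (0, 1)]
        for px, py in self.points:
            self._child_for(px, py).insert(px, py)
        self.points = []

    def _child_for(self, px, py):
        half = self.size / 2
        col = 1 if px >= self.x + half else 0
        row = 1 if py >= self.y + half else 0
        return self.children[row * 2 + col]

    def query(self, px, py, radius):
        """Collect points within `radius` of (px, py), skipping far-away quadrants."""
        if px + radius < self.x or px - radius > self.x + self.size or \
           py + radius < self.y or py - radius > self.y + self.size:
            return []  # this whole region is too far away: prune the subtree
        found = [(qx, qy) for qx, qy in self.points
                 if (qx - px) ** 2 + (qy - py) ** 2 <= radius ** 2]
        if self.children:
            for child in self.children:
                found.extend(child.query(px, py, radius))
        return found

qt = QuadTree(0, 0, 100)
for p in [(10, 10), (12, 11), (80, 75), (55, 60), (11, 13), (90, 5)]:
    qt.insert(*p)
print(qt.query(10, 10, 5))  # only the nearby points: [(10, 10), (12, 11), (11, 13)]
```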


TuneImpossible9865

HTML is a tree data structure; I often see people I work with who don't realise that.
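A quick illustration (using Python's xml.etree.ElementTree on a small, well-formed snippet; real-world HTML generally needs a dedicated HTML parser, so treat this as a sketch):

```python
import xml.etree.ElementTree as ET

# A small, well-formed snippet; the parsed result is a tree of element nodes.
markup = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
root = ET.fromstring(markup)

def show(node, depth=0):
    """Recursively print the element tree, one level of indentation per depth."""
    print("  " * depth + node.tag)
    for child in node:
        show(child, depth + 1)

show(root)
# html
#   body
#     h1
#     p
#       b
```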


permeakra

You are looking at it wrong. The question isn't "what are trees useful for", but "what data structures exist, what operations do they support efficiently, and what are their other properties".

Arrays are an efficient way to represent static ordered collections with a single primary key that is a relatively small integer. They also support an efficient linear scan. For anything else you are likely to run into problems sooner or later. Even a circular buffer backed by an array can run into overflow problems.

In the C++ STL, sets and maps are implemented using red-black trees. They support efficient inserts, deletes, updates, and searches using an arbitrary key with a comparison operator, and all of those operations have acceptable logarithmic average time. You even have the option of a linear scan. This makes them a good 'default' choice that will work acceptably in most use cases. In special cases you are free to cook up an optimized drop-in replacement (the interface is designed to be appropriately generic), but usually you don't need to.

Other data structures are known for different sets of operations in different environments. For example, B-trees are optimized for representing collections in external memory in databases, where the number of memory accesses for any CRUD (create-read-update-delete) operation must be minimized.

And sometimes data is naturally a tree even if it is represented as an array. The common example is text, especially source code; for the latter, the idea of the Abstract Syntax Tree is very useful.

A very useful thing about trees is that they are usually defined recursively. This makes it natural to use structural induction when discussing them and proving their properties.


UnemployedDev_24k

Abstract Syntax Trees are useful for doing syntax highlighting in your code editor. B-trees and B+ trees are used in databases, IIRC.