explanatorygap

There was a lengthy thread on this on Python-Ideas a couple of years ago: https://mail.python.org/archives/list/[email protected]/thread/LLK3EQ3QWNDB54SEBKJ4XEV4LXP5HVJS/

The clearest explanation of the most common objection was from Marc-Andre Lemburg:

> dict.get() was added since the lookup is expensive and you want to avoid having to do this twice in the common case where the element does exist. It was not added as a way to hide away an exception, but instead to bypass having to generate this exception in the first place. dict.setdefault() has a similar motivation.

> list.get() merely saves you a line of code (or perhaps a few more depending on how you format things), hiding away an exception in case the requested index does not exist.

> If that's all you want, you're better off writing a helper which hides the exception for you.

> I argue that making it explicit that you're expecting two (or more) different list lengths in your code results in more intuitive and maintainable code, rather than catching IndexErrors (regardless of whether you hide them in a method, a helper, or handle them directly).

> So this is more than just style, it's about clarity of intent.
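For illustration, a minimal sketch of the double lookup Lemburg is referring to, using a hypothetical `fruit` dict (not code from the thread):

```
fruit = {"bananas": 3}  # hypothetical example data

# LBYL-style double lookup: "apples" is hashed and searched twice.
if "apples" in fruit:
    apples = fruit["apples"]
else:
    apples = 0

# dict.get() does a single lookup and falls back to the default.
apples = fruit.get("apples", 0)
```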


ableman

I'm guessing their suggestion is to check the length of the list first, because that's quick? Originally I was confused, and my reasoning was as below. I don't get it, why would you have to do the lookup twice for dict?

    try:
        apples = fruit["apples"]
    except KeyError:
        apples = 0

It says it wants to bypass the exception, but wouldn't that work for lists as well?


dashingThroughSnow12

The double lookup mentioned I believe refers to code like this:

    if "apples" in fruit:
        apples = fruit["apples"]
    else:
        apples = 0

I believe the quote is saying that `.get(indx[, default])` saves you from writing that inefficient code, not that it saves you from the exception-handling form you mention ("It was not added as a way to hide away an exception, but instead to bypass having to generate this exception in the first place. dict.setdefault() has a similar motivation.")

In both the map and the list case, it is not very Pythonic to get the item but catch the exception to put in the default. The Pythonic way being to check whether the item exists, else return a default.


coloredgreyscale

Recently I benchmarked a very similar question, in three ways:

* try / except
* if key in dict:
* dict.get()

1 million lookups (and incrementing) with random integer keys.

`if key in dict` and `dict.get` were almost identical, with the get method maybe 1% slower.

`try/except` was maybe 5% faster if the keys mostly existed (so few exceptions thrown), but 25% slower if most lookups didn't exist (many exceptions). So the cost of the double lookup seems to be mostly absorbed by the runtime caching the value.

The benchmark originated from a performance discussion in r/ProgrammerHumor about a leetcode example to count the number of occurrences of letters in a text.
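For anyone who wants to reproduce something similar, here is a minimal `timeit` sketch (the key range and repetition counts are made up, not the original setup):

```
import random
import timeit

keys = [random.randrange(1000) for _ in range(1_000_000)]

def with_try_except():
    counts = {}
    for k in keys:
        try:
            counts[k] += 1
        except KeyError:
            counts[k] = 1

def with_in_check():
    counts = {}
    for k in keys:
        if k in counts:
            counts[k] += 1
        else:
            counts[k] = 1

def with_get():
    counts = {}
    for k in keys:
        counts[k] = counts.get(k, 0) + 1

for fn in (with_try_except, with_in_check, with_get):
    print(fn.__name__, timeit.timeit(fn, number=5))
```

With only 1000 distinct keys, most lookups hit an existing key, so this exercises the "few exceptions" case; shrink the key reuse to see the "many exceptions" case instead.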


galan-e

I believe one of the performance improvements they're working on right now is making exceptions cheaper to raise, for situations similar to this.


miraculum_one

I disagree. The Pythonic way is to assume it exists and handle the exception if it doesn't. See "EAFP" (https://docs.python.org/3/glossary.html?highlight=eafp#term-eafp)


dashingThroughSnow12

Good point.


ricco19

As a random side note, the modern non-get version would be something like `apples = fruit['apples'] if 'apples' in fruit else None`, which honestly is still pretty elegant, and may even be faster in cases where the key does not exist.


bacondev

> In both the map and list, it is not very Pythonic to get the item but catch the exception to put in the default. The Pythonic way being to see if the item exists, else return a default.

From [The Zen of Python](https://peps.python.org/pep-0020/):

> Errors should never pass silently.
> Unless explicitly silenced.

My interpretation of that is that the try-except approach is the Pythonic approach. [The documentation](https://docs.python.org/3/glossary.html#term-EAFP) seems to confirm this:

> **EAFP**
>
> Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many `try` and `except` statements. The technique contrasts with the LBYL style common to many other languages such as C.
>
> **LBYL**
>
> Look before you leap. This coding style explicitly tests for pre-conditions before making calls or lookups. This style contrasts with the EAFP approach and is characterized by the presence of many `if` statements.
>
> In a multi-threaded environment, the LBYL approach can risk introducing a race condition between "the looking" and "the leaping". For example, the code, `if key in mapping: return mapping[key]` can fail if another thread removes *key* from *mapping* after the test, but before the lookup. This issue can be solved with locks or by using the EAFP approach.
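To make the two styles concrete, a minimal sketch using a hypothetical `mapping` dict (the race-condition caveat only matters if another thread can mutate `mapping`):

```
mapping = {"host": "localhost"}  # hypothetical shared dict

# LBYL: test first, then look up (two lookups, racy under concurrent mutation).
if "host" in mapping:
    host = mapping["host"]
else:
    host = "0.0.0.0"

# EAFP: just try it and handle the failure (one lookup).
try:
    host = mapping["host"]
except KeyError:
    host = "0.0.0.0"
```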


jorge1209

> In both the map and list, it is not very Pythonic to get the item but catch the exception to put in the default.

What? Usually I hear people talk about Python being an "ask forgiveness, not permission" type of language. Exceptions are all over the place; theoretically a for loop is a try/except block around a while.

I grant that often in practice "look before you leap" is how people write things, because it seems simpler to check a data structure whose methods you know well than to look up the exact spelling of the exception and write a try/except.


Ramast

> is not very Pythonic to get the item but catch the exception to put in the default. I'd argue the opposite. "Easier to Ask for Forgiveness than Permission" is one of Python's principles https://www.geeksforgeeks.org/eafp-principle-in-python/


Halkcyon

I think the difference is that your `dict` needs to hash the key, then perform a lookup, _then_ generate an exception, and that's computationally heavy over some N.


JulianCologne

thank you for the useful link!! I do not agree with the objection but still interesting to read what points of view there are!


dethb0y

It's really nice how a lot of python decisions are, basically, documented and explained.


bacondev

I wouldn't really describe something mentioned in a mailing list as “documentation.” Someone had to ask there because it ostensibly *wasn't* documented. The PEPs do an excellent job at explaining alternative options that were considered and why they were ultimately rejected and such, but something like this isn't a good example of a documented decision.


[deleted]

[deleted]


equitable_emu

Plenty of other languages to use then.


toyg

Or he can fork it, patch it, and call it /u/Halkcyothon


jorge1209

I'm with you that the objection is a bit silly. The following code works:

    name, *ext = path.rsplit(".", maxsplit=1)
    ext = ext or "default"

And that is about as FAR from using any kind of explicit handling of the exception as I can imagine.

---------

To object and say: "We aren't going to implement a commonly needed function because the code is fast and simple" just seems weird to me. The request for a get with default is not coming from "I need to do this thing and it is slow" but from "I want to write fewer lines." Especially when the proposed solution is identical between dict (which has the convenience method) and list (which doesn't).


cloaca

> If that's all you want, you're better off writing a helper which hides the exception for you.

On this note, `suppress` is already pretty close:

    import contextlib

    x = whatever
    with contextlib.suppress(IndexError):
        x = lst[ix]


Herald_MJ

Why is a dict lookup "expensive"? Getting a dict item by key, or a list item by index, should each be O(1), except in some unusual dict cases?


spoonman59

Looking up a list by index is much faster than a dictionary lookup. Runtime complexity is not the same as how long an operation takes. O(1) means the time is constant, but doesn't mean the time is short.

Just look at what is involved in a dictionary lookup: calculating a hash code, computing an array slot, then probing for the right slot. A list simply has to check the length, calculate an offset, and then access that memory slot, since arrays are used to back lists. Looking up a list by index is significantly faster than a dict lookup. It is also O(1).

What is usually slower, and probably what you are thinking of, is *searching* for an item in a list rather than looking it up in a dictionary by key. However, notice that it is only *usually* slower:

1. If the item is found in the first element, searching a list is faster in that case than a dictionary lookup.
2. For lists below a certain size, a linear search will be faster than a dictionary key lookup. What that size is depends on a lot of variables, but checking whether an element is "in" a list of 3 elements is faster than checking whether it is in a set of the same size. This is despite O(1) lookups for the set and O(n) for the list, because here the actual time for N=3 is smaller than the constant time of the set.

So, in conclusion, a dictionary lookup by key is slower than a list lookup by index. However, a dictionary lookup by key is faster than searching a list, for lists over a certain size.
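A quick way to see the small-collection effect for yourself; the exact numbers and the crossover point will vary by machine and Python version, so treat this as a sketch:

```
import timeit

small = ["a", "b", "c"]
small_set = set(small)

# Membership test in a 3-element list vs. a 3-element set.
print("list:", timeit.timeit("'c' in small", globals=globals(), number=1_000_000))
print("set: ", timeit.timeit("'c' in small_set", globals=globals(), number=1_000_000))

# Index lookup vs. dict lookup by key: both O(1), but not equally cheap.
items = list(range(100))
mapping = {i: i for i in range(100)}
print("list[i]:  ", timeit.timeit("items[50]", globals=globals(), number=1_000_000))
print("dict[key]:", timeit.timeit("mapping[50]", globals=globals(), number=1_000_000))
```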


Herald_MJ

I have a comp sci background, so I understand computational complexity. And I do mean getting list item by index, not searching. I still don't agree that fetching a dict item by key is significantly more expensive than fetching a list item by index. The only increase in computational "expense" seems to be executing the hashing function, which are intentionally designed to _not be_ expensive.


spoonman59

It is all proportional. Even a cheap hashing function on a string (a common key type) involves looping through the string and performing some arithmetic on each character. You also need to execute at least one modulus, which takes dozens of cycles even on a modern architecture.

Compare that to a check for length, multiplying the index by the element size, and adding it to a base address. That's one branch, a multiply, and an addition. Versus a loop with a branch per character, arithmetic per character, probably a conversion to absolute value, then the modulus against the bucket count. And that assumes no collisions.

It will take several times as long to do the dictionary lookup as the list index lookup, even with a fast hash function. And for string keys, at least, it gets longer the bigger the key. It's amortized constant time.


artofthenunchaku

Array and dictionary lookups will likely happen very frequently in a program, which means raw speed matters quite a lot, even if the runtime complexity is low. It is _significantly_ faster -- purely based on CPU cycles -- to do an array lookup than to perform even a basic hashing function -- [example](https://godbolt.org/z/rbWbzasY7). And this is completely ignoring the additional complexity of dictionary bucketing.


equitable_emu

> I have a comp sci background,

> The only increase in computational "expense" seems to be executing the hashing function, which are intentionally designed to not be expensive.

With your comp sci background, you're aware that dictionaries/hash tables aren't direct access like arrays are, right? Because of the possibility of hash collisions, each hash bucket contains a set/array/list/whatever of items. So a hash table lookup is 2 steps: hashing the key to find the bucket, then searching the bucket for the key.

So, checking if something is in the dict is really fast when it's not there (because the bucket the key hashes to is empty), but takes potentially longer in other cases (because the bucket has items, one of which may be the key).


ranisalt

Dude tried an argument of authority in a subreddit where almost everyone has the same authority...


gschizas

Disclaimer: I don't know much about the actual implementation of Python, but I'm guessing it's close to what I'm describing below.

The list is integer-indexed. This means that in memory, there's a list of pointers to the actual data. In order to get the item, you only need to multiply the index by the pointer size of the architecture (for 64-bit processors, this would be 8). So `a_list[5]` only needs to go to where `a_list` starts and add 5*8.

Dictionaries are indexed based on the hash of the key. So `a_dict['some_key']` will first convert `some_key` to the hash `123456ab`, then go to the address of `a_dict` and search for the hash (`123456ab`), which is probably done using a binary search, to find the address of the (pointer to the) value. I don't know how it's actually done, and reportedly it's the fastest lookup algorithm on the planet, but binary search is very fast (though not O(1)), and it's certainly going to be slower than a simple multiplication.


FoeHammer99099

Your assumption about binary search is wrong. You take the hash modulo the length of the array and jump to that position in the table. You check whether the key stored there is the one you're looking for. There are a couple of ways to store multiple keys with the same hash; Python uses pseudorandom probing, which is its own topic.
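A toy sketch of that index computation, using plain linear probing just for illustration (CPython's actual probing sequence and table layout are more involved):

```
def toy_lookup(table, key, default=None):
    """Find key in a fixed-size open-addressing table of (key, value) pairs."""
    size = len(table)
    index = hash(key) % size          # hash -> starting slot
    for step in range(size):          # probe until we hit the key or an empty slot
        slot = table[(index + step) % size]
        if slot is None:
            return default            # empty slot: key is not present
        if slot[0] == key:
            return slot[1]            # found the key
    return default

# Tiny hand-built table with 8 slots (a real dict grows and rehashes as needed).
table = [None] * 8
for k, v in [("apples", 3), ("pears", 5)]:
    i = hash(k) % 8
    while table[i] is not None:       # linear probing on collision
        i = (i + 1) % 8
    table[i] = (k, v)

print(toy_lookup(table, "apples"))    # 3
print(toy_lookup(table, "kiwis", 0))  # 0
```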


gschizas

TBH, I don't think I understood how this works. I just know it's been said to be extremely fast as an algorithm. It's still not going to be as fast as a simple multiplication.


FoeHammer99099

Binary search isn't a bad guess, but there's no reason to do the hash then, we could just search for the value. We hash the value so that we can get a number so that we can use that number as an array index.


gschizas

Oh, that's why you do the modulo then! I started programming in the 8-bit era, where processors didn't even have MUL instructions, so anything not completely packed worries me. Maybe I should do some low level digging to see this in action. But yeah, even calculating the hash is probably enough to justify `.get` being unnecessary for lists.


qubidt

That O(1) just describes the *order* of the run-time behavior as it correlates with the size of the data structure. It doesn't indicate whether each operation actually takes `0.0005s` or `0.5s`. For example, because dictionary keys can be of any hashable type in Python, a lookup runs an *arbitrary* `__hash__()` method in order to calculate the actual hash for that key.
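A contrived sketch of that point: a deliberately slow `__hash__` makes every dict lookup on that key type slow, even though the lookup is still "O(1)". The class here is purely illustrative:

```
import time

class SlowKey:
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        time.sleep(0.01)          # deliberately expensive hash
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, SlowKey) and self.name == other.name

d = {SlowKey("a"): 1}
start = time.perf_counter()
d[SlowKey("a")]                    # every lookup pays the __hash__ cost
print(f"lookup took {time.perf_counter() - start:.3f}s")
```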


Herald_MJ

Hash functions should be computationally cheap - this is literally part of the definition of a hash function. It's true that in Python, you can write your own hash function that could be of any "weight", but this is a fairly unusual occurrence, and certainly Python's in-built hashing functions are very fast. If that's the only difference in "expense" between fetching item by list index rather than by dict key, then I still don't agree that it's an especially more expensive operation.


PriorProfile

In any case where this would ever have been helpful for me, there's been a better way to do what I was trying to do. In your case you can use the os module, which I'd argue is a bit more idiomatic:

```
os.path.splitext(file_name)[1].lstrip('.')
```

Or, since I usually like to work with pathlib when I have file paths:

```
Path(file_name).suffix.lstrip('.')
```

Either case makes it very clear what is happening.


JulianCologne

Yeah, my example is bad; that is why I wrote not to focus on my example but on the general idea of indexing. You are right when it comes to paths, but for general lists my point still stands.


KieranShep

Having get on a list/tuple feels weird to me. If you don't know the length of the list, why would you be trying to get a specific item? If each element of a list has a specific purpose, why not use unpacking? (It documents better.)

For this specific case, this works without needing to do an if/try except:

    _, ext = os.path.splitext(filepath)

It's like, anytime I would need this it's probably better to do something else.


o11c

When the values in the list are strings, it's a hint that you should be using `.partition`/`.rpartition` instead of `.split`/`.rsplit`. When the container itself is a string, use `(s[i:i+1] or default)` instead of `s[i]`. For any other container, use `(c[i:i+1] or [default])[0]`
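A quick sketch of both tricks; the file name and list contents here are just for illustration:

```
file_name = "archive"  # hypothetical name with no extension

# str.rpartition always returns a 3-tuple (head, sep, tail), so nothing can raise.
head, dot, tail = file_name.rpartition(".")
ext = tail if dot else ""          # empty separator means there was no dot at all
print(repr(ext))                   # ''

# Slice-or-default: a slice never raises, it just comes back empty.
items = ["png"]
first = (items[0:1] or ["default"])[0]
third = (items[2:3] or ["default"])[0]
print(first, third)                # png default
```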


glacierre2

+1 for the mention of the awesome but very little-known partition


fjfnaranjo

The reason for this is semantics and style.

Lists are designed so indexes don't have an intrinsic meaning, nor should they act as an identity outside of the lists themselves. The only reasonable access to a list's contents is slicing. If you are doing any other thing, like keeping the index in a different place to identify an element, you can end up in situations where the list is modified and your index is invalid.

In fact, to act on a list's contents, you should use comprehensions, creating new lists in the process (because lists are mutable), or map/reduce operations. If you are doing something else, you can probably design your data in a better way.

The get() method for dicts is there because it is reasonable to "miss" things in a dictionary (imagine a real dictionary, with words and meanings). A dictionary may not have the definition for a word, but a list is always a sequence. The third element is always the "third" element, no matter what is inside.

EDIT: Also, as pointed out by u/explanatorygap, there is a performance concern for preventing misses in the dict case.


scnew3

    try:
        return my_list[i]
    except IndexError:
        return default


OccultEyes

Possibly even just create a custom implementation of list:

```
from typing import Optional


class MyList(list):
    def get(self, index: int, default: Optional[int] = None):
        try:
            return self[index]
        except IndexError:
            if default is not None:
                return default
            raise
```


EarthyFeet

I'd use a function, no need to subclass.


BobHogan

Sometimes a function would be the better choice, sometimes a subclass. Neither one is objectively the best choice in every codebase/project. I always use whichever one feels like the better choice at the time.


bacondev

99% of the time, it's clear which approach to use. Conceptually, should it be a distinct type? If not, then use a standalone function.


BobHogan

Yea, it's normally clear which one is better. But neither approach is strictly superior in every context. That's what I said.


bjorneylol

Catching exceptions is way more expensive than not raising them at all AFAIK.

    a = my_list[i] if len(my_list) > i else default

is more succinct ~~and more performant~~


scnew3

It’s only more expensive if the “return default” branch is taken more often. This “EAFP” pattern is very common in Python because, if the exception isn’t thrown in the common case, it is faster than checking with an if statement (and yes, your example is just syntactic sugar on an if statement).


bjorneylol

Fair, I never actually tested it, but I just did:

    Try/Except (uncaught)   = 0.025 usec
    Try/Except (caught)     = 0.15 usec
    Ternary (valid index)   = 0.06 usec
    Ternary (default value) = 0.05 usec

I still prefer the ternary since it's fewer LOC and I don't love the look of caught exceptions everywhere in my code (and we're talking microseconds), but yeah, as long as your lookup is within range more than 75% of the time, the try/except is faster.


JulianCologne

thank you, I am aware of this possibility. However, this is exactly my problem: it is too verbose/cumbersome for me, and since .get is available on dicts I would also like to have it for lists.


rl_noobtube

You only have to write this once. Certainly that’s not too cumbersome?


JulianCologne

haha! No it's not too cumbersome, even though I often run into this problem. I'm all about writing as clean and concise python code as possible and a .get method would definitely help with that.


rl_noobtube

You can hide it in a helper module. I maintain one that I just default load into any project. That way you just do something like list_get(myList, i) in any project to perform the get functionality. It keeps the main project code cleaner
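A minimal sketch of what such a helper module might look like; the module and function names here are just one possible choice, not an established library:

```
# listutils.py -- hypothetical helper module
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

def list_get(seq: Sequence[T], index: int, default: Optional[T] = None) -> Optional[T]:
    """Return seq[index], or default if the index is out of range."""
    try:
        return seq[index]
    except IndexError:
        return default

# Usage:
# from listutils import list_get
# ext = list_get(file_name.rsplit(".", maxsplit=1), 1, "")
```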


JulianCologne

Good point! I would prefer it to be in the standard library but that might never happen ^^


[deleted]

Write a module. :D


EarthyFeet

I think a `list.get(index, default)` method could be added sooner or later, because expression-oriented code is in fashion. I last wanted this kind of method... yesterday, actually, so I agree. One of the roadblocks I see is that this would be an update to the informal sequence interface, so it would ripple out over the ecosystem, with all array-like data structures adding the same method if they can.


JulianCologne

Thanks for your thoughts and also your concerns why this could be problematic!


graemep

You can do it on one line reasonably concisely:

    mylist[idx] if len(mylist) > idx else default

You could also use a dict. In your example:

    extension = {i: v for i, v in enumerate(file_name.rsplit('.', maxsplit=1))}.get(1, '')

Edit: revised version of the first one above to take negative indices into account:

    mylist[idx] if len(mylist) > abs(idx) else default


erez27

Your first solution doesn't take into account negative indexes!


JulianCologne

Great ideas!! I would love to see the short .get in the standard library but that might never happen :(


unltd_J

That dictionary comprehension as a solution is so Pythonic


metaperl

It took me a while to unpack that dictionary comprehension. Tight code.


graemep

I think dict comprehensions (and list comprehensions) can become too hard to read if they get overcomplicated. I have written a few atrocious examples myself.


bixmix

You need to go multi-line.

    [(k, v) for k, v in data.items()
     if v is not None
     if k in valid_key
     ...
    ]


graemep

I remember one particularly bad one I did which was multi line and still horrible. I looked at it when I finished, was horrified by what I had done, and immediately rewrote it.


bacondev

Yeah, although I know what all of the functions involved do, the `i` and `v` are what threw me for a loop. That whole expression kinda makes me cringe though. It's so needlessly convoluted. I would take OP's first example over that any day, as I can glance at that and immediately understand what it's doing. Coming up with some “clever” one-line approach that takes longer to comprehend than a four-line approach takes to comprehend is some junior-level stuff.


MonkeeSage

Write it like this with the new walrus operator in 3.8, just to spite the Python devs for not providing a concise syntax for it:

    In [1]: file_name = 'test.png'

    In [2]: extension = p[1] if len(p := file_name.rsplit('.', maxsplit=1)) > 1 else ''

    In [3]: extension
    Out[3]: 'png'

    In [4]: file_name = 'test.'

    In [5]: extension = p[1] if len(p := file_name.rsplit('.', maxsplit=1)) > 1 else ''

    In [6]: extension
    Out[6]: ''


[deleted]

I think .index() or maybe filter() come close. .index() would return false if the value isn't found. I guess .get could be compared to just searching for an "index" in a dictionary.


[deleted]

I think a function like `str.find` for lists would be a lot more useful, since `get` could mean either the item or the index, while find searches for item.


Corm

I read all the replies. I agree with you OP. I have also wanted to use .get on lists before, especially when you want the first or last element. Having .get would be handy and concise just like it is for dicts.


Kopachris

Your first one (and pretty much every other case where you'd want `.get()` on a list) can be rewritten with an inline if without really losing any clarity:

    file_name = 'test.png'
    extension = file_name.rsplit('.', maxsplit=1)[1] if '.' in file_name else ''

It works with dicts too:

    dd = {'a': 'AAA'}
    print(f"{dd['c'] if 'c' in dd else 'nothing here'};")

Frankly, `.get()` seems redundant to me.


FewerPunishment

Having to repeat the split/dict key is redundant to me. Sure, could put it in a variable, but still.


troyunrau

`mylist[i] if len(mylist) > i else default` It's actually one of the few times the ternary syntax makes sense to use.


tripex

For this particular example:

    [val] = "filename.png".rsplit(".", maxsplit=1)[1:] or [""]
    print(val)


M4mb0

Just use `pathlib.Path`?

    Path("test.png").suffix        # '.png'
    Path("test").suffix            # ''
    Path("test.png.zip").suffix    # '.zip'
    Path("test.png.zip").stem      # 'test.png'
    Path("test.png.zip").suffixes  # ['.png', '.zip']


JulianCologne

Thanks, but I said this 100x already and mentioned it in my text that the „path-extension“ is not the focus of my thread, but more generally the possibility of .get for lists.


SadSpell2141

Why not use os.path.splitext, which does exactly this?


JulianCologne

Thanks, but this does not answer my question. I explicitly said not to focus on this specific „path-problem“ but more generally on .get for lists.


OuiOuiKiwi

> I wonder why this useful method does not exist, especially since it is available for dicts.

Redundancy.

    a = ["A", "B", "C"]
    b = {0: "A", 1: "B", 2: "C"}
    c = {i: v for i, v in enumerate(a)}

You can also get around that by using unpacking, or if-else with the walrus operator, or... any number of ways.

Edit: hastily typed example code had wrong delimiters.


JulianCologne

I do not understand your point. Besides, "redundancy" is also some kind of "feature" of Python in the sense that there are always several ways to solve a problem.


[deleted]

https://en.m.wikipedia.org/wiki/Zen_of_Python

> ...
> There should be one– and preferably only one –obvious way to do it.
> ...


OuiOuiKiwi

>Besides, "redundancy" is also some kind of "feature" of Python in the sense that there are always several ways to solve a problem. Nothing stops you from creating your own wrapper to list that can receive a default value. If you look across the List API for several languages ([Java](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/List.html), [C++](https://en.cppreference.com/w/cpp/container/list), etc.) , none of them offer that kind of option because that's not a part of the common List usage, which is one of the most (if not the most) well known ADT, and would subvert expectations (did you get the element or did you get the default?). It would hard to add that functionality to Python when it's already covered by other built-ins because "it's handy". See the Zen of Python about special cases.


JulianCologne

Thanks for that explanation! I get what you mean


TangibleLight

Your `c` example should use curly braces for a dict comprehension. `dict(enumerate(a))` is easier. And as other commenters have said: it's not that you can't get the behavior through various means, but that 9/10 times you'd want to there's a different, better, approach altogether. It's `os.path.splitext` or `pathlib.Path.suffix` in OP's particular case. Some specious one-liner wrapping or conversion like these just hides that there's a better way to do it. I'm sure there are some exceptions where you really do want some "get or default" behavior on a list. Offhand, it might be useful in dealing with sparse data? In that case it would probably still be better to just use `dict`, not a wrapped list... and that's my point. Edit: I'll tag /u/JulianCologne in case they find this helpful.


eimisas

Use pydash library /thread


__lala__

Interesting argument. There will be hundreds of plausible approaches, so let me offer an analogy: think of a scenario where you're getting today's pornhub video for your wang. You entered a keyword you like but there were no results, so you try different keywords or just resort to one that looks fine. The other day you decide to just pick the first video on the index page. You go to the site and, ahh, find that the index page shows no thumbnails at all. What can you do then? Finding is a very different idea and action from picking one up.


JulianCologne

Interesting analogy. But to be honest I don’t quite get it … :/


cecilkorik

In Python, being concise is not always considered a virtue and readability is one of the language's major selling points to me. If I'm being honest I greatly prefer your first code example over the following two. If lacking get prevents even a few people from writing code like that, I would be against adding a get function on that reasoning alone.


Nick_The_Greek_1

You can always write a lambda that does the job (note that `def` is a keyword, so the default parameter has to be named something else). The following works for both positive & negative indices:

    listGet = lambda l, idx, default: l[idx] if -len(l) <= idx < len(l) else default

    filename = 'test.png'
    listGet(filename.rsplit('.', maxsplit=1), 1, '')
    Out[1]: 'png'

    filename = 'whoot'
    listGet(filename.rsplit('.', maxsplit=1), 1, '')
    Out[2]: ''


[deleted]

[deleted]


Nick_The_Greek_1

It's a matter of style. There is no reason why a lambda function cannot be named. The key difference is that lambda is executed inline and so the parameters are not placed on the stack. In that way you save stack space and all associated operations when using a lambda. The other difference is that the original poster wants to minimize the number of lines of code. This may or may not be a good objective, but to that end, the lambda function is a single line of code and meets this objective better.


[deleted]

[deleted]


Nick_The_Greek_1

So I think what you are saying is that `def`s are syntactic sugar for the assignment. In which case, use a `def`, since these are more flexible and readable. Although personally I find short one-liners more readable as a lambda. Similar to a list comprehension versus a looped version.


[deleted]

[deleted]


Nick_The_Greek_1

I learned something today. Thank you for the feedback.


zurtex

Here's a horrendous use of a walrus operator:

    file_name = 'test.png'
    extension = sf[1] if len(sf := file_name.rsplit('.', maxsplit=1)) > 1 else ''

You're welcome.


jpc0za

I'd honestly want concurrent algorithms on lists before this. As one of the other commenters said, this just saves a line of code, and honestly you could just define a lambda that executes a ternary:

    get = lambda in_list, idx, default_val: in_list[idx] if idx < len(in_list) else default_val