Welcome to the Treehouse Community
Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community! While you're at it, check out some resources Treehouse students have shared here.
Looking to learn something new?
Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.
Start your free trialwillyraider
6,550 PointsIncremental order in for loop
Is there an incremental order in python for "for loops"?
Because if I execute the attached counts.py several times, the order of the output changes.
Output:
treehouse:~/workspace$ python counts.py
apples
bananas
coconuts
treehouse:~/workspace$ python counts.py
apples
coconuts
bananas
treehouse:~/workspace$ python counts.py
coconuts
bananas
apples
# You can check for dictionary membership using the
# "key in dict" syntax from lists.
### Example
my_dict = {'apples': 1, 'bananas': 2, 'coconuts': 3}
my_list = ['apples', 'coconuts', 'grapes', 'strawberries']
# members(my_dict, my_list) => 2
def members(my_dict, my_list):
count = 0
for key in my_dict:
print(key)
if key in my_list:
count += 1
return count
members(my_dict, my_list)
3 Answers
Chris Freeman
Treehouse Moderator 68,441 PointsThanks Josh Keenan for the mention. In general, I research this info as needed and don't have it "on recall".
In simple terms, the items in a dictionary are stored in "slots". Which slot to use for each entry is determined by a hashing algorithm that returns an integer based on the key, the value, and some "secret keys". This integer is used to pick the slot to store the item in the dictionary. The secret keys change for each run of python.
Because hashing is used beyond dictionaries (such has obscuring original data values), the exact hash value returned is not deterministic. That is, the hash of a specific item will change between instantiations of python, but will be the same for the lifetime of that object with a python session. So each time you run your program the resulting hash values of your dictionary keys will change. This may change the slots where the values are held. The slot order is determines the print order.
A dict is initialized to 8 slots and is resized when it reaches two-thirds full. With 8 initial slots [0-7], a binary mask of 0b111
will extracted the lowest 3 bits of the hash value which can map to the slot number. Using a function to extract the binary value of the hash integer to handle negative binary numbers.
For example:
## Run 1
$ python
Python 3.4.0 (default, Jun 19 2015, 14:20:21)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def to_twoscomplement(bits, value):
... if value < 0:
... value = ( 1<<bits ) + value
... formatstring = '{:0%ib}' % bits
... return formatstring.format(value)
...
>>> # hash 0
... hash('0')
-5983707641340658336
>>> # binary value
... to_twoscomplement(64, hash('0'))
'1010110011110101100110011001101001101110000110010111110101100000'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('0'))[-3:]
'000'
>>> # slot value
... hash('0') & 0x7
0
>>>
>>> # hash 1
... hash('1')
-1529977540348658223
>>> # binary value
... to_twoscomplement(64, hash('1'))
'1110101011000100011011011000100010000001001100010011110111010001'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('1'))[-3:]
'001'
>>> # slot value
... hash('1') & 0x7
1
>>>
>>> # hash 2
... hash('2')
-3394813998062570398
>>> # binary value
... to_twoscomplement(64, hash('2'))
'1101000011100011001100101010011110111110111101001111010001100010'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('2'))[-3:]
'010'
>>> # slot value
... hash('2') & 0x7
2
# Above hash values predict a dictionary will print the keys in order: '0', '1', '2':
>>> d1 = {'0': 'a', '1': 'b', '2': 'c'}
>>> d1
{'0': 'a', '1': 'b', '2': 'c'}
## Run 2
$ python
Python 3.4.0 (default, Jun 19 2015, 14:20:21)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def to_twoscomplement(bits, value):
... if value < 0:
... value = ( 1<<bits ) + value
... formatstring = '{:0%ib}' % bits
... return formatstring.format(value)
...
>>> # hash 0
... hash('0')
8677678239253310006
>>> # binary value
... to_twoscomplement(64, hash('0'))
'0111100001101101010011100110000111011101110101001010101000110110'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('0'))[-3:]
'110'
>>> # slot value
... hash('0') & 0x7
6
>>>
>>> # hash 1
... hash('1')
-8884389653283825069
>>> # binary value
... to_twoscomplement(64, hash('1'))
'1000010010110100010011101010111010001101110111001001101001010011'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('1'))[-3:]
'011'
>>> # slot value
... hash('1') & 0x7
3
>>>
>>> # hash 2
... hash('2')
-1672787438145109052
>>> # binary value
... to_twoscomplement(64, hash('2'))
'1110100011001001000100001011000000010100101000000000001111000100'
>>> # mask for 8 slots
... to_twoscomplement(64, hash('2'))[-3:]
'100'
>>> # slot value
... hash('2') & 0x7
4
# Above hash values predict a dictionary will print the keys in order: '1', '2', '0':
>>> d1 = {'0': 'a', '1': 'b', '2': 'c'}
>>> d1
{'1': 'b', '2': 'c', '0': 'a'}
Note: when two hash values end up the same, the algorithm changes for these "collisions". A probing algorithm is used to find an open slot. The more collisions a dictionary has, the slower the lookups become.
References used in this answer:
Steven Parker
231,236 PointsStandard Python dictionaries do not have an order. A for..in loop will traverse all elements, but not in any specific order.
If you care about the order, you could use an OrderedDict, or just sort the original, like this:
for key in sorted(my_dict):
Now as to why the unspecified order is actually different in successive runs, that is truly a mystery! But it's not sequential, I tried it several times and occasionally got the same order twice in a row. I know the effective order is supposed to result from the dictionary's history of insertions and deletions. But since it's always created with the same components, I can only guess that the internal hashing algorithm uses a randomized, time-based, or uninitialized (not good programming!) value at some point.
Josh Keenan
20,315 PointsTagging the mystical Python genius Chris Freeman to see if he can answer this question.
willyraider
6,550 PointsThank you Steven Parker!
I've testet lists again dictionaries. It seems that dictionaries do not have an incremental order.
- the output of dict_num with the code given below is always different (2,1,0) or (1,0,2)
- the ouput of list_num and _list string is always the same (0,1,2)
dict_num = {'0', '1', '2'}
list_num = [0, 1, 2]
list_string = ['0', '1', '2']
def test():
for x in dict_num:
print(x)
for x in list_num:
print(x)
for x in list_string:
print(x)
test()
Chris Freeman
Treehouse Moderator 68,441 PointsThe line dict_num = {'0', '1', '2'}
defines a set not a dict.
A dict needs key / value pairs: dict_num = {'0': 'a', '1':'b', '2':'c'}
Steven Parker
231,236 PointsSteven Parker
231,236 PointsIt's nice to see that my original answer regarding order has been confirmed, but we still don't know why the order is different in successive runs.
Thanks to the detailed explanation of hashing, we can now say that my original guess that the internal hashing algorithm uses intentionally randomized, time-based, or uninitialized values applies specifically to the "secret keys".
Josh Keenan
20,315 PointsJosh Keenan
20,315 PointsLies, you have it all on recall!
Chris Freeman
Treehouse Moderator 68,441 PointsChris Freeman
Treehouse Moderator 68,441 PointsI would say we do know why the order changes (the use of a process dependent hash function gives the slot number), but we may not know explicitly how (since the secret key part of the equation is buried in the CPython core code. See pyhash.h file for some details.
I haven't see an explicit list of reasons of why the hash() values change between executions. One notion I read is that some web site inputs are stored in dicts. if the hash values were deterministic then a Denial of Service attack may be caused by constructing dict key values that overload a dictionary with many hash value collisions causing poor performance. By adding a non-deterministic element to the hash value calculation it would be difficult to purposely create a dict with a high number of hash value collisions.
In short, the Python philosophy is not to worry about the hash algorithm, but trust it gives sufficiently efficient low collision values and handles collisions with low overhead.