Hashable Objects in Python
Python's built-in mutable objects, such as lists, are not hashable. You might ask why. Hash functions usually depend on object state, so if the state changes, the hash value could change, violating the consistency requirement: an object's hash must not change during its lifetime. How do you see if something is hashable? Call the hash function on it.
>>> hash("dsfdfasd")
4008608082288030192
>>> hash("dsfdfase")
-2996068020242219814
>>> hash((1,2,3))
529344067295497451
Tuples and strings are hashable, as are integers and booleans. However, lists are not, because they are mutable.
>>> hash([1,2,3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
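The test above can be wrapped in a small helper. This is only a sketch: the name is_hashable is made up for illustration and is not part of Python. It calls hash and reports whether a TypeError came back.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    # is_hashable is an illustrative name, not a built-in.
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable("dsfdfasd"))     # a string
print(is_hashable((1, 2, 3)))      # a tuple of integers
print(is_hashable([1, 2, 3]))      # a list
print(is_hashable((1, [2, 3])))    # a tuple that contains a list
```

The first two calls print True and the last two print False. A tuple is hashable only if everything inside it is hashable, so a tuple holding a list fails the test.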
Sets
A Little Math
There is an entire branch of mathematics, set theory, devoted to sets.
Here is the "primitive notion" of set theory. If A is a collection of objects, we write x ∈ A to mean x belongs to A. What is important to notice is that an element either belongs to a set or it doesn't. There is no duplicate membership. A set is empty if it has no elements.
Let's begin by making an empty set in Python.
>>> s = set()
>>> s
set()
>>> len(s)
0
This set is devoid of elements. Now we can add some.
>>> s.add("a")
>>> s
{'a'}
>>> s.add(True)
>>> s
{True, 'a'}
Notice that this method is silent about whether or not it added a new element.
>>> print(s.add("b"))
None
>>> s
{True, 'a', 'b'}
If you add an element that is already present, the call is ignored.
>>> s.add("b")
>>> s
{True, 'a', 'b'}
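Since add always returns None, one way to find out whether it changed the set is to compare the size before and after. The helper below is a sketch; add_and_report is a hypothetical name, not a set method.

```python
def add_and_report(s, item):
    """Add item to the set s; return True if s grew, False if item was already there."""
    # add_and_report is an illustrative name, not part of the set API.
    before = len(s)
    s.add(item)
    return len(s) > before


s = {True, "a"}
print(add_and_report(s, "b"))   # "b" is new, so this prints True
print(add_and_report(s, "b"))   # "b" is already a member, so this prints False
```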
So, conceptually, what happens is this.
- We check for membership.
- If the new element is not a member, we add it to the list.
What if the set has 1000000 elements? If the elements sat in a plain list, the membership check would be what is called an O(n) time operation, "at worst proportional to n, the size of our set." When the set is large, this becomes extremely inefficient.
Also, to check if some object is in the set, we have to search the entire list. Ugh.
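To make the cost concrete, here is a toy set that keeps its elements in a plain list. It is a sketch for illustration only (the class name ListSet is made up), and it is not how Python stores sets.

```python
class ListSet:
    """A toy set that stores its elements in a plain list (illustrative only)."""

    def __init__(self):
        self._items = []

    def __contains__(self, item):
        # Linear search: compare against each stored element in turn.
        for existing in self._items:
            if existing == item:
                return True
        return False

    def add(self, item):
        # Check for membership first, then append only if absent.
        if item not in self:
            self._items.append(item)

    def __len__(self):
        return len(self._items)


s = ListSet()
for n in range(1000):
    s.add(n)
s.add(999)           # already present, so ignored
print(len(s))        # 1000
print(999 in s)      # True, but only after walking the whole list
```

Every add pays for a full membership check, so building a set of n elements this way takes time proportional to n squared.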
Hashing and Sets
Here is why retrieval from a set is fast. Enter hashing.
When you make a set, Python allocates a big chunk of memory which is basically an internal list. When you add a new item, it is hashed to get a big integer. Then Python mods by the size of the memory (say M) to get a number that is nonnegative and less than M. The object is slotted into the list entry at that location.
If there is a collision, one simple remedy is to make a little list at that entry. (CPython actually probes for another open slot instead, but the idea is the same.)
To check for membership, Python hashes the item you are checking, goes to the location, and checks the (hopefully short) list there. This eliminates checking the vast majority of elements in the set.
If your set gets crowded, Python will increase its size, then rehash the items and place them in a more spacious list.
This makes sets very efficient.
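The scheme just described can be sketched in a few lines. This is a toy model, not Python's actual implementation: the class name ToyHashSet, the starting size of 8, and the rule "grow when there are more items than slots" are all assumptions made for the example.

```python
class ToyHashSet:
    """A toy hash-based set using a list of buckets (illustrative only)."""

    def __init__(self, size=8):
        self._buckets = [[] for _ in range(size)]
        self._count = 0

    def _bucket_for(self, item):
        # hash() gives a big integer; % maps it to a slot in range(M).
        return self._buckets[hash(item) % len(self._buckets)]

    def __contains__(self, item):
        # Only the one bucket the item hashes to is searched.
        return item in self._bucket_for(item)

    def add(self, item):
        bucket = self._bucket_for(item)
        if item in bucket:
            return
        bucket.append(item)
        self._count += 1
        # Assumed policy: grow once there are more items than slots.
        if self._count > len(self._buckets):
            self._resize(2 * len(self._buckets))

    def _resize(self, new_size):
        # Rehash every item into a fresh, more spacious list of buckets.
        old_items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(new_size)]
        for item in old_items:
            self._bucket_for(item).append(item)

    def __len__(self):
        return self._count


s = ToyHashSet()
for word in ["a", "b", "c", "a"]:
    s.add(word)
print(len(s))       # 3
print("b" in s)     # True
print("z" in s)     # False
```

A membership test looks at one bucket instead of every element, which is where the speed comes from.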
Set Operations
Make these two sets.
>>> s = {"a", "b", "c", "e", "m"}
>>> t = {"b", "e", "n", "p", "r"}
>>> s
{'c', 'e', 'b', 'm', 'a'}
>>> t
{'p', 'r', 'e', 'b', 'n'}
If A and B are sets, A ∩ B is the set of all elements belonging to both A and B. This operation is called intersection.
You can see this in Python a couple of ways.
>>> s & t
{'e', 'b'}
>>> s.intersection(t)
{'e', 'b'}
If A and B are sets, A ∪ B is the set of all elements belonging to at least one of A or B. Here is how Python does it.
>>> s | t
{'c', 'b', 'm', 'a', 'n', 'p', 'r', 'e'}
>>> s.union(t)
{'c', 'b', 'm', 'a', 'n', 'p', 'r', 'e'}
If A and B are sets, A - B is the set of all elements belonging to A but not B. This operation is called relative complement. Here is how Python does it.
>>> s - t
{'m', 'c', 'a'}
>>> s.difference(t)
{'m', 'c', 'a'}
If A and B are sets, A ▵ B is the set of all elements belonging to exactly one of A or B. This operation is called symmetric difference. Here is how Python does it.
>>> s ^ t
{'p', 'r', 'c', 'n', 'a', 'm'}
>>> s.symmetric_difference(t)
{'p', 'r', 'c', 'n', 'a', 'm'}
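The symmetric difference can also be built from the other three operations. Here is a quick check of two standard identities, using the same s and t as above; the variable names via_union and via_differences are just for this example.

```python
s = {"a", "b", "c", "e", "m"}
t = {"b", "e", "n", "p", "r"}

# Elements in at least one set, minus those in both.
via_union = (s | t) - (s & t)
# Elements only in s, together with elements only in t.
via_differences = (s - t) | (t - s)

print(via_union == s ^ t)          # True
print(via_differences == s ^ t)    # True
print(sorted(s ^ t))               # ['a', 'c', 'm', 'n', 'p', 'r']
```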
Two sets are disjoint if they have no elements in common, i.e. if their intersection is empty. Our sets here fail this test.
>>> s.isdisjoint(t)
False
>>> s & t
{'e', 'b'}
Here is something interesting.
>>> s < t
False
>>> t < s
False
>>> s == t
False
For s <= t to be true, every element of s must belong to t. In other words, s is a subset of t.
Two sets are equal if they have exactly the same elements; Python checks for this with ==.
For s < t to be true, every element of s must belong to t and t must contain some element not possessed by s. In other words, s is a proper subset of t.
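Here is a short example where the comparisons come out true. The set u is a hypothetical third set, chosen for illustration so that it sits inside both s and t.

```python
s = {"a", "b", "c", "e", "m"}
t = {"b", "e", "n", "p", "r"}
u = {"b", "e"}          # hypothetical third set contained in both s and t

print(u <= s)           # True: every element of u belongs to s
print(u < s)            # True: and s has elements u lacks
print(u.issubset(t))    # True: the method form of <=
print(s <= s)           # True: every set is a subset of itself
print(s < s)            # False: but never a proper subset of itself
print(s <= t, t <= s)   # False False: neither contains the other
```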
Here is how to ditch elements.
>>> s.discard("e")
>>> s
{'c', 'b', 'm', 'a'}
>>> s.discard("b")
>>> s
{'c', 'm', 'a'}
Now the sets are disjoint.
>>> s.isdisjoint(t)
True
Since sets are mutable, they are NOT hashable.
>>> hash(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
Dictionaries
These are collections of key-value pairs. The keys must be hashable, but the values do not have to be hashable. The pairs are "filed" by their key in the hash table.
>>> d = {}
>>> d["morrison"] = "computer science"
>>> d["teague"] = "mathematics"
>>> d["vazquez"] = "mathemtics"
Oops, we made a mistake on Vazquez.
>>> d["vazquez"] = "mathematics"
>>> d
{'morrison': 'computer science', 'teague': 'mathematics', 'vazquez': 'mathematics'}
Note that Vazquez's old value got the boot. Now we go ahead and extract the keys and values as lists.
>>> d.keys()
dict_keys(['morrison', 'teague', 'vazquez'])
>>> list(d.keys())
['morrison', 'teague', 'vazquez']
>>> list(d.values())
['computer science', 'mathematics', 'mathematics']
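The rule that keys must be hashable while values need not be can be seen directly. This is a small sketch; the tuple key ("teague", 2) and the "notes" entry are invented for the example.

```python
d = {}
d["morrison"] = "computer science"
d[("teague", 2)] = "mathematics"     # a tuple key is fine: tuples are hashable

try:
    d[["vazquez"]] = "mathematics"   # a list key is rejected
except TypeError as err:
    print("TypeError:", err)

d["notes"] = ["values", "can", "be", "lists"]   # values need not be hashable
print(len(d))                        # 3: the rejected key was never stored
for key, value in d.items():
    print(key, "->", value)
```

The failed assignment leaves the dictionary untouched, so only the three successful entries remain.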