Sets
As mentioned earlier, the defining characteristic of Sets is that all the elements must be unique (based on the implementation of the equals method). If you add a duplicate element to a Set it will replace the existing element that it is equal to.
Set is also an interface: the two main implementation of Set are HashSet and the TreeSet.
Deciding which implementation to use is somewhat simpler than with Lists: if the order of elements in the Set is not important use HashSet, otherwise use TreeSet. Operations such as add, remove, contains and size are typically far faster with HashSet than TreeSet, but HashSets cannot be ordered.
The performance of Sets is conditional on providing a good implementation of the hashKey method, because this provides a critical mechanism for quickly detecting duplicate elements.
For instance, when objects are added to a Set, they are indexed based on the value of their hashKey. All objects with the same hashKey are grouped together in buckets. As mentioned earlier, hashKey will not necessarily produce unique results, so buckets may contain more than one object, but typically the buckets will be very small.
When you add an object to a Set, the Set need to find out if there are any other elements in the Set that are equal to the new object.
One way of implementing this would be to iterate through all the elements in the Set and call equals on them with the new object. A far faster mechanism, however, is to find the hashCode of the object being added, and then locate the bucket of objects for this value (which is extremely quick, because the lookup is based on a numeric index). Once the bucket is found, the new object can be compared for equality with the objects in the bucket.
Because buckets will typically contain 0 or 1 objects, the equals method will probably only be invoked (at most)once per add.
Consider, however, what would happen if you provided the following hashKey implementation:
public int hashCode() {
return 1;
}
This is actually a valid implementation of hashKey: if two objects are equal they will return the same hashKey.
Despite being valid, it would result in very poor performance when adding a new element to a Set, because it would result in the following logic:
• Find the hashKey of the object passed as a parameter (1)
• Find the bucket with this hashKey (the bucket would contain all objects in the Set)
• Loop through these objects and check if they are equal to the object passed in.
As you can see, you have completely negated the performance optimizations associated with hashCodes and indexing.
The following program demonstrates the use of a HashSet:
public static void main(String[] args) {
Set addresses = new HashSet();
Address a1 = new Address(799, "E Dragham",
"Tucson", 85705, "USA");
Address a2 = new Address(200, "Main Street",
"Phoenix", 85123, "USA");
Address a3 = new Address(100, "Main Street",
"Seattle", 98104, "USA");
Address a4 = new Address(8400 , "London Place",
"Washington", 20521, "USA");
Address a5 = new Address(8400 , "London Place",
"Washington", 20521, "USA");
addresses.add(a1);
addresses.add(a2);
addresses.add(a3);
addresses.add(a4);
addresses.add(a5);
System.out.println("Size = " + addresses.size());
System.out.println(addresses);
}
This produces the following output:
Size = 4
[799 E Dragham, Tucson, USA
, 8400 London Place, Washington, USA
, 200 Main Street, Phoenix, USA
, 100 Main Street, Seattle, USA
]
Notice that even though you have added 5 elements to the Set, the Set only contains 4 elements: this is because the final two addresses added to the Set meet the definition of equality defined by Address, and therefore the second instance overwrites the first.
Also notice that the order in which the addresses are printed out does not match the order they were added; HashSet makes no attempts to maintain any order.
An example using TreeSet would look identical; except the output would adhere to the order the elements were added. I will leave you to implement that yourself.