Course Content
Java Data Structures
Java Data Structures
HashMap<>
Now, let's take a closer look at the class implementing the Map
interface — HashMap
.
This class is commonly used when we talk about maps. You've already had a chance to work with this class and study its methods. But let's find out how this class actually works.
First, we need to understand what a hash code is, as it is used internally in a HashMap
.
Hash Code
Every object has its own hash code, which can be obtained using the Object
class's hashCode()
method. Let's look at an example:
main
package com.example; public class Main { public static void main(String[] args) { String example = "Hello World!"; int hash = example.hashCode(); System.out.println("HashCode: " + hash); } }
As you can see, the String object with the data "Hello World!" has a hash code of "-969099747". The method that determines exactly which hash code will be overridden in the StringLatin1 class looks like this:
main
public static int hashCode(byte[] value) { int h = 0; for (byte v : value) { h = 31 * h + (v & 0xff); } return h; }
The String
object is broken down into a byte array using the getBytes()
method, after which each byte is processed and added to the hash code. This way, the hash code becomes maximally unique, aiding the program in identifying this object.
For objects of custom classes, you can also override the hash code by defining a unique way for objects to obtain this number. Let's create a test class User
and override the hash code method for it.
Note
The hash code method belongs to the
Object
class, and all objects inherit from this class. Therefore, we can override this method in each object. This involves low-level programming, and we won't delve into it here.
main
package com.example; class User { String name; int age; public User(String name, int age) { this.name = name; this.age = age; } @Override public int hashCode() { int h = 0; byte[] array = name.getBytes(); for (byte element : array) { h += element * age + (age * 15); } return h; } } public class Main { public static void main(String[] args) { User user1 = new User("Bob", 13); User user2 = new User("Alice", 20); System.out.println("First user's hash code: " + user1.hashCode()); System.out.println("Second user's hash code: " + user2.hashCode()); } }
In the User
class, there are 2 attributes that I used in the hashCode()
method. I chose random operations in the hashCode()
method to obtain a maximally random number, making it unique for each User
object.
This is precisely the hash code. Now, let's find out how it is used in a HashMap
.
HashMap
A HashMap
itself can be visualized as an array of buckets. A bucket is a list where an element is stored. Initially, the HashMap
has 16 buckets, which is the DEFAULT_INITIAL_CAPACITY
.
When we put an element into the HashMap
using the put()
method, the HashMap needs to determine which specific bucket to place this element into. It does this using a simple formula: keyValue.hashCode() & (n - 1)
, where n
is the size of the array of buckets in the HashMap. '&' is a bitwise AND
operation, which is also low-level programming, so we won't delve into the details.
It's important to understand that the HashMap uses the hash code of the key's value to find the appropriate bucket for it. Therefore, it's crucial to create a maximally unique hash code to avoid collisions.
Collision
No matter how unique a hash code is, there may still be a situation where the HashMap computes the same bucket for two elements. In such cases, a collision occurs.
In simple terms, a collision is a situation where two or more elements end up in the same bucket. They are stored there as a SinglyLinkedList
, which you implemented earlier.
Let's take a look at an illustration:
Collision is not favorable as it affects optimization. If we look at this table, we will see that in the best case, the HashMap has constant algorithmic complexity, while in the worst case, it is linear. This linear algorithmic complexity arises from significant collisions. Therefore, programmers advise using keys with hash codes that are as unique as possible to avoid collisions.
Note
If there are a lot of elements in the HashMap, for optimization, a red-black tree is used instead of a
SinglyLinkedList
.
With this data structure, the algorithmic complexity is logarithmic, making data processing much faster than with linear complexity. We won't delve into the topic of trees in this course, as it is a subject for a separate course. However, if you're interested, you can always find information about it online.
How Does HashMap Deal with Collisions?
HashMap also addresses collisions through constant resizing of the bucket array. When the number of elements reaches 75% of the bucket array size, HashMap triggers a resize by doubling the size of the bucket array. Afterward, it redistributes the elements across the new buckets using the new value of n
in the formula.
Thus, HashMap is a highly optimized data structure and is often used as the primary implementation of the Map
interface.
Why Do You Need to Know All This?
This is a low-level programming theory that is important because when working with large datasets, data structures can significantly impact application performance. This happens when these data structures are misused.
Misuse of data structures occurs because programmers don't understand how these data structures work. After all, you want to become a good programmer, right?
Moreover, questions about data structures are often asked in interviews.
If you're interested in delving deeper into how HashMap works under the hood, you can refer to the Java documentation in IntelliJ IDEA:
Note
By the way, in the
equals()
method, the hash code is also used to compare two objects. If the hash codes are equal, it means the objects are considered equal.
1. Which of the following implementations uses a hash table to store key-value pairs?
2. In a HashMap
, what happens if two keys have the same hash code?
Thanks for your feedback!