HashMap<>

Now, let's take a closer look at the class implementing the Map interface — HashMap.

This class is commonly used when we talk about maps. You've already had a chance to work with this class and study its methods. But let's find out how this class actually works.

First, we need to understand what a hash code is, as it is used internally in a HashMap.

Hash Code

Every object has its own hash code, which can be obtained using the Object class's hashCode() method. Let's look at an example:

main


              123456789
            
package com.example;

public class Main {
    public static void main(String[] args) {
        String example = "Hello World!";
        int hash = example.hashCode();
        System.out.println("HashCode: " + hash);
    }
}

As you can see, the String object with the data "Hello World!" has a hash code of "-969099747". The method that determines exactly which hash code will be overridden in the StringLatin1 class looks like this:

main


              1234567
            
public static int hashCode(byte[] value) {
    int h = 0;
    for (byte v : value) {
        h = 31 * h + (v & 0xff);
    }
    return h;
}

The String object is broken down into a byte array using the getBytes() method, after which each byte is processed and added to the hash code. This way, the hash code becomes maximally unique, aiding the program in identifying this object.

For objects of custom classes, you can also override the hash code by defining a unique way for objects to obtain this number. Let's create a test class User and override the hash code method for it.

Note

The hash code method belongs to the Object class, and all objects inherit from this class. Therefore, we can override this method in each object. This involves low-level programming, and we won't delve into it here.

main


              12345678910111213141516171819202122232425262728293031
            
package com.example;

class User {
    String name;
    int age;

    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public int hashCode() {
        int h = 0;
        byte[] array = name.getBytes();
        for (byte element :
                array) {
            h += element * age + (age * 15);
        }
        return h;
    }
}

public class Main {
    public static void main(String[] args) {
        User user1 = new User("Bob", 13);
        User user2 = new User("Alice", 20);
        System.out.println("First user's hash code: " + user1.hashCode());
        System.out.println("Second user's hash code: " + user2.hashCode());
    }
}

In the User class, there are 2 attributes that I used in the hashCode() method. I chose random operations in the hashCode() method to obtain a maximally random number, making it unique for each User object.

This is precisely the hash code. Now, let's find out how it is used in a HashMap.

HashMap

A HashMap itself can be visualized as an array of buckets. A bucket is a list where an element is stored. Initially, the HashMap has 16 buckets, which is the DEFAULT_INITIAL_CAPACITY.

When we put an element into the HashMap using the put() method, the HashMap needs to determine which specific bucket to place this element into. It does this using a simple formula: keyValue.hashCode() & (n - 1), where n is the size of the array of buckets in the HashMap. '&' is a bitwise AND operation, which is also low-level programming, so we won't delve into the details.

It's important to understand that the HashMap uses the hash code of the key's value to find the appropriate bucket for it. Therefore, it's crucial to create a maximally unique hash code to avoid collisions.

Collision

No matter how unique a hash code is, there may still be a situation where the HashMap computes the same bucket for two elements. In such cases, a collision occurs. In simple terms, a collision is a situation where two or more elements end up in the same bucket. They are stored there as a SinglyLinkedList, which you implemented earlier.

Let's take a look at an illustration:

Collision is not favorable as it affects optimization. If we look at this table, we will see that in the best case, the HashMap has constant algorithmic complexity, while in the worst case, it is linear. This linear algorithmic complexity arises from significant collisions. Therefore, programmers advise using keys with hash codes that are as unique as possible to avoid collisions.

Note

If there are a lot of elements in the HashMap, for optimization, a red-black tree is used instead of a SinglyLinkedList.
With this data structure, the algorithmic complexity is logarithmic, making data processing much faster than with linear complexity. We won't delve into the topic of trees in this course, as it is a subject for a separate course. However, if you're interested, you can always find information about it online.

How Does HashMap Deal with Collisions?

HashMap also addresses collisions through constant resizing of the bucket array. When the number of elements reaches 75% of the bucket array size, HashMap triggers a resize by doubling the size of the bucket array. Afterward, it redistributes the elements across the new buckets using the new value of n in the formula.

Thus, HashMap is a highly optimized data structure and is often used as the primary implementation of the Map interface.

Why Do You Need to Know All This?

This is a low-level programming theory that is important because when working with large datasets, data structures can significantly impact application performance. This happens when these data structures are misused.

Misuse of data structures occurs because programmers don't understand how these data structures work. After all, you want to become a good programmer, right?

Moreover, questions about data structures are often asked in interviews.

If you're interested in delving deeper into how HashMap works under the hood, you can refer to the Java documentation in IntelliJ IDEA:

Note

By the way, in the equals() method, the hash code is also used to compare two objects. If the hash codes are equal, it means the objects are considered equal.

1. Which of the following implementations uses a hash table to store key-value pairs?

2. In a `HashMap`, what happens if two keys have the same hash code?

Everything was clear?

Thanks for your feedback!

Section 3. Chapter 2