动态哈希表：从0到1解析C语言动态数组+链表冲突解决方案

引言

哈希表（Hash Map）是一种高效的数据结构，通过哈希函数将键映射到数组索引，实现O(1)时间复杂度的插入、查询和删除操作。但传统静态哈希表（固定容量数组）存在空间浪费（数据少时数组空置）和冲突频发（数据多时哈希碰撞概率激增）的痛点。本文将基于C语言，手把手实现一个动态哈希表，通过「动态数组+链表」的组合方案解决这些问题，并深入解析核心设计与实现细节。

一、设计背景：为什么选择「动态数组+链表」？

1.1 传统静态哈希表的痛点

传统哈希表通常使用固定大小的数组存储键值对，通过哈希函数计算索引。但这种设计存在两大缺陷：

空间浪费：若初始容量过大，数据稀疏时大量数组空间闲置；若初始容量过小，数据增多时频繁扩容（需重新哈希所有数据），效率低下。
冲突频发：当数据量超过数组容量时，哈希碰撞概率激增，链表法（拉链法）虽能解决冲突，但链表过长会导致查询时间退化为O(n)。

1.2 动态数组+链表的组合优势

动态哈希表通过「动态数组」和「链表」的组合，完美解决了上述问题：

动态扩容：当负载因子（键值对数量/桶数量）超过阈值（如0.75）时，自动扩容（通常翻倍），保持哈希分布均匀，减少冲突。
链表处理冲突：每个桶对应一个链表，冲突的键值对存储在同一链表中，插入、查询时遍历链表即可，无需移动其他数据。
空间高效：桶的数量随数据量动态调整，避免固定数组的空间浪费。

二、核心结构体解析：DynamicHashMap与KeyValueNode

2.1 KeyValueNode：链表节点

typedef struct node_s {
    KeyType key;          // 键（字符串指针）
    ValueType val;        // 值（字符串指针）
    struct node_s *next;  // 指向下一个节点的指针（解决冲突）
} KeyValueNode;

key：存储键的字符串指针（如"age"）。
val：存储值的字符串指针（如"25"）。
next：链表指针，用于连接同一桶中冲突的其他键值对。

2.2 DynamicHashMap：哈希表主体

typedef struct {
    KeyValueNode **buckets;  // 动态数组（存储链表头节点）
    int size;                // 当前键值对数量
    int capacity;            // 桶数组的当前容量（桶的数量）
    uint32_t hash_seed;      // 哈希种子（用于生成随机哈希值，防碰撞）
} DynamicHashMap;

buckets：动态数组，每个元素是一个链表头节点（KeyValueNode*）。桶的数量由capacity决定。
size：当前存储的键值对总数，用于计算负载因子（size/capacity）。
capacity：桶数组的容量（即最多有多少个桶）。初始通常设为16，扩容时翻倍。
hash_seed：哈希函数的种子，通过time(NULL)随机生成，避免相同输入生成相同哈希值（防碰撞攻击）。

三、关键函数实现逻辑：从创建到删除

3.1 hashmap_create()：初始化动态哈希表

DynamicHashMap *hashmap_create() {
    DynamicHashMap *map = (DynamicHashMap *)calloc(1, sizeof(DynamicHashMap));
    if (map == NULL) { perror("map create"); exit(EXIT_FAILURE); }

    map->capacity = 16;         // 初始容量（固定为16）
    map->size = 0;              // 初始无键值对
    map->hash_seed = (uint32_t)time(NULL); // 随机哈希种子

    // 初始化桶数组（动态分配内存，每个元素是链表头节点）
    map->buckets = (KeyValueNode **)calloc(map->capacity, sizeof(KeyValueNode *));
    if (!map->buckets) { perror("map->buckets creat"); free(map); exit(EXIT_FAILURE); }

    return map;
}

关键细节：

内存分配：使用calloc初始化DynamicHashMap和buckets数组，确保内存清零，避免野指针。
初始容量：代码中固定为16，实际项目中可根据需求调整。
哈希种子：通过time(NULL)生成随机种子，确保不同运行实例的哈希值分布不同，减少碰撞。

3.2 hashmap_put()：插入键值对（含扩容逻辑）

ValueType hashmap_put(DynamicHashMap *map, KeyType key, ValueType val) {
    if (!map || !key || !val) return NULL; // 参数校验

    int index = hash_function(map, key); // 计算哈希值对应的桶索引
    KeyValueNode *current = map->buckets[index]; // 获取桶的链表头

    // 1. 查找是否已存在相同键
    while (current) {
        if (strcmp(current->key, key) == 0) {
            // 键已存在：更新值（释放旧值，存储新值副本）
            free(current->val);
            current->val = strdupp(val); // strdupp是新实现的字符串复制函数（防内存泄漏）
            return current->val;
        }
        current = current->next; // 遍历链表
    }

    // 2. 键不存在：新建节点并插入链表头部
    KeyValueNode *new_node = (KeyValueNode *)calloc(1, sizeof(KeyValueNode));
    if (!new_node) return NULL; // 内存分配失败

    new_node->key = strdupp(key);   // 复制键（防外部修改）
    new_node->val = strdupp(val);   // 复制值（防外部修改）
    new_node->next = map->buckets[index]; // 新节点插入链表头部
    map->buckets[index] = new_node;       // 更新链表头
    map->size++;                        // 键值对数量+1

    // 3. 检查负载因子，触发扩容（负载因子>0.75时扩容）
    float load_factor = (float)map->size / map->capacity;
    if (load_factor > 0.75) {
        if (!hashmap_resize(map)) {       // 扩容失败（如内存不足）
            hashmap_remove(map, key);     // 回滚：删除刚插入的节点
            return NULL;
        }
    }

    return new_node->val;
}

关键逻辑：

哈希冲突处理：通过链表存储同一桶中的冲突键值对，插入时遍历链表查找是否存在相同键。
动态扩容：当负载因子超过0.75时，调用hashmap_resize将桶数量翻倍，重新哈希所有现有键值对，确保哈希分布均匀。
内存管理：使用strdupp复制键和值（避免外部修改影响哈希表内部数据），插入失败时回滚删除节点，防止内存泄漏。

3.3 hashmap_get()：查询键值对

ValueType hashmap_get(DynamicHashMap *map, KeyType key) {
    if (!map || !key) { printf("参数错误"); return NULL; }

    int index = hash_function(map, key); // 计算桶索引
    KeyValueNode *current = map->buckets[index]; // 获取链表头

    // 遍历链表查找键
    while (current) {
        if (strcmp(current->key, key) == 0) {
            return current->val; // 找到键，返回对应值
        }
        current = current->next; // 未找到，继续遍历
    }

    printf("键未找到"); // 遍历完链表未找到
    return NULL;
}

关键细节：

哈希定位：通过hash_function计算键的哈希值，取模得到桶索引（index = hash % capacity）。
链表遍历：从桶的链表头开始遍历，逐个比较键，找到后返回对应值；遍历完链表未找到则返回NULL。

3.4 hashmap_remove()：删除键值对

bool hashmap_remove(DynamicHashMap *map, KeyType key) {
    if (!map || !key) { printf("参数错误"); return false; }

    int index = hash_function(map, key); // 计算桶索引
    KeyValueNode *prev = NULL;           // 记录前一个节点（用于修复链表）
    KeyValueNode *current = map->buckets[index]; // 当前节点

    // 遍历链表查找键
    while (current) {
        if (strcmp(current->key, key) == 0) {
            // 找到键：删除节点
            if (prev) {
                prev->next = current->next; // 前一个节点指向当前节点的下一个节点
            } else {
                map->buckets[index] = current->next; // 头节点删除（更新链表头）
            }

            // 释放内存
            free(current->key);
            free(current->val);
            free(current);
            map->size--; // 键值对数量-1
            return true;
        }
        prev = current;   // 记录前一个节点
        current = current->next; // 移动到下一个节点
    }

    return false; // 未找到键
}

关键细节：

链表指针修复：删除节点时，若节点是链表头（prev为NULL），则直接更新桶的链表头为current->next；否则，将前一个节点的next指向当前节点的下一个节点，确保链表不断裂。
内存释放：删除节点后，必须释放其key、val和节点本身的内存，防止内存泄漏。

四、性能优化点：当前瓶颈与改进方向

4.1 当前实现的潜在瓶颈

链表过长导致查询变慢：当负载因子过高（如接近1）时，链表长度增加，查询时间退化为O(n)（最坏情况遍历整个链表）。
哈希函数分布不均：若哈希函数设计不佳（如种子固定），可能导致键值对集中在少数桶中，加剧链表冲突。
扩容开销：扩容时需要重新哈希所有现有键值对（时间复杂度O(n)），频繁扩容会影响性能。

4.2 优化方向

动态调整扩容阈值：根据实际场景调整负载因子阈值（如0.75→0.8），平衡空间与时间效率。
优化冲突结构：当链表长度超过阈值（如8）时，将链表转换为红黑树（时间复杂度O(log n)），提升长链表的查询效率（Java 8的HashMap即采用此策略）。
哈希函数优化：使用更复杂的哈希函数（如MurmurHash），减少哈希碰撞概率，确保键值对均匀分布在各个桶中。
惰性删除：对于删除操作，标记节点为“已删除”而非立即释放内存，减少频繁内存分配/释放的开销（适用于高频删除场景）。

五、使用示例：插入、查询、删除操作

5.1 测试代码说明

以下是用户提供的测试代码，演示了哈希表的完整生命周期：

int main() {
    DynamicHashMap *map = hashmap_create(); // 创建哈希表
    if (!map) { /* 错误处理 */ }

    // 插入键值对
    hashmap_put(map, "name", "Alice");    // 插入成功
    hashmap_put(map, "age", "25");        // 插入成功

    // 更新键值对
    hashmap_put(map, "name", "Bob");      // 更新成功（原"Alice"被替换为"Bob"）

    // 查询键值对
    ValueType name = hashmap_get(map, "name"); // 返回"Bob"
    ValueType age = hashmap_get(map, "age");   // 返回"25"

    // 删除键值对
    hashmap_remove(map, "age");           // 删除成功
    hashmap_get(map, "age");              // 返回NULL（键已删除）

    hashmap_destroy(map);                 // 销毁哈希表（释放所有内存）
    return 0;
}

5.2 注意事项

内存管理：插入时strdupp会复制键和值的内存，删除时需手动释放（哈希表内部已处理），避免外部重复释放。
哈希种子选择：hash_seed使用time(NULL)随机生成，确保不同运行实例的哈希分布不同，防止恶意构造碰撞攻击。
负载因子监控：插入时检查负载因子，触发扩容后需确保新桶数组的内存分配成功（否则回滚删除新插入的节点）。

总结

本文手把手实现了基于「动态数组+链表」的C语言哈希表，详细解析了核心结构体设计、关键函数逻辑及性能优化方向。动态哈希表通过动态扩容和链表冲突解决，兼顾了空间效率与时间效率，是处理动态数据存储的理想选择。实际应用中，可根据需求进一步优化（如红黑树替代长链表），以应对更复杂的场景。

#ifndef DYNAMIC_HASHMAP_H
#define DYNAMIC_HASHMAP_H

#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>

typedef char *KeyType;
typedef char *ValueType;

typedef struct node_s {
    KeyType key;          // 键
    ValueType val;        // 值
    struct node_s *next;  // 链表指针
} KeyValueNode;

typedef struct {
    KeyValueNode **buckets;  // 动态数组（哈希桶）
    int size;                // 键值对数量
    int capacity;            // 桶数组容量
    uint32_t hash_seed;      // 哈希种子（防碰撞）
} DynamicHashMap;

// 创建一个固定容量的哈希表
DynamicHashMap *hashmap_create();
// 销毁一个哈希表
void hashmap_destroy(DynamicHashMap *map);
// 插入一个键值对
ValueType hashmap_put(DynamicHashMap *map, KeyType key, ValueType val);
// 查询一个键值对
ValueType hashmap_get(DynamicHashMap *map, KeyType key);
// 删除某个键值对
bool hashmap_remove(DynamicHashMap *map, KeyType key);

#endif // DYNAMIC_HASHMAP_H

#include <stdio.h>
#include "hash.h"

int main() {
    // 创建哈希表
    DynamicHashMap *map = hashmap_create();
    if (!map) {
        fprintf(stderr, "Failed to create hashmap.\n");
        return 1;
    }

    // 测试插入操作
    printf("Testing hashmap_put...\n");
    const char *key1 = "name";
    const char *val1 = "Alice";
    ValueType result1 = hashmap_put(map, (KeyType)key1, (ValueType)val1);
    if (result1) {
        printf("Inserted key: %s, value: %s\n", key1, result1);
    }
    else {
        printf("Failed to insert key: %s\n", key1);
    }

    const char *key2 = "age";
    const char *val2 = "25";
    ValueType result2 = hashmap_put(map, (KeyType)key2, (ValueType)val2);
    if (result2) {
        printf("Inserted key: %s, value: %s\n", key2, result2);
    }
    else {
        printf("Failed to insert key: %s\n", key2);
    }

    // 测试更新操作
    printf("\nTesting update...\n");
    const char *new_val1 = "Bob";
    ValueType update_result = hashmap_put(map, (KeyType)key1, (ValueType)new_val1);
    if (update_result) {
        printf("Updated key: %s, new value: %s\n", key1, update_result);
    }
    else {
        printf("Failed to update key: %s\n", key1);
    }

    // 测试查询操作
    printf("\nTesting hashmap_get...\n");
    ValueType get_result1 = hashmap_get(map, (KeyType)key1);
    if (get_result1) {
        printf("Got key: %s, value: %s\n", key1, get_result1);
    }
    else {
        printf("Key %s not found.\n", key1);
    }

    ValueType get_result2 = hashmap_get(map, (KeyType)key2);
    if (get_result2) {
        printf("Got key: %s, value: %s\n", key2, get_result2);
    }
    else {
        printf("Key %s not found.\n", key2);
    }

    // 测试删除操作
    printf("\nTesting hashmap_remove...\n");
    bool remove_result = hashmap_remove(map, (KeyType)key2);
    if (remove_result) {
        printf("Removed key: %s successfully.\n", key2);
    }
    else {
        printf("Failed to remove key: %s\n", key2);
    }

    // 再次查询已删除的键
    ValueType get_after_remove = hashmap_get(map, (KeyType)key2);
    if (get_after_remove) {
        printf("Got key: %s, value: %s\n", key2, get_after_remove);
    }
    else {
        printf("Key %s not found after removal (expected).\n", key2);
    }

    // 销毁哈希表
    printf("\nDestroying hashmap...\n");
    hashmap_destroy(map);
    printf("Hashmap destroyed.\n");

    return 0;
}

#include"hash.h"
#include <time.h>
#include <stdio.h>
#include <string.h>

// 辅助哈希函数（BKDRHash变种）
static uint32_t hash_function(DynamicHashMap *map, const char *key) {
    uint32_t hash = 0;
    while (*key) {
        hash = hash * map->hash_seed + (uint8_t)(*key++);
    }
    return hash % map->capacity;
}
// 创建一个固定容量的哈希表
DynamicHashMap *hashmap_create() {
    DynamicHashMap *map = (DynamicHashMap *)calloc(1, sizeof(DynamicHashMap));
    if (map == NULL) {
        perror("map create");
        exit(EXIT_FAILURE);
    }
    // 初始化参数
    map->capacity = 16;         // 初始容量
    map->size = 0;              // 初始键值对数量
    uint32_t hash_seed = (uint32_t)time(NULL); //种子值随机，减少哈希冲突
    map->buckets = (KeyValueNode **)calloc(map->capacity, sizeof(KeyValueNode *)); //给二重数组设置初始化桶
    if (!map->buckets) {
        perror("map->buckets creat");
        free(map);
        exit(EXIT_FAILURE);
    }
    return map;
}
// 销毁一个哈希表
void hashmap_destroy(DynamicHashMap *map) {
    if (map == NULL) {
        return;
    }
    
    for(int i=0;i<map->capacity;i++) {
        KeyValueNode *p = map->buckets[i];
        while (p != NULL) {
            KeyValueNode *next = p->next;
            free(p->key);
            free(p->val);
            free(p);
            p = next;
        }
    }
    free(map->buckets);  // 释放桶数组
    free(map);
}

// 扩容函数，更新桶容量，所以一个是capacity更新，一个是buckets更新
static bool hashmap_resize(DynamicHashMap *map) {
    KeyValueNode **new_buckets = (KeyValueNode **)calloc((2 * map->capacity), sizeof(KeyValueNode *));
    if (!new_buckets) {
        perror("new_buckets create");
        return false;
    }
    //保存旧数据
    KeyValueNode **old_buckets = map->buckets;
    int old_capacity = map->capacity;

    //更新哈希表参数
    map->capacity *= 2;
    map->buckets = new_buckets;
    //更新节点
    for (int i = 0; i < old_capacity; i++) {
        KeyValueNode *current = old_buckets[i];
        while (current) {
            KeyValueNode *next = current->next;
            int index= hash_function(map, current->key);

            current->next = map->buckets[index];
            map->buckets[index] = current;  //此处看做指针可能更好理解
            current = next;
        }
    }
    free(old_buckets);  // 释放旧桶数组
    return true;
}
// 复制字符串 strdup被禁用
static ValueType strdupp(const ValueType val) {
    size_t len = strlen(val) + 1;
    ValueType p = (ValueType)malloc(len);
    if (!p) {
        return NULL;
    }
    if (strcpy_s(p, len, val) != 0) {
        free(p);
        return NULL;
    }
    return p;
}
// 插入一个键值对
ValueType hashmap_put(DynamicHashMap *map, KeyType key, ValueType val) {
    if (!map || !key || !val) return NULL; //保证输出没问题
    int index = hash_function(map,key);
    //查找现有键值，没有就新建，有就顺序插入
    KeyValueNode *current = map->buckets[index];
    while (current) {
        if (strcmp(current->key, key) == 0) {
            // 找到重复键，更新值
            free(current->val);       // 释放旧值
            current->val = strdupp(val); // 存储新值副本,此处需要新函数，让旧值更新
            if (!current->val) return NULL; // 内存分配失败处理
            return current->val;
        }
        current = current->next;
    }
    //没有这个值，那就新建
    KeyValueNode *new_node = (KeyValueNode *)calloc(1,sizeof(KeyValueNode));
    if (!new_node) return NULL;
    new_node->key = strdupp(key);
    new_node->val = strdupp(val);
    if (!new_node->key || !new_node->val) { // 内存分配失败处理
        free(new_node->key);
        free(new_node->val);
        free(new_node);
        return NULL;
    }
    // 插入链表头部
    new_node->next = map->buckets[index];
    map->buckets[index] = new_node;
    map->size++;

    //检查负载因子
    float load_factor = (float)map->size / map->capacity;
    if (load_factor > 0.75) {
        if (!hashmap_resize(map)) {
            // 扩容失败时删除刚插入的节点（可选）
            hashmap_remove(map, key);
            return NULL;
        }
    }
    return new_node->val;
}
// 查询一个键值对，此处跟正常数组一样了
ValueType hashmap_get(DynamicHashMap *map, KeyType key) {
    if (!map || !key) {
        printf("没有这个值");
        return false;
    }
    int index = hash_function(map, key);
    KeyValueNode *current = map->buckets[index];
    while (current) {
        if (strcmp(current->key, key) == 0) {
            return current->val;
        }
        current=current->next;
    }
    printf("没有这个值");
    return NULL;
}
// 删除某个键值对
bool hashmap_remove(DynamicHashMap *map, KeyType key) {
    if (!map || !key) {
        printf("没有这个值");
        return false;
    }
    int index = hash_function(map, key);
    KeyValueNode *prev = NULL;
    KeyValueNode *current = map->buckets[index];
    while (current) {
        if (strcmp(current->key, key) == 0) {
            if (prev) {
                prev->next = current->next;
            }
            else {
                map->buckets[index] = current->next;  // 头节点删除
            }
            // 释放内存
            free(current->key);
            free(current->val);
            free(current);
            map->size--;
            return true;
        }
        prev = current;
        current = current->next;
    }
    return false;  // 未找到
}