详解Trie结构和实现

Last Updated: 2024-01-24 13:05:15 Wednesday

-- TOC --

Trie这个名称来自retrieval这个单词,读作tree,或try。它是一种用来搜索的前缀树,也被称为字典树。这种结构很适合在一组字符串的集合中,搜索某个字符串是否存在,或提取所有前缀相同的字符串。由于其特有的存储结构,可以显著减少比较次数,因而性能较高。

A Trie, also known as a prefix tree, is a type of search tree used for storing a collection of strings. It has a number of uses such as building autocomplete functionalities, and is especially efficient for prefix-based searches.

Trie这种数据结构于1960年由Edward Fredkin和著名的Donald Knuth提出。

The original paper on the Trie data structure was titled "A File Maintenance System" and was written by Edward Fredkin and Donald Knuth in 1960. The paper introduces the Trie data structure as a way to efficiently store and retrieve information from a large collection of records.

Trie结构

举个例子,比如我们要存储an,ape,car,carry和the这5个字符串,插入Trie结构后,得到:

trie

Trie结构主要有三个操作:insert,remove,search和startswith。

Trie supports three main operations: insert, search, and startswith.

insert: To insert a word into the Trie, start from the root and traverse down the path that corresponds to the word. If the path doesn't exist, create a new node. If the path already exists, move down to the next level. Repeat until you've processed each character in the word. At the end of the word, mark the final node as the end of a word. \(O(N)\)

search: To find a word (or a prefix), start from the root and traverse down the path corresponding to the word. If at any point the path doesn't exist, return false. If you finish traversing the word and end at a node that's marked as the end of a word, return true. Otherwise, return false. \(O(N)\)

startswith (retrieval): Retrieve all words with the same prefix.

Trie的空间复杂度就是节点数的总和,\(O(N\cdot M)\)。

The space complexity of a Trie can be high. In the worst-case scenario, if no paths are shared by any words, the space required is \(O(N\cdot M)\), where N is the number of words and M is the length of the longest word. This is because each word would require a new path from the root to the leaf. However, in the average case where there are some common prefixes, the space complexity will be less because paths are shared between words.

虽然现在内存便宜,多用点空间换取性能是常规操作,但是(在保持性能时)节省空间依然有一些潜在的好处:(1)增加CPU Cache命中率;(2)减少Page Swap。

Compressed Trie结构

后缀树(Suffix Tree)

Trie实现

完整的实现和严格的测试:https://github.com/xinlin-z/trie

本文链接:https://cs.pynote.net/ag/tree/202308234/

-- EOF --

-- MORE --