30. Substring with Concatenation of All Words,寻找包含所有word的子串

Last Updated: 2024-01-05 13:12:46 Friday

-- TOC --

题目分析

You are given a string s and an array of strings words. All the strings of words are of the same length.

输入s是一个string,另一个输入是所有元素长度一样的string words array,即这个array中所有string长度相同。(这个长度相同条件,降低了此题的难度)

A concatenated substring in s is a substring that contains all the strings of any permutation of words concatenated.

所谓的concatenated substring,就是包含所有string words的子串,不管这些string words在子串中的排列如何。子串包含所有string words,即string words要用完。

For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated strings. "acdbef" is not a concatenated substring because it is not the concatenation of any permutation of words.

Return the starting indices of all the concatenated substrings in s. You can return the answer in any order.

返回所有concatenated substring的起始index。

Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: Since words.length == 2 and words[i].length == 3, the concatenated substring has to be of length 6.
The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words.
The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.
The output order does not matter. Returning [9,0] is fine too.

Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: Since words.length == 4 and words[i].length == 4, the concatenated substring has to be of length 16.
There is no substring of length 16 in s that is equal to the concatenation of any permutation of words.
We return an empty array.

Example 3:
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Explanation: Since words.length == 3 and words[i].length == 3, the concatenated substring has to be of length 9.
The substring starting at 6 is "foobarthe". It is the concatenation of ["foo","bar","the"] which is a permutation of words.
The substring starting at 9 is "barthefoo". It is the concatenation of ["bar","the","foo"] which is a permutation of words.
The substring starting at 12 is "thefoobar". It is the concatenation of ["the","foo","bar"] which is a permutation of words.

Constraints:

题目示例没有说清楚的一个重要细节:array中的string word可以重复,重复的次数,就是在concatenated string中出现的次数。

Hash和Sliding Window

class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        wlen = len(words[0])
        wnum = len(words)
        words_need = Counter(words)
        r = [-1]
        wordslst = {}
        size = wnum * wlen
        lo = 0
        hi = size
        while hi <= len(s):
            wlst = []
            words_occured = Counter()
            pos = lo
            if (last:=r[-1]) >= 0:
                if (lo-last)%wlen==0 and (lo-last)<size:
                    # pos = lo + (size-(lo-last))
                    pos = size + last
                    wlst = wordslst[last][(lo-last)//wlen:]
                    words_occured = Counter(wlst)
            while pos < hi:
                ss = s[pos:pos+wlen]
                if ss in words_need:
                    words_occured[ss] += 1  # Counter, not dict
                    if words_occured[ss] > words_need[ss]:
                        break
                    wlst.append(ss)
                    pos += wlen
                else:
                    break
            else:
                wordslst[lo] = wlst
                r.append(lo)
            lo += 1
            hi = lo + size
        return r[1:]

这是我自己能够写出来的最快的版本了,结合了DP不做重复计算的思想,Hash快速查找,以及Sliding Windows技巧。用Counter(在collections模块中)代替dict,代码看起来精炼一点。

更聪明的Sliding Window

insight,insight,insight......更高级的算法,来自于你的insight!!

看到自己攒的算法,无论如何都无法跑入1秒的范围,也就60%+的成绩,忍不住看看了排在最前面的代码,果然有干货!还是Sliding Window,还是Hash,还是尽量不做重复计算,但是有insight加持,算法快了至少一个数量级。

Python

class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        wlen = len(words[0])
        wneeds = Counter(words)
        sssize = wlen * len(words)
        endsize = max(0, len(s)-sssize+1)
        woccur = Counter()
        wqueue = deque()
        r = []
        for i in range(min(wlen,endsize)):
            lo = hi = i
            woccur.clear()
            wqueue.clear()
            while lo < endsize:
                ss = s[hi:hi+wlen]
                if ss in wneeds:
                    wqueue.append(ss)
                    woccur[ss] += 1
                    hi += wlen
                    while woccur[ss] > wneeds[ss]:
                        woccur[wqueue.popleft()] -= 1
                        lo += wlen
                    if hi-lo == sssize:
                        r.append(lo)
                        woccur[wqueue.popleft()] -= 1
                        lo += wlen
                else:
                    lo = hi = hi + wlen
                    woccur.clear()
                    wqueue.clear()
        return r

C++

class Solution {
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        size_t wlen { words[0].size() };
        size_t sslen { wlen*words.size() };
        size_t endsize { max((size_t)0,s.size()-sslen+1) };
        unordered_map<string,size_t> wneeds;
        for(auto &w: words)
            wneeds[w] += 1;
        unordered_map<string,size_t> wcount;
        list<string> wque;
        size_t lo, hi;
        vector<int> r;
        for(size_t i{}; i<min(wlen,endsize); ++i){
            lo = hi = i;
            wcount.clear();
            wque.clear();
            while(lo < endsize){
                string ss { s.substr(hi,wlen) };
                hi += wlen;
                if(wneeds.contains(ss)){
                    wcount[ss] += 1;
                    wque.push_back(ss);
                    while(wcount[ss] > wneeds[ss]){
                        wcount[wque.front()] -= 1;
                        wque.pop_front();
                        lo += wlen;
                    }
                    if((hi-lo) == sslen){
                        r.push_back(static_cast<int>(lo));
                        wcount[wque.front()] -= 1;
                        wque.pop_front();
                        lo += wlen;
                    }
                } else {
                    lo = hi;
                    wcount.clear();
                    wque.clear();
                }
            }
        }
        return r;
    }
};

C

本文链接:https://cs.pynote.net/ag/leetcode/202312211/

-- EOF --

-- MORE --