30. Substring with Concatenation of All Words，寻找包含所有word的子串

输入s是一个string，另一个输入是所有元素长度一样的string words array，即这个array中所有string长度相同。（这个长度相同条件，降低了此题的难度）

所谓的concatenated substring，就是包含所有string words的子串，不管这些string words在子串中的排列如何。子串包含所有string words，即string words要用完。

Example 1:
Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: Since words.length == 2 and words[i].length == 3, the concatenated substring has to be of length 6.
The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words.
The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.
The output order does not matter. Returning [9,0] is fine too.

Example 2:
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: Since words.length == 4 and words[i].length == 4, the concatenated substring has to be of length 16.
There is no substring of length 16 in s that is equal to the concatenation of any permutation of words.
We return an empty array.

Example 3:
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]
Explanation: Since words.length == 3 and words[i].length == 3, the concatenated substring has to be of length 9.
The substring starting at 6 is "foobarthe". It is the concatenation of ["foo","bar","the"] which is a permutation of words.
The substring starting at 9 is "barthefoo". It is the concatenation of ["bar","the","foo"] which is a permutation of words.
The substring starting at 12 is "thefoobar". It is the concatenation of ["the","foo","bar"] which is a permutation of words.

题目示例没有说清楚的一个重要细节：array中的string word可以重复，重复的次数，就是在concatenated string中出现的次数。

Hash和Sliding Window

class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        wlen = len(words[0])
        wnum = len(words)
        words_need = Counter(words)
        r = [-1]
        wordslst = {}
        size = wnum * wlen
        lo = 0
        hi = size
        while hi <= len(s):
            wlst = []
            words_occured = Counter()
            pos = lo
            if (last:=r[-1]) >= 0:
                if (lo-last)%wlen==0 and (lo-last)<size:
                    # pos = lo + (size-(lo-last))
                    pos = size + last
                    wlst = wordslst[last][(lo-last)//wlen:]
                    words_occured = Counter(wlst)
            while pos < hi:
                ss = s[pos:pos+wlen]
                if ss in words_need:
                    words_occured[ss] += 1  # Counter, not dict
                    if words_occured[ss] > words_need[ss]:
                        break
                    wlst.append(ss)
                    pos += wlen
                else:
                    break
            else:
                wordslst[lo] = wlst
                r.append(lo)
            lo += 1
            hi = lo + size
        return r[1:]

这是我自己能够写出来的最快的版本了，结合了DP不做重复计算的思想，Hash快速查找，以及Sliding Windows技巧。用Counter（在collections模块中）代替dict，代码看起来精炼一点。

更聪明的Sliding Window

看到自己攒的算法，无论如何都无法跑入1秒的范围，也就60%+的成绩，忍不住看看了排在最前面的代码，果然有干货！还是Sliding Window，还是Hash，还是尽量不做重复计算，但是有insight加持，算法快了至少一个数量级。

Python

class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        wlen = len(words[0])
        wneeds = Counter(words)
        sssize = wlen * len(words)
        endsize = max(0, len(s)-sssize+1)
        woccur = Counter()
        wqueue = deque()
        r = []
        for i in range(min(wlen,endsize)):
            lo = hi = i
            woccur.clear()
            wqueue.clear()
            while lo < endsize:
                ss = s[hi:hi+wlen]
                if ss in wneeds:
                    wqueue.append(ss)
                    woccur[ss] += 1
                    hi += wlen
                    while woccur[ss] > wneeds[ss]:
                        woccur[wqueue.popleft()] -= 1
                        lo += wlen
                    if hi-lo == sssize:
                        r.append(lo)
                        woccur[wqueue.popleft()] -= 1
                        lo += wlen
                else:
                    lo = hi = hi + wlen
                    woccur.clear()
                    wqueue.clear()
        return r

C++

class Solution {
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        size_t wlen { words[0].size() };
        size_t sslen { wlen*words.size() };
        size_t endsize { max((size_t)0,s.size()-sslen+1) };
        unordered_map<string,size_t> wneeds;
        for(auto &w: words)
            wneeds[w] += 1;
        unordered_map<string,size_t> wcount;
        list<string> wque;
        size_t lo, hi;
        vector<int> r;
        for(size_t i{}; i<min(wlen,endsize); ++i){
            lo = hi = i;
            wcount.clear();
            wque.clear();
            while(lo < endsize){
                string ss { s.substr(hi,wlen) };
                hi += wlen;
                if(wneeds.contains(ss)){
                    wcount[ss] += 1;
                    wque.push_back(ss);
                    while(wcount[ss] > wneeds[ss]){
                        wcount[wque.front()] -= 1;
                        wque.pop_front();
                        lo += wlen;
                    }
                    if((hi-lo) == sslen){
                        r.push_back(static_cast<int>(lo));
                        wcount[wque.front()] -= 1;
                        wque.pop_front();
                        lo += wlen;
                    }
                } else {
                    lo = hi;
                    wcount.clear();
                    wque.clear();
                }
            }
        }
        return r;
    }
};

30. Substring with Concatenation of All Words，寻找包含所有word的子串

题目分析

Hash和Sliding Window

更聪明的Sliding Window

Python

C++

C