5. Longest Palindromic Substring，最长回文子串

Last Updated: 2024-01-10 13:18:26 Wednesday

-- TOC --

题目分析
Expanding Window
- Python
- C++
- C
Dynamic Programming
- Python
- C++
- C
这应该不算是DP吧...
Longest Common Substring

这道题不适合用DP的思路来解决...

题目分析

A string is palindromic（回文） if it reads the same forward and backward. 本题要求寻找最长子串，不是返回最长的长度。

Example 1:
Input: s = "babad"
Output: "bab"
Explanation: "aba" is also a valid answer.

Example 2:
Input: s = "cbbd"
Output: "bb"

s = "1234543abcba3454321"
output is the whole s

s = "123454abcba"
output is "abcba"

Constraints:

1 <= s.length <= 1000 （输入长度最小是1，此时返回s）
s consist of only digits and English letters.（输入只包含数字和英文字符）

Expanding Window

LeetCode第3题，最长无重复substring，使用了Sliding Window技巧，记录符合要求的序列的left和right，以及其它一些状态信息。本题也可尝试变通地使用这个技巧，我称之为Expanding Window，也会记录下left和right和各种需要的状态信息。不过最后需要返回字符串对象。

后来发现leetcode官方的solution中，也有这个思路，而且，这个思路还算不错的！

Python

class Solution:
    def longestPalindrome(self, s: str) -> str:
        n = len(s)
        i = left = right = 0
        while i<n:
            j = 1
            t1 = i
            while i+j<n and s[i+j]==s[i]:
                j += 1
            t2 = i + j - 1
            j = 1
            while t1-j>=0 and t2+j<n and s[t1-j]==s[t2+j]:
                j += 1
            step = j - 1
            if t2-t1+2*step > right-left:
                left = t1 - step
                right = t2 + step
            i = t2 + 1
        return s[left:right+1]

t1和t2用来解决连续相同字符的情况。

C++

class Solution {
public:
    string longestPalindrome(string s) {
        size_t n{s.size()};
        int i{},j{},left{},right{},t1{},t2{},step{};
        while(i < n){
            j = 1;
            t1 = i;
            while((i+j<n) && (s[i+j]==s[i]))
                ++j;
            t2 = i + j - 1;
            j = 1;
            while((t1-j>=0) && (t2+j<n) && (s[t1-j]==s[t2+j]))
                ++j;
            step = j - 1;
            if(t2-t1+2*step > right-left){
                left = t1 - step;
                right = t2 + step;
            }
            i = t2 + 1;
        }
        return s.substr(left, right-left+1);
    }
};

C

char* longestPalindrome(char *s){
    size_t n = strlen(s);
    int i,j,left,right,t1,t2,step;
    i = j = left = right = t1 = t2 = step = 0;
    while(i < n){
        j = 1;
        t1 = i;
        while((i+j<n) && (s[i+j]==s[i]))
            ++j;
        t2 = i + j - 1;
        j = 1;
        while((t1-j>=0) && (t2+j<n) && (s[t1-j]==s[t2+j]))
            ++j;
        step = j - 1;
        if(t2-t1+2*step > right-left){
            left = t1 - step;
            right = t2 + step;
        }
        i = t2 + 1;
    }
    s[right+1] = '\x00';
    return s+left;
}

Dynamic Programming

这道题的技能点居然是DP算法，Dynamic Programming Algorithm，它是一个算法设计优化技巧。

理解DP算法

DP algorithm's definition of Wikipedia: “method for solving complex problems by breaking them down into simpler subproblems”。Dynamic programming works by storing the result of subproblems so that when their solutions are required, they are at hand and we do not need to recalculate them. （在计算过程中，记住subproblem的解，在后续的计算中，直接使用，不用重复计算） This technique of storing the value of subproblems is called memoization.

Dynamic programming with memoization is a top-down approach to dynamic programming. By reversing the direction in which the algorithm works i.e. by starting from the base case and working towards the solution, we can also implement dynamic programming in a bottom-up manner.

Dynamic programming is mostly applied to recursive algorithms. This is not a coincidence, most optimization problems require recursion and dynamic programming is used for optimization. But not all problems that use recursion can use Dynamic Programming. Unless there is a presence of overlapping subproblems like in the fibonacci sequence problem. Some recursions can only reach the solution by using a divide and conquer approach, and that is the reason why a recursive algorithm like Merge Sort cannot use dynamic programming, because the subproblems are not overlapping in any way.

不是所有recursive算法都可以用DP来优化，必须要有overlapping subproblems，即重复的子问题，如果子问题之间不存在重复的更小的子问题，就无法利用DP思路来优化，比如并归排序Merge Sort。

DP算法专题

应用DP优化，需要observation。很多时候，一个好算法的出现，都是因为对问题本身有了更深刻的认识，有了insight或者关键重要的observation！

Python

下面的DP版本是buttom-up的思路，比上面的Expanding Windows版本慢40倍！思路为：不断地扩大两个字符之间的跨度，并记录下是否为panlindrom，最后的那个就是最长的。

class Solution:
    def longestPalindrome(self, s: str) -> str:
        n = len(s)
        sr = {(0,0):1}
        sr |= {(i,i+1):1 for i in range(n-1) if s[i]==s[i+1]}
        sr |= {(i,i+2):1 for i in range(n-2) if s[i]==s[i+2]}
        for i in range(3,n):
            for j in range(n-i):
                if (j+1,j+i-1) in sr and s[j]==s[j+i]:
                    sr[(j,j+i)] = 1
        k, _ = sr.popitem()
        left, right = k
        return s[left:right+1]

这段代码虽然很慢，但还是有一些值的总结的Python细节：

|，合并两个dict对象；|=，inplace合并，同update方法。
popitem，弹出最后一个插入dict对象的元素，dict对象的插入顺序可以依赖。
a,b = k，unpacking tuple/list，无需*，此处k是两个元素的tuple。如果k是只包含一个元素的tuple，写作a, = k。
sth in sr，sr如果是list，很慢，sr如果是set，无序，因此这里只能是dict，1作为value是没有用的，而且dict内存使用较多，每个key都是一个tuple。

anyway.....更新一下，对齐官方使用嵌套list的方案，看看速度。

class Solution:
    def longestPalindrome(self, s: str) -> str:
        n = len(s)
        # sr = [[0]*n]*n  # wrong!!
        sr = [[0]*n for _ in range(n)]
        left, right = 0, 0
        for i in range(n):
            sr[i][i] = 1
        for i in range(n-1):
            if s[i] == s[i+1]:
                sr[i][i+1] = 1
                left, right = i, i+1
        for i in range(2,n):
            for j in range(n-i):
                t = j+i
                # if sr[j+1][t-1] and s[j]==s[t]:
                if s[j]==s[t] and sr[j+1][t-1]:  # faster
                    sr[j][t] = 1
                    left, right = j, t
        return s[left:right+1]

[0]*n，这种表达式在应用到元素为mutable对象的时候，要小心，它们都是一样的，这里只是shallow copy。
如果把s[j]==s[t]这个判断放到前面，速度会再快一些！

也就快一丢丢吧....不过这种嵌套list的方式，比较适合同步到C++和C版本中去。

C++

class Solution {
public:
    string longestPalindrome(string s) {
        int n {s.size()};
        vector<vector<char>> sr(n, vector<char>(n,0));
        int i{}, j{}, left{}, right{};
        for(i=0; i<n; ++i)
            sr[i][i] = 1;
        for(i=0; i<n-1; ++i)
            if(s[i] == s[i+1]){
                sr[i][i+1] = 1;
                left = i; right = i+1;
            }
        for(i=2; i<n; ++i)
            for(j=0; j<n-i; ++j){
                int t {j+i};
                if((s[j]==s[t]) && (sr[j+1][t-1])){
                    sr[j][t] = 1;
                    left = j; right = t;
                }
            }
        return s.substr(left, right-left+1);
    }
};

C

char* longestPalindrome(char *s){
    size_t n = strlen(s);
    char (*sr)[n] = (char*)calloc(n, n);
    int i, j, left, right;
    left = right = 0;
    for(i=0; i<n; ++i)
        sr[i][i] = 1;
    for(i=0; i<n-1; ++i)
        if(s[i] == s[i+1]){
            sr[i][i+1] = 1;
            left = i; right = i+1;
        }
    for(i=2; i<n; ++i)
        for(j=0; j<n-i; ++j){
            int t = j+i;
            if((s[j]==s[t]) && (sr[j+1][t-1])){
                sr[j][t] = 1;
                left = j; right = t;
            }
        }
    free(sr);
    s[right+1] = '\x00';
    return s+left;
}

使用calloc来申请内存，自动清零。
char (*sr)[n]这个定义有点烧脑，背后有两个知识点：（1）这是在定义二维数组，这个定义使得双下标表达式可以正常工作。（2）n是运行时才能确定的变量，这里用到了gcc支持的C99特性VLA。

这个C语言版本，比使用Expanding Window的Python版本，慢50%！

这应该不算是DP吧...

本来是想试试另一个DP思路的，结果代码写到最后，发现跟最前面的expanding windows基本一样了。

class Solution:
    def longestPalindrome(self, s: str) -> str:
        n = len(s)
        pair = []
        i = 0
        left = right = 0
        while i < n:
            j = i + 1
            while j<n and s[i]==s[j]:
                j += 1
            if j-i-1 > right-left:
                left = i
                right = j-1
            pair.append((i,j-1))
            i = j
        for i,j in pair:
            step = 0
            while True:
                step += 1
                if (lo:=i-step) < 0:
                    break
                if (hi:=j+step) >= n:
                    break
                if s[lo] != s[hi]:
                    break
            step -= 1
            if j-i+step*2 > right-left:
                left = i - step
                right = j + step
        return s[left:right+1]

本来是想基于pair这个初始状态，一步步向上计算，但是最后发现，每个初始状态都是非常独立的，各自直接计算到最后反而更节省内存。这道题不适合DP的原因，我认为：子问题完全独立，子问题之间没有交互。

Longest Common Substring

此题是否可以转化为最长相同子串问题？求出s和s[::-1]的lcsubstr。

在推进过程中，遇到一个测试用例：

aacabdkacaa

按此思路，得到的结果是aaca，但正确的结果为aca。此方法不行！

anyway，保留一段暴力求解longest common substring的代码：

def lcsubstr(a,b):
    na = len(a)
    nb = len(b)
    ss = ''
    for i in range(na):
        for j in range(nb):
            k = 0
            while i+k<na and j+k<nb:
                if a[i+k] == b[j+k]:
                    k += 1
                else:
                    break
            if k > len(ss):
                ss = a[i:i+k]
    return ss

本文链接：https://cs.pynote.net/ag/leetcode/202211201/

-- EOF --

-- MORE --