187. Repeated DNA Sequences

1. Description

The DNA sequence is composed of a series of nucleotides abbreviated as ‘A’, ‘C’, ‘G’, and ‘T’.

  • For example, “ACGAATTCCG” is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

2. Example

Example 1:
Input: s = “AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT”
Output: [“AAAAACCCCC”,“CCCCCAAAAA”]

Example 2:
Input: s = “AAAAAAAAAAAAA”
Output: [“AAAAAAAAAA”]

3. Constraints

  • 1 <= s.length <= 10^5
  • s[i] is either ‘A’, ‘C’, ‘G’, or ‘T’.

4. Solutions

Sliding Window && Hash Table

n = str.size()
Time complexity: O(n)
Space complexity: O(n) (the hash table holds at most min(n - 9, 4^10) distinct windows)

#include <string>
#include <unordered_map>
#include <vector>

using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(const string &str) {
        // Each nucleotide maps to a 2-bit code, so a 10-letter window fits in 20 bits.
        unordered_map<char, int> binary = {{'A', 0}, {'C', 1}, {'G', 2}, {'T', 3}};
        const int sequence_length = 10;
        const int n = static_cast<int>(str.size());
        if (n < sequence_length) {
            return {};
        }

        // Encode the first window.
        int sequence = 0;
        for (int i = 0; i < sequence_length; ++i) {
            sequence = (sequence << 2) | binary[str[i]];
        }
        unordered_map<int, int> sequence_count = {{sequence, 1}};
        vector<string> repeated_DNA;
        // The mask keeps the low 20 bits, dropping the nucleotide that leaves the window.
        const int mask = (1 << (sequence_length * 2)) - 1;
        for (int i = sequence_length; i < n; ++i) {
            sequence = ((sequence << 2) & mask) | binary[str[i]];
            // Report a window exactly once, when its count first reaches 2.
            if (++sequence_count[sequence] == 2) {
                repeated_DNA.emplace_back(str.substr(i - sequence_length + 1, sequence_length));
            }
        }

        return repeated_DNA;
    }
};
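
The rolling update inside the loop is the core of the solution, so it may help to look at it in isolation. The sketch below is only an illustration: the helper names encode_base, encode_window, and roll are hypothetical and do not appear in the solution above, and the window length is a parameter so the behavior can be checked on short strings.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: map a nucleotide to its 2-bit code
// (same table as the solution: A=0, C=1, G=2, T=3).
inline std::uint32_t encode_base(char base) {
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'; the constraints rule out other characters
    }
}

// Hypothetical helper: pack the first `length` characters of `window`
// into an integer, 2 bits per nucleotide, first character in the highest bits.
inline std::uint32_t encode_window(const std::string &window, int length = 10) {
    assert(length > 0 && length < 16);
    assert(static_cast<int>(window.size()) >= length);
    std::uint32_t code = 0;
    for (int i = 0; i < length; ++i) {
        code = (code << 2) | encode_base(window[i]);
    }
    return code;
}

// Hypothetical helper: slide the window one character to the right.
// Shifting left by 2 makes room for the incoming nucleotide; the mask
// discards the 2 bits of the nucleotide that just left the window.
inline std::uint32_t roll(std::uint32_t code, char incoming, int length = 10) {
    assert(length > 0 && length < 16);
    const std::uint32_t mask = (1u << (2 * length)) - 1;
    return ((code << 2) & mask) | encode_base(incoming);
}
```

With a window length of 4, “ACGT” packs to the bits 00 01 10 11, which is 27. Rolling in ‘A’ gives 01 10 11 00, the same value as encoding “CGTA” directly, so the sliding update and a fresh encoding of the new window agree.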