Hello, and welcome to our analysis lesson. Today, we will be tackling a common problem in the field of string manipulations with C++. We will learn how to find all occurrences of a substring within a larger string. The techniques you will master today can be utilized in numerous situations, such as text processing and data analysis. Are you ready to get started? Let's jump right in!
Here is the task for today: we have two vectors of strings of identical length, the first containing the "original" strings and the second containing the substrings. Our goal is to detect all occurrences of each substring within its corresponding original string and, finally, return a vector of strings that report the starting indices of these occurrences. Remember, index counting starts from 0.
Example
Let's consider the following vectors:
Original Vector: { "HelloWorld", "LearningC++", "GoForBroke", "BackToBasics" }
Substring Vector: { "loW", "ear", "o", "Ba" }
The following are the expected outputs: In "HelloWorld", "loW" starts at index 3. In "LearningC++", "ear" starts at index 1. In "GoForBroke", "o" appears at indices 1, 3, and 7. In "BackToBasics", "Ba" starts at indices 0 and 6.
Thus, when `solution({"HelloWorld", "LearningC++", "GoForBroke", "BackToBasics"}, {"loW", "ear", "o", "Ba"})` is invoked, the function should return:

```
{
    "The substring 'loW' was found in the original string 'HelloWorld' at position(s) 3.",
    "The substring 'ear' was found in the original string 'LearningC++' at position(s) 1.",
    "The substring 'o' was found in the original string 'GoForBroke' at position(s) 1, 3, 7.",
    "The substring 'Ba' was found in the original string 'BackToBasics' at position(s) 0, 6."
}
```
Although this task may seem fairly straightforward, it can prove challenging. However, don't worry! We will break it down step by step.
Initially, we need to create a space to store our results. Can you think of a C++ data type that would be ideal for this task? That's right! A vector would be perfect!
```cpp
std::vector<std::string> solution(std::vector<std::string> orig_strs, std::vector<std::string> substrs) {
    std::vector<std::string> result;
```
To pair original strings with their substrings, we use a simple `for` loop. In C++, we don't have the `zip()` function like in Python; however, we can achieve the same result by relying on the indices, as both vectors share the same length. To find the first occurrence of each substring in the corresponding original string, we utilize the `std::string::find` method:
```cpp
    for (size_t i = 0; i < orig_strs.size(); ++i) {
        size_t start_pos = orig_strs[i].find(substrs[i]);
```
In `string::find(substr)`, we provide the substring that we intend to locate. The function starts its search from the beginning because we have not specified a starting position.
The next step is to find the subsequent instances of the substring in the original string.
To do this, we will use a `while` loop. But when should we stop looking for more occurrences? When `find()` returns `std::string::npos`, there are no more matches to be found.
Each time we locate a match, we record its starting index in the `match_indices` vector, advance `start_pos` past the end of that match, and search again. Because the search resumes after the end of the previous match, overlapping occurrences are not reported:
```cpp
        std::vector<size_t> match_indices;
        while (start_pos != std::string::npos) {
            match_indices.push_back(start_pos);
            // Resume after the end of the previous match (assumes substrs[i] is non-empty).
            start_pos = orig_strs[i].find(substrs[i], start_pos + substrs[i].size());
        }
```
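The search loop can also be isolated into a standalone helper, which makes two edge cases easier to see. The following is only a sketch: the name `find_all`, the `allow_overlap` flag, and the choice to treat an empty substring as "no matches" are assumptions for illustration, not part of the lesson's solution.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the lesson): collects the starting index of
// every match of `needle` in `text`.
// - allow_overlap == false: skip past each match, as the lesson's loop does.
// - allow_overlap == true: advance one character so overlapping matches count.
std::vector<std::size_t> find_all(const std::string& text,
                                  const std::string& needle,
                                  bool allow_overlap = false) {
    std::vector<std::size_t> indices;
    if (needle.empty()) {
        // Assumption: an empty needle yields no matches. Without this guard
        // the step below would be 0 and the loop would never advance.
        return indices;
    }
    const std::size_t step = allow_overlap ? 1 : needle.size();
    std::size_t pos = text.find(needle);
    while (pos != std::string::npos) {
        indices.push_back(pos);
        pos = text.find(needle, pos + step);
    }
    return indices;
}
```

For example, `find_all("aaaa", "aa")` gives the indices 0 and 2, while `find_all("aaaa", "aa", true)` gives 0, 1, and 2. Which behavior is appropriate depends on how "all occurrences" is meant to be counted.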
Finally, we employ `std::ostringstream` to format the result for improved readability and add it to the `result` vector:
```cpp
        std::ostringstream os;
        os << "The substring '" << substrs[i] << "' was found in the original string '" << orig_strs[i] << "' at position(s) ";
        // Write ", " between indices only, so there is no trailing separator
        // to remove and an empty match list cannot corrupt the message.
        for (size_t j = 0; j < match_indices.size(); ++j) {
            if (j > 0) os << ", ";
            os << match_indices[j];
        }
        os << ".";
        result.push_back(os.str());
    }
```
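The separator handling can also be factored into a small helper. The sketch below is illustrative only: the name `join_indices` and its `separator` parameter are assumptions, not part of the lesson's solution. It writes the separator before every element except the first, so nothing needs trimming afterwards and an empty list simply yields an empty string.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the lesson): joins indices into one string,
// placing `separator` between consecutive elements only.
std::string join_indices(const std::vector<std::size_t>& indices,
                         const std::string& separator = ", ") {
    std::ostringstream os;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        if (i > 0) {
            os << separator;  // no separator before the first element
        }
        os << indices[i];
    }
    return os.str();
}
```

For instance, `join_indices({1, 3, 7})` produces `1, 3, 7`, and `join_indices({3})` produces `3`.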
That's it! We have completed the design of our function.
Here is the complete function, incorporating all the steps we have discussed so far:
```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> solution(std::vector<std::string> orig_strs, std::vector<std::string> substrs) {
    std::vector<std::string> result;

    for (size_t i = 0; i < orig_strs.size(); ++i) {
        // Find the first occurrence (assumes substrs[i] is non-empty).
        size_t start_pos = orig_strs[i].find(substrs[i]);
        std::vector<size_t> match_indices;

        // Keep searching after each match until find() returns npos.
        while (start_pos != std::string::npos) {
            match_indices.push_back(start_pos);
            start_pos = orig_strs[i].find(substrs[i], start_pos + substrs[i].size());
        }

        std::ostringstream os;
        os << "The substring '" << substrs[i] << "' was found in the original string '" << orig_strs[i] << "' at position(s) ";
        // Write ", " between indices only, so no trailing separator is left over.
        for (size_t j = 0; j < match_indices.size(); ++j) {
            if (j > 0) os << ", ";
            os << match_indices[j];
        }
        os << ".";
        result.push_back(os.str());
    }

    return result;
}

int main() {
    // Call the function with the example vectors and print each message.
    std::vector<std::string> result = solution({ "HelloWorld", "LearningC++", "GoForBroke", "BackToBasics" }, { "loW", "ear", "o", "Ba" });
    for (const std::string& line : result) {
        std::cout << line << std::endl;
    }
    return 0;
}
```
Well done! You've mastered a central operation in string manipulation in C++: finding all occurrences of a substring in another string. Keep in mind that this algorithm has numerous applications in real-world scenarios. Now that we have dissected the problem in detail and worked through a solution, I encourage you to practice more. Future exercises will help you hone your skills further. Keep on coding and exploring!