3 - Common Subwords and Subsequences
Enroll to start learning
You’ve not yet enrolled in this course. Please enroll for free to listen to audio lessons, classroom podcasts and take practice test.
Interactive Audio Lesson
Listen to a student-teacher conversation explaining the topic in a relatable way.
Introduction to Common Subwords
🔒 Unlock Audio Lesson
Sign up and enroll to listen to this audio lesson
Class, today we are going to learn about common subwords. Can anyone tell me what a subword is?
I think a subword is a part of a word?
Exactly! A subword is a contiguous segment of a word. For instance, in the word 'secret', 'sec' is a subword. Now, what do you think about finding a common subword?
Is it a part that appears in two different words?
Correct! Common subwords appear in both words. For example, 'secret' and 'secretary' share the common subword 'secret'.
Let's remember this as the subword-segment relationship. Can anyone give me another example?
How about 'bisect' and 'trisect'—they have 'isect' as a common subword?
Great example! Keep this in mind as we dive deeper.
Brute-Force Approach
🔒 Unlock Audio Lesson
Sign up and enroll to listen to this audio lesson
Now, let’s discuss the brute-force method for finding the longest common subword. Can anyone suggest how we might do that?
We could check each letter in both words?
Yes! We look at all possible starting positions in both words. If we run this algorithm, what will the complexity be?
It sounded like O(m * n²) from the lecture?
Exactly! M is the length of one word, and N is the length of the other. This means for each character, we are checking every other character until we reach a mismatch.
Inductive Approach
🔒 Unlock Audio Lesson
Sign up and enroll to listen to this audio lesson
To enhance our method, we can use an inductive approach. Who can explain how this works?
We can compare letters and then reduce the problem size?
Absolutely! If we find a match, we can add to the length and continue comparing. What if they don't match?
Then we just stop and start over from the next letters?
Right! This allows us to significantly reduce unnecessary checks.
Dynamic Programming
🔒 Unlock Audio Lesson
Sign up and enroll to listen to this audio lesson
Finally, we arrive at the dynamic programming solution. Can someone describe this approach.
We use a table to store common subword lengths?
Correct! Each cell in the table corresponds to subproblems that we build upon. How do we populate this table, for example?
We check characters and if they match, we store the value from the diagonal plus one?
Exactly! And when they do not match, the length remains zero. This is efficient!
Extracting the Common Subword
🔒 Unlock Audio Lesson
Sign up and enroll to listen to this audio lesson
Now that we know how to find the length, who can tell me how we extract the common subword?
Do we trace back through the table?
Great job! We trace our steps back through the table, leveraging the information we’ve accumulated.
So, we follow the diagonal we mentioned earlier?
Exactly! The diagonal elements represent parts of our common subword.
This tracing allows us to see all candidates for the longest subword.
Correct! This understanding can reinforce your approach in algorithm design.
In summary, we defined subwords, discussed the brute-force method, transitioned to induction, and wrapped it up with dynamic programming while exploring subword extraction!
Introduction & Overview
Read summaries of the section's main ideas at different levels of detail.
Quick Overview
Standard
In this section, we explore the longest common subword problem, highlighting how to compute the length of the longest subword using brute-force methods and introducing a more efficient dynamic programming approach. This includes detailed examples and structures to visualize and solve the problem significantly.
Detailed
Detailed Summary
In this section, we delve deep into the problem of finding the longest common subword between two sequences of characters, typically strings. A subword is defined as a contiguous segment of a word. For example, in the words 'secret' and 'secretary', the longest common subword is 'secret', which has a length of 6.
The section begins by formalizing the problem by denoting two strings and defining what a common subword is. The initial focus is on creating a brute-force algorithm that examines all possible starting positions in both words, leading to a time complexity of O(m * n^2).
Next, the section introduces an inductive approach to improve the efficiency of finding the longest common subword. The key principle here is utilizing the recursive property—if we find a match at positions a_i and b_j, we can extend the search to the next characters in both strings. If they do not match, we seek the next potential starting position, thereby constructing a more efficient search mechanism.
The section concludes with a dynamic programming approach. It proposes the use of a two-dimensional table to store subproblem solutions—where each cell represents the length of the longest common subword found so far. By systematically updating this table, we can derive both the length of the longest common subword and easily backtrack to extract the subword itself. This yields a more efficient solution compared to the brute-force method, eliminating unnecessary computations.
Youtube Videos
Audio Book
Dive deep into the subject with an immersive audiobook experience.
Introduction to Common Subwords
Chapter 1 of 6
🔒 Unlock Audio Chapter
Sign up and enroll to access the full audio experience
Chapter Content
Let us now turn to the problem of finding Common Subwords and Subsequences between two sequences. The first problem we look at is what is called a longest common subword problem... our main goal at the moment is to compute not the subword itself, but the length.
Detailed Explanation
This chunk introduces the concept of common subwords, which are segments that appear in both sequences. The focus is on determining the length of the longest common segment without actually providing the subword.
Examples & Analogies
Imagine searching for the longest common phrase in two different books. You want to identify how many words in a row are the same in both, but you're not necessarily interested in what that phrase is, just how long it is.
Defining the Problem More Formally
Chapter 2 of 6
🔒 Unlock Audio Chapter
Sign up and enroll to access the full audio experience
Chapter Content
Let us look at this thing more formally, so we have two words a 0 to a m and b 0 to b n... I will say that u and v have a common subword of length k.
Detailed Explanation
Here, two sequences are represented mathematically, with indices marking the positions in each word. A common subword is defined as a matching segment of 'k' characters starting at positions 'i' and 'j' in each sequence.
Examples & Analogies
Think of stitching two pieces of fabric. You can only stitch where the fabric overlaps, and the length of the stitch represents the common segment. If you find a longer stitch, that means there's a longer common piece.
Brute Force Approach
Chapter 3 of 6
🔒 Unlock Audio Chapter
Sign up and enroll to access the full audio experience
Chapter Content
So, here is a Brute force algorithm... therefore, this will be O of m times n into n or O m n squared.
Detailed Explanation
This section describes a naive approach to determine the longest common subword. It involves checking every possible pair of starting positions in the two words and matching characters sequentially. The time complexity is O(mn^2), which can be inefficient for long words.
Examples & Analogies
Imagine using a manual method to compare every sentence in two books by reading each page one by one to spot similarities. It's exhaustive and time-consuming, illustrating the inefficiency of a brute force method.
Inductive Observation
Chapter 4 of 6
🔒 Unlock Audio Chapter
Sign up and enroll to access the full audio experience
Chapter Content
The first observation in the inductive observation is that if I have a i a i plus 1... adding to it, because the very first letter does not match.
Detailed Explanation
This chunk discusses leveraging an inductive approach to improve efficiency. It establishes that if two characters match at positions 'i' and 'j', the problem reduces to finding common subwords at 'i+1' and 'j+1'. Otherwise, no common subword can be extended from these positions.
Examples & Analogies
Consider comparing two people's shopping lists. If the first items match, you can check the next items for matches. If they don’t match, you might as well stop comparing further at that point.
Dynamic Programming Approach
Chapter 5 of 6
🔒 Unlock Audio Chapter
Sign up and enroll to access the full audio experience
Chapter Content
So, we could write a memorized recursive function for that... that is, if this square depends on this square.
Detailed Explanation
The text introduces the concept of dynamic programming to optimize the solution by storing intermediate results. Each entry in a table corresponds to calculated values indicating the length of common subwords, allowing for efficient lookups.
Examples & Analogies
Think of dynamic programming like saving your progress in a video game. Instead of starting over every time you make a mistake, you can go back to a safe point, saving time and effort.
Implementation of the Algorithm
Chapter 6 of 6
🔒 Unlock Audio Chapter
Sign up and enroll to access the full audio experience
Chapter Content
So, here is some code for this particular algorithm... you could invert these two indices and it row by row.
Detailed Explanation
Finally, this section shares an actual code implementation of the dynamic programming approach. It initializes a table to store the lengths of matching segments and iteratively fills in the values based on previous computations.
Examples & Analogies
Writing an instruction manual to guide how to play a game, ensuring players understand the steps they can take based on where they currently are in the game. Each step is built on the previous steps.
Key Concepts
-
Longest Common Subword: The maximum length of a segment appearing in both words.
-
Brute-force Method: An exhaustive technique examining all possible matches.
-
Dynamic Programming: An advanced technique used for efficiency by storing results of subproblems.
Examples & Applications
The longest common subword between 'director' and 'secretary' is 'ec' and 're', of length 2.
For the words 'abc' and 'bc', the longest common subword is 'bc', length 2.
Memory Aids
Interactive tools to help you remember key concepts
Rhymes
In searching for a word that's common, look for segments, let’s keep it fun; a subword's where two words align, for longest matching, we'll combine.
Stories
Imagine two friends, Alice and Bob, who have overlapping stories. They want to find the longest part of their tales that matches. By sharing and checking their writings, they discover 'adventure' is a common theme!
Memory Tools
Subwords in two: Stand for Similarity; Underline the Basics!
Acronyms
LCS - Longest Common Subword, helps to remember it's all about those matching segments!
Flash Cards
Glossary
- Subword
A contiguous segment of a word.
- Longest Common Subword
The longest contiguous segment found in two words.
- Bruteforce Algorithm
An exhaustive search method that checks all positions.
- Dynamic Programming
An optimization approach that solves problems by breaking them down into simpler subproblems.
- Induction
A method of proving a property for all integers by establishing it for the base case and an arbitrary case.
Reference links
Supplementary resources to enhance your learning experience.