Posted by robert on Sep 13, 2021 in programming
When working strings there are any number of operations you might need to perform. One operation is to reverse a string.
When working strings there are any number of operations you might need to perform. One operation is to reverse a string.
Strings are one of those basic constructs that are so important to communicating that many programming languages include them as a basic type.
What is a string? It is nothing more than a series of zero or more characters. For computers, characters are just symbolic representations of numbers. What is the difference between the letter `A` and the number `65`? To a computer nothing. The difference is how the value is interpreted by the higher level language. If a program labels the as an integer then it is the number `65`. If a program labels the value as a character then the value is interpreted as the letter `A`.
If we want to create a string then we need to represent it as a series of characters. So the string `Hello, World!` is a series of 13 characters that are put together to form a string.
Once of the first standards for encoding characters was the ASCII character set. The ASCII table contains a set of values that can be represented in 8 bits. This is a standard used to ensure a standard way to exchange information. This ensures that when a computers stores the 8 bit representation of an `A` that all other computers would understand what that the value represents and not interpret is as different value.
As computers spread throughout the world the need to be able to represent characters outside of the standard English alphabet became more important. New ways of encoding characters were created to enable this. One of these is the UTF-16 character encoding. This encodes characters in a 16-bit instead of 8-bits. This requires more memory per character but allows for a much richer set of characters to be encoded.
To better understand what a string is we need to understand how it is stored in a computer. First we will look at the architecture of a 32-bit computer. This means that every spot in memory has an address that can be represented by 32 bits. One way to think about this is a piece of graph paper.
Imagine a piece of graph paper that has a grid on it. The grid is 32 squares wide. The rows represent a single row in the memory. So if you have 256 bits of memory on a 32 bit architecture you would have 8 rows total.
There are a lot of different architectures out there for computers. The most common are 32-bit and 64-bit. However, there are plenty out there that have a different architecture. Additionally, how memory is addressed on each of the architectures depends on the implementation. For more information word sizes see this article
If we use an 8-bit representation to store and encode characters like in ASCII then we could represent a Word
as four characters. Each of the characters are stored in a 1-byte space of memory.
If we start at the memory address of 0 (0x0) then we would have the character W
at memory address 0 (0x0) followed by the character o
at address 8 (0x8), the character r
at address 16 (0x10) and finally the character d
at address 24 (0x18).
Since computers store information in the form of 1's and 0's the characters are represented as binary numbers. Using the ASCII representation results in the letters represented as the following numbers.
Character | Decimal | Hex | Binary |
---|---|---|---|
W | 87 | 56 | 0101 0111 |
o | 111 | 6F | 0110 1111 |
r | 114 | 72 | 0111 0010 |
d | 100 | 64 | 0110 0100 |
If we put the characters into our memory space as a series of contiguous 8-bit (1-byte) numbers we get something that looks like this.
A string is just a series of contiguous characters stored in memory. To store a string we need to know two things. The starting address of the first character and the size of the string or length of string. The length is just the number of characters in the string.
Now that we have a Word
stored in memory we can see that the first character starts at address 0x0. This is where the starting address of a string would be indicated. The length of Word
is 4. When we want to get the value of the string we return the address of the first character and each character that follows until we have reached the length of the string.
That is a relatively basic explanation. The specifics on how a string is stored and read will depend on the architecture of the hardware that is being used. When writing in a programming language the manner in which you define the string will also be dependent upon the language. Some define a string as an array of characters. Others have a type that represents a string just like a type represents an integer.
The problem presented is that you have a one dimensional array of numbers. From that array you need to find the maximum sum sub-array. The sub-array must be contiguous.
The solution to this can be in several forms. You could just want the actual maximum sum of the numbers. In this case you would find the sub-array and then sum all of the numbers and return the value of the maximum sum. Another solution could be that you want the elements of the sub-array. Finally, perhaps you just need to provide the indices where the sub-array starts and ends.
There are several ways you can implement this solution.
Here we explore Kadane's algorithm.
Presented with an array of numbers, find the maximum sum sub-array.
Kadane's algorithm is an optimized method of scanning the array to find the largest sum sub-array. Like the brute-force method it iterates over the array to find the maximum sum. However it does it using a single loop which improves the performance.
maxSum = 0
currentSum = 0
// Iterate over the array
for (i = 0; i < array.length; i++)
{
// add current index to sum
currentSum = currentSum + array[i]
// Is the current sum less than zero?
if(currentSum < 0)
{
// Reset the sum and subarray indexes.
currentSum=0
}
if(currentSum > maxSum)
{
maxSum = currentSum;
}
}
return maxSum
}
In addition to only having a single loop this algorithm has another change from the brute-force method. It resets the currentSum
to zero if it ever drops below zero. This is the key to this algorithm. If the currentSum
is ever less than zero then you can safely discard the entire sub-array that you have built up to this point. Any values to the left of the current position will only result in a smaller sum no matter how large the positive values are remaining in the array.
Finally, the currentSum
is compared to the maxSum
and a new maxSum
is assigned if the currentSum
is larger.
If we use our example array towalk through the algorithm this is what we see. First the iterator starts with position 0. This results in a sub-array of length 1 with a currentSum of -1
The currentSum
is found to be less than zero and so the currentSum
is reset to 0. Since the currentSum
is not greater than the maxSum
the loop continues and increments to the next item.
During the second time through the loop the array[i] value of 3 is examined. The currentSum
is incremented by 3 resulting in a value of 3 since we reset the currentSum
to zero during the previous loop. The currentSum is not less than zero so we do not need to reset it. The currentSum is greater than the current maxSum and so the maxSum is now set to the curentSum and the loop is incremented to the next element.
The array[2] is now being inspected. The value of this is added to the currentSum and results in a value of 1 since the value of currentSum is 3 and the value of array[i] is -2.
The loop continues to inspect the next element in the array and add it to the currentSum. Each time the currentSum is larger than the maxSum it is assigned to maxSum.
After reaching array[6] the maxSum is currently set to 6.
The array index is incremented to the next two elements array[7] and array[8]. Each time the currentSum is insepcted it is found that it is less than the maxSum. After inspecting all of the elements it has been determined that the maxSum is 6.
The solution must iterate over the entire array before it can determine the maximum sum. However it only needs to do this one. This means that the time for this algorithm to complete it will take N
iterations. This results in a complexity of O(n).
The algorithm as it is presented will not work if all numbers in the array are negative or if the highest value item is a zero. A minor change to the algorithm can remedy this though. Set the initial maxSum
to MIN_INT
to represent the minimum value for an integer. Secondly, move the check for a currentSum less than zero to after the maxSum is set.
maxSum = MIN_INT
currentSum = 0
// Iterate over the array
for (i = 0; i < array.length; i++)
{
// add current index to sum
currentSum = currentSum + array[i]
if(currentSum > maxSum)
{
maxSum = currentSum;
}
// Is the current sum less than zero?
if(currentSum < 0)
{
// Reset the sum and subarray indexes.
currentSum=0
}
}
return maxSum
}
Now imagine an array where ALL of the values are MIN_INT
. With the maxSum now set to the minimum value for an integer any value in the array is guaranteed to be greater than or equal to the maxSum. If the first value in the array is the MIN_INT
then the maxSum will remain MIN_INT
. The currentSum will be reset to 0. At each iteration, the value of currentSum will be found to be less than or equal to the maxSum and therefore the maxSum will not change.
After all of the array values have been inspected the maxSum of MIN_INT
will be returned.
If there is an array where at least one of the values is greater than MIN_INT
then the currentSum will be greater than maxSum and it will be set accordingly.
You now can compute the maximum sum sub-array in O(n) time and process an array with all negative values or a mix of negative and positive.
The problem presented is that you have a one dimensional array of numbers. From that array you need to find the maximum sum sub-array. The sub-array must be contiguous.
The solution to this can be in several forms. You could just want the actual maximum sum of the numbers. In this case you would find the sub-array and then sum all of the numbers and return the value of the maximum sum. Another solution could be that you want the elements of the sub-array. Finally, perhaps you just need to provide the indices where the sub-array starts and ends.
There are several ways you can implement this solution.
Here we explore the Brute-Force method.
Whenever I am trying create a solution, the brute-force method is one of the first things I do. It certainly is not the best, but it can be an effective way to better understand the problem and ensure you are getting a correct solution.
The concept behind the brute-force method is that you will just try all of the possible solutions until you are satisfied that you have the best.
Brute force algorithms are rarely the best. Sometimes they are the only method available for a problem. This is not the case for this problem though.
Using the array [-1, 3, -2, 3, -2, -1, 5, -4, -1] the maximum sum sub-array of [3, -2, 3, -2, -1, 5] starting at index 1 and ending at index 6 with a sum of 6.
A brute force approach to finding this value would be to iterate over the array and keeping track of each sub-array and the sum. If a sum is found that is higher you keep track of that.
maxSum=0
currentSum=0
for(i=0; i < array.length; i++)
{
currentSum=0
for(j=i; j < array.length; j++)
{
currentSum = currentSum + array[j];
if(currentSum > maxSum)
{
maxSum = currentSum;
}
}
}
return maxSum;
In the pseudo-code we start at the first position in the array and reset the current sum. The first check inside the second loop looks at the value of the sub-array starting at index 0 and of a size of 1.
The first time through the loop with i=0
and j=0
the sum is -1. Since -1 is less than the current maxSum
it will not be assigned to maxSum
The end index j
will be incremented to 1 and point to the next element in the array. This increases the size of the current sub-array to 2.
The value of the array[j]
is now added to the currentSum
and currentSum
becomes 2. The currentSum
is compared against the maxSum
. Since the current currentSum
is greater than the maxSum
the maxSum
is updated to be the value of currentSum
.
The inner loop continues to increment the index value of j
and add the value of array[j]
to the currentSum
. Each time the currentSum
is incremented it is compared against the value of maxSum
. If the value of currentSum
is greater than maxSum
it will become the new maxSum
.
This results in the maxSum
changing during the initial loop of j
to 3 and then 5 before reaching the end of the array.
Now that the inner loop has completed to the end of the array the outer loop iterator i
is incremented to 1 and the currentSum
is reset to 0. At this time the maxSum
is 5. Now the process of examining a sub-array and the sum resumes. However, this time the sub-array will start at position 1.
The process of incrementing the inner loop iterator and incrementing the currentSum
will continue until the inner loop reaches the end of the array. When we reach the ending index inside the inner loop with a value of 6 we find the next maxSum
of 6.
The inner loop continues to be incremented until it again reaches the end of the array. The final scan of a sub-array starting at index 1 results in a sub-array of size 8 with a maxSum
of 6.
After the scanning the array with position 0 as the starting index and then scanning the array again with the starting position as 1 we have a new maxSum
of 6.
The process of incrementing the staring index and then scanning the array until the end continues. Anytime a greater maxSum
is found it is saved. Eventually the starting index will be incremented to the very last index in the array and the final sub-array to be tested will be a sub-array of size 1 starting at index 8.
At this point every possible sub-array has been tested and the maximum sum has been determined.
The time complexity of the solution depends on the loops. The inner loop must iterate over the array N
times. This means that the number of iterations depends on the number of elements in the array.
The outer most loop must also iterate over the array N
times. Since the second loop is nested inside the outer loop this makes the complexity N*N
. This is expressed as O(n^2)
.
The next thing to consider is the amount of space, or memory, that is required to complete the solution. In this solution we have to initialize a few variables, but the size of those variables is constant. The size of the incoming array is also constant. The amount of memory used in this solution remains constant during the entire process. This makes the space complexity O(1)
.
After running some sample data through this algorithm you might notice that there are some edge cases that do not work. The first one is when you have an array that contains all negative numbers. The current algorithm doesn't seem to handle it very well. It reports that the maxSum
is zero depsite there being no way to have that value if all of the numbers are negative.
This can be easily fixed.
Instead of initializing the maxSum=0
, initialize it to the minimum integer value possible.
maxSum=MIN_INT
This simple change will now allow the algorithm to properly report on an array that has all negative numbers.
Another aspect of the current algorithm is that you cannot actually report on what the sub-array is that provides the maximum sum. You only know what that sum is. By iterating through the example we can see that the maximum sub-array is actual the array starting at index 1 and ending at index 6.
To be able to provide this information as a result of the algorithm we need to make some minor changes. We need to track what the beginning and ending indexes are. We can do this by adding a few extra variables.
maxStartIndex = 0
maxEndIndex = 0
Initially we assign the values to zero. Each time we find a currentSum
that is larger than the maxSum
we assign the index values. This results in the outer loop iterator i
being assigned to the maxStartIndex
and the inner loop iterator j
being assigned to maxEndIndex
.
We now know at the end of all of the scans that the start and end index is for the sub-array that results in a maximum sum.
Using this information a new sub-array can be returned or just the indexes of the sub-array can be returned.
It has been some time in my career since I had to really analyze an algorithm and come up with an efficient way to solve a problem. When I do, it can really tax my brain.
Perhaps you have the same issue. It would bring me comfort to know that I am not the only one with this problem.
Perhaps you were never presented with a particular type of problem. Perhaps you never learned how to create an algorithm and determine how efficient it is. Perhaps it has just been a really long time since you had a real world problem that required a really smart and efficient solution and "good enough" has been, well, good enough. Let's face it, there are plenty of us that do not work in a life-or-death environment or where the pressure to optimize everything is paramount.
After being stumped on how to compute the largest sub-array with a maximum sum I decided to do a little research on how this could be solved.
There are several ways you can implement this solution. We will consider a few of them.