C# Text Processing - Lessons
The Character (char) Data Type
The char data type holds a single character, and it gives you access to the individual characters in a string. This is the first step toward being able to manipulate strings. C# has a rich collection of tools to make all this happen.
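Here is a minimal sketch of working with individual characters. It assumes a hypothetical sample string, "Room 101", and uses only standard members of string and char: indexing with [], foreach, char.IsLetter, char.IsDigit and ToCharArray.

```csharp
using System;

// A hypothetical sample string used only for illustration.
string text = "Room 101";

// Indexing with [] reads a single char; positions start at 0.
char first = text[0];               // 'R'
char last = text[text.Length - 1];  // '1'
Console.WriteLine($"First: {first}, Last: {last}");

// foreach visits each char in order; char.IsLetter and char.IsDigit classify it.
int letterCount = 0;
int digitCount = 0;
foreach (char c in text)
{
    if (char.IsLetter(c))
    {
        letterCount++;
    }
    else if (char.IsDigit(c))
    {
        digitCount++;
    }
}
Console.WriteLine($"Letters: {letterCount}, Digits: {digitCount}"); // Letters: 4, Digits: 3

// A string's characters are read-only, so to change one we copy them
// into a char array, edit the array, and build a new string from it.
char[] chars = text.ToCharArray();
chars[0] = 'B';
string changed = new string(chars);
Console.WriteLine(changed); // Boom 101
```

Because a string can't be edited in place, the sketch copies the characters into a char array to make a change.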
Substring Overview
The following video is an overview of all the methods you can use to manipulate strings. It will be followed by a series of videos showing you how to use each method effectively. I’m doing each video separately because these are the things you’ll want to come back to and refer to individually when writing code. Some of these methods you’ll use rarely, so you don’t want them cluttering up your brain. The ones you use regularly will become a part of your programming style. They’ll stick with you.
Contains Method
The Contains method tells you whether a substring can be found within a larger string. You pass in the substring (or a single character), and the method determines whether it appears anywhere in the larger string. It returns true or false. Note that the comparison is case-sensitive.
The following video shows you how to set this up simply. It then follows up with a practical example of how you might search a database of movies to find several occurrences of a search.
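Here is a minimal sketch of Contains, first on a single title and then on a small list of movies. The movie titles are hypothetical stand-ins for a real movie database.

```csharp
using System;
using System.Collections.Generic;

string title = "The Maltese Falcon";

// Contains returns true or false; it does not say where the match is.
bool hasFalcon = title.Contains("Falcon");   // true
bool hasEagle = title.Contains("Eagle");     // false

// The comparison is case-sensitive, so "falcon" does not match "Falcon".
bool lowerCase = title.Contains("falcon");   // false
Console.WriteLine($"{hasFalcon} {hasEagle} {lowerCase}");

// Hypothetical movie list: collect every title that contains the search term.
string[] movies = { "Star Wars", "The Maltese Falcon", "War of the Worlds", "Casablanca" };
string search = "War";
var matches = new List<string>();
foreach (string movie in movies)
{
    if (movie.Contains(search))
    {
        matches.Add(movie);
    }
}
Console.WriteLine(string.Join(", ", matches)); // Star Wars, War of the Worlds
```

Both "Star Wars" and "War of the Worlds" match, because Contains accepts the term at any position.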
StartsWith Method
The StartsWith method is similar to Contains, except that the substring must appear at the beginning of the target string. You pass in the substring (or a character), and the method determines whether the larger string begins with it. It returns true or false.
The following video shows you how to set this up simply. It then follows up with a practical example of how you might search a database of questions and answers to produce what could be a quiz.
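Here is a minimal sketch of the quiz idea. It assumes a hypothetical data format where every question line begins with "Q:" and every answer line begins with "A:". Substring(3) simply drops the "Q: " prefix before the question is stored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data: each line is either a question ("Q:") or an answer ("A:").
string[] lines =
{
    "Q: What keyword declares a constant?",
    "A: const",
    "Q: What type holds a single character?",
    "A: char",
};

var questions = new List<string>();
foreach (string line in lines)
{
    // StartsWith only matches at the very beginning of the string.
    if (line.StartsWith("Q:"))
    {
        questions.Add(line.Substring(3)); // drop the "Q: " prefix
    }
}
foreach (string q in questions)
{
    Console.WriteLine(q);
}

// "Q:" appears here, but not at the start, so StartsWith returns false.
bool notAtStart = "Answer to Q: 1".StartsWith("Q:");
Console.WriteLine(notAtStart); // False
```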
EndsWith Method
The EndsWith method works just like StartsWith, except that it checks the end of the target string instead of the beginning. The substring must sit at the very end of the target string. No exceptions.
You pass in the substring (or a character), and the method determines whether the larger string ends with it. It returns true or false.
The following video shows you how to set this up simply. It then follows up with a practical example of how you might code a company of cartoon soldiers and assign them to one of four platoons. The EndsWith method helps you display only the soldiers in a particular platoon.
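Here is a minimal sketch of the platoon filter. It assumes a hypothetical naming scheme in which each soldier's name ends with a platoon tag from "-P1" to "-P4". The video's actual data may be organized differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical naming scheme: each soldier's name ends with a platoon tag, P1 through P4.
string[] soldiers = { "Baker-P1", "Able-P2", "Charlie-P1", "Dog-P3", "Easy-P4", "Fox-P2" };

string platoon = "-P1";
var members = new List<string>();
foreach (string soldier in soldiers)
{
    // EndsWith only matches when the tag is the last thing in the string.
    if (soldier.EndsWith(platoon))
    {
        members.Add(soldier);
    }
}
Console.WriteLine(string.Join(", ", members)); // Baker-P1, Charlie-P1

// "-P" appears in every name, but no name ends with it.
bool partial = "Baker-P1".EndsWith("-P");
Console.WriteLine(partial); // False
```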
IndexOf Method
The IndexOf method returns the position of the substring (or character) you are searching for. That position is returned to your application as a zero-based integer, so the first character is at position 0. If the substring isn’t found, IndexOf returns -1. Once you have that integer, you are in a position to use other methods to modify your string. You might remove content from the string. You might insert content into it.
IndexOf also has overloads that accept two extra parameters. The start index lets you specify where in the string you would like to begin your search. The count gives you the means to limit the search to a specific number of character positions, beginning at the start index.
The following video shows you how to set this up and use the two parameters. I follow this up with a practical example of how you might format a column of numbers so they align under a decimal point, no matter how large or how small the numbers are.
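Here is a minimal sketch of IndexOf with and without the start index and count parameters, followed by one way to line up decimal points. It searches for single characters to keep the positions easy to verify. The alignment code is a hypothetical approach, not necessarily the one used in the video, and it assumes every number contains a decimal point.

```csharp
using System;
using System.Collections.Generic;

string word = "banana";

int first = word.IndexOf('n');          // 2: positions are zero-based
int next = word.IndexOf('n', 3);        // 4: the search starts at position 3
int limited = word.IndexOf('n', 3, 1);  // -1: only position 3 ('a') is examined
int missing = word.IndexOf('x');        // -1: not found
Console.WriteLine($"{first} {next} {limited} {missing}");

// Hypothetical column of numbers stored as text. This sketch assumes every
// entry contains a decimal point; IndexOf would return -1 otherwise.
string[] numbers = { "3.5", "125.75", "0.125", "42.0" };

// Find the largest number of digits in front of the decimal point.
int widest = 0;
foreach (string n in numbers)
{
    widest = Math.Max(widest, n.IndexOf('.'));
}

// Pad each entry on the left so every decimal point lands in the same column.
var aligned = new List<string>();
foreach (string n in numbers)
{
    int dot = n.IndexOf('.');
    aligned.Add(new string(' ', widest - dot) + n);
}
foreach (string line in aligned)
{
    Console.WriteLine(line);
}
```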
LastIndexOf Method
The LastIndexOf method also returns the position of the substring you are searching for. It is identical to the IndexOf method except that it searches the string in reverse order, from the end back toward the beginning, so it finds the last occurrence rather than the first. The position it returns is still counted from the beginning of the string, and it is still -1 if nothing is found. Once you have that integer, you can use other methods to modify your string. You might remove content from the string. You might insert content into it.
Like IndexOf, LastIndexOf has overloads with two extra parameters. The start index lets you specify where in the string you would like to begin your search. The count limits the search to a specific number of character positions. This can be confusing, because the search runs in reverse: it begins at the start index and moves backward toward the beginning of the string, examining count positions.
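Here is a minimal sketch of how the start index and count behave when the search runs backward. It also uses LastIndexOf to pull the extension off a file name. The file name is a hypothetical example.

```csharp
using System;

string word = "banana";

int last = word.LastIndexOf('a');           // 5: still counted from the start
int fromFour = word.LastIndexOf('a', 4);    // 3: searches positions 4, 3, 2, ... backward
int limited = word.LastIndexOf('a', 4, 1);  // -1: only position 4 ('n') is examined
int twoSteps = word.LastIndexOf('a', 4, 2); // 3: examines positions 4 and 3

// A practical use: the text after the last '.' in a hypothetical file name.
string fileName = "summary.final.txt";
int firstDot = fileName.IndexOf('.');               // 7
int lastDot = fileName.LastIndexOf('.');            // 13
string extension = fileName.Substring(lastDot + 1); // "txt"
Console.WriteLine($"{last} {fromFour} {limited} {twoSteps} {firstDot} {lastDot} {extension}");
```

IndexOf and LastIndexOf return different positions for the same file name. Only the last dot marks the extension.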
Substring Method
The Substring method returns a portion of the target string beginning at a starting position. The rest of the string is returned unless you add the optional length parameter, which limits the returned portion to a set number of characters.
Think of it this way. If your target string is a 10-digit phone number, you could extract the exchange by specifying the starting position of the exchange (position 3, counting from 0) and the length of the exchange (3). And that’s exactly what this next video will show you.
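Here is a minimal sketch of the phone-number example. It assumes a hypothetical 10-digit number with no formatting characters.

```csharp
using System;

// Hypothetical 10-digit phone number with no formatting characters.
string phone = "8005551234";

string areaCode = phone.Substring(0, 3);  // "800": start at 0, take 3 characters
string exchange = phone.Substring(3, 3);  // "555": start at 3, take 3 characters
string lineNumber = phone.Substring(6);   // "1234": no length, so take the rest
Console.WriteLine($"{areaCode} {exchange} {lineNumber}");
```

If the start position plus the length runs past the end of the string, Substring throws an ArgumentOutOfRangeException. Check the string's Length first when the input might be shorter than expected.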
Insert Method
The Insert method allows you to place characters, or a whole string, inside the target string at a specific position. Strings in C# can’t be changed in place, so Insert returns a new string and leaves the original untouched. To keep the result, assign it back to the original variable or to a new string variable.
Think of it this way. If you have a phone number consisting of 10 digits and nothing else, you could format that number by inserting parentheses, spaces, dashes, or any number of other characters.
That’s what you are going to see in this next video. We’re going to take 10 digits and surround the first 3 digits with an opening and closing parenthesis. We’re going to place a space before the exchange, or next set of 3 digits. And then we are going to place a hyphen after the exchange.
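Here is a minimal sketch of the same formatting steps, using a hypothetical 10-digit number. It inserts from right to left, a design choice that keeps the original positions (3 and 6) valid. Inserting left to right also works, but then each later position has to be shifted by the characters already added.

```csharp
using System;

string digits = "8005551234";

// Insert returns a new string; digits itself is not modified.
// Working from right to left means earlier insertions do not shift
// the positions we still need to use.
string formatted = digits.Insert(6, "-");  // "800555-1234"
formatted = formatted.Insert(3, ") ");     // "800) 555-1234"
formatted = formatted.Insert(0, "(");      // "(800) 555-1234"

Console.WriteLine(formatted); // (800) 555-1234
Console.WriteLine(digits);    // 8005551234 (unchanged)
```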
Remove Method
Think back to the Substring method, which extracted a substring from a larger string. The Remove method does the opposite. It removes a portion of the larger string, starting at a given position, and leaves us with whatever is left.
So, Remove works like Insert. It returns a new string rather than changing the original, but we don’t need another variable. We can assign the result back to the same string variable.
In this video I have an example of a list of movies. The original list was in a spreadsheet. I extracted the contents to a text file. And for this example, I just took the first record and pasted it into an application. Now the record contained the movie name, the year, the CD it’s located on and the position on that CD. I think there were a few other things as well. But all I want is the movie name. I want to remove everything else. That’s where the Remove method comes into play.
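Here is a minimal sketch of keeping only the movie name. It assumes a hypothetical comma-separated record layout (title, year, CD, position on the CD), which may not match the file used in the video. IndexOf finds the first comma, and Remove drops everything after it.

```csharp
using System;

// Hypothetical record layout: title, year, CD, position on the CD, comma-separated.
string record = "Casablanca,1942,CD03,12";

int comma = record.IndexOf(',');

// Guard: if there is no comma, IndexOf returns -1 and Remove would throw.
// Remove(startIndex) drops everything from startIndex to the end.
string title = comma >= 0 ? record.Remove(comma) : record;              // "Casablanca"

// Remove(startIndex, count) drops count characters and keeps the rest.
// The 5 assumes a comma followed by a 4-digit year.
string withoutYear = comma >= 0 ? record.Remove(comma, 5) : record;     // "Casablanca,CD03,12"

Console.WriteLine(title);
Console.WriteLine(withoutYear);
Console.WriteLine(record); // unchanged; use record = record.Remove(comma) to replace it
```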
ToUpper() / ToLower()
It’s pretty easy to convert a character from lower case to upper case. A lower case “a” is ASCII 97. Subtract 32 from it and you get 65. ASCII 65 is a capital “A”. The same gap of 32 holds for every letter from a to z. And since you can do that with a character, you can do it to an entire string. That’s essentially what the ToUpper and ToLower methods do for us, although they also handle letters outside the ASCII range. So, with these two methods (and their single-character counterparts, char.ToUpper and char.ToLower) we can convert the case of a single character or an entire paragraph. But why would you want to do that?
A very good answer to that question is to compare strings. When you are trying to find a single name in a file of 5,000 names, you need to consider not just how the name is spelled but also the case of the letters used to spell it. If you search for “Mary” with a capital “M”, the search will only match entries that also use a capital “M”. Entries stored as “mary” or “MARY” won’t be found. In other words, a normal string comparison only finds things spelled exactly as you typed them, including case.
The following video demonstrates how you can accept a login ID without caring how the user types it in. Convert both what the user typed and your stored login IDs to all upper case (or all lower case). Then a match will be found as long as the letters themselves are the same.
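Here is a minimal sketch of the ASCII arithmetic and the case-insensitive login check. The stored login IDs and the LoginExists helper are hypothetical, written only for this illustration.

```csharp
using System;

// Upper- and lower-case ASCII letters are 32 apart: 'a' is 97, 'A' is 65.
char lower = 'a';
char upper = (char)(lower - 32);
Console.WriteLine($"{(int)lower} - 32 = {(int)upper}, which is '{upper}'"); // 97 - 32 = 65, which is 'A'

// In practice, let the library do it; char.ToUpper also handles non-ASCII letters.
char viaLibrary = char.ToUpper('a'); // 'A'

// Hypothetical stored login IDs.
string[] loginIds = { "MSmith", "jdoe", "Mary.Jones" };

// Hypothetical helper: convert both sides to upper case before comparing.
bool LoginExists(string typed, string[] ids)
{
    string typedUpper = typed.ToUpper();
    foreach (string id in ids)
    {
        if (id.ToUpper() == typedUpper)
        {
            return true;
        }
    }
    return false;
}

Console.WriteLine(LoginExists("mary.JONES", loginIds)); // True
Console.WriteLine(LoginExists("msmith", loginIds));     // True
Console.WriteLine(LoginExists("mjones", loginIds));     // False
```

As an alternative, string.Equals(a, b, StringComparison.OrdinalIgnoreCase) compares two strings while ignoring case. It does this without building upper-case copies first.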
Trim(), TrimStart(), TrimEnd()
When you work with databases, you’ll often find text fields that are fixed-length (for example, a CHAR column). By that I mean a first name field is set to hold a certain number of characters. Imagine a 15-character field for first name. If you store the name “Sam” in that field, it holds 3 characters followed by 12 space characters. If you have a first name like “Tyrannosaurus Rex” (17 characters), the database may store only the first 15 characters and truncate the rest. Some databases reject the value instead.
Now, when you use data stored in that database, you typically want to remove those extra spaces. That way you can concatenate the name and display it like “Mr. Sam Spade”. The same goes for dollar amounts, which may have leading or trailing spaces attached that you want to remove for display purposes. That’s where Trim, TrimStart, and TrimEnd come into play. Trim removes white space from both ends of a string, TrimStart removes it from the beginning, and TrimEnd removes it from the end. These are very useful methods for working with string data, and you’ll use them a lot.
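Here is a minimal sketch of both cases. It uses PadRight to simulate hypothetical 15-character fixed-width fields rather than reading from a real database. The brackets in the output make any remaining spaces visible.

```csharp
using System;

// Simulate hypothetical 15-character fixed-width database fields.
string firstName = "Sam".PadRight(15);   // "Sam" followed by 12 spaces
string lastName = "Spade".PadRight(15);

// Trim the padding before concatenating for display.
string display = "Mr. " + firstName.TrimEnd() + " " + lastName.TrimEnd();
Console.WriteLine($"[{display}]"); // [Mr. Sam Spade]

// A dollar amount with 3 leading and 2 trailing spaces.
string amount = "   $125.50  ";
string both = amount.Trim();          // "$125.50"
string leading = amount.TrimStart();  // "$125.50  "
string trailing = amount.TrimEnd();   // "   $125.50"
Console.WriteLine($"[{both}] [{leading}] [{trailing}]");
```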