C# Text Processing - Applications

 

Password Validation

Once you have the Char data type down, it’s good to see how you can use it in a real application.

This is the Password Validation application. The idea here is a user enters a potential password into the text box. The password needs to be at least 8 characters long. It should also have at least 1 numeric digit, 1 uppercase character and 1 lowercase character. Anything else is up to you.

The rules are printed on the form, and when you press the Verify button the application tells you which rules passed and which failed.

There’s some interesting decision making going on in this application. And this is a great example of how to use Booleans in the verification process.
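Here is a minimal sketch of how those Booleans might be set. The class name PasswordRules and the method name Check are made up for illustration, and the real application reports the results on a form rather than returning them.

```csharp
using System;

// Hypothetical helper: the names are illustrative, not taken from the video.
public static class PasswordRules
{
    // Returns one Boolean per rule so each rule can be reported separately.
    public static (bool LongEnough, bool HasDigit, bool HasUpper, bool HasLower) Check(string password)
    {
        bool hasDigit = false;
        bool hasUpper = false;
        bool hasLower = false;

        // Test every character with the Char classification methods.
        foreach (char c in password)
        {
            if (char.IsDigit(c)) hasDigit = true;
            else if (char.IsUpper(c)) hasUpper = true;
            else if (char.IsLower(c)) hasLower = true;
        }

        return (password.Length >= 8, hasDigit, hasUpper, hasLower);
    }
}
```

Keeping a separate Boolean for each rule is what lets the form say which rules passed and which failed, instead of giving a single yes or no.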

Phone Number Application Part 1

Many of the programs you write require input from other sources, often a database or a text file, and the data you receive may not be in a consistent format. Phone numbers are a typical example. People format phone numbers in many different ways, and that’s fine for them. But in our program, phone numbers are much easier to use if they are all in one consistent format.

This video is the first of several that show you how to create a program that reads in phone numbers in a variety of formats and produces a new file with all phone numbers in one consistent format. It’s pretty simple when you break it down. You read the file in. You remove everything that is not a digit. And once every record is a uniform series of digits, you either build the new format by concatenation or use the Insert method to format the number. We’ll do it both ways.

Phone Number Application Part 2

The mixed batch of phone numbers has been successfully loaded into the input list box. The next step is to process that data. There are basically two steps involved. The first step, covered here, is to filter the digits. In other words, we use the Char.IsDigit method to test each character in the input. If it’s not a digit, we simply bypass it. If it is a digit, we concatenate it to a string variable. When we’ve finished processing one phone number, we add it to an intermediate list box and then tackle the next number.
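The filtering step can be sketched like this. The names PhoneDigits and FilterDigits are hypothetical, and the list boxes are left out so the logic stands on its own.

```csharp
using System;

// Hypothetical helper: the names are illustrative, not taken from the video.
public static class PhoneDigits
{
    // Keeps only the digit characters of the input, in their original order.
    public static string FilterDigits(string input)
    {
        string digits = "";

        foreach (char c in input)
        {
            if (char.IsDigit(c))
            {
                digits += c;   // concatenate the digit; anything else is bypassed
            }
        }

        return digits;
    }
}
```

With this sketch, both "(555) 123-4567" and "555.123.4567" come out as the same string of ten digits.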

Phone Number Application Part 3

In the intermediate list box, we now have a set of phone numbers that are similar in length and contain nothing but digits. There are a couple of ways to format these numbers. The approach we are going to use here is the Substring method combined with concatenation. In other words, we are going to extract the area code, the exchange, and the remaining four digits into separate variables and then combine those three variables with a set of parentheses, a space and a dash. The result will be placed in the output list box.
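A rough sketch of the Substring approach follows. It assumes the input is exactly 10 digits, and the class and variable names are made up for illustration.

```csharp
using System;

// Hypothetical helper: assumes the input is exactly 10 digits.
public static class PhoneFormatSubstring
{
    public static string Format(string digits)
    {
        string areaCode = digits.Substring(0, 3);    // first three digits
        string exchange = digits.Substring(3, 3);    // next three digits
        string lineNumber = digits.Substring(6, 4);  // remaining four digits

        // Combine the pieces with parentheses, a space and a dash.
        return "(" + areaCode + ") " + exchange + "-" + lineNumber;
    }
}
```

Substring throws an exception if the string is shorter than the requested range, so a real program would want to check the length first.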

Phone Number Application Part 4

With the formatted phone numbers resting safely in the output list box, it is a simple matter to write a routine that saves this data to a text file. All we’ll do is loop through the output list box and write each item to a file the user has selected through the SaveFileDialog control.
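The save routine might look something like the sketch below. To keep it self-contained, a plain collection of strings stands in for the items of the output list box, and the path parameter stands in for the file name the user picked (in a Windows Forms program that would typically come from the dialog's FileName property). The names PhoneFileWriter and Save are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: a string collection stands in for the list box items.
public static class PhoneFileWriter
{
    public static void Save(IEnumerable<string> formattedNumbers, string path)
    {
        // The using block closes the file even if a write fails.
        using (var writer = new StreamWriter(path))
        {
            foreach (string number in formattedNumbers)
            {
                writer.WriteLine(number);   // one phone number per line
            }
        }
    }
}
```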

Phone Number Application Part 5

This video backs up one step. Everything will remain the same except for the FormatOutput method. Much of what is in the FormatOutput method will be stripped out and replaced with code using the Insert method. There will be no concatenation in this video. Rather, we will take each 10-digit phone number in the intermediate list box and insert parentheses, a space and a dash to create the 14-character formatted phone number. To the user it will look like nothing has changed. But inside there is a significant change.
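Here is one way the Insert version could go, as a sketch that assumes a 10-digit input. The class name is hypothetical, and the comments show the string after each step for the sample input 5551234567.

```csharp
using System;

// Hypothetical helper: assumes the input is exactly 10 digits.
public static class PhoneFormatInsert
{
    public static string Format(string digits)
    {
        // Each Insert returns a new string, so later positions
        // must account for the characters already inserted.
        string result = digits.Insert(0, "(");   // (5551234567
        result = result.Insert(4, ") ");         // (555) 1234567
        result = result.Insert(9, "-");          // (555) 123-4567
        return result;
    }
}
```

The thing to watch is that the insert positions shift as the string grows, which is why the dash goes in at position 9 rather than position 6.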

Phone Number Format Module

We did quite a bit of work to format the phone number. Typically, this is a minor function of a larger program. Down the road you will see that we would probably make this routine a class that can be called by many programs. And, while we won’t make it a class right now, we will make it a module that might simulate a class. In other words, we will use what we have already done as a prototype for creating a method that does what we want without displaying everything to the user. Put simply: give me a phone number in any format and I’ll return to you a formatted phone number. This will be a black box. Something’s going on inside, but the user doesn’t care. The user just wants results.
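As a sketch of what such a black box might look like, the method below takes a phone number in any format and hands back the formatted version. The names PhoneFormatter and TryFormat are hypothetical, and returning false when the input does not contain exactly 10 digits is an assumption made here, not something the video specifies.

```csharp
using System;
using System.Text;

// Hypothetical black-box helper: the names and the failure behavior are assumptions.
public static class PhoneFormatter
{
    // Returns true and sets formatted when the input contains exactly 10 digits.
    public static bool TryFormat(string rawInput, out string formatted)
    {
        // Step 1: keep only the digits.
        var digits = new StringBuilder();
        foreach (char c in rawInput)
        {
            if (char.IsDigit(c)) digits.Append(c);
        }

        if (digits.Length != 10)
        {
            formatted = "";
            return false;
        }

        // Step 2: rebuild the number in one consistent format.
        string d = digits.ToString();
        formatted = "(" + d.Substring(0, 3) + ") " + d.Substring(3, 3) + "-" + d.Substring(6, 4);
        return true;
    }
}
```

The caller never sees the intermediate string of digits. It passes in something like 555.123.4567 and gets back either a formatted number or a signal that the input could not be used.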

Text Processing Hacking - Part 1

We are going to look at the F1 Schedule 2024 website in an effort to extract a concise calendar of events for our own personal use. How we are going to do this, we don’t know yet. The web page is an HTML document (download here) that we are going to save as a text file. It will have over 5,000 lines of data in it. We want the dates and location of every race in 2024, and that’s all we want. How we do this is the big question, because this is a unique problem we are going to try to solve.

Basically, we are going to use text processing methods to reduce the HTML document to just the lines we need. Then we are going to use more text processing tools to extract just the data we need from those selected lines.

We’re hacking here. We have no rules. We need to find the patterns in the html data that will allow us to find the data we need.
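A first pass at reducing the file could look like the sketch below: keep a line only if it contains one of the key words we are hunting for. The names LineFilter and Keep are hypothetical, and which key words actually work depends on what you find when you study the HTML.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the key words passed in are whatever patterns you find in the data.
public static class LineFilter
{
    // Returns only the lines that contain at least one of the key words.
    public static List<string> Keep(IEnumerable<string> lines, params string[] keyWords)
    {
        var found = new List<string>();

        foreach (string line in lines)
        {
            foreach (string key in keyWords)
            {
                if (line.Contains(key))
                {
                    found.Add(line);
                    break;   // one match is enough; move on to the next line
                }
            }
        }

        return found;
    }
}
```

The lines themselves could come from File.ReadLines with the path of the saved text file, which reads the file one line at a time.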

Text Processing Hacking - Part 2

Now that we have our 73 records it’s time to pull the actual data from those records. And we need specific code to process each type of data. We’ll start with the event location.

The first thing we need to do is search each of our “found” records for the unique key word that will get us our data. For most of the event locations that key word is “event-place d-block”.

Once that record is found, we determine where in the string the key word begins. This will be our starting point. And knowing how long the key word is, we can determine where the event location begins in the string.

The next thing we need to do is find a unique key word that marks where the event location ends in the string.

Since we know where the event location begins and where it ends, we can subtract the starting position from the ending position to get the length of the event location. This will be different for each event location.
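The position arithmetic can be sketched with IndexOf and Substring. The names TextBetween and Extract are hypothetical, and the sample markup mentioned afterward is invented for illustration, since the real page may be laid out differently.

```csharp
using System;

// Hypothetical helper: returns the text between a starting key word and an ending key word.
public static class TextBetween
{
    public static string Extract(string line, string startKey, string endKey)
    {
        // Where does the starting key word begin?
        int keyPosition = line.IndexOf(startKey, StringComparison.Ordinal);
        if (keyPosition < 0) return "";

        // The data begins right after the key word.
        int start = keyPosition + startKey.Length;

        // Search for the ending key word from that point on.
        int end = line.IndexOf(endKey, start, StringComparison.Ordinal);
        if (end < 0) return "";

        // Length = ending position minus starting position.
        int length = end - start;
        return line.Substring(start, length);
    }
}
```

For an invented record such as a div whose class is event-place d-block and whose content is Sakhir, the starting key word would be the class name plus the quote and angle bracket that follow it, and the ending key word would be the next opening angle bracket.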

Text Processing Hacking - Part 3

Remember we have that practice session in Sakhir. Those records are different in our notepad file from the actual race sessions. Good thing we checked that out in our data. But that shouldn’t be a problem. At least for the dates. The keywords we need seem to be “start-date” and “end-date” and they seem to be the same in all records.

Again, we’ll need to find the record and get our starting position. In this case, since each key word is always the same size, our new starting position is the current position plus 12 for the start date and plus 10 for the end date. We’ll process the start date and the end date separately. The code, however, will be very similar.

We don’t need to worry about the length of the date being extracted, as every date is 2 characters long. That makes it simple for us too!
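A sketch of the date extraction follows. The names RaceDates and Day are hypothetical. The offsets of 12 and 10 work out if each key word is followed by two extra characters (such as a closing quote and an angle bracket) before the date itself, which is an assumption about the markup.

```csharp
using System;

// Hypothetical helper: pulls a 2-character date at a fixed offset from a key word.
public static class RaceDates
{
    public static string Day(string line, string keyWord, int offset)
    {
        int position = line.IndexOf(keyWord, StringComparison.Ordinal);

        // Give up if the key word is missing or the line is too short.
        if (position < 0 || position + offset + 2 > line.Length) return "";

        // Every date is 2 characters, so the length is fixed.
        return line.Substring(position + offset, 2);
    }
}
```

The start date would be requested with the key word start-date and an offset of 12, and the end date with end-date and an offset of 10. Everything else in the code is the same, which is why the two routines look so similar.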

Text Processing Hacking - Part 4

Getting the month requires we do a little extra processing.

A couple of things here. The Sakhir practice session has a different key word from the regular racing sessions. No problem: we get all the racing sessions first by using their specific key word. To get Sakhir, we’ll just use “month-wrapper” as the key word and extract it in the “else” section of our month “if” statement. It’ll make more sense when you see it.

The extra piece of processing involves the month name or names. You’ll note that when a race spans two months, we end up with both month names in the data, separated by a hyphen. When it’s just a single month, you have 3 characters of data.

Well, you solve this little problem by saving 7 characters of data in all cases. You save that in a work area. Then you examine the 7-character variable. If it contains a hyphen, you want all 7 characters. If there is no hyphen you just want the first 3 characters. Simple as that.
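The work-area idea can be sketched as follows. The names RaceMonth and Extract are hypothetical, and start is assumed to be the position where the month text begins, found the same way as the other starting positions.

```csharp
using System;

// Hypothetical helper: start is assumed to point at the first character of the month text.
public static class RaceMonth
{
    public static string Extract(string line, int start)
    {
        if (start < 0 || start >= line.Length) return "";

        // Save up to 7 characters in a work area.
        int available = Math.Min(7, line.Length - start);
        string work = line.Substring(start, available);

        // A hyphen means two month names, so keep all 7 characters.
        if (work.Contains("-")) return work;

        // Otherwise only the first 3 characters are the month.
        return work.Length >= 3 ? work.Substring(0, 3) : work;
    }
}
```

This relies on the characters right after a single month name not containing a hyphen of their own, which is worth checking against the actual data.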

Text Processing Hacking - Part 5

In this video all that hacking that we did to get our calendar data is hidden from the user. We’ll actually change the look of the form so that all the user sees is the completed calendar. We’ll use a number of string methods in this video including padding our variables with blanks to make them the same size.
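Padding can be sketched with the PadRight method, which adds blanks to the end of a string until it reaches a given width. The names CalendarLine and Format and the column widths of 20 and 9 are assumptions chosen for illustration, not the values used in the video.

```csharp
using System;

// Hypothetical helper: the column widths are illustrative assumptions.
public static class CalendarLine
{
    public static string Format(string location, string month, string startDay, string endDay)
    {
        // Pad each variable with blanks so every line has the same column positions.
        return location.PadRight(20) + month.PadRight(9) + startDay + "-" + endDay;
    }
}
```

Because each field is padded to a fixed width, the lines align in columns when shown in a fixed-width font. Note that PadRight never truncates, so a value longer than its column width would push the rest of the line to the right.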

You'll note that the code for this project is available here in PDF format.