Extracting Text Using ArcMap’s Field Calculator

There are a lot of tools in Microsoft Excel I use on a regular basis. I use the LEFT(), RIGHT(), or MID() formulas in Excel often to extract sections of text. For example, I may have a column with addresses, like below, but all I need is the street name.

Click on any image for higher resolution version.

In Excel, it is fairly simple to do that. In this case, we need to find the first space between the number, then return everything after that. Here’s the formula:

=MID(E2,FIND(“ “,E2)+1,256)

This command tells Excel to go to cell E2, find the first space “ “ in that cell, move over one character, and then return that character and all other characters up to 256 characters after it (the maximum for an Excel cell). Here’s what it looks like:

Fairly quick and painless to do.

But what if you want to do the same thing in ESRI’s ArcMap?

You could do what I used to do. Export the attribute table, open that in Excel, work that formula magic, and then import/join the table back to the original layer. It was a bit cumbersome. And I would end up with all these interim files and layers that needed to be cleaned up.

Wouldn’t it be nice to do the text extraction right in ArcMap?

Since around 2004, ArcMap has been released with Python capabilities. And while it hasn’t taken me over 10 years to realize that, it has taken me at least five years to start using Python regularly to do the same task above within ArcMap.

So, while this is old hat to a lot of folks, I’ve only been taking advantage of Python for the past five years. I’ve learned that even if I have to go learn a new function, it is always quicker to go the Python way. Let me show you.

Python .split() function and the re module (and some good old VBA, too)

In ArcMap, you can use Python functions in a number of ways. And like Microsoft Excel formulas, there are several ways we can go at this with Python in ArcMap.

Here’s the problem: We have addresses that contain numbers and text, separated by spaces. All we want is the text part, with its clarifying spaces (but no leading space – the space between the numbers and text).

We could use the .split() function. This is probably the simplest function you can use. Like all Python functions, it operates by assigning the input (our field name surrounded by exclamation points) followed by a period, then calling out the function, then specifying the parameters to base the split on within the parentheses (in our case, a space). If you need to, you can add a number in brackets to specify the index position of the text string to return. In our case, we’d want it to start returning from the second position in the text block (after the numbers), which would be 1 (remember, in Python, the first position of any list is number 0). It would look something like this:

!NAME_OF_FIELD!.split(“ “)[1]

So, for an address like:

71 BOWSPRIT CLOSE (0 position is ‘71’, 1 position is ‘BOWSPRIT’, 2 position is ‘CLOSE’)

We would get this:

BOWSPRIT

Yeah! Success! We got just the street name!

But that’s not what we want. We want:

BOWSPRIT CLOSE

You could make this cumbersome and separate out each word and then join them back up with the Python function .join(). But that would mean creating extra fields and clean up work afterwards. And that’s exactly what we are trying to avoid! While .split() is not the right tool for this particular exercise, it is a very functional function (pun intended) that does work well to extract text so I wanted to go over it here. Keep this in your back pocket for other instances where only one line of code is needed.

However, for this exercise, we’ll want to use a python sample I snagged from the internet. Three simple lines of code takes advantage of the Python module ‘re’ and gets us (almost) what we need.

Let’s go through it step-by-step.

Step 1 – Create a new field

Add the layer you want to alter into ArcMap and open its Attribute Table (right click on layer, then select Open Attribute Table).

Add a new field (column) (click on the Table Options drop down menu, then select Add Field).

Name your new field and select the type needed. For this example, we’ll name it StrtNm and select Text for the type. We can leave field length at the default of 50 characters. If it is not long enough, we can re-do this whole process.

Step 2 – Calculate the field contents

Now we can start the Python magic. Right click on the heading of your new field (column). From the pop-up menu, click on Field Calculator.

At the top of the Field Calculator pop-up dialog box, choose the Python Parser. The functions on the right side of the dialog box will change. Since we want to alter text data, click on the String type to filter the list of functions.

For this exercise, we’ll need the Codeblock, so click on the Show Codeblock option. It should look like this:

NOTE: If you don’t know how to use a particular Python function, you can click on the link About calculating fields. This will open ArcMap’s Help documentation samples. In addition, you can always look it up online. The GIS Stack Exchange has a wealth of samples, which is where I got the one I’m about to show you.

Next, you’ll want to copy/paste the following into the Pre-Logic Script Code box.

import re

def strip_digits(s):

return re.sub(“\d+”, “”, s)

The first line imports a module called ‘re’ which will allow us to use regular expression matching operations similar to Perl. In the second line, we define our new function strip_digits. The magic happens in the third line, which will strip out all the digits (d) from whatever we feed it. I’m not a programmer, so don’t ask me to explain it. Suffice to say it is calling on a re subroutine that strips out digits (\d+).

Trust me and put the following in the result box (the box beneath the Pre-Logic box):

strip_digits(!SitusFrm1!)

This last line calls on the function we just created and applies it to the contents in the address field of interest (in this case SitusFrm1).

Your Field Calculator dialog box should look something like this:

Once you click on OK, your results in your table should look like this:

Woohoo! We’re done!

Right?

Actually, we have one more step. If you look closely to the image above, there’s a leading space on the street names. Let’s get rid of it.

Step 3 – Clean Up Work using VBA

I know, I know. We want to eliminate clean up work, but this is probably the simplest cleanup task you’ll have to do today. Besides, it involves using good old VBA, a programming language on its way out. Might as well use this ancient language while we still can!

Right-click on the field name again and choose Field Calculator again. This time, click on the VB Script Parser. We don’t need the Codeblock for this, so you can click that off. Then click on the String for the function type to filter the function list. Since the space we want to get rid of is on the left leading side, you’ll want to double-click on the LTrim() Function. With your cursor in the parentheses, next double-click on the name of the field we want to trim (in this case, the same one we’re altering). It should look like this:

Click OK and you should have your street name with no leading space.

Viola! All done!

And there are no messy interim files to re-discover months from now and wonder if you should save them or not. 😉

Live long and python.


Looking for more informative and interesting content like this? Click here.