Sunday, March 8, 2009

Measuring NLP is like Marksmanship

There is an interesting parallel between shooting and improving an NLP system: both require mathematical analysis to get better. In marksmanship the emphasis is on removing bias. In NLP the emphasis is on improving recall and precision together, as their harmonic mean (the f-measure).

Recall in entity extraction is finding the correct items in a category; in other aspects of NLP it is the correct networking of terms. In marksmanship you could say that recall is putting the bullet into the X ring, though you still score for being close. The emphasis is really not on hitting the target at all (it is rare to miss the paper entirely) but on the quality of the hit.
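
To make the NLP side concrete, here is a minimal sketch of how recall and precision are usually computed for entity extraction. The gold and predicted entity sets are made up purely for illustration.

    # Toy example: gold-standard and predicted entities as (start, end, label) tuples.
    gold = {(0, 5, "PERSON"), (12, 20, "ORG"), (25, 33, "LOCATION")}
    predicted = {(0, 5, "PERSON"), (12, 20, "ORG"), (40, 44, "DATE")}

    true_positives = len(gold & predicted)       # correct items found
    recall = true_positives / len(gold)          # how much of what was there did we find
    precision = true_positives / len(predicted)  # how many of our finds were correct

    print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.67 precision=0.67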

Just for fun, let's look at some numbers. These numbers came from my shooting a carbine at 100 yards without a brace. Two shots were "flyers", as in totally off the paper; those are counted in the 2nd quadrant. I list the quadrants as 1st (upper left), 2nd (upper right), 3rd (lower right) and 4th (lower left). The numbers show I am biased a bit high. That is actually not too unexpected: with the trajectory of the bullets I use and the range to the target, they are still rising toward their apogee. The numbers also show I am shooting a little to the right.

The other number of value is my average score. I got an 8.88, which means my shots fell within a space about the size of a cantaloupe. Ideally I'd like to be hitting something the size of a baseball. That would probably happen if I used a brace.

Marksmen use other measures as well. Grouping is important. In NLP that would be analogous to clustering.
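
Just to show how simple that number is on the shooting side, here is a sketch with made-up shot coordinates (inches from the point of aim). The group size, or extreme spread, is the largest distance between any two shots, which is essentially the diameter of a cluster.

    from itertools import combinations
    from math import dist

    # Hypothetical shot positions, in inches from the point of aim.
    shots = [(1.2, 2.0), (-0.5, 1.1), (0.8, 2.4), (2.1, 0.3), (0.0, 1.7)]

    # Extreme spread: the largest center-to-center distance between any two shots.
    # Applied to points in a feature space, the same number is a cluster's diameter.
    group_size = max(dist(a, b) for a, b in combinations(shots, 2))
    print(f"group size: {group_size:.2f} in")  # group size: 2.72 in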

The point I am trying to make is that the marksman uses a lot more numbers to refine his process. Why aren't we doing the same with NLP? My first forays into the world of marksmanship were made without any knowledge of the math behind the shooting, and when my groups were the size of a medium pizza at 100 yards I wanted to improve my skill. All of these numbers help me diagnose a wide range of problems; I learned to tell where my faults were (breathing, stance, grip, trigger pull, etc.).

We seem to lack these controls in NLP at the moment. I have long advocated using f-measure early in development so that algorithmic changes can be evaluated, and I also advocate keeping a running f-measure as one trains an entity extraction system. However, this is just a first step. I now advocate that we develop better controls, through stronger math, to help the scientist, developer or trainer better understand their NLP system and make corrections.
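
As a sketch of what a running f-measure looks like, assume a hypothetical training loop that can score the current model against a held-out annotated set after each pass (the numbers below are placeholders):

    def f_measure(precision, recall, beta=1.0):
        # Weighted harmonic mean of precision and recall; beta=1 is the usual F1.
        if precision + recall == 0:
            return 0.0
        return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

    # Placeholder per-pass scores; a real trainer would evaluate the current model
    # against the held-out set after each training pass.
    passes = [(0.62, 0.41), (0.70, 0.55), (0.74, 0.63), (0.75, 0.68)]

    for i, (p, r) in enumerate(passes, start=1):
        print(f"pass {i}: P={p:.2f} R={r:.2f} F1={f_measure(p, r):.2f}")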

This will have further benefits when the system in question is being automatically trained by another software system. Automating the process would require a strong quantitative understanding of the system's behavior, but it would have the advantage of removing the random bias that humans tend to add to training.
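
A minimal sketch of that kind of automation, assuming a hypothetical train_and_evaluate() that trains with a candidate setting and returns precision and recall on held-out data:

    def tune(settings, train_and_evaluate):
        # Try each candidate setting and keep the one with the best F1.
        # train_and_evaluate(setting) is assumed to return (precision, recall).
        best_setting, best_f1 = None, -1.0
        for setting in settings:
            p, r = train_and_evaluate(setting)
            f1 = 2 * p * r / (p + r) if (p + r) else 0.0
            if f1 > best_f1:
                best_setting, best_f1 = setting, f1
        return best_setting, best_f1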

Target Scoring
March 8th, 2008
(shots counted per ring value; quadrants as described above)

                 Ring:   0    7    8    9   10    X    Shots   Score   Avg. Score
1st Quad                 0    0    4    5    2    0       11      97         8.82
2nd Quad                 2    1    2    4    0    1       10      79         7.90
3rd Quad                 0    0    2    1    4    0        7      65         9.29
4th Quad                 0    0    0    1    1    0        2      19         9.50
Totals                   2    1    8   11    7    1       30     260         8.88

Bias Up: 12    Bias Left: -4
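
For completeness, here is how the summary numbers above appear to fall out of the per-quadrant figures (the overall 8.88 looks like the mean of the four quadrant averages, so the sketch computes it that way):

    # Per-quadrant (shot count, score) from the card above.
    # 1st = upper left, 2nd = upper right, 3rd = lower right, 4th = lower left.
    quads = {"1st": (11, 97), "2nd": (10, 79), "3rd": (7, 65), "4th": (2, 19)}
    shots = {q: n for q, (n, _) in quads.items()}

    # Vertical bias: shots in the upper half minus shots in the lower half.
    bias_up = (shots["1st"] + shots["2nd"]) - (shots["3rd"] + shots["4th"])
    # Horizontal bias: left-half shots minus right-half shots (negative = drifting right).
    bias_left = (shots["1st"] + shots["4th"]) - (shots["2nd"] + shots["3rd"])
    # Overall average as the mean of the per-quadrant averages.
    avg = sum(score / n for n, score in quads.values()) / len(quads)

    print(bias_up, bias_left, round(avg, 2))  # 12 -4 8.88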