Monday, June 30, 2008
Zappos gets Kiva Systems drive units
Zappos has picked up Kiva Systems drive units. Zappos already had an incredibly short order-to-fulfillment cycle. With these robots its productivity should increase even further. The real benefit I see is not only in productivity but also in giving disabled workers a chance to be employed, since the robots do the hard stuff. The article I've linked to also makes an interesting observation: robots may make it possible for people here in the USA to be hired to do these jobs instead of the whole operation being shipped overseas. If so, that would be remarkable.
Wednesday, June 25, 2008
Do Illiterate People get the Full Effect of Alphabet Soup?
The title comes from a George Carlin joke and in reverence I've used it as a very appropriate title for today's entry.
From the people I spoke with at Text Analytics Summit 2008 it seems that everyone gets recall and precision, some get f-measure, and few if any get any other measurement for analyzing the quality of analytics from products. This seems weird to me. First off, the f-measure is pretty easy to get. What I find more difficult is defining recall and precision. In fact, it is the question of how to measure those that generally screws people up the most.
Recall: To simplify this, consider that a document is full of entities. You have a conceptual set of relevant entities. It is important to make sure that when you go through the document you only find the ones that are actually relevant. For example, if you were looking for Populated Place names (PPLs) then you would want to throw out anything that is a personification or adjective. "I'm going to Washington" would be good but "Washington was urged to sign the Kyoto Agreement" would not be. In the second case the entity Washington is the administration of the United States government. So assuming you have identified all of these, the next task is to count them. The number of hits that match a relevant entity (with perfect registration) is divided by the total number of relevant entities, and that is your recall. So if there are 10 PPLs and you get 6 of them then your recall is 0.6.
Precision: This is really simple. Take the number of relevant hits you have and divide by all the hits you have. So if you have 6 relevant hits but your total hits are 12 then your precision is 0.5.
Registration: This is where people cheat and fudge numbers. You have to show the instance of the term that was hit to know if you got it right. In the Washington example above, if both of those sentences were in the target document then you'd want to know WHICH Washington was picked up. What cheaters will do is note how many Washingtons are relevant and then count the number of hits without checking registration, so if there is a false positive it will look like a true positive. Another cheat I've seen is to take any hits on Washington and flatten them - ignoring the counts and just counting that as a true positive. These are real-life examples and show you can't just trust the vendor.
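Here is a rough Python sketch of what checking registration looks like. The Mention type, the function names and the two-sentence document are my own illustration, not any vendor's API. A hit counts only when both its position and its text match a relevant entity; the second function shows the cheat of comparing strings alone.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    start: int  # character offset where the mention begins
    text: str   # surface string of the mention


def score(gold: set, hits: list) -> tuple:
    """Return (recall, precision), counting a hit only when its exact
    position and text match a relevant (gold) mention."""
    hit_set = set(hits)
    true_positives = len(gold & hit_set)
    recall = true_positives / len(gold) if gold else 0.0
    precision = true_positives / len(hit_set) if hit_set else 0.0
    return recall, precision


def naive_precision(gold: set, hits: list) -> float:
    """The cheat: compare surface strings only and ignore position."""
    gold_texts = {m.text for m in gold}
    matched = sum(1 for h in hits if h.text in gold_texts)
    return matched / len(hits) if hits else 0.0


doc = ("I'm going to Washington. "
       "Washington was urged to sign the Kyoto Agreement.")
first = doc.find("Washington")              # the place name
second = doc.find("Washington", first + 1)  # the administration

gold = {Mention(first, "Washington")}       # only the place is relevant
hits = [Mention(first, "Washington"), Mention(second, "Washington")]

print(score(gold, hits))            # (1.0, 0.5)
print(naive_precision(gold, hits))  # 1.0, the false positive is hidden
```

With registration the second Washington is exposed as a false positive and precision drops to 0.5. Without it the same output looks perfect.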
So don't let someone scam you with their recall and precision numbers. Ask how they were derived. Don't just accept them as given. Once you have recall and precision then there are two ways you can calculate the f-measure:
1) Unweighted
F = 2RP / (R + P)
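2) Weighted
F = (1 + b^2)RP / (R + b^2 P)
With the weighted version you put in a value for b, typically between 0.5 and 1.5, and the score shifts from preferring precision (b below 1) to preferring recall (b above 1). With b = 1 it reduces to the unweighted version. Which one you want depends on the needs of the analysis you are doing.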
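To make the arithmetic concrete, here is a minimal Python sketch of the calculation. It assumes the standard weighted form, F = (1 + b^2)RP / (R + b^2 P), under which b below 1 favors precision and b above 1 favors recall. The function name f_measure is just for illustration, and the inputs are the recall of 0.6 and precision of 0.5 from the examples above.

```python
def f_measure(recall: float, precision: float, b: float = 1.0) -> float:
    """Weighted F-measure (assumed standard form).

    With b = 1 this is the unweighted 2RP / (R + P).
    """
    denominator = recall + b * b * precision
    if denominator == 0:
        return 0.0  # no relevant hits at all
    return (1 + b * b) * recall * precision / denominator


recall, precision = 0.6, 0.5

print(round(f_measure(recall, precision), 3))         # 0.545
print(round(f_measure(recall, precision, b=0.5), 3))  # 0.517
print(round(f_measure(recall, precision, b=2.0), 3))  # 0.577
```

Recall is the higher of the two numbers here, so the score goes up as b moves toward favoring recall and down as it moves toward favoring precision.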
The point of this post is that you need to know what goes into making an accurate calculation of f-measure. The fact is that if you have someone doing it for you, they have to really understand recall and precision. If you take shortcuts you reduce the benefit of the analysis to the point where you start promoting systems that just don't work. If you rely upon the vendor they are likely to sell you a pack of lies. The best approach is to be knowledgeable about how to do the measurement and do it yourself, or find someone who is skilled at doing it. In the end the security and comfort you get from validating the f-measure will keep you from losing sleep.
Tuesday, June 24, 2008
Would you buy a used car from this man?
Today, the emphasis in textual data mining is on the breadth of unstructured text that can be reviewed. For example, many tools emphasize the volume of data that can be ingested. However, by emphasizing this approach, the quality of the mined data is often ignored. Just because a tool can ingest hundreds of thousands of documents within a tractable time period does not mean that the produced results are meaningful, accurate or pertinent. Currently, there are no widely accepted measurement tools that can provide insight into the quality of the mined data, including the integrity of the derived associations, or its usefulness to the end user. Rather, the suppliers of such tools approach these concerns much like the sales pitch of a used car salesman: “Trust me. I personally know that this car was only driven to church on Sundays by the sweetest little old lady you could ever meet.”
The few examples of textual data mining tools that I know are "correct" are so because they have identified a finite lexicon from which they have extracted a known set of associations. These applications are for targeted areas and have limited, if any, broad applicability. The "process" that was implemented in these applications consisted of a brute-force analysis of the corpus and observation of the environment from which the corpus was derived. It is not a repeatable process, and as a result, there is no chance of developing an algorithm or quantitative method to provide such analyses. In terms of "correctness," I can state with 100% certainty that for the referenced applications the defined associations across documents are correct. Please note that I have said nothing about completeness. That is, it is unknown whether every potential contextual association across documents has been identified. One can assume that all such associations cannot be identified a priori.
As classes of textual data mining tools evolve that do not require a fixed lexicon or an a priori set of contextual associations, the need for a repeatable process to demonstrate both correctness and completeness of the derived information is of paramount importance. Without such measures, the end user has no way of knowing the validity of derived information. Similarly, the tool developer has no way to verify the correctness of the extracted data. Until there exists an analytical means to verify and validate a textual data mining process, I assert that confidence in the results provided is, at best, questionable.
Labels:
lexicon,
textual data mining,
validation,
verification
Monday, June 23, 2008
Learning Robots
Came across the site Learning Robots and was really impressed. From the site:
Some hardwired, pre-programmed robots such as TU Munich's humanoid walking biped and BU Munich's fast robot car perform impressive tasks. But they do not learn like humans do.
So how can we make them learn from experience? Unfortunately, traditional reinforcement learning algorithms are limited to simple reactive behavior and do not work well for realistic robots.
Robot learning in realistic environments requires novel algorithms for learning to identify important events in the stream of sensory inputs, and to temporarily memorize them in adaptive, dynamic, internal states until the memories can help to compute proper control actions.
Correctness and Utility
A theme I've been working on the past few months is the interplay of correctness and utility. At times there is a tradeoff between the two concepts and I think they deserve discussion. Generally speaking, in computer science terms, correctness refers to how closely an algorithm or implemented software conforms to a specification. Given a specification for addition, an algorithm that takes 2 and 2 and produces a value of 4 is deemed "correct." What a lot of people have tried in the past with machine learning is to impose a correct model of language on a system and then shoehorn the data into that model. While the results work reasonably well for white papers, they don't for the other 99.9% of inputs.
The reason is that language itself is not correct. In almost all documents, this one included, you will find spelling mistakes, bad diction, bad grammar, neologisms, double negatives, sarcasm, run-on sentences and so many other ills. T33n SMS Sp3@k... You name it, we manage to communicate in spite of the rules of standard language. In fact, at times we invent grammar and words and turn things on their ear to communicate more specifically and with more impact than if we had just made statements in standard correct English. Take a look at advertising, literature or even the script they handed Frank Oz when he took on the part of Yoda.
So even if I spell something wrong or perhaps use awkward phrasing, can you still get utility out of what I write? Can you still find the essential meaning of my text? We all know this is essential for data mining, text analytics and machine learning. We have to overcome human weakness in the way that humans do. We have to be flexible. We have to value utility over correctness because what we have to work with is, itself, not correct.
This leads to another thought which I won't expand upon much here because it requires its own series of articles. When you score a system for its quality of analytics, it would be a huge mistake to excuse an error just because the text itself was incorrect. The reason is that we need to accept the fact that text will always have mistakes in it. While it is understandable why your system did not get 100%, it is important to rate a system that did get the right relationship more highly.
I'll be writing more of my concepts on quality of analytics as time goes on.
Sunday, June 22, 2008
Educate thy self
Since I've been working in this field for only three years I've had the need to educate myself about my own language. While knowing what a past participle is won't improve your ability to speak or communicate in real life, it is important to know what it is when you are trying to teach a computer to find important relationships in unstructured documents.
So here are some useful websites that I enjoy and have used to educate myself and as reference:
- Dr. Grammar - a very useful site for improving your understanding of grammar.
- Online Writing Support - Towson University's great resource
- ESL in Canada - generally any ESL sites are fantastic for adults looking to re-educate themselves in English and this one is one of my favorites
- Part of Speech Tagging - yeah, gotta have at least one Wikipedia entry. This one is worth going over though!
- UPenn Treebank - you are not involved in part of speech tagging and text analytics if you are not familiar with this project. Seriously.
- Text Analytics Wiki - this is a new one to my collection. It has promise. Give it a look!
- Visuwords - Using the Princeton Wordnet database this is a very visual dictionary/thesaurus. Very useful
- LIWC - an interesting piece of software that I am looking at now. Linguistic Inquiry and Word Count (LIWC) is text analysis software. LIWC is able to calculate the degree to which people use different categories of words across a wide array of texts.
Well that should get you started. Later on I'll start a blog roll of blogs I think are worthy of merit.
Saturday, June 21, 2008
Trust me, I know what I'm doing...
As I come up with ideas for blog entries, it allows me to look back at the lessons learned during my college years. One memory in particular that lends itself well to this forum was a physics experiment gone bad. The premise of this experiment was to measure the speed of light using lasers and fundamental measuring equipment. Once my lab partner and I performed the rudimentary testing of the individual pieces of equipment to ensure that everything was in working order, we proceeded with the experiment before us. At the end of the lab session, we had unequivocally proven that light crawled at a dismal 10 cm/sec*sec. Without a doubt, our discovery could set the fields of physics and science back centuries. As the teaching assistant peered over our results, he looked at us with total disdain. He informed my lab partner and me that we were to remain in the lab to demonstrate to him how we achieved this remarkable finding.
As the lab room became vacant, the teaching assistant had us walk through the experimental process that had produced our amazing discovery. As we set up the various lenses used to refract the laser beam for measurement, he started shaking his head. My lab partner and I had reversed two of the lenses, and as a result, we were not measuring the refracted light at the appropriate angle! Even though we had painstakingly tested each component of our equipment prior to set-up to ensure its viability, we never considered incrementally testing the set-up during assembly. Rather, we naively believed that since each individual part worked correctly, then so would the assembled creation. To this day, I carry the lesson learned from that day: test early, test often and then test again! If everyone followed this hard-learned philosophy, then the world of software development would be a much better place.
Friday, June 20, 2008
My dog taught me everything I know...
My first serious exposure to machine learning occurred in the fall of 1986 when I took an introductory graduate course in robotics. To this day, I still remember how awe inspiring it was to develop the control language that allowed a robotic arm to pour a sequence of alcohol from bottles to create a mixed drink. It must be noted that the bottles required arrangement in a particular order, but that did not dampen our enthusiasm. We harnessed the cutting edge technology of the day to perform a task that would entertain any college student: we “trained” a machine to create a cocktail.
Now while this feat may not seem very awe inspiring today, it demonstrates a very fundamental principle of machine learning/artificial intelligence: it can never surpass the available technology or separate itself from its dependence on humans. In the twenty odd years since this event, processors have evolved from 8-bit machines to 64-bit machines and beyond. Memory has had an equally impressive evolution. Similarly, robotic arms are now used in many facets of manufacturing in lieu of humans. However, the tasks performed by these machines are still devised and programmed by humans. We still have not harnessed the capability to allow machines to teach other machines how to perform a task, and in turn, demonstrate true artificial intelligence.
Even as technology evolves and allows machines to perform more complex tasks, there is still an intrinsic need for a human to identify the task, to develop a process by which a machine can learn the task, and then determine if the machine can properly perform the task. However, this process can never be undertaken unless the available human-developed technology supports the creation of the needed machine. Similarly, the quality of the task performance by the machine is intrinsically related to the human capability to devise a sufficient training schema. Therefore, I assert that machine learning/artificial intelligence as it stands today is simply a model or collection of models reflecting the beliefs of its creator. This statement should not be taken as ridicule, but rather as a stern rationalization of fact. Furthermore, I assert that this belief should be infused across any application that uses a computer-based system. If we forget the fact that humans are fallible and humans create the machines and processes that support machine learning/artificial intelligence, then we as a society will suffer the consequences. If we recognize this fallibility of human design, then the machine learning/artificial intelligence community at large must begin to address en masse how to demonstrate that their creations are validated and verified.
Thursday, June 19, 2008
-1 Days since Machine Uprising
The title of this entry comes from a safety sign on the floor of Kiva Systems where they make very impressive commercial robots (but don't call them robots, they tell me!)
This blog's purpose is to discuss the concepts of machine learning. I've been working in the field of machine learning for 3 years at Digital Reasoning Systems, Inc. I've been in IT professionally since 1989. My tour of Kiva last night was through a friend whom I have known for about 4 years but until last night had never met. We play a very popular computer game on Xbox over the Internet. What surprised me greatly was that some of the systems used in the video game for designing floor layouts were actually quite similar to, but not as powerful as, the systems Kiva has for its drive units (*whisper* they are robots!)
The work I do is related to text analytics. DRS does entity extraction but so do 3 dozen other companies I could name. Add in European companies and that number goes up, and then add in Asian companies and it expands again. The thing I've been thinking about all day is that eventually machines will respond to our comments, rearranging our space without much in the way of our intervention. Decisions are already being made based upon text analytics to improve the experience of the customer. Assuming we survive the coming energy singularity point and fend off the collapse of society, we have a pretty bright future ahead of us. If not, then we will be fighting drive units for every inch of ground. Ok, maybe not, but that won't stop Hollywood from speculating.