tag:blogger.com,1999:blog-5905235103449148751.post3755194330000562971..comments2015-07-21T00:56:58.255-07:00Comments on Code Injection: The Pumping Lemma, and Why It's Slightly More Important Than You Might Have ThoughtRoberthttp://www.blogger.com/profile/03622901804239494322noreply@blogger.comBlogger14125tag:blogger.com,1999:blog-5905235103449148751.post-53110684149163957912012-10-02T10:26:15.963-07:002012-10-02T10:26:15.963-07:00I very much appreciate your description of the pum...I very much appreciate your description of the pumping lemma, I think it finally closed the gap that I was having in understanding it.<br /><br />That said I take a little issue with your Regex vs HTML and reinventing the wheel comments. Yes technically you are very correct and Regex is not the tool for general HTML parsing, a proper SAX or DOM style parser is much appropriate. However, that does not mean that those are always the right tool for the job. If the job is not general parsing but is much more specific, often a Regex can accomplish the task much more effectively than the use of a SAX or DOM parser. That is a regex is more than sufficient for identifying a tag within and HTML document with a specific attribute. This can be coded quickly and performs better than either a SAX or DOM style implementation. But then I could implement a finite state machine for this situation, because I have reduced the complexity to a specific tag and attribute and not any/all tag(s) any/all attribute.AaronMhttp://www.blogger.com/profile/13741516614890257311noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-34907389533685478232012-07-04T22:15:30.776-07:002012-07-04T22:15:30.776-07:00theoretically, you are absolutely right.. but sinc...theoretically, you are absolutely right.. but since i have my first long exams tomorrow, then i am definitely in need of some examples on proving itself. but all the same, i still found this post useful. thumbs up for you! :)sura amilbaharhttp://www.blogger.com/profile/00852807367947302507noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-90274793330489685102011-03-06T19:12:43.766-08:002011-03-06T19:12:43.766-08:00It's about time that CS texts are updated with...It's about time that CS texts are updated with the term 'Perl-regular'. Perl-regular expressions should be able to parse HTML.Harry Simonshttp://www.blogger.com/profile/04125462847996013122noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-8719673576983770232011-02-11T08:43:53.227-08:002011-02-11T08:43:53.227-08:00Ok thanks. Great article btw.Ok thanks. Great article btw.matthttp://www.blogger.com/profile/00775737974381795824noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-17364497476130619522011-02-10T15:44:07.923-08:002011-02-10T15:44:07.923-08:00matt, the computer on your desk is a Von Neumann a...matt, the computer on your desk is a Von Neumann architecture machine. From a theoretical perspective, it can be thought of as an approximation of a turing machine. Although it is technically correct that the finite amount of memory implies that it is possible to model your machine as a finite autmaton, it's not really practical. The amount of memory is so large that the number of states the machine can be in is astronomical.Roberthttp://www.blogger.com/profile/03622901804239494322noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-14776695034357918242011-02-10T14:26:34.649-08:002011-02-10T14:26:34.649-08:00Thanks Yann and Gwenhwyfaer for clearing up the Tu...Thanks Yann and Gwenhwyfaer for clearing up the Turing machine/FSM distinction. I was thinking about this in terms of actual computers like the one on my desk, which of course do not have an infinite tape. So they are technically FSMs not Turing machines I believe. Which is to say that no actual computer could parse (i.e. correctly determine the grammaticality of) an HTML file that is sufficiently large, right? Though that would have to be a pretty large HTML file.matthttp://www.blogger.com/profile/00775737974381795824noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-50005365096286224822011-02-10T12:58:35.743-08:002011-02-10T12:58:35.743-08:00This is all correct, from a theoretical stand poin...This is all correct, from a theoretical stand point.<br /><br />However, modern "regular expressions" (e.g. those found in Perl, Python, PHPO, etc.) are *not* regular expressions. They're more general. <br /><br />I can't remember exactly what class they're equivalent to, but they are more general due to their extra features, such as (?...), (?=...), (?(..)then|else). It may be that they're Turing Complete!Tinned_Tunahttp://www.blogger.com/profile/06837452458119663813noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-67098271272014825422011-02-10T12:38:15.333-08:002011-02-10T12:38:15.333-08:00@matt: No, a computer program can implement a FSM,...@matt: No, a computer program can implement a FSM, but it can also implement anything up to a touring machine. if I understood this correctly. The type of automaton necessary to parse the language (a^n)(b^n) is called a push-down automaton and works like a series of operations on a stack. There is a second pumping lemma and any language to which it applies requires a touring machine to be parsed. Right?Yannhttp://www.blogger.com/profile/09484153347759731806noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-3474883829426433852011-02-10T12:36:12.465-08:002011-02-10T12:36:12.465-08:00No, matt. A Turing machine has memory (the infinit...No, matt. A Turing machine has memory (the infinite tape), and a finite state machine doesn't; computers are equivalent to Turing machines, not FSMs. And (a^n)(b^n) is trivially parseable if you can keep a tally of how many a's you've seen.Gwenhwyfaerhttp://www.blogger.com/profile/03775254923855147509noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-82333900737161508012011-02-10T11:45:54.635-08:002011-02-10T11:45:54.635-08:00Me not understand. If regular expressions are equ...Me not understand. If regular expressions are equivalent to FSMs, then how are HTML parses possible at all? Doesn't any computer program by its very nature implement a FSM?matthttp://www.blogger.com/profile/00775737974381795824noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-9621797930326056442011-02-10T11:03:51.661-08:002011-02-10T11:03:51.661-08:00Instead of changing the diagram you can modify the...Instead of changing the diagram you can modify the regular expression for your language to be: ((a^n)(b^n))^k so that it accommodates strings like 'abab' and 'aabbaaaabbb'.Jayhttp://www.blogger.com/profile/01660735087788121492noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-65589228571313962692011-02-10T10:42:51.929-08:002011-02-10T10:42:51.929-08:00Hey la timos, you are correct about the diagrams a...Hey la timos, you are correct about the diagrams accepting additional strings outside of the language making them incorrect, unfortunately the diagrams get significantly more complex when accounting for that problem. For this simple explantion I'm leaving them as is, I think they are sufficient to demonstrate the concept of pumping (just don't try to copy them for your homework!). If I can think of a way to make the diagrams account for this problem will still being easy to read I'll be sure to update them.Roberthttp://www.blogger.com/profile/03622901804239494322noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-13594567207265287862011-02-10T10:13:49.702-08:002011-02-10T10:13:49.702-08:00Thanks for the overview! I'm studying the proo...Thanks for the overview! I'm studying the proof for the pumping lemma in a Theory of Computation course, and this helps.Tomhttp://www.blogger.com/profile/00190483760906597819noreply@blogger.comtag:blogger.com,1999:blog-5905235103449148751.post-88743509390768565792011-02-10T09:14:52.368-08:002011-02-10T09:14:52.368-08:00whoops, your state machines seem to accept strings...whoops, your state machines seem to accept strings like "ababababababababaaabbb", as well :)la timoshttp://www.blogger.com/profile/08514218440159967492noreply@blogger.com