<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-8039905363673116148.post5603344493382027040..comments</id><updated>2007-10-25T02:32:23.286-07:00</updated><title type='text'>Comments on comments on Andrew Dalke's writings: comments on Wide Finder</title><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://dalkescientific.blogspot.com/feeds/5603344493382027040/comments/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default'/><link rel='alternate' type='text/html' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html'/><author><name>Andrew Dalke</name><uri>http://www.blogger.com/profile/17091314849699854287</uri><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>8</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-8039905363673116148.post-1274627280144600415</id><published>2007-10-25T02:32:00.000-07:00</published><updated>2007-10-25T02:32:00.000-07:00</updated><title type='text'>But I'm not a game developer.  :)</title><content type='html'>But I'm not a game developer.  
:)</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/1274627280144600415'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/1274627280144600415'/><link rel='alternate' type='text/html' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html?showComment=1193304720000#c1274627280144600415' title=''/><author><name>Andrew Dalke</name><uri>http://www.blogger.com/profile/17091314849699854287</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='11273525496096012439'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html' ref='tag:blogger.com,1999:blog-8039905363673116148.post-5603344493382027040' source='http://www.blogger.com/feeds/8039905363673116148/posts/default/5603344493382027040' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-8039905363673116148.post-1178614553691698203</id><published>2007-10-25T01:24:00.000-07:00</published><updated>2007-10-25T01:24:00.000-07:00</updated><title type='text'>Game developers who live and die on disk access sp...</title><content type='html'>Game developers who live and die on disk access speed have known for years that careful sequential reading is faster than mmap - this shouldn't surprise you.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/1178614553691698203'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/1178614553691698203'/><link rel='alternate' type='text/html' 
href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html?showComment=1193300640000#c1178614553691698203' title=''/><author><name>Anonymous</name><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html' ref='tag:blogger.com,1999:blog-8039905363673116148.post-5603344493382027040' source='http://www.blogger.com/feeds/8039905363673116148/posts/default/5603344493382027040' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-8039905363673116148.post-6840595887809733717</id><published>2007-10-11T09:16:00.000-07:00</published><updated>2007-10-11T09:16:00.000-07:00</updated><title type='text'>I managed to shave a third off the command line ve...</title><content type='html'>I managed to shave a third off the command line version which would make it the fastest if your result was scaled by the same amount:&lt;BR/&gt; http://paddy3118.blogspot.com/2007/10/wide-finder-on-command-line.html&lt;BR/&gt;&lt;BR/&gt;The main time saver was to use gawk to do the counting instead of sort+uniq.&lt;BR/&gt;&lt;BR/&gt;- Paddy.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/6840595887809733717'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/6840595887809733717'/><link rel='alternate' type='text/html' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html?showComment=1192119360000#c6840595887809733717' title=''/><author><name>Paddy3118</name><uri>http://www.blogger.com/profile/06899509753521482267</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html' 
ref='tag:blogger.com,1999:blog-8039905363673116148.post-5603344493382027040' source='http://www.blogger.com/feeds/8039905363673116148/posts/default/5603344493382027040' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-8039905363673116148.post-7659432365734189008</id><published>2007-10-11T01:30:00.000-07:00</published><updated>2007-10-11T01:30:00.000-07:00</updated><title type='text'>Thanks for the nice article, Andrew !mxTextTools' ...</title><content type='html'>Thanks for the nice article, Andrew !&lt;BR/&gt;&lt;BR/&gt;&lt;A HREF="http://www.egenix.com/products/python/mxBase/mxTextTools/" REL="nofollow"&gt;mxTextTools&lt;/A&gt;' support for the buffer interface was removed when adding the Unicode support. We will re-add it in one of the future versions.&lt;BR/&gt;&lt;BR/&gt;BTW: You can achieve further speedups in the dalke-wf-9.py version by using the new mxTextTools CharSet() for DIGITS, i.e. DIGITS = CharSet('0123456789'). &lt;BR/&gt;&lt;BR/&gt;IsIn will do a sequential search through the set string, while CharSet() uses a bitmap for faster access. 
You then use IsInCharSet instead of IsIn.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/7659432365734189008'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/7659432365734189008'/><link rel='alternate' type='text/html' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html?showComment=1192091400000#c7659432365734189008' title=''/><author><name>Marc-Andre Lemburg</name><uri>http://www.egenix.com/</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html' ref='tag:blogger.com,1999:blog-8039905363673116148.post-5603344493382027040' source='http://www.blogger.com/feeds/8039905363673116148/posts/default/5603344493382027040' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-8039905363673116148.post-5681326566986791086</id><published>2007-10-09T23:49:00.000-07:00</published><updated>2007-10-09T23:49:00.000-07:00</updated><title type='text'>The Macs seem to perform pretty badly on the paral...</title><content type='html'>The Macs seem to perform pretty badly on the parallel disk access tests; wf-5 and wf-6 don't seem to buy you much over a single-threaded program on the Mac, but give major speedups on multicore Windows and Unix boxes.&lt;BR/&gt;&lt;BR/&gt;(and for the record, the egrep/awk pipeline is the fastest also on Windows, but the result doesn't look entirely correct:&lt;BR/&gt;&lt;BR/&gt;(c:\bin\uniq.exe 1000) Exception trapped!&lt;BR/&gt;(c:\bin\uniq.exe 1000) exception C0000005 at 10011E58&lt;BR/&gt;(c:\bin\uniq.exe 1000) exception: ax 0 bx 0 cx FFFFFFFF dx 0&lt;BR/&gt;(c:\bin\uniq.exe 1000) exception: si 0 di 240EED0 bp 240EE84 sp 240EE7C&lt;BR/&gt;(c:\bin\uniq.exe 1000) exception is: 
STATUS_ACCESS_VIOLATION&lt;BR/&gt;&lt;BR/&gt;;-)</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/5681326566986791086'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/5681326566986791086'/><link rel='alternate' type='text/html' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html?showComment=1191998940000#c5681326566986791086' title=''/><author><name>Fredrik</name><uri>http://www.blogger.com/profile/10634415660188501055</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html' ref='tag:blogger.com,1999:blog-8039905363673116148.post-5603344493382027040' source='http://www.blogger.com/feeds/8039905363673116148/posts/default/5603344493382027040' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-8039905363673116148.post-2048067376792009116</id><published>2007-10-09T19:54:00.000-07:00</published><updated>2007-10-09T19:54:00.000-07:00</updated><title type='text'>I'm not surprised that the command line version us...</title><content type='html'>I'm not surprised that the command line version using egrep/awk/sort/uniq is one of the fastest. These programs are in the UNIX tradition of doing one or a few things well. 
They also divide the problem into nicely modular pieces that are easy to debug.&lt;BR/&gt;&lt;BR/&gt;Nice post,&lt;BR/&gt;James Thiele</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/2048067376792009116'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/2048067376792009116'/><link rel='alternate' type='text/html' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html?showComment=1191984840000#c2048067376792009116' title=''/><author><name>James</name><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html' ref='tag:blogger.com,1999:blog-8039905363673116148.post-5603344493382027040' source='http://www.blogger.com/feeds/8039905363673116148/posts/default/5603344493382027040' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-8039905363673116148.post-2303753248641232102</id><published>2007-10-09T01:08:00.000-07:00</published><updated>2007-10-09T01:08:00.000-07:00</updated><title type='text'>Tim's original regexp doesn't check to see if the ...</title><content type='html'>Tim's original regexp doesn't check to see if the match was in the correct field, and pretty much everyone is keeping with that assumption.&lt;BR/&gt;&lt;BR/&gt;I don't know if the referer field goes through a normalization set (eg, " " to "+", and using % escapes), so I would use the user-agent field.&lt;BR/&gt;&lt;BR/&gt;Give it a go and see if Tim mentions it in the future!  
:)</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/2303753248641232102'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/2303753248641232102'/><link rel='alternate' type='text/html' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html?showComment=1191917280000#c2303753248641232102' title=''/><author><name>Andrew Dalke</name><uri>http://www.blogger.com/profile/17091314849699854287</uri><email>noreply@blogger.com</email><gd:extendedProperty xmlns:gd='http://schemas.google.com/g/2005' name='OpenSocialUserId' value='11273525496096012439'/></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html' ref='tag:blogger.com,1999:blog-8039905363673116148.post-5603344493382027040' source='http://www.blogger.com/feeds/8039905363673116148/posts/default/5603344493382027040' type='text/html'/></entry><entry><id>tag:blogger.com,1999:blog-8039905363673116148.post-6070761655297268717</id><published>2007-10-08T05:45:00.000-07:00</published><updated>2007-10-08T05:45:00.000-07:00</updated><title type='text'>Interesting article, thanks.Is it possible that th...</title><content type='html'>Interesting article, thanks.&lt;BR/&gt;&lt;BR/&gt;Is it possible that the pattern may occur more than once per line?  Early versions of Fredrik's script only counted the first occurrence per line before he switched to findall().  
I don't know if it's possible for the Referer [sic] field in the log file to contain the pattern, either naturally or maliciously.&lt;BR/&gt;&lt;BR/&gt;Cheers, Ralph Corderoy.</content><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/6070761655297268717'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8039905363673116148/5603344493382027040/comments/default/6070761655297268717'/><link rel='alternate' type='text/html' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html?showComment=1191847500000#c6070761655297268717' title=''/><author><name>Ralph</name><uri>http://www.blogger.com/profile/13140975971019765573</uri><email>noreply@blogger.com</email></author><thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0' href='http://dalkescientific.blogspot.com/2007/10/comments-on-wide-finder.html' ref='tag:blogger.com,1999:blog-8039905363673116148.post-5603344493382027040' source='http://www.blogger.com/feeds/8039905363673116148/posts/default/5603344493382027040' type='text/html'/></entry></feed>