<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: If I had a hammer</title>
	<atom:link href="http://bit-player.org/2009/if-i-had-a-hammer/feed" rel="self" type="application/rss+xml" />
	<link>http://bit-player.org/2009/if-i-had-a-hammer</link>
	<description>An amateur's outlook on computation and mathematics.</description>
	<pubDate>Thu, 17 May 2012 09:50:56 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
		<item>
		<title>By: Matt</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1970</link>
		<dc:creator>Matt</dc:creator>
		<pubDate>Thu, 26 Feb 2009 00:41:24 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1970</guid>
		<description>Hi,
I'm not sure what you're trying to do exactly, but if it involved doing it lots of times the first tool I'd turn to is Python.</description>
		<content:encoded><![CDATA[<p>Hi,<br />
I&#8217;m not sure what you&#8217;re trying to do exactly, but if it involved doing it lots of times the first tool I&#8217;d turn to is Python.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave In Tucson</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1945</link>
		<dc:creator>Dave In Tucson</dc:creator>
		<pubDate>Sat, 14 Feb 2009 02:55:03 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1945</guid>
		<description>When you get to documents that large, the editing tool you want is called #!/usr/bin/perl.

D&#8712;T</description>
		<content:encoded><![CDATA[<p>When you get to documents that large, the editing tool you want is called #!/usr/bin/perl.</p>
<p>D&isin;T</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brian</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1943</link>
		<dc:creator>brian</dc:creator>
		<pubDate>Fri, 13 Feb 2009 19:26:09 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1943</guid>
		<description>@Carl Witty: Sorry for the confusion. Yes, I started out with ~30MB files, but there were five of them to be concatenated. This is image data in ASCII format. Each pixel consists of one to five decimal digits, possibly with a minus sign, padded with spaces to fill a field of seven characters. 7,200 pixels per row, 3,000 rows in the assembled file.

I was trying this with Aquamacs, an OS X port of Emacs.

Version info: "GNU Emacs 22.3.2 (i386-apple-darwin9.5.0, Carbon Version 1.6.0) of 2009-01-11 on plume.sr.unh.edu - Aquamacs Distribution 1.6"

I tried several times, with results that weren't entirely consistent. In general, I could do anything I wanted near the beginning of the file, but getting to the end of the buffer (either by scrolling or by M-&gt;) was very slow. Mode was "Text" (also tried "Text Wrap", but that was worse). 

As for www.gravatar.com, thanks for alerting me. I had no idea. I'll see if I can figure out what's up, and fix it.</description>
		<content:encoded><![CDATA[<p>@Carl Witty: Sorry for the confusion. Yes, I started out with ~30MB files, but there were five of them to be concatenated. This is image data in ASCII format. Each pixel consists of one to five decimal digits, possibly with a minus sign, padded with spaces to fill a field of seven characters. 7,200 pixels per row, 3,000 rows in the assembled file.</p>
<p>I was trying this with Aquamacs, an OS X port of Emacs.</p>
<p>Version info: &#8220;GNU Emacs 22.3.2 (i386-apple-darwin9.5.0, Carbon Version 1.6.0) of 2009-01-11 on plume.sr.unh.edu - Aquamacs Distribution 1.6&#8221;</p>
<p>I tried several times, with results that weren&#8217;t entirely consistent. In general, I could do anything I wanted near the beginning of the file, but getting to the end of the buffer (either by scrolling or by M->) was very slow. Mode was &#8220;Text&#8221; (also tried &#8220;Text Wrap&#8221;, but that was worse). </p>
<p>As for <a href="http://www.gravatar.com" rel="nofollow">http://www.gravatar.com</a>, thanks for alerting me. I had no idea. I&#8217;ll see if I can figure out what&#8217;s up, and fix it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Carl Witty</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1942</link>
		<dc:creator>Carl Witty</dc:creator>
		<pubDate>Fri, 13 Feb 2009 17:28:22 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1942</guid>
		<description>Hmm... "30 megabytes", "50K characters per line", and "a few thousand lines" are not mutually consistent.

I just did an experiment with Emacs on a file of the described size (600 lines of 50K characters per line == 30 megabytes, assuming that "a few thousand lines" was the incorrect part), and it seemed fine to me.  Typing occurs at normal speed; scrolling is occasionally a little slow, but never took more than a couple of seconds.

This is the Debian emacs package 22.2+2-5, running on x86 Linux.

My file consisted mostly of lots of 'a' characters; maybe the contents of your file matter?  Or maybe you ended up in some major mode that was trying to do syntax highlighting, or some other "clever" functionality?  My file was in Fundamental mode.

In fact, editing that big file in emacs was faster and more responsive than editing this comment.  Part of the problem here seems to be that it's doing an HTTP request to www.gravatar.com after every keystroke; is that intentional?</description>
		<content:encoded><![CDATA[<p>Hmm&#8230; &#8220;30 megabytes&#8221;, &#8220;50K characters per line&#8221;, and &#8220;a few thousand lines&#8221; are not mutually consistent.</p>
<p>I just did an experiment with Emacs on a file of the described size (600 lines of 50K characters per line == 30 megabytes, assuming that &#8220;a few thousand lines&#8221; was the incorrect part), and it seemed fine to me.  Typing occurs at normal speed; scrolling is occasionally a little slow, but never took more than a couple of seconds.</p>
<p>This is the Debian emacs package 22.2+2-5, running on x86 Linux.</p>
<p>My file consisted mostly of lots of &#8216;a&#8217; characters; maybe the contents of your file matter?  Or maybe you ended up in some major mode that was trying to do syntax highlighting, or some other &#8220;clever&#8221; functionality?  My file was in Fundamental mode.</p>
<p>In fact, editing that big file in emacs was faster and more responsive than editing this comment.  Part of the problem here seems to be that it&#8217;s doing an HTTP request to <a href="http://www.gravatar.com" rel="nofollow">http://www.gravatar.com</a> after every keystroke; is that intentional?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jess</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1941</link>
		<dc:creator>Jess</dc:creator>
		<pubDate>Fri, 13 Feb 2009 15:49:24 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1941</guid>
		<description>Oooh...  50k characters per line might disqualify the line-oriented command line tools.  Is this truly binary data, or just character data that doesn't care about line breaks?  If the latter, you can probably just insert a bunch of line breaks and use that.

If it's really binary data and line breaks are an artifact of interpreting it as character data (adding breaks would change the data in that case), then you might want to memory-map the files.  See mmap() on POSIX systems, or in python.  I think java has something similar as well.</description>
		<content:encoded><![CDATA[<p>Oooh&#8230;  50k characters per line might disqualify the line-oriented command line tools.  Is this truly binary data, or just character data that doesn&#8217;t care about line breaks?  If the latter, you can probably just insert a bunch of line breaks and use that.</p>
<p>If it&#8217;s really binary data and line breaks are an artifact of interpreting it as character data (adding breaks would change the data in that case), then you might want to memory-map the files.  See mmap() on POSIX systems, or in python.  I think java has something similar as well.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brian</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1940</link>
		<dc:creator>brian</dc:creator>
		<pubDate>Fri, 13 Feb 2009 14:22:45 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1940</guid>
		<description>Thanks for all the helpful suggestions. It's interesting that so many of the recommended solutions come from the Software Antiques Roadshow. Interesting but not entirely surprising. Programs like sed and vi come from an era when memory was a scarce resource, and so it had to be used efficiently.

About emacs: It's what I tried first. I was able to load large files into a buffer without much fuss, but insertion, deletion and scrolling were excruciatingly slow (many minutes per character). I think the problem may be that emacs expects a file to be broken into lines of reasonable length. The files I received had only a few thousand lines, but roughly 50,000 characters per line.</description>
		<content:encoded><![CDATA[<p>Thanks for all the helpful suggestions. It&#8217;s interesting that so many of the recommended solutions come from the Software Antiques Roadshow. Interesting but not entirely surprising. Programs like sed and vi come from an era when memory was a scarce resource, and so it had to be used efficiently.</p>
<p>About emacs: It&#8217;s what I tried first. I was able to load large files into a buffer without much fuss, but insertion, deletion and scrolling were excruciatingly slow (many minutes per character). I think the problem may be that emacs expects a file to be broken into lines of reasonable length. The files I received had only a few thousand lines, but roughly 50,000 characters per line.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jess</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1939</link>
		<dc:creator>Jess</dc:creator>
		<pubDate>Fri, 13 Feb 2009 03:08:58 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1939</guid>
		<description>Yeah, no matter how "powerful" the visual editor, there will be some file size that makes it puke (especially if it's loading the whole thing in memory).  In addition, no matter how great the macro system, there will be some repetition required.

That's why everyone is recommending unix's line-editing tools.  head, tail, cat, grep, cut, paste, sort, uniq, tr, etc. are all useful piped together in particular situations.  But if you want a hammer, familiarize yourself with sed.  Every file will become a nail. (This sort of assumes a familiarity with regular expressions.)

Alternatively you can use your preferred scripting language, although that process typically ends up being "heavier" than tools and pipes on the command line.  I'd suggest python, but perl and awk have also been popular.</description>
		<content:encoded><![CDATA[<p>Yeah, no matter how &#8220;powerful&#8221; the visual editor, there will be some file size that makes it puke (especially if it&#8217;s loading the whole thing in memory).  In addition, no matter how great the macro system, there will be some repetition required.</p>
<p>That&#8217;s why everyone is recommending unix&#8217;s line-editing tools.  head, tail, cat, grep, cut, paste, sort, uniq, tr, etc. are all useful piped together in particular situations.  But if you want a hammer, familiarize yourself with sed.  Every file will become a nail. (This sort of assumes a familiarity with regular expressions.)</p>
<p>Alternatively you can use your preferred scripting language, although that process typically ends up being &#8220;heavier&#8221; than tools and pipes on the command line.  I&#8217;d suggest python, but perl and awk have also been popular.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David F.</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1938</link>
		<dc:creator>David F.</dc:creator>
		<pubDate>Fri, 13 Feb 2009 00:00:10 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1938</guid>
		<description>I've had emacs baulk at opening files of this size (I seem to have had better results with vi, but haven't tried any serious comparisons). I'm usually trying to pull something out from near the end, rather than editing it, so I tend to use tail to cut off a smaller section to open in emacs.

I second the use of traditional unix text tools. There's an awful lot of power in their simplicity.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve had emacs baulk at opening files of this size (I seem to have had better results with vi, but haven&#8217;t tried any serious comparisons). I&#8217;m usually trying to pull something out from near the end, rather than editing it, so I tend to use tail to cut off a smaller section to open in emacs.</p>
<p>I second the use of traditional unix text tools. There&#8217;s an awful lot of power in their simplicity.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: I. J. Kennedy</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1935</link>
		<dc:creator>I. J. Kennedy</dc:creator>
		<pubDate>Thu, 12 Feb 2009 16:20:46 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1935</guid>
		<description>I work with large text files and use Large Text File Viewer from swiftgear.com.  
Find it here: http://www.swiftgear.com/ltfviewer/features.html</description>
		<content:encoded><![CDATA[<p>I work with large text files and use Large Text File Viewer from swiftgear.com.<br />
Find it here: <a href="http://www.swiftgear.com/ltfviewer/features.html" rel="nofollow">http://www.swiftgear.com/ltfviewer/features.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Derek R</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1934</link>
		<dc:creator>Derek R</dc:creator>
		<pubDate>Thu, 12 Feb 2009 15:51:04 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1934</guid>
		<description>I agree with Tophe. In general, you can use the unix text utilities to process gigantic files. They're mostly line oriented, so they don't load the entire file into memory. Some useful apps are grep, cut, tail, head, sed, tr, cat, tac, etc.</description>
		<content:encoded><![CDATA[<p>I agree with Tophe. In general, you can use the unix text utilities to process gigantic files. They&#8217;re mostly line oriented, so they don&#8217;t load the entire file into memory. Some useful apps are grep, cut, tail, head, sed, tr, cat, tac, etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tophe</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1933</link>
		<dc:creator>Tophe</dc:creator>
		<pubDate>Thu, 12 Feb 2009 12:44:10 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1933</guid>
		<description>Oops, I missed the -q on the command line, the one above will prepend the original filename to each section in the output.</description>
		<content:encoded><![CDATA[<p>Oops, I missed the -q on the command line, the one above will prepend the original filename to each section in the output.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tophe</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1932</link>
		<dc:creator>Tophe</dc:creator>
		<pubDate>Thu, 12 Feb 2009 12:40:34 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1932</guid>
		<description>For things like removing headers from many large files, I prefer to just use the unix commands &lt;code&gt;tail&lt;/code&gt; and &lt;code&gt;cat&lt;/code&gt;. They perform not much worse than a file copy, which is essentially what you are doing.

For what you describe, a command similar to this would work:

&lt;code&gt;tail --lines=+5  *.dat &#62; output.result&lt;/code&gt;

It just copies all dat files in the current directory into the &lt;code&gt;output.result&lt;/code&gt; file, leaving out the first 4 lines of each file. (+5 means to start output at the fifth line.)</description>
		<content:encoded><![CDATA[<p>For things like removing headers from many large files, I prefer to just use the unix commands <code>tail</code> and <code>cat</code>. They perform not much worse than a file copy, which is essentially what you are doing.</p>
<p>For what you describe, a command similar to this would work:</p>
<p><code>tail --lines=+5  *.dat &gt; output.result</code></p>
<p>It just copies all dat files in the current directory into the <code>output.result</code> file, leaving out the first 4 lines of each file. (+5 means to start output at the fifth line.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: MCH</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1931</link>
		<dc:creator>MCH</dc:creator>
		<pubDate>Thu, 12 Feb 2009 12:13:38 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1931</guid>
		<description>I've always used hex editors to handle big/huge files (as they typically only read the portion of the file you're working on and not the whole thing) but that works best if you're willing to make minor changes. I think among text editors, Emacs is by far the best choice if one is concerned about efficiency.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve always used hex editors to handle big/huge files (as they typically only read the portion of the file you&#8217;re working on and not the whole thing) but that works best if you&#8217;re willing to make minor changes. I think among text editors, Emacs is by far the best choice if one is concerned about efficiency.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Zac</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1929</link>
		<dc:creator>Zac</dc:creator>
		<pubDate>Thu, 12 Feb 2009 07:07:04 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1929</guid>
		<description>I don't know if I've ever loaded a file *quite* that big, but I would expect that vim or vi would work fine ... never had a problem with vim on big files so far.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t know if I&#8217;ve ever loaded a file *quite* that big, but I would expect that vim or vi would work fine &#8230; never had a problem with vim on big files so far.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Cowan</title>
		<link>http://bit-player.org/2009/if-i-had-a-hammer#comment-1927</link>
		<dc:creator>John Cowan</dc:creator>
		<pubDate>Thu, 12 Feb 2009 05:58:07 +0000</pubDate>
		<guid isPermaLink="false">http://bit-player.org/?p=271#comment-1927</guid>
		<description>Emacs.</description>
		<content:encoded><![CDATA[<p>Emacs.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

