[Cialug] Command line tools can be faster than a Hadoop cluster
Matthew Nuzum
newz at bearfruit.org
Mon Feb 16 15:30:03 CST 2015
Saw this and thought some of you would find it interesting.
Command-line tools can be 235x faster than your Hadoop cluster
<http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html>
The article mainly explains how a shell pipeline can do parallel
processing. One thing that may be an error (though not relevant to his
point): I think the `sort` that has to run in front of `uniq` needs to see
all of its input before it can emit anything, so that step is not really
streaming. `uniq` itself only compares adjacent lines. This is a side note
because he points out another issue with `uniq` that precludes him from
using it.
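
To check the memory question: as far as I know, a `uniq -c`-style count only
has to remember the previous line and a running count, because it compares
adjacent lines. Any buffering comes from the `sort` that runs before it. Here
is a rough Python sketch of that idea. It is not the coreutils implementation,
`count_adjacent` is a name I made up, and the sample lines are invented.

```python
from typing import Iterable, Iterator, Tuple


def count_adjacent(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (count, line) for each run of identical adjacent lines.

    Hypothetical helper illustrating the idea behind `uniq -c`: only the
    previous line and its running count are kept, never the whole input.
    """
    previous = None
    count = 0
    for line in lines:
        if line == previous:
            # Same as the line before: extend the current run.
            count += 1
            continue
        if previous is not None:
            # A new value started, so the finished run can be emitted.
            yield count, previous
        previous = line
        count = 1
    if previous is not None:
        # Emit the final run once the input is exhausted.
        yield count, previous


# Invented sample data. Input must already be grouped (for example sorted),
# otherwise the same value shows up in more than one run.
sample = ["a", "a", "b", "b", "b", "c"]
for count, line in count_adjacent(sample):
    print(count, line)
```

Because it is a generator over any iterable, it can be fed `sys.stdin` and
will handle input of any size in constant memory, as long as something
upstream has already grouped the lines.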
--
Matthew Nuzum
newz2000 on freenode, skype, linkedin and twitter
♫ You're never fully dressed without a smile! ♫