ATTENTION ALL FANS!!! THIS BLOG HAS MOVED!!!
go to: http://www.taotekaching.com

Showing posts with label binary comparison.

Saturday, March 07, 2009

Duplicate Files, Hash Codes, SQLite, and Me…

My wife’s been getting on my case about having a gazillion different hard drives with everything and our mothers on them all around the house.  I mean, come on everybody, she just wants her pictures in one @%$!# spot!  She also “misused” Picasa, and now has a bunch of duplicates on her laptop (she doesn’t read my blog, so I ain’t worried she’ll read that).

So, we were out shopping for Little Liam last weekend and decided to pop into Circuit City’s closing-its-doors blowout sale.  I grabbed her a 500 GB Western Digital external drive and, when we got home, immediately set to work on a simple solution to shut her pie hole.

The result: MyPicturesConsolidator!  It is a WYSIWYG image grabber, duplicate detector, and file-copier-consolidator all in one gorgeous package!

Ok, this program is NOT a work of art, but may contain some good stuff you can use, and it works pretty solidly, so…

[Screenshot: MyPicturesConsolidator]

How it works:

First, you select where you want any pictures it finds to get copied to:

[Screenshot: choosing the destination folder]

Second, select the logical drive you want to scan for pictures.  I included a Refresh Drive List button for changing between USB drives:

[Screenshot: choosing the drive to scan]

Third, click Find My Pictures!  And you’re good!
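Here's a rough sketch of what the scanning half of those steps could look like. It is not the actual MyPicturesConsolidator code. The class and method names are made up for illustration, and the list of "picture" extensions is an assumption you'd adjust to taste.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PictureScanner
{
    // Assumed set of "picture" extensions (hypothetical; not from the post).
    private static readonly HashSet<string> PictureExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff" };

    // What a "Refresh Drive List" button could re-run: the root of every
    // drive that is currently mounted and readable (USB drives come and go).
    public static List<string> ReadyDrives() =>
        DriveInfo.GetDrives()
                 .Where(d => d.IsReady)
                 .Select(d => d.RootDirectory.FullName)
                 .ToList();

    public static bool IsPicture(string path) =>
        PictureExtensions.Contains(Path.GetExtension(path));

    // Walks the tree with an explicit stack so that one unreadable folder
    // (e.g. "System Volume Information") is skipped instead of aborting the scan.
    public static IEnumerable<string> FindPictures(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files;
            string[] subdirs;
            try
            {
                files = Directory.GetFiles(dir);
                subdirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException) { continue; }
            catch (IOException) { continue; }

            foreach (string file in files)
                if (IsPicture(file))
                    yield return file;
            foreach (string sub in subdirs)
                pending.Push(sub);
        }
    }
}
```

`FindPictures` is lazy, so a UI can start copying the first hits while the rest of the drive is still being walked.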

Behind the scenes:

I wanted to maintain a “list” that kept track of the files we’ve already gone through.  I decided to use a SQLite database that would hold the MD5 and SHA1 hashes of the pictures.  A nice side effect: take the database along with the exe and the SQLite DLL to another computer, together with your destination drive (or network share path, etc.), and the duplicates list maintained in the SQLite db should work golden for you.
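A minimal sketch of the duplicates list, assuming a file counts as a duplicate when both its MD5 and its SHA1 hash match a file already recorded. To stay dependency-free, this swaps the SQLite table the post actually uses for a plain tab-separated text file. `DuplicateIndex` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Stand-in for the SQLite duplicates table: one "md5<TAB>sha1" line per file seen.
public sealed class DuplicateIndex
{
    // Hex strings are compared case-insensitively, so "AB" and "ab" match.
    private readonly HashSet<string> _seen =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    private static string Key(string md5Hex, string sha1Hex) => md5Hex + "\t" + sha1Hex;

    // Returns true and records the pair if it is new; false if it is a duplicate.
    public bool TryAdd(string md5Hex, string sha1Hex) => _seen.Add(Key(md5Hex, sha1Hex));

    public bool Contains(string md5Hex, string sha1Hex) => _seen.Contains(Key(md5Hex, sha1Hex));

    public int Count => _seen.Count;

    // Persisting to a file is what lets the list travel with the exe to another machine.
    public void Save(string path) => File.WriteAllLines(path, _seen);

    public static DuplicateIndex Load(string path)
    {
        var index = new DuplicateIndex();
        if (File.Exists(path))
            foreach (string line in File.ReadLines(path))
                if (line.Length > 0)
                    index._seen.Add(line);
        return index;
    }
}
```

Requiring both hashes to match means a chance collision in one algorithm alone can't flag two different pictures as duplicates. Moving this into SQLite amounts to a two-column table with a unique index on (md5, sha1).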

MD5 and SHA1 generation is, for lack of a better phrase, ridiculously easy in .NET.  An MD5 hash of a file, for instance, can be had in one line of code:

byte[] md5Hash = new System.Security.Cryptography.MD5CryptoServiceProvider().ComputeHash(System.IO.File.ReadAllBytes(filename));
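If you're hashing lots of big photos, one variation worth considering is to stream the file once and feed each chunk to both algorithms, instead of loading the whole file with `File.ReadAllBytes` and hashing it twice. This is a sketch, not the post's code. `FileHashes.Compute` is a hypothetical helper built on the standard `HashAlgorithm.TransformBlock` / `TransformFinalBlock` methods. It uses `MD5.Create()` and `SHA1.Create()` because the `*CryptoServiceProvider` classes are marked obsolete in .NET 6 and later.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileHashes
{
    // Reads the file once, feeding every chunk to both MD5 and SHA1,
    // and returns both digests as lowercase hex strings.
    public static (string Md5, string Sha1) Compute(string path)
    {
        using (var md5 = MD5.Create())
        using (var sha1 = SHA1.Create())
        using (var stream = File.OpenRead(path))
        {
            var buffer = new byte[81920];
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                md5.TransformBlock(buffer, 0, read, null, 0);
                sha1.TransformBlock(buffer, 0, read, null, 0);
            }
            // Finalize both digests; after this the Hash property is populated.
            md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
            sha1.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
            return (ToHex(md5.Hash), ToHex(sha1.Hash));
        }
    }

    // BitConverter gives "90-01-50-..."; strip the dashes and lowercase it.
    private static string ToHex(byte[] hash) =>
        BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
}
```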

The code is here.  Go ahead and take a look.  There are some dumb things I’m doing in there that deal with my wife’s needs (e.g., Picasa uses file creation dates, so I try to find the earliest date for her when I can).
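The Picasa date juggling might look roughly like this: take the earliest timestamp the file system reports for the source file and stamp it onto the copy. This is an assumption about the approach, not the actual code. `DatePreservingCopier` and its methods are hypothetical names.

```csharp
using System;
using System.IO;

public static class DatePreservingCopier
{
    // A file that was itself copied around often has a creation time newer
    // than its last-write time, so the smaller of the two is usually the
    // better guess at when the picture really came into being.
    public static DateTime EarliestTimestamp(string path)
    {
        DateTime created = File.GetCreationTime(path);
        DateTime written = File.GetLastWriteTime(path);
        return created < written ? created : written;
    }

    // Copies the file (refusing to overwrite) and applies the earliest
    // timestamp to the copy's creation and last-write times.
    public static void CopyKeepingEarliestDate(string source, string destination)
    {
        DateTime earliest = EarliestTimestamp(source);
        File.Copy(source, destination, overwrite: false);
        File.SetCreationTime(destination, earliest);
        File.SetLastWriteTime(destination, earliest);
    }
}
```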

~ZagNut


Tuesday, February 24, 2009

Binary Data Transforms, Hex Editing, Design Patterns, and Me…

Lately for work I have been responsible for developing a .NET layer to read from and write to an existing application's binary data files.  The task has proven tedious, but not necessarily boring, and I've discovered some invaluable tools and development principles along the way.

First off, the HxD hex editor, and more importantly its file-comparison tool, is simply perfect for the job.

[Screenshot: the HxD hex editor]

Many minute differences in my binary output were discovered with this tool.  By using the comparison feature:

[Screenshot: HxD file comparison]

I could find the first difference in the output and, if it was unexpected, use its location from HxD to quickly narrow down the code responsible.
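The same "where's the first difference?" question can also be answered in code, which is handy in a unit test that compares your output against a known-good file. A minimal sketch; `BinaryDiff.FirstDifference` is a hypothetical helper, not part of HxD or the .NET Framework.

```csharp
using System;
using System.IO;

public static class BinaryDiff
{
    // Returns the offset of the first byte that differs, or -1 if the streams
    // are identical. If one stream is a prefix of the other, the result is the
    // length of the shorter one (the first position where only one has data).
    public static long FirstDifference(Stream expected, Stream actual)
    {
        long offset = 0;
        while (true)
        {
            int a = expected.ReadByte();
            int b = actual.ReadByte();
            if (a != b) return offset;
            if (a == -1) return -1;   // both ended at the same position
            offset++;
        }
    }

    // FileStream buffers internally, so byte-at-a-time reads stay reasonably fast.
    public static long FirstDifference(string expectedPath, string actualPath)
    {
        using (var e = File.OpenRead(expectedPath))
        using (var a = File.OpenRead(actualPath))
            return FirstDifference(e, a);
    }
}
```

Formatting the result with `offset.ToString("X")` gives the same hex notation HxD uses for its offsets.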

[Screenshot: HxD file comparison, continued]

In my code, a Stream is passed to a method that populates a struct representing the file.  This Stream is wrapped in a BinaryReader for sequential, type-based reading of the file.  Simply by adding conditional breakpoints at various places in the code, I could locate the problem easily.  The breakpoint condition would be something like:

myBinReader.BaseStream.Position > 10000

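where 10000 is just below the offset of the difference reported by HxD.  (HxD shows offsets in hex by default, so either convert the value or use a hex literal such as 0x2710 in the condition.)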
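For context, here's roughly what that sequential, type-based reading looks like. The `FileHeader` fields and their order are invented for illustration, since the post doesn't describe the real format. The point is that reads and writes happen in exactly the same order, so a bad byte at a given offset maps back to one specific field.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical file header; the real format's fields aren't given in the post.
public struct FileHeader
{
    public int Magic;
    public ushort Version;
    public int RecordCount;
    public string Title;

    // Fields are read in declaration order (object initializers evaluate
    // left to right). A conditional breakpoint on reader.BaseStream.Position
    // inside this method stops at the first read past the offset you give it.
    public static FileHeader Read(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            return new FileHeader
            {
                Magic = reader.ReadInt32(),
                Version = reader.ReadUInt16(),
                RecordCount = reader.ReadInt32(),
                Title = reader.ReadString(),   // length-prefixed string
            };
        }
    }

    // Mirror image of Read: same fields, same order, same types.
    public void Write(Stream stream)
    {
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(Magic);
            writer.Write(Version);
            writer.Write(RecordCount);
            writer.Write(Title ?? string.Empty);
        }
    }
}
```

`BinaryWriter.Write(string)` emits a 7-bit-encoded length prefix before the characters, so a short ASCII title like "Hello" costs 1 + 5 bytes. That's the kind of detail that shows up as an off-by-one in a hex comparison.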

[Screenshot: HxD file comparison, continued]

So this greatly, greatly sped up the process of making sure I was reading and writing the native binary file format correctly.  Next, I needed to transform some of this data to XML for use by another application.  I already had the struct of native data, and wanted to make the transformation to and from XML as stupidly simple as possible.  To do this, I threw down a bunch of different classes, each representing an XML element I needed to turn out, and had each of them implement a common interface with a ToXML() method.  Some of the “element” classes contained List<>s of the other “element” classes, so churning out the XML was as simple as doing a foreach over those lists and calling ToXML() on each item to produce my child nodes correctly.
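A minimal sketch of that structure, assuming `System.Xml.Linq` for building the nodes. The element names (`Section`, `Item`) and their attributes are hypothetical stand-ins for whatever the real XML needed, and the `FromXML` methods cover the "from XML" direction mentioned above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// The shared contract: every "element" class renders itself as one XML node.
public interface IXmlElement
{
    XElement ToXML();
}

// Hypothetical leaf element.
public class ItemElement : IXmlElement
{
    public string Name { get; set; } = "";
    public int Size { get; set; }

    public XElement ToXML() =>
        new XElement("Item",
            new XAttribute("name", Name),
            new XAttribute("size", Size));

    // Rebuild the object from its node (throws if "size" is missing).
    public static ItemElement FromXML(XElement node) =>
        new ItemElement
        {
            Name = (string)node.Attribute("name") ?? "",
            Size = (int)node.Attribute("size"),
        };
}

// Hypothetical container element holding a List<> of child elements.
public class SectionElement : IXmlElement
{
    public string Title { get; set; } = "";
    public List<ItemElement> Items { get; } = new List<ItemElement>();

    public XElement ToXML()
    {
        var node = new XElement("Section", new XAttribute("title", Title));
        foreach (ItemElement item in Items)
            node.Add(item.ToXML());   // each child produces its own node
        return node;
    }

    public static SectionElement FromXML(XElement node)
    {
        var section = new SectionElement { Title = (string)node.Attribute("title") ?? "" };
        section.Items.AddRange(node.Elements("Item").Select(ItemElement.FromXML));
        return section;
    }
}
```

Adding a new element type means one more class implementing the interface; parents just loop over their children and append whatever `ToXML()` returns.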

I very much realize this solution is neither new nor ingenious.  I am using it, in fact, as a jumping-off point for a discussion of the patterns war stemming from Spolsky’s comments on the SOLID principles.  Quite frankly, I’m not sure what pattern or patterns I implemented above.  The Proxy or Facade?  I would like to know, as I’ve used this technique of a sort of “translation class” a bazillion times, but my bosses really don’t give me much time to learn about it, much less, say, spend time with my family (I know you gobblers are reading this).  However, I have an opinion on this patterns war that I would love some feedback on.  It centers around hiring:  if your team is hiring a new developer / engineer, and you or your team are big into design patterns, don't make knowledge of design patterns a prerequisite for the job.

I have been programming most of my life, but only fairly recently have I been able to make it my career.  Already I've met a good range of coders: the hardcore enthusiast, the day-job-only coder, the serious professional.  They seem to come in all types, but they all share a common denominator:  they enjoy writing code.  Some more than others, but ultimately, there is a certain gene pool that simply enjoys writing code.  It actually has little to do with being a computer enthusiast.  A good majority of the systems administrators I've worked with or under don't like programming, period.  But there are those of us who are addicted to "realizing" our thoughts right there on the screen.

Until maybe two or three years ago, I knew little to nothing about design patterns.  As I became familiar with them and researched them, I discovered that I had been using many of them for a long, long time.  And that's exactly what they are:  patterns that have been "recognized" in the programming trade.  As such, they are most valuable as a means of communicating an approach to a task or problem at hand.  They are not, however, a requirement for attacking a programming task, and knowing them is no indication of the competence of a prospective engineer / developer.

If your programming shop or R&D department or whatever group you work in is a pattern-heavy group, then make the applicant aware of this, but don't dismiss them just because they don't know design patterns.  Most likely the applicant would be more than eager to learn them, and will discover they’ve already used many of them anyways.  In other words, patterns are a subset of a lexicon we, as coders, may or may not need to know, depending only on what makes our team most efficient.

I've noticed that most of the programmers I've worked with thus far who are heavy into patterns arrogantly criticize and judge their colleagues when they find out those colleagues don't know what a singleton or the proxy pattern or whatever is.  So far, from what I've seen, neither camp has proven to be better programmers than the other.  The real thing that quickly separates an experienced programmer from a truly great programmer, though, is ego.  I personally love everything from learning new tricks and techniques from my colleagues to having them point out where I was really dumb in my code.  It only makes me a better programmer in the end (hopefully).  What I don’t want to do, and I think this is a pretty unanimous feeling, is converse with an asshole.

When you really think about it, the literal sense of “conversing with an asshole” is identical to its figurative sense: an asshole never really listens, and only barks out useless foulness you’ll want to stay away from.  Plus, your friends, family, and colleagues may very well think less of you if they see you regularly conversing with assholes, even if it’s the same asshole.

So really, whether you’re big into design patterns or not, don’t be an asshole, because it just means you don’t listen and no one wants to be near you anyways.  And if you’re hiring, replace “do they know what a flyweight pattern is?” with “are they an asshole?” on your checklist.  You’ll always build a better team that way.

~zagnut
