Open Sourcing MNIST and NIST Preprocessing Code
In our most recent post we discussed the current set of experiments that we are conducting using the MNIST dataset. We’ve also been looking at the NIST dataset, which is similar but covers handwritten letters as well as digits.
These datasets are extremely popular and freely available, so they make a great choice for testing an algorithm and comparing it with the benchmarks.
The MNIST data is not directly available as images, though. The format is well defined, but it is not a common one. It’s easy to find snippets of code that convert this format into standard images (such as PNG or JPG), but piecing them together and getting them working is not where you want to spend your time when you could be designing and running your experiment!
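To give a feel for what that conversion involves, here is a minimal sketch in Java. It is not the code from our projects; the class and method names are made up for illustration. It assumes the usual MNIST image file layout (the IDX format): four big-endian 32-bit integers (a magic number of 2051, the image count, the row count and the column count) followed by one unsigned byte per pixel, row by row. It also assumes the stream is already decompressed.

```java
import java.awt.image.BufferedImage;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Illustrative sketch only: class and method names are hypothetical.
final class MnistIdxSketch {

    // Magic number of an IDX file holding unsigned bytes in three dimensions.
    static final int IMAGE_MAGIC = 0x00000803; // 2051

    /** Reads every image from an uncompressed IDX image stream. */
    static List<BufferedImage> readImages(InputStream in) throws IOException {
        // DataInputStream reads big-endian integers, which matches the IDX header.
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        if (magic != IMAGE_MAGIC) {
            throw new IOException("Unexpected magic number: " + magic);
        }
        int count = data.readInt();
        int rows = data.readInt();
        int cols = data.readInt();

        List<BufferedImage> images = new ArrayList<>();
        byte[] pixels = new byte[rows * cols];
        for (int i = 0; i < count; i++) {
            data.readFully(pixels); // one image, row by row
            BufferedImage image =
                    new BufferedImage(cols, rows, BufferedImage.TYPE_BYTE_GRAY);
            // The raster copies the bytes, so the buffer can be reused.
            image.getRaster().setDataElements(0, 0, cols, rows, pixels);
            images.add(image);
        }
        return images;
    }

    /** Writes each image as a zero-padded, numbered PNG file in the given directory. */
    static void writePngs(List<BufferedImage> images, Path directory) throws IOException {
        Files.createDirectories(directory);
        for (int i = 0; i < images.size(); i++) {
            Path target = directory.resolve(String.format("%05d.png", i));
            if (!ImageIO.write(images.get(i), "png", target.toFile())) {
                throw new IOException("No PNG writer available for " + target);
            }
        }
    }
}
```

The MNIST files are usually downloaded gzipped, so in practice you would wrap the file stream in a `java.util.zip.GZIPInputStream` before passing it to `readImages`. The labels live in a separate file with a similar header, which this sketch does not read.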
We’ve been through that phase, so we’re very happy to open source our code and help others get going faster.
These are simple, small, self-contained Java projects with ZERO dependencies. There are two projects: one preprocesses the MNIST files into images, and the other preprocesses NIST images to make them equivalent to the MNIST images, so that both can easily be used in the same experimental setup. See the README for more information about the steps taken.