Projects
This page includes various free software projects I have personally created or collaborated on. Some came out of school projects and assignments, some were started to scratch an itch, while others were created for the pure fun of it.
Purity
Purity is a lightweight static analysis tool that determines the purity of Erlang functions. It started as part of my diploma thesis back at NTUA and was subsequently enhanced and released as free software under the LGPL v2. Besides detection of side-effects, execution environment dependencies and exceptions, it also features a very simple termination analysis. Code and documentation are available on GitHub.
A highly experimental, unfinished prototype of user-defined guards for Erlang, which utilises the results of the analysis, can be found here.
Kafka tools
A set of command line tools to communicate with Kafka instances. The main benefit over the existing scripts in the Kafka distribution is a more UNIX-like behaviour, which allows use in pipes. Supports Kafka 7.x and 8.x (and therefore 9.x and 10.x through their common API). Besides writing to a producer and reading from a consumer, the Kafka 8 branch provides support for offset and topic management under a single executable and unified command line interface.
The project was a quick hack, so packaging is rough around the edges (a jar or an all-in-one RPM). However, I found it quite handy when trying to find missing data on production or just feeding mock data to dev and staging environments!
Have a look at the README for extensive usage instructions.
Scala libraries & tools
Bits and pieces of Scala code that I have collected in the form of libraries or simple command line tools. Tiny utils is mostly a bunch of implicits to allow working with Java’s atomic references, count down latches, scheduled executors and thread-locals in a more idiomatic way. Back-off utils is an implementation of a diverse set of back-off policies as immutable case classes, mostly inspired by Marc Brooker’s blog post on the subject. These can come in really handy when writing Scala HTTP clients for APIs, especially when crawling rate-limited ones. It also includes a handy library for generating unbiased uniform longs within a specific interval (conspicuously missing from java.util.Random in Java 7, but not from java.util.concurrent.ThreadLocalRandom, which is where it’s borrowed from).
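To give a flavour of what a back-off policy as an immutable value looks like, here is a minimal Python sketch of exponential back-off with "full jitter". It assumes the blog post in question is the one on exponential back-off and jitter; the class name, field names and defaults are hypothetical and are not the API of Back-off utils.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FullJitterBackoff:
    """Immutable back-off policy (hypothetical name, for illustration only)."""

    base: float = 0.1   # delay ceiling for the first retry, in seconds
    cap: float = 10.0   # upper bound on any delay, in seconds

    def ceiling(self, attempt: int) -> float:
        # Exponential growth, clamped to the cap. The exponent is bounded
        # so that very large attempt numbers cannot overflow a float.
        return min(self.cap, self.base * (2 ** min(attempt, 62)))

    def delay(self, attempt: int, rng: random.Random) -> float:
        # Full jitter: pick uniformly between zero and the ceiling.
        return rng.uniform(0.0, self.ceiling(attempt))


policy = FullJitterBackoff(base=0.1, cap=10.0)
rng = random.Random(42)  # seeded only to make the demo repeatable
delays = [policy.delay(attempt, rng) for attempt in range(8)]
```

A client would sleep for `policy.delay(attempt, rng)` seconds before retry number `attempt`. Because the policy is frozen, it can be shared freely between threads or requests.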
In terms of tools, NFST is a command line tool for working with Lucene’s Finite State Transducers. For some context, I highly recommend reading Andrew Gallant’s write-up on implementing FSTs in Rust. Another handy tool is scully, a command line tool for reading from and writing to 0mq sockets, which comes with an equivalent in Python, both handy if you are stuck in an environment where you cannot install packages. Finally, JML is an old prototype for a JSON mapping and transformation domain-specific language using parser combinators.
Spacenet
Spacenet is a distributed programming game. It was originally conceived as a fun way to present a workshop on declarative languages, along with two fellow members of the FOSS NTUA community. It is written in Erlang, Haskell and Prolog. I wrote the Erlang client/server code and some of the Prolog parts. The game has been tested successfully on a small LAN with 10 players. Code, documentation and screencast demos are available at the spacenet website.
Scheme48
A very simple Scheme interpreter, following the Write Yourself a Scheme in 48 Hours Haskell tutorial.
Labelme
Labelme is a simple image annotation tool developed with a fellow student for a project in my Human-Computer Interaction course. Since the code was relatively complete and functional, I decided to publish it; who knows, someone may find it useful someday. It is inspired by a similar web-based tool from the MIT CSAIL Lab but written in Python and Qt. The code is available on GitHub and you can see it in action in this screencast tutorial.
Pytag
Pytag is a command-line audio mass-tagging application written in Python and based on the mutagen library. I regularly use it to organise parts of my ever-growing music collection. I’m currently in the process of recovering the repository and documentation after the server hosting it was lost.
Scripts
Besides the aforementioned applications, I have put together a collection of some of the scripts I’ve written from time to time, which are too small to warrant a section of their own.
aconv is a handy shell script for converting audio files from a variety of formats (anything mplayer supports) to ogg, mp3 or wav.
checksum is a script that calculates multiple checksums on files without reading them multiple times. This is most likely faster if you want to calculate multiple different checksums of large files (for whatever reason), since I/O tends to be the bottleneck in such cases.
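The single-pass idea can be sketched in a few lines of Python: read each chunk once and feed it to every hash object. This is only an illustration of the technique, not the script itself; the function name, the default algorithm list and the chunk size are assumptions.

```python
import hashlib
import io


def multi_checksum(stream, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Compute several digests of a binary stream while reading it only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Every hasher sees the same chunk, so the data is read a single time.
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# An in-memory stream stands in for a large file opened with open(path, "rb").
digests = multi_checksum(io.BytesIO(b"hello world"))
```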
wgrep can be thought of as a combination of wget and grep. I use it to extract hyperlinks that match certain patterns from webpages. The pattern can refer to the URL itself, the text inside the anchor or the URL of an IMG element inside the anchor. For example, to download all the JPEG images in a webpage, one could use the following:
wgrep '\.jpg$' http://some.website.com/ | xargs wget --content-disposition
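For readers curious about how such link matching might work, the following Python sketch collects anchors from an HTML string and filters them by a regular expression applied to the URL, the anchor text or the URLs of images inside the anchor. It is not the wgrep implementation: the class and function names, the `field` parameter and the choice to always return the anchor's URL are assumptions made for illustration, and the sample page is made up.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect every anchor with its href, text and nested image URLs."""

    def __init__(self):
        super().__init__()
        self.links = []        # one dict per anchor
        self._current = None   # anchor currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._current = {"href": attrs["href"], "text": "", "images": []}
        elif tag == "img" and self._current is not None and attrs.get("src"):
            self._current["images"].append(attrs["src"])

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def grep_links(html, pattern, field="href"):
    """Return hrefs of anchors whose chosen field ("href", "text" or "images") matches."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    regex = re.compile(pattern)
    matches = []
    for link in collector.links:
        candidates = link["images"] if field == "images" else [link[field]]
        if any(regex.search(candidate) for candidate in candidates):
            matches.append(link["href"])
    return matches


PAGE = (
    '<a href="/a.jpg">first</a> '
    '<a href="/b.html">second photo</a> '
    '<a href="/c.html"><img src="/thumb/c.jpg"></a>'
)
by_url = grep_links(PAGE, r"\.jpg$")
by_text = grep_links(PAGE, r"photo", field="text")
by_image = grep_links(PAGE, r"\.jpg$", field="images")
```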
zget is another handy script which downloads a zip or rar archive and extracts it in one go, deleting the archive afterwards.
Social API tools
A set of command line tools in Python for working with Facebook’s Graph API and Instagram’s API. These were invaluable in debugging production crawlers for those APIs or doing one-off bulk retrievals of content for simple analysis. The main benefits are the automatic handling of the different pagination parameters and support for different output formats (JSON or YAML). Examples of usage and simple analysis with command line tools like jq, sort and sed are provided in the README of each project.
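As a rough illustration of what handling different pagination parameters can involve, here is a Python sketch of a generator that follows "next page" links until they run out. It is not taken from the tools: the response shapes (`paging.next` and `pagination.next_url`), the function names and the fake in-memory pages are all assumptions, and `fetch` stands in for whatever performs the real HTTP request.

```python
def next_url(page):
    """Find the next-page URL under either of two assumed response layouts."""
    paging = page.get("paging") or {}          # assumed Graph API style
    pagination = page.get("pagination") or {}  # assumed Instagram style
    return paging.get("next") or pagination.get("next_url")


def iterate_items(fetch, url):
    """Yield items from every page, following next-page links until none is left."""
    while url:
        page = fetch(url)
        yield from page.get("data", [])
        url = next_url(page)


# Fake responses keyed by URL, used instead of real network calls.
FAKE_PAGES = {
    "page-1": {"data": [1, 2], "paging": {"next": "page-2"}},
    "page-2": {"data": [3], "pagination": {"next_url": "page-3"}},
    "page-3": {"data": [4, 5]},
}

items = list(iterate_items(FAKE_PAGES.__getitem__, "page-1"))
```

Keeping the fetch function as a parameter means the same loop can be tested offline and reused across APIs whose only difference is where the next link lives.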