Solving the "Two Language" Problem in Data Science
Submitted by Stefan Karpinski (@stefan1729) on Tuesday, 8 September 2015
An introduction to Julia's excellent general-purpose computing features, and to what makes it an ideal single tool for data science. The talk also introduces how Julia can call Python or C code with zero overhead.
Before actually doing computation on data, one needs to clean up a messy data set, which almost always involves some kind of string manipulation, pattern matching, invoking shell commands, or even running some code in another language. After doing the computation, you probably need to convert the data to another format again, send it over the network or serve it via an HTTP server.
Traditionally, people working with data have accepted it as a fact of life that their computing language is terrible at doing any of this. Anything other than matrix or vector operations was assumed to be the job of another general-purpose language - be it Python, Ruby or Java. But Julia begs to differ, and provides principled primitives for doing all of this and more.
This talk is about the features other numerical computing environments seem to incorporate only as an afterthought.
It covers shelling out, Julia’s take on Unix pipes, a generic asynchronous I/O framework, string manipulation, string macros, package management and open source workflow - all of the stuff that lets you do everything without jumping between languages, or hiring a team of bearded Perl hackers.
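As a rough illustration of a few of the features named above, here is a minimal sketch rather than material from the talk itself. It assumes a Unix-like system with `echo` and `tr` on the PATH and a recent Julia (1.0 or later; the 2015-era API spelled some of these calls differently). The variable names are made up for the example.

```julia
# Shelling out: backticks build a Cmd object. No shell is involved, so an
# interpolated value is passed as a single argument and is never re-parsed.
name = "hello world"   # hypothetical sample input
greeting = read(`echo $name`, String)

# Julia's take on Unix pipes: connect two commands without going through a shell.
shouted = read(pipeline(`echo $name`, `tr a-z A-Z`), String)

# String macro: r"..." constructs a Regex object when the code is parsed.
m = match(r"(\w+) (\w+)", name)
first_word = m.captures[1]

# Calling C directly: strlen from the C standard library, no wrapper code needed.
len = ccall(:strlen, Csize_t, (Cstring,), name)

println(strip(greeting))   # hello world
println(strip(shouted))    # HELLO WORLD
println(first_word)        # hello
println(Int(len))          # 11
```

Calling Python works along similar lines through the separate PyCall package, which is not shown here because it needs a Python installation.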
Stefan is a data scientist and applied mathematician. He’s previously worked at Akamai, Citrix Online, and Etsy. He then went on to co-create Julia.