.“…to look at the stars always makes me dream, as simply as I dream over the black dots of a map representing towns and villages…”
— Vincent Van Gogh
Image Source: Vincent Van Gogh Painting Tilt Shifted: http://coolvibe.com/2011/16-van-gogh-paintings-tilt-shifted/tilt-shift-van-gogh-15/
To analyze hardware utilization within their data centers, Microsoft’s Online Services Division – Global Foundation Services (GFS) is working with Hadoop / Hive via HDInsight on Azure. A common scenarios is to perform joins between the various tables of data. This quick blog post provides a little context on how we managed take a query from >2h to <10min and the thinking behind it.
The join is a three-column join between a large fact table (~1.2B rows/day) and a smaller dimension table (~300K rows). The size of a single day of compressed source files is ~4.2GB; decompressed is ~120GB. When performing a regular join (in Hive parlance “common…
View original post 1,003 more words