Quick Tip for extracting SQL Server data to Hive

Denny Lee

While I have documented various techniques to transfer data from Hadoop to SQL Server / Analysis Services (e.g. How Klout changed the landscape of social media with Hadoop and BI Slides Updated, SQL Server Analysis Services to Hive, etc.), this post calls out the reverse – how to quickly extract SQL Server data to Hadoop / Hive.   This is a common scenario where SQL Server is being used as your transactional store and you want to push data to some other repository for analysis where you are mashing together semi-structured and structured data.

How to minimize impact on SQL Server

Analogous to the above linked documents, you could use a JDBC driver to connect and extract this data from SQL Server (other techniques may involve SQOOP, HiveODBC, etc.).  But if your security policies allow for this, a fast way to extract data from your SQL Server with…

View original post 307 more words

SQL Server Auditing

With SQL Server 2008 or higher, there is a powerful yet lightweight method to audit a SQL Server instance. But to manage and view the audits of your entire SQL Server environment, we have created the Centralized Auditing Framework that will parse, load, and report all of your audit logs.

From a high-level perspective, the architecture to audit sensitive operations for your SQL Server environment will be to:

  1. Turn on auditing on your various database and servers; you can find more information at http://msdn.microsoft.com/en-us/library/cc280386.aspx
  2. Have these audit logs go to one centralized location instead of a local folder
  3. Included in the Centralized Audit Framework is an SSIS package that will Parse through all of these audit log files.
  4. Populate a centralized database (also included) with audit dimension data (class, type, server names, etc.) and audit fact data (audit events separated by category of server, database, DDL, and DML actions).

Once the data is loaded, a nightly job will process the data to product report tables. These report tables can be viewed by SSRS reports (included).

The Auditing framework is available at GitHub to download the entire project.