SSIS to Oracle Connection

 

Follow these steps to connect to Oracle from SQL Server Integration Services (SSIS):

1. Install ODTwithODAC1120320_32bit so that the Oracle Provider for OLE DB appears in Visual Studio.

   a. Download URL: http://www.oracle.com/technetwork/topics/dotnet/utilsoft-086879.html (you also need to download and install the 64-bit ODAC).

   b. This package installs the Oracle client, the ODAC drivers, and the Oracle Provider for OLE DB driver used by Visual Studio.

   c. Note the installation path; it is used in the following steps.

2. Configure the connection to Oracle.

   a. Go to the installation path you chose while installing in step 1c.

   b. Navigate to C:\~\product\11.2.0\client_1\network\admin\sample and open tnsnames.ora.

   c. Copy tnsnames.ora to C:\~\product\11.2.0\client_1\network\admin.

   d. Copy a sample connection entry and replace the placeholders with your Oracle connection details, for example:

      <Oracle_Server_Name_Custom> =
        (DESCRIPTION =
          (ADDRESS_LIST =
            (ADDRESS = (PROTOCOL = TCP)(HOST = <Oracle_Server_Name>)(PORT = 1521))
          )
          (CONNECT_DATA =
            (SERVICE_NAME = <Oracle_Service_Name>)
          )
        )

   e. <Oracle_Server_Name_Custom> can be anything, but it is good practice to use <Oracle_Service_Name>.

3. Log off the computer and log in again (otherwise the user name and password fields will not appear when you use the Oracle Provider for OLE DB in SSIS).

4. Open any SSIS package. In Connection Manager, right-click > New OLE DB Connection > choose Oracle Provider for OLE DB > enter the <Oracle_Server_Name_Custom> defined in tnsnames.ora in step 2d, along with the Oracle user name and password > test the connection. A sample of the resulting connection string is shown below.
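
For reference, when the connection is created this way, the connection manager typically produces an OLE DB connection string along the following lines. This is a hedged sketch only: the exact property list depends on your ODAC version, and the user and password values are placeholders.

Provider=OraOLEDB.Oracle.1;Data Source=<Oracle_Server_Name_Custom>;User ID=<oracle_user>;Password=<oracle_password>;Persist Security Info=True;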

 

You need the 32-bit driver for SSDT and the 64-bit driver for executing the package as a job, since SQL Server is 64-bit.

Read my blog post on the 32-bit and 64-bit ODBC driver scenario.

Big Data and Legos

Great Analogy!

Denny Lee

I was recently asked the question – how do you explain Big Data to an 8-year-old? Since the 4 Vs of Big Data barely make sense to non-marketing folks (i.e. most of us), let alone to kids, I realized that the best construct would be to use Legos.


When I was her age, the lego blocks were only squares and rectangles – I could build a lot of buildings and boxes, which was great at the time (in data speak, relational databases).

Big Data, instead, is a massive amount (e.g. volume of data) of lego blocks of different shapes and sizes. With all that variety, and how quickly I can get more lego blocks, I can imagine new and different ways to build new toys or objects (in data speak, perform novel analytics).


From Closer Look: United States of Lego.

With lego blocks in all shapes and sizes…

View original post 26 more words

Quick Tip for extracting SQL Server data to Hive

Denny Lee

While I have documented various techniques to transfer data from Hadoop to SQL Server / Analysis Services (e.g. How Klout changed the landscape of social media with Hadoop and BI Slides Updated, SQL Server Analysis Services to Hive, etc.), this post calls out the reverse – how to quickly extract SQL Server data to Hadoop / Hive.   This is a common scenario where SQL Server is being used as your transactional store and you want to push data to some other repository for analysis where you are mashing together semi-structured and structured data.

How to minimize impact on SQL Server

Analogous to the above linked documents, you could use a JDBC driver to connect and extract this data from SQL Server (other techniques may involve SQOOP, HiveODBC, etc.).  But if your security policies allow for this, a fast way to extract data from your SQL Server with…

View original post 307 more words

SQL Server Auditing

With SQL Server 2008 or higher, there is a powerful yet lightweight method to audit a SQL Server instance. To manage and view the audits of your entire SQL Server environment, we have created the Centralized Auditing Framework, which parses, loads, and reports on all of your audit logs.

From a high-level perspective, the architecture for auditing sensitive operations across your SQL Server environment is to:

  1. Turn on auditing on your various databases and servers; you can find more information at http://msdn.microsoft.com/en-us/library/cc280386.aspx (a T-SQL sketch for this step follows the list).
  2. Have these audit logs go to one centralized location instead of a local folder.
  3. Use the SSIS package included in the Centralized Auditing Framework to parse all of these audit log files.
  4. Populate a centralized database (also included) with audit dimension data (class, type, server names, etc.) and audit fact data (audit events separated by category of server, database, DDL, and DML actions).
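
As a rough illustration of step 1, a minimal T-SQL sketch for SQL Server 2008 or later is shown below. The audit name, file share path, and action groups are illustrative assumptions, not part of the Centralized Auditing Framework itself.

USE [master];
GO

-- Create a server audit that writes to a centralized file share (hypothetical UNC path; see step 2).
CREATE SERVER AUDIT [CentralizedAudit]
TO FILE ( FILEPATH = N'\\AuditShare\SQLAudits\' );
GO

-- Capture failed and successful logins at the server level (example action groups).
CREATE SERVER AUDIT SPECIFICATION [CentralizedAuditServerSpec]
FOR SERVER AUDIT [CentralizedAudit]
ADD (FAILED_LOGIN_GROUP),
ADD (SUCCESSFUL_LOGIN_GROUP)
WITH (STATE = ON);
GO

-- Enable the audit (audits are created in the OFF state).
ALTER SERVER AUDIT [CentralizedAudit] WITH (STATE = ON);
GO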

Once the data is loaded, a nightly job processes it to produce report tables. These report tables can be viewed through the included SSRS reports.

The Centralized Auditing Framework is available on GitHub, where you can download the entire project.

SQL Server Reporting Services Disaster Recovery Case Study

Authors: Ayad Shammout & Denny Lee

Technical Reviewers: Lukasz Pawlowski, Sanjay Mishra, Prem Mehra, Michael Thomassy, Lindsey Allen, Burzin Patel, Andy Wu, Thomas Kejser, Carl Rabeler, Quoc Bui

Applies to: SQL Server 2005, SQL Server 2008

Introduction

There are many ways to perform disaster recovery with Microsoft SQL Server Reporting Services (SSRS). Based upon customer experience and internal testing, this technical note provides guidance around best practices to design and manage robust end-to-end disaster recovery (DR). This DR method will involve both automatic and manual failover in the form of content switches, SQL Server failover clustering, and database mirroring. This case study focuses on the lessons learned from CareGroup Healthcare System.

Environment Setup

Figure 1 shows a basic disaster recovery scenario. The left side (in green) represents the primary data center, while the right side (in orange) represents the disaster recovery (DR) site. In general the two sites should be identical or at least very similar to each other so that the DR site will be able to handle the load of the primary data center; all of this should be transparent to the users.


Figure 1: Primary data center and the disaster recovery site

Primary Data Center

The primary data center contains the Reporting Services Web service servers (there is only one for the purpose of this diagram); the report catalogs are placed on a separate server (RSDB).  As a general rule, you should set up the report catalogs (RSDB) in a failover cluster, because the report catalogs are central to the Reporting Services environment.

Disaster Recovery Site

The disaster recovery site should closely duplicate the primary data center Reporting Services environment. It should be located in another geographic location such that if the primary data center experiences a resource disruption (for example, in connectivity or power), all traffic can be redirected to the alternate site with minimal disruption to service. In noncritical scenarios, you can utilize fewer resources (for example, less powerful servers, fewer servers) but for mission-critical systems, we highly recommend a server-for-server duplication of resources.

Network Setup

To ensure connectivity from the clients to the primary data center and the disaster recovery site, a common technique is to use a content switch to load-balance traffic within the individual sites as well as between the global sites. In the case of CareGroup Healthcare, a Cisco GSS is used as the content switch. As well, there is direct fiber network connectivity between the primary data center and the disaster recovery site to ensure minimal latencies for any communication between the two centers. If the primary site goes down for any reason, the content switch transparently redirects all client traffic to the disaster recovery set of Reporting Services servers. If the content switch is unavailable, the IP address can be changed at the DNS level. This latter change is a manual switch with a slightly longer network outage, which is due to the DNS cache clearing the old IP address and pointing to the new one.

Database Setup

For the CareGroup Healthcare environment, the primary RSDB instance within the primary data center is named Boston\sql4 and is a SQL Server cluster using SQL Server 2008 Enterprise, with the content switch pointing to the alias sql4. An active/passive failover cluster is used; this allows other active database instances to be located on the server in the passive node. This technique works well if the passive node is not overutilized such that it can handle the Reporting Services workload if the Reporting Services active node goes offline. It may make sense to go to an active/active cluster if there is enough traffic, concurrent users, and/or resources, though the discussion of the tradeoffs between idle machines and high availability scenarios is outside the scope of this technical note. Each of these clustering techniques has unique advantages; ultimately, it is important to ensure that the report catalogs are consistently running with minimal downtime. For more information about how to best configure a SQL Server clustering environment, see SQL Server 2008 Failover Clustering.

To ensure that the RSDB within the disaster recovery site is up-to-date, a common technique is to use asynchronous database mirroring. For the CareGroup Healthcare environment, the mirrored RSDB instance is named Montréal\sql4 and is a SQL Server cluster using SQL Server 2008 Enterprise. Note that while the domain names are different (Montréal vs. Boston in this case), the SQL Server clusters for the primary data center and the disaster recovery site have the same name (more on this later). The key database to keep in sync is the report server (RSDB) database, because it stores all of the report metadata, including report definitions, report history and snapshots, and scheduling information. All Reporting Services operations must connect to this database to access its metadata to perform their functions. The reason asynchronous database mirroring is commonly chosen is that asynchronous mirroring has minimal or no impact on response time performance. In addition, asynchronous mirroring works well because RSDB metadata is not frequently updated.

The following table describes an example endpoint setup for these database mirrors.

[Table not reproduced: example endpoint configuration for the principal and mirror instances.]
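
In lieu of the original table, the mirroring endpoint on each partner instance is typically created with T-SQL along the following lines. This is a minimal sketch; the endpoint name and listener port are assumptions, not values from the case study.

-- Run on each partner instance (Boston\sql4 and Montreal\sql4); name and port are illustrative.
CREATE ENDPOINT [Mirroring]
    STATE = STARTED
    AS TCP ( LISTENER_PORT = 5022 )
    FOR DATABASE_MIRRORING ( ROLE = PARTNER );
GO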

Initializing Database Mirror

A relatively easy way to initialize a database mirroring setup is to follow the steps below (a T-SQL sketch follows the list):

1) Make full and transaction log backups of the Reporting Services databases on the principal server.

2) Copy the backups over to the disaster recovery site, restoring each Reporting Services database in no-recovery mode.

3) Set up the failover partner on the mirror (that is, the DR site) before you set up the failover partner on the principal server.
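
Concretely, these steps map to T-SQL along the following lines. This is a minimal sketch under stated assumptions: the backup share, fully qualified partner addresses, and port are hypothetical, and only the ReportingServer database is shown (repeat for the other report server databases).

-- 1) On the principal (Boston\sql4): full and transaction log backups.
BACKUP DATABASE ReportingServer TO DISK = N'\\BackupShare\ReportingServer.bak';
BACKUP LOG ReportingServer TO DISK = N'\\BackupShare\ReportingServer.trn';

-- 2) On the mirror (Montreal\sql4): restore both backups in no-recovery mode.
RESTORE DATABASE ReportingServer FROM DISK = N'\\BackupShare\ReportingServer.bak' WITH NORECOVERY;
RESTORE LOG ReportingServer FROM DISK = N'\\BackupShare\ReportingServer.trn' WITH NORECOVERY;

-- 3) Set the failover partner on the mirror first, then on the principal.
-- On the mirror (Montreal\sql4), point at the principal's endpoint:
ALTER DATABASE ReportingServer SET PARTNER = N'TCP://boston-sql4.example.com:5022';
-- On the principal (Boston\sql4), point at the mirror's endpoint:
ALTER DATABASE ReportingServer SET PARTNER = N'TCP://montreal-sql4.example.com:5022';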

Failover Scenarios

There are three main disaster recovery scenarios that can occur within this Reporting Services disaster recovery environment:

· Reporting Services servers within the primary data center go offline.

· The active server for the RSDB in the primary data center failover cluster goes offline.

· The entire cluster for the RSDB in the primary data center goes offline.

Note that this technical note does not explicitly cover scenarios concerning the reporting source databases. While most of the information here is applicable to these databases (such as data marts, operational data stores, transactional systems, and data warehouses), they have their own disaster recovery requirements and procedures, which should be handled separately from the reporting infrastructure.

Reporting Services Servers within the Primary Data Center Go Offline

As noted in Figure 2, both Reporting Services servers point to the same cluster of databases. In this case, they both point to the active/passive database cluster Boston\sql4.


Figure 2: Reporting Services servers within the Primary Data Center go offline

In this scenario where the Reporting Services servers within the primary data center go offline, the hardware content switch will detect a network connectivity issue and automatically fail over to the disaster recovery site. By default, the Reporting Services farm connects to the report catalog in the primary data center because it has the most up-to-date information (provided that the network is fast and responsive). Therefore in most cases, the client connectivity to the Reporting Services environment in the DR site is minimally impacted (clients might need to be closed and restarted to establish a new network connection, or they could receive a broken link/page in the browser and need to refresh or restart their report execution).

Active RSDB Server in the Primary Data Center Failover Cluster Goes Offline

If the active server in the primary data center failover cluster goes offline, the active/passive setup of the SQL Server failover cluster will kick in. The active server will automatically fail over to the passive instance of the node with little to no service interruption. The report server may report that it cannot connect to the report server database, but after the passive node comes online, it will automatically re-establish the connection. All report server connectivity will continue as previously performed except that now the passive instance has become the active instance.


Figure 3: Active cluster for RSDB in the primary data center goes offline

Entire RSDB Cluster in the Primary Data Center Goes Offline

If the primary data center RSDB SQL Server cluster goes offline, the disaster recovery solution is to manually fail over to the RSDB on the disaster recovery site to continue serving up reports. But note that this is a manual failover process, which requires manual configuration of the Reporting Services servers in the Reporting Services farm to point to the RSDB instance within the disaster recovery site. It is important to be aware that, with asynchronous mirroring, manual failover by definition forces the service to allow data loss. With Reporting Services, however, the data loss should be minimal because report metadata is not frequently updated and snapshots can be re-created. By configuring the primary data center and disaster recovery site RSDBs with identical instance names (recall that both instances are named sql4 but that they are in different domains), the manual failover is simplified because the Reporting Services servers will connect to an identically named instance of the RSDB (different domain, but same name).


Figure 4: Entire cluster for RSDB in the primary data center goes offline

Primary Data Center Outage Scenarios

If the entire primary data center goes offline for either a planned or unplanned outage, the content switch will automatically fail over to the Reporting Services farm in the disaster recovery site, but you will need to perform a manual failover to ensure that the Reporting Services servers can connect to the copy of the RSDB in the disaster recovery site.

Planned Outages

For planned outages:

1) The content switch will perform the task of suspending the primary IP addresses and activating the DR site IP addresses so that all connections will be redirected to the disaster recovery site.

2)      You will need to execute a script on the principal database to manually switch to the mirrored database. After the partner database becomes the principal, the same script can be re-executed to switch back to the original principal database. Here is an example:

-- Script to manually fail over to the mirror database
-- Example: switch from Boston\sql4 to Montreal\sql4
USE [master]
GO

-- Database ReportingServer
ALTER DATABASE ReportingServer SET PARTNER FAILOVER
GO

Unplanned Outages

For unplanned outages:

1) The content switch will perform the task of suspending the primary IP addresses and activating the DR site IP addresses so that all connections will be redirected to the disaster recovery site.

2)      You will need to run a manual failover script to force the service to switch, with possible data loss. The script must be run on the mirror server in the case of loss of the principal server. Here is an example:

-- Script to force database failover from Boston\sql4 to Montreal\sql4
-- in case of SQL Server instance failure
USE [master]
GO

-- Database ReportingServer
ALTER DATABASE ReportingServer
SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS
GO

After this script is run, the mirror server is the principal server and the mirror itself is suspended.

For more information about Reporting Services configuration for the database connection, see Configuring a Report Server Database Connection.

Break/Rebuild the Mirror

After the system goes down, it is undesirable to have the active system push data to the passive system (it cannot anyway). It is therefore important to break the mirror with the primary data center server and re-establish it after the primary data center server (in this example, Boston) is back online. Note that the disaster recovery server (in this example, Montréal) is now the active one, so you will need to manually transfer the data from the DR site server back to the primary data center server (in this example, from Montréal to Boston). To fully break the mirror, execute the following on the primary server:

ALTER DATABASE ReportingServer SET PARTNER OFF

RESTORE DATABASE ReportingServer WITH RECOVERY

To rebuild the mirror, follow the original steps in the Initializing Database Mirror section above.

Detail Logistics

It is important to understand that the Reporting Services report jobs, schedules, and subscriptions will not be mirrored between the RSDB from the primary data center and the disaster recovery site. This occurs because SQL Server Agent jobs are stored in the msdb database, which is not mirrored. A common method to resolve this issue is to script all of these Reporting Services jobs in the primary site and then use these scripts to re-create them on the disaster recovery site (with the jobs disabled); activate these jobs only if the system fails over. In the case of Reporting Services database migration, it is not necessary to perform this operation, as noted in How to: Migrate a Reporting Services Installation.
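
For example, after scripting the Reporting Services jobs onto the DR site, they can be kept disabled and flipped on only during a failover. A hedged sketch, assuming a hypothetical job name:

-- On the DR site: keep the re-created Reporting Services job disabled until failover.
EXEC msdb.dbo.sp_update_job @job_name = N'<ReportingServicesJobName>', @enabled = 0;

-- During a failover to the DR site: enable the job.
EXEC msdb.dbo.sp_update_job @job_name = N'<ReportingServicesJobName>', @enabled = 1;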

Note that logins created on the primary site may not be replicated to the disaster recovery site. To work around this, we recommend storing all of these logins in their own database and mirroring that database to the DR site as well. Then create a script to generate the logins within the DR site to re-establish them. For more information, see How to transfer logins and passwords between instances of SQL Server.

Conclusion

With these guidelines, you can prepare your Reporting Services environment for disaster recovery scenarios. The key is that you will need to use a combination of hardware components, network connectivity, SQL Server database mirroring, and architecture to create a disaster recovery infrastructure that supports an enterprise SQL Server Reporting Services environment. As noted, this case study used specific techniques such as active/passive clustering and asynchronous database mirroring to support CareGroup's disaster recovery requirements. Your situation may be different, but this technical note should provide you with the basic framework for Reporting Services disaster recovery. The hope is that you will never need to use it, but at least now you are ready in case you do.

Quick Tips and Q&A for SQL Server Analysis Services to Hive

Denny Lee

Over the last few weeks I’ve fielded some questions concerning the whitepaper / case study SQL Server Analysis Services to Hive that Dave Mariani (@dmariani) and I contributed to; below is an aggregate of those tips – hope this helps!

Q: I’m running into the HiveODBC Error message “..expected data length is 334…”

A: Check out the post for details on how to potentially resolve this: HiveODBC error message “..expected data length is 334…”


Q: Can I connect Analysis Services Tabular to Hive instead of Multidimensional?

A: Yes! Ayad Shammout (@aashammout) has a couple of great blog posts that dig into exactly how to do this; it is similar to how PowerPivot can connect to Hive.

Import Hadoop Data into Analysis Services Tabular

SSAS Tabular using HiveODBC connection


Q: The original  SQL Server Analysis Services to Hive whitepaper had Analysis…

View original post 888 more words

Optimizing Joins running on HDInsight Hive on Azure at GFS

Denny Lee

“…to look at the stars always makes me dream, as simply as I dream over the black dots of a map representing towns and villages…”
— Vincent Van Gogh

Image Source: Vincent Van Gogh Painting Tilt Shifted: http://coolvibe.com/2011/16-van-gogh-paintings-tilt-shifted/tilt-shift-van-gogh-15/


Introduction

To analyze hardware utilization within their data centers, Microsoft’s Online Services Division – Global Foundation Services (GFS) is working with Hadoop / Hive via HDInsight on Azure. A common scenario is to perform joins between the various tables of data. This quick blog post provides a little context on how we managed to take a query from >2h to <10min, and the thinking behind it.

Background

The join is a three-column join between a large fact table (~1.2B rows/day) and a smaller dimension table (~300K rows).  The size of a single day of compressed source files is ~4.2GB; decompressed is ~120GB.  When performing a regular join (in Hive parlance “common…

View original post 1,003 more words