INTRODUCTION xxix
CHAPTER 1: WELCOME TO SQL SERVER INTEGRATION SERVICES 1
SQL Server SSIS Historical Overview 2
What's New in SSIS 2
Tools of the Trade 2
Import and Export Wizard 3
The SQL Server Data Tools Experience 4
SSIS Architecture 5
Packages 5
Control Flow 6
Data Flow 9
Variables 13
Parameters 14
Error Handling and Logging 14
Editions of SQL Server 14
Summary 15
CHAPTER 2: THE SSIS TOOLS 17
Import and Export Wizard 17
SQL Server Data Tools 23
Creating Your First Package 25
The Solution Explorer Window 26
The Toolbox 27
The Properties Windows 28
The SSIS Package Designer 28
Control Flow 29
Connection Managers 32
Variables 32
Data Flow 34
Parameters 34
Event Handlers 34
Package Explorer 35
Executing a Package 36
Management Studio 36
Summary 37
CHAPTER 3: SSIS TASKS 39
SSIS Task Objects 40
Using the Task Editor 40
The Task Editor Expressions Tab 40
SSDT Common Properties 41
Looping and Sequence Tasks 42
Script Task (.NET) 43
Analysis Services Tasks 45
Analysis Services Execute DDL Task 45
Analysis Services Processing Task 46
Data Mining Query Task 46
Data Flow Task 47
Data Preparation Tasks 48
Data Profiler 48
File System Task 50
Archiving a File 52
FTP Task 53
Getting a File Using FTP 54
Web Service Task 55
Retrieving Data Using the Web Service Task and XML Source Component 57
XML Task 60
Validating an XML File 62
RDBMS Server Tasks 64
Bulk Insert Task 64
Using the Bulk Insert Task 67
Execute SQL Task 68
Workflow Tasks 80
Execute Package Task 80
Execute Process Task 81
Message Queue Task 82
Send Mail Task 83
WMI Data Reader Task 84
WMI Event Watcher Task 86
Polling a Directory for the Delivery of a File 86
SMO Administration Tasks 87
Transfer Database Task 88
Transfer Error Messages Task 89
Transfer Logins Task 89
Transfer Master Stored Procedures Task 90
Transfer Jobs Task 91
Transfer SQL Server Objects Task 91
Summary 92
CHAPTER 4: CONTAINERS 93
Task Host Containers 93
Sequence Containers 94
Groups 95
For Loop Container 95
Foreach Loop Container 97
Foreach File Enumerator Example 98
Foreach ADO Enumerator Example 100
Summary 103
CHAPTER 5: THE DATA FLOW 105
Understanding the Data Flow 105
Data Viewers 106
Sources 106
OLE DB Source 107
Excel Source 109
Flat File Source 110
Raw File Source 114
XML Source 115
ADO.NET Source 115
Destinations 115
Excel Destination 116
Flat File Destination 116
OLE DB Destination 116
Raw File Destination 117
Recordset Destination 117
SQL Server and Mobile Destinations 118
Data Mining Model Training 118
DataReader Destination 118
Dimension and Partition Processing 118
Common Transformations 119
Synchronous versus Asynchronous Transformations 119
Aggregate 119
Conditional Split 121
Data Conversion 122
Derived Column 122
Lookup 123
Row Count 124
Script Component 125
Slowly Changing Dimension 126
Sort 126
Union All 127
Other Transformations 128
Audit 128
Character Map 129
Copy Column 130
Data Mining Query 130
DQS Cleansing 131
Export Column 131
Fuzzy Lookup 133
Fuzzy Grouping 138
Import Column 142
Merge 144
Merge Join 144
Multicast 145
OLE DB Command 145
Percentage and Row Sampling 147
Pivot Transform 147
Unpivot 150
Term Extraction 152
Term Lookup 156
Data Flow Example 157
Summary 160
CHAPTER 6: USING VARIABLES, PARAMETERS, AND EXPRESSIONS 161
Dynamic Package Objects 162
Variable Overview 162
Parameter Overview 162
Expression Overview 163
Understanding Data Types 164
SSIS Data Types 164
Date and Time Type Support 166
How Wrong Data Types and Sizes Can Affect Performance 167
Unicode and Non-Unicode Conversion Issues 167
Casting in SSIS Expressions 169
Using Variables and Parameters 170
Defining Variables 170
Defining Parameters 171
Variable and Parameter Data Types 172
Working with Expressions 173
C#-Like? Close, but Not Completely 174
The Expression Builder 175
Syntax Basics 176
Using Expressions in SSIS Packages 190
Summary 200
CHAPTER 7: JOINING DATA 201
The Lookup Transformation 202
Using the Merge Join Transformation 203
Contrasting SSIS and the Relational Join 203
Lookup Features 206
Building the Basic Package 207
Using a Relational Join in the Source 209
Using the Merge Join Transformation 211
Using the Lookup Transformation 216
Full-Cache Mode 216
No-Cache Mode 219
Partial-Cache Mode 220
Multiple Outputs 223
Expressionable Properties 226
Cascaded Lookup Operations 227
Cache Connection Manager and Cache Transform 229
Summary 231
CHAPTER 8: CREATING AN END-TO-END PACKAGE 233
Basic Transformation Tutorial 233
Creating Connections 234
Creating the Control Flow 237
Creating the Data Flow 237
Completing the Package 239
Saving the Package 239
Executing the Package 240
Typical Mainframe ETL with Data Scrubbing 241
Creating the Data Flow 242
Handling Dirty Data 242
Finalizing 246
Handling More Bad Data 247
Looping and the Dynamic Tasks 250
Looping 250
Making the Package Dynamic 250
Summary 252
CHAPTER 9: SCRIPTING IN SSIS 253
Introducing SSIS Scripting 253
Getting Started in SSIS Scripting 255
Selecting the Scripting Language 255
Using the VSTA Scripting IDE 256
Example: Hello World 257
Adding Code and Classes 259
Using Managed Assemblies 260
Example: Using Custom .NET Assemblies 261
Using the Script Task 264
Configuring the Script Task Editor 265
The Script Task Dts Object 266
Accessing Variables in the Script Task 267
Connecting to Data Sources in a Script Task 271
Raising an Event in a Script Task 281
Writing a Log Entry in a Script Task 287
Using the Script Component 288
Differences from a Script Task 288
Configuring the Script Component Editor 289
Accessing Variables in a Script Component 291
Connecting to Data Sources in a Script Component 292
Raising Events 292
Logging 293
Example: Data Validation 294
Synchronous versus Asynchronous 302
Essential Coding, Debugging, and Troubleshooting Techniques 305
Structured Exception Handling 305
Script Debugging and Troubleshooting 308
Summary 311
CHAPTER 10: LOADING A DATA WAREHOUSE 313
Data Profiling 315
Initial Execution of the Data Profiling Task 315
Reviewing the Results of the Data Profiling Task 317
Turning Data Profile Results into Actionable ETL Steps 321
Data Extraction and Cleansing 322
Dimension Table Loading 322
Loading a Simple Dimension Table 323
Loading a Complex Dimension Table 327
Considerations and Alternatives to the SCD Transformation 335
Fact Table Loading 337
SSAS Processing 345
Using a Master ETL Package 350
Summary 351
CHAPTER 11: ADVANCED DATA CLEANSING IN SSIS 353
Advanced Derived Column Use 354
Text Parsing Example 355
Advanced Fuzzy Lookup and Fuzzy Grouping 357
Fuzzy Lookup 357
Fuzzy Grouping 363
DQS Cleansing 366
Data Quality Services 366
DQS Cleansing Transformation 370
Summary 373
CHAPTER 12: USING THE RELATIONAL ENGINE 375
Data Extraction 376
SELECT * Is Bad 376
WHERE Is Your Friend 377
Transform during Extract 378
Many ANDs Make Light Work 381
SORT in the Database 382
Modularize 384
SQL Server Does Text Files Too 385
Using Set-Based Logic 389
SQL Server Change Data Capture 391
Benefits of SQL Server CDC 392
Preparing CDC 393
Capture Instance Tables 394
The CDC API 396
Using the New SSIS CDC Tools 398
Querying CDC in SSIS 401
Data Loading 405
Database Snapshots 406
The MERGE Operator 408
Summary 411
CHAPTER 13: ACCESSING HETEROGENEOUS DATA 413
Excel and Access 415
64-Bit Support 415
Working with Excel Files 417
Working with Access 421
Importing from Oracle 427
Oracle Client Setup 428
Importing Oracle Data 429
Using XML and Web Services 431
Configuring the Web Service Task 431
Working with XML Data as a Source 439
Flat Files 442
Loading Flat Files 443
Extracting Data from Flat Files 444
ODBC 447
Other Heterogeneous Sources 450
Summary 451
CHAPTER 14: RELIABILITY AND SCALABILITY 453
Restarting Packages 453
Simple Control Flow 455
Containers within Containers and Checkpoints 457
Variations on a Theme 459
Inside the Checkpoint File 461
Package Transactions 463
Single Package, Single Transaction 464
Single Package, Multiple Transactions 466
Two Packages, One Transaction 468
Single Package Using a Native Transaction in SQL Server 469
Error Outputs 471
Scaling Out 473
Architectural Features 474
Scaling Out Memory Pressures 474
Scaling Out by Staging Data 475
Scaling Out with Parallel Loading 479
Summary 485
CHAPTER 15: UNDERSTANDING AND TUNING THE DATA FLOW ENGINE 487
The SSIS Engine 488
Understanding the SSIS Data Flow and Control Flow 488
Handling Workflows with the Control Flow 491
Data Processing in the Data Flow 491
Memory Buffer Architecture 492
Types of Transformations 493
Advanced Data Flow Execution Concepts 501
SSIS Data Flow Design and Tuning 508
Data Flow Design Practices 508
Optimizing Package Processing 513
Troubleshooting Data Flow Performance Bottlenecks 516
Pipeline Performance Monitoring 518
Summary 520
CHAPTER 16: SSIS SOFTWARE DEVELOPMENT LIFE CYCLE 521
Introduction to Software Development Life Cycles 523
SDLCs: A Brief History 524
Types of Software Development Life Cycles 524
Versioning and Source Code Control 525
Subversion (SVN) 526
Team Foundation Server, Team System, and SSIS 533
Summary 547
CHAPTER 17: ERROR AND EVENT HANDLING 549
Using Precedence Constraints 549
Precedence Constraint Basics 550
Advanced Precedence Constraints and Expressions 551
Event Handling 557
Events 558
Using Event Handlers 560
Event Handler Inheritance 567
Breakpoints 569
Error Rows 572
Logging 576
Logging Providers 577
Log Events 577
Catalog Logging 582
Summary 584
CHAPTER 18: PROGRAMMING AND EXTENDING SSIS 585
The Sample Components 586
Component 1: Source Adapter 586
Component 2: Transform 587
Component 3: Destination Adapter 588
The Pipeline Component Methods 588
Design-Time Functionality 589
Runtime 593
Connection Time 594
Building the Components 596
Preparation 596
Building the Source Component 602
Building the Transformation Component 614
Building the Destination Adapter 625
Using the Components 633
Installing the Components 633
Debugging Components 634
Design Time 635
Building the Complete Package 636
Runtime Debugging 637
Upgrading to SQL Server 2012 641
Summary 641
CHAPTER 19: ADDING A USER INTERFACE TO YOUR COMPONENT 643
Three Key Steps for Designing the UI: An Overview 644
Building the User Interface 644
Adding the Project 645
Implementing IDtsComponentUI 647
Setting the UITypeName 651
Building the Form 653
Extending the User Interface 658
Runtime Connections 658
Component Properties 661
Handling Errors and Warnings 663
Column Properties 665
Other UI Considerations 667
Summary 667
CHAPTER 20: EXTERNAL MANAGEMENT AND WMI TASK IMPLEMENTATION 669
External Management of SSIS with Managed Code 670
Setting Up a Test SSIS Package for Demonstration Purposes 670
The Managed Object Model Code Library 671
Catalog Management 672
Folder Management 673
Environments 674
The DTS Runtime Managed Code Library 676
SSIS Deployment Projects 676
Parameter Objects 677
Server Deployment 679
Executing SSIS Packages Deployed to the SSIS Catalog 680
EnvironmentReferences 681
Package Operations 682
Application Object Maintenance Operations 683
Package Operations 683
Package Monitoring 686
Project, Folder, and Package Listing 688
A Package Management Example 689
Package Log Providers 699
Specifying Events to Log 701
Programming to Log Providers 702
SQL Server 2012 Operation Logs 703
Package Configurations 705
Creating a Configuration 706
Programming the Configuration Object 707
Configuration Object 708
Windows Management Instrumentation Tasks 709
WMI Reader Task Explained 709
WMI Data Reader Example 710
WMI Event Watcher Task 715
WMI Event Watcher Task Example 716
Summary 718
CHAPTER 21: USING SSIS WITH EXTERNAL APPLICATIONS 719
InfoPath Documents 720
ASP.NET Applications 727
Winform .NET Applications 731
Executing a Package with T-SQL 736
Summary 741
CHAPTER 22: ADMINISTERING SSIS 743
Using the SSIS Catalog 743
Setting the SSIS Catalog Properties 744
SSISDB 747
Deployment Models 748
Project Deployment Model 749
Package Deployment Model 751
Using T-SQL with SSIS 757
Executing Packages 757
Using Parameters 758
Querying Tables for Parameter Values 759
Using Environments 760
Using Data Taps 765
Creating a Central SSIS Server 766
Clustering SSIS 768
Package Configuration 770
Command-Line Utilities 774
DTExec 774
DTExecUI 775
DTUtil 780
Security 782
Securing the SSIS Catalog 782
Legacy Security 785
Scheduling Packages 787
SQL Server Agent 787
Proxy Accounts 789
64-Bit Issues 790
Monitoring Package Executions 791
Built-in Reporting 791
Custom Reporting 795
Performance Counters 796
Summary 796
CHAPTER 23: CASE STUDY: A PROGRAMMATIC EXAMPLE 797
What You Will Take Away 798
Background 798
Business Problem 799
Solution Summary 799
Solution Architecture 801
Naming Conventions and Tips 804
Additional SSIS Tips Before You Start a Large Project 805
Data Architecture 805
File Storage Location Setup 806
Bank ACH Payments 806
Lockbox Files 807
PayPal or Direct Credits to the Corporate Account 808
Case Study Database Model 808
Database Setup 810
Case Study Load Packages 819
Bank File Load Package 819
ACH Load Package 845
E-mail Load Package 861
Testing 866
Case Study Invoice Matching Process 867
Matching Process Control Flow 867
Matching Process High-Confidence Data Flow 870
Matching Process Medium-Confidence Data Flow 875
Interpreting the Results 879
Creating a Parent Driver Package 880
Driver Package Setup 880
Summary 881
INDEX 883