Many IT Departments are still relying on SQL Scripts executed within their Database to Move Data from one Instance of the Database to another. This does work fairly well even with very Basic SQL like:
INSERT INTO table1 (column1, column2, someInt, someVarChar ) SELECT table2.column1, table2.column2, 8, 'some string etc.' FROM table2 WHERE table2.criteria;
The above SQL Script should even work on most Database Products. For the Oracle Database even more sophisticated SQL Capabilities exist to cope with the Task of Moving data from one place to another. This comes with various SQL Statements e.g. the COPY FROM Clause:
COPY FROM HR/your_password@ORCL - CREATE NEW_TABLE_SALESMEN (EMPLOYEE_ID, SA_MAN) - USING SELECT EMPLOYEE_ID, LAST_NAME FROM EMPLOYEES - WHERE JOB_CODE='SALESMEN';
This would “copy” the full data from one Schema to another and create a new Table. Even Incremental Loads (that is only the changed data) could be realized quickly by using the MERGE or UPSERT Command e.g.
MERGE INTO employees e USING hr_records h ON (e.id = h.emp_id) WHEN MATCHED THEN UPDATE SET e.address = h.address WHEN NOT MATCHED THEN INSERT (id, address) VALUES (h.emp_id, h.address);
All above sample code would also be very efficient since the complete Processing is done within the Database by the SQL Optimizer.
So to summarize: the above code is easy, could be placed within the Database Scheduler and is also performing very efficient within the database. So why do ETL Tools exist for this Task?
For more Complex Data Movement Design and Operation a dedicated ETL Tool should definitely be considered for their following Advantages:
- More Flexible configuration in Terms of Data Flow and Handling Connection Deployment
- Monitoring of Data Loading Workflows
- Robustness against errors during data loading, due to the Restartability from the fault point
- Debugging functionality during Development
- Structured and standardized design processes
- Orchestration of different loading procedures and Modeling the Dependency’s of Loading Procedures (Load Sequence)
- Easier Parallelization (performance) of multiple Workflows
- Implementation of business rules and transformation of the data
- Visual representation of data flow (data lineage documentation created by Tool)
- Multi- User Development (checkin / checkout )
- Built-in Versioning Capabilities (Source Code Control)
- Easier definition of aggregation Rules for data
- Quicker implementation of new Requirements to existing Mappings (e.g. adding new Column)
- Easier Implementation of Change Data Capture by using pre-built Models to utilize the same Mapping for both Tasks
- Simpler access to 3rd Party Sources which need to be incorporated into the Loading Procedure
After all, the Oracle BI Application Product also ships with an ETL Tool for data Movement like Informatica or ODI.