Lost Password?
No account yet? Register
Increase font size Default font size Decrease font size Narrow screen resolution Wide screen resolution Auto adjust screen size default color red color blue color yellow color orange color

Dealing With Dirty Data

What is Dirty Data anyway?


In reference to databases it is a data that contain errors. Dirty data can contain such mistakes as spelling or punctuation, incorrect data associated with a field, incomplete or outdated data or even data that is duplicated in the database.

 Other common causes of dirty data are:

  • Wrong fields sizes
  • Wrong and inconsistent formats
  • Logical inconsistency like typing zipcode into phone number box
  • User Errors

Most of the problems comes when working with Text or Excel Files

Life is much easier when data source is ODBC compliant database however there are still some potential problems

Imagine that you are loading orders from different countries into your oracle datawarehouse.

Part of  the data comes from text files, part from MS Excel files and some of the data is direct ODBC connenction to the source database.
Some files are result of manual consolidation of multiple files

Datawarehouse Table Definition is

  • COUNTRY_ID INTEGER
  • ORDER_ID INTEGER
  • ORDER_DATE DATE
  • AMOUNT NUMBER(10.2)


Every country have different formats for ORDER_DATE and Amount field. This situation is far too familiar for many ETL Consultants

 

 

In order to load data we need to make sure that format of Amount and Order_Date fields is consistent.

For amount field we need to get rid of dollars, pounds and commas.

It could easily done by using replace function of Advanced ETL Processor. 

 

What you see is what you load

 

For ORDER_DATE field we will apply multiple date formats.

Result of Date Format function is a string in 'YYYY-MM-DD HH:NN:SS.ZZZ' format 

  

What you see is what you load
 
 
 Full Transformation

 Result of Transformation 

 

 

This is just a small example how Advanced ETL Processor can help you to validate and transform data. 

Please feel free to Download a Free Trial or get the latest version today from only $340.
Read Advanced ETL Processor FAQ , Watch Online Flash Demo , Read Product Documentation (PDF).


About Advanced ETL Processor


Advanced ETL Processor
is an ETL tool designed to automate extracting data from ANY database, transform, validate it and load into ANY database . Typical usage of it would be extract data from Excel File,Validate Date Formats, Sort data, deduplicate it and load it into Oracle database, run stored procedure or Sql script, once loading is completed. Unlike Oracle SQL loader, BCP,  DTS or SSIS Advanced ETL Processor can also add new and update old records based on primary key.

More Information

 

Testimonials

Bye the way….. Your software rocks big time. This is the easiest way I have ever found to deal with disparate databases!!!!!!!!!!!!!!!!!!!!

George Martin
 

Free Newsletter

Please provide your Name and Email address so that we can tell you about latest releases, new features and free offers.






b.150.100.16777215.0..Screenshots.VimpEnt.vimpentbig5.jpg
DB Software Laboratory - Intel Software PartnerDB Software Laboratory - My SQL Ready Partner
DB Software Laboratory Embacadero Technology Partner