Easy to use, fast and powerful ETL tools, Our prices start from only 100 USD per licence.

Work with: Text, XML, Excel, Access, DBF, Foxpro, ODBC, OLE DB, MS Sql Server, Oracle, MySql, PostgreSQL, Firebird, Interbase, SQLite, POP3, SMTP, File System, FTP, SSL, Unicode, RSS, Windows Event Log, Google Spreadsheets and QlikView.

Built-in scheduler, business rules designer, package designer, report designer, data browser. Powerful data transformation language, regular expressions and more...

HomeArticles

Datawarehousing Articles

Data quality

What is data quality and why it is important.

it is a measure of how well business data practices satisfy standards. Good and reliable data can be used to increase efficiency, support decision making, and sky rocket profitability.

Poor data quality leads to wasting time, working with conflicting information resulting in bad decisions and massive decrease in efficiency.

Many organizations implement strict data controls at the point of entry. However quite often it is not enough. For example, when data is loaded into the data-warehouse or moved from one application to another additional validation and transformation rules must be applied.

Data Quality Strategy.

The primary objective of any data integration solution is to assemble data from one or more data sources validate and transform it into standard format.

There are three major steps in implementing data quality strategy:

  • Profiling - helps you to check if a source data file or table extract meets the format such as file encoding fields format and order
  • Cleansing - Once data successfully satisfies profiling standards, it still necessary to cleanse the data for example remove empty values or irrelevant information
  • Auditing - one ot the most important parts of data integration. It gives IT departments the history of data transformation, profiling and cleansing which can be used in importer failure investigations.

Advanced ETL Processor Data validator

Advanced ETL Processor is able to check any data including date formats, post codes, phone numbers, validating against list of values etc. It has more than 190 data validation functions, plus it can be extended by using regular expressions. It is an enterprise data integration solution that lets you quickly validate and process large volumes of data while preserving and enhancing data quality.

Validation Editor

Validation Editor

Simple data cleasing rule

Simple Data validation rule

More about data validator...

 

IMEX=1

Solving problems with loading data from Excel files into databases

Common problem: trying to load the data from Excel file half of the data is coming as nulls, or columns with more than 255 characters are truncated

The logic behind Excel mixed data types

As partially explained here

http://support.microsoft.com/kb/257819

ODBC/MS Jet scans first TypeGuessRows to determine field type

Here how Excel ODBC/MS Jet works

(TypeGuessRows=8 IMEX=1)

In your eight (8) scanned rows, if the column contains five (5) numeric values and three (3) text values, the provider returns five (5) numbers and three (3) null values.
In your eight (8) scanned rows, if the column contains three (3) numeric values and five (5) text values, the provider returns three (3) null values and five (5) text values.
In your eight (8) scanned rows, if the column contains four (4) numeric values and four (4) text values, the provider returns four (4) numbers and four (4) null values.
In your eight (8) scanned rows all of them less than 255 characters the provider will truncate all data to 255 characters
In your eight (8) scanned rows, if the column contains five (5) values with more length than 255 the provider will returm more than 255 characters

NOTE:

Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted to text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default datatype to Text; it will remain numeric.

     Nobody wants to load half of the data, everybody wants to load data as it is

The only way to make import from Excel work is

Set IMEX=1 in connection string

Close any programs that are running.
On the Start menu, click Run. Type regedit and click OK.
In the Registry Editor, expand the following key depending on the version of Excel that you are running:
Excel 97
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel
Excel 2000 and later versions
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel

Select TypeGuessRows and on the Edit menu click Modify.
In the Edit DWORD Value dialog box, click Decimal under Base.
Set the value to 1

Open Excel file

Make sure that the cells in the first line of the table have relevant data for example

  • mixed numbers and text characters for text fields
  • only numbers for numeric fields
  • If some of the data will be longer than 255 characters make sure that first line cell has more 255 characters otherwise it will be truncated

This solution apply to all versions of MS Excel ODBC driver, Ole DB, MS Jet, .NET, DTS and SSIS

We have spent enormous amount of time trying to get it fixed. So far we were not able to find a better solution.

The way Excel import works makes it not possible to automate it. You have to modify most of excel files manually in order to load them.

This why we are no longer using ODBC/OleDB/Ms Jet for Excel connections. Our ETL solutions work currecly with Excel all the time

  • Works directly with Excel 3.0-2007
  • No ODBC, OleDB or MS Jet Required
  • Works correctly with mixed data types
  • Works correctly with cells with more 255 characters
  • No need for IMEX=1, HDR=Yes or Registry hacks (TypeGuessRows)

  • Loads data correctly all the time + no need to edit Excel file
  • Can create Excel files in Excel 3.0-2007 format
  • Can insert data starting from specific cell
  • Can clear area before adding data into Excel
  • Can add headers

Related Microsoft KB's

  1. PRB: Excel Values Returned as NULL Using DAO OpenRecordset
  2. Data truncated to 255 characters with Excel ODBC driver
  3. Excel ODBC Driver May Determine Wrong Data Type
  4. BUG: Excel ODBC Driver Disregards the FirstRowHasNames or Header Setting
  5. How To Use ADO with Excel Data from Visual Basic or VBA
 

Creating Reports

Once data is went trough the Staging Area, cleansed, transformed and loaded into the Datawarehouse it is presented to the End Users.

There several different types of datawarehouse users.

  • Advanced Users
  • Regular Users
  • Casual Users

Advanced users design reports themselves and perform complex analytical tasks. They use very expensive reporting tools such as Business Objects, Crystal Reports or QlickView.
Regular users use reports designed by somebody else or they can use dashboards to monitor key parameters.
Casual users use reports from time to time or they may receive reports by email.

Quite often people just want to run one report per day for example print current warehouse stock level.

Cost of ownership

Divide licence cost of reporting software by number of reports use and take into account cost of hardware=Cost of ownership

Cost of ownership is very high for casual and some of regular users

One of the solutions is to use Active Table Editor

The administrator designs for Reports and Data Entry Forms. All this complexity is hidden from the end user. End users can only Print reports allowed by administrator

Active Table Editor allows the administrator to log in and design the look of the application for the end users. You can edit user menus, security settings, menu items, input forms and reports. All this complexity is left behind the scene for the end users. Once logged in, the end users see and edit the data, which was defined by the administrator.

Report Designer

Active Table Editor - Report Designer

Print Preview

Active Table Editor - Printing Report

Report creation wizard

Report Wizard

 

More Articles...

Page 6 of 10

Testimonials

"DBSL software makes it much easier for us to enable new customers for automatic order processing. This alone sets us apart from the competition."

M. Clock, Director,
Clock Logistics

Our customers

BP

BBC

HSBC


Databases we work with

Go to top