Data quality is a measure of how well business data practices satisfy standards. Good, reliable data can be used to increase efficiency, support decision making, and improve profitability.
Poor data quality leads to wasted time and to working with conflicting information, which results in bad decisions and a large decrease in efficiency.
Many organizations implement strict data controls at the point of entry. However, quite often this is not enough. For example, when data is loaded into the data warehouse or moved from one application to another, additional validation and transformation rules must be applied.
The primary objective of any data integration solution is to assemble data from one or more data sources, validate it, and transform it into a standard format.
There are three major steps in implementing a data quality strategy:
Advanced ETL Processor can check any data, including date formats, post codes, and phone numbers, and can validate values against a list of allowed values. It has more than 190 data validation functions and can be extended with regular expressions. It is an enterprise data integration solution that lets you quickly validate and process large volumes of data while preserving and enhancing data quality.
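As a rough illustration of the kinds of checks described above (date formats, post codes, phone numbers, lists of values), here is a minimal Python sketch. It is not the product's own validation functions. The field names, the regular expressions, and the allowed status values are all assumptions made for the example; real rules depend on the standards your business applies.

```python
import re
from datetime import datetime

# Hypothetical rules for illustration only.
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")
PHONE = re.compile(r"^\+?\d[\d \-]{6,14}\d$")
ALLOWED_STATUS = {"NEW", "OPEN", "CLOSED"}


def is_valid_date(value, fmt="%Y-%m-%d"):
    """Return True when value parses with the expected date format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def validate_row(row):
    """Return the names of the fields that fail validation, in a fixed order."""
    errors = []
    if not is_valid_date(row.get("order_date", "")):
        errors.append("order_date")
    if not UK_POSTCODE.match(row.get("post_code", "").upper()):
        errors.append("post_code")
    if not PHONE.match(row.get("phone", "")):
        errors.append("phone")
    if row.get("status") not in ALLOWED_STATUS:
        errors.append("status")
    return errors
```

Rows for which `validate_row` returns an empty list can be loaded; the others can be routed to a rejects file for correction.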


Common problem: when loading data from an Excel file, half of the data comes through as nulls, or columns with more than 255 characters are truncated.
This is partially explained here:
http://support.microsoft.com/kb/257819
ODBC/MS Jet scans the first TypeGuessRows rows to determine the field type
(TypeGuessRows=8 IMEX=1)
In your eight (8) scanned rows, if the column contains five (5) numeric values and three (3) text values, the provider returns five (5) numbers and three (3) null values.
In your eight (8) scanned rows, if the column contains three (3) numeric values and five (5) text values, the provider returns three (3) null values and five (5) text values.
In your eight (8) scanned rows, if the column contains four (4) numeric values and four (4) text values, the provider returns four (4) numbers and four (4) null values.
In your eight (8) scanned rows, if all values in the column are shorter than 255 characters, the provider truncates all data in that column to 255 characters.
In your eight (8) scanned rows, if the column contains five (5) values longer than 255 characters, the provider returns more than 255 characters.
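The mixed-type scenarios above can be sketched as a small Python model. This is a simplified simulation of the majority rule only, not the driver's actual code: it assumes that ties go to the numeric type (as in the four/four case), and it ignores import mode and the 255-character rule. The function names are made up for the example.

```python
def guess_column_type(values, type_guess_rows=8):
    """Pick 'number' or 'text' by majority over the first rows; ties go to 'number'."""
    sample = values[:type_guess_rows]
    numeric = sum(isinstance(v, (int, float)) for v in sample)
    text = len(sample) - numeric
    return "number" if numeric >= text else "text"


def read_column(values, type_guess_rows=8):
    """Return the column as the model predicts: values of the other type become None."""
    col_type = guess_column_type(values, type_guess_rows)
    wanted = (int, float) if col_type == "number" else str
    return [v if isinstance(v, wanted) else None for v in values]
```

For instance, `read_column([1, 2, 3, 4, 5, "a", "b", "c"])` keeps the five numbers and replaces the three text values with `None`, which is the data loss described in the first scenario.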
NOTE:
Setting IMEX=1 tells the driver to use Import mode. In this state, the registry setting ImportMixedTypes=Text will be noticed. This forces mixed data to be converted to text. For this to work reliably, you may also have to modify the registry setting, TypeGuessRows=8. The ISAM driver by default looks at the first eight rows and from that sampling determines the datatype. If this eight row sampling is all numeric, then setting IMEX=1 will not convert the default datatype to Text; it will remain numeric.
Nobody wants to load half of the data; everybody wants to load the data as it is.
Set IMEX=1 in the connection string.
Close any programs that are running.
On the Start menu, click Run. Type regedit and click OK.
In the Registry Editor, expand the following key depending on the version of Excel that you are running:
Excel 97
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel
Excel 2000 and later versions
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
Select TypeGuessRows and on the Edit menu click Modify.
In the Edit DWORD Value dialog box, click Decimal under Base.
Set the value to 1.
Open the Excel file.
Make sure that the cells in the first row of the table contain relevant data.
This solution applies to all versions of the MS Excel ODBC driver, OLE DB, MS Jet, .NET, DTS and SSIS.
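For the first step (setting IMEX=1), the connection string for the Jet 4.0 OLE DB provider might be assembled as in the Python sketch below. The helper name and the file path are hypothetical, and the `Excel 8.0` and `HDR` settings assume an .xls workbook with a header row; adjust them to your own provider and file.

```python
def excel_connection_string(path, header=True, imex=1):
    """Build a Jet 4.0 OLE DB connection string for an .xls workbook."""
    # IMEX=1 asks the driver to use import mode, as described in the note above.
    extended = f"Excel 8.0;HDR={'Yes' if header else 'No'};IMEX={imex}"
    return (
        "Provider=Microsoft.Jet.OLEDB.4.0;"
        f"Data Source={path};"
        f'Extended Properties="{extended}";'
    )


# Hypothetical path, for illustration only.
print(excel_connection_string(r"C:\data\stock.xls"))
```

Note that the connection string alone does not change TypeGuessRows; that still has to be set in the registry as described in the steps above.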
We have spent an enormous amount of time trying to get this fixed. So far we have not been able to find a better solution.
The way Excel import works makes it impossible to automate. You have to modify most Excel files manually in order to load them.
This is why we no longer use ODBC/OLE DB/MS Jet for Excel connections. Our ETL solutions work correctly with Excel all the time.
Once the data has gone through the staging area and has been cleansed, transformed and loaded into the data warehouse, it is presented to the end users.
There are several different types of data warehouse users.
Advanced users design reports themselves and perform complex analytical tasks. They use very expensive reporting tools such as Business Objects, Crystal Reports or QlikView.
Regular users use reports designed by somebody else, or they use dashboards to monitor key parameters.
Casual users use reports from time to time, or they may receive reports by email.
Quite often people just want to run one report per day, for example to print the current warehouse stock level.
Cost of ownership = licence cost of the reporting software divided by the number of reports used, taking into account the cost of hardware.
The cost of ownership is very high for casual users and for some regular users.
One solution is to use Active Table Editor.
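To make the arithmetic concrete, here is a small Python sketch. It reads the rule above as (licence cost + hardware cost) divided by the number of reports run, which is one interpretation of the wording. All figures are invented for the example and are not real prices.

```python
def cost_per_report(licence_cost, hardware_cost, reports_run):
    """Cost of ownership per report: total cost spread over the reports actually run."""
    if reports_run <= 0:
        raise ValueError("reports_run must be positive")
    return (licence_cost + hardware_cost) / reports_run


# Hypothetical yearly figures: the same seat cost, very different usage.
casual = cost_per_report(licence_cost=1500, hardware_cost=500, reports_run=250)
advanced = cost_per_report(licence_cost=1500, hardware_cost=500, reports_run=5000)
print(casual, advanced)
```

With these assumed numbers, a casual user who runs one report per working day costs twenty times more per report than an advanced user on the same licence.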
The administrator designs the reports and data entry forms. All of this complexity is hidden from the end user. End users can only print the reports allowed by the administrator.
Active Table Editor allows the administrator to log in and design the look of the application for the end users. The administrator can edit user menus, security settings, menu items, input forms and reports. All of this complexity stays behind the scenes. Once logged in, the end users see and edit only the data defined by the administrator.


