HomeForum
Welcome, Guest

De-Duplication of Addresses
(1 viewing) (1) Guest
  • Page:
  • 1

TOPIC: De-Duplication of Addresses

De-Duplication of Addresses 1 year, 1 month ago #2475

  • DBrown
  • OFFLINE
  • Junior Boarder
  • Posts: 29
  • Karma: 0
* Version Number = Advanced ETL Processor Ent. 4.2.3.5

* Repository type = MS Access 2007

* Database type you are working with. = .MDB

* OS = Windows 7

I am converting customer data from an ACCESS database to .CSV file for upload to Oracle CRM on Demand. I have created Primary Key de-duplication of records and I am looking for a way to filter out address variances of the same customer entered multiple times like this:
JONES-123 Fair Oak Drive
JONES-123 FairOak Drive
JONES-123 Fair Oak Dr.
JONES-123 Fair Oak Dr
JONES-123 FairOak Dr

I am sorting and grouping by Last Name combined with address to create unique ID for the "Parent" record and will
be adding the other records as "Children" to the "Parent" record.Any help on this would be greatly appreciated.

Re: De-Duplication of Addresses 1 year, 1 month ago #2476

  • admin
  • OFFLINE
  • Moderator
  • Posts: 2182
  • Karma: 12
Interesting task you have got over there

I have created the working example for

I assume that the longest address is the correct one

Unzip attached the file into c:\support directory than open the file

Peter
Attachments:
The following user(s) said Thank You: DBrown

Re: De-Duplication of Addresses 1 year, 1 month ago #2487

  • DBrown
  • OFFLINE
  • Junior Boarder
  • Posts: 29
  • Karma: 0
Thank you for your time Peter, I better understand the concept for the Group/Sort for length now which will help me accomplish the task I have.The task is a bit more involved and I should have supplied A better example of what I was trying to accomplish. Let me give you a better example from the actual data.

The Layout for this particular template consists of
"Record#.Store ID","Last Name_Address_Zip Code","Zip Code","Phone#"

"9496.11501","Beck 2165 Bradley Rd 11501","11501","4408351293"
"8840.11501","Beck 2165 Bradley Road 11501","11501","4408351293"
"8540.11501","Beck 2165 Bradley 11501","11501","4408351293"


The Project Coordinator has asked to use the "Last Name_Address_Zip Code" as the Primary Key for the record.
The line of data with the Highest Record# would be the best Primary Key since it is the most recent data.

I have been tasked to create the Parent record(PK1)in the first file to import on the first pass with no duplicates.That would be

(PK1)-Written to Parent Record File
"9496.11501","Beck 2165 Bradley Rd 11501","11501","4408351293"

(Duplicates)-Written to Child Record File
"8840.11501","Beck 2165 Bradley Rd 11501","11501","4408351293"
"8540.11501","Beck 2165 Bradley Rd 11501","11501","4408351293"


Then after the Parent records are imported to the Oracle CRM on Demand DB I will be able to use the 2nd field to join the Children to the Parent.

I am continually Amazed by the power and versatility of the ETL tool and the Support Team. Thank You!

Re: De-Duplication of Addresses 1 year, 1 month ago #2490

  • admin
  • OFFLINE
  • Moderator
  • Posts: 2182
  • Karma: 12
I have created another example for you.

Which is even more complicated

Mike
Attachments:

Re: De-Duplication of Addresses 1 year, 1 month ago #2492

  • admin
  • OFFLINE
  • Moderator
  • Posts: 2182
  • Karma: 12
Also

Version 4.2.4.15

LeftPad(S,Length,Char):String, Pads Left part of the string with Character Char
RightDelete(S,Length):String, Deletes Right part of the string
RegularExpression(S,Expression):Boolean, Returns true if string matches expression
EncodeXMLElement(S,ElementName):String, Returns XML Element ElementName="S"
DecodeXMLElement(S,ElementName):String, Returns XML Element Value
EncodeXMLAttribute(S,AttributeName):String, Returns XML Attribute
DecodeXMLAttribute(S,AttributeName):String, Returns XML Attribute Value
EncodeXMLString(S):String,Returns XML String
DecodeXMLString(S):String,Returns XML Element Value

Two new data transformation fuctions

ReplaceSuffix
ReplacePrefix
  • Page:
  • 1
Time to create page: 0.20 seconds

Testimonials

"Pricing is an important aspect of our software investment program and DBSL exceeded our expectations in both performance and value for money."

Dana Sander,
ICT Manager

User Login

You only need to log in or register to use our support forum



Our customers

BP

BBC

HSBC


Databases we work with

Go to top