Wednesday, October 3, 2007

Recent international marketing database build

I commonly refer to these types of databases as data staging areas. The reason is that most large companies that I work for have many existing database systems which they already use. The staging area is a place where data has been pulled from all over an organization, cleaned, standardized, and de-duped. This is then put into an operational staging area for regular use in marketing programs.

So what were we dealing with to get started?

A series of large data extracts were sent to me with information from existing internal databases. This consisted of about 3 files in relatively good order. Next came an additional set of 250 files in every shape, size and format. Some files had only 15 or 20 records; others had thousands of records from many different countries.
My team and I spent the better part of a month evaluating and structuring these files for use. This included a substantial amount of communication with the client to clearly identify what every file meant and how it should be used and prioritized in the new staging area.

Just a few facts

We were looking at over 1,000 different field names relating to a clean list of about 220 actual field names. There were over 24,000 unique job titles that had to be translated into a standard set of job functions to make selections on the data possible. We identified over 6,000 industry descriptions in all types of industry classifications. Of course values like country and city names were also coded in numerous ways – human creativity is endless when it comes to doing the same thing in many different ways. In the end we mapped over 35,000 values to a standard list of variables in our data dictionary which made effective selections possible.
This number may seem impressive, but field name matching is straightforward if you only have to match one field name against another. Often, however, you face different data sources: some have account and contact data separated, while others list them together or do not distinguish between the two levels at all. Normalization to accounts and contacts is even more difficult if you allow the same type of value (e.g. address or telephone number) to be linked to either a contact or an account.
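To make the idea of mapping many source field names onto one clean list concrete, here is a minimal Python sketch. It is an illustration only, not the tooling used on this build: the alias table, the canonical names and the helper names `normalize_key` and `map_record` are all assumptions made up for the example.

```python
# Hypothetical alias table: every source-specific field name maps to one
# canonical name from the data dictionary. All names are illustrative.
FIELD_ALIASES = {
    "company": "company_name",
    "company name": "company_name",
    "firma": "company_name",
    "tel": "phone",
    "telephone": "phone",
    "phone no": "phone",
    "zip": "postal_code",
    "postcode": "postal_code",
    "plz": "postal_code",
}


def normalize_key(name):
    """Lower-case a raw header and collapse punctuation and whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def map_record(record):
    """Return the record under canonical names plus any unmapped headers."""
    mapped, unmapped = {}, []
    for raw_name, value in record.items():
        canonical = FIELD_ALIASES.get(normalize_key(raw_name))
        if canonical is None:
            unmapped.append(raw_name)  # needs a human decision
        else:
            mapped[canonical] = value
    return mapped, unmapped
```

Headers that fall through to the unmapped list are exactly the ones that would need a conversation with the client before they can be added to the table.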

Step by step

When all the entry data was ready with correct column mapping, we went to the lowest level of detail and searched for any wrong characters. At this stage we attempted to recover original values when we found broken data in different encodings. Often it is possible to recover original Cyrillic or Eastern European strings even from an unreadable state. This is precisely what we had to do in some of the files that had been saved with the wrong locale settings in the past. After successful recovery, we noted which columns held local-character values so that we could later create the column versions we named “Latin” (Latin characters only) and “ND” (No Diacritics).
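The recovery step and the “ND” version can be sketched in a few lines of Python. This is a simplified illustration under assumptions: it handles only a single round of wrong decoding, the code pages shown are examples rather than the ones that affected these files, and the function names `recover_mojibake` and `strip_diacritics` are invented for the sketch.

```python
import unicodedata


def recover_mojibake(text, wrong="cp1252", right="utf-8"):
    """Undo one round of 'decoded with the wrong code page'.

    Returns the text unchanged if the round trip is not possible, which
    is what happens for values that were never broken in the first place.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def strip_diacritics(text):
    """Build an 'ND' version: decompose, then drop combining marks.

    Letters without a Unicode decomposition (for example the Polish
    'l with stroke') pass through untouched and need an explicit table.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# A Czech name that was stored as UTF-8 but read back as cp1252.
garbled = "Dvořák".encode("utf-8").decode("cp1252")
restored = recover_mojibake(garbled)
no_diacritics = strip_diacritics(restored)
```

A true “Latin” version of Cyrillic text needs transliteration rules on top of this, which the sketch does not attempt.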
Next we dealt with issues relating to communications details. This includes company names, contact names, addresses, telephone and fax as well as email and website addresses. Once the communication details were amended we were able to run effective de-duplication routines on the files.
We spent a lot of time removing noise values that are commonly found in company names as well as in telephone numbers. Next we ran PAF (Postal Address File) validation. This gave us clean address records according to local postal file standards and also a good idea of the correctness and accuracy of the file. For telephone numbers we checked their length in digits against the standards for each country, and for websites we ran a direct internet lookup to check for valid sites linked to the addresses.
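The noise removal and the telephone length check lend themselves to a short Python sketch. Everything specific here is an assumption for illustration: the country codes are made up, the digit ranges are placeholders rather than real numbering-plan values, and the noise list is a small sample.

```python
import re

# Placeholder length rules (min, max digits) keyed by made-up country
# codes; real values would come from each country's numbering plan.
PHONE_DIGIT_RULES = {"AA": (9, 9), "BB": (7, 11)}

# Illustrative noise tokens sometimes typed into phone or company fields.
NOISE_TOKENS = re.compile(r"\b(n/a|none|unknown|tbc)\b", re.IGNORECASE)


def phone_digits(raw):
    """Keep only the digits of a raw telephone value."""
    return re.sub(r"\D", "", raw)


def phone_length_ok(raw, country):
    """True if the digit count is within the range set for the country."""
    rule = PHONE_DIGIT_RULES.get(country)
    if rule is None:
        return False  # no rule on file: flag for manual review
    low, high = rule
    return low <= len(phone_digits(raw)) <= high


def strip_noise(value):
    """Remove noise tokens and collapse the remaining whitespace."""
    return " ".join(NOISE_TOKENS.sub(" ", value).split())
```

A length check like this only catches obviously wrong numbers; it says nothing about whether a number of the right length is actually in service.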
Next we merged and ran our de-duplication software on all the data. This is the main point of data cleaning that requires exceptional care with duplicate identification and then with record merging. The de-duplication or “de-dupe” is first run on a company level and then on a contact level. Usually when working with various purchased data from outside vendors you can expect a duplicate rate of anywhere from 5 to 20%. In the case of internal data from one large company, you are talking about duplicate rates of more than 50% as the same data is found in multiple sources throughout the organization. We were able to correctly identify and merge duplicates only thanks to data policies carefully designed together with our client.
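The two-level order, company first and contact second, can be shown with a small Python sketch. It uses exact match keys, which is a deliberate simplification; commercial de-duplication software typically relies on fuzzy matching. The field names, the legal-form list and the key rules are assumptions for the example.

```python
import re
from collections import defaultdict

# Illustrative legal-form suffixes to ignore when comparing company names.
LEGAL_FORMS = {"ltd", "limited", "inc", "gmbh", "sa", "bv", "llc"}


def company_key(name, country):
    """Match key: lower-cased name words minus legal forms, plus country."""
    words = re.findall(r"\w+", name.lower())
    core = " ".join(w for w in words if w not in LEGAL_FORMS)
    return (core, country.upper())


def contact_key(first, last):
    """Match key for a contact: first initial plus lower-cased surname."""
    return (first.strip().lower()[:1], last.strip().lower())


def dedupe(records):
    """Group records by company first, then by contact within a company.

    Returns {company_key: {contact_key: [records...]}}; any inner list
    with more than one record is a candidate duplicate group.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for rec in records:
        ckey = company_key(rec["company"], rec["country"])
        pkey = contact_key(rec["first_name"], rec["last_name"])
        groups[ckey][pkey].append(rec)
    return groups
```

Running the company pass first means contacts are only ever compared inside the same company group, which keeps two different people named J. Smith at different firms apart.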
All the processes mentioned above included a lot of leg work but our crowning achievement was to design the proper merging of hundreds of thousands of records. This included information on source of the record, priority of the file it came from as well as judgments on the richness of the data.
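One way to picture merging by source priority and richness is the Python sketch below. It is an assumption-laden illustration, not the merge policy agreed with the client: the source names, the priority numbers and the rule "first non-empty value wins" are all invented for the example.

```python
def richness(record, fields):
    """Count how many of the given fields hold a non-empty value."""
    return sum(1 for f in fields if record.get(f))


def merge_duplicates(records, source_priority, fields):
    """Merge one duplicate group into a single surviving record.

    Records are ranked by source priority (lower number wins), then by
    richness (more filled fields wins). Each field takes the first
    non-empty value in that order, and the source of each value is kept.
    """
    ranked = sorted(
        records,
        key=lambda r: (source_priority.get(r["source"], 999), -richness(r, fields)),
    )
    merged, origin = {}, {}
    for field in fields:
        for rec in ranked:
            if rec.get(field):
                merged[field] = rec[field]
                origin[field] = rec["source"]
                break
    return merged, origin


group = [
    {"source": "events", "company": "Acme", "phone": "", "email": "j@acme.example"},
    {"source": "crm", "company": "Acme Ltd", "phone": "0123", "email": ""},
]
merged, origin = merge_duplicates(
    group, {"crm": 1, "events": 2}, ["company", "phone", "email"]
)
```

Keeping the origin of every surviving value makes it possible to trace a merged record back to the file it came from when a merge is questioned later.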
This is a great moment in data staging work like this one, when you are finally able to view all the records merged and cleaned in one format. Then you know that there are only a few more tasks to be done to further improve the quality of the data. We capitalize the data, assign gender and local language salutations to contacts and generate the Latin field versions that were mentioned before for easy handling.
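As a rough illustration of the gender and salutation step, here is a minimal Python sketch. The lookup tables are tiny samples invented for the example; real tables would be far larger and maintained per country, and the function name `salutation` is hypothetical.

```python
# Tiny illustrative lookups; every entry is an assumption for the sketch.
GENDER_BY_FIRST_NAME = {"anna": "F", "maria": "F", "peter": "M", "thomas": "M"}

SALUTATIONS = {
    ("de", "M"): "Sehr geehrter Herr",
    ("de", "F"): "Sehr geehrte Frau",
    ("en", "M"): "Dear Mr",
    ("en", "F"): "Dear Ms",
}

# Neutral greeting used when the gender cannot be derived from the name.
FALLBACK = {"de": "Sehr geehrte Damen und Herren", "en": "Dear Sir or Madam"}


def salutation(first_name, last_name, language):
    """Return a local-language salutation, or a neutral one if unsure."""
    gender = GENDER_BY_FIRST_NAME.get(first_name.strip().lower())
    prefix = SALUTATIONS.get((language, gender))
    if prefix is None:
        return FALLBACK.get(language, "")
    # Simple capitalization; surnames with particles or internal
    # capitals would need their own rules.
    return f"{prefix} {last_name.strip().title()}"
```

Falling back to a neutral greeting is safer than guessing, since the same first name can be male in one country and female in another.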
I am truly proud of this build. We now offer our customer the ability to see and use all the data found throughout their organization, and we are talking about data from all over the world – Europe, Asia, North & South America, Africa, and the Middle East.


Friday, September 7, 2007

Successful Mail Order in Europe

If you have a successful mail order product in one country, then chances are that you will be able to sell that product in many European countries.

What makes Europe interesting for mail order
Well for starters the European community consists of 27 countries with a combined population of 492.9 million potential consumers. Direct mail and direct marketing techniques are sophisticated and well entrenched in many countries.

How to start a mail order campaign for Europe
The nature of direct marketing is to test and roll. In the case of embarking on a European strategy it is always best to focus on one test market and make things work there. Then, once a model is working in one country, look to roll out to additional markets.

If you test your product in one market, it keeps the complexity of the test more manageable. As an example, marketing materials, packaging, manuals, instructions and other materials need to be translated and localized for each country and language version. The more complex the test, the more room for errors. By testing one market you are also able to focus the budget and go for the best of the best.

When deciding on the first country to test, I always recommend either Germany or the United Kingdom. If you are coming from an English speaking country then the UK sounds like the natural first shot. My preference is to go for Germany and immediately become accustomed to working in different languages. Think of it as part of the test. If you can do one foreign language, then the next will be easy.

The big 3 mail order markets: Germany, France & the United Kingdom
After the test, the first consideration is which countries to roll out to. The big 3 countries of Germany, France and the United Kingdom are a natural progression for 2 key reasons: 1.) they cover a substantial part of European consumers and 2.) they have well developed mail order infrastructure, and consumers there regularly purchase via the mail.

So basically, the big 3 give you the most bang for your start up money with the least amount of language versions.
More details to follow in future stories as time permits...


Selecting suppliers for international marketing data

Introduction
When selecting a data supplier, you should consider several key factors before judging the data suitable for rental, lease or purchase.

The 6 key factors to consider are:

Source
Compilation method
Updating frequency/guarantees
Selection criteria available
Usage
Local/Regional variations

Source
Where the data is originally sourced from is extremely important and often reflects the cost and quality of the data. This could range from Yellow Pages/Chamber of Commerce data, which is cheap but of uncertain quality, to publication subscriber data, which is expensive but good quality because the subscribers pay for and receive their publication every month or week. In our experience, there are always cheap sources of data available but this is the same as other commodities - cheap often equals bad quality. Keeping a database clean, accurate, well maintained and correctly formatted requires investment by the data owner, and this is reflected in the charges they make for 3rd party use.

Compilation method
How a database is compiled is also a key consideration and its influence in deciding on a supplier often relates directly to the client’s intended use. For example, a database that has been compiled from telephone interviews would make an ideal source for telemarketing campaigns because the quality of the telephone numbers will be high and the individuals have already shown a propensity for receiving unsolicited calls. Likewise, a newsletter subscriber list would be a good source for mailing campaigns.


Updating frequency/guarantees
For data to be effective, it must be up to date. This is largely dictated by how committed the list supplier is to keeping their data ‘clean’. An initial indicator of this is whether the supplier is a well established commercial data company or whether they are using the data themselves in their regular line of business. A specialist data supplier who can prove they have invested resources in continually updating their database, or a publisher who must keep its magazine circulations accurate in order to satisfy its advertisers, can offer high guarantees of accuracy within the industry accepted tolerances. A directory publisher, on the other hand, may take a year to compile their data and 6 months to publish, meaning the data is already 18 months old when the directory is published. Buying a CD of this data may be a cheap and quick way of getting hold of data but the quality may be poor.

Selection criteria
Being able to accurately target potential customers is one of the great advantages of direct marketing, but different suppliers offer varying degrees of selectability. The choice of a supplier is largely dependent on how close they can get to the target audience, combined with the level of accuracy they guarantee. In most generic databases selections are available to help with targeting but these are often based on classifications unique to the supplier: non-standard industry classifications rather than SICs, employee size bands rather than revenue, broad job functions rather than specific job titles. Other suppliers may not offer wide selection criteria but the profile of their database (product purchaser or magazine subscriber) could also offer inherently good targeting.

Usage
Some commercially available databases have restrictions on how their data can be used or the purpose for which the data is used. How the data can be used is largely governed by the terms of the purchase: one-time use, lease for 12 months or, in rare cases, outright sale. Certain suppliers only offer one-time use of all their data; others will allow a 12 month lease of postal and telephone data but only one-time use of their email addresses. The purpose for which the client is intending to use the data is also a factor to consider when recommending a supplier. Some suppliers allow their email addresses to be used but not their telephone numbers, so matching the client’s intended use of the data with the supplier’s ability to provide that use is a key objective in selecting appropriate suppliers.

Local/Regional variations
Another variable to take into account is the relative comparison between data from different countries. It is clear that some markets, in direct marketing terms, are more developed than others. The more mature a market, the more data is available, the wider the choice and the greater the competition, and quality therefore rises as data suppliers have to improve their data to gain a competitive advantage. It is clearly not fair to compare data between the EU and Latin America, but our guide in less developed areas is to use local Direct Marketing Associations as a benchmark and look for suppliers who follow their association’s best practice guidelines in running their data businesses. This shows a commitment by the data owner to try and deliver the best quality possible within the limitations of their own market.
