Whether You’re Networked or Notebooked: Getting the Most out of Your PC Analysis
John Miglautsch received his Master of Arts from the University of Wisconsin-Madison in 1985, and his Bachelor of Arts with Distinction from UW-Madison in 1976. He was a Senior Fellow of the North American Society for Strategic Marketing and the founder and Executive Director of the International Society for Strategic Marketing. John is the President of the Board of Directors of the Wisconsin Direct Marketing Association. He speaks at marketing conferences across the United States and Canada and is a frequent contributor on LinkedIn.
What are you looking for? What do you want? What do you need to do your job? Is it access, data, speed, more data, what? For insights into what you really need and must have to be successful, read this informative article by ISSM founder John Miglautsch.
Access
You know the data is in there but how do you get it out? Data is a by-product of the direct marketing process. If you mail tens of millions of catalogs, you should have more than a few customers and orders. However, gaining marketing advantage with that data can be more than a little challenging. In direct marketing companies worldwide, MIS directors are giving marketing departments varying levels of access to internal data.
This discussion will be about gaining value from the data you have. It is not about computer politics. We will assume that whether you have query tools against the main operations system or a specialized marketing database, you do have access to data. If you do not yet have access, don’t despair… hopefully, this conversation will help you ask for precisely what you need.
The first principle is that marketing focuses on the future. This means that data is helpful only if it can help in making decisions. The second principle follows: marketing professionals fundamentally don’t know what they are going to want to know from a marketing system. This causes interesting problems when MIS professionals, who are used to building procedural systems, try to specify the necessary elements to build a decision support system.
To overcome this problem, marketing often asks for twice as many things as they think they will need. MIS counters by delivering half of the things marketing asks for… but not usually the right half. Into this struggle steps the database marketing consultant. Marketing begins by explaining that they want access, they simply want to look into the data that they know is in there. In the initial attempts, we tried to use access tools to satisfy this request.
A private cloud may be built. Extremely slick “point & click” query tools are offered. They allow unskilled users to build multiple joins, pull complex selects, send output to Excel, etc. One would think that marketing would be satisfied, but no! (This may tell you something about the game we’re in.) There is a broad category of query tool designed to access a range of databases: the user need only learn one software package to look at Oracle, Access, Hadoop, and others. To deliver this flexibility, power is often sacrificed.
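As a concrete illustration (not any particular vendor’s tool), here is a minimal Python sketch of what such a query tool does behind its point & click screens: one join across two tables, with the output written to a .csv that Excel will open. The database, table, and field names are illustrative.

```python
# A minimal sketch of a generic query tool's core job: join tables,
# run a select, and hand the result to Excel as a .csv file.
import csv
import sqlite3

conn = sqlite3.connect("marketing.db")  # stands in here for Oracle, Access, Hadoop, etc.
cursor = conn.execute(
    """
    SELECT c.customer_no, c.name, o.order_no, o.order_date, o.key_code
    FROM customer AS c
    JOIN order_header AS o ON o.customer_no = c.customer_no
    WHERE o.order_date >= '2024-01-01'
    """
)
with open("orders_for_excel.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)                                  # data rows
```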
The initial marketing reaction to the access tool is one of amazement and enthusiasm. Soon, however, the “we’ll take whatever we can get” attitude changes to “it has to do this and that or it is really no good.” As the demands are refined, it becomes increasingly difficult to repeatedly execute specific procedures within the open-ended query environment. Surprisingly, there is a need for something like programs. The situation is further complicated by the very nature of the data. Few realize all the information lurking behind the scenes when a routine report is generated.
Product item numbers are combined into product types, types into groups, groups into sub-classes, sub-classes into classes, and classes into divisions. Somewhere, all that structure needs to be entered and maintained. Data pulled directly from the ERP typically lacks this additional organizational information (at least in up-to-date form). Users fill their PCs with ample files of part numbers but realize, in frustration, that they don’t have all the pieces. Often, reports taken for granted as part of the system are really generated from off-line PC tables, which means the data is not even available to the rest of the organization.
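A short sketch of that problem, assuming illustrative file and column names: the ERP extract carries only the item number, so the rollup has to be joined in from a separately maintained hierarchy table, and any item missing from that table is one of the “missing pieces.”

```python
# Join the off-line product hierarchy onto item-level order data and
# flag the items that have no hierarchy entry.
import pandas as pd

detail = pd.read_csv("order_detail.csv")          # item_no, units, price, ...
hierarchy = pd.read_csv("product_hierarchy.csv")  # item_no, type, group, class, division

merged = detail.merge(hierarchy, on="item_no", how="left", indicator=True)

# Items missing from the hierarchy table are exactly the pieces users
# discover, in frustration, that they don't have.
orphans = merged[merged["_merge"] == "left_only"]["item_no"].unique()
print(f"{len(orphans)} item numbers have no hierarchy entry")
```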
The bottom line is that marketing is really not interested in access (though that is precisely what they are asking for). They do not want to take the time to understand the nuances of the file structures, nor do they want to become adept at writing the innumerable macros necessary to proceduralize the data pulls into a form ready for further analysis. What marketing really wants is the ability to ask a simple question and get a complex answer out of a number of analysis tools, ranging from statistical to graphic to mapping applications. Marketing wants answers, not access.
Answers not Access
As we have discussed, marketing deals with the future; therefore, the primary requirement of a marketing analysis or decision support system is the flexibility to ask (and answer) unanticipated questions. A system designed around a fixed number of variables and menu screens will be obsolete by the time it is created. We have also seen that simply retrieving data is not the final solution: marketing needs the data transformed and transported into reporting applications. It is this processing which yields the answers needed for decision support. The transformation system can be broken into three key components: the database, the questions, and the applications. It is crucial to understand that both ends of the system, the database and the applications, are relatively constant. Only the questions change on a moment-to-moment basis.
The database is founded on what is available from your data sources, starting with a customer master file. This contains customer number, name, address, original source, some sales history, and sometimes geo-demographic overlay data. Transactions are often split into order header and detail files. The header contains fields like order number, date, key code, territory, etc. The detail contains item number, description, price, cost, units, shipping charges, etc. Sometimes these two transaction files are combined. In addition, inventory master files can provide the product organization information linking part number with product types, groups, classes, and divisions; for catalog companies, they may also include advertising space information. Finally, there is usually a promotion file which contains key code information like promotion date, cost, and description. There may also be important key organizational data allowing the grouping of keys into advertising type, campaign type, and offer type.
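As a sketch, the five files might look like the following relational tables. The field lists follow the description above; the names and types themselves are illustrative.

```python
# Illustrative schema for the five building-block files.
import sqlite3

conn = sqlite3.connect("marketing.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS customer (
    customer_no   INTEGER PRIMARY KEY,
    name          TEXT, address TEXT, original_source TEXT,
    sales_to_date REAL   -- plus optional geo-demographic overlay fields
);
CREATE TABLE IF NOT EXISTS order_header (
    order_no    INTEGER PRIMARY KEY,
    customer_no INTEGER REFERENCES customer(customer_no),
    order_date  TEXT, key_code TEXT, territory TEXT
);
CREATE TABLE IF NOT EXISTS order_detail (
    order_no    INTEGER REFERENCES order_header(order_no),
    item_no     TEXT, description TEXT,
    price REAL, cost REAL, units INTEGER, shipping REAL
);
CREATE TABLE IF NOT EXISTS inventory (
    item_no  TEXT PRIMARY KEY,
    type TEXT, grp TEXT, class TEXT, division TEXT,  -- "grp": GROUP is reserved in SQL
    ad_space REAL                                    -- catalog advertising space
);
CREATE TABLE IF NOT EXISTS promotion (
    key_code   TEXT PRIMARY KEY,
    promo_date TEXT, cost REAL, description TEXT,
    ad_type TEXT, campaign_type TEXT, offer_type TEXT
);
""")
```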
These five files provide the building blocks, and since data doesn’t appear out of thin air, the database remains constant in structure until additional types of information (new fields) are added, growing only as more customers and orders arrive. The applications, on the other hand, require specialized types of data. Almost any tool will accept .csv files, and some directly access .xls files. Mapping tools require ZIP codes to ride along; some statistics programs work better with continuous variables, others need categorical values. Most tools require extensive definition files which MUST match the data coming in. These definition files can run well over 100 pages of carefully structured information (and one stray character can blow the whole system away). It is precisely this complexity which makes it difficult to get answers out of the database. Further, as understanding of the application tools increases, more of the transformation work can be done on the main database. Sales can be broken into sales by product category; sales by product category can be grouped into categorical scores; these can then be selected directly and thrown into CHAID for analysis. It is far faster and easier to build these extra variables into the database than to calculate them every time they are used.
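A sketch of that idea, with illustrative file and column names and arbitrary bin thresholds: roll sales up by product category once, bin each category’s spend into a categorical score, and store the result so it can be selected directly thereafter.

```python
# Build derived variables once: sales by category, then categorical scores
# ready for a tool like CHAID.
import pandas as pd

lines = pd.read_csv("order_lines.csv")  # customer_no, category, extended_price

# Sales by product category, one row per customer.
by_cat = lines.pivot_table(index="customer_no", columns="category",
                           values="extended_price", aggfunc="sum", fill_value=0)

# Turn each continuous spend column into a categorical score.
scores = by_cat.apply(
    lambda col: pd.cut(col, bins=[-1, 0, 50, 250, float("inf")],
                       labels=["none", "low", "med", "high"]))
scores.to_csv("category_scores.csv")  # stored once, selected directly thereafter
```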
The problem in database design is a chicken-and-egg situation: until you build the database, you won’t learn the tools, and until you learn the tools, you won’t be able to design the database. (Pause. At this point you may want to consider getting outside help from someone who has done some of both; it will cut out months of trial and error.) As consultants, we have learned to build these systems by beginning with a sample of the data. This allows us to relatively quickly hand-craft the various files into the structure mentioned above.
No two systems are the same, but there are common threads. It is crucial to understand that there are only a handful of predictive variables within the database; once these are identified and/or constructed, all further analysis will use the same ones repeatedly. Each field is reviewed. Often there are wonderful field names in the files, like “item category” or “customer type”, but when the values in these fields are listed out, perhaps 80% are “99”, meaning miscellaneous. Sometimes there will be a wonderful distribution of values, but they will not match an equally wonderful set in another file. So the orders include a wide range of key codes, and there are thousands of key codes in the promotion history file… but only 50% of the order keys match the promotion file (because the system does not check that orders are entered with a valid key code).
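In outline, this audit might look like the following; the files, fields, and thresholds are illustrative.

```python
# Audit sketch: value distribution of a coded field, and the match rate
# between order key codes and the promotion file.
import pandas as pd

customers = pd.read_csv("customer.csv")      # includes customer_type
orders = pd.read_csv("order_header.csv")     # includes key_code
promos = pd.read_csv("promotion.csv")        # includes key_code

# If 80% of values are "99" (miscellaneous), the wonderful field name
# is hiding a nearly empty field.
print(customers["customer_type"].value_counts(normalize=True).head())

# What fraction of order keys actually match the promotion file?
match_rate = orders["key_code"].isin(promos["key_code"]).mean()
print(f"{match_rate:.0%} of order key codes match the promotion file")
```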
Development tools like SPSS can greatly speed the data audit process. Example analyses are crafted for each application: sale buyers might be compared with full-price buyers; individual item purchasers are compared with non-purchasers. The results might be thrown into mapping to see if the groups live in different neighborhoods, and into CHAID for segmentation analysis. The point is that each application requires the data in its own format. Getting each application working with the data sets is, at this point, an exploratory process with many tries before even preliminary success is achieved.
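One of those example analyses might run roughly as follows; the flag logic, file names, and 50% cutoff are illustrative assumptions, not a prescription.

```python
# Flag sale vs. full-price buyers, then export each downstream tool's format.
import pandas as pd

lines = pd.read_csv("order_lines.csv")  # customer_no, price, list_price, zip

lines["on_sale"] = lines["price"] < lines["list_price"]
buyers = lines.groupby("customer_no").agg(
    sale_share=("on_sale", "mean"), zip=("zip", "first"))
buyers["segment"] = (buyers["sale_share"] > 0.5).map(
    {True: "sale", False: "full_price"})

# Mapping tools require ZIP to ride along; CHAID wants categorical values.
buyers[["zip", "segment"]].to_csv("for_mapping.csv")
buyers[["segment"]].to_csv("for_chaid.csv")
```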
Assuming the initial examples are close to acceptable, new specifications are generated, both for improving the application power and for building a phase II database design. It is at this point, when the database is at about 90% of its final form, that it becomes possible to build the middleware. Middleware begins with development software. We use InterBase because it handles anything, is open source, and lets users talk to the database (like a query tool) while letting developers write programs which keep all the pieces straight. The crucial parts are the scripts (which define which files and fields are available), the answer tables (generated from the scripts), the reports (which process the answer tables into usable .csv data files), and the definition files (which allow the application software to use the data files). In addition, the user interface is all point & click for simplicity.
The ad hoc specification is met by allowing users to apply any criterion (including complex Boolean logic) to any relevant field. Users are not able to change which data is returned from which fields (because this would scramble the pieces necessary to make the system work together).
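Purely as an illustration of the chain, not the author’s InterBase tooling, a sketch might look like this: the script fixes which fields are exposed, the user’s criteria (built through the controlled point & click interface) fill in the WHERE clause, an answer table is materialized, and a report writes it out as a .csv. All names here are hypothetical.

```python
# Middleware chain sketch: script -> answer table -> report -> .csv
import csv
import sqlite3

conn = sqlite3.connect("marketing.db")

# Script: the fields this question may expose (set by developers; users
# cannot change them, which keeps the downstream pieces in sync).
SCRIPT_FIELDS = ["customer_no", "key_code", "order_date"]

def build_answer_table(criterion: str, params: tuple) -> None:
    """Materialize the user's ad hoc question as an answer table.
    The criterion comes from the controlled interface, not free text."""
    conn.execute("DROP TABLE IF EXISTS answer")
    conn.execute(
        f"CREATE TABLE answer AS SELECT {', '.join(SCRIPT_FIELDS)} "
        f"FROM order_header WHERE {criterion}",
        params,
    )

def report_to_csv(path: str) -> None:
    """Report: process the answer table into a usable .csv data file."""
    rows = conn.execute("SELECT * FROM answer")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in rows.description])
        writer.writerows(rows)

# Any criterion, including Boolean logic, over the exposed fields.
build_answer_table("(order_date >= ? OR key_code = ?)", ("2024-01-01", "SPRING24"))
report_to_csv("answer.csv")
```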
Answers are NOT Enough!
So after building your analysis system, you find all sorts of new variables which seem to help your business. You go back to your ERP or cloud… which has the most current records… and try to pull names for your upcoming mailing. Suddenly you realize that the key variables you discovered, which predicted so precisely who would buy, do not even exist on the mainframe.
An acceptable system cannot just perform dazzling analysis; it must allow you to take action, and it must allow you to select on the transaction and other key variables. There are well over a dozen mailing packages which allow you to predetermine counts for mailing, but in general they also require that you predetermine which variables you want counted; this is a far cry from the flexibility described above. There are also “black box” systems which tell you who is best to mail, then pull the names for you. Though these provide analysis capabilities, they seldom give breakthrough insight into who’s buying what or how to expand your business. The full ad hoc analysis must be carefully connected to the list selection capability.
Following the phase II system development, we finalize the database design and re-specify the application requirements. By now, we have a clear vision of what should be included and of the most important predictive variables. This understanding allows implementation of the list select application. Since the same principle applies to this development, that marketing doesn’t know exactly what it will need down the road, care must be exercised to ensure maximum flexibility. However, unlike the analysis system, when pulling millions of names from a system, speed must also be an important consideration.
List selection is a very procedural process. It requires certain elements to be selectable (e.g. RFM scores, ZIP, and/or customer type); these should be handled with buttons wherever possible. There need to be safeguards, like not allowing the inadvertent selection of bad-debt names. Decoy names should be added without user intervention (or perhaps even their knowledge). The system must also check for accidental internal duplication. Finally, it must allow users to split and chop selected segments to enable testing and precise mailing quantities.
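A sketch of those production safeguards, under illustrative file names, field names, and panel sizes:

```python
# List selection safeguards: drop bad-debt names, de-duplicate, seed decoys,
# then split the segment into test and rollout panels.
import pandas as pd

selected = pd.read_csv("selected_names.csv")  # customer_no, name, address, bad_debt
decoys = pd.read_csv("decoy_names.csv")

# Safeguard: never mail bad-debt names, even if the user forgot to exclude them.
selected = selected[selected["bad_debt"] != "Y"]

# Check for accidental internal duplication.
selected = selected.drop_duplicates(subset=["name", "address"])

# Decoys are appended without user intervention.
mail_file = pd.concat([selected, decoys], ignore_index=True)

# Split/chop: a 10,000-name test panel, the remainder to rollout.
test = mail_file.sample(n=min(10_000, len(mail_file)), random_state=1)
rollout = mail_file.drop(test.index)
```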
To allow for the ad hoc needs of marketing, a query-style template can be included within the list selection system. This query option cannot be allowed to become the backbone of the list select system: being able to use any variable, even one buried within millions of transactions, will inevitably take some time. A distinction must be drawn between variables which will be used over and over on a production basis and those which will be needed only occasionally.
The list selection system is the most procedural part of the marketing database. It must combine real programming power with the specialized unanticipated requests which marketing dreams up. For that reason, it is best designed in conjunction with the marketing analysis system.
Conclusion
Early in the database design process, it seemed that the powerful emerging query tools would allow users to get anything they needed from any database (wherever it lurked). However, the open-ended nature of the query process left users confused by the data and frustrated in trying to port data into their favorite application/analysis tools. It became clear that though marketing asked for access, they really wanted answers. Satisfied for a time with attractive analysis reporting, marketing quickly found that unless the analysis was easily translatable into action, the system would never pay for itself.
Because the marketing database system was more flexible, it soon included additional scoring and analysis variables not found on the mainframe. These additions necessitated that list selection be built on the database system. Whether you build your database on your mainframe or your notebook, it must still satisfy these simple requirements: it must be flexible enough to allow you to ask any data-related question. It must be rigid enough that users can’t easily break it. It should port data to a variety of applications. It must support regular production list selection. Put 50,000,000 records in it, put it all together, and you have the makings of a world-class database.