Shopping Basket Analyser

(an ‘Intelligent’ data mining tool)

1. Introduction

The Shopping Basket Analyser is a decision support tool that scans large volumes of shopping basket data and discovers hidden relationships between purchasing patterns and shopping times, store locations and customers. The system can be used for customer, store and product profiling, and for promotion management at a strategic level.

Let us assume that we provide an input dataset containing thousands of shopping basket details with the following data attributes:

    1. Branch Code
    2. Day of Week
    3. Month
    4. Shopping time (time groupings: morning, lunch, afternoon and evening)
    5. Pounds spent on Toiletries
    6. Pounds spent on Frozen Food
    7. Pounds spent on Wines and Spirits
    8. Pounds spent on Fresh Vegetables
    9. Pounds spent on Ready Cooked Meals
    10. Pounds spent in Total
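As a minimal sketch, one basket from the list above could be held in Java (the system's implementation language) as a record. The class, field and enum names, the use of `double` for pounds, and the branch code "B017" are our own illustrative assumptions, not the system's actual data model.

```java
import java.time.DayOfWeek;
import java.time.Month;

/** Hypothetical in-memory form of one shopping basket (names and types are illustrative). */
public class BasketRecordSketch {

    /** The four time groupings named in the text. */
    enum ShoppingTime { MORNING, LUNCH, AFTERNOON, EVENING }

    /** Fields 1 to 10 of the list above, in the same order; spend fields are in pounds. */
    record Basket(
            String branchCode,
            DayOfWeek dayOfWeek,
            Month month,
            ShoppingTime shoppingTime,
            double toiletries,
            double frozenFood,
            double winesAndSpirits,
            double freshVegetables,
            double readyCookedMeals,
            double total) {

        /**
         * Sum of the five itemised categories. The text does not say whether the total
         * includes other goods, so the total is kept as its own field rather than derived.
         */
        double itemisedSpend() {
            return toiletries + frozenFood + winesAndSpirits + freshVegetables + readyCookedMeals;
        }
    }

    public static void main(String[] args) {
        Basket b = new Basket("B017", DayOfWeek.MONDAY, Month.MAY, ShoppingTime.EVENING,
                3.20, 0.0, 12.50, 1.10, 4.00, 35.75);
        System.out.println(b + " itemised=" + b.itemisedSpend());
    }
}
```

Keeping day and month as `java.time` enums rather than strings means that a condition such as "any day apart from Sunday" cannot be broken by a spelling variant in the input data.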

Using the above data, the system automatically categorises each field into its fuzzy or crisp groupings, then puts forward and tests thousands of hypotheses to ‘explain’, for example, the LOW spenders in total (field 10 above) in terms of the other 9 fields. The time spent generating, testing and analysing relationships depends on the size of the dataset. Finally, the system reports a list of the relationships that score highly for validity in the given dataset. A typical relationship could say:

On any day apart from Sunday, in May or June, customers who spend LOW on Fresh Vegetables and Ready Cooked Meals are LOW spenders in total,

[shopping time and store are not significant], [expenditure on Toiletries, Frozen Food, and Wines and Spirits does not matter].

Here, LOW is a fuzzy category expressing a relative measure of spending; it can also be defined externally by a given business criterion. The system does not stop with this report: it continues to search for relationships that explain HIGH spending in total, HIGH spending on Wines and Spirits, and so on.
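The document does not say how the fuzzy groupings are derived. One simple way to make LOW a relative measure is sketched below, under our own assumption that three overlapping sets (LOW, MEDIUM, HIGH) are anchored at the field's observed minimum, median and maximum. The `businessLow` helper is a hypothetical crisp alternative in which LOW is fixed by an external threshold in pounds.

```java
import java.util.Arrays;

/**
 * Sketch: turning one spend field into overlapping LOW / MEDIUM / HIGH fuzzy sets.
 * Anchoring the sets at the minimum, median and maximum is an assumption.
 */
public class FuzzyPartitionSketch {

    enum Level { LOW, MEDIUM, HIGH }

    record Partition(double min, double median, double max) {

        /** Builds the partition from observed values (at least one), so LOW is relative to this data. */
        static Partition fromData(double[] values) {
            double[] v = values.clone();
            Arrays.sort(v);
            int n = v.length;
            double median = (n % 2 == 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
            return new Partition(v[0], median, v[n - 1]);
        }

        /** Degree in [0, 1] to which x belongs to the level; the three degrees sum to 1. */
        double membership(Level level, double x) {
            double c = Math.max(min, Math.min(max, x)); // clamp to the observed range
            // Each division only runs when that side of the median has non-zero width.
            return switch (level) {
                case LOW -> c < median ? (median - c) / (median - min) : 0.0;
                case MEDIUM -> c == median ? 1.0
                        : c < median ? (c - min) / (median - min) : (max - c) / (max - median);
                case HIGH -> c > median ? (c - median) / (max - median) : 0.0;
            };
        }
    }

    /** Hypothetical crisp alternative: LOW defined externally by a business threshold in pounds. */
    static double businessLow(double pounds, double threshold) {
        return pounds < threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        Partition total = Partition.fromData(new double[] {2, 4, 6, 8, 10});
        for (double x : new double[] {3, 6, 9}) {
            System.out.printf("x=%.1f LOW=%.2f MEDIUM=%.2f HIGH=%.2f%n", x,
                    total.membership(Level.LOW, x), total.membership(Level.MEDIUM, x),
                    total.membership(Level.HIGH, x));
        }
    }
}
```

Because the sets overlap, a basket halfway between the minimum and the median counts as 0.5 LOW and 0.5 MEDIUM rather than being forced into one bin, which is what lets a relationship hold to a degree.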

Datasets may contain millions of checkout records obtained from hundreds of shops. More data improves the quality and validity of the relationships, at the cost of slowing down the computation. The system is not limited to the inputs presented above and can handle other information, such as weather conditions and customer-specific information from customer loyalty databases. The system is designed for marketing analysts who want to search for hidden relationships in data or to test their own hypotheses.

2. System Description

The system (Figure 1) contains the following modules:

Figure 1. System Diagram

Intelligent Data Miner

The intelligent engine is a hybrid of Fuzzy Logic and Genetic Algorithms. It generates and tests relationships, and reports only significantly valid relationships in the form of fuzzy logic statements.
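The document does not describe the engine's rule encoding or genetic operators, so the following is only a sketch of how a genetic algorithm can search conjunctive fuzzy rules, built on assumptions of our own: each gene picks LOW, MEDIUM or HIGH for one field or marks it "don't care"; conditions are combined with the minimum as the fuzzy AND; fitness is fuzzy confidence multiplied by fuzzy support; and the loop uses binary tournament selection, uniform crossover and per-gene mutation. Membership degrees are assumed to be precomputed, and the class name `RuleGaSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch: a genetic algorithm searching conjunctive fuzzy rules (all design choices assumed). */
public class RuleGaSketch {

    static final int DONT_CARE = -1, LEVELS = 3; // gene values: -1 = field ignored, 0..2 = LOW/MEDIUM/HIGH

    private final double[][][] mu;   // mu[record][field][level]: precomputed membership degrees
    private final double[] target;   // degree to which each record satisfies the consequent
    private final Random rnd;

    RuleGaSketch(double[][][] mu, double[] target, long seed) {
        this.mu = mu;
        this.target = target;
        this.rnd = new Random(seed);
    }

    /** Fitness = fuzzy confidence x fuzzy support, with min as the fuzzy AND. */
    double fitness(int[] genes) {
        double both = 0, antecedent = 0;
        for (int r = 0; r < mu.length; r++) {
            double a = 1.0;
            for (int f = 0; f < genes.length; f++) {
                if (genes[f] != DONT_CARE) a = Math.min(a, mu[r][f][genes[f]]);
            }
            both += Math.min(a, target[r]);
            antecedent += a;
        }
        return antecedent == 0 ? 0.0 : (both / antecedent) * (both / mu.length);
    }

    private int randomGene() {
        return rnd.nextInt(LEVELS + 1) - 1; // uniform over -1, 0, 1, 2
    }

    /** Binary tournament: the fitter of two randomly chosen rules. */
    private int[] tournament(int[][] pop, double[] fit) {
        int a = rnd.nextInt(pop.length), b = rnd.nextInt(pop.length);
        return fit[a] >= fit[b] ? pop[a] : pop[b];
    }

    /** Evolves the population and returns the fittest rule evaluated in any generation. */
    int[] evolve(int popSize, int generations, double mutationRate) {
        int fields = mu[0].length;
        int[][] pop = new int[popSize][fields];
        for (int[] rule : pop) {
            for (int f = 0; f < fields; f++) rule[f] = randomGene();
        }
        int[] best = null;
        double bestFit = -1;
        for (int gen = 0; ; gen++) {
            double[] fit = new double[popSize];
            for (int i = 0; i < popSize; i++) {
                fit[i] = fitness(pop[i]);
                if (fit[i] > bestFit) {
                    bestFit = fit[i];
                    best = pop[i].clone();
                }
            }
            if (gen == generations) return best;
            int[][] next = new int[popSize][fields];
            for (int[] child : next) {
                int[] p1 = tournament(pop, fit), p2 = tournament(pop, fit);
                for (int f = 0; f < fields; f++) {
                    child[f] = rnd.nextBoolean() ? p1[f] : p2[f];                // uniform crossover
                    if (rnd.nextDouble() < mutationRate) child[f] = randomGene(); // mutation
                }
            }
            pop = next;
        }
    }

    public static void main(String[] args) {
        // Toy data: field 0 is LOW exactly when the record is in the target set; field 1 is noise.
        double[][][] mu = {
            {{1, 0, 0}, {1, 0, 0}},
            {{1, 0, 0}, {0, 0, 1}},
            {{0, 0, 1}, {1, 0, 0}},
            {{0, 0, 1}, {0, 0, 1}},
        };
        double[] target = {1, 1, 0, 0};
        RuleGaSketch ga = new RuleGaSketch(mu, target, 42L);
        int[] best = ga.evolve(20, 30, 0.1);
        System.out.println(Arrays.toString(best) + " fitness=" + ga.fitness(best));
    }
}
```

Multiplying confidence by support is one way to stop the search from favouring rules that are perfectly valid for only a handful of baskets; in the toy data, "field 0 is LOW" (fitness 0.5) beats both the rule with no conditions (0.25) and the narrower rule that also fixes the noise field (0.25).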

Graphical User Interface

The GUI runs in any Java-enabled internet browser and allows the user to select data fields and to start and monitor the data mining process.

Data Warehouse

A fast-access data warehouse platform which holds large amounts of shopping basket, store and customer data.

3. Software Specification

Shopping Basket Analyser is written in Java and is hardware-platform independent. It runs within an internet browser on any PC or Unix workstation. The system needs to be integrated with existing databases or data warehouses and, if required, it could run as a background process that continuously reports relationships. For the system to be implemented, the business user has to provide the following:

4. Performance

The main strength of the system over classic decision support tools is that it automatically generates conjunctive relationships across a wide variety of data fields. Significant relationships can then be used to support or put forward marketing strategies. If supplied with appropriate data, it could, for example, link the location of a product with customer type and store characteristics. The system can also be used to test complex marketing hypotheses involving a wide variety of data fields.
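Testing an analyst's hypothesis amounts to scoring one fixed rule instead of searching for rules. The sketch below scores the example relationship from Section 1 by fuzzy confidence, that is, the degree to which baskets meeting the conditions also meet the conclusion. The validity measure actually used by the system is not stated in this document; the `low` breakpoints in pounds, the class name and the toy baskets are all invented for illustration.

```java
import java.time.DayOfWeek;
import java.time.Month;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Sketch: scoring one analyst-supplied fuzzy hypothesis (breakpoints and data are invented). */
public class HypothesisTestSketch {

    record Basket(DayOfWeek day, Month month, double freshVegetables, double readyMeals, double total) {}

    /** Shoulder-shaped LOW set: 1 up to `full` pounds, falling linearly to 0 at `zero` pounds. */
    static double low(double pounds, double full, double zero) {
        if (pounds <= full) return 1.0;
        if (pounds >= zero) return 0.0;
        return (zero - pounds) / (zero - full);
    }

    /** Fuzzy confidence of "all conditions imply conclusion", using min as the fuzzy AND. */
    static double confidence(List<Basket> data, List<ToDoubleFunction<Basket>> conditions,
                             ToDoubleFunction<Basket> conclusion) {
        double both = 0, antecedent = 0;
        for (Basket b : data) {
            double a = 1.0;
            for (ToDoubleFunction<Basket> c : conditions) a = Math.min(a, c.applyAsDouble(b));
            both += Math.min(a, conclusion.applyAsDouble(b));
            antecedent += a;
        }
        return antecedent == 0 ? 0.0 : both / antecedent;
    }

    /** The example relationship from Section 1, with hypothetical pound breakpoints. */
    static double scoreExample(List<Basket> data) {
        List<ToDoubleFunction<Basket>> conditions = List.of(
                b -> b.day() == DayOfWeek.SUNDAY ? 0.0 : 1.0,                        // any day but Sunday
                b -> b.month() == Month.MAY || b.month() == Month.JUNE ? 1.0 : 0.0, // May or June
                b -> low(b.freshVegetables(), 2.0, 5.0),                              // LOW on Fresh Vegetables
                b -> low(b.readyMeals(), 3.0, 8.0));                                  // LOW on Ready Cooked Meals
        ToDoubleFunction<Basket> lowTotal = b -> low(b.total(), 20.0, 40.0);         // LOW spender in total
        return confidence(data, conditions, lowTotal);
    }

    public static void main(String[] args) {
        List<Basket> data = List.of(
                new Basket(DayOfWeek.MONDAY, Month.MAY, 1.0, 2.0, 15.0),
                new Basket(DayOfWeek.TUESDAY, Month.JUNE, 3.5, 2.0, 35.0),
                new Basket(DayOfWeek.SUNDAY, Month.MAY, 1.0, 1.0, 60.0));
        System.out.println("confidence = " + scoreExample(data));
    }
}
```

The day and month conditions are crisp (0 or 1) while the spending conditions are fuzzy, mirroring the mix of crisp and fuzzy groupings described in Section 1. Fields the hypothesis leaves out, such as store or shopping time, simply contribute no condition.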

Because the system uses a computationally intensive algorithm, the computing power it needs grows with the size of the application. For small datasets a Pentium PC is sufficient; large applications may need to run on a network of computers, particularly if there are tight time constraints.
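Scoring candidate relationships parallelises naturally, because each rule's score depends only on that rule and the data. As a single-machine stand-in for spreading the work across a network, the sketch below scores a list of candidates with a Java parallel stream; the `thresholdShare` score and the synthetic totals are placeholders, not the system's method.

```java
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Sketch: independent candidate scores computed across CPU cores with a parallel stream. */
public class ParallelScoringSketch {

    /** Scores each candidate independently; output order matches input order. */
    static <T> double[] scoreAll(List<T> candidates, ToDoubleFunction<T> score, boolean parallel) {
        return (parallel ? candidates.parallelStream() : candidates.stream())
                .mapToDouble(score)
                .toArray();
    }

    /** Placeholder score: share of baskets whose total falls below the threshold. */
    static double thresholdShare(double[] totals, double threshold) {
        int below = 0;
        for (double t : totals) if (t < threshold) below++;
        return (double) below / totals.length;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        double[] totals = new double[1_000_000];             // synthetic basket totals in pounds
        for (int i = 0; i < totals.length; i++) totals[i] = rnd.nextDouble() * 100;
        List<Double> thresholds = IntStream.rangeClosed(1, 200)
                .mapToObj(i -> i * 0.5)
                .collect(Collectors.toList());
        long start = System.nanoTime();
        double[] scores = scoreAll(thresholds, th -> thresholdShare(totals, th), true);
        System.out.printf("scored %d candidates in %d ms%n",
                scores.length, (System.nanoTime() - start) / 1_000_000);
    }
}
```

Across separate machines the same split applies, with each machine scoring its own slice of the candidates; the communication layer for that is outside the scope of this sketch.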

In its current form, the system is an interactive decision support tool: marketing analysts select the inputs manually and assess the results. We are currently working to fully automate the system so that it runs continuously as a background process (an ‘intelligent agent’), reporting only the significant relationships to a number of users.

5. Implementation Scenarios

We envisage the following possible options:

5.1 - Computational Service

This option allows both parties to work in their own areas of expertise: the client provides input datasets, and we generate relationships and report them to the client at agreed intervals. After a short test period, this could become a longer-term service agreement.

5.2 - Licensing the Software

The system can be fully or partly licensed by the client and used at the client’s premises. This option requires a period of customisation, familiarisation and training in the use of the software. There are three possibilities:

5.2.1 - Full System

This option covers the full implementation and delivery of the following modules:

a - Intelligent Data Miner

b - Graphical User Interface

c - Integration with databases

5.2.2 - Intelligent Data Miner and Graphical User Interface

This option includes the Intelligent Data Miner and Graphical User Interface as described above, without integration with databases. The system will work as demonstrated: on a distributed network, operating on ASCII files across the internet or an intranet.

5.2.3 - Intelligent Data Miner only

The code for the Intelligent Data Miner is available in Java, C or C++. It reads ASCII flat files and writes its outputs to similar files. It needs to be tuned to the specific requirements of the retailer. If the client wishes, we can help link the Intelligent Data Miner to existing databases.
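The flat-file layout is not specified here. Assuming one comma-separated basket per line with the ten columns of Section 1 in order, and `#` marking comment lines (all assumptions, as are the class and method names), a minimal reader and results writer could look like this:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: reading baskets from, and writing scored rules to, ASCII flat files (layout assumed). */
public class FlatFileSketch {

    static final int COLUMNS = 10; // the ten attributes listed in Section 1, in that order

    /** One parsed line: fields 1-4 kept as text, fields 5-10 as pounds. */
    record Row(String[] categorical, double[] spend) {}

    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) continue;     // skip blank and comment lines
            String[] f = t.split(",", -1);
            if (f.length != COLUMNS) {
                throw new IllegalArgumentException("expected " + COLUMNS + " fields: " + line);
            }
            String[] cat = new String[4];
            for (int i = 0; i < 4; i++) cat[i] = f[i].trim();
            double[] spend = new double[6];
            for (int i = 0; i < 6; i++) spend[i] = Double.parseDouble(f[4 + i].trim());
            rows.add(new Row(cat, spend));
        }
        return rows;
    }

    static List<Row> read(Path file) throws IOException {
        return parse(Files.readAllLines(file));                 // UTF-8, a superset of ASCII
    }

    /** Writes one "score<TAB>rule" line per rule; assumes rules and scores have equal length. */
    static void writeResults(Path file, List<String> rules, double[] scores) throws IOException {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rules.size(); i++) {
            out.add(String.format(Locale.ROOT, "%.4f\t%s", scores[i], rules.get(i)));
        }
        Files.write(file, out);
    }

    public static void main(String[] args) throws IOException {
        List<Row> rows = parse(List.of("B017,MON,MAY,EVENING,3.20,0,12.50,1.10,4.00,35.75"));
        System.out.println(rows.size() + " basket(s), total GBP " + rows.get(0).spend()[5]);
        Path out = Files.createTempFile("rules", ".txt");
        writeResults(out, List.of("IF Fresh Vegetables LOW THEN Total LOW"), new double[] {0.83});
        System.out.println(Files.readAllLines(out));
    }
}
```

Rejecting lines with the wrong number of fields, rather than silently padding them, keeps malformed checkout records from quietly skewing the relationships that are reported.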