Shopping Basket Analyser
(an ‘Intelligent’ data mining tool)
1. Introduction
The Shopping Basket Analyser is a decision support tool which scans large volumes of shopping basket data, and discovers hidden relationships between purchasing patterns and times, store locations, and customers. The system can be used for customer, store and product profiling and in promotion management at a strategic level.
Let us assume that we provide an input dataset containing thousands of shopping basket details with the following data attributes:
Using the above data, the system automatically categorises each field into its fuzzy/crisp groupings, and then puts forward and tests thousands of hypotheses to ‘explain’, for example, the LOW spenders in total (field 10 above) using the other 9 fields. Depending on the size of the dataset, the system spends some time generating, testing and analysing relationships. Finally, it reports the relationships that score highest for validity in the given dataset. A typical relationship could say:
any day apart from Sunday, in May or June, customers who spend LOW on Fresh Vegetable and Ready Cooked meals are LOW spenders in total,
[shopping time or store is not significant], [also the expenditure on Toiletries, Frozen Food, Wines and Spirits does not matter].
Here, LOW is a fuzzy category that measures spending relative to the rest of the dataset; it can also be defined externally by a given business criterion. The system does not stop with this report: it continues to search for relationships that explain, for example, HIGH spending in total or HIGH spending on wines and spirits.
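A fuzzy category such as LOW can be illustrated with a simple membership function. The following is a minimal sketch only, not the product's actual categorisation code: the class name FuzzySpend, the linear shape of the function, and the use of dataset quantiles as breakpoints are all assumptions made for illustration. A business criterion could supply the two breakpoints directly instead.

```java
import java.util.Arrays;

// Hypothetical sketch: names, shape and breakpoints are illustrative assumptions.
final class FuzzySpend {

    /**
     * Degree (0..1) to which an amount counts as LOW:
     * 1 up to fullLow, falling linearly to 0 at notLow.
     */
    static double low(double amount, double fullLow, double notLow) {
        if (amount <= fullLow) return 1.0;
        if (amount >= notLow) return 0.0;
        return (notLow - amount) / (notLow - fullLow);
    }

    /**
     * Value at fraction q (0..1) of the sorted data (nearest rank).
     * One assumed way of making LOW relative to the dataset.
     */
    static double quantile(double[] values, double q) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.round(q * (sorted.length - 1));
        return sorted[idx];
    }

    public static void main(String[] args) {
        double[] totals = {5, 10, 20, 40, 80};      // made-up basket totals
        double fullLow = quantile(totals, 0.25);    // fully LOW at or below this
        double notLow = quantile(totals, 0.50);     // not LOW at or above this
        System.out.println(low(15, fullLow, notLow));
    }
}
```

With breakpoints taken from the data, the same spend can be LOW in one dataset and not in another, which is what a relative measure implies.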
Datasets may contain millions of entries from checkout records obtained from hundreds of shops. More data improves the quality and validity of the relationships, but at the cost of longer computation times. The system is not limited to the kind of inputs presented above, and can handle other information such as weather conditions and customer-specific information from customer loyalty databases. The system is designed for marketing analysts searching for hidden relationships in data or testing their own hypotheses.
2. System Description
The system (Figure 1) contains the following modules:

Figure 1. System Diagram
Intelligent Data Miner
The intelligent engine is a hybrid of Fuzzy Logic and Genetic Algorithms. It generates and tests relationships, and reports only significantly valid relationships in the form of fuzzy logic statements.
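To make the idea of testing a fuzzy logic statement concrete, the sketch below scores one conjunctive rule against a set of baskets. It is an illustration under stated assumptions, not the engine's actual scoring method, which this document does not specify: the class name FuzzyRuleScore, the use of the minimum as fuzzy AND, and the confidence-style score are all hypothetical choices. The genetic search that proposes candidate rules is not shown.

```java
// Hypothetical sketch: the real engine's validity measure is not described here.
final class FuzzyRuleScore {

    /** Fuzzy AND of several membership degrees, taken as their minimum. */
    static double and(double... degrees) {
        double result = 1.0;
        for (double d : degrees) {
            result = Math.min(result, d);
        }
        return result;
    }

    /**
     * Confidence-style score for "IF antecedent THEN consequent".
     * antecedent[i] and consequent[i] are membership degrees for basket i.
     * Returns sum(min(a, c)) / sum(a), or 0 if the antecedent never applies.
     */
    static double confidence(double[] antecedent, double[] consequent) {
        double fired = 0.0;
        double matched = 0.0;
        for (int i = 0; i < antecedent.length; i++) {
            fired += antecedent[i];
            matched += Math.min(antecedent[i], consequent[i]);
        }
        return fired == 0.0 ? 0.0 : matched / fired;
    }

    public static void main(String[] args) {
        // Made-up degrees for three baskets:
        // "vegetables LOW AND ready meals LOW" versus "total LOW".
        double[] antecedent = {and(1.0, 1.0), and(0.5, 0.9), and(0.0, 0.7)};
        double[] consequent = {1.0, 0.0, 1.0};
        System.out.println(confidence(antecedent, consequent));
    }
}
```

A field reported as not significant would simply be left out of the antecedent, so it places no constraint on which baskets the rule covers.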
Graphical User Interface
The GUI runs in any Java-enabled internet browser. It allows the user to select data fields and to start and monitor the data mining process.
Data Warehouse
A fast-access data warehouse platform which holds large amounts of shopping basket, store and customer data.
3. Software Specification
The Shopping Basket Analyser is written in Java and is hardware platform independent. It runs on any PC or Unix workstation within an internet browser. The system needs to be integrated with existing databases/warehouses and, if required, it could run continuously as a background process reporting relationships. For the system to be implemented the business user has to provide the following:
4. Performance
The main strength of the system over classic decision support tools is that it automatically generates conjunctive relationships across a wide variety of data fields. Significant relationships can then be used to support or put forward marketing strategies. If supplied with appropriate data, it could for example link location of a product with customer type and store characteristics. The system can also be used to test complex marketing hypotheses involving a wide variety of data fields.
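One way to picture a conjunctive relationship across many fields is as a list with one entry per field, where each entry either names a required category or says the field does not matter. The sketch below is an assumed encoding for illustration only: the class CandidateRule, the DONT_CARE marker and the mutation step are hypothetical and are not taken from the product. For simplicity it uses crisp category indices rather than fuzzy degrees.

```java
import java.util.Random;

// Hypothetical encoding of one candidate rule; not the product's data structure.
final class CandidateRule {
    static final int DONT_CARE = -1;

    // genes[f] is the required category index of field f, or DONT_CARE.
    final int[] genes;

    CandidateRule(int[] genes) {
        this.genes = genes.clone();
    }

    /** True if the basket satisfies every field the rule constrains. */
    boolean matches(int[] basket) {
        for (int f = 0; f < genes.length; f++) {
            if (genes[f] != DONT_CARE && genes[f] != basket[f]) {
                return false;
            }
        }
        return true;
    }

    /** Copy of this rule with one random field reset to a random category or DONT_CARE. */
    CandidateRule mutate(Random rng, int[] categoriesPerField) {
        int[] copy = genes.clone();
        int f = rng.nextInt(copy.length);
        // Draws from -1 (DONT_CARE) up to categoriesPerField[f] - 1.
        copy[f] = rng.nextInt(categoriesPerField[f] + 1) - 1;
        return new CandidateRule(copy);
    }

    public static void main(String[] args) {
        // Three fields: day type, store, vegetable spend (0 = LOW, 1 = HIGH).
        CandidateRule rule = new CandidateRule(new int[]{0, DONT_CARE, 0});
        System.out.println(rule.matches(new int[]{0, 7, 0}));
        System.out.println(rule.matches(new int[]{1, 7, 0}));
    }
}
```

A search procedure such as a genetic algorithm could repeatedly mutate and recombine rules of this kind, keeping those that score well on the data.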
Because the system uses a computationally intensive algorithm, it requires high performance computers. For small datasets a Pentium PC is sufficient. For large applications the system may need to run on a network of computers, particularly if there are tight time constraints.
At present, the system is an interactive decision support tool that requires marketing analysts to select the inputs manually and to assess the results. We are currently working to fully automate the system so that it runs continuously as a background process (an ‘intelligent agent’), reporting only the significant relationships to a number of users.
5. Implementation Scenarios
We envisage the following possible options:
5.1 - Computational Service
This option allows both parties to work in their area of expertise. The client provides input datasets, we generate relationships, and report to the client at agreed intervals. After a short test period we could have a longer term service agreement.
5.2 - Licensing the Software
The system can be fully or partly licensed by the client and used at the client’s premises. This option requires a customisation/familiarisation and training period on the use of the software. There are three possibilities:
5.2.1 - Full System
This option covers the full implementation and delivery of the following modules:
a - Intelligent Data Miner
b - Graphical User Interface
c - Integration with databases
5.2.2 - Intelligent Data Miner and Graphical User Interface
This option includes the Intelligent Data Miner and Graphical User Interface as described above, without integration with databases. The system will work as demonstrated, from a distributed network operating on ASCII files across the internet/intranet.
5.2.3 - Intelligent Data Miner only
The code for the Intelligent Data Miner is available in Java, C or C++. It reads ASCII flat files and writes the outputs into similar files. It needs to be tuned to the specific requirements of the retailer. If the client wishes, help is provided in order to link the Intelligent Data Miner to existing databases.