|
XML Miner data mining demonstration
|
To give you an idea of how XML Miner performs as a data mining tool in practice, we have assembled several well-known data sets that have been used in research and converted them to XML.
Select any of the following data sets and press "process" to run a live
demonstration on our machine. XML Miner will be run with the source data set shown, and will mine the data according to the specification shown.
The specifications require XML Miner to set aside a randomly selected 10% of the data set. Because the selection is random, a different training set is generated on each run, so the rule sets and performance will vary slightly each time you run the demonstration.
Scroll down to see the results of the processing and an XSL-transformed, English-language version of the generated Metarule source.
Depending on processor load and the data set chosen, processing may take
several seconds.
|
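The random 10% hold-out described above can be sketched in a few lines of Python. This is only an illustration of the idea, not XML Miner's own code: the function name `hold_out_split`, its parameters, and the integer stand-in patterns are all assumptions made for the example.

```python
import random


def hold_out_split(patterns, held_out_fraction=0.10, seed=None):
    """Randomly set aside a fraction of the patterns.

    Returns (training, held_out). The name and signature are hypothetical;
    they are not part of XML Miner.
    """
    rng = random.Random(seed)
    shuffled = list(patterns)
    rng.shuffle(shuffled)
    n_held_out = round(len(shuffled) * held_out_fraction)
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Stand-in for a data set of 768 patterns (integers instead of real records).
patterns = list(range(768))
training, held_out = hold_out_split(patterns)
print(len(training), len(held_out))
```

With no seed, each call shuffles differently, which is why a rule set mined from the training portion can differ from run to run. Passing a fixed `seed` makes the split repeatable.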
| |
Name |
Description |
Patterns |
Source data |
Processing Specification |
|
|
Fisher's Iris Data |
This is the classic classification data set. There are four inputs: the petal and sepal widths and lengths taken from 3 different species of iris.
The predicted value is the species, here referred to as "class".
|
152 |
Iris_data.xml |
Iris_spec.xml |
|
|
Pima Indians Diabetes |
This example looks at medical data recorded from individuals assumed to have a genetic predisposition to diabetes.
XML Miner must predict whether they went on to develop diabetes. |
768 |
pima_indians.xml |
pima_spec.xml |
|
|
Wine Dataset |
In this example, 13 mass spectrogram values are
selected for each of 178 samples of wine, which were produced
from 3 different vine cultivars.
XML Miner must predict the cultivar from the values.
|
178 |
wine.xml |
wine_spec.xml |
|
|
Sine Dataset |
This simple data set demonstrates that XML Miner can be used to predict numerical values. The data set contains angles at half-degree intervals and the corresponding sine of each angle.
|
359 |
sine.xml |
sine_spec.xml |
|
|
Credit Scoring |
In this case, 13 different financial and personal data items, some numeric and some categorical, were collected from 690 individuals applying for loans. XML Miner is asked to predict their subsequent credit performance (+,-).
|
690 |
credit.xml |
credit_spec.xml |
|
|
House Votes 1984 |
This data is taken from the Congressional Quarterly Almanac and contains the voting record of the members of the House of Representatives, simplified to yea and nay by ignoring pairing. Failure to vote, for whatever reason, causes no record to be stored. The predicted value is the party affiliation of the representative; the remaining input columns are key issues voted on in that year.
|
435 |
house_votes_84.xml |
house_votes_spec.xml |
|
|
Books structure mining |
This data set illustrates Structure Mining (see here for details).
It contains the skeletal structure of a range of fiction and non-fiction books, covering their chapter structure, indexes, tables of contents, etc.
Each book is marked up as fiction or non-fiction. In this example XML Miner mines the structure, i.e. the presence of various structural elements in each book as well as their number,
to predict whether the book is fiction or non-fiction.
|
30 |
books.xml |
booksspec.xml |
|
|
Retail - Shopping basket analysis |
This data set illustrates shopping basket analysis, otherwise known as Association Learning.
It contains the contents of 88,000 shopping baskets from a supermarket chain in Belgium.
The rules returned represent items that frequently occur together, and form the basis of the recommendation or cross-selling process used by online retailers such as Amazon.
The contents have been anonymized to integers representing different items drawn from a total range of 16,000 items.
The average number of items in a shopping basket in this data set is 11. For this example, thresholds have been set high so that only a few associations are returned.
Because the data set is very large, processing will take 10-20 seconds.
|
88,000 |
retail.xml |
retailspec.xml |
|
|
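To make the structure mining example in the table more concrete, the sketch below turns each book into a set of counts, one per structural element, which is the kind of input the description refers to ("the presence of various structural elements ... as well as their number"). The markup in `SAMPLE`, the `class` attribute, and the function names are hypothetical; the actual layout of books.xml is not shown on this page, and this is not XML Miner's own procedure.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup, invented for this example only.
SAMPLE = """
<books>
  <book class="fiction">
    <chapter/><chapter/><chapter/>
  </book>
  <book class="non-fiction">
    <contents/><chapter/><chapter/><index/>
  </book>
</books>
"""


def structure_features(book):
    """Count how often each element name occurs below one <book> element."""
    return Counter(el.tag for el in book.iter() if el is not book)


def load_patterns(xml_text):
    """Return one (label, element counts) pair per book."""
    root = ET.fromstring(xml_text.strip())
    return [(book.get("class"), structure_features(book))
            for book in root.findall("book")]


for label, features in load_patterns(SAMPLE):
    print(label, dict(features))
```

A count of zero for an element (for example, no index in the first sample book) records its absence, so presence and number are captured by the same feature.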
|
|
|
|
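The role of the thresholds in the shopping basket example can be illustrated with a generic support and confidence calculation over pairs of items. This is a minimal sketch under stated assumptions: it uses the textbook definitions of support and confidence, handles pairs only, and the function `pair_rules` and the toy baskets are invented for the example. The measures and algorithm XML Miner actually uses are not described on this page.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = fraction of baskets containing both items
    confidence = baskets containing both / baskets containing the antecedent
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Toy baskets; items are anonymized to integers as in the retail data set.
baskets = [[1, 2, 3], [1, 2], [1, 4], [2, 3], [1, 2, 4]]
for lhs, rhs, support, confidence in pair_rules(baskets, 0.5, 0.7):
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Raising either threshold reduces the number of rules returned, which is the effect of the high thresholds mentioned in the description.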