Scientio
 
XML Miner data mining demonstration
To give you an idea of how XML Miner performs as a data mining tool in practice, we have assembled several well-known data sets that have been used in research and converted them to XML. Select any of the following data sets and press "process" to run a live demonstration on our machine. XML Miner will be run on the source data set shown and will mine the data according to the specification shown.
The specifications require XML Miner to set aside a randomly selected 10% of the data set. Because a different training set is therefore generated on every run, the rule sets and performance will vary slightly each time you run the demonstration.
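
The effect of the random 10% hold-out can be sketched in a few lines of Python. This is only an illustration: the function name `holdout_split`, its parameters and the stand-in data are assumptions made for this example, and the sketch does not reproduce how XML Miner itself selects the patterns.

```python
import random


def holdout_split(patterns, holdout_fraction=0.10, seed=None):
    """Randomly set aside a fraction of the patterns.

    Returns (training, held_out). With seed=None every call shuffles
    differently, so the training set changes from run to run.
    """
    rng = random.Random(seed)
    shuffled = list(patterns)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Stand-in for a data set of 768 patterns (hypothetical integer IDs).
patterns = list(range(768))
training, held_out = holdout_split(patterns)
print(len(training), len(held_out))
```

Passing a fixed `seed` makes the split repeatable, which is useful when two runs need to be compared on exactly the same training set.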

Scroll down to see the results of the processing and an XSL-transformed, English-language version of the Metarule source generated.
Depending on processor load and the data set chosen, processing may take several seconds.

Name | Description | Patterns | Source data | Processing specification
Fisher's Iris Data | This is the classic classification data set. There are four inputs: the petal and sepal widths and lengths, taken from 3 different species of iris. The predicted value is the species, here referred to as "class". | 152 | Iris_data.xml | Iris_spec.xml
Pima Indians Diabetes | This example looks at medical data recorded from individuals assumed to have a genetic predisposition to diabetes. XML Miner must predict whether they went on to develop diabetes. | 768 | pima_indians.xml | pima_spec.xml
Wine Dataset | In this example 13 mass spectrogram values were recorded for each of 178 samples of wine, which were produced from 3 different vine cultivars. XML Miner must predict the cultivar from the values. | 178 | wine.xml | wine_spec.xml
Sine Dataset | This simple data set demonstrates that XML Miner may be used to predict numerical values. The data set contains angles at half-degree intervals and the corresponding sine of each angle. | 359 | sine.xml | sine_spec.xml
Credit Scoring | In this case 13 different financial and personal data items, some numeric and some categorical, were collected from 690 individuals applying for loans. XML Miner is asked to predict their subsequent credit performance (+, -). | 690 | credit.xml | credit_spec.xml
House Votes 1984 | This data is taken from the Congressional Quarterly Almanac and contains the voting record of the members of the House of Representatives, simplified to yea and nay by ignoring pairing. Failure to vote, for whatever reason, causes no record to be stored. The predicted value is the party affiliation of the representative; the remaining input columns are key issues voted on in that year. | 435 | house_votes_84.xml | house_votes_spec.xml
Books structure mining | This data set illustrates Structure Mining (see here for details). It contains the skeletal structure of a range of fiction and non-fiction books, capturing their chapter structure, indexes, tables of contents, etc. Each book is marked up as fiction or non-fiction. In this example XML Miner mines the structure, i.e. the presence and number of the various structural elements in each book, to predict whether the book is fiction or non-fiction. | 30 | books.xml | booksspec.xml
Retail - Shopping basket analysis | This data set illustrates shopping basket analysis, otherwise known as Association Learning. It contains the contents of 88,000 shopping baskets from a supermarket chain in Belgium. The rules returned represent items that frequently occur together, and form the basis of the recommendation or cross-selling process used by online retailers such as Amazon. The contents have been anonymized to integers representing different items drawn from a total range of 16,000 items. The average number of items in a shopping basket in this data set is 11. For this example the thresholds have been set high so that only a few associations are returned. Because the data set is very large, processing will take 10-20 seconds. | 88,000 | retail.xml | retailspec.xml
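
As a rough illustration of how thresholds limit the associations returned in shopping basket analysis, the sketch below counts item pairs and keeps only rules that meet a minimum support and confidence. It is a generic textbook-style example, not XML Miner's algorithm: the function `pair_rules`, the threshold values and the four tiny baskets are all assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of baskets containing both items
    confidence = share of baskets with the antecedent that also
                 contain the consequent
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Items are anonymized to integers, as in the retail data set (toy values).
baskets = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4]]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Raising `min_support` or `min_confidence` shrinks the list of rules, which is the same effect the high thresholds have in the retail example.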