XMLMiner is a web service and class library for mining data and text expressed in XML. It extracts knowledge and re-uses that knowledge in the form of fuzzy logic expert system rules.
XmlMiner is a completely new development in data mining. Although it can perform the same kind of processing as other data mining systems, it is the first and only product that can also mine semi-structured data sources such as XML.
XMLMiner can also be used as a full-featured, low-cost, online business rules system.
Features
- Use it to predict numeric values, categorize and classify data, infer the relevance and topics of text, and mine the structure of XML documents.
- XML data is everywhere and can be easily generated from any data source, but it can be unstructured and sparse. XmlMiner is the first data mining tool to mine any data that can be expressed in XML.
- XmlMiner is configured via XML, reads XML, and creates results in XML using our Metarule
- XmlMiner performs two kinds of learning: supervised learning, which maps numeric, categorical, structural or textual values to a given numeric or categorical output, and association learning, which searches a data set for all useful relationships between data or structural values.
- You can convert Metarule to easily understood English-language if...then rules using an XSL transform we supply, so you can see what has been discovered.
- You can apply Metarule rules to new data, either supplied directly or embedded in an XML document, and have the results available for use in your programs or embedded into a copy of the source XML.
- XmlMiner is standards-based and compatible with other standards-based tools.
- XmlMiner comes with development tools integrated into the web service.
- XmlMiner integrates text mining seamlessly so that blocks of embedded text can be handled at the same time as numeric and categorical data.
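As a rough illustration of embedding results into a copy of the source XML, the sketch below uses only Python's standard library. It does not use XmlMiner or Metarule: the `orders` document, the `prediction` element, and `hypothetical_rule` are all assumptions invented for this example, and the rule is a hand-written stand-in for one that would normally be induced from data.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source document; element names are invented for illustration.
SOURCE = """<orders>
  <order id="1"><total>250</total></order>
  <order id="2"><total>40</total></order>
</orders>"""


def hypothetical_rule(total):
    """Stand-in for an induced rule: returns (category, confidence)."""
    return ("large", 0.9) if total >= 100 else ("small", 0.8)


def annotate(root):
    """Return a copy of the tree with a <prediction> child added to each order."""
    annotated = copy.deepcopy(root)  # leave the source document untouched
    for order in annotated.iter("order"):
        total = float(order.findtext("total"))
        category, confidence = hypothetical_rule(total)
        result = ET.SubElement(order, "prediction")
        result.set("confidence", str(confidence))
        result.text = category
    return annotated


source_root = ET.fromstring(SOURCE)
annotated_root = annotate(source_root)
print(ET.tostring(annotated_root, encoding="unicode"))
```

The source tree is copied before annotation, so a program can keep the original document and the annotated copy side by side.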
Handling uncertainty
- The user can specify a percentage of the training data to set aside as a test set.
- The fuzzy logic rule induction process minimizes and optimizes the rule set created and annotates each rule with a degree of certainty, scaled 0-1, based on the support for the rule in the source data.
- The rule set is tested using the runtime processor fuzzy logic inference engine on both the training data and any test data set aside.
- XML Miner reports performance for both sets of data, as RMS error for numeric variables or percentage correct for categorical variables.
- XML Miner also reports the percentage of training or test patterns that did not generate a valid output with a usable confidence level.
- The runtime processor supports Fuzzy Tri-state logic. Internal logical values can be 0-1 for fuzzy degrees of truth, or -1, representing an unknown state.
- The runtime processor's logical and numeric processing handles and correctly propagates unknown data states through the rules. A missing data value on one of the inputs does not prevent processing if the output can be inferred from other sources.
- For each output a confidence value is generated. If the combination of inputs presented did not fire any rules then an unknown state is signaled.
- Each rule can be annotated with a certainty factor in the range 0-1 which is used in the rule aggregation process.
- All inputs to the runtime processor can be either crisp values, or values with associated uncertainty.
- Numeric outputs from the runtime processor are supplied both as a crisp central value and as a fuzzy number, complete with a certainty value.
- For categorical outputs, the runtime processor supplies both the most likely category, with associated uncertainty, and an ordered list of alternate categories (if they exist) annotated with confidence figures.
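The tri-state logic and certainty-weighted aggregation described above can be sketched in a few lines of Python. This is not XML Miner's implementation. It assumes min/max fuzzy operators, a Kleene-style treatment of the unknown state (a definite 0 decides an AND and a definite 1 decides an OR, otherwise the unknown propagates), and a firing strength equal to antecedent truth multiplied by the rule's certainty factor. All function names are hypothetical.

```python
UNKNOWN = -1.0  # sentinel for an unknown logical state, as in the text above


def fuzzy_and(a, b):
    """Min-based AND; a definite 0 decides the result even if the other input is unknown."""
    if a == UNKNOWN or b == UNKNOWN:
        known = b if a == UNKNOWN else a
        return 0.0 if known == 0.0 else UNKNOWN
    return min(a, b)


def fuzzy_or(a, b):
    """Max-based OR; a definite 1 decides the result even if the other input is unknown."""
    if a == UNKNOWN or b == UNKNOWN:
        known = b if a == UNKNOWN else a
        return 1.0 if known == 1.0 else UNKNOWN
    return max(a, b)


def fuzzy_not(a):
    """Complement; an unknown input stays unknown."""
    return UNKNOWN if a == UNKNOWN else 1.0 - a


def classify(rules):
    """Aggregate (category, antecedent_truth, certainty) triples.

    Returns (best, alternates), where best is a (category, confidence) pair
    and alternates is the remaining categories ordered by confidence.
    Returns (None, []) when no rule fired, signalling an unknown output.
    """
    scores = {}
    for category, truth, certainty in rules:
        if truth == UNKNOWN or truth <= 0.0:
            continue  # rule did not fire
        strength = truth * certainty  # assumed weighting by certainty factor
        scores[category] = max(scores.get(category, 0.0), strength)
    if not scores:
        return None, []
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[0], ranked[1:]


# Missing input: the AND is still decided because the known input is 0.
print(fuzzy_and(0.0, UNKNOWN))
best, alternates = classify([("high", 0.8, 0.9), ("low", 0.5, 0.6)])
print(best, alternates)
```

Under these assumptions a missing input only blocks a conclusion when the known inputs cannot settle it on their own, which matches the behaviour the list describes for unknown data states.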