Khiops © software for data mining
Orange Labs software for mining large databases
Domain & Definition
Because a sound knowledge of customers' behavior and market dynamics is now essential to seize business opportunities, efficient data mining solutions have become a key factor of success in all economic sectors.
Data preparation is a critical stage of the data mining process, but it is often carried out manually and requires the involvement of skilled statisticians.
Until now, it has been very difficult to find software that speeds up information analysis and the preparation of data before modeling.
Orange Labs has developed a data mining solution called Khiops that automates the descriptive and exploratory phases of data analysis, as well as the preparation of data before the modeling phase for supervised classification.
Khiops simplifies the analysis of the predictive value of variables, a phase that can take up to 70% of the time spent building a predictive model.
The scoring phase, which starts once data preparation is complete, builds an efficient predictive model by combining the information carried by all the descriptive variables. This model can then be deployed to score new data.
The Khiops solution includes the following components:
- Khiops: back-end, main data preparation and scoring software
- Khiops Visualization: to visualize results from the Khiops component
- Khiops Coclustering: back-end, to analyze correlations between variables using hierarchical coclustering
- Khiops Covisualization: to visualize results from the Khiops Coclustering component
Comparative benchmarks carried out with Khiops have shown the high quality of its discretizations and value groupings on all of the following criteria:
- Accuracy of prediction
- Robustness (ratio of test accuracy to training accuracy)
- Strong resistance to noise
- High explanatory capacity (low number of intervals or groups)
- Computational efficiency
Khiops has already been applied to problems with up to tens of thousands of variables and hundreds of thousands of instances, and has proved its ability to find the most discriminating variables.
The most valuable characteristic of the data preparation method probably lies in the robust explanation of the data it provides: the method builds the most probable discretization-based, grouping-based or coclustering-based explanation of the data.
Benefits to users
Khiops saves considerable time in building forecasting models, as it performs data crunching and scoring tasks. It helps search for discriminating variables in very large data samples, analyze the predictive value of attributes, and produce statistical reports that can easily be exported by copy-paste. It also includes fully automatic software to build scoring models, which has proved very efficient and is very fast, even on very large datasets.
Khiops Coclustering is able to detect highly informative patterns by means of hierarchical coclustering models, which are suited to exploratory analysis. This novel type of statistical analysis provides insights into many domains, such as:
- Market analysis: clusters of customers versus clusters of products
- Text corpus analysis: clusters of texts versus clusters of words
- Web log analysis: clusters of cookies versus clusters of web pages
- Graph analysis: clusters of source nodes versus clusters of target nodes
Who is it for?
This application is licensed to companies and organizations that consider the availability of clean, formatted, ready-to-use information crucial to a rich and successful data mining process, in sectors such as:
- Telecom, Water, Energy
- Banking, Financial
- Government / Administrations