Overview of Free Software Tools for Data Mining

AN OVERVIEW OF FREE SOFTWARE TOOLS FOR GENERAL DATA MINING
Alan Jović, Karla Brkić, Nikola Bogunović
E-mail: {alan.jovic, karla.brkic, nikola.bogunovic}@fer.hr
Faculty of Electrical Engineering and Computing, University of Zagreb
Department of Electronics, Microelectronics, Computer and Intelligent Systems

CONTENTS
  • Motivation and goal
  • DM tools’ general characteristics
  • DM algorithms supported
  • DM advanced tasks supported
  • Overall recommendations
  • Conclusion

MOTIVATION
  • A problem that requires DM
      • business-oriented (e.g. churn detection, direct marketing, sentiment analysis...)
      • research-oriented (e.g. computer vision, biomedical data analysis, chemometrics...)
  • Many algorithms for DM
      • Which one should I use?
      • Are there any similar alternatives?
  • Many open-source and commercial DM tools available
      • Steady development progress in the last 20-25 years
      • Wikipedia currently lists more than 30 significant DM tools, many specialized

GOAL
  • Provide a detailed overview of the most commonly used free general DM tools
  • “Most commonly used” is based on the KDnuggets 2013 poll
  • Considered tools include
      • RapidMiner
      • R
      • Weka
      • KNIME
      • Orange
      • scikit-learn

DM TOOLS’ GENERAL CHARACTERISTICS

DM ALGORITHMS SUPPORT
An excerpt from Table II (18 categories, ~70 methods):
Support level
  +   supported by the tool
  A   supported in an add-on for the tool
  S   somewhat supported: possible to achieve, but not directly supported or supported only in part
      not supported
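
The support levels in the legend can be treated as a small data type when working with cells from the table. The sketch below is illustrative only: the names `Support` and `parse_cell` are hypothetical, and reading an empty cell as "not supported" is an assumption, since the legend's symbol for that level is not shown here.

```python
import re
from enum import Enum
from typing import List, Tuple


class Support(Enum):
    SUPPORTED = "+"   # supported by the tool
    ADD_ON = "A"      # supported in an add-on for the tool
    SOMEWHAT = "S"    # possible to achieve, but indirect or partial
    NONE = ""         # not supported (assumption: shown as an empty cell)


def parse_cell(cell: str) -> Tuple[List[Support], str]:
    """Split a cell such as '+, A (own*, RWeka)' into support levels and a note."""
    cell = cell.strip()
    if not cell:
        return [Support.NONE], ""
    # One or more comma-separated symbols, optionally followed by a note in parentheses.
    match = re.match(r"^([+AS](?:\s*,\s*[+AS])*)\s*(?:\((.*)\))?$", cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    levels = [Support(symbol.strip()) for symbol in match.group(1).split(",")]
    note = (match.group(2) or "").strip()
    return levels, note


print(parse_cell("A (RWeka)"))
print(parse_cell("+, A (own*, RWeka)"))
```

A cell can carry more than one level, as in "+, A (...)", which is why the function returns a list of levels instead of a single value.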

DM ADVANCED TASKS SUPPORT

OVERALL RECOMMENDATIONS
  • RapidMiner: many DM algorithms (can also import Weka’s methods), extendable, steady learning curve, recent problems with licensing
  • R: strong in statistics and DM algorithms, extendable, fast implementations, complexity of extensions, not user-friendly (some improvement with Rattle GUI)
  • Weka: many DM algorithms, user-friendly, extendable, not the best choice for data visualization or advanced DM tasks at this time
  • Orange: user-friendly, visually appealing GUI, moderate DM algorithms coverage, doesn’t cover advanced DM tasks at this time
  • KNIME: user-friendly, extendable (e.g. Weka, R), covers most of the advanced DM tasks as add-ons, no significant downsides
  • scikit-learn: great documentation, fast implementations, moderate DM algorithms coverage, not user-friendly
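
One way to apply these recommendations is to encode the stated traits and filter on them. The sketch below is illustrative and not part of the study: `ToolProfile`, `PROFILES`, and `shortlist` are hypothetical names, the trait values are transcribed from the recommendations slide, and `None` marks a trait the slide does not state for that tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToolProfile:
    coverage: Optional[str]        # slide wording: "many", "strong", "moderate", or None
    user_friendly: Optional[bool]  # None where the slide makes no statement
    extendable: Optional[bool]     # None where the slide makes no statement


# Traits transcribed from the recommendations slide; the encoding is an assumption.
PROFILES = {
    "RapidMiner":   ToolProfile("many", None, True),
    "R":            ToolProfile("strong", False, True),
    "Weka":         ToolProfile("many", True, True),
    "Orange":       ToolProfile("moderate", True, None),
    "KNIME":        ToolProfile(None, True, True),
    "scikit-learn": ToolProfile("moderate", False, None),
}


def shortlist(require_user_friendly: bool, allow_moderate_coverage: bool) -> List[str]:
    """Keep tools that are explicitly user-friendly (if required) and, unless
    moderate coverage is acceptable, are not described as having moderate coverage."""
    result = []
    for name, profile in PROFILES.items():
        if require_user_friendly and profile.user_friendly is not True:
            continue
        if not allow_moderate_coverage and profile.coverage == "moderate":
            continue
        result.append(name)
    return result


print(shortlist(require_user_friendly=True, allow_moderate_coverage=False))
```

The filter is deliberately conservative: a tool whose user-friendliness the slide does not mention is left out when user-friendliness is required.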
 
CONCLUSION
  • The choice of DM tool typically depends on the problem at hand, the experience of the DM user, and the user-friendliness of the tool
  • This study provided an overview of the coverage of DM algorithm implementations in several important DM tools
  • Based on the overview, we can recommend RapidMiner, R, Weka, and KNIME
  • Orange and scikit-learn are still not as powerful, but have their specific advantages
  • Other free general DM tools still fall behind
  • Further progress of the tools might be in the adoption, and perhaps integration, of extensions for recent, more advanced DM tasks
  • Further integration of methods (collaboration) between the free tools is also expected
 
THANK YOU!

This overview covers free software tools for general data mining, which serve business tasks such as churn detection and sentiment analysis as well as research tasks. It compares the general characteristics, supported algorithms, and advanced-task support of RapidMiner, R, Weka, Orange, KNIME, and scikit-learn.

  • Data Mining
  • Software Tools
  • Free
  • Algorithms
  • Overview

Uploaded on Feb 26, 2025 | 0 Views




Presentation Transcript



  5. DM TOOLS’ GENERAL CHARACTERISTICS

     Characteristic | RapidMiner | R | Weka | Orange | KNIME | scikit-learn
     Developer | RapidMiner, Germany | worldwide development | Univ. of Waikato, New Zealand | Univ. of Ljubljana, Slovenia | KNIME.com AG, Switzerland | multiple; support: INRIA, Google
     Programming language | Java | C, Fortran, R | Java | C++, Python, Qt framework | Java | Python + NumPy + SciPy + matplotlib
     License | open source (v.5 or lower); closed source, free Starter ed. (v.6) | free software, GNU GPL 2+ | open source, GNU GPL 3 | open source, GNU GPL 3 | open source, GNU GPL 3 | FreeBSD
     Current version | 6 | 3.02 | 3.6.10 | 2.7 | 2.9.1 | 0.14.1
     GUI / command line | GUI | both (GUI for DM = Rattle) | both | both | GUI | command line
     Main purpose | general data mining | sci. computation and statistics | general data mining | general data mining | general data mining | machine learning package add-on
     Community support (est.) | large (~200 000 users) | very large (~2 M users) | large | moderate | moderate (~15 000 users) | moderate

  6. DM ALGORITHMS SUPPORT

     Category | Method | RapidMiner | R | Weka | Orange | KNIME | scikit-learn
     ID3 A (Weka) + + A (Weka) C4.5 A (Weka) A (RWeka) + + Decision tree learner CART A (Weka) A (RWeka) + + A (Weka) + (optimized) +, A (own*, dec. stump) +, A (own*, RWeka) others + (dec. stump) + (own*) + (own*)
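
Because scikit-learn is driven from code and has no GUI, using its decision tree learner means calling it from Python. The following is a minimal sketch under stated assumptions: it targets a current scikit-learn release, whose module layout differs from the 0.14.1 version listed in the slides, and the Iris dataset, split ratio, and hyperparameters are illustrative choices not taken from the study. scikit-learn's documentation describes its tree learner as an optimized version of CART.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples, keeping class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# CART-style tree with Gini impurity; the depth limit keeps the tree small.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)

accuracy = tree.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In the GUI-based tools the same steps (load, split, train, score) would typically be wired together as nodes or operators in a visual workflow.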

  7. DM ADVANCED TASKS SUPPORT

     Name | RapidMiner | R | Weka | Orange | KNIME | scikit-learn
     S (CLI, knowl. flow, distributedWekaHadoop) S (not free: Radoop) Big data A (ff, ffbase) A S Link, graph mining Spatial data analysis Time-series analysis Semi-supervised learning A (igraph, sna) A A A (ggmap) A S S (several time series filters) S (timeseries module has bugs) + (label propagation) A +, A (forecast) + S A (upclass) S S A Data streams + A (stream) (massiveOnlineAnalysis) + S A (tm, RTextTools, qdap) Text mining A S A A + A (snow, multicore) S (darch: incomplete) Parallelization S (enterprise ed.) S + A (joblib) S (Restricted Boltzmann Mach.) Deep learning

