We ran AutoViz on the famous iris flower data set. Here's what we got...
This is a Multi-Class Classification problem, and typeclass is the Target Variable.
Below, we describe each type of plot AutoViz produced and comment on what it shows.
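How might a tool decide that a target is multi-class? One plausible rule is to count distinct target values. The sketch below is a hypothetical heuristic, not AutoViz's actual detection logic. The function name `infer_problem_type` and the `max_classes` cutoff are assumptions made for illustration.

```python
def infer_problem_type(target, max_classes=30):
    """Hypothetical heuristic: label a target column binary, multi-class, or regression.

    Not AutoViz's real logic; the max_classes cutoff is an assumed value.
    """
    unique = set(target)
    # Text labels, or a small number of integer codes, suggest a categorical target.
    all_text = all(isinstance(v, str) for v in unique)
    few_ints = all(isinstance(v, int) for v in unique) and len(unique) <= max_classes
    if not (all_text or few_ints):
        return "regression"  # fractional or very many distinct numeric values
    return "binary" if len(unique) == 2 else "multi-class"


# A toy typeclass column with the three Iris species names.
typeclass = ["setosa", "versicolor", "virginica", "setosa"]
print(infer_problem_type(typeclass))  # multi-class
```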
Scatter Plots can show positive or negative trends and linear or non-linear relationships between a numeric variable and the target variable.
From the Scatter Plots that AutoViz shows, virginica tends to have the largest petals of the three species (in both length and width), and setosa the smallest.
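A scatter plot of a feature against the target shows how far apart the classes sit. You can summarize the same thing numerically by computing each class's min and max. The sketch below uses made-up toy petal lengths, not the real Iris measurements. The helper name `ranges_by_class` is hypothetical.

```python
from collections import defaultdict


def ranges_by_class(values, labels):
    """Return {label: (min, max)} of a numeric feature, grouped by class label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: (min(vs), max(vs)) for label, vs in groups.items()}


# Toy petal lengths in cm: illustrative values only.
petal_length = [1.4, 1.5, 4.5, 4.1, 5.8, 6.0]
species = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
print(ranges_by_class(petal_length, species))
# {'setosa': (1.4, 1.5), 'versicolor': (4.1, 4.5), 'virginica': (5.8, 6.0)}
```

Non-overlapping ranges correspond to the clearly separated bands you would see in the scatter plot.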
Pairwise Scatter Plots can show positive or negative interactions and linear or non-linear relationships between a numeric variable and another numeric variable.
In this case, Petal Length and Petal Width seem to have an interaction which helps discriminate among the three species visually.
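A simple way to see that Petal Length and Petal Width separate the species jointly is a nearest-centroid classifier on the (length, width) pairs. This technique is swapped in for illustration and is not something AutoViz computes. The sketch assumes toy points that are not real Iris rows. The functions `fit_centroids` and `predict` are hypothetical helpers.

```python
import math
from collections import defaultdict


def fit_centroids(points, labels):
    """Return the mean (x, y) point of each class."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running [sum_x, sum_y, count]
    for (x, y), label in zip(points, labels):
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Return the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Toy (petal length, petal width) pairs in cm: illustrative values only.
points = [(1.4, 0.2), (1.5, 0.3), (4.5, 1.4), (4.1, 1.3), (5.8, 2.2), (6.0, 2.4)]
species = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
centroids = fit_centroids(points, species)
print(predict(centroids, (4.4, 1.4)))  # versicolor
```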
A Histogram shows the distribution of each variable individually.
From the Histograms that AutoViz shows, we can see that the distributions of some variables, such as Sepal Length and Petal Length, can be very useful for discriminating among the three species.
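A histogram is simply a count of values falling into equal-width bins. The sketch below assumes made-up petal lengths, and the `histogram` helper is hypothetical. Empty bins between two clusters are the gaps that make a variable useful for telling species apart.

```python
def histogram(values, bins=5):
    """Count values into `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin `bins`, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return lo, width, counts


# Toy petal lengths in cm for a mix of species: illustrative values only.
petal_length = [1.0, 1.2, 1.4, 4.0, 4.4, 5.0, 5.6, 6.0]
lo, width, counts = histogram(petal_length, bins=5)
for i, c in enumerate(counts):
    print(f"{lo + i * width:4.1f}-{lo + (i + 1) * width:4.1f} | {'#' * c}")
```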
A Violin Plot shows a Kernel Density Estimate of a Continuous Variable, mirrored on both sides of a central axis, and often includes a small box plot inside it.
From the Violin Plots of AutoViz, we can see that almost all variables are well behaved, except for Sepal Width in the Setosa species, which has a few Outliers.
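The outline of a violin is a kernel density estimate, which averages a Gaussian "bump" centred on each observation. Outliers are commonly flagged with the box-plot rule of 1.5 times the interquartile range. The sketch below shows both ideas. It assumes toy Setosa-like sepal widths (not the real data) and a hand-picked bandwidth of 0.15. The helpers `gaussian_kde` and `iqr_outliers` are hypothetical names, not AutoViz or seaborn functions.

```python
import math
import statistics


def gaussian_kde(values, x, bandwidth):
    """Density estimate at x: the average of Gaussian bumps centred on each value."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)


def iqr_outliers(values):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], the usual box-plot whisker rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]


# Toy sepal widths in cm: illustrative values only.
sepal_width = [3.4, 3.5, 3.0, 3.2, 3.6, 3.4, 3.1, 2.3]
# One half of a violin's outline: the density evaluated on a grid of widths.
for x in (2.2, 2.6, 3.0, 3.4, 3.8):
    print(f"{x:.1f} {'*' * round(40 * gaussian_kde(sepal_width, x, bandwidth=0.15))}")
print(iqr_outliers(sepal_width))  # [2.3]
```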
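A Distribution Plot shows a Histogram or Distribution of a Continuous Variable; for a categorical Target Variable like this one, the equivalent is a count of rows per class.
From the Distribution of the Target Variable, we can see that the three species are equally represented in the data set (50 rows each), so the classes are balanced, which is good.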
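Checking class balance amounts to counting the labels. The sketch below builds a label column with Iris's 50 rows per species and computes each class's share. The helper `class_balance` is a hypothetical name.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of the rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Iris has 50 rows per species; build a label column with those counts.
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
shares = class_balance(labels)
print(shares)
# {'setosa': 0.3333333333333333, 'versicolor': 0.3333333333333333, 'virginica': 0.3333333333333333}
```

Equal shares mean that accuracy is not inflated by a majority class, so no resampling is needed before modeling.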
Heatmaps help us visualise the correlation between every pair of numeric variables in the dataset.
From the Heatmap, we can see that Petal Length and Petal Width have the strongest correlation with each other (roughly 0.96 in the standard Iris data) and could perhaps be combined into one new variable.
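Each cell of a correlation heatmap is a Pearson coefficient for one pair of columns. The sketch below computes those coefficients in plain Python and picks the pair with the largest absolute value. It assumes small toy columns, not the real Iris measurements. The helpers `pearson` and `strongest_pair` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strongest_pair(columns):
    """Return the pair of column names whose |correlation| is largest."""
    return max(combinations(columns, 2),
               key=lambda pair: abs(pearson(columns[pair[0]], columns[pair[1]])))


# Toy columns in cm: illustrative values only.
columns = {
    "petal_length": [1.4, 1.5, 4.5, 4.1, 5.8, 6.0],
    "petal_width": [0.2, 0.3, 1.4, 1.3, 2.2, 2.4],
    "sepal_width": [3.5, 3.0, 3.2, 2.8, 3.0, 3.3],
}
# The numbers a heatmap would colour, one cell per pair of columns.
for a, b in combinations(columns, 2):
    print(f"{a} vs {b}: {pearson(columns[a], columns[b]):+.2f}")
print(strongest_pair(columns))  # ('petal_length', 'petal_width')
```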
Bar Plots are used to compare and summarize numeric data by different groups or categories in a data set.
From the Bar Plots, we can see that Petal Length and Width are much better at differentiating the three species than Sepal Length and Width.
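Bar plots of group means compare features by eye. A one-way ANOVA F statistic puts the same comparison into a single number: the spread between species means relative to the spread within each species. This statistic is swapped in here for illustration and is not part of AutoViz's output. The sketch uses toy values, and the `anova_f` helper is hypothetical.

```python
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: scatter of values around their own group mean.
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy measurements in cm: illustrative values only.
species = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
petal_length = [1.4, 1.5, 4.5, 4.1, 5.8, 6.0]
sepal_width = [3.5, 3.0, 3.2, 2.8, 3.0, 3.3]
print(f"petal_length F = {anova_f(petal_length, species):.1f}")
print(f"sepal_width  F = {anova_f(sepal_width, species):.2f}")
```

A larger F means the feature's bars differ more across species relative to the noise within each species.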
Time Series Plots are used to track changes in continuous variables over the date/time variables in a data set.
There are no Time Series Plots in Iris since there is no Date/Time variable in this data set.
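Whether time series plots apply depends on finding a date/time column. The sketch below shows one way to check: test whether every value parses with Python's `datetime.fromisoformat`. This is an assumed approach for illustration, not how AutoViz detects dates. The helpers `looks_like_datetime` and `datetime_columns` are hypothetical, and the toy table only mimics Iris's column types.

```python
from datetime import datetime


def looks_like_datetime(column):
    """True if every non-empty value parses as an ISO 8601 date/time string."""
    values = [v for v in column if v not in ("", None)]
    if not values:
        return False
    for v in values:
        try:
            datetime.fromisoformat(str(v))
        except ValueError:
            return False
    return True


def datetime_columns(table):
    """Names of the columns that look like dates or times."""
    return [name for name, col in table.items() if looks_like_datetime(col)]


# A toy table shaped like Iris: numeric features plus a text label, no dates.
iris_like = {"sepal_length": [5.1, 4.9], "typeclass": ["setosa", "versicolor"]}
print(datetime_columns(iris_like))  # []
```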