We ran AutoViz on the famous Titanic data set. Here's what we got...
The Target Variable in the Titanic data set is "Survived", which has 2 classes.
Scatter Plots can show positive or negative trends and linear or non-linear relationships between a numeric variable and the target variable.
From the Scatter Plots, we can see a slight relationship between Fare and Survived and similarly between Age and Survived.
Pairwise Scatter Plots can show positive or negative interactions and linear or non-linear relationships between a numeric variable and another numeric variable.
We can see a linear relationship between Age and Fare here.
A Histogram shows the distribution of variables individually.
From the Histograms of AutoViz, we can see that the Fare distribution is strongly positively skewed and can benefit from a Log Transformation...
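To illustrate why a Log Transformation helps a positively skewed variable, here is a minimal sketch that measures skewness before and after applying log1p. The fare-like values are made up for illustration and are not taken from the Titanic data set; the `skewness` helper is our own and is not part of AutoViz.

```python
import math

def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up, fare-like values (NOT real Titanic data): many small, a few very large.
fares = [7.25, 7.9, 8.05, 8.5, 10.5, 13.0, 15.5, 26.0, 31.0, 52.0, 80.0, 263.0, 512.0]

# log1p(x) = log(1 + x), which stays defined when a fare is 0.
log_fares = [math.log1p(f) for f in fares]

raw_skew = skewness(fares)
log_skew = skewness(log_fares)
print(f"skewness before: {raw_skew:.2f}, after log1p: {log_skew:.2f}")
```

On this toy sample the transformed values are still skewed to the right, but much less so than the raw values.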
A Violin Plot combines a Box Plot with a Kernel Density plot to show the distribution of a Continuous Variable.
From the Violin Plots of AutoViz, we can see that the Fare, Parents/Children and Siblings/Spouses variables are highly skewed and have some Big Outliers...
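One common way to put a number on the outliers that a Violin Plot shows is the 1.5 x IQR rule. The sketch below uses that rule as an assumption; it is not necessarily how AutoViz flags outliers. The sibling/spouse counts are made up, and `iqr_outliers` is a hypothetical helper.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up sibling/spouse counts (NOT real Titanic data).
siblings_spouses = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 8]

outliers = iqr_outliers(siblings_spouses)
print(outliers)
```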
A Distribution Plot shows a Histogram or Distribution of a Continuous Variable.
From the Distribution of the Target Variable, we can see that Survived=0 is more prevalent than Survived=1 but not extremely so.
Heatmaps help us visualise the correlation between every pair of numeric variables in the dataset.
From this, we can see that the two variables with the strongest correlation to the Target are the Fare and Siblings/Spouses variables.
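The ranking that a Heatmap makes visible can be sketched by computing the Pearson correlation of each numeric variable with the Target and sorting by absolute value. The rows below are made up for illustration, so the resulting order says nothing about the real Titanic data; the column names and the `pearson` helper are our own assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up rows (NOT real Titanic data); names are illustrative only.
features = {
    "Fare": [7.25, 71.3, 7.9, 53.1, 8.05, 8.5, 51.9, 21.1],
    "SiblingsSpouses": [1, 1, 0, 1, 0, 0, 0, 3],
}
survived = [0, 1, 1, 1, 0, 0, 0, 0]

# Rank features by the strength (absolute value) of their correlation with the target.
ranked = sorted(
    ((name, pearson(col, survived)) for name, col in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: {r:+.2f}")
```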
Bar Plots are used to compare and summarize numeric data by different groups or categories in a data set.
From the Bar Plots of Average Target against Pclass, we can see that Pclass is a good predictor of Survived (the Target Variable).
Time Series Plots are used to track changes in continuous variables by date/time variables in a data set.
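The quantity behind a Bar Plot of Average Target against Pclass is simply the mean of Survived within each class. Here is a minimal sketch with made-up (Pclass, Survived) pairs; the values are illustrative and are not the real Titanic survival rates.

```python
from collections import defaultdict

# Made-up (pclass, survived) pairs (NOT real Titanic data).
rows = [(1, 1), (1, 1), (1, 0), (2, 1), (2, 0), (3, 0), (3, 0), (3, 1), (3, 0)]

# Accumulate [sum of survived, row count] per class.
totals = defaultdict(lambda: [0, 0])
for pclass, survived in rows:
    totals[pclass][0] += survived
    totals[pclass][1] += 1

# Average target per class: this is the height of each bar.
rates = {pclass: s / count for pclass, (s, count) in sorted(totals.items())}
for pclass, rate in rates.items():
    print(f"Pclass {pclass}: {rate:.2f}")
```

When the bar heights differ clearly between groups, as they do in this toy sample, the grouping variable carries information about the target.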
There are no Time Series Plots in Titanic since there is no Date/Time variable in this data set.