We ran AutoViz on the famous Boston Housing data set. Here's what we got...
The Boston Housing data set is a Regression problem. Here "MV" is the Target Variable.
Please start Visualization below or select a Tab above to see a specific plot.
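For readers who want to reproduce these plots locally, here is a minimal sketch of an AutoViz run. The helper name `run_autoviz` and the CSV path are hypothetical, and the sketch assumes the data has been saved as a comma-separated file with a header row that includes the "MV" column.

```python
def run_autoviz(csv_path, target="MV"):
    """Run AutoViz on a CSV file and return whatever AutoViz returns."""
    # Imported inside the function so this sketch can be loaded
    # even on a machine where the autoviz package is not installed.
    from autoviz.AutoViz_Class import AutoViz_Class

    av = AutoViz_Class()
    return av.AutoViz(
        csv_path,        # hypothetical path, e.g. "boston_housing.csv"
        sep=",",         # assumes a comma-separated file
        depVar=target,   # the Target Variable named on this page
        dfte=None,       # a DataFrame can be passed here instead of a file
        header=0,        # assumes the first row holds the column names
        verbose=0,
        chart_format="svg",
    )
```

Calling `run_autoviz("boston_housing.csv")` should then produce the same families of charts described in the tabs on this page.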
Scatter Plots can show positive or negative trends and linear or non-linear relationships between a numeric variable and the target variable.
From the Scatter Plots that AutoViz shows, we can see that RM (average number of rooms per dwelling) has a nearly linear, positive relationship with Median Value (the target variable).
Pairwise Scatter Plots can show positive or negative interactions and linear or non-linear relationships between a numeric variable and another numeric variable.
A Histogram shows the distribution of variables individually.
From the Histograms that AutoViz shows, we can see that the distributions of some variables, such as AGE and LSTAT, are skewed and may benefit from a transformation. A Log Transformation is the usual choice for right-skewed variables such as LSTAT.
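As a rough illustration of how a Log Transformation reduces right skew, the sketch below computes the moment-based skewness of a small sample before and after `log1p`. The sample values are made up for illustration and are not taken from the Boston Housing data; `skewness` is a hypothetical helper.

```python
import math

def skewness(values):
    """Population (moment-based) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up, right-skewed sample (NOT real LSTAT values).
sample = [1.7, 2.9, 4.0, 5.3, 6.5, 8.1, 10.2, 13.4, 19.8, 34.4]

# log1p(x) = log(1 + x), which also handles values at or near zero.
logged = [math.log1p(v) for v in sample]

print("skew before:", round(skewness(sample), 2))
print("skew after: ", round(skewness(logged), 2))
```

A skewness closer to zero after the transformation indicates a more symmetric distribution.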
A Violin Plot shows the distribution of a Continuous Variable, much like a Kernel Density plot, mirrored around a central axis.
From the Violin Plots that AutoViz shows, we can see that the CRIM (crime rates) variable is very close to zero most of the time but there are some Big Outliers...
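One common way to flag such outliers numerically is the 1.5 x IQR rule. The sketch below applies it to made-up values that mimic a variable sitting near zero with a few large values. The data is illustrative, not real CRIM values, and `find_outliers` is a hypothetical helper rather than part of AutoViz.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up sample: mostly near zero, with two large values.
sample = [0.01, 0.02, 0.03, 0.05, 0.08, 0.1, 0.2, 0.3, 9.5, 25.0]

print(find_outliers(sample))
```

On this sample the rule flags only the two largest values.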
Heatmaps help us visualise the correlation between every pair of variables in the dataset.
From this, we can see that the two variables with the strongest correlation to MV are LSTAT (% lower status of the population), which is negatively correlated, and RM (average number of rooms per dwelling), which is positively correlated.
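Each cell of such a heatmap is typically a Pearson correlation coefficient. The sketch below computes it by hand and ranks a few columns by the strength of their correlation with the target. The column values are made up for illustration, and `pearson` is a hypothetical helper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up columns (NOT real Boston Housing rows).
columns = {
    "RM":    [5.0, 5.5, 6.0, 6.5, 7.0, 7.5],
    "LSTAT": [25, 18, 14, 9, 6, 4],
    "CHAS":  [0, 1, 0, 0, 1, 0],
}
mv = [12, 17, 21, 24, 31, 38]

# Rank features by the absolute value of their correlation with the target.
ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], mv)), reverse=True)
for name in ranked:
    print(name, round(pearson(columns[name], mv), 2))
```

The sign of each coefficient gives the direction of the relationship, and its absolute value gives the strength.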
Bar Plots are used to compare and summarize numeric data by different groups or categories in a data set.
From the Bar Plots of Average MV against CHAS (1 if the tract bounds a river; 0 otherwise), we can see that Average MV differs between the two groups, which suggests that CHAS is a useful predictor of Median Value (the Target Variable).
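The quantity behind each bar is simply the mean of the target within each category. The sketch below shows that calculation; the rows are made up for illustration and `group_means` is a hypothetical helper.

```python
from collections import defaultdict

def group_means(rows):
    """Average the second element of each (group, value) pair per group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in rows:
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Made-up (CHAS, MV) pairs, NOT real Boston Housing rows.
rows = [(0, 20), (0, 22), (0, 21), (1, 27), (1, 29)]

means = group_means(rows)
print(means)
```

A large gap between the group means, relative to the spread within each group, is what makes a categorical variable look like a useful predictor in a Bar Plot.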
Time Series Plots are used to track changes in continuous variables by date/time variables in a data set.
There are no Time Series Plots in Boston Housing since there is no Date/Time variable in this data set.