Feature Analysis Visualizers

Feature analysis visualizers are designed to visualize instances in data space in order to detect features or targets that might impact downstream fitting. Because machine learning operates on high-dimensional data sets (usually at least 35 dimensions), the visualizers focus on aggregation, optimization, and other techniques to give overviews of the data. Our intent is that the steering process will allow the data scientist to zoom, filter, and explore the relationships between instances and between dimensions.

At the moment we have five feature analysis visualizers implemented:

Rank Features: rank single features and pairs of features to detect covariance
RadViz Visualizer: plot data points along axes ordered around a circle to detect separability
Parallel Coordinates: plot instances as lines along vertical axes to detect classes or clusters
PCA Projection: project higher dimensions into a visual space using PCA
Direct Data Visualization: plot instances by selecting subsets of features
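
To make the idea of ranking pairs of features concrete, here is a minimal pure-Python sketch. It is not Yellowbrick's implementation: it assumes Pearson correlation as the pairwise score, and the helper names `pearson` and `rank_pairs` as well as the toy columns are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def rank_pairs(columns):
    """Score every pair of named columns and sort by absolute correlation."""
    scores = {
        (i, j): pearson(columns[i], columns[j])
        for i, j in combinations(sorted(columns), 2)
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy data (hypothetical): "weight" is an exact multiple of "height".
columns = {
    "height": [1.0, 2.0, 3.0, 4.0],
    "weight": [2.0, 4.0, 6.0, 8.0],
    "noise": [1.0, -1.0, 1.0, -1.0],
}

ranked = rank_pairs(columns)
for pair, score in ranked:
    print(pair, round(score, 3))
```

The most strongly related pair sorts first, which is the kind of signal a pairwise ranking plot is meant to surface visually.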

Feature analysis visualizers implement the Transformer API from Scikit-Learn, meaning they can be used as intermediate transform steps in a Pipeline (particularly a VisualPipeline). They are instantiated in the same way as any other transformer; calling fit and transform on them draws the instances. Finally, poof or show is called to display the image.

# Feature Analysis Imports
# NOTE that all these are available for import directly from the `yellowbrick.features` module
from yellowbrick.features.rankd import Rank1D, Rank2D
from yellowbrick.features.radviz import RadViz
from yellowbrick.features.pcoords import ParallelCoordinates
from yellowbrick.features.jointplot import JointPlotVisualizer
from yellowbrick.features.pca import PCADecomposition
from yellowbrick.features.scatter import ScatterVisualizer
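
The fit, transform, and show sequence described above can be sketched with a toy stand-in class. This is only an illustration of the calling pattern, not Yellowbrick code: `ToyFeatureVisualizer`, its `draw` method, and its attributes are hypothetical, and the choice to draw during fit is an assumption made for the sketch.

```python
class ToyFeatureVisualizer:
    """Hypothetical visualizer that follows the fit / transform / show pattern."""

    def __init__(self, features=None):
        self.features = features  # optional column labels
        self.drawn_ = []          # records what would have been drawn

    def fit(self, X, y=None):
        # Learn what the drawing needs (here: per-column ranges), then draw.
        columns = list(zip(*X))
        self.ranges_ = [(min(col), max(col)) for col in columns]
        self.draw(X, y)
        return self  # returning self allows call chaining

    def draw(self, X, y=None):
        # A real visualizer would render onto a figure here.
        self.drawn_ = [
            (tuple(row), None if y is None else y[i]) for i, row in enumerate(X)
        ]

    def transform(self, X):
        # Pass the data through unchanged so later pipeline steps can use it.
        return X

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def show(self):
        # A real visualizer would display or save the figure here.
        return "drew {} instances".format(len(self.drawn_))


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
y = [0, 1, 0]

viz = ToyFeatureVisualizer(features=["a", "b"])
Xt = viz.fit_transform(X, y)  # draws as a side effect, returns X untouched
summary = viz.show()
print(summary)
```

Because transform hands the data back unchanged, a class shaped like this can sit in the middle of a pipeline without affecting the steps that follow it.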