
t-SNE - t-distributed Stochastic Neighbor Embedding


What is t-SNE?

  • Unlike PCA, t-SNE is a non-linear technique.
  • Unlike PCA, a fitted t-SNE model CANNOT be reused to transform new data. It is only used for data exploration.
    • This is because “t-SNE has a cost function that is not convex, i.e. with different initializations we can get different results.”
  • Perplexity is a target number of neighbors for the central point, normally 5~50 - i.e. how large the “circle” of neighbors is.

How to implement?

Assumptions:

  • t-SNE is good for visualizing high-dimensional data, BUT we need to reduce the dimensionality ourselves as a preprocessing step, so that the number of dimensions is below a small number like 50.
    • e.g. apply PCA before feeding the data into t-SNE
  • The number of samples cannot be too large, otherwise memory use and computation time explode.

Implementation:

It can be as easy as:

from sklearn.manifold import TSNE

# flat_x: your data as a 2-D array of shape (n_samples, n_features)
X_embedded = TSNE(n_components=2, learning_rate='auto',
                  init='random', perplexity=20).fit_transform(flat_x)
X_embedded.shape  # (n_samples, 2)

How does it work?

The guide on Medium has a great visualization of this:

1st step:

For each data point, generate a normal distribution with that point as the mean,

using the Euclidean distance to the other points as the x-axis of the distribution.

(Note that the distribution is not exactly like this in t-SNE; it is simplified here for explanation.)


The actual distribution in t-SNE is a Gaussian centred on the point, converted into conditional probabilities over the neighbors (there is an image of this in the Medium post).
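The first step can be sketched numerically: pick a point, weight the squared Euclidean distances to the other points with a Gaussian, and normalize the weights into conditional probabilities. A minimal sketch with toy data (the bandwidth `sigma` is fixed here; real t-SNE tunes it per point to hit the target perplexity):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))       # 6 toy points in 3-D
i, sigma = 0, 1.0                 # centre point and a fixed bandwidth (assumed)

d2 = np.sum((X - X[i]) ** 2, axis=1)   # squared Euclidean distances to point i
w = np.exp(-d2 / (2 * sigma ** 2))     # Gaussian weights, centred on point i
w[i] = 0.0                             # a point is not its own neighbour
p_cond = w / w.sum()                   # conditional probabilities p_{j|i}, sum to 1
```

Nearby points get most of the probability mass, which is how t-SNE encodes "neighborhoods" in the original space.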

2nd step:

Create a new, low-dimensional space.

Place all the data points randomly in this space.

As in the 1st step, for each point create a distribution - but this time a Student t-distribution with a single degree of freedom.
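The second step can be sketched the same way: in the low-dimensional space, pairwise similarities use the Student t kernel (one degree of freedom) instead of a Gaussian, and are normalized jointly over all pairs. A minimal sketch with random initial positions:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 2))          # points placed randomly in the new 2-D space

d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
num = 1.0 / (1.0 + d2)               # Student t kernel with 1 degree of freedom
np.fill_diagonal(num, 0.0)           # a point is not its own neighbour
Q = num / num.sum()                  # joint similarities q_ij, sum to 1
```

The heavy tails of the t-distribution let moderately distant points sit far apart in the embedding, which relieves the "crowding" of the 2-D map.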

3rd step: Gradient descent

To optimize the arrangement from above (because we placed the points randomly),

the Kullback-Leibler divergence between the two distributions we created is minimized.

Its gradient helps to “move” each point toward a more suitable position.

https://miro.medium.com/max/1400/0*gx5m_CS7gVUn8WLH.gif
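The three steps can be tied together in a toy end-to-end sketch: build Gaussian affinities P in the original space, Student-t similarities Q in the embedding, and run plain gradient descent on KL(P || Q). This is a simplification under assumed settings (fixed bandwidth, fixed step size); the real algorithm adds per-point bandwidths, momentum, and early exaggeration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))          # high-dimensional points (step 1 input)
Y = rng.normal(size=(6, 2)) * 1e-2   # random low-dimensional start (step 2)

def joint_p(X, sigma=1.0):
    """Gaussian affinities in the original space, symmetrised (step 1)."""
    d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    p_cond = w / w.sum(axis=1, keepdims=True)   # p_{j|i}
    return (p_cond + p_cond.T) / (2 * len(X))   # joint P, sums to 1

def joint_q(Y):
    """Student t (1 d.o.f.) similarities in the embedding space (step 2)."""
    d2 = np.sum((Y[:, None] - Y[None, :]) ** 2, axis=-1)
    num = 1.0 / (1.0 + d2)
    np.fill_diagonal(num, 0.0)
    return num / num.sum(), num

def kl(P, Q, eps=1e-12):
    """Kullback-Leibler divergence between the two distributions."""
    return np.sum(P * np.log(np.maximum(P, eps) / np.maximum(Q, eps)))

P = joint_p(X)
for _ in range(200):                 # step 3: plain gradient descent on KL(P || Q)
    Q, num = joint_q(Y)
    diff = Y[:, None] - Y[None, :]   # y_i - y_j, shape (n, n, 2)
    grad = 4 * np.sum((P - Q)[:, :, None] * diff * num[:, :, None], axis=1)
    Y -= 10.0 * grad                 # fixed step size, no momentum

Q, _ = joint_q(Y)
```

Each gradient step pulls a point toward neighbors it should be close to (P larger than Q) and pushes it away from points it sits too close to (Q larger than P), which is the "moving" shown in the gif above.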


Usage Example:

For visualizing & exploring the parameters inside a CNN, which often seems like a black box because its hidden-layer parameters are too high-dimensional.

But t-SNE can be used here to visualize them.

