Thanks for the feedback; all the points are very useful!

Feb 15, 2021

- In fact, the main reason for the 2D embedding was to visualize the clustering clearly on a scatter plot. What matters, though, is the final clustering, so one could use more dimensions in UMAP and simply skip the intermediate plotting.

- Yes, n_neighbors and weights in UMAP, as well as eps and min_samples in DBSCAN, should all be tuned. I guess it would help to formalize what I want from the clustering as a metric: for example, limiting the per-sample variance within each cluster to keep the clusters internally uniform. With that constraint in mind, one could then optimize the parameters to get as few clusters as possible.
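A rough sketch of how that selection could be formalized, not the code from the article: among candidate parameter sets, keep those whose worst within-cluster variance stays under a threshold, then take the one with the fewest clusters. Everything named here is hypothetical (`pick_parameters`, `cluster_fn`, `values`, `max_variance`). The sketch assumes one scalar value per sample to measure uniformity on, and that `cluster_fn` wraps the UMAP + DBSCAN step and returns one label per sample, with -1 marking noise as DBSCAN does in scikit-learn.

```python
from statistics import pvariance


def max_within_cluster_variance(values, labels):
    """Return (worst per-cluster variance, number of clusters).

    Samples labeled -1 (noise) are ignored. If no cluster remains,
    the result is (None, 0).
    """
    clusters = {}
    for value, label in zip(values, labels):
        if label != -1:
            clusters.setdefault(label, []).append(value)
    if not clusters:
        return None, 0
    worst = max(pvariance(members) for members in clusters.values())
    return worst, len(clusters)


def pick_parameters(candidates, cluster_fn, features, values, max_variance):
    """Pick the candidate with the fewest clusters under the variance limit.

    candidates   : list of dicts of keyword arguments for cluster_fn
    cluster_fn   : hypothetical callable, cluster_fn(features, **params) -> labels
    values       : one scalar per sample, used to measure uniformity (assumption)
    max_variance : upper bound on the variance inside any single cluster

    Returns (params, n_clusters, worst_variance), or None if nothing qualifies.
    """
    best = None
    for params in candidates:
        labels = cluster_fn(features, **params)
        worst, n_clusters = max_within_cluster_variance(values, labels)
        if worst is None or worst > max_variance:
            continue  # all noise, or at least one cluster is too spread out
        if best is None or n_clusters < best[1]:
            best = (params, n_clusters, worst)
    return best
```

One caveat with this criterion: a parameter set that labels most samples as noise can pass the variance check trivially, so in practice it would probably also need a cap on the noise fraction.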

Written by Egor Vorontsov
