Installation

You can create plots that open in your browser by installing the Python library:
pip install clusterfun
A simple example:
import pandas as pd
import clusterfun as clt
                       
df = pd.read_csv("https://raw.githubusercontent.com/gietema/clusterfun-data/main/wiki-art.csv") 
clt.scatter(df, x="x", y="y", media="img_path", color="painter")
As you can see, a clusterfun plot takes a pandas dataframe as input, along with column names that indicate which columns to use for the visualisation. In this respect it is similar to libraries such as seaborn or plotly. But in clusterfun, you can:
  • Click and drag to select data points and view them in a grid
  • Hover over data points to see them on the right side of the page
  • Click on a data point to view a zoomed-in version of its related image
This makes clusterfun ideal for quickly visualising image data, which is useful when building datasets, exploring edge cases and debugging model performance.
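The input is just a dataframe with one column per role. If you already have 2D coordinates for each image (for example, a projection of model embeddings) and a list of image paths, you can assemble that dataframe as shown below. This is a minimal sketch. The `make_plot_frame` helper is hypothetical and not part of clusterfun. The column names are copied from the wiki-art example, and the sample coordinates, file names and labels are made up.

```python
import pandas as pd


def make_plot_frame(coords, image_paths, labels):
    """Build a dataframe with the columns used in the wiki-art example.

    coords: (x, y) pairs, one per image, e.g. a 2D projection of embeddings.
    image_paths: one local path or s3:// URI per image.
    labels: one category per image, used for coloring.
    """
    coords, image_paths, labels = list(coords), list(image_paths), list(labels)
    if not len(coords) == len(image_paths) == len(labels):
        raise ValueError("coords, image_paths and labels must have the same length")
    frame = pd.DataFrame(coords, columns=["x", "y"])
    frame["img_path"] = image_paths  # the media column
    frame["painter"] = labels        # the color column
    return frame


df = make_plot_frame(
    coords=[(0.1, 0.2), (0.5, 0.9), (0.8, 0.3)],
    image_paths=["images/a.jpg", "images/b.jpg", "images/c.jpg"],
    labels=["Monet", "Monet", "Van Gogh"],
)
# Plot it the same way as the wiki-art example:
# clt.scatter(df, x="x", y="y", media="img_path", color="painter")
```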
Data loading

Clusterfun supports loading media from AWS S3 and from local storage. The media column in the dataframe determines where each item is loaded from: paths to media stored in S3 should start with s3://.
If you load from S3, make sure to set an AWS_REGION environment variable to the region where your data is stored. Support for Google Cloud Storage is coming soon.
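Paths starting with s3:// come from S3, and all other paths come from local storage. You can mirror this rule in a small pre-flight check before plotting. The `check_media_sources` helper below is hypothetical and not part of clusterfun's API. It only inspects path prefixes and whether AWS_REGION is set, and leaves the actual loading to clusterfun. The bucket name and the region value are placeholders.

```python
import os

import pandas as pd


def check_media_sources(df: pd.DataFrame, media_col: str) -> dict:
    """Count S3 and local media paths and require AWS_REGION when S3 is used.

    Hypothetical pre-flight check; it does not load any media itself.
    """
    paths = df[media_col].astype(str)
    is_s3 = paths.str.startswith("s3://")
    counts = {"s3": int(is_s3.sum()), "local": int((~is_s3).sum())}
    if counts["s3"] and not os.environ.get("AWS_REGION"):
        raise OSError("S3 media found but the AWS_REGION environment variable is not set")
    return counts


# Placeholder region: replace with the region of your bucket.
os.environ.setdefault("AWS_REGION", "eu-west-1")
df = pd.DataFrame({"img_path": ["s3://my-bucket/a.jpg", "images/b.jpg"]})
print(check_media_sources(df, "img_path"))  # {'s3': 1, 'local': 1}
```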


Usage

Plot types
The following plot types are available:
Color

You can color different categories with the color parameter.
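Any column with discrete values can serve as the color category. When debugging a model, one option is to derive a category from the labels and the predictions. The sketch below assumes hypothetical `label` and `prediction` columns and a hypothetical `add_outcome_column` helper. The `scatter` call reuses the signature from the wiki-art example. clusterfun is imported inside `plot_outcomes`, so the data preparation runs even where the library is not installed.

```python
import pandas as pd


def add_outcome_column(df: pd.DataFrame, label_col: str, pred_col: str) -> pd.DataFrame:
    """Return a copy of df with an 'outcome' column usable as a color category."""
    out = df.copy()
    out["outcome"] = (out[label_col] == out[pred_col]).map({True: "correct", False: "wrong"})
    return out


def plot_outcomes(df: pd.DataFrame):
    import clusterfun as clt  # imported lazily; requires `pip install clusterfun`
    return clt.scatter(df, x="x", y="y", media="img_path", color="outcome")


df = pd.DataFrame({
    "x": [0.1, 0.4, 0.7],
    "y": [0.3, 0.8, 0.2],
    "img_path": ["images/a.jpg", "images/b.jpg", "images/c.jpg"],
    "label": ["cat", "dog", "cat"],
    "prediction": ["cat", "cat", "cat"],
})
df = add_outcome_column(df, "label", "prediction")
print(df["outcome"].tolist())  # ['correct', 'wrong', 'correct']
```

Calling `plot_outcomes(df)` would then color correctly and incorrectly predicted images differently, which makes misclassified clusters easy to spot.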

Bounding box

You can visualise bounding boxes on top of your images with the bounding_box parameter. For this to work, the dataframe used to plot the data needs a bounding box column. Each cell in that column must contain a dictionary, or a list of dictionaries, with the bounding box values xmin, ymin, xmax, ymax, label (optional) and color (optional).
Example of a bounding box:


single_bounding_box = {
    "xmin": 12,
    "ymin": 10,
    "xmax": 100,
    "ymax": 110,
    "color": "green",
    "label": "ground truth"
}
  • The bounding box coordinates can be either floats or integers.
  • The color can be either a color name or a hex value.
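Each cell must hold either one dictionary or a list of them. A quick check before plotting can therefore catch malformed rows. The helpers below (`normalize_boxes`, `box_problems`) are hypothetical and written only from the format described above. They require numeric xmin, ymin, xmax and ymax, and check that label and color are strings when present. They do not validate color names or hex syntax, and they do not compare coordinates against each other. The sample rows are made up.

```python
import numbers

import pandas as pd

REQUIRED_KEYS = ("xmin", "ymin", "xmax", "ymax")


def normalize_boxes(cell):
    """Return a cell as a list of box dicts (a single dict becomes a one-item list)."""
    if isinstance(cell, dict):
        return [cell]
    if isinstance(cell, list):
        return cell
    raise TypeError(f"expected a dict or a list of dicts, got {type(cell).__name__}")


def box_problems(box):
    """List the problems with one box dict; an empty list means it looks valid."""
    problems = []
    for key in REQUIRED_KEYS:
        value = box.get(key)
        if value is None:
            problems.append(f"missing {key}")
        elif isinstance(value, bool) or not isinstance(value, numbers.Real):
            problems.append(f"{key} must be an int or float")
    for key in ("label", "color"):  # optional keys
        if key in box and not isinstance(box[key], str):
            problems.append(f"{key} must be a string")
    return problems


df = pd.DataFrame({
    "img_path": ["images/a.jpg", "images/b.jpg"],
    "boxes": [
        {"xmin": 12, "ymin": 10, "xmax": 100, "ymax": 110, "color": "green", "label": "ground truth"},
        [{"xmin": 0.1, "ymin": 0.2, "xmax": 0.5, "ymax": 0.6, "color": "#ff0000"}, {"xmin": 5, "ymin": 5}],
    ],
})
for row, cell in df["boxes"].items():
    for box in normalize_boxes(cell):
        for problem in box_problems(box):
            print(f"row {row}: {problem}")
# row 1: missing xmax
# row 1: missing ymax
```

Once the column passes this check, pass its name through the bounding_box parameter.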