Histogram

def histogram( 
    df: pd.DataFrame, 
    x: str, 
    media: str, 
    bins: int = 20, 
    color: Optional[str] = None, 
    bounding_box: Optional[str] = None, 
    title: Optional[str] = None, 
    show: bool = True, 
) -> Path:
Parameters
  • df: pd.DataFrame
    The dataframe with the data to plot.
  • x: str
    The column name of the data for the histogram.
  • media: str
    The column name of the media to display.
  • bins: int = 20
    The number of bins to use for the histogram.
  • color: Optional[str] = None
    Optional column name to color points according to the value in the column.
  • bounding_box: Optional[str] = None
    Optional column used to draw bounding boxes on top of the media. Each value in the column should be a dictionary, or an array of dictionaries, with the following keys:
    • xmin: Union[float, int]
    • ymin: Union[float, int]
    • xmax: Union[float, int]
    • ymax: Union[float, int]
    • label: Optional[str] = None
    • color: Optional[str] = None
    If no color is provided, a default color scheme will be used.
    If a label is provided, it is displayed in the top-left corner of the bounding box.
  • title: Optional[str] = None
    The title to use for the plot.
  • show: bool = True
    Whether to show the plot or not. If show is set to True, we will start a local server to display the plot in a web browser.
    More specifically, we start a FastAPI server where we mount the webpage as a static file.
    The application itself does not require an internet connection. All data is loaded locally and does not leave your machine/browser.
    If show is set to False, we only save the required data to serve the plot later on and return the path to where the data is stored.
    To serve the plot yourself later on, run clusterfun serve {path-to-data}|{uuid} on the command line to start a local server for that plot.
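The bounding_box format described above can be sketched with plain pandas. This is an illustrative sketch, not part of the library: the image paths, the coordinates, the column name bbox, and the helper boxes_are_valid are all made up for this example, and it assumes a plain Python list counts as an "array of dictionaries".

```python
import pandas as pd

# Hypothetical data: paths and coordinates are invented for illustration.
df = pd.DataFrame(
    {
        "img_path": ["images/a.jpg", "images/b.jpg"],
        "brightness": [0.42, 0.77],
    }
)

# One dictionary per row for a single box, or a list of dictionaries for
# several boxes. "label" and "color" are optional keys.
df["bbox"] = pd.Series(
    [
        {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220, "label": "face"},
        [
            {"xmin": 5, "ymin": 5, "xmax": 50, "ymax": 60, "color": "red"},
            {"xmin": 60.5, "ymin": 30.0, "xmax": 120.0, "ymax": 90.0},
        ],
    ],
    dtype=object,
)

REQUIRED_KEYS = {"xmin", "ymin", "xmax", "ymax"}


def boxes_are_valid(value) -> bool:
    """Check that a cell holds one box dict, or a list of box dicts,
    each carrying the four required coordinate keys."""
    boxes = value if isinstance(value, list) else [value]
    return all(isinstance(box, dict) and REQUIRED_KEYS <= box.keys() for box in boxes)


# Sanity check before handing the column to the plot.
assert df["bbox"].map(boxes_are_valid).all()
```

With clusterfun installed, the column name would then be passed as bounding_box="bbox" alongside x and media.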
Example
import pandas as pd
import clusterfun as clt

df = pd.read_csv("https://raw.githubusercontent.com/gietema/clusterfun-data/main/wiki-art.csv")
clt.histogram(df, x="brightness", media="img_path")
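
The show=False workflow described under Parameters might look roughly like the sketch below. The helper names save_histogram and serve_command are hypothetical and not part of clusterfun. The sketch relies only on the documented signature (histogram returns a Path) and on the documented clusterfun serve command.

```python
from pathlib import Path

import pandas as pd


def save_histogram(df: pd.DataFrame, x: str, media: str) -> Path:
    """Save the plot data without starting a server and return its location."""
    # Imported inside the function so clusterfun is only needed when this runs.
    import clusterfun as clt

    return clt.histogram(df, x=x, media=media, show=False)


def serve_command(path: Path) -> str:
    """Build the command line that serves a previously saved plot."""
    return f"clusterfun serve {path}"


# Example usage (requires clusterfun and a dataframe with these columns):
# path = save_histogram(df, x="brightness", media="img_path")
# print(serve_command(path))
```

Keeping the returned path around is what makes serving later possible, since the serve command needs either that path or the plot's uuid.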