Fowlkes-Mallows Score

The Fowlkes-Mallows score is the geometric mean of precision and recall, both computed over pairs of observations, and is calculated with the formulas below. If you're unsure what precision and recall are, you can read more about them here.

\text{Fowlkes-Mallows} = \sqrt{\text{precision} \times \text{recall}}

\text{Fowlkes-Mallows} = \frac{TP}{\sqrt{(TP + FP)(TP + FN)}}
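
The two formulas above are the same expression once precision and recall are written in terms of pair counts. Here is a short derivation, assuming the usual definitions precision = TP / (TP + FP) and recall = TP / (TP + FN), which the text above does not spell out:

```latex
\begin{aligned}
\text{precision} &= \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN} \\
\sqrt{\text{precision} \times \text{recall}}
  &= \sqrt{\frac{TP}{TP + FP} \cdot \frac{TP}{TP + FN}}
   = \frac{TP}{\sqrt{(TP + FP)(TP + FN)}}
\end{aligned}
```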

This metric measures the similarity between two clusterings of the same data, for example the clusters produced by an algorithm and the true labels, or the outputs of two different clustering algorithms.

The score varies between 0 and 1. The higher the score, the more similar the two clusterings are.

TP: True Positives refers to the number of pairs of observations that are part of the same cluster in both the predicted and the true clusters.

FP: False Positives refers to the number of pairs of observations that are predicted to be in the same cluster, but are actually in different clusters.

FN: False Negatives refers to the number of pairs of observations that are in the same cluster in the true clusters, but are in different clusters in the predicted clusters.
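
To make the pair-based definitions concrete, here is a minimal sketch that counts TP, FP and FN by looping over every pair of observations and then applies the formula. It takes FN to mean pairs that share a cluster in the true labels but not in the predicted ones. The function names `pair_counts` and `fowlkes_mallows` and the toy labels are made up for illustration; this is not how any particular library implements the score.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_true, labels_pred):
    """Count TP, FP and FN over all unordered pairs of observations."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # together in both clusterings
        elif same_pred:
            fp += 1  # together in the prediction only
        elif same_true:
            fn += 1  # together in the true clustering only
    return tp, fp, fn


def fowlkes_mallows(labels_true, labels_pred):
    """Fowlkes-Mallows score computed from the pair counts."""
    tp, fp, fn = pair_counts(labels_true, labels_pred)
    if tp == 0:
        return 0.0  # no pair is grouped together in both clusterings
    return tp / sqrt((tp + fp) * (tp + fn))


# Toy labels, made up for illustration
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]
tp, fp, fn = pair_counts(labels_true, labels_pred)
score = fowlkes_mallows(labels_true, labels_pred)
print(tp, fp, fn, round(score, 4))  # 2 1 4 0.4714
```

Because only pair membership matters, renaming the cluster labels (for example swapping 0 and 1) does not change the score. The loop is quadratic in the number of observations, so it is meant for understanding rather than for large datasets; on the same labels the result should agree with `sklearn.metrics.fowlkes_mallows_score`.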

Code example

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/PacktPublishing/Python-Data-Analysis-Third-Edition/master/Chapter11/diabetes.csv")
feature_set = ['pregnant', 'insulin', 'bmi', 'age', 'glucose', 'bp', 'pedigree']
features = df[feature_set]
target = df.label
# Partition data into training and testing sets
from sklearn.model_selection import train_test_split
feature_train, feature_test, target_train, target_test = train_test_split(features, target, test_size=0.3, random_state=1)
# Import K-means clustering and the Fowlkes-Mallows score
from sklearn.cluster import KMeans
from sklearn.metrics import fowlkes_mallows_score
# Specify the number of clusters
num_clusters = 2
# Create and fit the KMeans model
km = KMeans(n_clusters=num_clusters)
km.fit(feature_train)
# Assign each test observation to a cluster
predictions = km.predict(feature_test)
# Compare the predicted clusters with the true labels (an external evaluation measure)
print("Fowlkes Mallows Score:", fowlkes_mallows_score(target_test, predictions))