F-measure: ‘positive specific agreement’ index

Agreement Cohen's kappa Demonstration

Illustrates that Cohen’s kappa converges on the F-measure as d increases

Chris Evans https://www.psyctc.org/R_blog/ (PSYCTC.org https://www.psyctc.org/psyctc/)
02-17-2025

Started 17.ii.25

Introduction

I have just discovered the F-measure, or ‘positive specific agreement’ index. It is a nice measure of agreement between two raters making binary judgements that can be used when the number of negative-negative agreements is unknown. I got to this from the paper:
Hripcsak, G. (2005). Agreement, the F-Measure, and Reliability in Information Retrieval. Journal of the American Medical Informatics Association, 12(3), 296–298. https://doi.org/10.1197/jamia.M1733

The paper gives two examples of situations in which you might need this. The first is comparing two internet searches. You have the count of documents found by both searches (the positive-positive agreement), the count found by the first search but not the second, and the count found by the second search but not the first. In a traditional 2 by 2 crosstabulation those are cell counts a, b and c. We don’t have d, the number of negative-negative agreements, because we simply don’t know how many documents there are on the internet (and that number is always changing).

This just shows the 2 by 2 crosstabulation.

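### packages needed for the code in this post
library(tidyverse) # tribble(), tibble(), unnest_longer(), mutate(), ggplot() and %>%
library(flextable) # flextable()

tribble(~counts, ~positive, ~negative,
        # "by_R1", " ", " ",
        "positive", "a", "b",
        "negative", "c", "d") %>%
  flextable()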

counts     positive   negative
positive   a          b
negative   c          d

The count shown as a in the table is the count of positive agreements, b is the count of occasions on which the first internet search came back with a hit for the document but the second search didn’t, c is the count where the first internet search did not have a hit but the second search did and d is unknown as we don’t know the number of documents on the internet.
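
As a minimal sketch of how those three counts might be obtained in practice, the R code below uses two made-up vectors of document identifiers. The names `search1`, `search2` and the identifiers themselves are hypothetical, invented for illustration and not taken from the paper. It uses only base R set operations.

```r
# Hypothetical document identifiers returned by two searches (made-up data)
search1 <- c("doc01", "doc02", "doc03", "doc04", "doc05", "doc06")
search2 <- c("doc02", "doc03", "doc04", "doc05", "doc07")

counts <- c(
  a = length(intersect(search1, search2)), # found by both searches
  b = length(setdiff(search1, search2)),   # found only by the first search
  c = length(setdiff(search2, search1))    # found only by the second search
)
# d (documents found by neither search) cannot be computed from these vectors

# F-measure, or positive specific agreement: 2a / (2a + b + c)
fValue <- 2 * counts[["a"]] / (2 * counts[["a"]] + counts[["b"]] + counts[["c"]])
print(counts)
print(fValue)
```

With these made-up vectors a is 4, b is 2 and c is 1, so the F-measure is 8/11, about 0.727.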

Another nice example in the paper, probably more pertinent to us than the internet search, is where two raters mark parts of a text document. Here a, the positive-positive count, might be the number of overlapping marked parts, b the number of parts the first rater marked where the second rater had no overlapping mark, and c the number of parts the second rater marked where the first had no overlapping mark. Even here, with a finite sized document, we can’t know the true number of possible parts, as different raters will demarcate the text differently.

The paper notes that Cohen’s kappa, which can only be computed when d is known, approaches the F-measure for given values of a, b and c as d becomes very large. I wanted to demonstrate this.
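
For readers who like to see why this holds, here is a short algebraic sketch. It uses the usual 2 by 2 shortcut formula for kappa and the cell labels a, b, c and d from the table above. The symbol F for the F-measure is my own shorthand.

```latex
\kappa = \frac{2(ad - bc)}{(a + c)(c + d) + (b + d)(a + b)}
       = \frac{2(ad - bc)}{d\,(2a + b + c) + c(a + c) + b(a + b)}

% divide the numerator and the denominator by d
\kappa = \frac{2\left(a - \frac{bc}{d}\right)}{(2a + b + c) + \frac{c(a + c) + b(a + b)}{d}}

% the terms in 1/d vanish as d grows
\lim_{d \to \infty} \kappa = \frac{2a}{2a + b + c} = F
```

The terms divided by d shrink towards zero as d grows, which leaves exactly the expression for the F-measure.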

This little code block just creates simple functions for the F-measure and for Cohen’s kappa.

### this is the function to get the F-measure value
Fmeasure <- function(a, b, c){
  ### function that computes the F-measure, or positive specific agreement
  ### based on Hripcsak, G. (2005). Agreement, the F-Measure, and Reliability in Information Retrieval. 
  ### Journal of the American Medical Informatics Association, 12(3), 296–298. https://doi.org/10.1197/jamia.M1733
  2 * a / (2 * a + b + c)
}
### Example values from the yardstick package which has function f_meas() that I used to check my own function
# Fmeasure(227, 31, 50)

simpleKappa <- function(a, b, c, d){
  ### function that computes Cohen's kappa from the four numbers of a 2x2 table (a and d are agreement)
  numerator <- 2 * (a * d - b * c)
  denominator <- (a + c) * (c + d) + (b + d) * (a + b)
  numerator / denominator
}
### checked against yardstick::kap()
# simpleKappa(227, 31, 50, 192)

### I played with vectorising my little function but didn't use this as I prefer the tidyverse way (below)
# vectorKappa <- Vectorize(simpleKappa, vectorize.args = "d")
# 
# vectorKappa(227, 31, 50, seq(192, by = 100, length.out = 20))
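
As a cross-check that needs no extra packages, the sketch below compares the shortcut formula for kappa with the textbook definition from observed and chance-expected agreement. The function `kappaFromDefinition()` is a hypothetical helper written for this illustration. `simpleKappa()` is repeated so the block runs on its own.

```r
# Shortcut formula for kappa from a 2x2 table (a and d are the agreement cells)
simpleKappa <- function(a, b, c, d) {
  2 * (a * d - b * c) / ((a + c) * (c + d) + (b + d) * (a + b))
}

# Hypothetical helper: kappa from its definition, (pObserved - pExpected) / (1 - pExpected)
kappaFromDefinition <- function(a, b, c, d) {
  n <- a + b + c + d
  pObserved <- (a + d) / n # proportion of agreements actually seen
  # agreement expected by chance from the row and column marginal totals
  pExpected <- ((a + b) * (a + c) + (c + d) * (b + d)) / n^2
  (pObserved - pExpected) / (1 - pExpected)
}

kappaShortcut <- simpleKappa(227, 31, 50, 192)
kappaDefinition <- kappaFromDefinition(227, 31, 50, 192)
print(all.equal(kappaShortcut, kappaDefinition))
```

The two functions should agree to within floating point error for any valid 2 by 2 table, which is some reassurance that the shortcut formula is right.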

This is an example of data for the F-measure (taken from the R package yardstick).

tribble(~counts, ~positive, ~negative,
        # "by_R1", " ", " ",
        "positive", 227, 31,
        "negative", 50, NA) %>%
  flextable()

counts     positive   negative
positive   227        31
negative   50         NA
The value for d is indeterminate. The value for the F-measure is 0.849.

The yardstick package has data two_class_example where d is given:

tribble(~counts, ~positive, ~negative,
        # "by_R1", " ", " ",
        "positive", 227, 31,
        "negative", 50, 192) %>%
  flextable()

counts     positive   negative
positive   227        31
negative   50         192

Kappa here is 0.675, much lower than the value of the F-measure for the situation in which d is unknown.

However, this shows how kappa increases asymptotically towards the value of the F-measure as d is increased from that value of 192 to a very large value, way bigger than a, b or c. (I have actually used 200 values of d, stepping it up by 200 each time, so to a maximum value of d of 39,992.)

tibble(a = 227, b = 31, c = 50, d = list(seq(192, by = 200, length.out = 200))) %>%
  unnest_longer(d) %>%
  mutate(kappa = simpleKappa(a, b, c, d)) -> tmpTib

ggplot(data = tmpTib,
       aes(x = d, y = kappa)) +
  geom_point() +
  ### dotted horizontal reference line at the F-measure value
  geom_hline(yintercept = Fmeasure(227, 31, 50),
             linetype = 3) +
  ylab("Kappa values") +
  xlab("d")

I think that’s pretty clear!
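
For anyone who prefers numbers to a plot, here is a small base R sketch that tabulates kappa and its distance from the F-measure at a few values of d. The first three values of d are taken from the sequence plotted above. The value of one million is my own addition to show the gap continuing to shrink. Both functions are repeated so the block runs on its own.

```r
# F-measure, or positive specific agreement
Fmeasure <- function(a, b, c) {
  2 * a / (2 * a + b + c)
}
# Cohen's kappa from the four cells of a 2x2 table (a and d are the agreement cells)
simpleKappa <- function(a, b, c, d) {
  2 * (a * d - b * c) / ((a + c) * (c + d) + (b + d) * (a + b))
}

dValues <- c(192, 1992, 39992, 1e6) # 1e6 is an extra value, not in the plot
kappaValues <- simpleKappa(227, 31, 50, dValues) # the arithmetic is vectorised over d
gap <- Fmeasure(227, 31, 50) - kappaValues # how far kappa is below the F-measure

print(data.frame(d = dValues,
                 kappa = round(kappaValues, 4),
                 gap = round(gap, 4)))
```

Kappa stays below the F-measure at every value of d and the gap narrows as d grows, which matches the plot.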

History

Last updated

17/02/2025 at 17:13

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Evans (2025, Feb. 17). Chris (Evans) R SAFAQ: F-measure: 'positive specific agreement' index. Retrieved from https://www.psyctc.org/R_blog/posts/2025-02-17-f-measure/

BibTeX citation

@misc{evans2025f-measure:,
  author = {Evans, Chris},
  title = {Chris (Evans) R SAFAQ: F-measure: 'positive specific agreement' index},
  url = {https://www.psyctc.org/R_blog/posts/2025-02-17-f-measure/},
  year = {2025}
}