GRAFT: phylogenetic signal in patent applications across the tree of life
Van Criekinge, W.
Whether closely related species are repurposed for similar biotechnologies, that is, whether human technological interest carries a phylogenetic signal, has lacked a tractable test at scale. We built GRAFT (Graph of Relatedness, Applications, Families and Taxonomy), a Neo4j knowledge graph that links three sources: the Open Tree of Life synthetic taxonomy (4.53 × 10⁶ taxa)[1], multilingual common names[2,3], and a patent layer from Google Patents on BigQuery built in a single 257 GB SQL scan. The patent layer recovers 22,876 species in 759,182 patents, with all CPC and IPC class definitions resolved. Treating each species' CPC-subclass profile as a binary application vector, we tested the correlation between pairwise topological phylogenetic distance and pairwise Jaccard distance of patent profiles with a Mantel test[6] (999 permutations; n = 9,944 species with ≥5 patents; 49,436,596 pairs). The global correlation was significant (Pearson r = +0.188, one-sided p = 0.001, the smallest p attainable with 999 permutations), and phylogenetic signal was Bonferroni-significant in every close-distance bin, from sister species through within-class pairs. The signal is not an artefact of the unweighted topology: re-expressing phylogenetic distance as time-calibrated divergence from the TimeTree of Life confirms Bonferroni-significant signal in every bin out to ~500 Myr of divergence. The same graph supports a predictive query that returns sister-species bioprospecting candidates for any application. For example, ten Angelica congeners carry no patent edges for medicinal preparations, whereas A. sinensis (Chinese angelica) already carries 86,814 such edges. GRAFT is an openly extensible scaffold linking phylogeny, ecology and the global IP record.
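The Mantel test described above can be sketched in pure Python. This is a minimal illustration, not GRAFT's own pipeline. The function names (`jaccard_distance`, `pearson`, `mantel`), the toy CPC profiles and the distance matrix are all hypothetical, and topological distance is assumed to be precomputed as a square matrix. The sketch shows the three ingredients named in the abstract: Jaccard distance between binary CPC-subclass profiles, Pearson correlation over all unordered species pairs, and a one-sided p-value from permuting the species labels of one matrix.

```python
import itertools
import math
import random


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Jaccard distance 1 - |A & B| / |A | B| between two binary profiles stored as sets."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty profiles count as identical
    return 1.0 - len(a & b) / len(union)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d_phylo, d_patent, n_perm=999, seed=0):
    """One-sided Mantel test: is r(d_phylo, d_patent) larger than under random relabelling?"""
    n = len(d_phylo)
    pairs = list(itertools.combinations(range(n), 2))  # upper triangle, n(n-1)/2 pairs
    x = [d_phylo[i][j] for i, j in pairs]
    y = [d_patent[i][j] for i, j in pairs]
    r_obs = pearson(x, y)
    rng = random.Random(seed)
    perm = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        # Relabel species in one matrix by permuting its rows and columns together.
        y_perm = [d_patent[perm[i]][perm[j]] for i, j in pairs]
        if pearson(x, y_perm) >= r_obs:
            hits += 1
    # The observed labelling counts as one permutation, so p >= 1 / (n_perm + 1).
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two pairs of sister species, A/B and C/D.
PROFILES = [
    {"A61K", "A23L"},
    {"A61K", "A23L", "C12N"},
    {"C12N", "C12P"},
    {"C12P"},
]
# Hypothetical topological distances (e.g. number of edges on the path between two tips).
PHYLO = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]
JACCARD = [[jaccard_distance(a, b) for b in PROFILES] for a in PROFILES]

if __name__ == "__main__":
    r, p = mantel(PHYLO, JACCARD, n_perm=999)
    print(f"Mantel r = {r:+.3f}, one-sided p = {p:.3f}")
```

Running the script prints the observed r and its one-sided p-value for the toy data. With only four species there are just 24 distinct relabellings, so the p-value cannot become small here. At the scale reported above (about 4.9 × 10⁷ pairs and 999 permutations), a pure-Python loop would be impractically slow, and a vectorised implementation would be the natural choice.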