BIOSCAN-ML aims to use machine learning to help biologists categorize and monitor the biodiversity of our planet using DNA barcodes. It is part of BIOSCAN, a larger international collaboration led by the International Barcode of Life (iBOL) Consortium.

To foster the use of machine learning for biodiversity monitoring with DNA barcodes, we provide two datasets of insect specimen images paired with DNA barcodes and partial taxonomic labels.

To work with the BIOSCAN datasets, please use our dataset code. To browse the BIOSCAN-5M dataset, use the BIOSCAN-Browser.

With these datasets, we are investigating different machine learning techniques to classify species based on images and/or DNA barcodes, and developing algorithms to help biologists taxonomize new specimens of potentially unknown species. Below is recent work investigating different pretraining models for DNA barcodes and images for taxonomic classification.