Fig. 1. Eleven target parasitic eggs with their average sizes.

Our Chula-ParasiteEgg-11 dataset contains 11 types of parasitic eggs from faecal smear samples. Fig. 1 shows each type with its average size; the sizes range from approximately 15 to 100 μm. The images were acquired with several different devices: a Canon EOS 70D camera body on an Olympus BX53 microscope, a Nikon DS-Fi2 camera body on a Nikon Eclipse Ni microscope, and Samsung Galaxy J7 Prime, iPhone 12 and iPhone 13 phones used through the 10× eyepiece lens of either the Nikon Eclipse Ni or the Olympus BX53. As a result, the images differ in resolution, lighting and acquisition settings. Some images are out of focus, some are noisy, and some exhibit motion blur because they were captured with a motorised-stage microscope. This wide variety of image quality and appearance is intended to improve the robustness of detection models trained on the data. The training and testing sets will be divided randomly; we expect to provide 1,000 training images and 250 testing images per class on the day of the dataset release. This is the largest dataset of its kind. The labels will take the form of bounding boxes.
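The paragraph above states only that the split is random, with 1,000 training and 250 testing images per class. The following is a minimal sketch of such a per-class random split, not the procedure actually used to build the released sets. The helper name `split_per_class`, the `(file_name, category_id)` input format and the fixed `seed` are hypothetical, and the sketch assumes each image carries a single class label.

```python
import random
from collections import defaultdict


def split_per_class(samples, n_train=1000, n_test=250, seed=0):
    """Randomly split (file_name, category_id) pairs into train/test lists per class.

    Hypothetical helper: assumes one class label per image; defaults mirror the
    1,000 training / 250 testing images per class described in the text.
    """
    by_class = defaultdict(list)
    for file_name, category_id in samples:
        by_class[category_id].append(file_name)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    for category_id, files in sorted(by_class.items()):
        if len(files) < n_train + n_test:
            raise ValueError(f"class {category_id} has only {len(files)} images")
        files = sorted(files)  # fix the order before shuffling
        rng.shuffle(files)
        train += [(f, category_id) for f in files[:n_train]]
        test += [(f, category_id) for f in files[n_train:n_train + n_test]]
    return train, test
```

For example, with 1,250 candidate images for each of the 11 classes, `split_per_class(samples)` yields 11,000 training and 2,750 testing pairs with no overlap. Splitting per class, rather than over the pooled images, keeps every class at exactly the stated counts.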

The training and test datasets are now available on the IEEE DataPort "PARASITIC EGG DETECTION AND CLASSIFICATION IN MICROSCOPIC IMAGES" webpage.

Training Data Format

The training dataset contains 11 parasitic egg types, with 1,000 images per category.

  • category_id 0: Ascaris lumbricoides
  • category_id 1: Capillaria philippinensis
  • category_id 2: Enterobius vermicularis
  • category_id 3: Fasciolopsis buski
  • category_id 4: Hookworm egg
  • category_id 5: Hymenolepis diminuta
  • category_id 6: Hymenolepis nana
  • category_id 7: Opisthorchis viverrine
  • category_id 8: Paragonimus spp
  • category_id 9: Taenia spp. egg
  • category_id 10: Trichuris trichiura
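The id-to-name mapping above can be held in a dictionary and used to check the stated 1,000 images per category. This is a minimal sketch: the names `CATEGORIES` and `count_images_per_category` are hypothetical, the category names are copied verbatim from the list, and the input is assumed to be COCO-style annotation dicts with `image_id` and `category_id` keys.

```python
from collections import Counter

# Category ids and names exactly as listed above.
CATEGORIES = {
    0: "Ascaris lumbricoides",
    1: "Capillaria philippinensis",
    2: "Enterobius vermicularis",
    3: "Fasciolopsis buski",
    4: "Hookworm egg",
    5: "Hymenolepis diminuta",
    6: "Hymenolepis nana",
    7: "Opisthorchis viverrine",
    8: "Paragonimus spp",
    9: "Taenia spp. egg",
    10: "Trichuris trichiura",
}


def count_images_per_category(annotations):
    """Count distinct images that contain at least one egg of each category."""
    # Deduplicate so an image with several eggs of one type counts once.
    seen = {(a["image_id"], a["category_id"]) for a in annotations}
    counts = Counter(category_id for _, category_id in seen)
    return {CATEGORIES[cid]: counts.get(cid, 0) for cid in sorted(CATEGORIES)}
```

Applied to the training annotations, each value would be expected to be 1,000 if every image holds eggs of a single type; images with mixed types would raise some counts above that.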

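Annotation: We follow the COCO annotation format for object detection. The annotations are stored as JSON with the structure below. Each bounding box is given as [x, y, width, height] in pixels, where (x, y) is the top-left corner of the box.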

{
  "info": {"year": int, "version": str, "description": str, "contributor": str, "url": str, "date_created": datetime},
  "licenses": [
    {"id": int, "name": str, "url": str}
  ],
  "categories": [
    {"id": int, "name": str, "supercategory": str}
  ],
  "images": [
    {"id": int, "file_name": str, "height": int, "width": int, "license": int, "coco_url": str}
  ],
  "annotations": [
    {"id": int, "image_id": int, "category_id": int, "bbox": [x, y, width, height], "area": float}
  ]
}
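To illustrate how these records link together, here is a minimal sketch using only Python's standard `json` module. It joins each annotation to its image (via `image_id`) and category (via `category_id`) and converts each COCO box from [x, y, width, height] to corner coordinates. It assumes the standard COCO top-level keys `images`, `annotations` and `categories`. The function names and the file path passed to `load_coco_boxes` are hypothetical; adjust the keys if the released file differs.

```python
import json
from collections import defaultdict


def index_boxes(data):
    """Map file_name -> [(category_name, x_min, y_min, x_max, y_max), ...]."""
    categories = {c["id"]: c["name"] for c in data["categories"]}
    images = {img["id"]: img for img in data["images"]}
    boxes = defaultdict(list)
    for ann in data["annotations"]:
        img = images[ann["image_id"]]
        # COCO bbox: [x, y, width, height] with (x, y) the top-left corner.
        x, y, w, h = ann["bbox"]
        x_max, y_max = x + w, y + h
        # Reject boxes that are empty or extend outside the image.
        if w <= 0 or h <= 0 or x < 0 or y < 0 or x_max > img["width"] or y_max > img["height"]:
            raise ValueError(f"annotation {ann['id']} has an invalid bbox {ann['bbox']}")
        boxes[img["file_name"]].append((categories[ann["category_id"]], x, y, x_max, y_max))
    return dict(boxes)


def load_coco_boxes(path):
    """Read a COCO-style JSON file and index its boxes by image file name."""
    with open(path, encoding="utf-8") as f:
        return index_boxes(json.load(f))
```

Corner coordinates are convenient for drawing boxes or computing overlaps. Libraries such as pycocotools can read the same file directly if a fuller COCO API is needed.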