garbage_classification_cvpr_2024

Code for the paper we plan to publish at CVPR 2024.

Abstract:

Recycling is a pressing challenge for every country. It reduces waste, but manually sorting tons of garbage every day is difficult. For this reason, alternative sorting methods that leverage new technology are being studied. Artificial Intelligence can help by sorting domestic and industrial garbage automatically. Our research introduces Garbage-IT, a large multimodal dataset of garbage images, each paired with a short text description. It aims to provide a more general dataset for the garbage classification problem, improve garbage sorting, and have a large impact if used in practice. Most current systems that use Deep Learning (DL) for automated garbage sorting assume that all the needed information is contained in images of the garbage. The system presented in this paper also incorporates Natural Language Processing (NLP) by leveraging a language model that consumes the textual information used to describe a piece of garbage. By combining these two sources of information, it is possible to obtain better results than when classifying with image-only or text-only information. Garbage-IT provides benchmarks of various state-of-the-art models for applications such as classification and image captioning (mention results). This work shows that the accuracy/F1-score on the presented dataset increases by X% when combining text and image.
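The idea of combining image and text information can be illustrated with a minimal sketch of late fusion by concatenation. This is only one common way to combine two modalities and is not necessarily the architecture used in the paper. The feature vectors, weights, and function names (`fuse`, `classify`) are hypothetical and chosen for illustration; in a real system the features would come from an image encoder and a language model.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(image_features: Sequence[float], text_features: Sequence[float]) -> List[float]:
    """Late fusion by concatenation: join both feature vectors into one."""
    return list(image_features) + list(text_features)


def classify(fused: Sequence[float],
             weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> List[float]:
    """Single linear layer followed by softmax; one weight row per class."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy, made-up values (assumption): 2 image features and 3 text features.
image_features = [0.9, 0.1]
text_features = [0.2, 0.8, 0.5]

# Hypothetical weights for a 2-class problem over the 5 fused features.
weights = [
    [1.0, 0.0, 0.0, 1.0, 0.0],  # class 0
    [0.0, 1.0, 1.0, 0.0, 0.0],  # class 1
]
biases = [0.0, 0.0]

fused = fuse(image_features, text_features)
probs = classify(fused, weights, biases)
predicted = max(range(len(probs)), key=probs.__getitem__)
print("fused length:", len(fused), "predicted class:", predicted)
```

Concatenation keeps the two encoders independent, so an image-only or text-only baseline can reuse the same classifier head with a shorter input vector.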

Much research on garbage classification has focused on image classification with very specific and limited datasets. We instead propose classification based on both images and text. To show the power of multimodality, we compared the performance of the following models:

  • Deep Learning classification model trained on images
  • Deep Learning classification model trained on text
  • Deep Learning classification model trained on images + text (text generated by image captioning)
  • State-of-the-art (SOTA) Deep Learning classification model trained on images + text (text written by humans)
  • Our novel Deep Learning classification model trained on images + text (text written by humans)
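A comparison like the one listed above comes down to scoring each model's predictions against the same ground-truth labels. The sketch below computes accuracy and macro-averaged F1 in plain Python. The class names, model names, and predictions are made-up placeholders, not results from the paper, and macro averaging is an assumption since the averaging scheme is not stated here.

```python
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical ground truth and predictions, for illustration only.
y_true = ["glass", "paper", "glass", "metal"]
predictions: Dict[str, List[str]] = {
    "image_only": ["glass", "glass", "glass", "metal"],
    "image_plus_text": ["glass", "paper", "glass", "metal"],
}

for name, y_pred in predictions.items():
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f} "
          f"macro_f1={macro_f1(y_true, y_pred):.2f}")
```

Macro F1 weights every class equally, which matters when some garbage categories are much rarer than others.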