This walkthrough uses an ML algorithm called an image classification model. These models learn to distinguish between different objects by observing many examples over many iterations. This post uses a technique called transfer learning to dramatically reduce the time and data required to train an image classification model. For more information about transfer learning with Amazon SageMaker built-in algorithms, see How Image Classification Works. With transfer learning, you only need a few hundred images of each type of trash. As you add more training samples and vary the viewing angle and lighting for each type of trash, the model takes longer to train but improves its accuracy during inference, when you ask the model to classify trash items it has never seen before.
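To make the transfer learning setup more concrete, here is a minimal sketch of how the training settings might be assembled. The hyperparameter names (`use_pretrained_model`, `num_classes`, `num_training_samples`, `image_shape`, `epochs`) are those documented for the Amazon SageMaker built-in image classification algorithm. The helper `build_hyperparameters`, the image counts, and the chosen values are illustrative assumptions, not settings taken from this post.

```python
def build_hyperparameters(images_per_class, epochs=10):
    """Return a hyperparameter dict for a transfer learning job.

    images_per_class maps each class label to its number of training images.
    The helper name and default values are hypothetical, for illustration only.
    """
    if not images_per_class:
        raise ValueError("at least one class is required")
    return {
        # 1 = start from pretrained weights instead of random initialization
        "use_pretrained_model": 1,
        "num_classes": len(images_per_class),
        "num_training_samples": sum(images_per_class.values()),
        # channels,height,width; an assumed input size
        "image_shape": "3,224,224",
        "epochs": epochs,
    }


# A few hundred images per type of trash (assumed counts).
params = build_hyperparameters(
    {"compost": 300, "landfill": 300, "recycling": 300}
)
print(params["num_classes"], params["num_training_samples"])
```

Setting `use_pretrained_model` to 1 is what lets a few hundred images per class suffice, because the network starts from weights already trained on a large general image dataset.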

Before collecting images yourself, consider the many image sources that are publicly available. You want images with clear labels (often applied by humans) that describe what’s inside the image. The following sources are good places to look for your use case:

  • AWS Open Data – Contains a variety of datasets sourced from trusted entities that share and open their datasets for general use.
  • AWS Data Exchange – Contains datasets that are either free or available for a fee or subscription charge. Most are very well curated and labeled, which is why they usually involve a charge.
  • GitHub – Offers several public repos with image datasets. Make sure you comply with the terms and conditions and reference the original owners when your work is published.
  • Kaggle – Has a wide variety of public datasets and also provides some interesting starter Jupyter notebooks.
  • Non-profit and government organizations – Often publish data for public use under certain terms. These can be a great source of data.
  • Amazon SageMaker Ground Truth – Creates labeled datasets from your images. You can choose automated labeling (recommended for common objects), human labelers, or AWS Marketplace offerings for more specific labeling use cases. For more information, see Build Highly Accurate Training Datasets with Amazon SageMaker Ground Truth.

A good practice for collecting images is to use pictures at different possible angles and lighting conditions to make the model more robust. The following image is an example of the type of image the model classifies into landfill, recycling, or compost.

Image Example

When you have your images for each type of trash, separate them into folders, one folder per type.
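One way to do this separation is sketched below. It assumes you already have a mapping from file name to category; the `labels` dictionary, the `organize_images` helper, and the directory names are hypothetical and only illustrate the folder layout.

```python
import shutil
from pathlib import Path

# The three categories used in this walkthrough.
CATEGORIES = ("compost", "landfill", "recycling")


def organize_images(source_dir, dest_dir, labels):
    """Copy each labeled image into dest_dir/<category>/.

    labels maps a file name in source_dir to one of CATEGORIES.
    Returns the number of images copied per category.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    counts = {category: 0 for category in CATEGORIES}
    for filename, category in labels.items():
        if category not in counts:
            raise ValueError(f"unknown category {category!r} for {filename}")
        target_dir = dest_dir / category
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 keeps file metadata and leaves the original in place
        shutil.copy2(source_dir / filename, target_dir / filename)
        counts[category] += 1
    return counts
```

For example, `organize_images("raw", "dataset", {"banana_peel.jpg": "compost"})` would create `dataset/compost/banana_peel.jpg`. Copying rather than moving keeps the original collection intact in case a label needs to be corrected later.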

After you have the images you want to train your ML model on, upload them to Amazon S3. First, create an S3 bucket. For AWS DeepLens projects, the S3 bucket names must start with the prefix deeplens-.
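The bucket creation and upload can also be scripted. The following is a minimal sketch using the AWS SDK for Python (boto3); it assumes boto3 is installed and AWS credentials are configured. The helpers `make_bucket_name` and `upload_dataset`, the bucket suffix, and the default Region are hypothetical. The name check combines the `deeplens-` prefix with the general S3 bucket naming rules.

```python
import re
from pathlib import Path

REQUIRED_PREFIX = "deeplens-"


def make_bucket_name(suffix):
    """Build a bucket name with the prefix AWS DeepLens projects require."""
    name = REQUIRED_PREFIX + suffix
    # General S3 rules: 3-63 characters; lowercase letters, digits, dots,
    # and hyphens; must begin and end with a letter or digit.
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    return name


def upload_dataset(local_root, bucket, region="us-east-1"):
    """Create the bucket and upload every file under local_root.

    Object keys mirror the local folder layout, for example
    recycling/bottle.jpg. Requires boto3 and valid AWS credentials.
    """
    import boto3  # imported here so the name helper works without the SDK

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket)
    else:
        # Outside us-east-1, S3 needs an explicit location constraint.
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    root = Path(local_root)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            s3.upload_file(str(path), bucket, path.relative_to(root).as_posix())


bucket_name = make_bucket_name("trash-sorter-example")
print(bucket_name)
```

Calling `upload_dataset("dataset", bucket_name)` would then create the bucket and copy the category folders into it. Bucket names are globally unique, so you would need to pick your own suffix.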

This recipe provides a dataset of images labeled under the categories of recycling, landfill, and compost.