Collaborating Foundation models for Domain Generalized Semantic Segmentation

1 LTCI, Télécom-Paris, Institut Polytechnique de Paris
2 LIX, Ecole Polytechnique, CNRS, Institut Polytechnique de Paris
  • Focus: Domain Generalized Semantic Segmentation (DGSS) aims to train models on a labeled source domain to generalize to unseen domains during inference.
  • Limitation of Existing Methods: Traditional Domain Randomization (DR) methods are limited to style diversification and lack content variability.
  • Our Approach: We introduce CLOUDS, a framework that assembles CoLlaborative FOUndation models for Domain Generalized Semantic Segmentation.
  • Components of CLOUDS:
    • CLIP Backbone - For robust feature representation.
    • Large Language Model (LLM) - Provides rich and diverse text prompts to enhance both content and style diversity.
    • Diffusion Model - Generates images conditioned on the text prompts produced by the LLM.
    • Segment Anything Model (SAM) - Iteratively refines the pseudo-labels of the segmentation model on the generated images.
  • Performance: CLOUDS significantly outperforms previous methods, improving averaged mIoU by 5.6% on synthetic-to-real DGSS benchmarks and by 6.7% under varying weather conditions.
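
For readers unfamiliar with the metric, mean Intersection-over-Union (mIoU) can be computed from a confusion matrix, as in the minimal sketch below. The function name `mean_iou`, the `ignore_index=255` convention, and the toy label maps are illustrative assumptions, not taken from the CLOUDS code. The "averaged mIoU" reported above is presumably the mean of such scores over several target datasets.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Return (mIoU, per-class IoU) for two integer label maps of equal shape.

    Assumes predictions lie in [0, num_classes); pixels whose ground truth
    equals ignore_index are excluded (a common convention, assumed here).
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Classes absent from both maps get NaN and are skipped in the mean.
    iou = np.full(num_classes, np.nan)
    present = union > 0
    iou[present] = intersection[present] / union[present]
    return float(np.nanmean(iou)), iou

# Toy example: 2 classes, one ignored pixel.
miou, per_class = mean_iou([[0, 1], [1, 1]], [[0, 1], [0, 255]], num_classes=2)
print(miou)  # 0.5
```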


  • CLIP provides robust feature representations for unseen domains.
  • Keeping the backbone frozen preserves this generalizability.
  • Training data is generated using a diffusion model.
  • Conditioning the diffusion model on LLM prompts increases content and style diversity.
  • We use SAM to refine noisy pseudo-labels (PLs).
  • Class-wise masks and point prompts are extracted from each noisy PL and fed to SAM, which returns refined masks.
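
The prompt-extraction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `class_masks_and_points`, the random sampling of points inside each mask, the number of points, and the `ignore_index=255` convention are all hypothetical. The SAM call itself is omitted; the sketch only prepares one binary mask and a few (x, y) point prompts per class.

```python
import numpy as np

def class_masks_and_points(pseudo_label, num_points=3, ignore_index=255, seed=0):
    """Split a noisy pseudo-label map into per-class binary masks and point prompts.

    Returns {class_id: {"mask": bool array (H, W), "points": int array (k, 2)}},
    where each point is an (x, y) pixel coordinate lying inside the class mask.
    Uniform random sampling is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    pseudo_label = np.asarray(pseudo_label)
    prompts = {}
    for cls in np.unique(pseudo_label):
        if cls == ignore_index:
            continue
        mask = pseudo_label == cls           # class-wise binary mask
        ys, xs = np.nonzero(mask)            # row (y) and column (x) indices
        k = min(num_points, ys.size)         # never request more points than pixels
        idx = rng.choice(ys.size, size=k, replace=False)
        points = np.stack([xs[idx], ys[idx]], axis=1)  # (x, y) order
        prompts[int(cls)] = {"mask": mask, "points": points}
    return prompts

# Toy pseudo-label with two classes.
label = [[0, 0, 1],
         [0, 1, 1]]
prompts = class_masks_and_points(label, num_points=2)
for cls, p in prompts.items():
    print(cls, p["points"].shape)
```

Each entry could then be passed to a promptable segmenter such as SAM, one class at a time, and the returned masks merged back into a refined label map; how that merge is done is not specified here.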

Generated images using Diffusion Model

Pseudo-Label refinement using SAM

Qualitative Results of CLOUDS

This paper has been supported by the French National Research Agency (ANR) in the framework of its JCJC (ANR-20-CE23-0027). This work was granted access to the HPC resources of IDRIS under the allocation AD011013071 made by GENCI. We would like to thank I. E. Marouf and T. Delatolas for proofreading.


      title={Collaborating Foundation models for Domain Generalized Semantic Segmentation},
      author={Benigmim, Yasser and Roy, Subhankar and Essid, Slim and Kalogeiton, Vicky and Lathuilière, Stéphane},
      journal={arXiv preprint arXiv:2312.09788},