📦 Segmentation Models¶

Unet¶

class segmentation_models_pytorch.Unet(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_use_batchnorm=True, decoder_channels=(256, 128, 64, 32, 16), decoder_attention_type=None, in_channels=3, classes=1, activation=None, aux_params=None)[source]¶

Unet is a fully convolutional neural network for image semantic segmentation. It consists of encoder and decoder parts connected by skip connections. The encoder extracts features of different spatial resolutions (skip connections), which are used by the decoder to produce an accurate segmentation mask. Decoder blocks are fused with skip connections via concatenation.

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 we will have features with shapes [(N, C, H, W)], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_channels – List of integers which specify in_channels parameter for convolutions used in decoder. Length of the list should be the same as encoder_depth

  • decoder_use_batchnorm – If True, a BatchNorm2d layer between Conv2D and Activation layers is used. If “inplace”, InplaceABN will be used, which allows decreasing memory consumption. Available options are True, False, “inplace”

  • decoder_attention_type – Attention module used in decoder of the model. Available options are None, se and scse. SE paper - https://arxiv.org/abs/1709.01507 SCSE paper - https://arxiv.org/abs/1808.08127

  • in_channels – A number of input channels for the model, default is 3 (RGB images)

  • classes – A number of classes for output mask (or you can think as a number of channels of output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

Unet

Return type

torch.nn.Module
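
Example (not part of the original docstring) — a minimal usage sketch; the encoder, class count, and 256×256 input below are illustrative, and spatial dimensions should be divisible by 32 for the default encoder_depth=5:

    import torch
    import segmentation_models_pytorch as smp

    model = smp.Unet(encoder_name="resnet34", encoder_weights="imagenet", in_channels=3, classes=2)
    model.eval()
    with torch.no_grad():
        mask = model(torch.randn(1, 3, 256, 256))  # logits, shape (1, 2, 256, 256)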

Unet++¶

class segmentation_models_pytorch.UnetPlusPlus(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_use_batchnorm=True, decoder_channels=(256, 128, 64, 32, 16), decoder_attention_type=None, in_channels=3, classes=1, activation=None, aux_params=None, weight_standardization=False)[source]¶

Unet++ is a fully convolutional neural network for image semantic segmentation. It consists of encoder and decoder parts connected by skip connections. The encoder extracts features of different spatial resolutions (skip connections), which are used by the decoder to produce an accurate segmentation mask. The decoder of Unet++ is more complex than in the usual Unet.

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 we will have features with shapes [(N, C, H, W)], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_channels – List of integers which specify in_channels parameter for convolutions used in decoder. Length of the list should be the same as encoder_depth

  • decoder_use_batchnorm – If True, a BatchNorm2d layer between Conv2D and Activation layers is used. If “inplace”, InplaceABN will be used, which allows decreasing memory consumption. Available options are True, False, “inplace”

  • decoder_attention_type – Attention module used in decoder of the model. Available options are None, se and scse. SE paper - https://arxiv.org/abs/1709.01507 SCSE paper - https://arxiv.org/abs/1808.08127

  • in_channels – A number of input channels for the model, default is 3 (RGB images)

  • classes – A number of classes for output mask (or you can think as a number of channels of output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

Unet++

Return type

torch.nn.Module

Reference:

https://arxiv.org/abs/1807.10165, https://arxiv.org/abs/1912.05074
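
Example (illustrative sketch, not from the original docstring) — passing aux_params adds a classification head, so the forward pass returns a (mask, label) tuple; all parameter values below are placeholders:

    import torch
    import segmentation_models_pytorch as smp

    model = smp.UnetPlusPlus(
        encoder_name="resnet34",
        classes=2,
        aux_params={"classes": 4, "pooling": "avg", "dropout": 0.2, "activation": None},
    )
    mask, label = model(torch.randn(1, 3, 256, 256))  # mask: (1, 2, 256, 256), label: (1, 4)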

EfficientUNet++¶

class segmentation_models_pytorch.EfficientUnetPlusPlus(encoder_name='timm-efficientnet-b5', encoder_depth=5, encoder_weights='imagenet', decoder_channels=(256, 128, 64, 32, 16), squeeze_ratio=1, expansion_ratio=1, in_channels=3, classes=1, activation=None, aux_params=None)[source]¶

The EfficientUNet++ is a fully convolutional neural network for ordinary and medical image semantic segmentation. It consists of an encoder and a decoder, connected by skip connections. The encoder extracts features of different spatial resolutions, which are fed to the decoder through skip connections. The decoder combines its own feature maps with the ones from skip connections to produce accurate segmentation masks. The EfficientUNet++ decoder architecture is based on the UNet++, a model composed of nested U-Net-like decoder sub-networks. To increase performance and computational efficiency, the EfficientUNet++ replaces the UNet++’s blocks with inverted residual blocks with depthwise convolutions and embedded spatial and channel attention mechanisms. Synergizes well with EfficientNet encoders. Due to their efficient visual representations (i.e., using few channels to represent extracted features), EfficientNet encoders require little computation from the decoder.

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features

  • encoder_depth – Number of stages of the encoder, in range [3, 5]. Each stage generates features two times smaller, in spatial dimensions, than the previous one (e.g., for depth=0 features will have shapes [(N, C, H, W)], for depth 1 features will have shapes [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_channels – List of integers which specify in_channels parameter for convolutions used in the decoder. Length of the list should be the same as encoder_depth

  • in_channels – The number of input channels of the model, default is 3 (RGB images)

  • classes – The number of classes of the output mask. Can be thought of as the number of channels of the mask

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

EfficientUnet++

Return type

torch.nn.Module

Reference:

https://arxiv.org/abs/2106.11447
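
Example (hedged sketch) — assumes the installed package build exposes EfficientUnetPlusPlus as documented above; the EfficientNet encoder variant and input size are illustrative:

    import torch
    import segmentation_models_pytorch as smp

    # Pairs well with EfficientNet encoders (see description above)
    model = smp.EfficientUnetPlusPlus(encoder_name="timm-efficientnet-b0", encoder_weights="imagenet", classes=3)
    mask = model(torch.randn(1, 3, 256, 256))  # logits, shape (1, 3, 256, 256)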

ResUnet¶

class segmentation_models_pytorch.ResUnet(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_use_batchnorm=True, decoder_channels=(256, 128, 64, 32, 16), decoder_attention_type=None, in_channels=3, classes=1, activation=None, aux_params=None)[source]¶

ResUnet is a fully convolutional neural network for image semantic segmentation. It consists of encoder and decoder parts connected by skip connections. The encoder extracts features of different spatial resolutions (skip connections), which are used by the decoder to produce an accurate segmentation mask. Decoder blocks are fused with skip connections via concatenation, and residual connections are used inside each decoder block.

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features

  • encoder_depth – Number of stages of the encoder, in range [3, 5]. Each stage generates features two times smaller, in spatial dimensions, than the previous one (e.g., for depth=0 features will have shapes [(N, C, H, W)], for depth 1 features will have shapes [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_channels – List of integers which specify in_channels parameter for convolutions used in the decoder. Length of the list should be the same as encoder_depth

  • decoder_use_batchnorm – If True, a BatchNorm2d layer between Conv2D and Activation layers is used. If “inplace”, InplaceABN will be used, which allows decreasing memory consumption. Available options are True, False, “inplace”

  • decoder_attention_type – Attention module used in decoder of the model. Available options are None, se and scse. SE paper - https://arxiv.org/abs/1709.01507 SCSE paper - https://arxiv.org/abs/1808.08127

  • in_channels – The number of input channels of the model, default is 3 (RGB images)

  • classes – The number of classes of the output mask. Can be thought of as the number of channels of the mask

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

ResUnet

Return type

torch.nn.Module

Reference:

https://arxiv.org/abs/1711.10684
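
Example (hedged sketch) — assumes the installed build provides ResUnet as documented above; the scse attention option and input size are illustrative:

    import torch
    import segmentation_models_pytorch as smp

    model = smp.ResUnet(encoder_name="resnet34", decoder_attention_type="scse", classes=1)
    mask = model(torch.randn(1, 3, 256, 256))  # logits, shape (1, 1, 256, 256)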

ResUnet++¶

class segmentation_models_pytorch.ResUnetPlusPlus(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_use_batchnorm=True, decoder_channels=(256, 128, 64, 32, 16), decoder_attention_type=None, in_channels=3, classes=1, activation=None, aux_params=None)[source]¶

ResUnet++ is a fully convolutional neural network for image semantic segmentation. It consists of encoder and decoder parts connected by skip connections. The encoder extracts features of different spatial resolutions (skip connections), which are used by the decoder to produce an accurate segmentation mask.

Applies attention to the skip connection feature maps, based on themselves and on the decoder feature maps. The skip connection feature maps are then fused with the decoder feature maps through concatenation. Uses an Atrous Spatial Pyramid Pooling (ASPP) bridge module and residual connections inside each decoder block.

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features

  • encoder_depth – Number of stages of the encoder, in range [3, 5]. Each stage generates features two times smaller, in spatial dimensions, than the previous one (e.g., for depth=0 features will have shapes [(N, C, H, W)], for depth 1 features will have shapes [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_channels – List of integers which specify in_channels parameter for convolutions used in the decoder. Length of the list should be the same as encoder_depth

  • decoder_use_batchnorm – If True, a BatchNorm2d layer between Conv2D and Activation layers is used. If “inplace”, InplaceABN will be used, which allows decreasing memory consumption. Available options are True, False, “inplace”

  • decoder_attention_type – Attention module used in decoder of the model (in addition to the built-in attention used to process skip connection feature maps). Available options are None, se and scse. SE paper - https://arxiv.org/abs/1709.01507 SCSE paper - https://arxiv.org/abs/1808.08127

  • in_channels – The number of input channels of the model, default is 3 (RGB images)

  • classes – The number of classes of the output mask. Can be thought of as the number of channels of the mask

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

ResUnetPlusPlus

Return type

torch.nn.Module

Reference:

https://arxiv.org/abs/1911.07067
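
Example (hedged sketch) — assumes the installed build provides ResUnetPlusPlus as documented above; setting activation="sigmoid" makes the model return probabilities instead of logits:

    import torch
    import segmentation_models_pytorch as smp

    model = smp.ResUnetPlusPlus(encoder_name="resnet34", classes=1, activation="sigmoid")
    prob_mask = model(torch.randn(1, 3, 256, 256))  # probabilities in [0, 1], shape (1, 1, 256, 256)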

MAnet¶

class segmentation_models_pytorch.MAnet(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_use_batchnorm=True, decoder_channels=(256, 128, 64, 32, 16), decoder_pab_channels=64, in_channels=3, classes=1, activation=None, aux_params=None)[source]¶

MAnet: Multi-scale Attention Net. The MA-Net can capture rich contextual dependencies based on the attention mechanism, using two blocks:

  • Position-wise Attention Block (PAB), which captures the spatial dependencies between pixels in a global view

  • Multi-scale Fusion Attention Block (MFAB), which captures the channel dependencies between any feature maps by multi-scale semantic feature fusion

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 we will have features with shapes [(N, C, H, W)], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_channels – List of integers which specify in_channels parameter for convolutions used in decoder. Length of the list should be the same as encoder_depth

  • decoder_use_batchnorm – If True, a BatchNorm2d layer between Conv2D and Activation layers is used. If “inplace”, InplaceABN will be used, which allows decreasing memory consumption. Available options are True, False, “inplace”

  • decoder_pab_channels – A number of channels for PAB module in decoder. Default is 64.

  • in_channels – A number of input channels for the model, default is 3 (RGB images)

  • classes – A number of classes for output mask (or you can think as a number of channels of output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

MAnet

Return type

torch.nn.Module

Reference:

https://ieeexplore.ieee.org/abstract/document/9201310
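
Example (illustrative sketch, not from the original docstring) — decoder_pab_channels is shown at its default of 64; encoder and input size are placeholders:

    import torch
    import segmentation_models_pytorch as smp

    model = smp.MAnet(encoder_name="resnet34", decoder_pab_channels=64, classes=2)
    mask = model(torch.randn(1, 3, 256, 256))  # logits, shape (1, 2, 256, 256)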

Linknet¶

class segmentation_models_pytorch.Linknet(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_use_batchnorm=True, in_channels=3, classes=1, activation=None, aux_params=None)[source]¶

Linknet is a fully convolutional neural network for image semantic segmentation. It consists of encoder and decoder parts connected by skip connections. The encoder extracts features of different spatial resolutions (skip connections), which are used by the decoder to produce an accurate segmentation mask. Decoder blocks are fused with skip connections via summation.

Note

This implementation has 4 skip connections by default (the original has 3).

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 we will have features with shapes [(N, C, H, W)], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_use_batchnorm – If True, a BatchNorm2d layer between Conv2D and Activation layers is used. If “inplace”, InplaceABN will be used, which allows decreasing memory consumption. Available options are True, False, “inplace”

  • in_channels – A number of input channels for the model, default is 3 (RGB images)

  • classes – A number of classes for output mask (or you can think as a number of channels of output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

Linknet

Return type

torch.nn.Module

Reference:

https://arxiv.org/abs/1707.03718
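
Example (illustrative sketch, not from the original docstring) — parameter values and input size are placeholders; spatial dimensions should be divisible by 32 for the default encoder_depth=5:

    import torch
    import segmentation_models_pytorch as smp

    model = smp.Linknet(encoder_name="resnet34", encoder_depth=5, classes=2)
    mask = model(torch.randn(1, 3, 256, 256))  # logits, shape (1, 2, 256, 256)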

FPN¶

class segmentation_models_pytorch.FPN(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_pyramid_channels=256, decoder_segmentation_channels=128, decoder_merge_policy='add', decoder_dropout=0.2, in_channels=3, classes=1, activation=None, upsampling=4, aux_params=None)[source]¶

FPN is a fully convolutional neural network for image semantic segmentation.

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 we will have features with shapes [(N, C, H, W)], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_pyramid_channels – A number of convolution filters in Feature Pyramid of FPN

  • decoder_segmentation_channels – A number of convolution filters in segmentation blocks of FPN

  • decoder_merge_policy – Determines how to merge pyramid features inside FPN. Available options are add and cat

  • decoder_dropout – Spatial dropout rate in range (0, 1) for feature pyramid in FPN

  • in_channels – A number of input channels for the model, default is 3 (RGB images)

  • classes – A number of classes for output mask (or you can think as a number of channels of output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • upsampling – Final upsampling factor. Default is 4 to preserve input-output spatial shape identity

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

FPN

Return type

torch.nn.Module

Reference:

http://presentations.cocodataset.org/COCO17-Stuff-FAIR.pdf
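
Example (illustrative sketch, not from the original docstring) — shows the “cat” merge policy; with the default upsampling=4 the output matches the input resolution:

    import torch
    import segmentation_models_pytorch as smp

    model = smp.FPN(encoder_name="resnet34", decoder_merge_policy="cat", classes=2)
    mask = model(torch.randn(1, 3, 256, 256))  # logits, shape (1, 2, 256, 256)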

PSPNet¶

class segmentation_models_pytorch.PSPNet(encoder_name='resnet34', encoder_weights='imagenet', encoder_depth=3, psp_out_channels=512, psp_use_batchnorm=True, psp_dropout=0.2, in_channels=3, classes=1, activation=None, upsampling=8, aux_params=None)[source]¶

PSPNet is a fully convolutional neural network for image semantic segmentation. It consists of an encoder and a Spatial Pyramid (decoder). The Spatial Pyramid is built on top of the encoder and does not use “fine” features (features of high spatial resolution). PSPNet can be used for multiclass segmentation of high-resolution images, but it is not well suited for detecting small objects or producing accurate, pixel-level masks.

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 we will have features with shapes [(N, C, H, W)], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 3 (see the signature above)

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • psp_out_channels – A number of filters in Spatial Pyramid

  • psp_use_batchnorm – If True, a BatchNorm2d layer between Conv2D and Activation layers is used. If “inplace”, InplaceABN will be used, which allows decreasing memory consumption. Available options are True, False, “inplace”

  • psp_dropout – Spatial dropout rate in [0, 1) used in Spatial Pyramid

  • in_channels – A number of input channels for the model, default is 3 (RGB images)

  • classes – A number of classes for output mask (or you can think as a number of channels of output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • upsampling – Final upsampling factor. Default is 8 to preserve input-output spatial shape identity

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

PSPNet

Return type

torch.nn.Module

Reference:

https://arxiv.org/abs/1612.01105
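
Example (illustrative sketch, not from the original docstring) — with the default encoder_depth=3 and upsampling=8 the output matches the input resolution; parameter values are placeholders:

    import torch
    import segmentation_models_pytorch as smp

    model = smp.PSPNet(encoder_name="resnet34", psp_out_channels=512, classes=2)
    mask = model(torch.randn(1, 3, 256, 256))  # logits, shape (1, 2, 256, 256)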

PAN¶

class segmentation_models_pytorch.PAN(encoder_name='resnet34', encoder_weights='imagenet', encoder_dilation=True, decoder_channels=32, in_channels=3, classes=1, activation=None, upsampling=4, aux_params=None)[source]¶

Implementation of PAN (Pyramid Attention Network).

Note

Currently works with input tensors of shape >= [B x C x 128 x 128] for pytorch <= 1.1.0 and >= [B x C x 256 x 256] for pytorch == 1.3.1

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • encoder_dilation – Flag to use dilation in the last encoder layer. Doesn’t work with *ception*, vgg*, densenet* backbones. Default is True

  • decoder_channels – A number of convolution layer filters in decoder blocks

  • in_channels – A number of input channels for the model, default is 3 (RGB images)

  • classes – A number of classes for output mask (or you can think as a number of channels of output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • upsampling – Final upsampling factor. Default is 4 to preserve input-output spatial shape identity

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

PAN

Return type

torch.nn.Module

Reference:

https://arxiv.org/abs/1805.10180
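
Example (illustrative sketch, not from the original docstring) — the 256×256 input satisfies the size note above; encoder and class count are placeholders:

    import torch
    import segmentation_models_pytorch as smp

    model = smp.PAN(encoder_name="resnet34", classes=2)
    mask = model(torch.randn(1, 3, 256, 256))  # logits, shape (1, 2, 256, 256)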

DeepLabV3¶

class segmentation_models_pytorch.DeepLabV3(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', decoder_channels=256, in_channels=3, classes=1, activation=None, upsampling=8, aux_params=None)[source]¶

DeepLabV3 implementation from “Rethinking Atrous Convolution for Semantic Image Segmentation”

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 we will have features with shapes [(N, C, H, W)], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • decoder_channels – A number of convolution filters in ASPP module. Default is 256

  • in_channels – A number of input channels for the model, default is 3 (RGB images)

  • classes – A number of classes for output mask (or you can think as a number of channels of output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • upsampling – Final upsampling factor. Default is 8 to preserve input-output spatial shape identity

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

DeepLabV3

Return type

torch.nn.Module

Reference:

https://arxiv.org/abs/1706.05587
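
Example (illustrative sketch, not from the original docstring) — with the default upsampling=8 the output matches the input resolution; parameter values are placeholders:

    import torch
    import segmentation_models_pytorch as smp

    model = smp.DeepLabV3(encoder_name="resnet34", classes=2)
    mask = model(torch.randn(1, 3, 256, 256))  # logits, shape (1, 2, 256, 256)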

DeepLabV3+¶

class segmentation_models_pytorch.DeepLabV3Plus(encoder_name='resnet34', encoder_depth=5, encoder_weights='imagenet', encoder_output_stride=16, decoder_channels=256, decoder_atrous_rates=(12, 24, 36), in_channels=3, classes=1, activation=None, upsampling=4, aux_params=None)[source]¶

DeepLabV3+ implementation from “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation”

Parameters
  • encoder_name – Name of the classification model that will be used as an encoder (a.k.a backbone) to extract features of different spatial resolution

  • encoder_depth – Number of stages used in the encoder, in range [3, 5]. Each stage generates features two times smaller in spatial dimensions than the previous one (e.g. for depth 0 we will have features with shapes [(N, C, H, W)], for depth 1 - [(N, C, H, W), (N, C, H // 2, W // 2)], and so on). Default is 5

  • encoder_weights – One of None (random initialization), “imagenet” (pre-training on ImageNet) and other pretrained weights (see table with available weights for each encoder_name)

  • encoder_output_stride – Downsampling factor for last encoder features (see original paper for explanation)

  • decoder_atrous_rates – Dilation rates for ASPP module (should be a tuple of 3 integer values)

  • decoder_channels – A number of convolution filters in ASPP module. Default is 256

  • in_channels – A number of input channels for the model, default is 3 (RGB images)

  • classes – A number of classes for output mask (or you can think as a number of channels of output mask)

  • activation – An activation function to apply after the final convolution layer. Available options are “sigmoid”, “softmax”, “logsoftmax”, “tanh”, “identity”, callable and None. Default is None

  • upsampling – Final upsampling factor. Default is 4 to preserve input-output spatial shape identity

  • aux_params –

    Dictionary with parameters of the auxiliary output (classification head). The auxiliary output is built on top of the encoder if aux_params is not None (the default is None). Supported params:

    • classes (int): A number of classes

    • pooling (str): One of “max”, “avg”. Default is “avg”

    • dropout (float): Dropout factor in [0, 1)

    • activation (str): An activation function to apply “sigmoid”/”softmax” (could be None to return logits)

Returns

DeepLabV3Plus

Return type

torch.nn.Module

Reference:

https://arxiv.org/abs/1802.02611v3
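
Example (illustrative sketch, not from the original docstring) — encoder_output_stride is shown at its default of 16; encoder, class count, and input size are placeholders:

    import torch
    import segmentation_models_pytorch as smp

    model = smp.DeepLabV3Plus(encoder_name="resnet34", encoder_output_stride=16, classes=2)
    mask = model(torch.randn(1, 3, 256, 256))  # logits, shape (1, 2, 256, 256)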