I attended a talk by Prof. Weinberger at GCPR 2017. He compared traditional CNNs to playing a game of Chinese Whispers - every layer passes what it learned to the next one. If some information is wrong, though, it will propagate to the end without being corrected. 🤷‍♂️
Details ⚙️

Creating "shortcuts" between layers has been done before DenseNet - important for training very deep networks.

The interesting point here is that the connectivity is dense and that the feature maps from all layers are concatenated rather than summed, as in ResNet.
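To make the concatenation concrete, here is a minimal sketch of a single layer inside a dense block, assuming PyTorch (the class name DenseLayer and the exact BN-ReLU-Conv ordering are illustrative, not the paper's reference implementation):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    # One layer of a dense block: it sees the concatenation of all
    # previous feature maps and appends its own output to them.
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv(torch.relu(self.bn(x)))
        # DenseNet concatenates along the channel dimension;
        # ResNet would instead compute x + out.
        return torch.cat([x, out], dim=1)
```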
Growth rate 📈

An important hyperparameter is the growth rate - it specifies the number of feature maps added after each layer. In contrast to other architectures, DenseNet can use very narrow layers, typically containing only 12 feature maps.
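Reusing the DenseLayer sketch from above, a dense block with growth rate k = 12 grows its channel count by 12 per layer (the block depth and input width here are just example numbers):

```python
growth_rate = 12   # k: feature maps added per layer
channels_in = 24   # channels entering the block (example value)

block = nn.Sequential(*[
    DenseLayer(channels_in + i * growth_rate, growth_rate)
    for i in range(6)
])

x = torch.randn(1, channels_in, 32, 32)
print(block(x).shape)  # torch.Size([1, 96, 32, 32]) -> 24 + 6 * 12 channels
```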
Generalization ✅

A nice side effect of having a smaller network is that it is less prone to overfitting. Especially when no data augmentation is performed, DenseNet achieves significantly better results than other methods.
Results 🏆

DenseNet is tested on several benchmark datasets (including ImageNet) and either beats all other methods or achieves results comparable to the state of the art, while using far fewer computational resources.
Conclusion 🏁

DenseNet is an architecture where each layer is able to reuse the information from all previous layers. This results in a network that is smaller, faster to train, and generalizes better.

It is implemented in all major DL frameworks: PyTorch, Keras, TensorFlow, etc.
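For example, a pretrained DenseNet-121 can be loaded in a couple of lines with torchvision (the weights argument shown here assumes a recent torchvision version; older releases use pretrained=True instead):

```python
import torch
from torchvision import models

# Load DenseNet-121 with pretrained ImageNet weights
model = models.densenet121(weights="DEFAULT")
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```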
Further reading 📖

- Original article: https://arxiv.org/abs/1608.06993
- Extended version in TPAMI: http://www.gaohuang.net/papers/DenseNet_Journal.pdf
- Code: https://github.com/liuzhuang13/DenseNet