Short description (abstract):
Deep learning models have demonstrated remarkable performance in glaucoma diagnosis and
prognosis; however, deploying them in resource-constrained environments poses significant challenges.
This research explores the balance between compression and accuracy preservation in specialist
convolutional neural networks (CNNs) intended for CPU-based execution with minimal storage
requirements. By employing pruning, knowledge distillation, quantization, and weight sharing, the study
aims to achieve maximal compression without compromising essential task performance. The resulting findings
provide insights into the efficiency limits of model compression and its implications for real-world
deployment. Additionally, the work examines the applicability of these compression techniques to
Transformer-based architectures, which pose unique challenges due to their reliance on
attention mechanisms.