dc.description.abstract | Obtaining a low-spatial-resolution hyperspectral (HS) image or a low-spectral-resolution multispectral (MS) image from a high-resolution (HR) spectral image is straightforward when the acquisition models are known. However, the reverse process, recovering the HR image from the HS and MS images, is an ill-posed problem known as spectral image fusion. Although recent fusion techniques based on supervised deep learning have shown promising results, these methods require large training datasets, involving expensive acquisition costs and long training times. In contrast, unsupervised HS and MS image fusion methods have emerged as an alternative to the data demand issue; however, they rely on knowledge of the linear degradation models for optimal performance. To overcome these challenges, we propose the Middle-Output Deep Image Prior (MODIP) for unsupervised blind HS and MS image fusion. MODIP is adjusted to the HS and MS images, and the HR fused image is estimated at an intermediate layer within the network. The architecture comprises two convolutional networks that reconstruct the HR spectral image from the HS and MS inputs, along with two networks that appropriately downscale the estimated HR image to match the available MS and HS images, thereby learning the non-linear degradation models. The network parameters of MODIP are jointly and iteratively adjusted by minimizing a proposed loss function. This approach can handle scenarios where the degradation operators are unknown or only partially estimated. To evaluate the performance of MODIP, we test the fusion approach on three simulated spectral image datasets (Pavia University, Salinas Valley, and CAVE) and on a real dataset acquired with a testbed implementation in an optical lab. Extensive simulations demonstrate that MODIP outperforms other unsupervised model-based image fusion methods by up to 6 dB in PSNR. | es |