Cross-modal audio retrieval with famous paintings

We human beings can imagine sounds just by glancing at a photo: a beach scene may bring the sound of crashing waves to mind, and a picture of a busy crossing may evoke car horns and street advertising. My fascination with this power of imagination led to a series of works on cross-modal audio retrieval, namely Imaginary Soundscape and Imaginary Soundwalk.

Then I had a very simple idea: what if I applied the same technique to famous paintings? The results were surprisingly interesting and accurate. Even though the model was trained on a Flickr dataset, it apparently managed to capture the content of the paintings.
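
For readers curious what "applying the same technique" might look like in code, here is a minimal sketch of the retrieval step only. It assumes that an image encoder and an audio encoder have already mapped a painting and a library of sound clips into a shared embedding space, and it ranks the clips by cosine similarity. This post does not spell out the actual model or pipeline, so the function names, the toy three-dimensional vectors, and the choice of cosine similarity are all illustrative assumptions, not the real implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve_sounds(image_embedding, audio_embeddings, top_k=3):
    """Rank audio clips by cosine similarity to one image embedding.

    Hypothetical helper: both inputs are assumed to live in the same
    embedding space, produced by encoders that are not shown here.
    Returns a list of (clip_index, similarity) pairs, best match first.
    """
    query = l2_normalize(np.asarray(image_embedding, dtype=np.float64))
    library = l2_normalize(np.asarray(audio_embeddings, dtype=np.float64))
    scores = library @ query              # cosine similarity per clip
    order = np.argsort(-scores)[:top_k]   # indices of the highest scores
    return [(int(i), float(scores[i])) for i in order]


# Toy stand-ins for real embeddings (made up for illustration).
audio_embeddings = np.array([
    [1.0, 0.0, 0.0],   # clip 0
    [0.0, 1.0, 0.0],   # clip 1
    [0.7, 0.7, 0.0],   # clip 2
])
painting_embedding = np.array([0.9, 0.1, 0.0])

results = retrieve_sounds(painting_embedding, audio_embeddings, top_k=2)
for clip_index, similarity in results:
    print(f"clip {clip_index}: similarity {similarity:.3f}")
```

In this sketch a painting is treated exactly like a photo: it is encoded once, and the closest sound clips are looked up. Whether that works for paintings depends entirely on how well the image encoder generalizes beyond the photos it was trained on.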