
Smart software translates video pixels into words

The science of summarizing movies

Wouldn't it be nice if we could find specific scenes in YouTube videos using simple search words? Amir Habibian (30) and Masoud Mazloom (38) of the University of Amsterdam received their doctorates in a double PhD ceremony. Both work in video search: the science of automatically summarizing videos.
By Marc Laan


The two newly graduated PhDs developed algorithms that let a computer automatically analyse the content of a video, making individual scenes searchable. Amir Habibian explains: ‘Until three years ago, even the smartest computer algorithm couldn't tell a video of an iPhone from an image of a cup of tea. Even people often find image recognition difficult. Now there are fast graphics processors that recognize and describe video at fifty frames per second almost flawlessly. Our software can translate video pixels into words.’

Video recognition search
Video recognition search is the discipline in which both PhDs work. Each follows a different track to create video summaries. Habibian teaches the computer to recognize the events in a video using 'tags': the explanatory texts that the makers of a video attach to their footage.


Mazloom follows a different path. He doesn't use textual tags; instead, his computer learns to interpret the video itself. He trains his software to identify images that are typical of certain events: for example, a skateboarder pulling a trick, or a baseball player hitting a ball. Or a wedding: scenes of cake cutting, kissing and dancing indicate a wedding. Mazloom trains his algorithm by feeding it positive and negative examples: this is a wedding, this is not a wedding; true or false. Mazloom: ‘My algorithm can create a timeline of the entire film.’
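Learning from "this is a wedding / this is not a wedding" labels is binary classification. The sketch below uses a plain perceptron, a deliberately simple stand-in and not Mazloom's actual method, which the article does not specify. Each video is assumed to be summarized by hypothetical concept scores (cake cutting, kissing, dancing, skateboard); all numbers are made up for illustration.

```python
from typing import List, Tuple

# (features, label): label +1 means "wedding", -1 means "not a wedding".
Example = Tuple[List[float], int]

def train_perceptron(data: List[Example], epochs: int = 20) -> Tuple[List[float], float]:
    """Classic perceptron: nudge the weights toward every misclassified example."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side of (or on) the decision boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(w: List[float], b: float, x: List[float]) -> int:
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

# Hypothetical features: [cake_cutting, kissing, dancing, skateboard]
training = [
    ([0.9, 0.8, 0.7, 0.0], +1),  # positive example: wedding
    ([0.8, 0.6, 0.9, 0.1], +1),  # positive example: wedding
    ([0.1, 0.0, 0.2, 0.9], -1),  # negative example: skateboarding
    ([0.0, 0.1, 0.1, 0.8], -1),  # negative example: skateboarding
]
w, b = train_perceptron(training)
print(predict(w, b, [0.7, 0.9, 0.6, 0.0]))   # 1  (looks like a wedding)
print(predict(w, b, [0.0, 0.0, 0.1, 0.95]))  # -1 (does not)
```

Real systems use far richer visual features and more robust classifiers; the point here is only that labeled positive and negative examples are enough to learn a decision rule.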


Whereas Habibian's algorithm produces descriptive words for the input images, Mazloom's method works the other way round: a given search word retrieves the corresponding images. The two UvA researchers were funded by Commit, the Dutch public-private ICT program.
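The two directions can be pictured with one hypothetical table of detector scores (how strongly each word applies to each video). Reading a row gives words for a video; reading a column gives videos for a word. This is a simplified illustration, not either researcher's actual pipeline, and the clip names and scores are invented.

```python
# Hypothetical detector scores: scores[video][word] in [0, 1].
scores = {
    "clip_a": {"dog": 0.92, "wedding": 0.05, "skateboard": 0.10},
    "clip_b": {"dog": 0.15, "wedding": 0.88, "skateboard": 0.02},
    "clip_c": {"dog": 0.40, "wedding": 0.10, "skateboard": 0.85},
}

def describe(video: str, k: int = 2) -> list[str]:
    """Video -> words: the k highest-scoring words for one video."""
    ranked = sorted(scores[video].items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:k]]

def search(word: str, k: int = 2) -> list[str]:
    """Word -> videos: the k videos that score highest for one word."""
    ranked = sorted(scores, key=lambda v: scores[v].get(word, 0.0), reverse=True)
    return ranked[:k]

print(describe("clip_c"))  # ['skateboard', 'dog']
print(search("dog"))       # ['clip_a', 'clip_c']
```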

Habibian explains his tag method: ‘My algorithm describes the story told in a video. I assume that the texts accompanying a video more or less describe what there is to see. You can use all kinds of tags, such as the title of the video, the subtitles, or a description of the images, for instance 'war in Syria'. I feed the computer thousands of examples. In this way I teach my search algorithm, for instance, to associate the word ‘dog’ with a picture of a dog. In the end the software can scan the images and recognize a dog. If you feed millions of tagged videos into the computer, the recognition rate increases. My algorithm now knows ten thousand words. Training the software for a new domain takes about twelve hours. It is a form of machine learning.’
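One simple way to turn tagged videos into word-to-image associations is a nearest-centroid model: average the visual features of every video carrying a given tag, then label a new, untagged video with the word whose average it most resembles. This is an assumed, minimal technique chosen for illustration, not Habibian's actual algorithm; the feature vectors and tags below are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical training data: (visual feature vector, tags taken from
# the title, description or subtitles of the video).
tagged_videos = [
    ([0.9, 0.1, 0.0], ["dog", "park"]),
    ([0.8, 0.2, 0.1], ["dog"]),
    ([0.1, 0.9, 0.2], ["cup", "tea"]),
    ([0.0, 0.8, 0.3], ["tea"]),
    ([0.5, 0.1, 0.9], ["park"]),
]

def learn_word_centroids(data):
    """Average the feature vectors of all videos that carry each tag word."""
    sums = {}
    counts = defaultdict(int)
    for features, tags in data:
        for tag in set(tags):
            if tag not in sums:
                sums[tag] = [0.0] * len(features)
            sums[tag] = [s + f for s, f in zip(sums[tag], features)]
            counts[tag] += 1
    return {tag: [s / counts[tag] for s in vec] for tag, vec in sums.items()}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def suggest_tags(features, centroids, k=1):
    """Rank the learned words by similarity to an untagged video's features."""
    ranked = sorted(centroids, key=lambda w: cosine(features, centroids[w]), reverse=True)
    return ranked[:k]

centroids = learn_word_centroids(tagged_videos)
print(suggest_tags([0.85, 0.15, 0.05], centroids))  # ['dog']
```

More tagged examples make each word's average more representative, which mirrors, in a very simplified form, the article's point that feeding in more tagged video raises the recognition rate.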

Airport security
Habibian's video search algorithm can even analyse live footage. ‘At airports, hundreds of cameras deliver video streams 24 hours a day. With video search, you can automatically analyse what can be seen there. With street cameras, the software can recognize whether a fight is taking place.’ ‘But you can also filter out YouTube videos that show sensitive material. That is not an easy task for people: YouTube receives hundreds of hours of video every hour. My algorithm can summarize a one-hour video, 108,000 frames, in 2 minutes. The accuracy is now up to 70 percent, and it is slowly improving.’ At the Dutch broadcasting museum Sound and Vision in Bussum, Habibian used his search algorithm to add keywords to old news broadcasts, making them easier for the public to find.
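The quoted figures can be checked with simple arithmetic. Assuming the 2 minutes is the total processing time, 108,000 frames per hour implies a 30 frames-per-second video, and summarizing it in 120 seconds means handling roughly 900 frames per second, or 30 times faster than real time:

```latex
\[
\frac{108\,000\ \text{frames}}{3600\ \text{s}} = 30\ \text{frames/s},
\qquad
\frac{108\,000\ \text{frames}}{120\ \text{s}} = 900\ \text{frames/s},
\qquad
\frac{3600\ \text{s}}{120\ \text{s}} = 30\times\ \text{real time}.
\]
```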

Tourists in Amsterdam
Masoud Mazloom applies his algorithm to brand marketing. With the tourist office 'I Amsterdam' he is testing a system that delivers real-time recommendations to tourists visiting Amsterdam. ‘The city centre is too crowded with visitors. We direct them to other places based on their personal characteristics, for example to Haarlem or the Keukenhof. Through their social media activity and their Amsterdam Card, we know their backgrounds and preferences. If it is too busy at the Van Gogh Museum, we can advise them in real time to go to the Rijksmuseum, if it is quieter there.’
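The redirection rule Mazloom describes (too busy here, so suggest a quieter place that matches the visitor's interests) can be sketched as below. Everything in it is hypothetical: the `Venue` type, the category labels, the crowding values and the 0.8 "too busy" threshold are illustrative assumptions, not details of the I Amsterdam system.

```python
from dataclasses import dataclass

@dataclass
class Venue:
    name: str
    category: str    # hypothetical interest label, e.g. "art" or "nature"
    crowding: float  # hypothetical live occupancy as a fraction of capacity

BUSY_THRESHOLD = 0.8  # assumed cut-off for "too busy"

def recommend(wanted: Venue, alternatives: list[Venue], interests: set[str]) -> Venue:
    """Keep the visitor's choice unless it is too busy; otherwise suggest the
    quietest alternative that matches their interests and is quieter."""
    if wanted.crowding < BUSY_THRESHOLD:
        return wanted
    candidates = [v for v in alternatives
                  if v.category in interests and v.crowding < wanted.crowding]
    # Fall back to the original choice if no suitable alternative exists.
    return min(candidates, key=lambda v: v.crowding, default=wanted)

van_gogh = Venue("Van Gogh Museum", "art", 0.95)
options = [Venue("Rijksmuseum", "art", 0.55), Venue("Keukenhof", "nature", 0.30)]
print(recommend(van_gogh, options, {"art"}).name)  # Rijksmuseum
```

Filtering on interests before picking the quietest option reflects the article's point that recommendations depend on personal preferences, not only on crowd levels.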

SEALINCMedia (Socially-enriched access to linked cultural media)
This is also a COMMIT/ project