Grammar-like algorithm identifies actions in video

Photo courtesy of

Body language is a powerful thing, allowing us to gauge the tone and intention of a person, often without accompanying words. But is this a skill that is unique to humans, or are computers also capable of being intuitive?

To date, picking up on the subtext of a person’s movements is still not something machines can do. However, researchers at MIT and UC Irvine have developed an algorithm that can observe small actions in videos and string them together, piecing together an idea of what is occurring. Much like grammar helps create and connect ideas into complete thoughts, the algorithm not only analyzes what actions are taking place, but guesses what movements will come next.
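The researchers’ actual model is far richer than this, but the core idea, learning which small actions tend to follow which and then predicting the next one, can be sketched as a toy bigram model. The action labels below are invented for illustration:

```python
from collections import defaultdict, Counter

class ActionGrammar:
    """Toy bigram model over action labels: learns which action
    tends to follow which, then predicts the most likely next one."""
    def __init__(self):
        self.transitions = defaultdict(Counter)

    def train(self, sequences):
        # count each observed (current action -> next action) pair
        for seq in sequences:
            for current, nxt in zip(seq, seq[1:]):
                self.transitions[current][nxt] += 1

    def predict_next(self, action):
        followers = self.transitions.get(action)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

# hypothetical action sequences parsed from videos of making tea
model = ActionGrammar()
model.train([
    ["grab_kettle", "fill_kettle", "boil_water", "pour_water"],
    ["grab_kettle", "fill_kettle", "boil_water", "add_teabag"],
])
print(model.predict_next("fill_kettle"))  # -> boil_water
```

Given only the first half of a sequence, the same table lets the model guess what comes next, which is the “prediction” half of what the MIT/UC Irvine system does.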

There are a handful of ways this technology could benefit humans. For example, it could help an athlete practicing his or her form and technique. Researchers also posit that it could be useful in a future where humans and robots share the same workspace and perform similar tasks.

But with any technological advancement comes the question of cost: not money, but privacy. In this case, would the positives outweigh the negatives? In what ways can you envision this tool being helpful for your everyday tasks?



VISAPP Computer Vision conference extends submission deadline

Computer Vision is an interesting kind of technology in many ways, but perhaps one of the most notable things about it is how applicable it is, and can be, in our everyday lives. And although it’s not necessarily a “new” field, it is gaining popularity and recognition in the lives of “normal” people, meaning those who are not scientists, researchers, programmers, etc.

At the start of next year, Lisbon, Portugal will play host to a conference on this very topic, which highlights the work being done in the field and the emerging technologies that can help Computer Vision help people. Currently, VISAPP 2014, the 9th International Conference on Computer Vision Theory and Applications, is accepting paper submissions for the conference, with its submission deadline having been extended until September 18.

Computer Vision studies bird flocking behavior

Photo courtesy of Andreas Trepte.
Flocking is a behavior exhibited by birds, similar to how land animals join together in herds. And while there is an intricate pattern to this flocking, it’s difficult to establish exactly how birds communicate to keep this form. Their movements are synchronous, but the question is: how do birds on the outer edges of the flock stay in sync and help guide the group? Luckily, we have computer vision to help answer that question.

Previously, scientists simulated this behavior and then compared it to what occurs with birds in real life in an attempt to demonstrate the how and the why. Now, however, computer vision can measure both the position and velocity of objects in a frame. Work by William Bialek at Princeton University uses such measurements to demonstrate that birds match the speed and direction of their neighbors.

Additionally, the concept of a “critical point” helps explain this, showing that the social desires of the birds overwhelm the motivation of each individual bird, as they work toward flying as a collective flock rather than as solo birds.
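As a rough illustration of what such measurements support, here is the standard “alignment” rule from flocking simulations (a sketch, not the Princeton group’s model): each bird nudges its velocity toward the average of its nearby neighbors, so changes in direction ripple across the flock.

```python
# Minimal sketch of one boids-style rule: velocity alignment.
# Each bird moves its velocity a fraction `rate` toward the average
# velocity of neighbors within `radius`.
def align(positions, velocities, radius=5.0, rate=0.3):
    new_velocities = []
    for i, (px, py) in enumerate(positions):
        # neighbors within the interaction radius (excluding self)
        nbrs = [velocities[j] for j, (qx, qy) in enumerate(positions)
                if j != i and (px - qx) ** 2 + (py - qy) ** 2 <= radius ** 2]
        vx, vy = velocities[i]
        if nbrs:
            avg_x = sum(v[0] for v in nbrs) / len(nbrs)
            avg_y = sum(v[1] for v in nbrs) / len(nbrs)
            vx += rate * (avg_x - vx)
            vy += rate * (avg_y - vy)
        new_velocities.append((vx, vy))
    return new_velocities

# two nearby birds with different headings start to converge
flock_v = align([(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (0.0, 1.0)])
print(flock_v[0])  # bird 0 has turned toward its neighbor's heading
```

Iterating this rule over many time steps produces the synchronous, collective motion the post describes, with no bird ever “seeing” the whole flock.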

There still remains more to be seen and explored, but check out this study for further reading.

Computer Vision aids endangered species conservation efforts

Photo by Dr. Paddy Ryan/The National Heritage Collection

In an effort to help protect and conserve endangered species, scientists have been tracking and tagging them for years. However, there are some species that are either too large in population or too sensitive to tagging, and researchers have been working on another way to track them.

Now, thanks to SLOOP, a new computer vision software program from MIT, identifying animals has never been easier. A human sorting through 10,000 images would likely take years to properly identify the animals in them, but this program cuts down on the manpower required and works much more quickly. Using pattern-recognition algorithms, it matches the stripes and spots on an animal and returns 20 images that are likely matches, giving researchers a much smaller and more accurate pool to work with. The researchers then turn to crowdsourcing and, with the aid of adept pattern-matchers, narrow things down even further, reaching 97% accuracy. This allows researchers to spend more time in the field working on conservation efforts instead of sitting in front of a computer screen.
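The MIT team’s actual algorithms are more sophisticated, but the ranking step can be sketched simply: represent each photo’s spot-and-stripe pattern as a feature vector, then return the k database entries closest to the query. The gecko names and three-number “signatures” below are invented for illustration:

```python
# Sketch of top-k candidate matching (not SLOOP's real code):
# rank a database of pattern signatures by distance to the query
# and hand the best k candidates to human verifiers.
def top_matches(query, database, k=20):
    def distance(a, b):
        # Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    ranked = sorted(database.items(), key=lambda item: distance(query, item[1]))
    return [name for name, _ in ranked[:k]]

# hypothetical pattern signatures for four previously photographed geckos
db = {"gecko_A": [0.90, 0.10, 0.40],
      "gecko_B": [0.20, 0.80, 0.50],
      "gecko_C": [0.88, 0.12, 0.41],
      "gecko_D": [0.50, 0.50, 0.50]}
print(top_matches([0.90, 0.10, 0.40], db, k=2))  # -> ['gecko_A', 'gecko_C']
```

Shrinking 10,000 candidates to 20 is what makes the crowdsourced verification stage practical: humans only confirm a short list instead of searching the whole archive.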

Automated baked-goods identification can benefit businesses

Researchers at the University of Hyogo, alongside Brain Corporation, have created a computer-vision system that can identify individual baked goods in a second.

The system had its first test run at a bakery in Tokyo, where employers are already seeing the benefits: new employees who haven’t yet learned the ropes, or part-timers who don’t know the name of every kind of baked good, can still work the cash registers. Additionally, when there are long lines, it can speed up the check-out process, making the entire operation run more smoothly and efficiently.

While the system works relatively well, there are still some kinks to work out. For example, baked goods are easily distinguished by their shapes and toppings, but when it comes to sandwiches, the machine has a tougher time telling them apart.

Luckily, there are other companies out there with the technology to build even better versions of this same sample system. For example, the people at ImageGraphicsVideo can build a similar system which also has a learning capability. This means that whoever is using the system can input, or “teach,” new items to the computer. Not only that, but the user can point out when items are incorrectly identified, which the program then learns and uses in the future to avoid making the same mistakes.
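A minimal sketch of such a teachable recognizer (hypothetical, not ImageGraphicsVideo’s actual product): a nearest-centroid classifier in which “teaching” means adding labeled examples, and correcting a misidentification simply stores the item under its right label so the same mistake becomes less likely.

```python
# Teachable nearest-centroid classifier: new classes can be added at
# any time, and corrections are folded back in as extra examples.
class TeachableClassifier:
    def __init__(self):
        self.examples = {}  # label -> list of feature vectors

    def teach(self, label, features):
        self.examples.setdefault(label, []).append(features)

    def classify(self, features):
        def centroid(vectors):
            return [sum(col) / len(vectors) for col in zip(*vectors)]
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        # pick the label whose centroid is closest to the input
        return min(self.examples,
                   key=lambda lbl: dist(features, centroid(self.examples[lbl])))

    def correct(self, features, right_label):
        # the user flags a misidentification; keep it as a new example
        self.teach(right_label, features)

clf = TeachableClassifier()
clf.teach("croissant", [0.9, 0.2])  # made-up shape/topping features
clf.teach("baguette", [0.1, 0.9])
print(clf.classify([0.8, 0.3]))  # -> croissant
```

Real systems use far richer image features, but the workflow is the same: the cashier names a new pastry once, the system stores it, and future misfires are corrected on the spot.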

Guaranteed win at Rock, Paper, Scissors with computer vision

Rock, Paper, Scissors is a famous hand game known throughout the world and dating back to the Han Dynasty. Largely a game of luck, it can nonetheless be won by the most skilled players through strategies such as studying an opponent’s patterns and playing against their predictability.

However, there is now a machine that is virtually impossible to beat. Researchers in the Ishikawa Oku Laboratory at the University of Tokyo have created a robot that has a 100% success rate at winning matches of Rock, Paper, Scissors.

Using computer vision, the robot recognizes the hand movement of its human opponent and responds with the winning move. This “cheat” takes place within 1 millisecond, too little time for the human mind to see, process, and respond, ensuring the computer always wins.
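Once the opponent’s gesture has been recognized, the decisive step is trivial: map the hand shape to the move that beats it. The robot’s real feat is doing the recognition and actuation within a millisecond.

```python
# Each gesture maps to the single move that beats it.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def counter_move(recognized_gesture):
    """Return the winning response to the opponent's recognized gesture."""
    return BEATS[recognized_gesture]

print(counter_move("rock"))  # -> paper
```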

How might this kind of technology be applied in more practical realms?

Computer Vision aiding the fishing industry

When Japan was struck last year by a massive earthquake and a subsequent tsunami, everyone was affected, among them the fishermen on the coast of the Iwate prefecture. Yet instead of waiting for the government to aid them in rebuilding the fishing industry, some individuals took matters into their own hands.

Enter Kenichiro Yagi, who installed laptops and webcams on four boats, in an effort to record information about his fishing trips online. As a result, Yagi and his crews are better able to cater what they catch to meet the demand from consumers. And computer vision is helping to play a role in this.

The idea is that computers installed on trawling systems would identify fish based on their scales. Fish that don’t meet demand would be returned to the sea, avoiding an excess of product that customers don’t specifically want.
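Assuming such a classifier existed, the sorting step itself would be straightforward: label each fish in the catch and split the haul by the day’s demand list. The species names below are illustrative:

```python
# Split a classified catch into what matches consumer demand
# and what would be returned to the sea.
def sort_catch(catch_labels, demanded_species):
    keep, release = [], []
    for fish in catch_labels:
        (keep if fish in demanded_species else release).append(fish)
    return keep, release

keep, release = sort_catch(["mackerel", "sardine", "mackerel", "flounder"],
                           {"mackerel"})
print(keep)     # -> ['mackerel', 'mackerel']
print(release)  # -> ['sardine', 'flounder']
```

As the next paragraph notes, the hard parts lie elsewhere: the identification technology itself may not exist at this level, and much of what is released may not survive trawling.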

There are downsides to this technology, however. First and foremost, it’s not certain that this kind of computer vision technology exists in such advanced stages. Secondly, experts claim that much of what’s returned to the sea will likely already have been killed in the trawling process.

But it’s still interesting to think how this could revolutionize fishing, allowing consumers to be specific in their demands and fishermen to supply the precise product requested.

An inferiority complex for Computer Vision?

It’s undeniable that the rise of computer vision technology has aided our society in many ways, making the completion of complex and time-consuming tasks easier and faster. Yet in spite of the many advances made in the field, particularly over the past few years, the technology still isn’t able to rival the capabilities of humans, at least not yet.

According to an article entitled “Comparing machines and humans on a visual categorization test,” published this month in the Proceedings of the National Academy of Sciences (PNAS), computer vision software’s ability to recognize pre-defined objects, complete projects, and solve problems may be quick and efficient, but it still falls short of what humans are capable of.

In an experiment conducted with people and machines, the test subjects had to recognize and classify abstract images. Again and again, human test subjects proved that they could “learn”, and apply what they had learned, to decrease their rate of error in recognizing recurring patterns. After viewing fewer than 20 images, most human participants were able to pick up on the pattern. Meanwhile, computers that normally fare well when working within a limited data set required thousands of examples to produce correct answers, demonstrating that elaborate tasks relying on more abstract identification and reasoning are a weakness.

The study refers to this shortcoming of computer technology as a “semantic gap.” Of course, the pertinent question isn’t necessarily whether or not the reasoning abilities of computers will ever be able to parallel that of humans. Instead, perhaps we should be asking when they will be able to.