Thursday, 26 February 2015

Performance issues and graphs

Well, things are progressing, if a little slower than expected. I am currently working on making some graphs and on a mean shift for the HoG (Histogram of Oriented Gradients) detector.

I have been creating some files to generate statistics for the different scenarios. Each XML file generated holds the frame number, the HoG result and the moving average for that frame. This gives me a file I can go through to generate a graph, or use in the future for any other data analysis. I also have a Python file to generate the ground truth for images. It pops up each image, the user enters the number of people in the frame, and it creates an XML entry in the file given. At the moment it stores the key codes of the keys pressed, so these still need to be changed to the actual number of people in the frame.
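A minimal sketch of both pieces, using only the standard library. The element and attribute names (`statistics`, `frame`, `number`, `hog`, `moving_average`) are placeholders for illustration, not the names used in the real files, and the key code conversion assumes the stored codes are the ASCII values of the digit keys (48 for '0' up to 57 for '9').

```python
import xml.etree.ElementTree as ET


def key_code_to_count(key_code):
    """Convert the key code of a digit key into the digit it stands for.

    Assumption: the stored code is the ASCII value of the key pressed.
    """
    # Keep only the low byte, as some platforms set extra high bits.
    code = key_code & 0xFF
    if not 48 <= code <= 57:
        raise ValueError("key code %d is not a digit key" % key_code)
    return code - 48


def build_statistics(rows):
    """Build an XML tree from (frame_number, hog_count, moving_average) rows."""
    root = ET.Element("statistics")  # hypothetical root element name
    for frame_number, hog_count, moving_average in rows:
        ET.SubElement(
            root,
            "frame",
            number=str(frame_number),
            hog=str(hog_count),
            moving_average=str(moving_average),
        )
    return root


if __name__ == "__main__":
    tree = build_statistics([(1, 4, 4.0), (2, 6, 5.0)])
    print(ET.tostring(tree, encoding="unicode"))
```

Keeping one `frame` element per frame means the same file can feed a graph generator or any later analysis without a second format.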

Using the XML statistics files that I have generated for variations on the same scene, I am creating a graph against the ground truth to see which variation performs better. I started doing this in PyGal but am now looking at MatPlotLib to see if it is any better. The graph in PyGal is a little claustrophobic; see below for the current PyGal graph against the MatPlotLib one.

PyGal graph



MatPlotLib graph

Looking at the graphs, the MatPlotLib one looks a little neater, so I will go with that. More graphs to follow.
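Alongside the graphs, a single number per variation can make it easier to say which one performs better against the ground truth. The sketch below uses mean absolute error purely as an illustrative metric; it is not something the project already does, and the variation names and counts are made up.

```python
def mean_absolute_error(predicted, ground_truth):
    """Average size of the per-frame counting error."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(p - g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


def rank_variations(variations, ground_truth):
    """Return (name, error) pairs, smallest error first."""
    scores = [
        (name, mean_absolute_error(counts, ground_truth))
        for name, counts in variations.items()
    ]
    return sorted(scores, key=lambda score: score[1])


if __name__ == "__main__":
    truth = [5, 5, 6, 6]  # made-up ground truth counts
    variations = {
        "hog_raw": [4, 5, 6, 8],       # made-up detector output
        "hog_averaged": [5, 5, 5, 6],  # made-up moving average output
    }
    for name, error in rank_variations(variations, truth):
        print(name, error)
```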

The moving average of the HoG detector works well with a large frame sample set when the HoG detector itself is fairly stable. The average then does not deviate too much, but when a large error does get in, it stays incorrect for a lot longer. With the 2 frame sample set it adjusts more rapidly.
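The trade-off can be seen in a small sketch. The counts are made up, with a single bad detection of 20 in an otherwise steady scene of 5 people, and the window of 4 is only an assumed example of a larger sample set.

```python
from collections import deque


def moving_average(counts, window):
    """Average of the most recent `window` counts, one value per frame."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # old counts drop off automatically
    averages = []
    for count in counts:
        recent.append(count)
        averages.append(sum(recent) / len(recent))
    return averages


if __name__ == "__main__":
    counts = [5, 5, 5, 20, 5, 5, 5, 5]  # one spurious detection of 20
    print(moving_average(counts, 2))
    print(moving_average(counts, 4))
```

With a window of 2 the bad frame pulls the average up to 12.5 but it recovers after two frames; with a window of 4 the average only reaches 8.75 but stays wrong for four frames.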

The next task is to try variations of the HoG detector values on a set of images to determine the best settings to use. With the best HoG detector values set, I can then rerun the graph generators and see what the improvement is. After that I will implement a mean shift on the HoG detector to track people through the scene. This will give me some visuals on the movement of people within the scene.

If I manage to get that far before my next meeting I will look at tweaking the HoG detector to get a closer box around the people detected to increase the accuracy of the mean shift.



Monday, 23 February 2015

Video and XML fun

This is only going to be a short overview of what I was up to over the weekend, as I was a bit limited for time, as I will now explain.

So over the weekend I was working on getting a video up of PETS 2009 with HoG (Histogram of Oriented Gradients), the moving average of HoG and the ground truth. My first issue came when I noticed that the ground truth I was using was not accurate and was missing people that were in the scene. As I had trouble finding other ground truths for the S2L1 scene and other scenes, I created my own XML ground truth generator. This goes through the images one by one and the user enters the number of people in the scene. It then creates an XML document with the ground truth the user entered for the scene. This makes creating ground truth a lot easier for future scenes that I cannot find ground truths for. I can then check through this and compare it to the results of HoG and the moving average.

Once this was implemented I had to generate a video displaying this information. This originally took close to an hour for 794 images, which was too long. I then reduced the number of frames to 120, which made it a lot quicker. Then I had to turn the images into a video. Originally I tried turning them straight into a .mp4, but this took too long and crashed my laptop several times. To overcome this I converted them to a .gif and then turned the .gif into a .mp4. This was still time consuming but more stable. I did this using the following commands.


convert -delay 30 -loop 0 *.jpg result.gif - using this source

ffmpeg -f gif -i infile.gif outfile.mp4 - using this source

Once I got them done I joined them together using the following command.

MP4Box -cat s2f.mp4 -cat s2ff.mp4 -cat s6f.mp4 -cat s6ff.mp4 -new all.mp4
 - That I found here


Now that the video was done I uploaded it to YouTube with a brief description. The video can be found here.

I am now looking at implementing blob tracking on the videos and will then move on to dealing with occlusions, which will help me count people in a crowd more accurately. First I need to create some tests for the stuff I have.

Wednesday, 18 February 2015

Faces, Bodies and Rectangles

I have finally got Viola-Jones implemented in OpenCV in Python, even if the faces it detects on the PETS 2009 data sets are not very accurate, as I am sure grass does not have a face. I will work on fine tuning it and maybe even combining it with HoG (Histogram of Oriented Gradients) person detection.

That's another thing I have implemented, using some of OpenCV's example code but tweaked to be a little more modular. The HoG person detector can now pick up people in images with a reasonable degree of accuracy. This, Viola-Jones and any future implementations will all be checked against the ground truth for the set of images. This will give me an idea of how accurate the implementations are when values are changed or my own combinations/implementations are used.

HoG detection green boxes, Viola-Jones in blue boxes


I have also written tests using pytest for the files that I have made, such as img_handler.py, viola_jones.py and hogDetector.py. This allows me to test all the methods to make sure they do what they are meant to and will not break when passed certain data. This makes them good for regression tests or if I ever need to change something in them. There are some parts which I have not tested as they require user input. I am currently looking into simulating this to be able to test them.
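One way of simulating user input is to replace the input function for the duration of a test. The sketch below assumes the input arrives through Python's built-in `input()`; `ask_people_count` is a hypothetical stand-in, not a function from the project files. If the input actually comes from key presses in an OpenCV window, the same idea would apply but the thing to patch would be different.

```python
from unittest import mock


def ask_people_count(prompt="Number of people in frame: "):
    """Hypothetical function that keeps asking until it gets a whole number."""
    while True:
        answer = input(prompt)
        try:
            count = int(answer)
        except ValueError:
            print("Please enter a whole number.")
            continue
        if count >= 0:
            return count
        print("The count cannot be negative.")


def test_ask_people_count_retries_on_bad_input():
    # Each call to input() returns the next item from side_effect,
    # so the first two answers are rejected and the third is accepted.
    with mock.patch("builtins.input", side_effect=["abc", "-1", "7"]):
        assert ask_people_count() == 7
```

pytest will collect the `test_` function as usual. Its `monkeypatch` fixture can do the same job, for example `monkeypatch.setattr("builtins.input", lambda prompt: "7")`.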

Showing my tests completing successfully


The only ground truth file I have come across for PETS 2009 is one from this site for scenario S2.L1. This comes as a .xml file, so I will be looking at parsing it to get the relevant data for comparison. I am also looking at other data sets to test on. As well as this, I am looking into creating my own small test data set of around 6 people; a plan of what will be acted out is currently being drafted.
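Parsing a file like this could look roughly like the sketch below. The layout in `SAMPLE` (a `frame` element with a `number` attribute containing an `objectlist` of `object` elements) is an assumption for illustration and should be checked against the real file before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed layout, for illustration only.
SAMPLE = """<dataset>
  <frame number="0">
    <objectlist><object id="1"/><object id="2"/></objectlist>
  </frame>
  <frame number="1">
    <objectlist><object id="1"/></objectlist>
  </frame>
</dataset>"""


def people_per_frame(xml_text):
    """Map each frame number to the number of objects listed in it."""
    root = ET.fromstring(xml_text)
    counts = {}
    for frame in root.iter("frame"):
        objects = frame.findall("./objectlist/object")
        counts[int(frame.get("number"))] = len(objects)
    return counts


if __name__ == "__main__":
    print(people_per_frame(SAMPLE))
```

For a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring`.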

I have encountered a few problems over the past few days, one of which was OpenCV on Ubuntu 14.04... again. I think a recent install of some software damaged the OpenCV install that I had previously done. After following the steps from this website again it was up and running. As well as this, the HoG detector was not picking up all people, but after a few tweaks to its settings I am now picking up most of the samples, with a few false positives. The tweaks made it more accurate in certain aspects but also increased the time taken to process images. Viola-Jones needs some tweaking of its settings as it is detecting faces where there are definitely no faces, such as the grass.

I have also altered the file I was using to manually step through the images so that it runs automatically and draws the results from both Viola-Jones and the HoG detector. These are then saved and will be turned into a video, which will be linked on this site when up and running. It will have a voice over explaining what is going on and what each rectangle is.


My next task, if I manage to get all that done as well as a few improvements to files, is to work on optical flow tracking. This will require me to compare the positions of people in a previous image to the current image and map their movement over time. The movement will be drawn in a different colour that will hopefully get darker the faster they appear to be moving. When I have that working I will post a video on YouTube and some images on here.
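The speed and colour part of that idea can be sketched without any tracking code. This assumes the position of the same person is already known in both frames, which is the hard part and is not covered here; `max_speed`, the base colour and the 80% darkening limit are arbitrary values chosen for illustration.

```python
import math


def speed(previous, current):
    """Distance in pixels moved between two (x, y) positions, per frame."""
    return math.hypot(current[0] - previous[0], current[1] - previous[1])


def speed_to_colour(pixels_per_frame, max_speed=20.0, base=(0, 0, 255)):
    """Darken the base colour as the speed rises towards max_speed."""
    # Clamp to the range 0..1 so very fast movement cannot overshoot.
    fraction = min(max(pixels_per_frame / max_speed, 0.0), 1.0)
    # Keep at least 20% brightness so the colour never goes fully black.
    brightness = 1.0 - 0.8 * fraction
    return tuple(int(round(channel * brightness)) for channel in base)


if __name__ == "__main__":
    moved = speed((100, 100), (103, 104))
    print(moved, speed_to_colour(moved))
```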

So that's it for this week, got a lot to crack on with hopefully won't be too bad and hopefully OpenCV won't mess up again.

Wednesday, 11 February 2015

That'ssss some very nice Python code there...

Well, it has been a busy week of work on my dissertation, so without messing around let's dive straight into what I have been up to.

Firstly, I handed in my project specification on Friday and got feedback on the following Tuesday. The feedback was very helpful and cleared up some things, such as focusing me more on a target rather than trying to solve everything. As a result of this feedback my goals are even clearer now: to work on counting people in a crowd and determining if the crowd is calm or not. Other areas for improvement include my bibliography, which did not have all the relevant information on some of the references.

As well as this I have been carrying on my reading and spent a long time trying to get OpenCV installed for Python on Ubuntu 14.04. I got shown a nice way of installing OpenCV via pip which is a nice way of installing stuff for Python. It installed but when I tried to run any test code with windows it was throwing dependency issues. So I looked around and found this nice tutorial which helped resolve some of my issues but still had an issue with gtk. This answer managed to fix the issue and I was up and running. I have also been learning a lot about Python such as how to be modular and returning tuples, example code below.


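def tuple_return_function():
    # Returns two values packed into a single tuple
    return ("10", "20")

def function_name():
    # Unpack the returned tuple into two separate variables
    tuple1, tuple2 = tuple_return_function()

    print("Tuple1 is " + tuple1 + " Tuple2 is " + tuple2)
    # Outputs: Tuple1 is 10 Tuple2 is 20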


Once I was up and running I got to work trying different methods implemented in OpenCV using this site. As such I got some nice SIFT and SURF results and an interactive foreground extraction using the GrabCut algorithm. For SIFT and SURF I also made it loop so that it only found a specific number of points and wouldn't detect everything. These helped me understand the basic functionality such as loading in images, copying them, converting them to grey and other basic stuff. You can see some examples below.
SURF - Before and after


GrabCut - Before and after

SIFT - Before and after


I have also started working on some code that will be used in the final system. Currently I have a Python file that is used for reading in image files from a directory as well as a basic implementation of Viola-Jones. Once I am happy with my implementation I will then begin writing tests for both the reading in image files and the Viola-Jones. I will be starting on implementing histograms of oriented gradients for human detection next after the tests are implemented.


I am having some issues with the Viola-Jones implementation for detecting faces in a crowd and it isn't that the faces are obstructed. It seems to be that the faces are too far away to be picked up. I am having a look at this and seeing if I can tweak some settings.

Sunday, 1 February 2015

I predict a riot

So I have finally decided on what my dissertation is going to focus on and, drum roll please, it is crowd behaviour. Specifically, trying to work out when a crowd is likely to form, how many people are in a crowd, why they are in a crowd and the dispersal patterns of the crowd. As well as this I will have to take into account the ethics of analysing people in crowds.

The main goal of this is to be able to determine, at a minimum, the number/groups of people in crowds. From this I can try to ascertain the situation and evaluate possible outcomes. It would be nice to be able to accurately count the people in a crowd, however there are issues with trying to count people who are close together. If they are too close then many people may be counted as one person. As well as this, the faces may be obstructed, or may not all be visible or clear enough to use facial recognition. There has been research into counting people in crowds or mapping high density crowds, but I have not come across anything that is accurate and versatile between scenarios. The research linked for 'counting people in crowds' has a high success rate of more than 96%, but the data set is only three videos. The research linked for 'mapping high density crowds' tracks motion against a certain threshold, and anything not over the threshold is considered static. This raises the issue that people who are not moving fast enough, which can happen in overcrowded areas, would be considered background. On the other hand, things such as large animals that move at the threshold could be used in the generation of the map, making it inaccurate. Being able to determine when a crowd is about to form would help determine things such as when a riot or a big event is about to happen (discussed in more detail later).

Below is an image of where people counting could be useful to make sure there is no overcrowding. As well as this it could be used to determine the best flow of traffic or whether something abnormal is happening. *




Within crowds there are specific behaviours that can be looked out for to help identify individual people or to try and work out what is going on in the scene. When you are walking down the street and another person is heading towards you, the closer they get the more you move to the side to pass them, as explained here. When people are coming from different angles but heading in the same direction they tend to merge into a single flow of traffic. There is, however, no specific detailed definition of a crowd; it is generally defined as 'a large number of people gathered together in a disorganized or unruly way'. A few definitions do exist and share common specifics, such as 'conceptualising a crowd as a sizeable number of people gathered at a specific location for a measurable time period, with common goals and displaying common behaviours', which is an extract from this online pdf. This document also goes into more detail about what is expected from a crowd. There are, however, issues with determining when a group of people should be considered a crowd, such as the number of people and the time spent together/at a location. For these reasons a set definition of a crowd must be made to allow a system to appropriately determine if there is a crowd or if a crowd is likely to form. As well as defining a crowd, definitions of crowds in different situations from data sets will allow the system to work out more clearly what is going on.

The accuracy of this is likely to get progressively lower as the crowds get bigger, as it will be harder to count people and monitoring the flow will become processing heavy. As such, focusing on each part separately, such as the counting, will allow more targeted results with hopefully higher accuracy. This could still be useful for predicting violence in crowds, on which some work has been done here, or for counting people in areas to avoid overcrowding and injuries.

The image below shows how faces are not always visible on camera, which makes it more difficult to count people using facial recognition. *




Now comes the fun part, ethics and what is ok to use; after all, we will be looking at humans and their behaviour. The first thing we need to make sure of is that the people in the video are ok with being used for research purposes and that their privacy is protected. Some questions need to be asked, such as the ones raised in this paper's abstract, for example 'Under what conditions should video be presented and to which audiences'. Videos that are used should have the consent of the people in them and should only be used for the purposes they were made for. As well as this, there should be no attempt to identify persons within the videos unless that is the reason for the videos and you have express permission from all persons involved.


This is just a little overview of the parts of what needs to be discussed and will be discussed in more detail over the coming weeks.