All these low-level algorithms are used in the implementation of higher-level functionality in our research projects.
Matrix Programming is at the core of mathematical models, and it is vital to do it carefully and precisely. It is easy to write software that implements matrix operations exactly as they are spelled out in a matrix equation, but that is seldom the best way to do it. For example, matrices may be numerically "unstable" because they are close to "singular", or for other reasons, while still being valid algebraically. It would be dangerous to program without taking that into account. Singular and unstable matrices are common in data analysis because measured variables are often correlated with each other. So matrix operations are seldom programmed as they are written.
That's why numerical programmers use LU and Cholesky factorisations of matrices, for example. These decompose a complicated matrix into a product of two simpler (triangular) matrices, which are then safer to manipulate.
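As a rough sketch of the idea (in Python with NumPy and SciPy, our choice here purely for illustration), this shows the difference between inverting a matrix explicitly and solving through a Cholesky factorisation:

```python
# A minimal sketch of why factorisations matter: solve A x = b via a
# Cholesky factorisation instead of forming the matrix inverse explicitly.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
A = X.T @ X          # symmetric positive definite, as in least squares
b = rng.standard_normal(5)

# Dangerous: explicitly inverting A amplifies error when A is near-singular.
x_bad = np.linalg.inv(A) @ b

# Safer: factorise A = L L^T once, then solve two triangular systems.
c, low = cho_factor(A)
x_good = cho_solve((c, low), b)

print(np.allclose(x_bad, x_good))  # same answer here, but x_good is the stable route
```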
Another matrix factorisation is the Singular Value Decomposition (SVD), but this one deserves its own section because it has special properties for statistical learning. An n by m matrix has nm values, and that can be a lot of numbers. But the SVD can reduce that matrix down to a handful of numbers (singular values) that still contain the essence of those nm values. This makes the SVD a powerful tool for reducing piles of data into a form that can be easily interpreted. The truncated SVD can also be reversed to give a "smooth" version of the original matrix, which can be useful too. See the Statistical Learning page for more details on smoothing.
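A small illustration of the data-reduction and smoothing properties (again a Python/NumPy sketch, with made-up data):

```python
# A sketch of the SVD's data-reduction property: a rank-k truncation keeps
# the "essence" of an n-by-m matrix, and reversing it gives a smoothed version.
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 50, 30, 3
signal = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))
M = signal + 0.1 * rng.standard_normal((n, m))   # low-rank signal plus noise

U, s, Vt = np.linalg.svd(M, full_matrices=False)
print(s[:5])  # the first few singular values carry most of the information

# Reverse the truncated SVD: a rank-k "smooth" reconstruction of M.
M_smooth = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.norm(M - M_smooth) / np.linalg.norm(M))
```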
We have expertise in optimisation, interpolation, and integration algorithms, since these occur frequently in modelling and learning. Where possible we prefer to use code written by other people, but frequently we have to write our own, especially for the topics in the next three sections.
Monte Carlo Simulation uses random numbers to do calculations that are too difficult to do any other way. The two main uses are to simulate real-life scenarios and to do numerical integration. (Actually, simulation and integration are the same thing; they are just different ways of looking at the problem.) The advantage of Monte Carlo is that it lets you solve problems that would otherwise be intractable. The disadvantage is that it may require a lot of computing time and gives a slightly different answer each time. So most of the programming effort goes into techniques that get the maximum accuracy for the minimum computing time.
We use pseudo-random numbers (numbers generated by a computer algorithm that look random) and quasi-random numbers (numbers generated by a computer algorithm that don't look random but behave like random numbers; for some problems these give better accuracy).
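Here is a minimal sketch of both kinds of number on a small integration problem (using NumPy and SciPy's Sobol quasi-random generator; the integrand is made up for illustration):

```python
# Monte Carlo integration over the unit square with pseudo-random and
# quasi-random (Sobol) points.
import numpy as np
from scipy.stats import qmc

def f(x, y):
    return np.exp(-(x**2 + y**2))

n = 2**12  # Sobol sequences work best with a power-of-two sample size

# Pseudo-random: a slightly different answer every run.
pts = np.random.default_rng().random((n, 2))
print("pseudo:", f(pts[:, 0], pts[:, 1]).mean())

# Quasi-random: points that cover the square more evenly.
pts = qmc.Sobol(d=2, scramble=True).random(n)
print("quasi: ", f(pts[:, 0], pts[:, 1]).mean())
# Exact value for comparison: (sqrt(pi)/2 * erf(1))**2 ~ 0.5577
```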
Markov Chain Monte Carlo (MCMC) methods are so useful they deserve their own section. Markov chains are statistical processes (like time series) that have particular equilibrium properties in the long run. They are frequently used to model financial processes such as stock prices. But they can also be used to do integration, by relating the integral to be calculated to the equilibrium properties of a chain. This is often surprisingly easy to do.
The essential advantage of MCMC is that the random samples are correlated (whereas the samples in ordinary Monte Carlo are independent), so the results can be more accurate.
MCMC is mentioned here because it is used in Bayesian statistics which is one of our preferred Statistical Learning techniques.
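To make the idea concrete, here is a minimal random-walk Metropolis sampler, one of the simplest MCMC algorithms (the target density and step size are illustrative assumptions, not our production code):

```python
# A random-walk Metropolis sampler whose chain's long-run (equilibrium)
# distribution is the target density.
import numpy as np

def log_target(x):
    return -0.5 * x**2 - 0.25 * x**4   # an un-normalised, non-Gaussian density

rng = np.random.default_rng(2)
x, chain = 0.0, []
for _ in range(50_000):
    proposal = x + 0.8 * rng.standard_normal()    # random-walk proposal
    # Accept with probability min(1, target(proposal) / target(x)).
    if np.log(rng.random()) < log_target(proposal) - log_target(x):
        x = proposal
    chain.append(x)

chain = np.array(chain[5_000:])   # discard burn-in before equilibrium
# Successive samples are correlated, so accuracy must be assessed carefully.
print(chain.mean(), (chain**2).mean())
```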
If you have calculated a quantity (such as expected profit) from a model, you must also estimate the degree of confidence in that quantity. You are a bit of a fraud if you report that you expect to beat the stock market index by 9% per annum but fail to mention that your 95% confidence interval for that 9% is plus or minus 20% (i.e., your expected return lies somewhere between -11% and +29%).
The bootstrap lets you calculate confidence intervals for the parameters of your models relatively easily. It works by converting your requirements into an integral, which you can then calculate using one of the Monte Carlo methods above. That means it inherits the drawbacks of the Monte Carlo methods. The founder of DDNUM, Tony Cooper, wrote his PhD thesis on numerical methods for calculating the bootstrap.
The name "bootstrap" comes from the fact that the method uses your data sample as though it were the whole population, so in a way it resembles pulling yourself up by your own bootstraps.
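A minimal sketch of a bootstrap percentile confidence interval (the return figures below are invented purely for illustration):

```python
# Bootstrap confidence interval for a mean annual return: resample the data
# as though it were the population, many times over.
import numpy as np

rng = np.random.default_rng(3)
returns = rng.normal(0.09, 0.20, size=25)   # 25 hypothetical annual returns

boot_means = np.array([
    rng.choice(returns, size=returns.size, replace=True).mean()
    for _ in range(10_000)
])

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"estimate {returns.mean():.1%}, 95% CI [{lo:.1%}, {hi:.1%}]")
```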
Multi-threading refers to the simultaneous running of two or more chunks of code on one or more computer CPU "cores". Basically it means doing two or more things at once, which isn't always easy to do. For example, if you have two lawn mowers you can easily divide the task of mowing the lawn between two people and roughly halve the time it takes (though you might have some collisions, so it's not completely easy). But with two people it's not easy to halve the time required to take out the trash. In some of the tasks we do, especially simulation, the more CPU power you can use, the better the results. So multi-threading can offer very good value for little extra money if your computer has unused cores.
We use quad-core computers and habitually write our low-level code to use all four cores. The code still runs correctly on single-core computers. We are a bit restricted when we use code written by other people, however. MATLAB still mostly runs single-threaded, although it does offer the facility to split for-loops over multiple cores. So when we write MATLAB code we usually write it so that four instances of MATLAB can run simultaneously, with each instance doing a quarter of the work.
Most of the other software and libraries available to us do not use multi-threading. The era of multi-threaded algorithms is relatively new, and most commercial code still runs single-threaded. So there is an opportunity here to gain an advantage over your competitors by designing your code to be multi-threaded from the start.
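As an illustration, here is a sketch using Python's standard library that splits a Monte Carlo job over four cores (we use processes rather than threads here so the chunks genuinely run in parallel):

```python
# Splitting an embarrassingly parallel Monte Carlo job across CPU cores.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def simulate(seed, n=1_000_000):
    """One independent chunk of the simulation, with its own seed."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, 2))
    return ((pts**2).sum(axis=1) < 1.0).mean() * 4.0   # estimate pi

if __name__ == "__main__":
    # Four chunks, as on the quad-core machines described above; the same
    # code also runs correctly (just serially) on a single core.
    with ProcessPoolExecutor(max_workers=4) as pool:
        estimates = list(pool.map(simulate, range(4)))
    print(np.mean(estimates))
```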
Grid Computing is similar to multi-threading except that it allows parallel execution over more than one computer. Using readily available (and free) grid computing libraries, we can write code that runs over hundreds of computers.
MATLAB has the facility to divide some tasks (mainly for-loops) up to run over multiple computers without adding much extra programming effort.
We can also do GPU computing using the CUDA libraries on NVIDIA graphics cards.
Computational photography refers to computational processing and manipulation techniques that enhance digital photographs after the photo has been taken. Common examples include the creation of panoramas and high-dynamic-range imaging.
Our specialist applications include the post-processing of astronomical images, especially stacking (using multiple photos to create a single, better combined image) and motion-blur reduction. We also remove fog caused by city lights.
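A toy sketch of why stacking works (pure NumPy, with a synthetic "scene" standing in for real frames; real astronomical stacking must also align and register the frames first):

```python
# Averaging several aligned frames of the same scene suppresses random noise
# by roughly the square root of the number of frames.
import numpy as np

rng = np.random.default_rng(4)
scene = rng.random((64, 64))                       # stand-in for the true image
frames = [scene + 0.2 * rng.standard_normal(scene.shape) for _ in range(16)]

stacked = np.mean(frames, axis=0)                  # a median is robust to outliers
print("single-frame noise:", np.std(frames[0] - scene))
print("stacked noise:     ", np.std(stacked - scene))
```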
Machine vision refers to the computational understanding of images.
Applications and examples are numerous. We can do the classic example of handwriting recognition. We can also do image tracking using filters such as the Kalman Filter and Particle Filters. Those (signal processing) filters are also useful for financial time series applications.
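As a sketch of the machinery involved, here is a one-dimensional constant-velocity Kalman filter (the model and noise parameters are illustrative assumptions, not a production tracker):

```python
# Tracking a constant-velocity object from noisy position measurements,
# the same machinery used for image tracking or smoothing a financial series.
import numpy as np

rng = np.random.default_rng(5)
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition: position, velocity
H = np.array([[1.0, 0.0]])               # we only observe the position
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[1.0]])                    # measurement noise covariance

x = np.zeros(2)                          # state estimate
P = np.eye(2)                            # estimate covariance
true_pos = np.cumsum(np.full(50, 0.5))   # object moving at constant velocity
for z in true_pos + rng.standard_normal(50):
    # Predict forward one step, then correct with the noisy measurement.
    x, P = F @ x, F @ P @ F.T + Q
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
    x = x + K @ (np.array([z]) - H @ x)
    P = (np.eye(2) - K @ H) @ P

print("estimated position/velocity:", x)   # velocity should come out near 0.5
```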
We also do Deep Learning for Machine Vision and for financial applications.
One of our research projects is the automated rounding up of herds by quadcopter.