
A common task in multi-wavelength astronomy is to take two catalogs of objects and compare them in order to identify the same objects in each catalog. This can be implemented quite easily by looping over each object in one catalog and computing the distance between it and each object in the second catalog. In Cartesian space, the distance is just

<latex>D_{ij} = \sqrt{ \left( x_i - x_j \right)^2 + \left( y_i - y_j \right)^2 }</latex>

Here, <latex>(x_i, y_i)</latex> is the position of the object in the first catalog, and <latex>(x_j, y_j)</latex> are each of the positions in the second catalog. A match is defined if the distance between two objects is less than the combined positional uncertainty of the two objects (their individual uncertainties summed in quadrature).
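
A minimal brute-force sketch of this comparison in Python (not the author's IDL implementation; the function name and the (x, y, sigma) tuple format are assumptions for illustration):

```python
import math

def naive_match(cat1, cat2):
    """Brute-force match: compare every object in cat1 against every
    object in cat2.  Each catalog entry is a (x, y, sigma) tuple,
    where sigma is the positional uncertainty in the same units as
    x and y."""
    matches = []
    for i, (xi, yi, si) in enumerate(cat1):
        for j, (xj, yj, sj) in enumerate(cat2):
            d = math.hypot(xi - xj, yi - yj)
            # Match if the separation is within the quadrature sum
            # of the two positional uncertainties.
            if d < math.hypot(si, sj):
                matches.append((i, j))
    return matches
```

The double loop is what makes this simple version slow for large catalogs, as discussed below.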

The coordinates astronomers work with are of course spherical; typically, <latex>\alpha</latex> and <latex>\delta</latex> representing the right ascension and declination. If you are matching objects that are close together, then you can use the following approximation:

<latex>D = \sqrt{ \left[ \left( \alpha_{\rm ref} - \alpha_i \right) \cos\delta_{\rm ref} \right]^2 + \left( \delta_{\rm ref} - \delta_i \right)^2 }</latex>
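
The small-angle approximation above can be written as a short helper (a hypothetical Python sketch; coordinates and the returned separation are in degrees):

```python
import math

def angular_sep(alpha_ref, delta_ref, alpha_i, delta_i):
    """Small-angle approximation to the separation, in degrees,
    between (alpha_ref, delta_ref) and (alpha_i, delta_i).
    Only valid when the two positions are close together on the
    sky, away from the poles."""
    # Scale the right ascension offset by cos(declination).
    dalpha = (alpha_ref - alpha_i) * math.cos(math.radians(delta_ref))
    ddelta = delta_ref - delta_i
    return math.hypot(dalpha, ddelta)
```

For widely separated positions, one would instead need the full spherical distance formula.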

This process can take a long time, however, if there are a lot of objects to compare. The process above becomes an order <latex>N^2</latex> process, because we have to compute the distance between every pair of stars, and then find which offsets were smaller than the positional uncertainties.

For a long time, I lived with the slow, order <latex>N^2</latex> process, because my lists of stars were relatively small; a few thousand in each catalog. Making lists of matches wasn't that slow. However, I eventually started using a longer list for one of my catalogs, containing millions of stars. The process was slower, but I could take a little break and come back to the result when the computer was done.

Then, I realized another step was needed in the process of making a catalog. I needed to know how likely it was that the matches between objects in the two catalogs were real. I could do a quick calculation of the density of stars in one catalog, and compute the probability that a star would randomly fall within the positional uncertainty of a star in the second catalog. From that probability, I could estimate how many of the matches were spurious, and consequently how many were genuine.
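
Under the uniform-density assumption described above, the expected number of chance coincidences is roughly the number of test objects times the surface density of the comparison catalog times the area of a match circle. A hedged sketch (the function name and units are illustrative):

```python
import math

def expected_spurious(n_objects, density, radius):
    """Expected number of chance matches, assuming the comparison
    catalog is uniformly distributed with surface density `density`
    (objects per square unit of area).  Each of the n_objects test
    objects has probability density * pi * radius**2 of a random
    interloper falling within its match radius, provided that
    probability is small."""
    return n_objects * density * math.pi * radius**2
```

Subtracting this estimate from the total number of matches gives a rough count of the genuine ones.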

However, in my application, the stars were not uniformly distributed; I was looking at the Galactic center, where stars were centrally concentrated, and where differential absorption produced patches with few stars. The only way to approach my problem was through a Monte Carlo simulation. I would add a random, uniform shift to one catalog, and then test how many matches occurred between the bootstrapped distribution and the second catalog. Then, I would repeat the process hundreds of times so I could derive confidence limits.
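
That Monte Carlo procedure can be sketched as follows; the `match_fn` argument and the (x, y, sigma) tuple format are hypothetical stand-ins for whatever matching routine is in use:

```python
import random

def monte_carlo_spurious(cat1, cat2, match_fn, shift_range, n_trials=100):
    """Estimate the spurious-match rate by repeatedly applying a
    random, uniform shift to cat1 and counting matches against cat2.
    `match_fn` takes two catalogs and returns a list of matched
    pairs; `shift_range` is the maximum shift applied in each
    coordinate.  Returns the match count from each trial, from which
    confidence limits can be derived."""
    counts = []
    for _ in range(n_trials):
        dx = random.uniform(-shift_range, shift_range)
        dy = random.uniform(-shift_range, shift_range)
        shifted = [(x + dx, y + dy, s) for (x, y, s) in cat1]
        counts.append(len(match_fn(shifted, cat2)))
    return counts
```

Because `match_fn` is called once per trial, the cost of the simulation is dominated by the speed of the matching algorithm itself, which is what motivates the faster method below.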

With an order <latex>N^2</latex> matching algorithm, this was barely tolerable for catalogs of a few thousand stars, and excruciating when one of the catalogs had tens or hundreds of thousands of objects (I didn't try millions).

I knew there had to be a better matching algorithm (otherwise Gator would be prohibitively slow), but I couldn't find one in the literature. So, I re-invented the wheel. One can produce a faster matching algorithm as follows.

  1. Define a grid that will be used to divide and conquer the positions of the objects in the larger catalog. The grid resolution should be just a bit larger than the largest uncertainty in the position of any object.
  2. Assign each star in the longer catalog to a cell in the grid.
  3. Assign the objects in the second catalog to cells in the grid.
  4. For each object in the shorter catalog, determine which stars in the longer catalog fall within the 3-by-3 group of cells surrounding the cell of the test object.
  5. Compute the distance between the test object and the objects from the long catalog in the 3-by-3 block of cells.
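
The steps above might be sketched in Python like this (an illustrative translation, not the author's IDL code; the function names and the (x, y, sigma) tuple format are assumptions):

```python
import math
from collections import defaultdict

def grid_match(cat_long, cat_short, cell_size):
    """Grid-based ('divide and conquer') match.  Each catalog entry
    is an (x, y, sigma) tuple.  cell_size should be a bit larger
    than the largest positional uncertainty, so any true match lies
    within the 3-by-3 block of cells around a test object."""
    # Steps 1-2: assign each long-catalog object to a grid cell.
    grid = defaultdict(list)
    for idx, (x, y, s) in enumerate(cat_long):
        cell = (int(x // cell_size), int(y // cell_size))
        grid[cell].append(idx)

    matches = []
    # Steps 3-5: for each short-catalog object, test only the
    # long-catalog objects in the surrounding 3-by-3 block of cells.
    for j, (xj, yj, sj) in enumerate(cat_short):
        cx, cy = int(xj // cell_size), int(yj // cell_size)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for i in grid.get((cx + dx, cy + dy), ()):
                    xi, yi, si = cat_long[i]
                    # Same match criterion as the brute-force case.
                    if math.hypot(xi - xj, yi - yj) < math.hypot(si, sj):
                        matches.append((i, j))
    return matches
```

Because each test object is compared against only the handful of stars in its neighborhood rather than the whole long catalog, the cost scales roughly linearly with catalog size.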

This process will be order <latex>N</latex>, significantly faster. With this, the Monte Carlo routine to count the number of spurious matches runs at a tolerable speed, even if one of the catalogs contains 10,000 objects, and the other one million objects (although you still might want to take a short break when you run it).

The following files implement these sorting and matching algorithms in IDL:

Created by: MikeMuno. Last Modification: Monday 11 of January, 2010 19:50:04 CST by MikeMuno.