Voigt profile fitting program

Copyright (C) 1992,1995 R.F.Carswell, J.K.Webb, A.J.Cooke, M.J.Irwin


March 2001 Version (V4)

Last update:  7 January 2002



This program is free software, distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. For a copy of the GNU General Public License see the information which comes e.g. with the GNU Emacs editor, or write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Contact: R.F.Carswell, Institute of Astronomy, Madingley Rd, Cambridge CB3 0HA, UK
  rfc@ast.cam.ac.uk
 

CONTENTS:
 INTRODUCTION
 COPYING THE CODE
 ENVIRONMENT VARIABLES
 DATA REQUIRED
 OUTPUT FILES
 INTERACTIVE USE
    Single Features
    Complex features
    Multiple lines
 RUNNING FROM FILE
 CONTINUUM ABSORPTION
 FIXED PARAMETERS
 TIED PARAMETERS
 SUMMED COLUMN DENSITIES
 ZERO LEVEL
 CONTINUUM LEVEL
 UPPER LIMITS
 OPTIONAL EXTRAS
 EMISSION LINES
 ADDING EXTRA LINES
 AUTO CONTINUUM
 OTHER FEATURES & IMPROVEMENTS
 .. AND FINALLY
 
 

INTRODUCTION:The VPFIT program enables you to fit multiple Voigt profiles (convolved with the instrument profiles) to spectroscopic data. If you don't want to do this, read no further. If you want to display data, play around with data and models, try "chi-by-eye" fits, display the result of a proper fit, do pretty plots, etc., you might prefer a program written by Joe Liske called vpguess. The vpguess program also has the great merit of providing initial estimates for  the VPFIT program.

Before you embark on Voigt profile fitting with VPFIT, the program needs quite a bit of information. The most obvious is data file(s). with wavelengths and data values. These may be ASCII files or IRAF .imh and .pix files. Since it fits absorption lines it also needs continuum estimates, created using IRAF 'continuum' (or equivalent) in the same format as the data, to put the model lines on to compare with the data. It convolves the model profiles with the instrument profile for comparison with the data, so you do need estimates of the instrument profile. Depending on observing conditions and slit width, this may be adequately approximated by gaussian fits to arc lines which have been extracted and processed in the same way as the data, so you need to know the instrumental profile width in the regions of interest, though it will take fits to the resolution as a function of wavelength, or even pixel by pixel instrument profiles if you want. The other thing which is vital an array of  data error estimates in the same format as the data, so that an objective goodness of fit assessment may be made. There are additional minor pecularities of various datasets which have to be accommodated, but that covers the important ones.

Basic use involves marking off regions of the data to be fitted, and putting in first guesses for the line parameters for the absorption lines in these regions. Generally this is most easily done by marking things off with a cursor. You can fit several lines from a single ion, or several different ions which may overlap in various regions, simultaneously.

When the program is run it first looks for an atomic data file (which is provided), and then starts to ask obscure questions about what you might want to do. You have to tell it which spectra you want to fit, which regions of those spectra, and what ions you want to use with initial guesses at the column density, Doppler parameter and redshift for each system. The program then puts all lines in from those ions in the regions you have selected, and wanders off and finds a minimum chi-squared fit, displays statistics for this fit, and allows you to look at the results. About the only restriction is that at least one line from each ion selected at whatever guessed redshift is within at least one wavelength region, otherwise there is no constraint on that ion at all. If the fit is unacceptable, you can start again with more redshift systems and see if it gets any better.

For example data, use the Hubble Deep Field South QSO echelle spectrum, which is available at http://www.aao.gov.au/hdfs/. It is probably worth your going through fitting some of the lines independently (solutions are published for all) before embarking on a new dataset. [top]

COPYING THE CODE
The VPFIT source code is in the anonymous ftp space pub/rfc on cass41.ast.cam.ac.uk (131.111.69.186) as oldvpfit.tar.gz. After you gunzip and extract the tarfile, you will need to edit the makefile to reflect where the PGPLOT and IRAF libraries are held. Since different incantations are used in different places to access these the makefile shows the IoA version (where PGPLOT can migrate, but is aliased) and UNSW version (which is more direct). No other libraries are needed. In particular NAG has been superseded by cms_ and pda_ routines, which are in the public domain and the relevant ones included in the tarfile.

Makefile will yield the executable code (after you get it right). At this stage it is worth trying to run it just to see if there are any problems, like the instant response: 'vpfit: Not enough memory'. If this happens, then ask to have your swap space increased. What is needed will depend on the array sizes used (in vp_sizes.f).

ENVIRONMENT VARIABLES
 The program uses PGPLOT for all graphics, and PGPLOT programs read a binary font file at run time. They need to be told where to find this file by environment variable PGPLOT_FONT (or maybe PGPLOT_DIR). You can define this in your .cshrc file. setenv PGPLOT_FONT /usr/local/vlb/pgplot/grfont.dat .. or wherever these are in your system. If the variable is not defined correctly, you will get a warning message at run time, e.g. %PGPLOT, Unable to read font file: /usr/local/vlb/pgplot/junk The program will continue, but all alphanumeric characters will be omitted from the plot.

Other environment variables are  ATOMDIR, which should point to the atomic data file (atom.dat, provided), and optionally for setup files for various program parameters (VPFSETUP) and to override display defaults (VPFPLOTS)
[top]

DATA REQUIREMENTS

The data can be either in IRAF .imh and .pix form, or in an ASCII file.
 

IRAF Data:

1. data file in IRAF 1-d or echelle format, as <file>.imh and .pix.

2. Wavelength coeffts as Chebyschev polynomials or linear coeffts. These can be held in the header as IRAF v2.10 coefficients stored as WAT2_nnn character strings, in a separate file in database/ec<NAME> in a two dimension Chebyschev form as from IRAF (echelle) ecidentify, or, for historical reasons, as linear coeffts under APNUMn (lam0=#3, dlam=#4). [Note that the internal coeffts refer to channel zero and increment, while IRAF uses channel 1 and delta lambda. This is sorted out when the coeffts are read by the program.] A data-like file containing a wavelength for each pixel as a table is also OK. Uniform wavelength bins are supported (CVAL1 as the wavelength of the first pixel, and CDELT1 the wavelength increment per pixel in the header).  Velocity binned wavelengths are also OK. WCFTYPE in header has values 'velocity' (or 'VELOCITY', or 'VE<anything>) if uniform velocity bins are used, in which case only the first two coefficients are used. wavelength = cft1*(1.0+cft2)**n for the center of the nth channel. Note that cft2 is v/c, NOT v! If WCFTYPE is absent, or anything else, then wavelengths are derived from the standard polynomial.

3. Error file, in same format as data file. This can have any name, and is searched for first as the value of SIGFILE in the data header (if you have set SIGFILE in the header of course). If that fails, then it tries <NAME>.var, <NAME>.sig, <NAME>.err before giving up and prompting. If you don't have an error file, then a <CR> response to the prompt here results in sigma(i)=sqrt(max(data(i),0.2*mean(data))) for each channel -- this is not desperately useful, but is better than nothing. If the error file is absent, you will have to be very lucky (or unlucky) to get chi-squared values which look at all sensible. There is no provision for a quality array, but bad data pixels may be flagged by giving them a negative error (or variance), and then the corresponding data is ignored entirely.

4. Continuum file, as for data, specified by CONTFILE in header, or, if that is absent, <name>.cont. It prompts if that fails, and a <CR> response to the prompt gives a unit continuum (useful for renormalized data). If you have normalized your data to the continuum, then CONTFILE set to 'unity' (or 'UNITY') stops that prompt, and gives unit continuum (so don't call a data file 'unity', it will never find it).

5. Polynomial fit to resolution FWHM (in A) vs wavelength (also in A) in double precision ASCII file, specified by RESFILE in data header. File should be in database/(resname). Often you don't want to be as detailed as this, so instead in the header you can enter RESVEL nnn, where nnn is the instrumental FWHM in km/s. This value is taken if RESFILE is absent. If both are absent (or the file corresponding to RESFILE does not exist) you are prompted for a FWHM (in km/s or A).

6. Heliocentric correction (if required) in data header (in km/s). The VHELIO value from the IRAF rvcorrect task is sought first (the wavelengths are multiplied by 1+v/c). If that is absent, then it tries HELVEL and then the wavelengths are multiplied by (1-v/c) [for those who like a different sign convention!]. If neither is present, zero is assumed. Several extraction packages do the heliocentric correction for you -- in such cases VHELIO and HELVEL should be absent, or have the variable zero associated with them.

7. Sigma scale file for corrections due to rebinning of the data (if necessary). SCLFILE in data header, again database/(sclname) or in the header as SIGSCLn=(real number). If neither are present, unity is assumed (i.e. the scaling appropriate for independent channels).

8. The program assumes vacuum wavelengths for everything. If they are not, then an item AIR in the header will result in an internal air -> vacuum conversion automatically. This is not yet done for the database/ 2D Chebyschev polynomials made by IRAF. If you have an item 'AIR' anything (like temperature, some observatories put all sorts of rubbish in the header), and have vacuum wavelengths, then an item 'VACUUM' anywhere (with any value or text string) will force the use of vacuum wavelengths.

9. Other header items: REFSPEC1 -- wavelength coefficients are picked up from this file if it exists, but only as a last resort if the standard IRAF wavelength header is absent.

10. An atomic data file, specified by the environment variable ATOMDIR. If that is not set, then, unless you have something called /home1/rfc/atom.dat (which it will then try to read), it asks for a filename.

         [top]
 

ASCII DATA:

The data may be in an ASCII table consisting of up to four columns of  ascending wavelength, spectrum, error estimate, continuum. It is assumed that the wavelengths are vacuum, heliocentric corrected, and any scaling of the error estimate has been applied. You can specify the spectral resolution by inserting a line 'resvel nnn' anywhere in the file, where nnn is the FWHM in km/s. The input terminates on detecting the end-of-file. Sample input might look like:

resvel 6.6
4990.  0.946703 0.03102309
4990.04  0.9681857 0.03154853
4990.08  0.9858243 0.03178264
4990.12  0.9543419 0.03165093
4990.16  0.918863 0.03112572
4990.2  0.9460486 0.03140076
4990.24  0.9262011 0.03116661
4990.28  0.9371514 0.03115548
4990.32  0.9425622 0.03101928
........

Note that this example has only three columns. If the continuum column is blank, then the continuum is taken to be unity.

IMPORTANT: If you are using this option, it is vital to make sure that the wavelength array length is large enough to hold a table of wavelengths. Edit vp_sizes.f before compiling so that the last line of the section

* MAXimum Wavelength Coefficient Order (and resolution polynomial!)
* This is also the array size for wavelength tables, so should be
* the same  maxfis if that is the way the wavelengths are stored.
      integer maxwco
      parameter ( maxwco = 30 )

becomes

      parameter ( maxwco = 125000 )

(where 125000 is an example maxfis).
 

FURTHER WARNING: With wavelength tables like this you could in principle store things in any order. Please don't - the program assumes an ordered sequence of wavelengths. The other thing you could do is have large gaps in the data, like finish at 5000 and restart at 8000 because you don't see any point in keeping information in between. This is not handled at all either, and it is better to keep the data as separate files. VPFIT is fine with files from all over, but does not handle gaps at all well (since in generating narrow profiles it resamples the pixels to make the Voigt profile, and assumes slow changes in pixel width as you go from one pixel to the next, estimating the width by looking at the wavelength midpoints between pixel centres).

        [top]
 

INTERACTIVE USE:

Single Features
Get into the directory with the data in it (otherwise you'll wind up typing pathnames a lot), and start the program. Then respond to the prompts, which may or may not have obvious answers. Where the answer is not obvious, the a carriage-return normally gives something sensible.

If you have copied the Hubble Deep Field quasar spectrum there, then you might like to try using it to see how things work in practice. An example is given here.

Note that if you try to type the line parameters or ID's in the VPFIT command window, as opposed to the PGPLOT window, the program will ignore them as line ID's and try to save them as commands. This is not useful, and can be confusing. It was done this way so that you don't have to oscillate between windows, as well as keyboard/mouse, when entering in a number of lines. What you type is reflected in the command window so you can see what you are typing. The backspace or del keys can be used to correct mistakes, though if you do use these then the ID line being typed goes on to a new line in the command window .. so the last line you see there is the one which is being entered as the line ID, preceded by a history of your mistakes which are ignored!


OUTPUT FILES:

The program generates a number of files:

fort.18: essentially a summary of the fit, containing the information given on the sceen while running the program but without the questions.

fort.13: the starting guesses in case you have to restart. You can edit this to have closer values if you want, or to put another line in. In the example  case it looks like:
   *
 q2233   1    3809.08    3814.04
  *
   H I    1.340E+01    2.135245      24.10       0.00   0.00E+00
 

fort.26: summary of the final results, which is described here.

If you have tried that example you will notice that the fit probability is amazingly high, and (equivalently) that the (normalized) chi-squared is amazingly low, at 0.467 when it should be near unity. This is because the error estimate array is too large by about a factor 1.5, so the whole array should be divided by 1.5 (use IRAF, or, alternatively, create a file called 'vp_setup.dat' in the same directory as you are running the program from, containing the line  'sigscalemult 1.5', which will cause VPFIT to do it for you for any incoming error array). When you do this you find that the normalized chi-squared is 1.049, and the fit probability 0.294.

This may seem to be a totally arbitrary thing to do, and as presented here, it is. However, the chi-squared statistic assumes that the data bins are independent, and the data have been rebinned in this example, so there is a correlation between neighboring channels -- effectively a smoothing on a scale of up to two channels  if the rebinning is to a uniform binsize which is an average from the (usually not completely linear) raw data. As a consequence the fluctuations in the rebinned data are smaller than in the raw data, and it is the raw data which is generally used to estimate the 1-sigma error array. Thus the chi-squared values, which are the sum of [(data-fit)/sigma]^2, will be underestimated because sigma is an overextimate given the fluctuations. A way of determining the appropriate scalefactor is to measure the rms about the mean using IRAF (in splot) and comparing that with the sigma value for the same region.

The error estimates for the parameters are based on an approximation which should work reasonably well for unstaurated lines, and which allows for errors introduced by uncertainties in the parameters for other lines in the blend which make up the feature you are fitting. A little more information on error estimates in other circumstances is given here.

          [top]

Complex features

You might now try something a bit more ambitious than a single  line fit to a single feature, and you can enter multiple lines in complex features. Another example illustrates this.

You don't normally know how many components are needed to fit complex features from the start, and there are two approaches to this.

One is to give starting values interactively for as many as you think you might need, and then when you find you got it wrong add more lines where the fit looks bad and restart the program. This is normally done by editing the file containing the best guess output from the previous run to contain more lines where you think you want them, and restarting the program now running from a file  (see below for running from a file of first guesses).

The other approach is for the program itself to try to put extra lines in where the fit is worst. You can make it do this by extending the first input line giving the options from 'I' to, for example, 'I ad il'. In this case the 'ad' part  results in lines being added where the fit is worst, one at a time, until the fit probability (chi^2) is greater than 0.01 and the K-S probability is also greater than 0.01, at least in an ideal world. Adding new lines for the fit reduces the number of statistical degrees of freedom, and it is possible that addition of lines does not improve the fit statistically. If adding new lines causes the normalized chi^2 to increase, then the iteration cycle stops and you will probably have to rescue the program by hand. The 'il' part results in the program removing ill-conditioned lines, i.e. those which are not constrained by the data (because they are too weak or have moved outside the fitting region selected from the spectrum).

[If you don't like the default probabilities of 0.01 you can change them by specifying them after the 'ad' on the same line -- so 'I ad .02 .001 il' has the same effect as above, except that the process is terminated if the chi^2 probability is >0.02 and the K-S probability is >0.001, so a better fit is required from the chi^2 test, while a worse one is tolerated from the K-S test.]
 

Multiple lines

And you don't need to have all the lines of interest in only one region. For example, there is a SiIV doublet at z=1.87 in the spectrum of the Hubble Deep Field South Quasar, and it is clearly best to fit both lines simultaneously. There is also a MgII doublet in the Ly-a forest which would benefit. An example is based on a CIV doublet in the same spectrum.

This example does illustrate a deficiency in the automatic 'add more lines' approach, and that is that there is no algorithm to decide which of the possible lines it could add should be added. The approach taken here is to look to see what the ID is for the nearest feature to the place where one is to be added, and call it that. While this will work if only a single ion is present, it can lead you astray if there is more than one. If you are dealing with heavy elements in the Ly-alpha forest, or with high order Lyman lines, then you can make sure that any new lines added are Ly-alpha rather than the nearest neighbour by specifying 'I ad Ly-a il' in the options line, but this forces any new line added to be Ly-alpha whether or not it should be. If you don't like the Ly-a ID you can always change the output file so the HI becomes something else at a suitable redshift and rerun the program from file.

This example would also benefit from automatic local continuum fitting when the normalized chi-squared dropped below about 2.

         [top]
 

RUNNING FROM FILE:

While the interactive mode is good for learning what the program does, it is tedious having to use it every time. If you have first guesses then it is almost invariably more convenient to run directly from a file. The first guesses need not come from interactive runs of the program, but could come from anywhere. Joe Liske's program vpguess provides a more convenient way of setting up an initial file.

The results are the same as for the interactive mode, but now without the hassle of having to remark the region and enter guesses for something you tried before and somehow things were not quite right. The output file from the program, either fort.13 or fort.26, can be used as input for a subsequent fit, after editing if you wish. This can have the format as for the output file written by the program, but a freer format is accepted. Thus an acceptable fort.13 is:

*
  sect4.ec 1  7180.15 7188.2
sect4.ec 1   7005.63 7014.83
sect4.ec 1  7020      7025.3
sect7.ec 1  8216.73 8222.84
*
C II 15.19 4.383024b   1.63m  0.0 1.0e4
C II 1.4043E+01 4.382e 4.25p 0.00 1.00E+00 0
C II 1.4043E+01 4.382893j 4.25u 0.00 1.00E+00 0
CII 15.56 4.383865c       11.32n 0.00 1.00E+00 0
C II 1.2242E+01 4.383487h 13.04s 0.00 1.00E+00 0
C II 1.4093E+01 4.384351f 17.61q 0.00 1.0
C II 1.2890E+01 4.385030g 11.93r 0.00 1.00E+00 0
OI 1.8209E+01 4.383024B 1.63M
O I 1.4043E+01 4.382795E 4.25P
O I 1.4043E+01  4.382893J 4.25U
O I 1.5047E+01    4.382820E 14.25P
..etc
 

Note that a blank line is a terminator for a dataset.

This enables you to write a fort.13 by hand if you wish to, rather than the easier method of using the program to generate it. Note that a space between element and ionization is allowed, or may be omitted. You can also treat a fort.26 summary file as a fort.13 file if you want to -- it will be interpreted correctly unless you have removed the %% markers before the filenames.

To use this file just type 'F' (from file) instead of 'I' (interactive) as the option on startup:
 

.....
  options:   <CR> for previous value
  I - interactive setup and fit
  F - run from an input file
  D - display profiles from input file
  ? for help
  option (key) (key)...
> F                       (and any other parameters you want on that line)

  setup: ? help, <CR> defaults, n,z,b,cs,sf,il,w,me,p,d,v to change
>                           <--

 Column density (n), logN (l) or emission (e), scalefactor
>                             <--
  logN scalefactor is     1.00000
 Parameter input file, # entries? [fort.13,1]
>                            <--

 filename       : whatever
 echelle order  :  1
  Continuum set to     1.00000  everywhere
  sigma scale (errors scaled) :    1.00000
 FWHM : 0.095A
  Start & end chans:   1610  1738
   1 regions fitted
   3 systems

  no. of ions for fitting is   3

......

 If you have a file from trials like the above, you can try it and see what happens.

         [top]

LYMAN CONTINUUM ABSORPTION: The Lyman continuum optical depth is occasionally useful for setting total HI column densities for a complex, though it does not constrain the Doppler parameter or the redshift. Lyman continuum absorption is included if it falls in any of the wavelength regions chosen, and if logN(HI)>15.0 (this is only to save time, and is adjustable -- see optional extras below). There is a complication that the atom.dat table does not contain the full Lyman series (since it is infinite), so for the region between the Lyman limit and the highest order line in the table the spectrum is interpolated. This should be adequate.

FIXED PARAMETERS: On occasion it is useful to fix the redshift or the Doppler parameter because you know them accurately from fits to other lines.. eg z and b from SiII, and you wish to apply these to OI or CII. Fixed parameters are indicated by UPPER CASE letters after the parameters in question. For the input fort.13 file the parameter lines could look like:

O I 14.043 4.382795E 4.25P
O I 14.043 4.382893J 4.25U
O I 12.346 4.383487 13.04

.. in which case the redshifts and the Doppler parameters are fixed, but the column densities are free to vary for the first two, and everything varies in the fit for the third.

TIED PARAMETERS: Another thing you might want to do is to tie parameters -- for example so that SiII and CII are at the same redshift, and have the same Doppler parameters (or Doppler parameters in the ratio set by the square root of their mass ratios to mimic thermal broadening). This is done by the same letter approach, but now one thinks of a reference ion (with lower case letters) and others tied to it (by the corresponding upper case ones), so the ion detail lines in fort.13 become like:

C II 14.043 4.382893j 4.25u 0.00 1.00E+00 0
C II 15.568 4.383865c 11.32n 0.00 1.00E+00 0
SiII 14.043 4.382893J 4.25U 0.00 1.00E+00 0
SiII 13.276 4.383865C 11.32N 0.00 1.00E+00 0

.. here the redshifts and the Doppler parameters are tied. The mysterious 1.00E+00 is the ASSUMED temperature, which is fixed at 1K in this case so that the Doppler parameters are the same, i.e. all the b is turbulent. If you want thermal widths, set the 0.00 to be some small number (like 0.01) -- it is the turbulent Doppler parameter in km/s. If you like 1.0E4K, then change the 1.00E+00 to 1.00E+04, and so on. If there is a number there it fixes the value, and 0.0 means free to fit. (0.00 0.00E+00 gives a thermal fit). You can tie Doppler parameters at different redshifts if you want to, but it is not clear what that would mean.

Note, however, that you can't fix both the turbulent and thermal parameters at the same time, because this fixes the total b-value (which you can do, as above). If the turbulent value input is non-zero, then the temperature value is ignored, whatever it is, and the program treats it as an input value zero.

In this case the column densities are free parameters, but you can fix and tie these as well if you want to. Fixing is obvious, but tied column densities is not something which has been wanted often, so it is a bit primitive. Outline details on how to do this are here.

        [top]
 

SUMMED COLUMN DENSITIES:  There is another way tieing parameters, and that is to make the ratio of ions within a complex to be the same for all components in the complex while letting the actual ratios be determined by the fit. This can also be used for single ions where you might be interested in the total column density rather than  the individual components -- since the total in close blends may often be determined more accurately than the root sum squared error indicates. The reason for this is that within complexes the uncertainties for individual components may arise because similar profiles can be obtained while trading column density contributions between components while the overall total changes little.  The procedure for doing this is described here, and it involves vp_setup.dat (see 'Optional Extras' below) parameters, but is reasonably straightforward.

Pretty well everything which can go in the fort.13 file has now been covered except the last item on the line which gives the ion, redshift etc.. which has always shown up as 0. It should be zero in any sensible fit, but occasionally it is useful to have lines of a system appear in one region only. To do this just replace the '0' by the region number (which is just the placement in the list at the beginning of the file). Its main use is as a convenient (?) tool for modifying local continua when you have overlapping spectral regions which don't agree.

         [top]

ZERO LEVEL: There are two ways of adjusting this. If you use the line __ (1215.67 assumed wavelength, to make life easy (?), but you can change it in the atom.dat file), then the continuum is left where it was, and the fitted spectrum stretched linearly to the new zero level. Note that the first parameter <zero level> is now expressed as a fraction of the continuum, rather than put on a scale where it is forced to look like a column density, so the line in fort.13 now looks like:

__ <zero level> <redshift>N <b-value> 0.00 0.00E+00 n

where the zero level adjustment will normally be around zero (it may be negative), and will be less than 1! As before, N is a sample letter to fix the redshift, which should be such that 1215.67*(1+z) is within the region of interest. The number n is the dataset number for which the zero level is to be adjusted (1 for first region, 2 for second, etc.), and this is not needed if there is only one region which contains the wavelength 1215.67*(1+<redshift>). It performs a linear fit of the form <zero level> + <b-value>*delta lambda/ 1215.67*(1+<redshift>), and you can fix either of the other two values (indeed, it usually is best to set the <b-value> to zero and fix it, but occasionally you might want a slope).

Note that the zero level adjustment applies to the input spectrum received at that point, so all the real ions with lines in that region should appear higher in the list than the '__' one. Continuum adjustment matters less, but it is probably best to adopt a policy of putting the '__' last just for internal consistency.

If you use the AD option (ladd when it asks for linear, logs or emission) the the baseline, or indeed whatever profile you wish to choose, is added to the whole spectrum, so the continuum moves as well. Since this was written to allow for partially covered emission lines it is probably less appropriate. They are both matters of last resort, because the few trials done suggest that the process is pretty unstable when you start with bad guesses and try to get the zero level to help sort them out. Both work from fort.13 files, but are not wonderfully reliable from interactive input.
 

CONTINUUM LEVEL: You can adjust the continuum level using a linear multiplicative factor. It is based on false elements as tags, and the line in the fort.13 file looks like: <> <level> <redshift>N <linterm>(M) 0.0 0.00E+00 <region no>. The value of <level> is usually near unity [it is what the continuum is multiplied by at the wavelength corresponding to (1+<redshift>) * 1215.67=reference wavelength, which must be within the relevant wavelength chunk], and the redshift MUST be fixed. The <linterm> value may be fixed (usually at zero, for a constant rescaling of the continuum) or may be left to be determined by the program. The input continuum through the fitting region is multiplied by (<level> + <linterm>* (wavelength/reference wavelength - 1), so for small adjustments <level> is somewhere vaguely near unity. You can input this interactively as well. In cursor mode it asks if you want to fit the slope or not, and you fix it (at zero) by pressing the left mouse button or the central one, allow it to vary by pressing on the right mouse button. The warning regarding zero level adjustment applies here as well -- it is all too easy to have a continuum change trade off against a broad line, so it is probably best to use this only for (small) final adjustments to the parameters. If these final adjustments are not small, worry about the reliability of your results.

         [top]
 

UPPER LIMITS: If a line isn't there the program has the annoying habit of throwing it out. This is fine if you are fitting, but something of a nuisance if you are interested in an upper limit to the column density for some ion -- where you might hope to use the error estimate as a 1-sigma limit. To do this you need to know the redshift and Doppler parameter, so need information from elsewhere anyway. Then the way to get an upper limit is to create an input (fort.13) file along the lines of:
*
spect 1 5698.96 5702.88 ! CIV 1548
spect 1 5708.48 5711.60 ! CIV 1550
spect 1 5162.27 5169.1 ! SiIV 1402
*
C IV 12.7591 2.682135aa 13.87ab 0.00 1.00E+04 0 ! 1
SiIV 11.0000 2.682135AA 13.58AB 0.00 0.00E+00 0 ! 2

..if you are interested in an upper limit for SiIV, and the CIV lines are there. Then, at the start type the letters against the > prompt options:

<CR> for previous value
I - interactive setup and fit
F - run from an input file
D - display profiles from input file
? for help
option (key) (key)...
> e                  <-------------- Error estimates without iteration

lower case characters following values are treated as labels.
if the same upper case character follows another value then it will be tied to the lower case label.
isolated (un-paired) upper case characters fix values.
setup: ? help, <CR> defaults, p,v to change
>
Column density (n), logN (l) or emission (e), scalefactor
> n                  <-------------- Needs to be in linear mode.
N scalefactor is 1.00000E-14
Parameter input file, # entries? [fort.13,1]
>
etc...

The errors are calculated then without iteration, so you can use the error estimate to provide an upper limit to the column density PROVIDED THAT

(i) you have used the 'n' option above. For a barely detectable line the formal lower limit to the log column density is minus infinity, so the approximation that the errors are symmetric about the estimate breaks down.

(ii) you have not put a very low column density in for the one you want the limit for. If a low number goes in here the error estimate is low, and the value seems to flatten off at the correct value at about 1/30 of the local detection limit. We suspect this is because negative Voigt profiles (i.e. in emission) are not catered for, and it hardly seems worth putting them in.

(iii) you have fitted the line if there is one there which is not thrown out. If there really is a fittable line at the position of interest, however weak, then the error estimate alone does NOT give you the basis for an upper limit. You should take the fit value and add the error estimate (times two, or whatever you choose) to obtain  an upper limit to the column density.

         [top]

OPTIONAL EXTRAS:
 

 1. 'vp_setup.dat' (or the filename set by environment VPFSETUP), if present, over-rides internal defaults for some variables. It can contain something like:
bvalmin 0.03  Minimum b value
bvalmax 1000. Maximum b value
bgtdrop 1001. Drop any system with b .gt. this (> bvalmax, so none).
bltdrop 0.031 drop system if b value less than this AND ..
clogltdrop 14.0 .. if log column density less than this.
lastchtied v - last character for used as tied rather than total properties for a complex (later ones are for total N)

Other setup variables are:
dots 10 Minimum number of systems for which dots printed to reassure you that the program is actually doing something. To suppress the dots use 'dots 1000' or any other ridiculously large number.
sigscalemult 1.0 Multiply (data-fit) by this number to account for data smoothing e.g. from rebinning. So the chi**2 values are multiplied by no.**2
llscollim 15.0 Include Lyman limit absorption in regions where appropriate if log HI column density exceeds value.
llsbmult 3.0 Used for interpolation of transmitted flux between highest order Lyman line and Lyman limit. It actually interpolates from limit to (minimum Lyman line wavelength in table)*(1.0+llsbmult*bvalue/c).
enotify execute ./vpgti.done when finished (which might contain 'mail -s vpfit <your ID> < message' if you want mail notification). Useful if you are running it as background.
fcollallzn 19.0 Include contributions from ions with values of fik * column densities greater than this in all regions fitted.
gcursor Use cursor mode interactive input for starting values for the lines. [NOW THE DEFAULT]
keyboard Use keyboard input for initial interactive guesses (and not the cursor).
wr13sum f13.w Write restart values at end to f13.wnnnn, where nnnn is the integer part of the central wavelength of the first region in the list of fitting regions.
wr26sum f26.w m  Write the summary of results to f26.wnnnn instead of fort.26, where nnnn is the central wavelength of the first region in the list of fitting regions. If m=1, then the form of the file is f26.wnnnn.n, and m=2 or more uses the central wavelength to two decimal places.
wr25 Write final Hessian matrix to fort.25. There is no reason why you might want to do this, but it can be a useful development capability.
wrcitn Write (and update) current iteration results to junk.dat, and remove this file at successful completion.
adsplit 1.75     Parameter for auto line fitting (below)
adcontf #1 #2 #3 #4   Used in auto continuum fitting
 
 

To determine sigscalemult is appropriate for your data, display the data with IRAF splot, choose a flat lineless bit of spectrum, and mark the limits of this using 'm' (at left) and 'm' (at right) in the IRAF splot cursor. It then tells you the mean, rms, S/N. Note down the rms, then overplot ('o') the sigma array and 'm' out the same region. the mean of the sigma array divided by the rms of the data is the number which should be entered.
 
 

2. 'vp_splot.dat' or environment VPFPLOTS (sought in reverse order), if present, can be used to preset PGPLOT parameters. If you want a fairly bizarre color scheme, and the continuum and errors to be shown along with the data, then try:
at da co 7
at co co 5
at er co 2
at ax co 6
at ti co 5
at co st 2
al
<EOF>

.. this gives data color #7, continuum #5, error #2, axes #6 and tick marks #5, plots all arrays, and the continuum is shown as a dashed line (PGPLOT style #2).

3. 'vp_files.dat' in the current directory (or a file corresponding to the the environment variable VPFFILES) can contain filenames and wavelength ranges and be used to search for regions of interest. Details are here, but it is usually better to have thought about what lines you want to cover beforehand.

4. Complex fits take a lot of time, and your patience may run out before the program thinks it has converged. If you want to stop things early, then, using another window set to be in the same directory as the running program, type touch stop A zero-length file called 'stop' will be created there. At the end of each iteration VPFIT checks to see if this file exists, and if it does it deletes the 'stop' file, and halts the iterations in precisely the same way as if it had converged to the final answer. This means you can look at error estimates on the basis of where it has got to so far, and look at the plots. To restart, copy the last iteration (from screen or fort.18) to fort.13, and set it all off again. If you change your mind about stopping the iterations, just 'rm stop' before VPFIT does the check.
 

         [top]

ADDITIONAL FEATURE(!) Due to the imfort header being restricted in length, if the header is more than about 200 lines long then not all the header info is accessible to the program. This is unfortunate, since wavelength coefficients and other info is appended to the header which comes from the telescope, and some places put vast amounts of info in the header (I suppose for engineering reasons). The only general way around this is to strip out the rubbish if there are too many lines .. the usual signature of this is VPFIT's failure to pick up wavelength or scale coefficients.
 
 
 

EMISSION LINES:

Clearly a minor variant of the program could be used for fitting components to emission line profiles as well. A description of how you can do this for Gaussian emission lines is available.
 
 
 

ADDING EXTRA LINES:

If the profile generated is not a good fit to the data then what you would do by hand is add one or more extra components until you are satisified with the match between model and data. The program can do this for you by adding lines where there is the most significant departure from the fit, and splitting single features which have a significant high component within them. To invoke this mode simply include 'ad' in the first command line. So:

>vpsol

 VPFIT 2.04

 .....

  options:   <CR> for previous value
  I - interactive setup and fit
  F - run from an input file
  D - display profiles from input file
  ? for help
  option (key) (key)...
> I ad                      <----------- the 'ad' flag activates this bit.

 Add lines if prob(tot) < 0.0100   or prob(KS) < 0.0100

.....
.. then choose a messy single region, stick one line in it as a starter, and watch the fit develop. What you should see is the fit improving before your eyes..

If you like to stop adding lines after a particular normalized chi-squared, or fit probability, is reached, then just use e.g.

> I ad 1.25

and it will stop adding lines when the normalized chi-squared is less than 1.25. If the number following the 'ad' is less than 1, it is interpreted as a probability, so you can modify the default of 0.01 if you wish. Note that if you have a version of the program dated before 12 Sept, 2000, this option was not implemented so if you try it lines will be added forever......
 

If you get fed up with an individual iteration, 'touch stop' in the same directory as the one you are running the program from will terminate the current iteration. 'touch stopit' will end the iteration sequence with the current one.
 

As things stand the default is to add a line with species and rest wavelength of the line nearest existing component, unless the species is hydrogen in which cas Ly-a is added. You can arrange that Ly-a be the line added every time by putting

>I ad Ly-a

instead of the arrowed line above. If you wish to mix and match you have to do it by hand.

The program either adds absorption lines where they are most needed, or splits them where there is an emission peak relative to the local fit. The choice of which to do depends on an obscure parameter

adsplit 1.75           in vp_setup.dat,

which is the significance ratio an emission peak has to have over the biggest residual absorption before an attempt is made to split a line. This bias is built in for speed -- guesses for line splitting are less easy to get right, so it takes longer to iterate when lines are split, and often adding needed components to shoulders of lines drives the result the same way.

         [top]

REMOVING UNWANTED LINES:

You can do this as well as adding new ones, and do both at the same time.  To activate this option you need to expand the first command line to include 're' (for remove lines with large errors) or 'il' (remove ill-conditioned lines), or both. So the command line can be

I ad re il

for interactive input, adding and removing lines. Of course you can get into giant loops doing this, adding and removing the same lines.
 
 

AUTO CONTINUUM LEVEL:

The continuum level can be adjusted using the line

<>  <factor1> <redshift>SX <factor2>

in the fort.13 file. This provides a multiplicative linear fit to a factor which multiplies the input continuum. So factor1=1.0 and factor2=0.0 is an excellent start.

The feature which is important for the auto-line fitting is that after a while you may wish to allow the continuum to float free. You can do this automatically if the vp_setup.dat contains the line

adcontf #1 #2 #3 #4

A continuum fit will be tried when EITHER the normalized chi^2 < #1 OR there have been #2 consecutive added lines all with b<#3 and logN<#4. [I'm not sure how well this works yet]. The defaults are 1.8 10 5.0 12.5.
 

SIGMA SCALING:

Because it is a nuisance having to fit a fairly arbitrary function, the handling of sigma scaling has changed. You can now, instead of the polynomial fit (which should still be still OK), enter a table of wavelength scalefactor pairs, one pair per line, where the scalefactor is the sigma array value divided by the local rms at the wavelength point in question. The program linearly interpolates between the two nearest table values to the wavelength in question.

The table is referred to in the header in the same way as before, the program distinguishes between input types by seeing how many values there are per line (1 => polynomial, >1 => table).

In fact, a tabular form is more appropriate generally if spectra with different coverage (but similar resolution) are added together, since the rebinning (and hence scalefactor) depend on the regions used, and how the bins relate to the raw data pixels.

         [top]
 

OTHER FEATURES:

As with many programs, there are continual developments and (hopefully) enhancements. If a more recent version exists, and works at least tolerably well, it is in the anonymous ftp space as vpbeta.tar.gz. A description of what it will do in addition to the above is also given here. There is also a rather less well documented program rdgen which can be used to look at the data, and what documentation there is for it. The source is included in the tarfile. It carries an even greater health warning....
 
 

.. AND FINALLY Of course we do our best to make sure that the program gives accurate results, and try to remove bugs etc., but of course we can't guarantee anything. It has been checked pretty well where the line FWHM is more than about 1/2 a pixel, and when convolved with the instrument profile is well sampled by the detector array. If this is not true for your data then you may spur us into modifying the way the convolution is done (though it will slow the program a bit), so let us know. Also if you have any problems, inconsistencies, e-mail to the contact address at the top. It might also be worth checking the list of changes to make sure that the version you have is not out of date and the problem (hopefully) already solved. There is also a summary of some proposed changes, but this is of more academic interest. Some of these might have been implemented in the development version, - if you are prepared to risk it, ask for access. If you make any improvements yourself we'd appreciate knowing about them and incorporating them.

For those who have provided help, improvements and suggestions -- notably Alec Boksenberg, Limin Lu, Michael Murphy, Phil Outram, Michael Rauch, Wal Sargent, Ray Weymann and Gerry Williger -- many thanks.