/usr/share/gnudatalanguage/astrolib/pca.pro is in gdl-astrolib 2018.02.16+dfsg-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 | PRO PCA, data, eigenval, eigenvect, percentages, proj_obj, proj_atr, $
MATRIX=AM,TEXTOUT=textout,COVARIANCE=cov,SSQ=ssq,SILENT=silent
;+
; NAME:
; PCA
;
; PURPOSE:
; Carry out a Principal Components Analysis (Karhunen-Loeve Transform)
; EXPLANATION:
; Results can be directed to the screen, a file, or output variables
; See notes below for comparison with the intrinsic IDL function PCOMP.
;
; Harris Geospatial has a video/blog post on using pca.pro at
; http://tinyurl.com/h6ky6qy . Also see David Fanning's discussion of
; PCA analysis with IDL ( http://www.idlcoyote.com/code_tips/pca.html )
;
; CALLING SEQUENCE:
; PCA, data, eigenval, eigenvect, percentages, proj_obj, proj_atr,
; [MATRIX =, TEXTOUT = ,/COVARIANCE, /SSQ, /SILENT ]
;
; INPUT PARAMETERS:
; data - 2-d data matrix, data(i,j) contains the jth attribute value
; for the ith object in the sample. If N_OBJ is the total
; number of objects (rows) in the sample, and N_ATTRIB is the
; total number of attributes (columns) then data should be
; dimensioned N_OBJ x N_ATTRIB.
;
; OPTIONAL INPUT KEYWORD PARAMETERS:
; /COVARIANCE - if this keyword is set, then the PCA will be carried out
; on the covariance matrix (rare), the default is to use the
; correlation matrix
; /SILENT - If this keyword is set, then no output is printed
; /SSQ - if this keyword is set, then the PCA will be carried out on
; on the sums-of-squares & cross-products matrix (rare)
; TEXTOUT - Controls print output device, defaults to !TEXTOUT
;
; textout=1 TERMINAL using /more option
; textout=2 TERMINAL without /more option
; textout=3 <program>.prt
; textout=4 laser.tmp
; textout=5 user must open file
; textout = filename (default extension of .prt)
;
; OPTIONAL OUTPUT PARAMETERS:
; eigenval - N_ATTRIB element vector containing the sorted eigenvalues
; eigenvect - N_ATRRIB x N_ATTRIB matrix containing the corresponding
; eigenvectors
; percentages - N_ATTRIB element containing the cumulative percentage
; variances associated with the principal components
; proj_obj - N_OBJ by N_ATTRIB matrix containing the projections of the
; objects on the principal components
; proj_atr - N_ATTRIB by N_ATTRIB matrix containing the projections of
; the attributes on the principal components
;
; OPTIONAL OUTPUT PARAMETER
; MATRIX = analysed matrix, either the covariance matrix if /COVARIANCE
; is set, the "sum of squares and cross-products" matrix if
; /SSQ is set, or the (by default) correlation matrix. Matrix
; will have dimensions N_ATTRIB x N_ATTRIB
;
; NOTES:
; This procedure performs Principal Components Analysis (Karhunen-Loeve
; Transform) according to the method described in "Multivariate Data
; Analysis" by Murtagh & Heck [Reidel : Dordrecht 1987], pp. 33-48.
; See http://www.classification-society.org/csna/mda-sw/pca.f
;
; Keywords /COVARIANCE and /SSQ are mutually exclusive.
;
; The printout contains only (at most) the first seven principle
; eigenvectors. However, the output variables EIGENVECT contain
; all the eigenvectors
;
; Different authors scale the covariance matrix in different ways.
; The eigenvalues output by PCA may have to be scaled by 1/N_OBJ or
; 1/(N_OBJ-1) to agree with other calculations when /COVAR is set.
;
; PCA uses the non-standard system variables !TEXTOUT and !TEXTUNIT.
; These are automatically added if not originally present.
;
; The intrinsic IDL function PCOMP duplicates most of the
; functionality of PCA, but uses different conventions and
; normalizations. Note the following:
;
; (1) PCOMP requires a N_ATTRIB x N_OBJ input array; this is the transpose
; of what PCA expects
; (2) PCA uses standardized variables for the correlation matrix: the input
; vectors are set to a mean of zero and variance of one and divided by
; sqrt(n); use the /STANDARDIZE keyword to PCOMP for a direct comparison.
; (3) PCA (unlike PCOMP) normalizes the eigenvectors by the square root
; of the eigenvalues.
; (4) PCA returns cumulative percentages; the VARIANCES keyword of PCOMP
; returns the variance in each variable
; (5) PCOMP divides the eigenvalues by (1/N_OBJ-1) when the covariance matrix
; is used.
;
; EXAMPLE:
; Perform a PCA analysis on the covariance matrix of a data matrix, DATA,
; and write the results to a file
;
; IDL> PCA, data, /COVAR, t = 'pca.dat'
;
; Perform a PCA analysis on the correlation matrix. Suppress all
; printing, and save the eigenvectors and eigenvalues in output variables
;
; IDL> PCA, data, eigenval, eigenvect, /SILENT
;
; PROCEDURES CALLED:
; TEXTOPEN, TEXTCLOSE
;
; REVISION HISTORY:
; Immanuel Freedman (after Murtagh F. and Heck A.). December 1993
; Wayne Landsman, modified I/O December 1993
; Fix MATRIX output, remove GOTO statements W. Landsman August 1998
; Changed some index variable to type LONG W. Landsman March 2000
; Fix error in computation of proj_atr, see Jan 1990 fix in
; http://www.classification-society.org/csna/mda-sw/pca.f W. Landsman Feb 2008
;-
compile_opt idl2
; Constants
TOLERANCE = 1.0E-5 ; are array elements near-zero ?
; Dispatch table
IF N_PARAMS() EQ 0 THEN BEGIN
print,'Syntax - PCA, data, [eigenval, eigenvect, percentages, proj_obj, proj_atr,'
print,' [MATRIX =, /COVARIANCE, /SSQ, /SILENT, TEXTOUT=]'
RETURN
ENDIF
; Constants
TOLERANCE = 1.0E-5 ; are array elements near-zero ?
Catch, theError
IF theError NE 0 then begin
Catch,/Cancel
void = cgErrorMsg(/quiet)
RETURN
ENDIF
if size(data,/N_dimen) NE 2 THEN BEGIN
HELP,data
MESSAGE,'ERROR - Data matrix is not two-dimensional'
ENDIF
dimen = size(data,/dimen)
Nobj = dimen[0] & Mattr = dimen[1] ;Number of objects and attributes
IF KEYWORD_SET(cov) THEN BEGIN
msg = 'Covariance matrix will be analyzed'
; form column-means
column_mean = total( data,1 )/Nobj
temp = replicate(1.0, Nobj)
X = (data - temp # transpose(column_mean))
ENDIF ELSE $
IF KEYWORD_SET(ssq) THEN BEGIN
msg = 'Sum-of-squares & cross-products matrix will be analyzed'
X = data
ENDIF ELSE BEGIN
msg = 'Default: Correlation matrix will be analyzed'
; form column-means
temp = replicate( 1.0, Nobj )
column_mean = (temp # data)/ Nobj
X = (data - temp # transpose(column_mean))
S = sqrt(temp # (X*X)) & X = X/(temp # S)
ENDELSE
A = transpose(X) # X
if arg_present(AM) then AM = A
; Carry out eigenreduction
trired, A, D, E ; D contains diagonal, E contains off-diagonal
triql, D, E, A ; D contains the eigen-values, A(*,i) -vectors
; Use TOLERANCE to decide if eigenquantities are sufficiently near zero
index = where(abs(D) LE TOLERANCE*MAX(abs(D)),count)
if count NE 0 THEN D[index]=0
index = where(abs(A) LE TOLERANCE*MAX(abs(A)),count)
if count NE 0 THEN A[index]=0
index = sort(D) ; Order by increasing eigenvalue
D = D[index] & E=E[index]
A = A[*,index]
; Eigenvalues expressed as percentage variance and ...
W1 = 100.0 * reverse(D)/total(D)
;... Cumulative percentage variance
W = total(W1, /cumul)
;Define returned parameters
eigenval = reverse(D)
eigenvect = reverse(transpose(A))
percentages = W
; Output eigen-values and -vectors
if ~keyword_set(SILENT) then begin
; Open output file
textopen,'PCA', TEXTOUT = textout
printf,!TEXTUNIT,'PCA: ' + systime()
sz1 = strtrim( Nobj,2) & sz2 = strtrim( Mattr, 2 )
printf,!TEXTUNIT, 'Data matrix has '+ sz1 + ' objects with up to ' + $
sz2 + ' attributes'
printf,!TEXTUNIT, msg
printf,!TEXTUNIT, " "
printf,!TEXTUNIT, $
' Eigenvalues As Percentages Cumul. percentages'
for i = 0L, Mattr-1 do $
printf,!TEXTUNIT, eigenval[i], W1[i], percentages[i] ,f = '(3f15.4)'
printf,!TEXTUNIT," "
printf,!TEXTUNIT, 'Corresponding eigenvectors follow...'
Mprint = Mattr < 7
header = ' VBLE '
for i = 1, Mprint do header = header + ' EV-' + strtrim(i,2) + ' '
printf,!TEXTUNIT, header
for i = 1L, Mattr do printf,!TEXTUNIT, $
i, eigenvect[0:Mprint-1,i-1],f='(i4,7f9.4)'
endif
; Obtain projection of row-point on principal axes (Murtagh & Heck convention)
projx = X # A
; Use TOLERANCE again...
index = where(abs(projx) LE TOLERANCE*MAX(abs(projx)),count)
if count NE 0 THEN projx[index]=0
proj_obj = reverse( transpose(projx) )
if ~keyword_set( SILENT ) then begin
printf,!TEXTUNIT,' '
printf,!TEXTUNIT, 'Projection of objects on principal axes ...'
printf,!TEXTUNIT,' '
header = ' VBLE '
for i = 1, Mprint do header = header + 'PROJ-' + strtrim(i,2) + ' '
printf,!TEXTUNIT, header
for i = 0L, Nobj-1 do printf,!TEXTUNIT, $
i+1, proj_obj[0:Mprint-1,i], f='(i4,7f9.4)'
endif
; Obtain projection of column-points on principal axes
projy = transpose(projx)#X
; Use TOLERANCE again...
index = where(abs(projy) LE TOLERANCE*MAX(abs(projy)),count)
if count NE 0 THEN projy[index] = 0
; scale by square root of eigenvalues...
temp = replicate( 1.0, Mattr )
proj_atr = reverse(projy)/(sqrt(eigenval)#temp)
if ~keyword_set( SILENT ) then begin
printf,!TEXTUNIT,' '
printf,!TEXTUNIT,'Projection of attributes on principal axes ...'
printf,!TEXTUNIT,' '
printf,!TEXTUNIT, header
for i = 0L, Mattr-1 do printf,!TEXTUNIT, $
i+1, proj_atr[0:Mprint-1,i], f='(i4,7f9.4)'
textclose, TEXTOUT = textout ; Close output file
endif
RETURN
END
|