LAPACK  3.7.0
LAPACK: Linear Algebra PACKage
subroutine cgejsv ( character*1  JOBA,
character*1  JOBU,
character*1  JOBV,
character*1  JOBR,
character*1  JOBT,
character*1  JOBP,
integer  M,
integer  N,
complex, dimension( lda, * )  A,
integer  LDA,
real, dimension( n )  SVA,
complex, dimension( ldu, * )  U,
integer  LDU,
complex, dimension( ldv, * )  V,
integer  LDV,
complex, dimension( lwork )  CWORK,
integer  LWORK,
real, dimension( lrwork )  RWORK,
integer  LRWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

CGEJSV


Purpose:
 CGEJSV computes the singular value decomposition (SVD) of a complex M-by-N
 matrix [A], where M >= N. The SVD of [A] is written as

              [A] = [U] * [SIGMA] * [V]^*,

 where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
 diagonal elements, [U] is an M-by-N (or M-by-M) unitary matrix, and
 [V] is an N-by-N unitary matrix. The diagonal elements of [SIGMA] are
 the singular values of [A]. The columns of [U] and [V] are the left and
 the right singular vectors of [A], respectively. The matrices [U] and [V]
 are computed and stored in the arrays U and V, respectively. The diagonal
 of [SIGMA] is computed and stored in the array SVA.
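
 The following is a minimal, hypothetical driver (not part of LAPACK) showing
 a typical call of CGEJSV for the full SVD of a small 4-by-3 complex matrix.
 The option string 'C','U','V','R','N','N' and the workspace sizes follow the
 minimal formulas documented below under LWORK (case 4.1), LRWORK (case 4.2)
 and IWORK (case 4.2); the program name, the test matrix and the assumption
 that LAPACK/BLAS are linked in are illustrative only.

      PROGRAM EXJSV
*     .. hypothetical example driver for CGEJSV (sketch, not LAPACK code)
      IMPLICIT NONE
      INTEGER            M, N, LWORK, LRWORK
      PARAMETER          ( M = 4, N = 3 )
      PARAMETER          ( LWORK = 5*N + 2*N*N, LRWORK = 7 )
      COMPLEX            A( M, N ), U( M, N ), V( N, N ),
     $                   CWORK( LWORK )
      REAL               SVA( N ), RWORK( LRWORK )
      INTEGER            IWORK( 2*N ), INFO, J
*     .. a small test matrix, stored column by column
      DATA A / (1.0,0.0), (0.0,1.0), (2.0,0.0), (0.0,-1.0),
     $         (0.0,0.0), (3.0,0.0), (1.0,1.0), (2.0,0.0),
     $         (1.0,0.0), (0.0,0.0), (0.0,2.0), (1.0,-1.0) /
*
*     .. full SVD, high relative accuracy, restricted range of sigma
      CALL CGEJSV( 'C', 'U', 'V', 'R', 'N', 'N', M, N, A, M, SVA,
     $             U, M, V, N, CWORK, LWORK, RWORK, LRWORK,
     $             IWORK, INFO )
      IF( INFO.NE.0 ) STOP 'CGEJSV did not converge'
*     .. undo the internal scaling (a no-op if RWORK(1) = RWORK(2))
      DO 10 J = 1, N
         SVA( J ) = ( RWORK( 1 ) / RWORK( 2 ) )*SVA( J )
   10 CONTINUE
      WRITE( *, * ) 'singular values: ', ( SVA( J ), J = 1, N )
      END
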

Arguments:

Parameters
[in]JOBA
          JOBA is CHARACTER*1
         Specifies the level of accuracy:
       = 'C': This option works well (high relative accuracy) if A = B * D,
              with well-conditioned B and arbitrary diagonal matrix D.
              The accuracy cannot be spoiled by COLUMN scaling. The
              accuracy of the computed output depends on the condition of
              B, and the procedure aims at the best theoretical accuracy.
              The relative error max_{i=1:N}|d sigma_i| / sigma_i is
              bounded by f(M,N)*epsilon* cond(B), independent of D.
              The input matrix is preprocessed with the QRF with column
              pivoting. This initial preprocessing and preconditioning by
              a rank revealing QR factorization is common for all values of
              JOBA. Additional actions are specified as follows:
       = 'E': Computation as with 'C' with an additional estimate of the
              condition number of B. It provides a realistic error bound.
       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
              D1, D2, and well-conditioned matrix C, this option gives
              higher accuracy than the 'C' option. If the structure of the
              input matrix is not known, and relative accuracy is
              desirable, then this option is advisable. The input matrix A
              is preprocessed with QR factorization with FULL (row and
              column) pivoting.
       = 'G': Computation as with 'F' with an additional estimate of the
               condition number of B, where A = B*D. If A has heavily weighted
               rows, then using this condition number gives an overly
               pessimistic error bound.
       = 'A': Small singular values are not well determined by the data 
              and are considered as noisy; the matrix is treated as
               numerically rank deficient. The error in the computed
              singular values is bounded by f(m,n)*epsilon*||A||.
              The computed SVD A = U * S * V^* restores A up to
              f(m,n)*epsilon*||A||.
              This gives the procedure the licence to discard (set to zero)
              all singular values below N*epsilon*||A||.
       = 'R': Similar to 'A'. The rank revealing property of the initial
               QR factorization is used to reveal (using the triangular factor)
              a gap sigma_{r+1} < epsilon * sigma_r in which case the
              numerical RANK is declared to be r. The SVD is computed with
              absolute error bounds, but more accurately than with 'A'.
[in]JOBU
          JOBU is CHARACTER*1
         Specifies whether to compute the columns of U:
       = 'U': N columns of U are returned in the array U.
       = 'F': full set of M left sing. vectors is returned in the array U.
       = 'W': U may be used as workspace of length M*N. See the description
              of U.
       = 'N': U is not computed.
[in]JOBV
          JOBV is CHARACTER*1
         Specifies whether to compute the matrix V:
       = 'V': N columns of V are returned in the array V; Jacobi rotations
              are not explicitly accumulated.
       = 'J': N columns of V are returned in the array V, but they are
              computed as the product of Jacobi rotations, if JOBT .EQ. 'N'.
       = 'W': V may be used as workspace of length N*N. See the description
              of V.
       = 'N': V is not computed.
[in]JOBR
          JOBR is CHARACTER*1
         Specifies the RANGE for the singular values. Issues the licence to
          set to zero small positive singular values if they are outside the
         specified range. If A .NE. 0 is scaled so that the largest singular
         value of c*A is around SQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
         the licence to kill columns of A whose norm in c*A is less than
         SQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
         where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
       = 'N': Do not kill small columns of c*A. This option assumes that
              BLAS and QR factorizations and triangular solvers are
              implemented to work in that range. If the condition of A
              is greater than BIG, use CGESVJ.
       = 'R': RESTRICTED range for sigma(c*A) is [SQRT(SFMIN), SQRT(BIG)]
              (roughly, as described above). This option is recommended.
                                             ===========================
         For computing the singular values in the FULL range [SFMIN,BIG]
         use CGESVJ.
[in]JOBT
          JOBT is CHARACTER*1
         If the matrix is square then the procedure may determine to use
         transposed A if A^* seems to be better with respect to convergence.
         If the matrix is not square, JOBT is ignored.
         The decision is based on two values of entropy over the adjoint
          orbit of A^* * A. See the descriptions of RWORK(6) and RWORK(7).
       = 'T': transpose if entropy test indicates possibly faster
         convergence of Jacobi process if A^* is taken as input. If A is
         replaced with A^*, then the row pivoting is included automatically.
       = 'N': do not speculate.
         The option 'T' can be used to compute only the singular values, or
         the full SVD (U, SIGMA and V). For only one set of singular vectors
         (U or V), the caller should provide both U and V, as one of the
         matrices is used as workspace if the matrix A is transposed.
         The implementer can easily remove this constraint and make the
         code more complicated. See the descriptions of U and V.
          In general, this option is considered experimental, and 'N' should
         be preferred. This is subject to changes in the future.
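
          For orientation only, the following hypothetical helper (not part
          of LAPACK; the name PREFAT and its argument list are illustrative)
          mirrors the flavour of the entropy test behind JOBT = 'T': it
          compares the Shannon entropies of diag(A^* * A)/Trace(A^* * A) and
          diag(A * A^*)/Trace(A * A^*) (cf. RWORK(6) and RWORK(7) below),
          given the column and row 2-norms of a square A with N >= 2, and
          prefers the side with the smaller entropy.

      LOGICAL FUNCTION PREFAT( N, COLNRM, ROWNRM )
*     .. illustrative sketch of the JOBT entropy test (not LAPACK code);
*     returns .TRUE. if A^* looks like the better input than A
      IMPLICIT NONE
      INTEGER            N
      REAL               COLNRM( N ), ROWNRM( N )
      REAL               TRACE, P, ENTRA, ENTRAT
      INTEGER            I
      INTRINSIC          ALOG, REAL
*     .. Trace(A^* * A) = Trace(A * A^*) = sum of squared column norms
      TRACE = 0.0E0
      DO 10 I = 1, N
         TRACE = TRACE + COLNRM( I )**2
   10 CONTINUE
      ENTRA  = 0.0E0
      ENTRAT = 0.0E0
      DO 20 I = 1, N
         P = COLNRM( I )**2 / TRACE
         IF( P.NE.0.0E0 ) ENTRA = ENTRA - P*ALOG( P )
         P = ROWNRM( I )**2 / TRACE
         IF( P.NE.0.0E0 ) ENTRAT = ENTRAT - P*ALOG( P )
   20 CONTINUE
*     .. smaller normalized entropy usually means faster Jacobi convergence
      PREFAT = ( ENTRAT / ALOG( REAL( N ) ) ) .LT.
     $         ( ENTRA  / ALOG( REAL( N ) ) )
      END
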
[in]JOBP
          JOBP is CHARACTER*1
         Issues the licence to introduce structured perturbations to drown
         denormalized numbers. This licence should be active if the
         denormals are poorly implemented, causing slow computation,
         especially in cases of fast convergence (!). For details see [1,2].
          For the sake of simplicity, these perturbations are included only
         when the full SVD or only the singular values are requested. The
         implementer/user can easily add the perturbation for the cases of
         computing one set of singular vectors.
       = 'P': introduce perturbation
       = 'N': do not perturb
[in]M
          M is INTEGER
         The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
         The number of columns of the input matrix A. M >= N >= 0.
[in,out]A
          A is COMPLEX array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is REAL array, dimension (N)
          On exit,
           - For RWORK(1)/RWORK(2) = ONE: The singular values of A. During the
             computation SVA contains Euclidean column norms of the
             iterated matrices in the array A.
           - For RWORK(1) .NE. RWORK(2): The singular values of A are
             (RWORK(1)/RWORK(2)) * SVA(1:N). This factored form is used if
            sigma_max(A) overflows or if small singular values have been
            saved from underflow by scaling the input matrix A.
          - If JOBR='R' then some of the singular values may be returned
            as exact zeros obtained by "set to zero" because they are
            below the numerical rank threshold or are denormalized numbers.
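
           For example, a caller that wants the singular values on the
           original scale can always apply the factor explicitly (a no-op
           when RWORK(1) = RWORK(2)); SIGMA and J below are hypothetical
           caller-side variables:

*     .. recover the singular values from the factored form (sketch)
      DO 10 J = 1, N
         SIGMA( J ) = ( RWORK( 1 ) / RWORK( 2 ) )*SVA( J )
   10 CONTINUE
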
[out]U
          U is COMPLEX array, dimension ( LDU, N ) or ( LDU, M )
          If JOBU = 'U', then U contains on exit the M-by-N matrix of
                         the left singular vectors.
          If JOBU = 'F', then U contains on exit the M-by-M matrix of
                         the left singular vectors, including an ONB
                         of the orthogonal complement of the Range(A).
          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
                         then U is used as workspace if the procedure
                         replaces A with A^*. In that case, [V] is computed
                         in U as left singular vectors of A^* and then
                         copied back to the V array. This 'W' option is just
                         a reminder to the caller that in this case U is
                         reserved as workspace of length N*N.
          If JOBU = 'N'  U is not referenced, unless JOBT='T'.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U,  LDU >= 1.
          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
[out]V
          V is COMPLEX array, dimension ( LDV, N )
          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
                          then V is used as workspace if the procedure
                         replaces A with A^*. In that case, [U] is computed
                         in V as right singular vectors of A^* and then
                         copied back to the U array. This 'W' option is just
                         a reminder to the caller that in this case V is
                         reserved as workspace of length N*N.
          If JOBV = 'N'  V is not referenced, unless JOBT='T'.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V,  LDV >= 1.
          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
[out]CWORK
          CWORK is COMPLEX array, dimension (MAX(2,LWORK))
          If the call to CGEJSV is a workspace query (indicated by LWORK=-1 or
          LRWORK=-1), then on exit CWORK(1) contains the required length of 
          CWORK for the job parameters used in the call.
[in]LWORK
          LWORK is INTEGER
          Length of CWORK to confirm proper allocation of workspace.
          LWORK depends on the job:

          1. If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
            1.1 .. no scaled condition estimate required (JOBA.NE.'E'.AND.JOBA.NE.'G'):
               LWORK >= 2*N+1. This is the minimal requirement.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= N + (N+1)*NB. Here NB is the optimal
               block size for CGEQP3 and CGEQRF.
               In general, optimal LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CGEQRF), LWORK(CGESVJ)).        
            1.2. .. an estimate of the scaled condition number of A is
                required (JOBA='E', or 'G'). In this case, the minimal
               requirement is LWORK >= N*N + 2*N.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= max(N+(N+1)*NB, N*N+2*N)=N**2+2*N.
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CGEQRF), LWORK(CGESVJ),
                            N*N+LWORK(CPOCON)).
          2. If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
             (JOBU.EQ.'N')
             2.1   .. no scaled condition estimate requested (JOBA.NE.'E'.AND.JOBA.NE.'G'):
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance, 
               LWORK >= max(N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for CGEQP3, CGEQRF, CGELQ,
               CUNMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3), N+LWORK(CGESVJ),
                       N+LWORK(CGELQF), 2*N+LWORK(CGEQRF), N+LWORK(CUNMLQ)).
            2.2 .. an estimate of the scaled condition number of A is
               required (JOBA='E', or 'G').
            -> the minimal requirement is LWORK >= 3*N.      
            -> For optimal performance, 
               LWORK >= max(N+(N+1)*NB, 2*N,2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for CGEQP3, CGEQRF, CGELQ,
               CUNMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3), LWORK(CPOCON), N+LWORK(CGESVJ),
                       N+LWORK(CGELQF), 2*N+LWORK(CGEQRF), N+LWORK(CUNMLQ)).   
          3. If SIGMA and the left singular vectors are needed
             3.1  .. no scaled condition estimate requested (JOBA.NE.'E'.AND.JOBA.NE.'G'):
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance:
               if JOBU.EQ.'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for CGEQP3, CGEQRF, CUNMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3), 2*N+LWORK(CGEQRF), N+LWORK(CUNMQR)). 
            3.2  .. an estimate of the scaled condition number of A is
               required (JOBA='E', or 'G').
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance:
               if JOBU.EQ.'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for CGEQP3, CGEQRF, CUNMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CPOCON),
                        2*N+LWORK(CGEQRF), N+LWORK(CUNMQR)).

          4. If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and
            4.1. if JOBV.EQ.'V'
               the minimal requirement is LWORK >= 5*N+2*N*N.
            4.2. if JOBV.EQ.'J' the minimal requirement is
               LWORK >= 4*N+N*N.
            In both cases, the allocated CWORK can accommodate blocked runs
            of CGEQP3, CGEQRF, CGELQF, CUNMQR, CUNMLQ.
 
          If the call to CGEJSV is a workspace query (indicated by LWORK=-1 or
          LRWORK=-1), then on exit CWORK(1) contains the optimal and CWORK(2) contains the
          minimal length of CWORK for the job parameters used in the call.        
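
           A typical usage pattern is therefore a workspace query followed by
           allocation. A minimal sketch (it assumes that M, N, A, LDA, SVA, U,
           LDU, V, LDV are set up as in the calling sequence, and that CWORK,
           RWORK, IWORK have at least the query-time lengths 2, 7 and 4):

*     .. workspace query: only the workspace sizes are computed
      CALL CGEJSV( 'C', 'U', 'V', 'R', 'N', 'N', M, N, A, LDA, SVA,
     $             U, LDU, V, LDV, CWORK, -1, RWORK, -1, IWORK, INFO )
      LWORK  = INT( CWORK( 1 ) )
      LRWORK = INT( RWORK( 1 ) )
*     .. CWORK(2) holds the minimal LWORK and IWORK(1) the required
*     .. length of IWORK for these job parameters
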
[out]RWORK
          RWORK is REAL array, dimension (MAX(7,LRWORK))
          On exit,
           RWORK(1) = Determines the scaling factor SCALE = RWORK(1) / RWORK(2)
                    such that SCALE*SVA(1:N) are the computed singular values
                    of A. (See the description of SVA().)
          RWORK(2) = See the description of RWORK(1).
          RWORK(3) = SCONDA is an estimate for the condition number of
                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')
                    SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
                     It is computed using CPOCON. It holds
                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
                    where R is the triangular factor from the QRF of A.
                    However, if R is truncated and the numerical rank is
                    determined to be strictly smaller than N, SCONDA is
                    returned as -1, thus indicating that the smallest
                    singular values might be lost.

          If full SVD is needed, the following two condition numbers are
           useful for the analysis of the algorithm. They are provided for
          a developer/implementer who is familiar with the details of
          the method.

          RWORK(4) = an estimate of the scaled condition number of the
                    triangular factor in the first QR factorization.
          RWORK(5) = an estimate of the scaled condition number of the
                    triangular factor in the second QR factorization.
          The following two parameters are computed if JOBT .EQ. 'T'.
          They are provided for a developer/implementer who is familiar
          with the details of the method.
          RWORK(6) = the entropy of A^* * A :: this is the Shannon entropy
                    of diag(A^* * A) / Trace(A^* * A) taken as point in the
                    probability simplex.
          RWORK(7) = the entropy of A * A^*. (See the description of RWORK(6).)
          If the call to CGEJSV is a workspace query (indicated by LWORK=-1 or
          LRWORK=-1), then on exit RWORK(1) contains the required length of
          RWORK for the job parameters used in the call.
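
           As a purely illustrative use of SCONDA (JOBA = 'E' or 'G'), a
           caller might form a rough relative error estimate for the computed
           singular values along the lines of the JOBA description above. The
           modest factor f(M,N) is not specified by this documentation, so
           SQRT(M*N) below is only a placeholder assumption; EPS and RELERR
           are assumed REAL scalars and SLAMCH is assumed declared REAL and
           EXTERNAL in the caller.

*     .. rough relative error estimate from SCONDA = RWORK(3) (sketch)
      EPS = SLAMCH( 'Epsilon' )
      IF( RWORK( 3 ).GT.0.0E0 ) THEN
*        .. f(M,N) taken as SQRT(M*N) purely for illustration
         RELERR = SQRT( REAL( M )*REAL( N ) )*EPS*RWORK( 3 )
      ELSE
*        .. SCONDA = -1: the smallest singular values might be lost
         RELERR = -1.0E0
      END IF
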
[in]LRWORK
          LRWORK is INTEGER
          Length of RWORK to confirm proper allocation of workspace.
          LRWORK depends on the job:

       1. If only the singular values are requested i.e. if
          LSAME(JOBU,'N') .AND. LSAME(JOBV,'N')
          then:
          1.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
               then: LRWORK = max( 7, 2 * M ).
          1.2. Otherwise, LRWORK  = max( 7,  N ).
       2. If singular values with the right singular vectors are requested
          i.e. if
          (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) .AND.
          .NOT.(LSAME(JOBU,'U').OR.LSAME(JOBU,'F'))
          then:
          2.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, 2 * M ).
          2.2. Otherwise, LRWORK  = max( 7,  N ).
       3. If singular values with the left singular vectors are requested, i.e. if
          (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
          .NOT.(LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
          then:
          3.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, 2 * M ).
          3.2. Otherwise, LRWORK  = max( 7,  N ).
       4. If singular values with both the left and the right singular vectors
          are requested, i.e. if
          (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
          (LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
          then:
          4.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, 2 * M ).
          4.2. Otherwise, LRWORK  = max( 7, N ).
 
           If, on entry, LRWORK = -1 or LWORK = -1, a workspace query is assumed and
          the length of RWORK is returned in RWORK(1). 
[out]IWORK
          IWORK is INTEGER array, of dimension at least 4, that further depends
          on the job:
 
          1. If only the singular values are requested then:
             If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
             then the length of IWORK is N+M; otherwise the length of IWORK is N.
          2. If the singular values and the right singular vectors are requested then:
             If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
             then the length of IWORK is N+M; otherwise the length of IWORK is N. 
          3. If the singular values and the left singular vectors are requested then:
             If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
             then the length of IWORK is N+M; otherwise the length of IWORK is N. 
          4. If the singular values with both the left and the right singular vectors
             are requested, then:      
             4.1. If LSAME(JOBV,'J') the length of IWORK is determined as follows:
                  If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
                  then the length of IWORK is N+M; otherwise the length of IWORK is N. 
             4.2. If LSAME(JOBV,'V') the length of IWORK is determined as follows:
                  If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
                  then the length of IWORK is 2*N+M; otherwise the length of IWORK is 2*N.
        
          On exit,
          IWORK(1) = the numerical rank determined after the initial
                     QR factorization with pivoting. See the descriptions
                     of JOBA and JOBR.
          IWORK(2) = the number of the computed nonzero singular values
          IWORK(3) = if nonzero, a warning message:
                     If IWORK(3).EQ.1 then some of the column norms of A
                     were denormalized floats. The requested high accuracy
                     is not warranted by the data.
          IWORK(4) = 1 or -1. If IWORK(4) .EQ. 1, then the procedure used A^* to
                     do the job as specified by the JOB parameters.
          If the call to CGEJSV is a workspace query (indicated by LWORK .EQ. -1 and 
          LRWORK .EQ. -1), then on exit IWORK(1) contains the required length of 
          IWORK for the job parameters used in the call.
[out]INFO
          INFO is INTEGER
           < 0  : if INFO = -i, then the i-th argument had an illegal value.
           = 0 :  successful exit;
           > 0 :  CGEJSV  did not converge in the maximal allowed number
                  of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
June 2016
Further Details:
  CGEJSV implements a preconditioned Jacobi SVD algorithm. It uses CGEQP3,
  CGEQRF, and CGELQF as preprocessors and preconditioners. Optionally, an
  additional row pivoting can be used as a preprocessor, which in some
  cases results in much higher accuracy. An example is matrix A with the
  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
   diagonal matrices and C is a well-conditioned matrix. In that case, complete
   pivoting in the first QR factorization provides accuracy dependent on the
  condition number of C, and independent of D1, D2. Such higher accuracy is
  not completely understood theoretically, but it works well in practice.
  Further, if A can be written as A = B*D, with well-conditioned B and some
  diagonal D, then the high accuracy is guaranteed, both theoretically and
  in software, independent of D. For more details see [1], [2].
     The computational range for the singular values can be the full range
  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
  & LAPACK routines called by CGEJSV are implemented to work in that range.
  If that is not the case, then the restriction for safe computation with
  the singular values in the range of normalized IEEE numbers is that the
  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
  overflow. This code (CGEJSV) is best used in this restricted range,
  meaning that singular values of magnitude below ||A||_2 / SLAMCH('O') are
  returned as zeros. See JOBR for details on this.
     Further, this implementation is somewhat slower than the one described
  in [1,2] due to replacement of some non-LAPACK components, and because
  the choice of some tuning parameters in the iterative part (CGESVJ) is
  left to the implementer on a particular machine.
     The rank revealing QR factorization (in this code: CGEQP3) should be
  implemented as in [3]. We have a new version of CGEQP3 under development
  that is more robust than the current one in LAPACK, with a cleaner cut in
  rank deficient cases. It will be available in the SIGMA library [4].
  If M is much larger than N, it is obvious that the initial QRF with
  column pivoting can be preprocessed by the QRF without pivoting. That
  well known trick is not used in CGEJSV because in some cases heavy row
  weighting can be treated with complete pivoting. The overhead in cases
  M much larger than N is then only due to pivoting, but the benefits in
  terms of accuracy have prevailed. The implementer/user can incorporate
  this extra QRF step easily. The implementer can also improve data movement
  (matrix transpose, matrix copy, matrix transposed copy) - this
  implementation of CGEJSV uses only the simplest, naive data movement.
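
   To make the structure of the preconditioning concrete, the following is a
   simplified, hypothetical routine (the name JSVSKT is an illustration; this
   is not CGEJSV itself and it omits all of the scaling, rank decisions and
   perturbations described above). It computes only the singular values by the
   same basic pipeline: a rank revealing QR factorization with column pivoting
   (CGEQP3) followed by the one-sided Jacobi SVD (CGESVJ) applied to the
   conjugate transposed triangular factor. It assumes M >= N >= 1,
   LWORK >= 2*N+1 and LRWORK >= MAX( 6, 2*N ).

      SUBROUTINE JSVSKT( M, N, A, LDA, SVA, CWORK, LWORK,
     $                   RWORK, LRWORK, IWORK, INFO )
*     .. illustrative sketch only (not CGEJSV): singular values of A
*     via a column pivoted QRF and a one-sided Jacobi SVD of R^*
      IMPLICIT NONE
      INTEGER            M, N, LDA, LWORK, LRWORK, INFO
      COMPLEX            A( LDA, * ), CWORK( LWORK )
      REAL               SVA( * ), RWORK( LRWORK )
      INTEGER            IWORK( * )
      COMPLEX            CZERO, CDUMMY( 1 )
      PARAMETER          ( CZERO = ( 0.0E0, 0.0E0 ) )
      INTEGER            P, Q
      INTRINSIC          CONJG
      EXTERNAL           CGEQP3, CGESVJ, CLASET
*     .. rank revealing QRF with column pivoting:  A * P = Q * [ R; 0 ]
      DO 10 P = 1, N
         IWORK( P ) = 0
   10 CONTINUE
      CALL CGEQP3( M, N, A, LDA, IWORK, CWORK, CWORK( N+1 ),
     $             LWORK-N, RWORK, INFO )
      IF( INFO.NE.0 ) RETURN
*     .. form R^* in the lower triangle of the leading N-by-N block
      DO 30 P = 1, N
         A( P, P ) = CONJG( A( P, P ) )
         DO 20 Q = P + 1, N
            A( Q, P ) = CONJG( A( P, Q ) )
   20    CONTINUE
   30 CONTINUE
      IF( N.GT.1 )
     $   CALL CLASET( 'U', N-1, N-1, CZERO, CZERO, A( 1, 2 ), LDA )
*     .. one-sided Jacobi SVD of the lower triangular R^*; the unitary
*     factor and the column permutation do not change the singular
*     values, so SVA returns the singular values of A (scaled by
*     RWORK(1) on exit, see CGESVJ)
      CALL CGESVJ( 'L', 'N', 'N', N, N, A, LDA, SVA, 0, CDUMMY, 1,
     $             CWORK, LWORK, RWORK, LRWORK, INFO )
      RETURN
      END
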
Contributor:
Zlatko Drmac (Zagreb, Croatia)
References:
 [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
      SIAM J. Matrix Anal. Appl. Vol. 29, No. 4 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
      SIAM J. Matrix Anal. Appl. Vol. 29, No. 4 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
     factorization software - a case study.
     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
     LAPACK Working note 176.
 [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008, 2016.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac@math.hr. Thank you.

Definition at line 570 of file cgejsv.f.

570 *
571 * -- LAPACK computational routine (version 3.7.0) --
572 * -- LAPACK is a software package provided by Univ. of Tennessee, --
573 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
574 * December 2016
575 *
576 * .. Scalar Arguments ..
577  IMPLICIT NONE
578  INTEGER info, lda, ldu, ldv, lwork, lrwork, m, n
579 * ..
580 * .. Array Arguments ..
581  COMPLEX a( lda, * ), u( ldu, * ), v( ldv, * ), cwork( lwork )
582  REAL sva( n ), rwork( lrwork )
583  INTEGER iwork( * )
584  CHARACTER*1 joba, jobp, jobr, jobt, jobu, jobv
585 * ..
586 *
587 * ===========================================================================
588 *
589 * .. Local Parameters ..
590  REAL zero, one
591  parameter( zero = 0.0e0, one = 1.0e0 )
592  COMPLEX czero, cone
593  parameter( czero = ( 0.0e0, 0.0e0 ), cone = ( 1.0e0, 0.0e0 ) )
594 * ..
595 * .. Local Scalars ..
596  COMPLEX ctemp
597  REAL aapp, aaqq, aatmax, aatmin, big, big1, cond_ok,
598  $ condr1, condr2, entra, entrat, epsln, maxprj, scalem,
599  $ sconda, sfmin, small, temp1, uscal1, uscal2, xsc
600  INTEGER ierr, n1, nr, numrank, p, q, warning
601  LOGICAL almort, defr, errest, goscal, jracc, kill, lquery,
602  $ lsvec, l2aber, l2kill, l2pert, l2rank, l2tran, noscal,
603  $ rowpiv, rsvec, transp
604 *
605  INTEGER optwrk, minwrk, minrwrk, miniwrk
606  INTEGER lwcon, lwlqf, lwqp3, lwqrf, lwunmlq, lwunmqr, lwunmqrm,
607  $ lwsvdj, lwsvdjv, lrwqp3, lrwcon, lrwsvdj, iwoff
608  INTEGER lwrk_cgelqf, lwrk_cgeqp3, lwrk_cgeqp3n, lwrk_cgeqrf,
609  $ lwrk_cgesvj, lwrk_cgesvjv, lwrk_cgesvju, lwrk_cunmlq,
610  $ lwrk_cunmqr, lwrk_cunmqrm
611 * ..
612 * .. Local Arrays
613  COMPLEX cdummy(1)
614  REAL rdummy(1)
615 *
616 * .. Intrinsic Functions ..
617  INTRINSIC abs, cmplx, conjg, alog, max, min, REAL, nint, sqrt
618 * ..
619 * .. External Functions ..
620  REAL slamch, scnrm2
621  INTEGER isamax, icamax
622  LOGICAL lsame
623  EXTERNAL isamax, icamax, lsame, slamch, scnrm2
624 * ..
625 * .. External Subroutines ..
626  EXTERNAL slassq, ccopy, cgelqf, cgeqp3, cgeqrf, clacpy, clapmr,
629  $ xerbla
630 *
631  EXTERNAL cgesvj
632 * ..
633 *
634 * Test the input arguments
635 *
636  lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
637  jracc = lsame( jobv, 'J' )
638  rsvec = lsame( jobv, 'V' ) .OR. jracc
639  rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
640  l2rank = lsame( joba, 'R' )
641  l2aber = lsame( joba, 'A' )
642  errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
643  l2tran = lsame( jobt, 'T' ) .AND. ( m .EQ. n )
644  l2kill = lsame( jobr, 'R' )
645  defr = lsame( jobr, 'N' )
646  l2pert = lsame( jobp, 'P' )
647 *
648  lquery = ( lwork .EQ. -1 ) .OR. ( lrwork .EQ. -1 )
649 *
650  IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
651  $ errest .OR. lsame( joba, 'C' ) )) then
652  info = - 1
653  ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
654  $ ( lsame( jobu, 'W' ) .AND. rsvec .AND. l2tran ) ) ) then
655  info = - 2
656  ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
657  $ ( lsame( jobv, 'W' ) .AND. lsvec .AND. l2tran ) ) ) then
658  info = - 3
659  ELSE IF ( .NOT. ( l2kill .OR. defr ) ) then
660  info = - 4
661  ELSE IF ( .NOT. ( lsame(jobt,'T') .OR. lsame(jobt,'N') ) ) then
662  info = - 5
663  ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) then
664  info = - 6
665  ELSE IF ( m .LT. 0 ) then
666  info = - 7
667  ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) then
668  info = - 8
669  ELSE IF ( lda .LT. m ) then
670  info = - 10
671  ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) then
672  info = - 13
673  ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) then
674  info = - 15
675  else
676 * #:)
677  info = 0
678  END IF
679 *
680  IF ( info .EQ. 0 ) THEN
681 * .. compute the minimal and the optimal workspace lengths
682 * [[The expressions for computing the minimal and the optimal
683 * values of LCWORK, LRWORK are written with a lot of redundancy and
684 * can be simplified. However, this verbose form is useful for
685 * maintenance and modifications of the code.]]
686 *
687 * .. minimal workspace length for CGEQP3 of an M x N matrix,
688 * CGEQRF of an N x N matrix, CGELQF of an N x N matrix,
689 * CUNMLQ for computing N x N matrix, CUNMQR for computing N x N
690 * matrix, CUNMQR for computing M x N matrix, respectively.
691  lwqp3 = n+1
692  lwqrf = max( 1, n )
693  lwlqf = max( 1, n )
694  lwunmlq = max( 1, n )
695  lwunmqr = max( 1, n )
696  lwunmqrm = max( 1, m )
697 * .. minimal workspace length for CPOCON of an N x N matrix
698  lwcon = 2 * n
699 * .. minimal workspace length for CGESVJ of an N x N matrix,
700 * without and with explicit accumulation of Jacobi rotations
701  lwsvdj = max( 2 * n, 1 )
702  lwsvdjv = max( 2 * n, 1 )
703 * .. minimal REAL workspace length for CGEQP3, CPOCON, CGESVJ
704  lrwqp3 = n
705  lrwcon = n
706  lrwsvdj = n
707  IF ( lquery ) THEN
708  CALL cgeqp3( m, n, a, lda, iwork, cdummy, cdummy, -1,
709  $ rdummy, ierr )
710  lwrk_cgeqp3 = cdummy(1)
711  CALL cgeqrf( n, n, a, lda, cdummy, cdummy,-1, ierr )
712  lwrk_cgeqrf = cdummy(1)
713  CALL cgelqf( n, n, a, lda, cdummy, cdummy,-1, ierr )
714  lwrk_cgelqf = cdummy(1)
715  END IF
716  minwrk = 2
717  optwrk = 2
718  miniwrk = n
719  IF ( .NOT. (lsvec .OR. rsvec ) ) then
720 * .. minimal and optimal sizes of the complex workspace if
721 * only the singular values are requested
722  IF ( errest ) THEN
723  minwrk = max( n+lwqp3, n**2+lwcon, n+lwqrf, lwsvdj )
724  else
725  minwrk = max( n+lwqp3, n+lwqrf, lwsvdj )
726  END IF
727  IF ( lquery ) THEN
728  CALL cgesvj( 'L', 'N', 'N', n, n, a, lda, sva, n, v,
729  $ ldv, cdummy, -1, rdummy, -1, ierr )
730  lwrk_cgesvj = cdummy(1)
731  IF ( errest ) THEN
732  optwrk = max( n+lwrk_cgeqp3, n**2+lwcon,
733  $ n+lwrk_cgeqrf, lwrk_cgesvj )
734  else
735  optwrk = max( n+lwrk_cgeqp3, n+lwrk_cgeqrf,
736  $ lwrk_cgesvj )
737  END IF
738  END IF
739  IF ( l2tran .OR. rowpiv ) THEN
740  IF ( errest ) THEN
741  minrwrk = max( 7, 2*m, lrwqp3, lrwcon, lrwsvdj )
742  else
743  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
744  END IF
745  else
746  IF ( errest ) THEN
747  minrwrk = max( 7, lrwqp3, lrwcon, lrwsvdj )
748  else
749  minrwrk = max( 7, lrwqp3, lrwsvdj )
750  END IF
751  END IF
752  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
753  ELSE IF ( rsvec .AND. (.NOT.lsvec) ) then
754 * .. minimal and optimal sizes of the complex workspace if the
755 * singular values and the right singular vectors are requested
756  IF ( errest ) THEN
757  minwrk = max( n+lwqp3, lwcon, lwsvdj, n+lwlqf,
758  $ 2*n+lwqrf, n+lwsvdj, n+lwunmlq )
759  else
760  minwrk = max( n+lwqp3, lwsvdj, n+lwlqf, 2*n+lwqrf,
761  $ n+lwsvdj, n+lwunmlq )
762  END IF
763  IF ( lquery ) then
764  CALL cgesvj( 'L', 'U', 'N', n,n, u, ldu, sva, n, a,
765  $ lda, cdummy, -1, rdummy, -1, ierr )
766  lwrk_cgesvj = cdummy(1)
767  CALL cunmlq( 'L', 'C', n, n, n, a, lda, cdummy,
768  $ v, ldv, cdummy, -1, ierr )
769  lwrk_cunmlq = cdummy(1)
770  IF ( errest ) THEN
771  optwrk = max( n+lwrk_cgeqp3, lwcon, lwrk_cgesvj,
772  $ n+lwrk_cgelqf, 2*n+lwrk_cgeqrf,
773  $ n+lwrk_cgesvj, n+lwrk_cunmlq )
774  else
775  optwrk = max( n+lwrk_cgeqp3, lwrk_cgesvj,n+lwrk_cgelqf,
776  $ 2*n+lwrk_cgeqrf, n+lwrk_cgesvj,
777  $ n+lwrk_cunmlq )
778  END IF
779  END IF
780  IF ( l2tran .OR. rowpiv ) THEN
781  IF ( errest ) THEN
782  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
783  else
784  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
785  END IF
786  else
787  IF ( errest ) THEN
788  minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
789  else
790  minrwrk = max( 7, lrwqp3, lrwsvdj )
791  END IF
792  END IF
793  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
794  ELSE IF ( lsvec .AND. (.NOT.rsvec) ) THEN
795 * .. minimal and optimal sizes of the complex workspace if the
796 * singular values and the left singular vectors are requested
797  IF ( errest ) then
798  minwrk = n + max( lwqp3,lwcon,n+lwqrf,lwsvdj,lwunmqrm )
799  else
800  minwrk = n + max( lwqp3, n+lwqrf, lwsvdj, lwunmqrm )
801  END IF
802  IF ( lquery ) then
803  CALL cgesvj( 'L', 'U', 'N', n,n, u, ldu, sva, n, a,
804  $ lda, cdummy, -1, rdummy, -1, ierr )
805  lwrk_cgesvj = cdummy(1)
806  CALL cunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
807  $ ldu, cdummy, -1, ierr )
808  lwrk_cunmqrm = cdummy(1)
809  IF ( errest ) then
810  optwrk = n + max( lwrk_cgeqp3, lwcon, n+lwrk_cgeqrf,
811  $ lwrk_cgesvj, lwrk_cunmqrm )
812  else
813  optwrk = n + max( lwrk_cgeqp3, n+lwrk_cgeqrf,
814  $ lwrk_cgesvj, lwrk_cunmqrm )
815  END IF
816  END IF
817  IF ( l2tran .OR. rowpiv ) THEN
818  IF ( errest ) THEN
819  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
820  else
821  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
822  END IF
823  else
824  IF ( errest ) THEN
825  minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
826  else
827  minrwrk = max( 7, lrwqp3, lrwsvdj )
828  END IF
829  END IF
830  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
831  else
832 * .. minimal and optimal sizes of the complex workspace if the
833 * full SVD is requested
834  IF ( .NOT. jracc ) THEN
835  IF ( errest ) THEN
836  minwrk = max( n+lwqp3, n+lwcon, 2*n+n**2+lwcon,
837  $ 2*n+lwqrf, 2*n+lwqp3,
838  $ 2*n+n**2+n+lwlqf, 2*n+n**2+n+n**2+lwcon,
839  $ 2*n+n**2+n+lwsvdj, 2*n+n**2+n+lwsvdjv,
840  $ 2*n+n**2+n+lwunmqr,2*n+n**2+n+lwunmlq,
841  $ n+n**2+lwsvdj, n+lwunmqrm )
842  else
843  minwrk = max( n+lwqp3, 2*n+n**2+lwcon,
844  $ 2*n+lwqrf, 2*n+lwqp3,
845  $ 2*n+n**2+n+lwlqf, 2*n+n**2+n+n**2+lwcon,
846  $ 2*n+n**2+n+lwsvdj, 2*n+n**2+n+lwsvdjv,
847  $ 2*n+n**2+n+lwunmqr,2*n+n**2+n+lwunmlq,
848  $ n+n**2+lwsvdj, n+lwunmqrm )
849  END IF
850  miniwrk = miniwrk + n
851  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
852  else
853  IF ( errest ) THEN
854  minwrk = max( n+lwqp3, n+lwcon, 2*n+lwqrf,
855  $ 2*n+n**2+lwsvdjv, 2*n+n**2+n+lwunmqr,
856  $ n+lwunmqrm )
857  else
858  minwrk = max( n+lwqp3, 2*n+lwqrf,
859  $ 2*n+n**2+lwsvdjv, 2*n+n**2+n+lwunmqr,
860  $ n+lwunmqrm )
861  END IF
862  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
863  END IF
864  IF ( lquery ) then
865  CALL cunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
866  $ ldu, cdummy, -1, ierr )
867  lwrk_cunmqrm = cdummy(1)
868  CALL cunmqr( 'L', 'N', n, n, n, a, lda, cdummy, u,
869  $ ldu, cdummy, -1, ierr )
870  lwrk_cunmqr = cdummy(1)
871  IF ( .NOT. jracc ) then
872  CALL cgeqp3( n,n, a, lda, iwork, cdummy,cdummy, -1,
873  $ rdummy, ierr )
874  lwrk_cgeqp3n = cdummy(1)
875  CALL cgesvj( 'L', 'U', 'N', n, n, u, ldu, sva,
876  $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
877  lwrk_cgesvj = cdummy(1)
878  CALL cgesvj( 'U', 'U', 'N', n, n, u, ldu, sva,
879  $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
880  lwrk_cgesvju = cdummy(1)
881  CALL cgesvj( 'L', 'U', 'V', n, n, u, ldu, sva,
882  $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
883  lwrk_cgesvjv = cdummy(1)
884  CALL cunmlq( 'L', 'C', n, n, n, a, lda, cdummy,
885  $ v, ldv, cdummy, -1, ierr )
886  lwrk_cunmlq = cdummy(1)
887  IF ( errest ) THEN
888  optwrk = max( n+lwrk_cgeqp3, n+lwcon,
889  $ 2*n+n**2+lwcon, 2*n+lwrk_cgeqrf,
890  $ 2*n+lwrk_cgeqp3n,
891  $ 2*n+n**2+n+lwrk_cgelqf,
892  $ 2*n+n**2+n+n**2+lwcon,
893  $ 2*n+n**2+n+lwrk_cgesvj,
894  $ 2*n+n**2+n+lwrk_cgesvjv,
895  $ 2*n+n**2+n+lwrk_cunmqr,
896  $ 2*n+n**2+n+lwrk_cunmlq,
897  $ n+n**2+lwrk_cgesvju,
898  $ n+lwrk_cunmqrm )
899  else
900  optwrk = max( n+lwrk_cgeqp3,
901  $ 2*n+n**2+lwcon, 2*n+lwrk_cgeqrf,
902  $ 2*n+lwrk_cgeqp3n,
903  $ 2*n+n**2+n+lwrk_cgelqf,
904  $ 2*n+n**2+n+n**2+lwcon,
905  $ 2*n+n**2+n+lwrk_cgesvj,
906  $ 2*n+n**2+n+lwrk_cgesvjv,
907  $ 2*n+n**2+n+lwrk_cunmqr,
908  $ 2*n+n**2+n+lwrk_cunmlq,
909  $ n+n**2+lwrk_cgesvju,
910  $ n+lwrk_cunmqrm )
911  END IF
912  else
913  CALL cgesvj( 'L', 'U', 'V', n, n, u, ldu, sva,
914  $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
915  lwrk_cgesvjv = cdummy(1)
916  CALL cunmqr( 'L', 'N', n, n, n, cdummy, n, cdummy,
917  $ v, ldv, cdummy, -1, ierr )
918  lwrk_cunmqr = cdummy(1)
919  CALL cunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
920  $ ldu, cdummy, -1, ierr )
921  lwrk_cunmqrm = cdummy(1)
922  IF ( errest ) THEN
923  optwrk = max( n+lwrk_cgeqp3, n+lwcon,
924  $ 2*n+lwrk_cgeqrf, 2*n+n**2,
925  $ 2*n+n**2+lwrk_cgesvjv,
926  $ 2*n+n**2+n+lwrk_cunmqr,n+lwrk_cunmqrm )
927  else
928  optwrk = max( n+lwrk_cgeqp3, 2*n+lwrk_cgeqrf,
929  $ 2*n+n**2, 2*n+n**2+lwrk_cgesvjv,
930  $ 2*n+n**2+n+lwrk_cunmqr,
931  $ n+lwrk_cunmqrm )
932  END IF
933  END IF
934  END IF
935  IF ( l2tran .OR. rowpiv ) THEN
936  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
937  else
938  minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
939  END IF
940  END IF
941  minwrk = max( 2, minwrk )
942  optwrk = max( 2, optwrk )
943  IF ( lwork .LT. minwrk .AND. (.NOT.lquery) ) info = - 17
944  IF ( lrwork .LT. minrwrk .AND. (.NOT.lquery) ) info = - 19
945  END IF
946 *
947  IF ( info .NE. 0 ) then
948 * #:(
949  CALL xerbla( 'CGEJSV', - info )
950  return
951  ELSE IF ( lquery ) then
952  cwork(1) = optwrk
953  cwork(2) = minwrk
954  rwork(1) = minrwrk
955  iwork(1) = max( 4, miniwrk )
956  RETURN
957  END IF
958 *
959 * Quick return for void matrix (Y3K safe)
960 * #:)
961  IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) then
962  iwork(1:4) = 0
963  rwork(1:7) = 0
964  return
965  ENDIF
966 *
967 * Determine whether the matrix U should be M x N or M x M
968 *
969  IF ( lsvec ) then
970  n1 = n
971  IF ( lsame( jobu, 'F' ) ) n1 = m
972  END IF
973 *
974 * Set numerical parameters
975 *
976 *! NOTE: Make sure SLAMCH() does not fail on the target architecture.
977 *
978  epsln = slamch('Epsilon')
979  sfmin = slamch('SafeMinimum')
980  small = sfmin / epsln
981  big = slamch('O')
982 * BIG = ONE / SFMIN
983 *
984 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
985 *
986 *(!) If necessary, scale SVA() to protect the largest norm from
987 * overflow. It is possible that this scaling pushes the smallest
988 * column norm left from the underflow threshold (extreme case).
989 *
990  scalem = one / sqrt(REAL(m)*REAL(n))
991  noscal = .true.
992  goscal = .true.
993  DO 1874 p = 1, n
994  aapp = zero
995  aaqq = one
996  CALL classq( m, a(1,p), 1, aapp, aaqq )
997  IF ( aapp .GT. big ) then
998  info = - 9
999  CALL xerbla( 'CGEJSV', -info )
1000  return
1001  END IF
1002  aaqq = sqrt(aaqq)
1003  IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) then
1004  sva(p) = aapp * aaqq
1005  else
1006  noscal = .false.
1007  sva(p) = aapp * ( aaqq * scalem )
1008  IF ( goscal ) then
1009  goscal = .false.
1010  CALL sscal( p-1, scalem, sva, 1 )
1011  END IF
1012  END IF
1013  1874 continue
1014 *
1015  IF ( noscal ) scalem = one
1016 *
1017  aapp = zero
1018  aaqq = big
1019  DO 4781 p = 1, n
1020  aapp = max( aapp, sva(p) )
1021  IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
1022  4781 continue
1023 *
1024 * Quick return for zero M x N matrix
1025 * #:)
1026  IF ( aapp .EQ. zero ) then
1027  IF ( lsvec ) CALL claset( 'G', m, n1, czero, cone, u, ldu )
1028  IF ( rsvec ) CALL claset( 'G', n, n, czero, cone, v, ldv )
1029  rwork(1) = one
1030  rwork(2) = one
1031  IF ( errest ) rwork(3) = one
1032  IF ( lsvec .AND. rsvec ) then
1033  rwork(4) = one
1034  rwork(5) = one
1035  END IF
1036  IF ( l2tran ) then
1037  rwork(6) = zero
1038  rwork(7) = zero
1039  END IF
1040  iwork(1) = 0
1041  iwork(2) = 0
1042  iwork(3) = 0
1043  iwork(4) = -1
1044  return
1045  END IF
1046 *
1047 * Issue warning if denormalized column norms detected. Override the
1048 * high relative accuracy request. Issue licence to kill nonzero columns
1049 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
1050 * #:(
1051  warning = 0
1052  IF ( aaqq .LE. sfmin ) then
1053  l2rank = .true.
1054  l2kill = .true.
1055  warning = 1
1056  END IF
1057 *
1058 * Quick return for one-column matrix
1059 * #:)
1060  IF ( n .EQ. 1 ) then
1061 *
1062  IF ( lsvec ) then
1063  CALL clascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
1064  CALL clacpy( 'A', m, 1, a, lda, u, ldu )
1065 * computing all M left singular vectors of the M x 1 matrix
1066  IF ( n1 .NE. n ) then
1067  CALL cgeqrf( m, n, u,ldu, cwork, cwork(n+1),lwork-n,ierr )
1068  CALL cungqr( m,n1,1, u,ldu,cwork,cwork(n+1),lwork-n,ierr )
1069  CALL ccopy( m, a(1,1), 1, u(1,1), 1 )
1070  END IF
1071  END IF
1072  IF ( rsvec ) then
1073  v(1,1) = cone
1074  END IF
1075  IF ( sva(1) .LT. (big*scalem) ) then
1076  sva(1) = sva(1) / scalem
1077  scalem = one
1078  END IF
1079  rwork(1) = one / scalem
1080  rwork(2) = one
1081  IF ( sva(1) .NE. zero ) then
1082  iwork(1) = 1
1083  IF ( ( sva(1) / scalem) .GE. sfmin ) then
1084  iwork(2) = 1
1085  else
1086  iwork(2) = 0
1087  END IF
1088  else
1089  iwork(1) = 0
1090  iwork(2) = 0
1091  END IF
1092  iwork(3) = 0
1093  iwork(4) = -1
1094  IF ( errest ) rwork(3) = one
1095  IF ( lsvec .AND. rsvec ) then
1096  rwork(4) = one
1097  rwork(5) = one
1098  END IF
1099  IF ( l2tran ) then
1100  rwork(6) = zero
1101  rwork(7) = zero
1102  END IF
1103  return
1104 *
1105  END IF
1106 *
1107  transp = .false.
1108 *
1109  aatmax = -one
1110  aatmin = big
1111  IF ( rowpiv .OR. l2tran ) then
1112 *
1113 * Compute the row norms, needed to determine row pivoting sequence
1114 * (in the case of heavily row weighted A, row pivoting is strongly
1115 * advised) and to collect information needed to compare the
1116 * structures of A * A^* and A^* * A (in the case L2TRAN.EQ..TRUE.).
1117 *
1118  IF ( l2tran ) then
1119  DO 1950 p = 1, m
1120  xsc = zero
1121  temp1 = one
1122  CALL classq( n, a(p,1), lda, xsc, temp1 )
1123 * CLASSQ gets both the ell_2 and the ell_infinity norm
1124 * in one pass through the vector
1125  rwork(m+p) = xsc * scalem
1126  rwork(p) = xsc * (scalem*sqrt(temp1))
1127  aatmax = max( aatmax, rwork(p) )
1128  IF (rwork(p) .NE. zero)
1129  $ aatmin = min(aatmin,rwork(p))
1130  1950 continue
1131  else
1132  DO 1904 p = 1, m
1133  rwork(m+p) = scalem*abs( a(p,icamax(n,a(p,1),lda)) )
1134  aatmax = max( aatmax, rwork(m+p) )
1135  aatmin = min( aatmin, rwork(m+p) )
1136  1904 continue
1137  END IF
1138 *
1139  END IF
1140 *
1141 * For square matrix A try to determine whether A^* would be better
1142 * input for the preconditioned Jacobi SVD, with faster convergence.
1143 * The decision is based on an O(N) function of the vector of column
1144 * and row norms of A, based on the Shannon entropy. This should give
1145 * the right choice in most cases when the difference actually matters.
1146 * It may fail and pick the slower converging side.
1147 *
1148  entra = zero
1149  entrat = zero
1150  IF ( l2tran ) then
1151 *
1152  xsc = zero
1153  temp1 = one
1154  CALL slassq( n, sva, 1, xsc, temp1 )
1155  temp1 = one / temp1
1156 *
1157  entra = zero
1158  DO 1113 p = 1, n
1159  big1 = ( ( sva(p) / xsc )**2 ) * temp1
1160  IF ( big1 .NE. zero ) entra = entra + big1 * alog(big1)
1161  1113 continue
1162  entra = - entra / alog(REAL(n))
1163 *
1164 * Now, SVA().^2/Trace(A^* * A) is a point in the probability simplex.
1165 * It is derived from the diagonal of A^* * A. Do the same with the
1166 * diagonal of A * A^*, compute the entropy of the corresponding
1167 * probability distribution. Note that A * A^* and A^* * A have the
1168 * same trace.
1169 *
1170  entrat = zero
1171  DO 1114 p = 1, m
1172  big1 = ( ( rwork(p) / xsc )**2 ) * temp1
1173  IF ( big1 .NE. zero ) entrat = entrat + big1 * alog(big1)
1174  1114 continue
1175  entrat = - entrat / alog(REAL(m))
1176 *
1177 * Analyze the entropies and decide A or A^*. Smaller entropy
1178 * usually means better input for the algorithm.
1179 *
1180  transp = ( entrat .LT. entra )
1181 *
1182 * If A^* is better than A, take the adjoint of A. This is allowed
1183 * only for square matrices, M=N.
1184  IF ( transp ) then
1185 * In an optimal implementation, this trivial transpose
1186 * should be replaced with faster transpose.
1187  DO 1115 p = 1, n - 1
1188  a(p,p) = conjg(a(p,p))
1189  DO 1116 q = p + 1, n
1190  ctemp = conjg(a(q,p))
1191  a(q,p) = conjg(a(p,q))
1192  a(p,q) = ctemp
1193  1116 continue
1194  1115 continue
1195  a(n,n) = conjg(a(n,n))
1196  DO 1117 p = 1, n
1197  rwork(m+p) = sva(p)
1198  sva(p) = rwork(p)
1199 * previously computed row 2-norms are now column 2-norms
1200 * of the transposed matrix
1201  1117 continue
1202  temp1 = aapp
1203  aapp = aatmax
1204  aatmax = temp1
1205  temp1 = aaqq
1206  aaqq = aatmin
1207  aatmin = temp1
1208  kill = lsvec
1209  lsvec = rsvec
1210  rsvec = kill
1211  IF ( lsvec ) n1 = n
1212 *
1213  rowpiv = .true.
1214  END IF
1215 *
1216  END IF
1217 * END IF L2TRAN
1218 *
1219 * Scale the matrix so that its maximal singular value remains less
1220 * than SQRT(BIG) -- the matrix is scaled so that its maximal column
1221 * has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
1222 * SQRT(BIG) instead of BIG is the fact that CGEJSV uses LAPACK and
1223 * BLAS routines that, in some implementations, are not capable of
1224 * working in the full interval [SFMIN,BIG] and that they may provoke
1225 * overflows in the intermediate results. If the singular values spread
1226 * from SFMIN to BIG, then CGESVJ will compute them. So, in that case,
1227 * one should use CGESVJ instead of CGEJSV.
1228  big1 = sqrt( big )
1229  temp1 = sqrt( big / REAL(N) )
1230 * >> for future updates: allow bigger range, i.e. the largest column
1231 * will be allowed up to BIG/N and CGESVJ will do the rest. However, for
1232 * this all other (LAPACK) components must allow such a range.
1233 * TEMP1 = BIG/REAL(N)
1234 * TEMP1 = BIG * EPSLN this should 'almost' work with current LAPACK components
1235  CALL slascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
1236  IF ( aaqq .GT. (aapp * sfmin) ) then
1237  aaqq = ( aaqq / aapp ) * temp1
1238  else
1239  aaqq = ( aaqq * temp1 ) / aapp
1240  END IF
1241  temp1 = temp1 * scalem
1242  CALL clascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
1243 *
1244 * To undo scaling at the end of this procedure, multiply the
1245 * computed singular values with USCAL2 / USCAL1.
1246 *
1247  uscal1 = temp1
1248  uscal2 = aapp
1249 *
1250  IF ( l2kill ) then
1251 * L2KILL enforces computation of nonzero singular values in
1252 * the restricted range of condition number of the initial A,
1253 * sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
1254  xsc = sqrt( sfmin )
1255  else
1256  xsc = small
1257 *
1258 * Now, if the condition number of A is too big,
1259 * sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
1260 * as a precaution measure, the full SVD is computed using CGESVJ
1261 * with accumulated Jacobi rotations. This provides numerically
1262 * more robust computation, at the cost of slightly increased run
1263 * time. Depending on the concrete implementation of BLAS and LAPACK
1264 * (i.e. how they behave in presence of extreme ill-conditioning) the
1265 * implementor may decide to remove this switch.
1266  IF ( ( aaqq.LT.sqrt(sfmin) ) .AND. lsvec .AND. rsvec ) then
1267  jracc = .true.
1268  END IF
1269 *
1270  END IF
1271  IF ( aaqq .LT. xsc ) then
1272  DO 700 p = 1, n
1273  IF ( sva(p) .LT. xsc ) then
1274  CALL claset( 'A', m, 1, czero, czero, a(1,p), lda )
1275  sva(p) = zero
1276  END IF
1277  700 continue
1278  END IF
1279 *
1280 * Preconditioning using QR factorization with pivoting
1281 *
1282  IF ( rowpiv ) then
1283 * Optional row permutation (Bjoerck row pivoting):
1284 * A result by Cox and Higham shows that the Bjoerck's
1285 * row pivoting combined with standard column pivoting
1286 * has similar effect as Powell-Reid complete pivoting.
1287 * The ell-infinity norms of A are made nonincreasing.
1288  IF ( ( lsvec .AND. rsvec ) .AND. .NOT.( jracc ) ) THEN
1289  iwoff = 2*n
1290  else
1291  iwoff = n
1292  END IF
1293  DO 1952 p = 1, m - 1
1294  q = isamax( m-p+1, rwork(m+p), 1 ) + p - 1
1295  iwork(iwoff+p) = q
1296  IF ( p .NE. q ) then
1297  temp1 = rwork(m+p)
1298  rwork(m+p) = rwork(m+q)
1299  rwork(m+q) = temp1
1300  END IF
1301  1952 continue
1302  CALL claswp( n, a, lda, 1, m-1, iwork(iwoff+1), 1 )
1303  END IF
1304 *
1305 * End of the preparation phase (scaling, optional sorting and
1306 * transposing, optional flushing of small columns).
1307 *
1308 * Preconditioning
1309 *
1310 * If the full SVD is needed, the right singular vectors are computed
1311 * from a matrix equation, and for that we need theoretical analysis
1312 * of the Businger-Golub pivoting. So we use CGEQP3 as the first RR QRF.
1313 * In all other cases the first RR QRF can be chosen by other criteria
1314 * (eg speed by replacing global with restricted window pivoting, such
1315 * as in xGEQPX from TOMS # 782). Good results will be obtained using
1316 * xGEQPX with properly (!) chosen numerical parameters.
1317 * Any improvement of CGEQP3 improves overall performance of CGEJSV.
1318 *
1319 * A * P1 = Q1 * [ R1^* 0]^*:
1320  DO 1963 p = 1, n
1321 * .. all columns are free columns
1322  iwork(p) = 0
1323  1963 continue
1324  CALL cgeqp3( m, n, a, lda, iwork, cwork, cwork(n+1), lwork-n,
1325  $ rwork, ierr )
1326 *
1327 * The upper triangular matrix R1 from the first QRF is inspected for
1328 * rank deficiency and possibilities for deflation, or possible
1329 * ill-conditioning. Depending on the user specified flag L2RANK,
1330 * the procedure explores possibilities to reduce the numerical
1331 * rank by inspecting the computed upper triangular factor. If
1332 * L2RANK or L2ABER are up, then CGEJSV will compute the SVD of
1333 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
1334 *
1335  nr = 1
1336  IF ( l2aber ) then
1337 * Standard absolute error bound suffices. All sigma_i with
1338 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
1339 * aggressive enforcement of lower numerical rank by introducing a
1340 * backward error of the order of N*EPSLN*||A||.
1341  temp1 = sqrt(REAL(n))*epsln
1342  DO 3001 p = 2, n
1343  IF ( abs(a(p,p)) .GE. (temp1*abs(a(1,1))) ) then
1344  nr = nr + 1
1345  else
1346  GO TO 3002
1347  END IF
1348  3001 continue
1349  3002 continue
1350  ELSE IF ( l2rank ) then
1351 * .. similarly as above, only slightly more gentle (less aggressive).
1352 * Sudden drop on the diagonal of R1 is used as the criterion for
1353 * close-to-rank-deficient.
1354  temp1 = sqrt(sfmin)
1355  DO 3401 p = 2, n
1356  IF ( ( abs(a(p,p)) .LT. (epsln*abs(a(p-1,p-1))) ) .OR.
1357  $ ( abs(a(p,p)) .LT. small ) .OR.
1358  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3402
1359  nr = nr + 1
1360  3401 continue
1361  3402 continue
1362 *
1363  else
1364 * The goal is high relative accuracy. However, if the matrix
1365 * has high scaled condition number the relative accuracy is in
1366 * general not feasible. Later on, a condition number estimator
1367 * will be deployed to estimate the scaled condition number.
1368 * Here we just remove the underflowed part of the triangular
1369 * factor. This prevents the situation in which the code is
1370 * working hard to get the accuracy not warranted by the data.
1371  temp1 = sqrt(sfmin)
1372  DO 3301 p = 2, n
1373  IF ( ( abs(a(p,p)) .LT. small ) .OR.
1374  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3302
1375  nr = nr + 1
1376  3301 continue
1377  3302 continue
1378 *
1379  END IF
1380 *
1381  almort = .false.
1382  IF ( nr .EQ. n ) then
1383  maxprj = one
1384  DO 3051 p = 2, n
1385  temp1 = abs(a(p,p)) / sva(iwork(p))
1386  maxprj = min( maxprj, temp1 )
1387  3051 continue
1388  IF ( maxprj**2 .GE. one - REAL(n)*epsln ) almort = .true.
1389  END IF
1390 *
1391 *
1392  sconda = - one
1393  condr1 = - one
1394  condr2 = - one
1395 *
1396  IF ( errest ) then
1397  IF ( n .EQ. nr ) then
1398  IF ( rsvec ) then
1399 * .. V is available as workspace
1400  CALL clacpy( 'U', n, n, a, lda, v, ldv )
1401  DO 3053 p = 1, n
1402  temp1 = sva(iwork(p))
1403  CALL csscal( p, one/temp1, v(1,p), 1 )
1404  3053 continue
1405  IF ( lsvec )then
1406  CALL cpocon( 'U', n, v, ldv, one, temp1,
1407  $ cwork(n+1), rwork, ierr )
1408  else
1409  CALL cpocon( 'U', n, v, ldv, one, temp1,
1410  $ cwork, rwork, ierr )
1411  END IF
1412 *
1413  ELSE IF ( lsvec ) then
1414 * .. U is available as workspace
1415  CALL clacpy( 'U', n, n, a, lda, u, ldu )
1416  DO 3054 p = 1, n
1417  temp1 = sva(iwork(p))
1418  CALL csscal( p, one/temp1, u(1,p), 1 )
1419  3054 continue
1420  CALL cpocon( 'U', n, u, ldu, one, temp1,
1421  $ cwork(n+1), rwork, ierr )
1422  else
1423  CALL clacpy( 'U', n, n, a, lda, cwork, n )
1424 *[] CALL CLACPY( 'U', N, N, A, LDA, CWORK(N+1), N )
1425 * Change: here index shifted by N to the left, CWORK(1:N)
1426 * not needed for SIGMA only computation
1427  DO 3052 p = 1, n
1428  temp1 = sva(iwork(p))
1429 *[] CALL CSSCAL( p, ONE/TEMP1, CWORK(N+(p-1)*N+1), 1 )
1430  CALL csscal( p, one/temp1, cwork((p-1)*n+1), 1 )
1431  3052 continue
1432 * .. the columns of R are scaled to have unit Euclidean lengths.
1433 *[] CALL CPOCON( 'U', N, CWORK(N+1), N, ONE, TEMP1,
1434 *[] $ CWORK(N+N*N+1), RWORK, IERR )
1435  CALL cpocon( 'U', n, cwork, n, one, temp1,
1436  $ cwork(n*n+1), rwork, ierr )
1437 *
1438  END IF
1439  IF ( temp1 .NE. zero ) THEN
1440  sconda = one / sqrt(temp1)
1441  else
1442  sconda = - one
1443  END IF
1444 * SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
1445 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1446  else
1447  sconda = - one
1448  END IF
1449  END IF
1450 *
1451  l2pert = l2pert .AND. ( abs( a(1,1)/a(nr,nr) ) .GT. sqrt(big1) )
1452 * If there is no violent scaling, artificial perturbation is not needed.
1453 *
1454 * Phase 3:
1455 *
1456  IF ( .NOT. ( rsvec .OR. lsvec ) ) then
1457 *
1458 * Singular Values only
1459 *
1460 * .. transpose A(1:NR,1:N)
1461  DO 1946 p = 1, min( n-1, nr )
1462  CALL ccopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1463  CALL clacgv( n-p+1, a(p,p), 1 )
1464  1946 continue
1465  IF ( nr .EQ. n ) a(n,n) = conjg(a(n,n))
1466 *
1467 * The following two DO-loops introduce small relative perturbation
1468 * into the strict upper triangle of the lower triangular matrix.
1469 * Small entries below the main diagonal are also changed.
1470 * This modification is useful if the computing environment does not
1471 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1472 * annoying denormalized numbers in case of strongly scaled matrices.
1473 * The perturbation is structured so that it does not introduce any
1474 * new perturbation of the singular values, and it does not destroy
1475 * the job done by the preconditioner.
1476 * The licence for this perturbation is in the variable L2PERT, which
1477 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1478 *
1479  IF ( .NOT. almort ) then
1480 *
1481  IF ( l2pert ) then
1482 * XSC = SQRT(SMALL)
1483  xsc = epsln / REAL(n)
1484  DO 4947 q = 1, nr
1485  ctemp = cmplx(xsc*abs(a(q,q)),zero)
1486  DO 4949 p = 1, n
1487  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1488  $ .OR. ( p .LT. q ) )
1489 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1490  $ a(p,q) = ctemp
1491  4949 continue
1492  4947 continue
1493  else
1494  CALL claset( 'U', nr-1,nr-1, czero,czero, a(1,2),lda )
1495  END IF
1496 *
1497 * .. second preconditioning using the QR factorization
1498 *
1499  CALL cgeqrf( n,nr, a,lda, cwork, cwork(n+1),lwork-n, ierr )
1500 *
1501 * .. and transpose upper to lower triangular
1502  DO 1948 p = 1, nr - 1
1503  CALL ccopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1504  CALL clacgv( nr-p+1, a(p,p), 1 )
1505  1948 continue
1506 *
1507  END IF
1508 *
1509 * Row-cyclic Jacobi SVD algorithm with column pivoting
1510 *
1511 * .. again some perturbation (a "background noise") is added
1512 * to drown denormals
1513  IF ( l2pert ) then
1514 * XSC = SQRT(SMALL)
1515  xsc = epsln / REAL(n)
1516  DO 1947 q = 1, nr
1517  ctemp = cmplx(xsc*abs(a(q,q)),zero)
1518  DO 1949 p = 1, nr
1519  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1520  $ .OR. ( p .LT. q ) )
1521 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1522  $ a(p,q) = ctemp
1523  1949 continue
1524  1947 continue
1525  else
1526  CALL claset( 'U', nr-1, nr-1, czero, czero, a(1,2), lda )
1527  END IF
1528 *
1529 * .. and one-sided Jacobi rotations are started on a lower
1530 * triangular matrix (plus perturbation which is ignored in
1531 * the part which destroys triangular form (confusing?!))
1532 *
1533  CALL cgesvj( 'L', 'N', 'N', nr, nr, a, lda, sva,
1534  $ n, v, ldv, cwork, lwork, rwork, lrwork, info )
1535 *
1536  scalem = rwork(1)
1537  numrank = nint(rwork(2))
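*        (SCALEM and NUMRANK are read from the CGESVJ output: RWORK(1)
*        returns the scaling factor of the computed singular values and
*        RWORK(2) the number of nonzero singular values, used here as
*        the numerical rank estimate)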
1538 *
1539 *
1540  ELSE IF ( ( rsvec .AND. ( .NOT. lsvec ) .AND. ( .NOT. jracc ) )
1541  $ .OR.
1542  $ ( jracc .AND. ( .NOT. lsvec ) .AND. ( nr .NE. n ) ) ) then
1543 *
1544 * -> Singular Values and Right Singular Vectors <-
1545 *
1546  IF ( almort ) then
1547 *
1548 * .. in this case NR equals N
1549  DO 1998 p = 1, nr
1550  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1551  CALL clacgv( n-p+1, v(p,p), 1 )
1552  1998 continue
1553  CALL claset( 'U', nr-1,nr-1, czero, czero, v(1,2), ldv )
1554 *
1555  CALL cgesvj( 'L','U','N', n, nr, v, ldv, sva, nr, a, lda,
1556  $ cwork, lwork, rwork, lrwork, info )
1557  scalem = rwork(1)
1558  numrank = nint(rwork(2))
1559 
1560  else
1561 *
1562 * .. two more QR factorizations ( one QRF is not enough, two require
1563 * accumulated product of Jacobi rotations, three are perfect )
1564 *
1565  CALL claset( 'L', nr-1,nr-1, czero, czero, a(2,1), lda )
1566  CALL cgelqf( nr,n, a, lda, cwork, cwork(n+1), lwork-n, ierr)
1567  CALL clacpy( 'L', nr, nr, a, lda, v, ldv )
1568  CALL claset( 'U', nr-1,nr-1, czero, czero, v(1,2), ldv )
1569  CALL cgeqrf( nr, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1570  $ lwork-2*n, ierr )
1571  DO 8998 p = 1, nr
1572  CALL ccopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1573  CALL clacgv( nr-p+1, v(p,p), 1 )
1574  8998 continue
1575  CALL claset('U', nr-1, nr-1, czero, czero, v(1,2), ldv)
1576 *
1577  CALL cgesvj( 'L', 'U','N', nr, nr, v,ldv, sva, nr, u,
1578  $ ldu, cwork(n+1), lwork-n, rwork, lrwork, info )
1579  scalem = rwork(1)
1580  numrank = nint(rwork(2))
1581  IF ( nr .LT. n ) then
1582  CALL claset( 'A',n-nr, nr, czero,czero, v(nr+1,1), ldv )
1583  CALL claset( 'A',nr, n-nr, czero,czero, v(1,nr+1), ldv )
1584  CALL claset( 'A',n-nr,n-nr,czero,cone, v(nr+1,nr+1),ldv )
1585  END IF
1586 *
1587  CALL cunmlq( 'L', 'C', n, n, nr, a, lda, cwork,
1588  $ v, ldv, cwork(n+1), lwork-n, ierr )
1589 *
1590  END IF
1591 * .. permute the rows of V
1592 * DO 8991 p = 1, N
1593 * CALL CCOPY( N, V(p,1), LDV, A(IWORK(p),1), LDA )
1594 * 8991 CONTINUE
1595 * CALL CLACPY( 'All', N, N, A, LDA, V, LDV )
1596  CALL clapmr( .false., n, n, v, ldv, iwork )
1597 *
1598  IF ( transp ) then
1599  CALL clacpy( 'A', n, n, v, ldv, u, ldu )
1600  END IF
1601 *
1602  ELSE IF ( jracc .AND. (.NOT. lsvec) .AND. ( nr.EQ. n ) ) THEN
1603 *
1604  CALL claset( 'L', n-1,n-1, czero, czero, a(2,1), lda )
1605 *
1606  CALL cgesvj( 'U','N','V', n, n, a, lda, sva, n, v, ldv,
1607  $ cwork, lwork, rwork, lrwork, info )
1608  scalem = rwork(1)
1609  numrank = nint(rwork(2))
1610  CALL clapmr( .false., n, n, v, ldv, iwork )
1611 *
1612  ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) then
1613 *
1614 * .. Singular Values and Left Singular Vectors ..
1615 *
1616 * .. second preconditioning step to avoid need to accumulate
1617 * Jacobi rotations in the Jacobi iterations.
1618  DO 1965 p = 1, nr
1619  CALL ccopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1620  CALL clacgv( n-p+1, u(p,p), 1 )
1621  1965 continue
1622  CALL claset( 'U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1623 *
1624  CALL cgeqrf( n, nr, u, ldu, cwork(n+1), cwork(2*n+1),
1625  $ lwork-2*n, ierr )
1626 *
1627  DO 1967 p = 1, nr - 1
1628  CALL ccopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1629  CALL clacgv( n-p+1, u(p,p), 1 )
1630  1967 continue
1631  CALL claset( 'U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1632 *
1633  CALL cgesvj( 'L', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1634  $ lda, cwork(n+1), lwork-n, rwork, lrwork, info )
1635  scalem = rwork(1)
1636  numrank = nint(rwork(2))
1637 *
1638  IF ( nr .LT. m ) then
1639  CALL claset( 'A', m-nr, nr,czero, czero, u(nr+1,1), ldu )
1640  IF ( nr .LT. n1 ) then
1641  CALL claset( 'A',nr, n1-nr, czero, czero, u(1,nr+1),ldu )
1642  CALL claset( 'A',m-nr,n1-nr,czero,cone,u(nr+1,nr+1),ldu )
1643  END IF
1644  END IF
1645 *
1646  CALL cunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
1647  $ ldu, cwork(n+1), lwork-n, ierr )
1648 *
1649  IF ( rowpiv )
1650  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
1651 *
1652  DO 1974 p = 1, n1
1653  xsc = one / scnrm2( m, u(1,p), 1 )
1654  CALL csscal( m, xsc, u(1,p), 1 )
1655  1974 continue
1656 *
1657  IF ( transp ) then
1658  CALL clacpy( 'A', n, n, u, ldu, v, ldv )
1659  END IF
1660 *
1661  else
1662 *
1663 * .. Full SVD ..
1664 *
1665  IF ( .NOT. jracc ) then
1666 *
1667  IF ( .NOT. almort ) then
1668 *
1669 * Second Preconditioning Step (QRF [with pivoting])
1670 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1671 * equivalent to an LQF CALL. Since in many libraries the QRF
1672 * seems to be better optimized than the LQF, we do explicit
1673 * transpose and use the QRF. This is subject to changes in an
1674 * optimized implementation of CGEJSV.
1675 *
1676  DO 1968 p = 1, nr
1677  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1678  CALL clacgv( n-p+1, v(p,p), 1 )
1679  1968 continue
1680 *
1681 * .. the following two loops perturb small entries to avoid
1682 * denormals in the second QR factorization, where they are
1683 * as good as zeros. This is done to avoid painfully slow
1684 * computation with denormals. The relative size of the perturbation
1685 * is a parameter that can be changed by the implementer.
1686 * This perturbation device will be obsolete on machines with
1687 * properly implemented arithmetic.
1688 * To switch it off, set L2PERT=.FALSE. To remove it from the
1689 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1690 * The following two loops should be blocked and fused with the
1691 * transposed copy above.
1692 *
1693  IF ( l2pert ) then
1694  xsc = sqrt(small)
1695  DO 2969 q = 1, nr
1696  ctemp = cmplx(xsc*abs( v(q,q) ),zero)
1697  DO 2968 p = 1, n
1698  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1699  $ .OR. ( p .LT. q ) )
1700 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1701  $ v(p,q) = ctemp
1702  IF ( p .LT. q ) v(p,q) = - v(p,q)
1703  2968 continue
1704  2969 continue
1705  else
1706  CALL claset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1707  END IF
1708 *
1709 * Estimate the row scaled condition number of R1
1710 * (If R1 is rectangular, N > NR, then the condition number
1711 * of the leading NR x NR submatrix is estimated.)
1712 *
1713  CALL clacpy( 'L', nr, nr, v, ldv, cwork(2*n+1), nr )
1714  DO 3950 p = 1, nr
1715  temp1 = scnrm2(nr-p+1,cwork(2*n+(p-1)*nr+p),1)
1716  CALL csscal(nr-p+1,one/temp1,cwork(2*n+(p-1)*nr+p),1)
1717  3950 continue
1718  CALL cpocon('L',nr,cwork(2*n+1),nr,one,temp1,
1719  $ cwork(2*n+nr*nr+1),rwork,ierr)
1720  condr1 = one / sqrt(temp1)
1721 * .. here we need a second opinion on the condition number
1722 * .. then assume worst case scenario
1723 * R1 is OK for inverse <=> CONDR1 .LT. REAL(N)
1724 * more conservative <=> CONDR1 .LT. SQRT(REAL(N))
1725 *
1726  cond_ok = sqrt(sqrt(REAL(nr)))
1727 *[TP] COND_OK is a tuning parameter.
1728 *
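*        (With COND_OK = NR**(1/4), the unpivoted second QRF below is
*        used only while the estimate CONDR1 stays below NR**(1/4);
*        lowering COND_OK makes the pivoted CGEQP3 branch more likely.)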
1729  IF ( condr1 .LT. cond_ok ) then
1730 * .. the second QRF without pivoting. Note: in an optimized
1731 * implementation, this QRF should be implemented as the QRF
1732 * of a lower triangular matrix.
1733 * R1^* = Q2 * R2
1734  CALL cgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1735  $ lwork-2*n, ierr )
1736 *
1737  IF ( l2pert ) then
1738  xsc = sqrt(small)/epsln
1739  DO 3959 p = 2, nr
1740  DO 3958 q = 1, p - 1
1741  ctemp=cmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1742  $ zero)
1743  IF ( abs(v(q,p)) .LE. temp1 )
1744 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1745  $ v(q,p) = ctemp
1746  3958 continue
1747  3959 continue
1748  END IF
1749 *
1750  IF ( nr .NE. n )
1751  $ CALL clacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1752 * .. save ...
1753 *
1754 * .. this transposed copy should be better than naive
1755  DO 1969 p = 1, nr - 1
1756  CALL ccopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1757  CALL clacgv(nr-p+1, v(p,p), 1 )
1758  1969 continue
1759  v(nr,nr)=conjg(v(nr,nr))
1760 *
1761  condr2 = condr1
1762 *
1763  else
1764 *
1765 * .. ill-conditioned case: second QRF with pivoting
1766 * Note that windowed pivoting would be equally good
1767 * numerically, and more run-time efficient. So, in
1768 * an optimal implementation, the next call to CGEQP3
1769 * should be replaced with, e.g., CALL CGEQPX (ACM TOMS #782)
1770 * with properly (carefully) chosen parameters.
1771 *
1772 * R1^* * P2 = Q2 * R2
1773  DO 3003 p = 1, nr
1774  iwork(n+p) = 0
1775  3003 continue
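*           (the pivot array IWORK(N+1:N+NR) is zeroed so that CGEQP3
*           treats all columns as free columns)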
1776  CALL cgeqp3( n, nr, v, ldv, iwork(n+1), cwork(n+1),
1777  $ cwork(2*n+1), lwork-2*n, rwork, ierr )
1778 ** CALL CGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1779 ** $ LWORK-2*N, IERR )
1780  IF ( l2pert ) then
1781  xsc = sqrt(small)
1782  DO 3969 p = 2, nr
1783  DO 3968 q = 1, p - 1
1784  ctemp=cmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1785  $ zero)
1786  IF ( abs(v(q,p)) .LE. temp1 )
1787 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1788  $ v(q,p) = ctemp
1789  3968 continue
1790  3969 continue
1791  END IF
1792 *
1793  CALL clacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1794 *
1795  IF ( l2pert ) then
1796  xsc = sqrt(small)
1797  DO 8970 p = 2, nr
1798  DO 8971 q = 1, p - 1
1799  ctemp=cmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1800  $ zero)
1801 * V(p,q) = - TEMP1*( V(q,p) / ABS(V(q,p)) )
1802  v(p,q) = - ctemp
1803  8971 continue
1804  8970 continue
1805  else
1806  CALL claset( 'L',nr-1,nr-1,czero,czero,v(2,1),ldv )
1807  END IF
1808 * Now, compute R2 = L3 * Q3, the LQ factorization.
1809  CALL cgelqf( nr, nr, v, ldv, cwork(2*n+n*nr+1),
1810  $ cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1811 * .. and estimate the condition number
1812  CALL clacpy( 'L',nr,nr,v,ldv,cwork(2*n+n*nr+nr+1),nr )
1813  DO 4950 p = 1, nr
1814  temp1 = scnrm2( p, cwork(2*n+n*nr+nr+p), nr )
1815  CALL csscal( p, one/temp1, cwork(2*n+n*nr+nr+p), nr )
1816  4950 continue
1817  CALL cpocon( 'L',nr,cwork(2*n+n*nr+nr+1),nr,one,temp1,
1818  $ cwork(2*n+n*nr+nr+nr*nr+1),rwork,ierr )
1819  condr2 = one / sqrt(temp1)
1820 *
1821 *
1822  IF ( condr2 .GE. cond_ok ) then
1823 * .. save the Householder vectors used for Q3
1824 * (this overwrites the copy of R2, as it will not be
1825 * needed in this branch, but it does not overwrite the
1826 * Householder vectors of Q2).
1827  CALL clacpy( 'U', nr, nr, v, ldv, cwork(2*n+1), n )
1828 * .. and the rest of the information on Q3 is in
1829 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1830  END IF
1831 *
1832  END IF
1833 *
1834  IF ( l2pert ) then
1835  xsc = sqrt(small)
1836  DO 4968 q = 2, nr
1837  ctemp = xsc * v(q,q)
1838  DO 4969 p = 1, q - 1
1839 * V(p,q) = - TEMP1*( V(p,q) / ABS(V(p,q)) )
1840  v(p,q) = - ctemp
1841  4969 continue
1842  4968 continue
1843  else
1844  CALL claset( 'U', nr-1,nr-1, czero,czero, v(1,2), ldv )
1845  END IF
1846 *
1847 * Second preconditioning finished; continue with Jacobi SVD
1848 * The input matrix is lower triangular.
1849 *
1850 * Recover the right singular vectors as solution of a well
1851 * conditioned triangular matrix equation.
1852 *
1853  IF ( condr1 .LT. cond_ok ) then
1854 *
1855  CALL cgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u, ldu,
1856  $ cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,rwork,
1857  $ lrwork, info )
1858  scalem = rwork(1)
1859  numrank = nint(rwork(2))
1860  DO 3970 p = 1, nr
1861  CALL ccopy( nr, v(1,p), 1, u(1,p), 1 )
1862  CALL csscal( nr, sva(p), v(1,p), 1 )
1863  3970 continue
1864 
1865 * .. pick the right matrix equation and solve it
1866 *
1867  IF ( nr .EQ. n ) then
1868 * :)) .. best case, R1 is inverted. The solution of this matrix
1869 * equation is Q2*V2 = the product of the Jacobi rotations
1870 * used in CGESVJ, premultiplied with the unitary matrix
1871 * from the second QR factorization.
1872  CALL ctrsm('L','U','N','N', nr,nr,cone, a,lda, v,ldv)
1873  else
1874 * .. R1 is well conditioned, but non-square. Adjoint of R2
1875 * is inverted to get the product of the Jacobi rotations
1876 * used in CGESVJ. The Q-factor from the second QR
1877 * factorization is then built in explicitly.
1878  CALL ctrsm('L','U','C','N',nr,nr,cone,cwork(2*n+1),
1879  $ n,v,ldv)
1880  IF ( nr .LT. n ) then
1881  CALL claset('A',n-nr,nr,czero,czero,v(nr+1,1),ldv)
1882  CALL claset('A',nr,n-nr,czero,czero,v(1,nr+1),ldv)
1883  CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1884  END IF
1885  CALL cunmqr('L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1886  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1887  END IF
1888 *
1889  ELSE IF ( condr2 .LT. cond_ok ) then
1890 *
1891 * The matrix R2 is inverted. The solution of the matrix equation
1892 * is Q3^* * V3 = the product of the Jacobi rotations (applied to
1893 * the lower triangular L3 from the LQ factorization of
1894 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1895  CALL cgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1896  $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1897  $ rwork, lrwork, info )
1898  scalem = rwork(1)
1899  numrank = nint(rwork(2))
1900  DO 3870 p = 1, nr
1901  CALL ccopy( nr, v(1,p), 1, u(1,p), 1 )
1902  CALL csscal( nr, sva(p), u(1,p), 1 )
1903  3870 continue
1904  CALL ctrsm('L','U','N','N',nr,nr,cone,cwork(2*n+1),n,
1905  $ u,ldu)
1906 * .. apply the permutation from the second QR factorization
1907  DO 873 q = 1, nr
1908  DO 872 p = 1, nr
1909  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1910  872 continue
1911  DO 874 p = 1, nr
1912  u(p,q) = cwork(2*n+n*nr+nr+p)
1913  874 continue
1914  873 continue
1915  IF ( nr .LT. n ) then
1916  CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1917  CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1918  CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1919  END IF
1920  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1921  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1922  else
1923 * Last line of defense.
1924 * #:( This is a rather pathological case: no scaled condition
1925 * improvement after two pivoted QR factorizations. Another
1926 * possibility is that the rank revealing QR factorization
1927 * or the condition estimator has failed, or the COND_OK
1928 * is set very close to ONE (which is unnecessary). Normally,
1929 * this branch should never be executed, but in rare cases of
1930 * failure of the RRQR or condition estimator, the last line of
1931 * defense ensures that CGEJSV completes the task.
1932 * Compute the full SVD of L3 using CGESVJ with explicit
1933 * accumulation of Jacobi rotations.
1934  CALL cgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1935  $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1936  $ rwork, lrwork, info )
1937  scalem = rwork(1)
1938  numrank = nint(rwork(2))
1939  IF ( nr .LT. n ) then
1940  CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1941  CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1942  CALL claset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1943  END IF
1944  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1945  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1946 *
1947  CALL cunmlq( 'L', 'C', nr, nr, nr, cwork(2*n+1), n,
1948  $ cwork(2*n+n*nr+1), u, ldu, cwork(2*n+n*nr+nr+1),
1949  $ lwork-2*n-n*nr-nr, ierr )
1950  DO 773 q = 1, nr
1951  DO 772 p = 1, nr
1952  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1953  772 continue
1954  DO 774 p = 1, nr
1955  u(p,q) = cwork(2*n+n*nr+nr+p)
1956  774 continue
1957  773 continue
1958 *
1959  END IF
1960 *
1961 * Permute the rows of V using the (column) permutation from the
1962 * first QRF. Also, scale the columns to make them unit in
1963 * Euclidean norm. This applies to all cases.
1964 *
1965  temp1 = sqrt(REAL(n)) * epsln
1966  DO 1972 q = 1, n
1967  DO 972 p = 1, n
1968  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1969  972 continue
1970  DO 973 p = 1, n
1971  v(p,q) = cwork(2*n+n*nr+nr+p)
1972  973 continue
1973  xsc = one / scnrm2( n, v(1,q), 1 )
1974  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1975  $ CALL csscal( n, xsc, v(1,q), 1 )
1976  1972 continue
1977 * At this moment, V contains the right singular vectors of A.
1978 * Next, assemble the left singular vector matrix U (M x N).
1979  IF ( nr .LT. m ) then
1980  CALL claset('A', m-nr, nr, czero, czero, u(nr+1,1), ldu)
1981  IF ( nr .LT. n1 ) then
1982  CALL claset('A',nr,n1-nr,czero,czero,u(1,nr+1),ldu)
1983  CALL claset('A',m-nr,n1-nr,czero,cone,
1984  $ u(nr+1,nr+1),ldu)
1985  END IF
1986  END IF
1987 *
1988 * The Q matrix from the first QRF is built into the left singular
1989 * matrix U. This applies to all cases.
1990 *
1991  CALL cunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
1992  $ ldu, cwork(n+1), lwork-n, ierr )
1993 
1994 * The columns of U are normalized. The cost is O(M*N) flops.
1995  temp1 = sqrt(REAL(m)) * epsln
1996  DO 1973 p = 1, nr
1997  xsc = one / scnrm2( m, u(1,p), 1 )
1998  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1999  $ CALL csscal( m, xsc, u(1,p), 1 )
2000  1973 continue
2001 *
2002 * If the initial QRF is computed with row pivoting, the left
2003 * singular vectors must be adjusted.
2004 *
2005  IF ( rowpiv )
2006  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2007 *
2008  else
2009 *
2010 * .. the initial matrix A has almost orthogonal columns and
2011 * the second QRF is not needed
2012 *
2013  CALL clacpy( 'U', n, n, a, lda, cwork(n+1), n )
2014  IF ( l2pert ) then
2015  xsc = sqrt(small)
2016  DO 5970 p = 2, n
2017  ctemp = xsc * cwork( n + (p-1)*n + p )
2018  DO 5971 q = 1, p - 1
2019 * CWORK(N+(q-1)*N+p)=-TEMP1 * ( CWORK(N+(p-1)*N+q) /
2020 * $ ABS(CWORK(N+(p-1)*N+q)) )
2021  cwork(n+(q-1)*n+p)=-ctemp
2022  5971 continue
2023  5970 continue
2024  else
2025  CALL claset( 'L',n-1,n-1,czero,czero,cwork(n+2),n )
2026  END IF
2027 *
2028  CALL cgesvj( 'U', 'U', 'N', n, n, cwork(n+1), n, sva,
2029  $ n, u, ldu, cwork(n+n*n+1), lwork-n-n*n, rwork, lrwork,
2030  $ info )
2031 *
2032  scalem = rwork(1)
2033  numrank = nint(rwork(2))
2034  DO 6970 p = 1, n
2035  CALL ccopy( n, cwork(n+(p-1)*n+1), 1, u(1,p), 1 )
2036  CALL csscal( n, sva(p), cwork(n+(p-1)*n+1), 1 )
2037  6970 continue
2038 *
2039  CALL ctrsm( 'L', 'U', 'N', 'N', n, n,
2040  $ cone, a, lda, cwork(n+1), n )
2041  DO 6972 p = 1, n
2042  CALL ccopy( n, cwork(n+p), n, v(iwork(p),1), ldv )
2043  6972 continue
2044  temp1 = sqrt(REAL(n))*epsln
2045  DO 6971 p = 1, n
2046  xsc = one / scnrm2( n, v(1,p), 1 )
2047  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2048  $ CALL csscal( n, xsc, v(1,p), 1 )
2049  6971 continue
2050 *
2051 * Assemble the left singular vector matrix U (M x N).
2052 *
2053  IF ( n .LT. m ) then
2054  CALL claset( 'A', m-n, n, czero, czero, u(n+1,1), ldu )
2055  IF ( n .LT. n1 ) then
2056  CALL claset('A',n, n1-n, czero, czero, u(1,n+1),ldu)
2057  CALL claset( 'A',m-n,n1-n, czero, cone,u(n+1,n+1),ldu)
2058  END IF
2059  END IF
2060  CALL cunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
2061  $ ldu, cwork(n+1), lwork-n, ierr )
2062  temp1 = sqrt(REAL(m))*epsln
2063  DO 6973 p = 1, n1
2064  xsc = one / scnrm2( m, u(1,p), 1 )
2065  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2066  $ CALL csscal( m, xsc, u(1,p), 1 )
2067  6973 continue
2068 *
2069  IF ( rowpiv )
2070  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2071 *
2072  END IF
2073 *
2074 * end of the >> almost orthogonal case << in the full SVD
2075 *
2076  else
2077 *
2078 * This branch deploys a preconditioned Jacobi SVD with explicitly
2079 * accumulated rotations. It is included as optional, mainly for
2080 * experimental purposes. It does perform well, and can also be used.
2081 * In this implementation, this branch will be automatically activated
2082 * if the condition number sigma_max(A) / sigma_min(A) is predicted
2083 * to be greater than the overflow threshold. This is because the
2084 * a posteriori computation of the singular vectors assumes robust
2085 * implementation of BLAS and some LAPACK procedures, capable of working
2086 * in the presence of extreme values, e.g. when the singular values spread from
2087 * the underflow to the overflow threshold.
2088 *
2089  DO 7968 p = 1, nr
2090  CALL ccopy( n-p+1, a(p,p), lda, v(p,p), 1 )
2091  CALL clacgv( n-p+1, v(p,p), 1 )
2092  7968 continue
2093 *
2094  IF ( l2pert ) then
2095  xsc = sqrt(small/epsln)
2096  DO 5969 q = 1, nr
2097  ctemp = cmplx(xsc*abs( v(q,q) ),zero)
2098  DO 5968 p = 1, n
2099  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
2100  $ .OR. ( p .LT. q ) )
2101 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
2102  $ v(p,q) = ctemp
2103  IF ( p .LT. q ) v(p,q) = - v(p,q)
2104  5968 continue
2105  5969 continue
2106  else
2107  CALL claset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
2108  END IF
2109 
2110  CALL cgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
2111  $ lwork-2*n, ierr )
2112  CALL clacpy( 'L', n, nr, v, ldv, cwork(2*n+1), n )
2113 *
2114  DO 7969 p = 1, nr
2115  CALL ccopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
2116  CALL clacgv( nr-p+1, u(p,p), 1 )
2117  7969 continue
2118 
2119  IF ( l2pert ) then
2120  xsc = sqrt(small/epsln)
2121  DO 9970 q = 2, nr
2122  DO 9971 p = 1, q - 1
2123  ctemp = cmplx(xsc * min(abs(u(p,p)),abs(u(q,q))),
2124  $ zero)
2125 * U(p,q) = - TEMP1 * ( U(q,p) / ABS(U(q,p)) )
2126  u(p,q) = - ctemp
2127  9971 continue
2128  9970 continue
2129  else
2130  CALL claset('U', nr-1, nr-1, czero, czero, u(1,2), ldu )
2131  END IF
2132 
2133  CALL cgesvj( 'L', 'U', 'V', nr, nr, u, ldu, sva,
2134  $ n, v, ldv, cwork(2*n+n*nr+1), lwork-2*n-n*nr,
2135  $ rwork, lrwork, info )
2136  scalem = rwork(1)
2137  numrank = nint(rwork(2))
2138 
2139  IF ( nr .LT. n ) then
2140  CALL claset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
2141  CALL claset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
2142  CALL claset( 'A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv )
2143  END IF
2144 
2145  CALL cunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
2146  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
2147 *
2148 * Permute the rows of V using the (column) permutation from the
2149 * first QRF. Also, scale the columns to make them unit in
2150 * Euclidean norm. This applies to all cases.
2151 *
2152  temp1 = sqrt(REAL(n)) * epsln
2153  DO 7972 q = 1, n
2154  DO 8972 p = 1, n
2155  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
2156  8972 continue
2157  DO 8973 p = 1, n
2158  v(p,q) = cwork(2*n+n*nr+nr+p)
2159  8973 continue
2160  xsc = one / scnrm2( n, v(1,q), 1 )
2161  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2162  $ CALL csscal( n, xsc, v(1,q), 1 )
2163  7972 continue
2164 *
2165 * At this moment, V contains the right singular vectors of A.
2166 * Next, assemble the left singular vector matrix U (M x N).
2167 *
2168  IF ( nr .LT. m ) then
2169  CALL claset( 'A', m-nr, nr, czero, czero, u(nr+1,1), ldu )
2170  IF ( nr .LT. n1 ) then
2171  CALL claset('A',nr, n1-nr, czero, czero, u(1,nr+1),ldu)
2172  CALL claset('A',m-nr,n1-nr, czero, cone,u(nr+1,nr+1),ldu)
2173  END IF
2174  END IF
2175 *
2176  CALL cunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
2177  $ ldu, cwork(n+1), lwork-n, ierr )
2178 *
2179  IF ( rowpiv )
2180  $ CALL claswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2181 *
2182 *
2183  END IF
2184  IF ( transp ) then
2185 * .. swap U and V because the procedure worked on A^*
2186  DO 6974 p = 1, n
2187  CALL cswap( n, u(1,p), 1, v(1,p), 1 )
2188  6974 continue
2189  END IF
2190 *
2191  END IF
2192 * end of the full SVD
2193 *
2194 * Undo scaling, if necessary (and possible)
2195 *
2196  IF ( uscal2 .LE. (big/sva(1))*uscal1 ) then
2197  CALL slascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
2198  uscal1 = one
2199  uscal2 = one
2200  END IF
2201 *
2202  IF ( nr .LT. n ) then
2203  DO 3004 p = nr+1, n
2204  sva(p) = zero
2205  3004 continue
2206  END IF
2207 *
2208  rwork(1) = uscal2 * scalem
2209  rwork(2) = uscal1
2210  IF ( errest ) rwork(3) = sconda
2211  IF ( lsvec .AND. rsvec ) then
2212  rwork(4) = condr1
2213  rwork(5) = condr2
2214  END IF
2215  IF ( l2tran ) then
2216  rwork(6) = entra
2217  rwork(7) = entrat
2218  END IF
2219 *
2220  iwork(1) = nr
2221  iwork(2) = numrank
2222  iwork(3) = warning
2223  IF ( transp ) then
2224  iwork(4) = 1
2225  else
2226  iwork(4) = -1
2227  END IF
2228 
2229 *
2230  return
2231 * ..
2232 * .. END OF CGEJSV
2233 * ..
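
The sketch below is not part of the LAPACK source; it illustrates one plausible way to drive CGEJSV on a small complex matrix. The JOB settings, matrix data, and array sizes are placeholders, the workspace lengths are obtained with the usual LAPACK query convention (LWORK = -1, LRWORK = -1) on the assumption that this routine supports it, and the IWORK length should be checked against the IWORK argument description for the chosen options.

      PROGRAM TEST_CGEJSV
*     Illustrative sketch only: computes the SVD of a small complex
*     matrix with CGEJSV.  The data, JOB settings and sizes are
*     placeholders, not recommendations.
      IMPLICIT NONE
      INTEGER            M, N
      PARAMETER          ( M = 4, N = 3 )
      COMPLEX            A( M, N ), U( M, N ), V( N, N ), CQRY( 1 )
      REAL               SVA( N ), RQRY( 7 )
      INTEGER            IWORK( M + 3*N ), INFO, LWORK, LRWORK, P, Q
      COMPLEX, ALLOCATABLE :: CWORK( : )
      REAL,    ALLOCATABLE :: RWORK( : )
*     .. fill A with arbitrary test data
      DO Q = 1, N
         DO P = 1, M
            A( P, Q ) = CMPLX( REAL( P + Q ), REAL( P - Q ) )
         END DO
      END DO
*     .. workspace query; lengths are returned in CQRY(1) and RQRY(1)
      CALL CGEJSV( 'F', 'U', 'V', 'R', 'N', 'N', M, N, A, M, SVA,
     $             U, M, V, N, CQRY, -1, RQRY, -1, IWORK, INFO )
      LWORK  = INT( REAL( CQRY( 1 ) ) )
      LRWORK = INT( RQRY( 1 ) )
      ALLOCATE( CWORK( LWORK ), RWORK( LRWORK ) )
*     .. compute A = U * diag(SVA) * V^* (up to the RWORK(1:2) scaling)
      CALL CGEJSV( 'F', 'U', 'V', 'R', 'N', 'N', M, N, A, M, SVA,
     $             U, M, V, N, CWORK, LWORK, RWORK, LRWORK, IWORK,
     $             INFO )
      IF( INFO.EQ.0 ) THEN
*        SVA(1:N) holds the singular values, subject to the scaling
*        factors returned in RWORK(1:2) (normally both equal ONE)
         WRITE( *, * ) 'Singular values:', ( SVA( P ), P = 1, N )
      ELSE
         WRITE( *, * ) 'CGEJSV failed, INFO =', INFO
      END IF
      END

On successful exit, IWORK(1:2) return NR and NUMRANK and, as set near the end of the listing above, RWORK(3) additionally returns SCONDA when the condition estimate is requested, while RWORK(4:5) return CONDR1 and CONDR2 when both sets of singular vectors are computed.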
subroutine cgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, CWORK, LWORK, RWORK, LRWORK, INFO)
CGESVJ
Definition: cgesvj.f:351
real function scnrm2(N, X, INCX)
SCNRM2
Definition: scnrm2.f:56
integer function icamax(N, CX, INCX)
ICAMAX
Definition: icamax.f:53
subroutine classq(N, X, INCX, SCALE, SUMSQ)
CLASSQ updates a sum of squares represented in scaled form.
Definition: classq.f:108
subroutine claset(UPLO, M, N, ALPHA, BETA, A, LDA)
CLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: claset.f:108
subroutine claswp(N, A, LDA, K1, K2, IPIV, INCX)
CLASWP performs a series of row interchanges on a general rectangular matrix.
Definition: claswp.f:116
subroutine cungqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
CUNGQR
Definition: cungqr.f:130
subroutine clapmr(FORWRD, M, N, X, LDX, K)
CLAPMR rearranges rows of a matrix as specified by a permutation vector.
Definition: clapmr.f:106
subroutine clascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
CLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: clascl.f:145
integer function isamax(N, SX, INCX)
ISAMAX
Definition: isamax.f:53
subroutine cpocon(UPLO, N, A, LDA, ANORM, RCOND, WORK, RWORK, INFO)
CPOCON
Definition: cpocon.f:123
subroutine ctrsm(SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
CTRSM
Definition: ctrsm.f:182
subroutine cgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
CGELQF
Definition: cgelqf.f:137
subroutine cgeqp3(M, N, A, LDA, JPVT, TAU, WORK, LWORK, RWORK, INFO)
CGEQP3
Definition: cgeqp3.f:161
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine cgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
CGEQRF
Definition: cgeqrf.f:138
subroutine slascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
SLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: slascl.f:145
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
real function slamch(CMACH)
SLAMCH
Definition: slamch.f:69
subroutine clacpy(UPLO, M, N, A, LDA, B, LDB)
CLACPY copies all or part of one two-dimensional array to another.
Definition: clacpy.f:105
subroutine clacgv(N, X, INCX)
CLACGV conjugates a complex vector.
Definition: clacgv.f:76
subroutine cunmqr(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
CUNMQR
Definition: cunmqr.f:170
subroutine sscal(N, SA, SX, INCX)
SSCAL
Definition: sscal.f:55
subroutine ccopy(N, CX, INCX, CY, INCY)
CCOPY
Definition: ccopy.f:52
subroutine cswap(N, CX, INCX, CY, INCY)
CSWAP
Definition: cswap.f:52
subroutine cunmlq(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
CUNMLQ
Definition: cunmlq.f:170
subroutine slassq(N, X, INCX, SCALE, SUMSQ)
SLASSQ updates a sum of squares represented in scaled form.
Definition: slassq.f:105
subroutine csscal(N, SA, CX, INCX)
CSSCAL
Definition: csscal.f:54
