LAPACK  3.7.0
LAPACK: Linear Algebra PACKage
subroutine zgejsv ( character*1  JOBA,
character*1  JOBU,
character*1  JOBV,
character*1  JOBR,
character*1  JOBT,
character*1  JOBP,
integer  M,
integer  N,
complex*16, dimension( lda, * )  A,
integer  LDA,
double precision, dimension( n )  SVA,
complex*16, dimension( ldu, * )  U,
integer  LDU,
complex*16, dimension( ldv, * )  V,
integer  LDV,
complex*16, dimension( lwork )  CWORK,
integer  LWORK,
double precision, dimension( lrwork )  RWORK,
integer  LRWORK,
integer, dimension( * )  IWORK,
integer  INFO 
)

ZGEJSV

Download ZGEJSV + dependencies [TGZ] [ZIP] [TXT]

Purpose:
 ZGEJSV computes the singular value decomposition (SVD) of a complex M-by-N
 matrix [A], where M >= N. The SVD of [A] is written as

              [A] = [U] * [SIGMA] * [V]^*,

 where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
 diagonal elements, [U] is an M-by-N (or M-by-M) unitary matrix, and
 [V] is an N-by-N unitary matrix. The diagonal elements of [SIGMA] are
 the singular values of [A]. The columns of [U] and [V] are the left and
 the right singular vectors of [A], respectively. The matrices [U] and [V]
 are computed and stored in the arrays U and V, respectively. The diagonal
 of [SIGMA] is computed and stored in the array SVA.

Arguments:

Parameters
[in]JOBA
          JOBA is CHARACTER*1
         Specifies the level of accuracy:
       = 'C': This option works well (high relative accuracy) if A = B * D,
              with well-conditioned B and arbitrary diagonal matrix D.
              The accuracy cannot be spoiled by COLUMN scaling. The
              accuracy of the computed output depends on the condition of
              B, and the procedure aims at the best theoretical accuracy.
              The relative error max_{i=1:N}|d sigma_i| / sigma_i is
              bounded by f(M,N)*epsilon* cond(B), independent of D.
              The input matrix is preprocessed with the QRF with column
              pivoting. This initial preprocessing and preconditioning by
              a rank revealing QR factorization is common for all values of
              JOBA. Additional actions are specified as follows:
       = 'E': Computation as with 'C' with an additional estimate of the
              condition number of B. It provides a realistic error bound.
       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
              D1, D2, and well-conditioned matrix C, this option gives
              higher accuracy than the 'C' option. If the structure of the
              input matrix is not known, and relative accuracy is
              desirable, then this option is advisable. The input matrix A
              is preprocessed with QR factorization with FULL (row and
              column) pivoting.
       = 'G'  Computation as with 'F' with an additional estimate of the
              condition number of B, where A=B*D. If A has heavily weighted
              rows, then using this condition number gives too pessimistic
              error bound.
       = 'A': Small singular values are not well determined by the data 
              and are considered as noisy; the matrix is treated as
              numerically rank defficient. The error in the computed
              singular values is bounded by f(m,n)*epsilon*||A||.
              The computed SVD A = U * S * V^* restores A up to
              f(m,n)*epsilon*||A||.
              This gives the procedure the licence to discard (set to zero)
              all singular values below N*epsilon*||A||.
       = 'R': Similar as in 'A'. Rank revealing property of the initial
              QR factorization is used do reveal (using triangular factor)
              a gap sigma_{r+1} < epsilon * sigma_r in which case the
              numerical RANK is declared to be r. The SVD is computed with
              absolute error bounds, but more accurately than with 'A'.
[in]JOBU
          JOBU is CHARACTER*1
         Specifies whether to compute the columns of U:
       = 'U': N columns of U are returned in the array U.
       = 'F': full set of M left sing. vectors is returned in the array U.
       = 'W': U may be used as workspace of length M*N. See the description
              of U.
       = 'N': U is not computed.
[in]JOBV
          JOBV is CHARACTER*1
         Specifies whether to compute the matrix V:
       = 'V': N columns of V are returned in the array V; Jacobi rotations
              are not explicitly accumulated.
       = 'J': N columns of V are returned in the array V, but they are
              computed as the product of Jacobi rotations, if JOBT .EQ. 'N'.
       = 'W': V may be used as workspace of length N*N. See the description
              of V.
       = 'N': V is not computed.
[in]JOBR
          JOBR is CHARACTER*1
         Specifies the RANGE for the singular values. Issues the licence to
         set to zero small positive singular values if they are outside
         specified range. If A .NE. 0 is scaled so that the largest singular
         value of c*A is around SQRT(BIG), BIG=DLAMCH('O'), then JOBR issues
         the licence to kill columns of A whose norm in c*A is less than
         SQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
         where SFMIN=DLAMCH('S'), EPSLN=DLAMCH('E').
       = 'N': Do not kill small columns of c*A. This option assumes that
              BLAS and QR factorizations and triangular solvers are
              implemented to work in that range. If the condition of A
              is greater than BIG, use ZGESVJ.
       = 'R': RESTRICTED range for sigma(c*A) is [SQRT(SFMIN), SQRT(BIG)]
              (roughly, as described above). This option is recommended.
                                             ===========================
         For computing the singular values in the FULL range [SFMIN,BIG]
         use ZGESVJ.
[in]JOBT
          JOBT is CHARACTER*1
         If the matrix is square then the procedure may determine to use
         transposed A if A^* seems to be better with respect to convergence.
         If the matrix is not square, JOBT is ignored. 
         The decision is based on two values of entropy over the adjoint
         orbit of A^* * A. See the descriptions of WORK(6) and WORK(7).
       = 'T': transpose if entropy test indicates possibly faster
         convergence of Jacobi process if A^* is taken as input. If A is
         replaced with A^*, then the row pivoting is included automatically.
       = 'N': do not speculate.
         The option 'T' can be used to compute only the singular values, or
         the full SVD (U, SIGMA and V). For only one set of singular vectors
         (U or V), the caller should provide both U and V, as one of the
         matrices is used as workspace if the matrix A is transposed.
         The implementer can easily remove this constraint and make the
         code more complicated. See the descriptions of U and V.
         In general, this option is considered experimental, and 'N'; should
         be preferred. This is subject to changes in the future.
[in]JOBP
          JOBP is CHARACTER*1
         Issues the licence to introduce structured perturbations to drown
         denormalized numbers. This licence should be active if the
         denormals are poorly implemented, causing slow computation,
         especially in cases of fast convergence (!). For details see [1,2].
         For the sake of simplicity, this perturbations are included only
         when the full SVD or only the singular values are requested. The
         implementer/user can easily add the perturbation for the cases of
         computing one set of singular vectors.
       = 'P': introduce perturbation
       = 'N': do not perturb
[in]M
          M is INTEGER
         The number of rows of the input matrix A.  M >= 0.
[in]N
          N is INTEGER
         The number of columns of the input matrix A. M >= N >= 0.
[in,out]A
          A is COMPLEX*16 array, dimension (LDA,N)
          On entry, the M-by-N matrix A.
[in]LDA
          LDA is INTEGER
          The leading dimension of the array A.  LDA >= max(1,M).
[out]SVA
          SVA is DOUBLE PRECISION array, dimension (N)
          On exit,
          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
            computation SVA contains Euclidean column norms of the
            iterated matrices in the array A.
          - For WORK(1) .NE. WORK(2): The singular values of A are
            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
            sigma_max(A) overflows or if small singular values have been
            saved from underflow by scaling the input matrix A.
          - If JOBR='R' then some of the singular values may be returned
            as exact zeros obtained by "set to zero" because they are
            below the numerical rank threshold or are denormalized numbers.
[out]U
          U is COMPLEX*16 array, dimension ( LDU, N )
          If JOBU = 'U', then U contains on exit the M-by-N matrix of
                         the left singular vectors.
          If JOBU = 'F', then U contains on exit the M-by-M matrix of
                         the left singular vectors, including an ONB
                         of the orthogonal complement of the Range(A).
          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
                         then U is used as workspace if the procedure
                         replaces A with A^*. In that case, [V] is computed
                         in U as left singular vectors of A^* and then
                         copied back to the V array. This 'W' option is just
                         a reminder to the caller that in this case U is
                         reserved as workspace of length N*N.
          If JOBU = 'N'  U is not referenced, unless JOBT='T'.
[in]LDU
          LDU is INTEGER
          The leading dimension of the array U,  LDU >= 1.
          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
[out]V
          V is COMPLEX*16 array, dimension ( LDV, N )
          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
                         the right singular vectors;
          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
                         then V is used as workspace if the pprocedure
                         replaces A with A^*. In that case, [U] is computed
                         in V as right singular vectors of A^* and then
                         copied back to the U array. This 'W' option is just
                         a reminder to the caller that in this case V is
                         reserved as workspace of length N*N.
          If JOBV = 'N'  V is not referenced, unless JOBT='T'.
[in]LDV
          LDV is INTEGER
          The leading dimension of the array V,  LDV >= 1.
          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
[out]CWORK
          CWORK is COMPLEX*16 array, dimension (MAX(2,LWORK))
          If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or
          LRWORK=-1), then on exit CWORK(1) contains the required length of
          CWORK for the job parameters used in the call.
[in]LWORK
          LWORK is INTEGER
          Length of CWORK to confirm proper allocation of workspace.
          LWORK depends on the job:

          1. If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
            1.1 .. no scaled condition estimate required (JOBA.NE.'E'.AND.JOBA.NE.'G'):
               LWORK >= 2*N+1. This is the minimal requirement.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= N + (N+1)*NB. Here NB is the optimal
               block size for ZGEQP3 and ZGEQRF.
               In general, optimal LWORK is computed as
               LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF), LWORK(ZGESVJ)).
            1.2. .. an estimate of the scaled condition number of A is
               required (JOBA='E', or 'G'). In this case, LWORK the minimal
               requirement is LWORK >= N*N + 2*N.
               ->> For optimal performance (blocked code) the optimal value
               is LWORK >= max(N+(N+1)*NB, N*N+2*N)=N**2+2*N.
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF), LWORK(ZGESVJ),
                            N*N+LWORK(ZPOCON)).
          2. If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
             (JOBU.EQ.'N')
            2.1   .. no scaled condition estimate requested (JOBE.EQ.'N'):    
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance, 
               LWORK >= max(N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for ZGEQP3, ZGEQRF, ZGELQ,
               ZUNMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(ZGEQP3), N+LWORK(ZGESVJ),
                       N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)).
            2.2 .. an estimate of the scaled condition number of A is
               required (JOBA='E', or 'G').
            -> the minimal requirement is LWORK >= 3*N.      
            -> For optimal performance, 
               LWORK >= max(N+(N+1)*NB, 2*N,2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for ZGEQP3, ZGEQRF, ZGELQ,
               ZUNMLQ. In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(ZGEQP3), LWORK(ZPOCON), N+LWORK(ZGESVJ),
                       N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)).   
          3. If SIGMA and the left singular vectors are needed
            3.1  .. no scaled condition estimate requested (JOBE.EQ.'N'):
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance:
               if JOBU.EQ.'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for ZGEQP3, ZGEQRF, ZUNMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(ZGEQP3), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)). 
            3.2  .. an estimate of the scaled condition number of A is
               required (JOBA='E', or 'G').
            -> the minimal requirement is LWORK >= 3*N.
            -> For optimal performance:
               if JOBU.EQ.'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
               where NB is the optimal block size for ZGEQP3, ZGEQRF, ZUNMQR.
               In general, the optimal length LWORK is computed as
               LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZPOCON),
                        2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)).
          4. If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and 
            4.1. if JOBV.EQ.'V'  
               the minimal requirement is LWORK >= 5*N+2*N*N. 
            4.2. if JOBV.EQ.'J' the minimal requirement is 
               LWORK >= 4*N+N*N.
            In both cases, the allocated CWORK can accomodate blocked runs
            of ZGEQP3, ZGEQRF, ZGELQF, SUNMQR, ZUNMLQ.

          If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or
          LRWORK=-1), then on exit CWORK(1) contains the optimal and CWORK(2) contains the
          minimal length of CWORK for the job parameters used in the call.
[out]RWORK
          RWORK is DOUBLE PRECISION array, dimension (MAX(7,LWORK))
          On exit,
          RWORK(1) = Determines the scaling factor SCALE = RWORK(2) / RWORK(1)
                    such that SCALE*SVA(1:N) are the computed singular values
                    of A. (See the description of SVA().)
          RWORK(2) = See the description of RWORK(1).
          RWORK(3) = SCONDA is an estimate for the condition number of
                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')
                    SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
                    It is computed using SPOCON. It holds
                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
                    where R is the triangular factor from the QRF of A.
                    However, if R is truncated and the numerical rank is
                    determined to be strictly smaller than N, SCONDA is
                    returned as -1, thus indicating that the smallest
                    singular values might be lost.

          If full SVD is needed, the following two condition numbers are
          useful for the analysis of the algorithm. They are provied for
          a developer/implementer who is familiar with the details of
          the method.

          RWORK(4) = an estimate of the scaled condition number of the
                    triangular factor in the first QR factorization.
          RWORK(5) = an estimate of the scaled condition number of the
                    triangular factor in the second QR factorization.
          The following two parameters are computed if JOBT .EQ. 'T'.
          They are provided for a developer/implementer who is familiar
          with the details of the method.
          RWORK(6) = the entropy of A^* * A :: this is the Shannon entropy
                    of diag(A^* * A) / Trace(A^* * A) taken as point in the
                    probability simplex.
          RWORK(7) = the entropy of A * A^*. (See the description of RWORK(6).)
          If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or
          LRWORK=-1), then on exit RWORK(1) contains the required length of
          RWORK for the job parameters used in the call.
[in]LRWORK
          LRWORK is INTEGER
          Length of RWORK to confirm proper allocation of workspace.
          LRWORK depends on the job:

       1. If only the singular values are requested i.e. if
          LSAME(JOBU,'N') .AND. LSAME(JOBV,'N')
          then:
          1.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
               then: LRWORK = max( 7, 2 * M ).
          1.2. Otherwise, LRWORK  = max( 7,  N ).
       2. If singular values with the right singular vectors are requested
          i.e. if
          (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) .AND.
          .NOT.(LSAME(JOBU,'U').OR.LSAME(JOBU,'F'))
          then:
          2.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, 2 * M ).
          2.2. Otherwise, LRWORK  = max( 7,  N ).
       3. If singular values with the left singular vectors are requested, i.e. if
          (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
          .NOT.(LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
          then:
          3.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, 2 * M ).
          3.2. Otherwise, LRWORK  = max( 7,  N ).
       4. If singular values with both the left and the right singular vectors
          are requested, i.e. if
          (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND.
          (LSAME(JOBV,'V').OR.LSAME(JOBV,'J'))
          then:
          4.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'),
          then LRWORK = max( 7, 2 * M ).
          4.2. Otherwise, LRWORK  = max( 7, N ).

          If, on entry, LRWORK = -1 ot LWORK=-1, a workspace query is assumed and 
          the length of RWORK is returned in RWORK(1). 
[out]IWORK
          IWORK is INTEGER array, of dimension at least 4, that further depends 
          on the job:

          1. If only the singular values are requested then:
             If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
             then the length of IWORK is N+M; otherwise the length of IWORK is N.
          2. If the singular values and the right singular vectors are requested then:
             If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
             then the length of IWORK is N+M; otherwise the length of IWORK is N. 
          3. If the singular values and the left singular vectors are requested then:
             If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
             then the length of IWORK is N+M; otherwise the length of IWORK is N. 
          4. If the singular values with both the left and the right singular vectors
             are requested, then:      
             4.1. If LSAME(JOBV,'J') the length of IWORK is determined as follows:
                  If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
                  then the length of IWORK is N+M; otherwise the length of IWORK is N. 
             4.2. If LSAME(JOBV,'V') the length of IWORK is determined as follows:
                  If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) 
                  then the length of IWORK is 2*N+M; otherwise the length of IWORK is 2*N.
        
          On exit,
          IWORK(1) = the numerical rank determined after the initial
                     QR factorization with pivoting. See the descriptions
                     of JOBA and JOBR.
          IWORK(2) = the number of the computed nonzero singular values
          IWORK(3) = if nonzero, a warning message:
                     If IWORK(3).EQ.1 then some of the column norms of A
                     were denormalized floats. The requested high accuracy
                     is not warranted by the data.
          IWORK(4) = 1 or -1. If IWORK(4) .EQ. 1, then the procedure used A^* to
                     do the job as specified by the JOB parameters.
          If the call to ZGEJSV is a workspace query (indicated by LWORK .EQ. -1 or
          LRWORK .EQ. -1), then on exit IWORK(1) contains the required length of 
          IWORK for the job parameters used in the call.
[out]INFO
          INFO is INTEGER
           < 0  : if INFO = -i, then the i-th argument had an illegal value.
           = 0 :  successful exit;
           > 0 :  ZGEJSV  did not converge in the maximal allowed number
                  of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Date
June 2016
Further Details:
  ZGEJSV implements a preconditioned Jacobi SVD algorithm. It uses ZGEQP3,
  ZGEQRF, and ZGELQF as preprocessors and preconditioners. Optionally, an
  additional row pivoting can be used as a preprocessor, which in some
  cases results in much higher accuracy. An example is matrix A with the
  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
  diagonal matrices and C is well-conditioned matrix. In that case, complete
  pivoting in the first QR factorizations provides accuracy dependent on the
  condition number of C, and independent of D1, D2. Such higher accuracy is
  not completely understood theoretically, but it works well in practice.
  Further, if A can be written as A = B*D, with well-conditioned B and some
  diagonal D, then the high accuracy is guaranteed, both theoretically and
  in software, independent of D. For more details see [1], [2].
     The computational range for the singular values can be the full range
  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
  & LAPACK routines called by ZGEJSV are implemented to work in that range.
  If that is not the case, then the restriction for safe computation with
  the singular values in the range of normalized IEEE numbers is that the
  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
  overflow. This code (ZGEJSV) is best used in this restricted range,
  meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
  returned as zeros. See JOBR for details on this.
     Further, this implementation is somewhat slower than the one described
  in [1,2] due to replacement of some non-LAPACK components, and because
  the choice of some tuning parameters in the iterative part (ZGESVJ) is
  left to the implementer on a particular machine.
     The rank revealing QR factorization (in this code: ZGEQP3) should be
  implemented as in [3]. We have a new version of ZGEQP3 under development
  that is more robust than the current one in LAPACK, with a cleaner cut in
  rank deficient cases. It will be available in the SIGMA library [4].
  If M is much larger than N, it is obvious that the initial QRF with
  column pivoting can be preprocessed by the QRF without pivoting. That
  well known trick is not used in ZGEJSV because in some cases heavy row
  weighting can be treated with complete pivoting. The overhead in cases
  M much larger than N is then only due to pivoting, but the benefits in
  terms of accuracy have prevailed. The implementer/user can incorporate
  this extra QRF step easily. The implementer can also improve data movement
  (matrix transpose, matrix copy, matrix transposed copy) - this
  implementation of ZGEJSV uses only the simplest, naive data movement.
Contributor:
Zlatko Drmac, Department of Mathematics, Faculty of Science, University of Zagreb (Zagreb, Croatia); drmac.nosp@m.@mat.nosp@m.h.hr
References:
 [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
     LAPACK Working note 169.
 [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
     LAPACK Working note 170.
 [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
     factorization software - a case study.
     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
     LAPACK Working note 176.
 [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
     QSVD, (H,K)-SVD computations.
     Department of Mathematics, University of Zagreb, 2008, 2016.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac.nosp@m.@mat.nosp@m.h.hr. Thank you.

Definition at line 571 of file zgejsv.f.

571 *
572 * -- LAPACK computational routine (version 3.7.0) --
573 * -- LAPACK is a software package provided by Univ. of Tennessee, --
574 * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
575 * December 2016
576 *
577 * .. Scalar Arguments ..
578  IMPLICIT NONE
579  INTEGER info, lda, ldu, ldv, lwork, lrwork, m, n
580 * ..
581 * .. Array Arguments ..
582  COMPLEX*16 a( lda, * ), u( ldu, * ), v( ldv, * ),
583  $ cwork( lwork )
584  DOUBLE PRECISION sva( n ), rwork( lrwork )
585  INTEGER iwork( * )
586  CHARACTER*1 joba, jobp, jobr, jobt, jobu, jobv
587 * ..
588 *
589 * ===========================================================================
590 *
591 * .. Local Parameters ..
592  DOUBLE PRECISION zero, one
593  parameter( zero = 0.0d0, one = 1.0d0 )
594  COMPLEX*16 czero, cone
595  parameter( czero = ( 0.0d0, 0.0d0 ), cone = ( 1.0d0, 0.0d0 ) )
596 * ..
597 * .. Local Scalars ..
598  COMPLEX*16 ctemp
599  DOUBLE PRECISION aapp, aaqq, aatmax, aatmin, big, big1,
600  $ cond_ok, condr1, condr2, entra, entrat, epsln,
601  $ maxprj, scalem, sconda, sfmin, small, temp1,
602  $ uscal1, uscal2, xsc
603  INTEGER ierr, n1, nr, numrank, p, q, warning
604  LOGICAL almort, defr, errest, goscal, jracc, kill, lquery,
605  $ lsvec, l2aber, l2kill, l2pert, l2rank, l2tran, noscal,
606  $ rowpiv, rsvec, transp
607 *
608  INTEGER optwrk, minwrk, minrwrk, miniwrk
609  INTEGER lwcon, lwlqf, lwqp3, lwqrf, lwunmlq, lwunmqr, lwunmqrm,
610  $ lwsvdj, lwsvdjv, lrwqp3, lrwcon, lrwsvdj, iwoff
611  INTEGER lwrk_zgelqf, lwrk_zgeqp3, lwrk_zgeqp3n, lwrk_zgeqrf,
612  $ lwrk_zgesvj, lwrk_zgesvjv, lwrk_zgesvju, lwrk_zunmlq,
613  $ lwrk_zunmqr, lwrk_zunmqrm
614 * ..
615 * .. Local Arrays
616  COMPLEX*16 cdummy(1)
617  DOUBLE PRECISION rdummy(1)
618 *
619 * .. Intrinsic Functions ..
620  INTRINSIC abs, dcmplx, conjg, dlog, max, min, dble, nint, sqrt
621 * ..
622 * .. External Functions ..
623  DOUBLE PRECISION dlamch, dznrm2
624  INTEGER idamax, izamax
625  LOGICAL lsame
626  EXTERNAL idamax, izamax, lsame, dlamch, dznrm2
627 * ..
628 * .. External Subroutines ..
629  EXTERNAL dlassq, zcopy, zgelqf, zgeqp3, zgeqrf, zlacpy, zlapmr,
632  $ xerbla
633 *
634  EXTERNAL zgesvj
635 * ..
636 *
637 * Test the input arguments
638 *
639  lsvec = lsame( jobu, 'U' ) .OR. lsame( jobu, 'F' )
640  jracc = lsame( jobv, 'J' )
641  rsvec = lsame( jobv, 'V' ) .OR. jracc
642  rowpiv = lsame( joba, 'F' ) .OR. lsame( joba, 'G' )
643  l2rank = lsame( joba, 'R' )
644  l2aber = lsame( joba, 'A' )
645  errest = lsame( joba, 'E' ) .OR. lsame( joba, 'G' )
646  l2tran = lsame( jobt, 'T' ) .AND. ( m .EQ. n )
647  l2kill = lsame( jobr, 'R' )
648  defr = lsame( jobr, 'N' )
649  l2pert = lsame( jobp, 'P' )
650 *
651  lquery = ( lwork .EQ. -1 ) .OR. ( lrwork .EQ. -1 )
652 *
653  IF ( .NOT.(rowpiv .OR. l2rank .OR. l2aber .OR.
654  $ errest .OR. lsame( joba, 'C' ) )) then
655  info = - 1
656  ELSE IF ( .NOT.( lsvec .OR. lsame( jobu, 'N' ) .OR.
657  $ ( lsame( jobu, 'W' ) .AND. rsvec .AND. l2tran ) ) ) then
658  info = - 2
659  ELSE IF ( .NOT.( rsvec .OR. lsame( jobv, 'N' ) .OR.
660  $ ( lsame( jobv, 'W' ) .AND. lsvec .AND. l2tran ) ) ) then
661  info = - 3
662  ELSE IF ( .NOT. ( l2kill .OR. defr ) ) then
663  info = - 4
664  ELSE IF ( .NOT. ( lsame(jobt,'T') .OR. lsame(jobt,'N') ) ) then
665  info = - 5
666  ELSE IF ( .NOT. ( l2pert .OR. lsame( jobp, 'N' ) ) ) then
667  info = - 6
668  ELSE IF ( m .LT. 0 ) then
669  info = - 7
670  ELSE IF ( ( n .LT. 0 ) .OR. ( n .GT. m ) ) then
671  info = - 8
672  ELSE IF ( lda .LT. m ) then
673  info = - 10
674  ELSE IF ( lsvec .AND. ( ldu .LT. m ) ) then
675  info = - 13
676  ELSE IF ( rsvec .AND. ( ldv .LT. n ) ) then
677  info = - 15
678  else
679 * #:)
680  info = 0
681  END IF
682 *
683  IF ( info .EQ. 0 ) THEN
684 * .. compute the minimal and the optimal workspace lengths
685 * [[The expressions for computing the minimal and the optimal
686 * values of LCWORK, LRWORK are written with a lot of redundancy and
687 * can be simplified. However, this verbose form is useful for
688 * maintenance and modifications of the code.]]
689 *
690 * .. minimal workspace length for ZGEQP3 of an M x N matrix,
691 * ZGEQRF of an N x N matrix, ZGELQF of an N x N matrix,
692 * ZUNMLQ for computing N x N matrix, ZUNMQR for computing N x N
693 * matrix, ZUNMQR for computing M x N matrix, respectively.
694  lwqp3 = n+1
695  lwqrf = max( 1, n )
696  lwlqf = max( 1, n )
697  lwunmlq = max( 1, n )
698  lwunmqr = max( 1, n )
699  lwunmqrm = max( 1, m )
700 * .. minimal workspace length for ZPOCON of an N x N matrix
701  lwcon = 2 * n
702 * .. minimal workspace length for ZGESVJ of an N x N matrix,
703 * without and with explicit accumulation of Jacobi rotations
704  lwsvdj = max( 2 * n, 1 )
705  lwsvdjv = max( 2 * n, 1 )
706 * .. minimal REAL workspace length for ZGEQP3, ZPOCON, ZGESVJ
707  lrwqp3 = n
708  lrwcon = n
709  lrwsvdj = n
710  IF ( lquery ) THEN
711  CALL zgeqp3( m, n, a, lda, iwork, cdummy, cdummy, -1,
712  $ rdummy, ierr )
713  lwrk_zgeqp3 = cdummy(1)
714  CALL zgeqrf( n, n, a, lda, cdummy, cdummy,-1, ierr )
715  lwrk_zgeqrf = cdummy(1)
716  CALL zgelqf( n, n, a, lda, cdummy, cdummy,-1, ierr )
717  lwrk_zgelqf = cdummy(1)
718  END IF
719  minwrk = 2
720  optwrk = 2
721  miniwrk = n
722  IF ( .NOT. (lsvec .OR. rsvec ) ) then
723 * .. minimal and optimal sizes of the complex workspace if
724 * only the singular values are requested
725  IF ( errest ) THEN
726  minwrk = max( n+lwqp3, n**2+lwcon, n+lwqrf, lwsvdj )
727  else
728  minwrk = max( n+lwqp3, n+lwqrf, lwsvdj )
729  END IF
730  IF ( lquery ) THEN
731  CALL zgesvj( 'L', 'N', 'N', n, n, a, lda, sva, n, v,
732  $ ldv, cdummy, -1, rdummy, -1, ierr )
733  lwrk_zgesvj = cdummy(1)
734  IF ( errest ) THEN
735  optwrk = max( n+lwrk_zgeqp3, n**2+lwcon,
736  $ n+lwrk_zgeqrf, lwrk_zgesvj )
737  else
738  optwrk = max( n+lwrk_zgeqp3, n+lwrk_zgeqrf,
739  $ lwrk_zgesvj )
740  END IF
741  END IF
742  IF ( l2tran .OR. rowpiv ) THEN
743  IF ( errest ) THEN
744  minrwrk = max( 7, 2*m, lrwqp3, lrwcon, lrwsvdj )
745  else
746  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
747  END IF
748  else
749  IF ( errest ) THEN
750  minrwrk = max( 7, lrwqp3, lrwcon, lrwsvdj )
751  else
752  minrwrk = max( 7, lrwqp3, lrwsvdj )
753  END IF
754  END IF
755  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
756  ELSE IF ( rsvec .AND. (.NOT.lsvec) ) then
757 * .. minimal and optimal sizes of the complex workspace if the
758 * singular values and the right singular vectors are requested
759  IF ( errest ) THEN
760  minwrk = max( n+lwqp3, lwcon, lwsvdj, n+lwlqf,
761  $ 2*n+lwqrf, n+lwsvdj, n+lwunmlq )
762  else
763  minwrk = max( n+lwqp3, lwsvdj, n+lwlqf, 2*n+lwqrf,
764  $ n+lwsvdj, n+lwunmlq )
765  END IF
766  IF ( lquery ) then
767  CALL zgesvj( 'L', 'U', 'N', n,n, u, ldu, sva, n, a,
768  $ lda, cdummy, -1, rdummy, -1, ierr )
769  lwrk_zgesvj = cdummy(1)
770  CALL zunmlq( 'L', 'C', n, n, n, a, lda, cdummy,
771  $ v, ldv, cdummy, -1, ierr )
772  lwrk_zunmlq = cdummy(1)
773  IF ( errest ) THEN
774  optwrk = max( n+lwrk_zgeqp3, lwcon, lwrk_zgesvj,
775  $ n+lwrk_zgelqf, 2*n+lwrk_zgeqrf,
776  $ n+lwrk_zgesvj, n+lwrk_zunmlq )
777  else
778  optwrk = max( n+lwrk_zgeqp3, lwrk_zgesvj,n+lwrk_zgelqf,
779  $ 2*n+lwrk_zgeqrf, n+lwrk_zgesvj,
780  $ n+lwrk_zunmlq )
781  END IF
782  END IF
783  IF ( l2tran .OR. rowpiv ) THEN
784  IF ( errest ) THEN
785  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
786  else
787  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
788  END IF
789  else
790  IF ( errest ) THEN
791  minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
792  else
793  minrwrk = max( 7, lrwqp3, lrwsvdj )
794  END IF
795  END IF
796  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
797  ELSE IF ( lsvec .AND. (.NOT.rsvec) ) THEN
798 * .. minimal and optimal sizes of the complex workspace if the
799 * singular values and the left singular vectors are requested
800  IF ( errest ) then
801  minwrk = n + max( lwqp3,lwcon,n+lwqrf,lwsvdj,lwunmqrm )
802  else
803  minwrk = n + max( lwqp3, n+lwqrf, lwsvdj, lwunmqrm )
804  END IF
805  IF ( lquery ) then
806  CALL zgesvj( 'L', 'U', 'N', n,n, u, ldu, sva, n, a,
807  $ lda, cdummy, -1, rdummy, -1, ierr )
808  lwrk_zgesvj = cdummy(1)
809  CALL zunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
810  $ ldu, cdummy, -1, ierr )
811  lwrk_zunmqrm = cdummy(1)
812  IF ( errest ) then
813  optwrk = n + max( lwrk_zgeqp3, lwcon, n+lwrk_zgeqrf,
814  $ lwrk_zgesvj, lwrk_zunmqrm )
815  else
816  optwrk = n + max( lwrk_zgeqp3, n+lwrk_zgeqrf,
817  $ lwrk_zgesvj, lwrk_zunmqrm )
818  END IF
819  END IF
820  IF ( l2tran .OR. rowpiv ) THEN
821  IF ( errest ) THEN
822  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
823  else
824  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj )
825  END IF
826  else
827  IF ( errest ) THEN
828  minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
829  else
830  minrwrk = max( 7, lrwqp3, lrwsvdj )
831  END IF
832  END IF
833  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
834  else
835 * .. minimal and optimal sizes of the complex workspace if the
836 * full SVD is requested
837  IF ( .NOT. jracc ) THEN
838  IF ( errest ) THEN
839  minwrk = max( n+lwqp3, n+lwcon, 2*n+n**2+lwcon,
840  $ 2*n+lwqrf, 2*n+lwqp3,
841  $ 2*n+n**2+n+lwlqf, 2*n+n**2+n+n**2+lwcon,
842  $ 2*n+n**2+n+lwsvdj, 2*n+n**2+n+lwsvdjv,
843  $ 2*n+n**2+n+lwunmqr,2*n+n**2+n+lwunmlq,
844  $ n+n**2+lwsvdj, n+lwunmqrm )
845  else
846  minwrk = max( n+lwqp3, 2*n+n**2+lwcon,
847  $ 2*n+lwqrf, 2*n+lwqp3,
848  $ 2*n+n**2+n+lwlqf, 2*n+n**2+n+n**2+lwcon,
849  $ 2*n+n**2+n+lwsvdj, 2*n+n**2+n+lwsvdjv,
850  $ 2*n+n**2+n+lwunmqr,2*n+n**2+n+lwunmlq,
851  $ n+n**2+lwsvdj, n+lwunmqrm )
852  END IF
853  miniwrk = miniwrk + n
854  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
855  else
856  IF ( errest ) THEN
857  minwrk = max( n+lwqp3, n+lwcon, 2*n+lwqrf,
858  $ 2*n+n**2+lwsvdjv, 2*n+n**2+n+lwunmqr,
859  $ n+lwunmqrm )
860  else
861  minwrk = max( n+lwqp3, 2*n+lwqrf,
862  $ 2*n+n**2+lwsvdjv, 2*n+n**2+n+lwunmqr,
863  $ n+lwunmqrm )
864  END IF
865  IF ( rowpiv .OR. l2tran ) miniwrk = miniwrk + m
866  END IF
867  IF ( lquery ) then
868  CALL zunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
869  $ ldu, cdummy, -1, ierr )
870  lwrk_zunmqrm = cdummy(1)
871  CALL zunmqr( 'L', 'N', n, n, n, a, lda, cdummy, u,
872  $ ldu, cdummy, -1, ierr )
873  lwrk_zunmqr = cdummy(1)
874  IF ( .NOT. jracc ) then
875  CALL zgeqp3( n,n, a, lda, iwork, cdummy,cdummy, -1,
876  $ rdummy, ierr )
877  lwrk_zgeqp3n = cdummy(1)
878  CALL zgesvj( 'L', 'U', 'N', n, n, u, ldu, sva,
879  $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
880  lwrk_zgesvj = cdummy(1)
881  CALL zgesvj( 'U', 'U', 'N', n, n, u, ldu, sva,
882  $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
883  lwrk_zgesvju = cdummy(1)
884  CALL zgesvj( 'L', 'U', 'V', n, n, u, ldu, sva,
885  $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
886  lwrk_zgesvjv = cdummy(1)
887  CALL zunmlq( 'L', 'C', n, n, n, a, lda, cdummy,
888  $ v, ldv, cdummy, -1, ierr )
889  lwrk_zunmlq = cdummy(1)
890  IF ( errest ) THEN
891  optwrk = max( n+lwrk_zgeqp3, n+lwcon,
892  $ 2*n+n**2+lwcon, 2*n+lwrk_zgeqrf,
893  $ 2*n+lwrk_zgeqp3n,
894  $ 2*n+n**2+n+lwrk_zgelqf,
895  $ 2*n+n**2+n+n**2+lwcon,
896  $ 2*n+n**2+n+lwrk_zgesvj,
897  $ 2*n+n**2+n+lwrk_zgesvjv,
898  $ 2*n+n**2+n+lwrk_zunmqr,
899  $ 2*n+n**2+n+lwrk_zunmlq,
900  $ n+n**2+lwrk_zgesvju,
901  $ n+lwrk_zunmqrm )
902  else
903  optwrk = max( n+lwrk_zgeqp3,
904  $ 2*n+n**2+lwcon, 2*n+lwrk_zgeqrf,
905  $ 2*n+lwrk_zgeqp3n,
906  $ 2*n+n**2+n+lwrk_zgelqf,
907  $ 2*n+n**2+n+n**2+lwcon,
908  $ 2*n+n**2+n+lwrk_zgesvj,
909  $ 2*n+n**2+n+lwrk_zgesvjv,
910  $ 2*n+n**2+n+lwrk_zunmqr,
911  $ 2*n+n**2+n+lwrk_zunmlq,
912  $ n+n**2+lwrk_zgesvju,
913  $ n+lwrk_zunmqrm )
914  END IF
915  else
916  CALL zgesvj( 'L', 'U', 'V', n, n, u, ldu, sva,
917  $ n, v, ldv, cdummy, -1, rdummy, -1, ierr )
918  lwrk_zgesvjv = cdummy(1)
919  CALL zunmqr( 'L', 'N', n, n, n, cdummy, n, cdummy,
920  $ v, ldv, cdummy, -1, ierr )
921  lwrk_zunmqr = cdummy(1)
922  CALL zunmqr( 'L', 'N', m, n, n, a, lda, cdummy, u,
923  $ ldu, cdummy, -1, ierr )
924  lwrk_zunmqrm = cdummy(1)
925  IF ( errest ) THEN
926  optwrk = max( n+lwrk_zgeqp3, n+lwcon,
927  $ 2*n+lwrk_zgeqrf, 2*n+n**2,
928  $ 2*n+n**2+lwrk_zgesvjv,
929  $ 2*n+n**2+n+lwrk_zunmqr,n+lwrk_zunmqrm )
930  else
931  optwrk = max( n+lwrk_zgeqp3, 2*n+lwrk_zgeqrf,
932  $ 2*n+n**2, 2*n+n**2+lwrk_zgesvjv,
933  $ 2*n+n**2+n+lwrk_zunmqr,
934  $ n+lwrk_zunmqrm )
935  END IF
936  END IF
937  END IF
938  IF ( l2tran .OR. rowpiv ) THEN
939  minrwrk = max( 7, 2*m, lrwqp3, lrwsvdj, lrwcon )
940  else
941  minrwrk = max( 7, lrwqp3, lrwsvdj, lrwcon )
942  END IF
943  END IF
944  minwrk = max( 2, minwrk )
945  optwrk = max( 2, optwrk )
946  IF ( lwork .LT. minwrk .AND. (.NOT.lquery) ) info = - 17
947  IF ( lrwork .LT. minrwrk .AND. (.NOT.lquery) ) info = - 19
948  END IF
949 *
950  IF ( info .NE. 0 ) then
951 * #:(
952  CALL xerbla( 'ZGEJSV', - info )
953  return
954  ELSE IF ( lquery ) then
955  cwork(1) = optwrk
956  cwork(2) = minwrk
957  rwork(1) = minrwrk
958  iwork(1) = max( 4, miniwrk )
959  RETURN
960  END IF
961 *
962 * Quick return for void matrix (Y3K safe)
963 * #:)
964  IF ( ( m .EQ. 0 ) .OR. ( n .EQ. 0 ) ) then
965  iwork(1:4) = 0
966  rwork(1:7) = 0
967  return
968  ENDIF
969 *
970 * Determine whether the matrix U should be M x N or M x M
971 *
972  IF ( lsvec ) then
973  n1 = n
974  IF ( lsame( jobu, 'F' ) ) n1 = m
975  END IF
976 *
977 * Set numerical parameters
978 *
979 *! NOTE: Make sure DLAMCH() does not fail on the target architecture.
980 *
981  epsln = dlamch('Epsilon')
982  sfmin = dlamch('SafeMinimum')
983  small = sfmin / epsln
984  big = dlamch('O')
985 * BIG = ONE / SFMIN
986 *
987 * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
988 *
989 *(!) If necessary, scale SVA() to protect the largest norm from
990 * overflow. It is possible that this scaling pushes the smallest
991 * column norm left from the underflow threshold (extreme case).
992 *
993  scalem = one / sqrt(dble(m)*dble(n))
994  noscal = .true.
995  goscal = .true.
996  DO 1874 p = 1, n
997  aapp = zero
998  aaqq = one
999  CALL zlassq( m, a(1,p), 1, aapp, aaqq )
1000  IF ( aapp .GT. big ) then
1001  info = - 9
1002  CALL xerbla( 'ZGEJSV', -info )
1003  return
1004  END IF
1005  aaqq = sqrt(aaqq)
1006  IF ( ( aapp .LT. (big / aaqq) ) .AND. noscal ) then
1007  sva(p) = aapp * aaqq
1008  else
1009  noscal = .false.
1010  sva(p) = aapp * ( aaqq * scalem )
1011  IF ( goscal ) then
1012  goscal = .false.
1013  CALL dscal( p-1, scalem, sva, 1 )
1014  END IF
1015  END IF
1016  1874 continue
1017 *
1018  IF ( noscal ) scalem = one
1019 *
1020  aapp = zero
1021  aaqq = big
1022  DO 4781 p = 1, n
1023  aapp = max( aapp, sva(p) )
1024  IF ( sva(p) .NE. zero ) aaqq = min( aaqq, sva(p) )
1025  4781 continue
1026 *
1027 * Quick return for zero M x N matrix
1028 * #:)
1029  IF ( aapp .EQ. zero ) then
1030  IF ( lsvec ) CALL zlaset( 'G', m, n1, czero, cone, u, ldu )
1031  IF ( rsvec ) CALL zlaset( 'G', n, n, czero, cone, v, ldv )
1032  rwork(1) = one
1033  rwork(2) = one
1034  IF ( errest ) rwork(3) = one
1035  IF ( lsvec .AND. rsvec ) then
1036  rwork(4) = one
1037  rwork(5) = one
1038  END IF
1039  IF ( l2tran ) then
1040  rwork(6) = zero
1041  rwork(7) = zero
1042  END IF
1043  iwork(1) = 0
1044  iwork(2) = 0
1045  iwork(3) = 0
1046  iwork(4) = -1
1047  return
1048  END IF
1049 *
1050 * Issue warning if denormalized column norms detected. Override the
1051 * high relative accuracy request. Issue licence to kill nonzero columns
1052 * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
1053 * #:(
1054  warning = 0
1055  IF ( aaqq .LE. sfmin ) then
1056  l2rank = .true.
1057  l2kill = .true.
1058  warning = 1
1059  END IF
1060 *
1061 * Quick return for one-column matrix
1062 * #:)
1063  IF ( n .EQ. 1 ) then
1064 *
1065  IF ( lsvec ) then
1066  CALL zlascl( 'G',0,0,sva(1),scalem, m,1,a(1,1),lda,ierr )
1067  CALL zlacpy( 'A', m, 1, a, lda, u, ldu )
1068 * computing all M left singular vectors of the M x 1 matrix
1069  IF ( n1 .NE. n ) then
1070  CALL zgeqrf( m, n, u,ldu, cwork, cwork(n+1),lwork-n,ierr )
1071  CALL zungqr( m,n1,1, u,ldu,cwork,cwork(n+1),lwork-n,ierr )
1072  CALL zcopy( m, a(1,1), 1, u(1,1), 1 )
1073  END IF
1074  END IF
1075  IF ( rsvec ) then
1076  v(1,1) = cone
1077  END IF
1078  IF ( sva(1) .LT. (big*scalem) ) then
1079  sva(1) = sva(1) / scalem
1080  scalem = one
1081  END IF
1082  rwork(1) = one / scalem
1083  rwork(2) = one
1084  IF ( sva(1) .NE. zero ) then
1085  iwork(1) = 1
1086  IF ( ( sva(1) / scalem) .GE. sfmin ) then
1087  iwork(2) = 1
1088  else
1089  iwork(2) = 0
1090  END IF
1091  else
1092  iwork(1) = 0
1093  iwork(2) = 0
1094  END IF
1095  iwork(3) = 0
1096  iwork(4) = -1
1097  IF ( errest ) rwork(3) = one
1098  IF ( lsvec .AND. rsvec ) then
1099  rwork(4) = one
1100  rwork(5) = one
1101  END IF
1102  IF ( l2tran ) then
1103  rwork(6) = zero
1104  rwork(7) = zero
1105  END IF
1106  return
1107 *
1108  END IF
1109 *
1110  transp = .false.
1111 *
1112  aatmax = -one
1113  aatmin = big
1114  IF ( rowpiv .OR. l2tran ) then
1115 *
1116 * Compute the row norms, needed to determine row pivoting sequence
1117 * (in the case of heavily row weighted A, row pivoting is strongly
1118 * advised) and to collect information needed to compare the
1119 * structures of A * A^* and A^* * A (in the case L2TRAN.EQ..TRUE.).
1120 *
1121  IF ( l2tran ) then
1122  DO 1950 p = 1, m
1123  xsc = zero
1124  temp1 = one
1125  CALL zlassq( n, a(p,1), lda, xsc, temp1 )
1126 * ZLASSQ gets both the ell_2 and the ell_infinity norm
1127 * in one pass through the vector
1128  rwork(m+p) = xsc * scalem
1129  rwork(p) = xsc * (scalem*sqrt(temp1))
1130  aatmax = max( aatmax, rwork(p) )
1131  IF (rwork(p) .NE. zero)
1132  $ aatmin = min(aatmin,rwork(p))
1133  1950 continue
1134  else
1135  DO 1904 p = 1, m
1136  rwork(m+p) = scalem*abs( a(p,izamax(n,a(p,1),lda)) )
1137  aatmax = max( aatmax, rwork(m+p) )
1138  aatmin = min( aatmin, rwork(m+p) )
1139  1904 continue
1140  END IF
1141 *
1142  END IF
1143 *
1144 * For square matrix A try to determine whether A^* would be better
1145 * input for the preconditioned Jacobi SVD, with faster convergence.
1146 * The decision is based on an O(N) function of the vector of column
1147 * and row norms of A, based on the Shannon entropy. This should give
1148 * the right choice in most cases when the difference actually matters.
1149 * It may fail and pick the slower converging side.
1150 *
1151  entra = zero
1152  entrat = zero
1153  IF ( l2tran ) then
1154 *
1155  xsc = zero
1156  temp1 = one
1157  CALL dlassq( n, sva, 1, xsc, temp1 )
1158  temp1 = one / temp1
1159 *
1160  entra = zero
1161  DO 1113 p = 1, n
1162  big1 = ( ( sva(p) / xsc )**2 ) * temp1
1163  IF ( big1 .NE. zero ) entra = entra + big1 * dlog(big1)
1164  1113 continue
1165  entra = - entra / dlog(dble(n))
1166 *
1167 * Now, SVA().^2/Trace(A^* * A) is a point in the probability simplex.
1168 * It is derived from the diagonal of A^* * A. Do the same with the
1169 * diagonal of A * A^*, compute the entropy of the corresponding
1170 * probability distribution. Note that A * A^* and A^* * A have the
1171 * same trace.
1172 *
1173  entrat = zero
1174  DO 1114 p = 1, m
1175  big1 = ( ( rwork(p) / xsc )**2 ) * temp1
1176  IF ( big1 .NE. zero ) entrat = entrat + big1 * dlog(big1)
1177  1114 continue
1178  entrat = - entrat / dlog(dble(m))
1179 *
1180 * Analyze the entropies and decide A or A^*. Smaller entropy
1181 * usually means better input for the algorithm.
1182 *
1183  transp = ( entrat .LT. entra )
1184 *
1185 * If A^* is better than A, take the adjoint of A. This is allowed
1186 * only for square matrices, M=N.
1187  IF ( transp ) then
1188 * In an optimal implementation, this trivial transpose
1189 * should be replaced with faster transpose.
1190  DO 1115 p = 1, n - 1
1191  a(p,p) = conjg(a(p,p))
1192  DO 1116 q = p + 1, n
1193  ctemp = conjg(a(q,p))
1194  a(q,p) = conjg(a(p,q))
1195  a(p,q) = ctemp
1196  1116 continue
1197  1115 continue
1198  a(n,n) = conjg(a(n,n))
1199  DO 1117 p = 1, n
1200  rwork(m+p) = sva(p)
1201  sva(p) = rwork(p)
1202 * previously computed row 2-norms are now column 2-norms
1203 * of the transposed matrix
1204  1117 continue
1205  temp1 = aapp
1206  aapp = aatmax
1207  aatmax = temp1
1208  temp1 = aaqq
1209  aaqq = aatmin
1210  aatmin = temp1
1211  kill = lsvec
1212  lsvec = rsvec
1213  rsvec = kill
1214  IF ( lsvec ) n1 = n
1215 *
1216  rowpiv = .true.
1217  END IF
1218 *
1219  END IF
1220 * END IF L2TRAN
1221 *
1222 * Scale the matrix so that its maximal singular value remains less
1223 * than SQRT(BIG) -- the matrix is scaled so that its maximal column
1224 * has Euclidean norm equal to SQRT(BIG/N). The only reason to keep
1225 * SQRT(BIG) instead of BIG is the fact that ZGEJSV uses LAPACK and
1226 * BLAS routines that, in some implementations, are not capable of
1227 * working in the full interval [SFMIN,BIG] and that they may provoke
1228 * overflows in the intermediate results. If the singular values spread
1229 * from SFMIN to BIG, then ZGESVJ will compute them. So, in that case,
1230 * one should use ZGESVJ instead of ZGEJSV.
1231 * >> change in the April 2016 update: allow bigger range, i.e. the
1232 * largest column is allowed up to BIG/N and ZGESVJ will do the rest.
1233  big1 = sqrt( big )
1234  temp1 = sqrt( big / dble(n) )
1235 * TEMP1 = BIG/DBLE(N)
1236 *
1237  CALL dlascl( 'G', 0, 0, aapp, temp1, n, 1, sva, n, ierr )
1238  IF ( aaqq .GT. (aapp * sfmin) ) then
1239  aaqq = ( aaqq / aapp ) * temp1
1240  else
1241  aaqq = ( aaqq * temp1 ) / aapp
1242  END IF
1243  temp1 = temp1 * scalem
1244  CALL zlascl( 'G', 0, 0, aapp, temp1, m, n, a, lda, ierr )
1245 *
1246 * To undo scaling at the end of this procedure, multiply the
1247 * computed singular values with USCAL2 / USCAL1.
1248 *
1249  uscal1 = temp1
1250  uscal2 = aapp
1251 *
1252  IF ( l2kill ) then
1253 * L2KILL enforces computation of nonzero singular values in
1254 * the restricted range of condition number of the initial A,
1255 * sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN).
1256  xsc = sqrt( sfmin )
1257  else
1258  xsc = small
1259 *
1260 * Now, if the condition number of A is too big,
1261 * sigma_max(A) / sigma_min(A) .GT. SQRT(BIG/N) * EPSLN / SFMIN,
1262 * as a precaution measure, the full SVD is computed using ZGESVJ
1263 * with accumulated Jacobi rotations. This provides numerically
1264 * more robust computation, at the cost of slightly increased run
1265 * time. Depending on the concrete implementation of BLAS and LAPACK
1266 * (i.e. how they behave in presence of extreme ill-conditioning) the
1267 * implementor may decide to remove this switch.
1268  IF ( ( aaqq.LT.sqrt(sfmin) ) .AND. lsvec .AND. rsvec ) then
1269  jracc = .true.
1270  END IF
1271 *
1272  END IF
1273  IF ( aaqq .LT. xsc ) then
1274  DO 700 p = 1, n
1275  IF ( sva(p) .LT. xsc ) then
1276  CALL zlaset( 'A', m, 1, czero, czero, a(1,p), lda )
1277  sva(p) = zero
1278  END IF
1279  700 continue
1280  END IF
1281 *
1282 * Preconditioning using QR factorization with pivoting
1283 *
1284  IF ( rowpiv ) then
1285 * Optional row permutation (Bjoerck row pivoting):
1286 * A result by Cox and Higham shows that the Bjoerck's
1287 * row pivoting combined with standard column pivoting
1288 * has similar effect as Powell-Reid complete pivoting.
1289 * The ell-infinity norms of A are made nonincreasing.
1290  IF ( ( lsvec .AND. rsvec ) .AND. .NOT.( jracc ) ) THEN
1291  iwoff = 2*n
1292  else
1293  iwoff = n
1294  END IF
1295  DO 1952 p = 1, m - 1
1296  q = idamax( m-p+1, rwork(m+p), 1 ) + p - 1
1297  iwork(iwoff+p) = q
1298  IF ( p .NE. q ) then
1299  temp1 = rwork(m+p)
1300  rwork(m+p) = rwork(m+q)
1301  rwork(m+q) = temp1
1302  END IF
1303  1952 continue
1304  CALL zlaswp( n, a, lda, 1, m-1, iwork(iwoff+1), 1 )
1305  END IF
1306 *
1307 * End of the preparation phase (scaling, optional sorting and
1308 * transposing, optional flushing of small columns).
1309 *
1310 * Preconditioning
1311 *
1312 * If the full SVD is needed, the right singular vectors are computed
1313 * from a matrix equation, and for that we need theoretical analysis
1314 * of the Businger-Golub pivoting. So we use ZGEQP3 as the first RR QRF.
1315 * In all other cases the first RR QRF can be chosen by other criteria
1316 * (eg speed by replacing global with restricted window pivoting, such
1317 * as in xGEQPX from TOMS # 782). Good results will be obtained using
1318 * xGEQPX with properly (!) chosen numerical parameters.
1319 * Any improvement of ZGEQP3 improves overal performance of ZGEJSV.
1320 *
1321 * A * P1 = Q1 * [ R1^* 0]^*:
1322  DO 1963 p = 1, n
1323 * .. all columns are free columns
1324  iwork(p) = 0
1325  1963 continue
1326  CALL zgeqp3( m, n, a, lda, iwork, cwork, cwork(n+1), lwork-n,
1327  $ rwork, ierr )
1328 *
1329 * The upper triangular matrix R1 from the first QRF is inspected for
1330 * rank deficiency and possibilities for deflation, or possible
1331 * ill-conditioning. Depending on the user specified flag L2RANK,
1332 * the procedure explores possibilities to reduce the numerical
1333 * rank by inspecting the computed upper triangular factor. If
1334 * L2RANK or L2ABER are up, then ZGEJSV will compute the SVD of
1335 * A + dA, where ||dA|| <= f(M,N)*EPSLN.
1336 *
1337  nr = 1
1338  IF ( l2aber ) then
1339 * Standard absolute error bound suffices. All sigma_i with
1340 * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
1341 * agressive enforcement of lower numerical rank by introducing a
1342 * backward error of the order of N*EPSLN*||A||.
1343  temp1 = sqrt(dble(n))*epsln
1344  DO 3001 p = 2, n
1345  IF ( abs(a(p,p)) .GE. (temp1*abs(a(1,1))) ) then
1346  nr = nr + 1
1347  else
1348  GO TO 3002
1349  END IF
1350  3001 continue
1351  3002 continue
1352  ELSE IF ( l2rank ) then
1353 * .. similarly as above, only slightly more gentle (less agressive).
1354 * Sudden drop on the diagonal of R1 is used as the criterion for
1355 * close-to-rank-deficient.
1356  temp1 = sqrt(sfmin)
1357  DO 3401 p = 2, n
1358  IF ( ( abs(a(p,p)) .LT. (epsln*abs(a(p-1,p-1))) ) .OR.
1359  $ ( abs(a(p,p)) .LT. small ) .OR.
1360  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3402
1361  nr = nr + 1
1362  3401 continue
1363  3402 continue
1364 *
1365  else
1366 * The goal is high relative accuracy. However, if the matrix
1367 * has high scaled condition number the relative accuracy is in
1368 * general not feasible. Later on, a condition number estimator
1369 * will be deployed to estimate the scaled condition number.
1370 * Here we just remove the underflowed part of the triangular
1371 * factor. This prevents the situation in which the code is
1372 * working hard to get the accuracy not warranted by the data.
1373  temp1 = sqrt(sfmin)
1374  DO 3301 p = 2, n
1375  IF ( ( abs(a(p,p)) .LT. small ) .OR.
1376  $ ( l2kill .AND. (abs(a(p,p)) .LT. temp1) ) ) GO TO 3302
1377  nr = nr + 1
1378  3301 continue
1379  3302 continue
1380 *
1381  END IF
1382 *
1383  almort = .false.
1384  IF ( nr .EQ. n ) then
1385  maxprj = one
1386  DO 3051 p = 2, n
1387  temp1 = abs(a(p,p)) / sva(iwork(p))
1388  maxprj = min( maxprj, temp1 )
1389  3051 continue
1390  IF ( maxprj**2 .GE. one - dble(n)*epsln ) almort = .true.
1391  END IF
1392 *
1393 *
1394  sconda = - one
1395  condr1 = - one
1396  condr2 = - one
1397 *
1398  IF ( errest ) then
1399  IF ( n .EQ. nr ) then
1400  IF ( rsvec ) then
1401 * .. V is available as workspace
1402  CALL zlacpy( 'U', n, n, a, lda, v, ldv )
1403  DO 3053 p = 1, n
1404  temp1 = sva(iwork(p))
1405  CALL zdscal( p, one/temp1, v(1,p), 1 )
1406  3053 continue
1407  IF ( lsvec )then
1408  CALL zpocon( 'U', n, v, ldv, one, temp1,
1409  $ cwork(n+1), rwork, ierr )
1410  else
1411  CALL zpocon( 'U', n, v, ldv, one, temp1,
1412  $ cwork, rwork, ierr )
1413  END IF
1414 *
1415  ELSE IF ( lsvec ) then
1416 * .. U is available as workspace
1417  CALL zlacpy( 'U', n, n, a, lda, u, ldu )
1418  DO 3054 p = 1, n
1419  temp1 = sva(iwork(p))
1420  CALL zdscal( p, one/temp1, u(1,p), 1 )
1421  3054 continue
1422  CALL zpocon( 'U', n, u, ldu, one, temp1,
1423  $ cwork(n+1), rwork, ierr )
1424  else
1425  CALL zlacpy( 'U', n, n, a, lda, cwork, n )
1426 *[] CALL ZLACPY( 'U', N, N, A, LDA, CWORK(N+1), N )
1427 * Change: here index shifted by N to the left, CWORK(1:N)
1428 * not needed for SIGMA only computation
1429  DO 3052 p = 1, n
1430  temp1 = sva(iwork(p))
1431 *[] CALL ZDSCAL( p, ONE/TEMP1, CWORK(N+(p-1)*N+1), 1 )
1432  CALL zdscal( p, one/temp1, cwork((p-1)*n+1), 1 )
1433  3052 continue
1434 * .. the columns of R are scaled to have unit Euclidean lengths.
1435 *[] CALL ZPOCON( 'U', N, CWORK(N+1), N, ONE, TEMP1,
1436 *[] $ CWORK(N+N*N+1), RWORK, IERR )
1437  CALL zpocon( 'U', n, cwork, n, one, temp1,
1438  $ cwork(n*n+1), rwork, ierr )
1439 *
1440  END IF
1441  IF ( temp1 .NE. zero ) THEN
1442  sconda = one / sqrt(temp1)
1443  else
1444  sconda = - one
1445  END IF
1446 * SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1).
1447 * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
1448  else
1449  sconda = - one
1450  END IF
1451  END IF
1452 *
1453  l2pert = l2pert .AND. ( abs( a(1,1)/a(nr,nr) ) .GT. sqrt(big1) )
1454 * If there is no violent scaling, artificial perturbation is not needed.
1455 *
1456 * Phase 3:
1457 *
1458  IF ( .NOT. ( rsvec .OR. lsvec ) ) then
1459 *
1460 * Singular Values only
1461 *
1462 * .. transpose A(1:NR,1:N)
1463  DO 1946 p = 1, min( n-1, nr )
1464  CALL zcopy( n-p, a(p,p+1), lda, a(p+1,p), 1 )
1465  CALL zlacgv( n-p+1, a(p,p), 1 )
1466  1946 continue
1467  IF ( nr .EQ. n ) a(n,n) = conjg(a(n,n))
1468 *
1469 * The following two DO-loops introduce small relative perturbation
1470 * into the strict upper triangle of the lower triangular matrix.
1471 * Small entries below the main diagonal are also changed.
1472 * This modification is useful if the computing environment does not
1473 * provide/allow FLUSH TO ZERO underflow, for it prevents many
1474 * annoying denormalized numbers in case of strongly scaled matrices.
1475 * The perturbation is structured so that it does not introduce any
1476 * new perturbation of the singular values, and it does not destroy
1477 * the job done by the preconditioner.
1478 * The licence for this perturbation is in the variable L2PERT, which
1479 * should be .FALSE. if FLUSH TO ZERO underflow is active.
1480 *
1481  IF ( .NOT. almort ) then
1482 *
1483  IF ( l2pert ) then
1484 * XSC = SQRT(SMALL)
1485  xsc = epsln / dble(n)
1486  DO 4947 q = 1, nr
1487  ctemp = dcmplx(xsc*abs(a(q,q)),zero)
1488  DO 4949 p = 1, n
1489  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1490  $ .OR. ( p .LT. q ) )
1491 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1492  $ a(p,q) = ctemp
1493  4949 continue
1494  4947 continue
1495  else
1496  CALL zlaset( 'U', nr-1,nr-1, czero,czero, a(1,2),lda )
1497  END IF
1498 *
1499 * .. second preconditioning using the QR factorization
1500 *
1501  CALL zgeqrf( n,nr, a,lda, cwork, cwork(n+1),lwork-n, ierr )
1502 *
1503 * .. and transpose upper to lower triangular
1504  DO 1948 p = 1, nr - 1
1505  CALL zcopy( nr-p, a(p,p+1), lda, a(p+1,p), 1 )
1506  CALL zlacgv( nr-p+1, a(p,p), 1 )
1507  1948 continue
1508 *
1509  END IF
1510 *
1511 * Row-cyclic Jacobi SVD algorithm with column pivoting
1512 *
1513 * .. again some perturbation (a "background noise") is added
1514 * to drown denormals
1515  IF ( l2pert ) then
1516 * XSC = SQRT(SMALL)
1517  xsc = epsln / dble(n)
1518  DO 1947 q = 1, nr
1519  ctemp = dcmplx(xsc*abs(a(q,q)),zero)
1520  DO 1949 p = 1, nr
1521  IF ( ( (p.GT.q) .AND. (abs(a(p,q)).LE.temp1) )
1522  $ .OR. ( p .LT. q ) )
1523 * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) )
1524  $ a(p,q) = ctemp
1525  1949 continue
1526  1947 continue
1527  else
1528  CALL zlaset( 'U', nr-1, nr-1, czero, czero, a(1,2), lda )
1529  END IF
1530 *
1531 * .. and one-sided Jacobi rotations are started on a lower
1532 * triangular matrix (plus perturbation which is ignored in
1533 * the part which destroys triangular form (confusing?!))
1534 *
1535  CALL zgesvj( 'L', 'N', 'N', nr, nr, a, lda, sva,
1536  $ n, v, ldv, cwork, lwork, rwork, lrwork, info )
1537 *
1538  scalem = rwork(1)
1539  numrank = nint(rwork(2))
1540 *
1541 *
1542  ELSE IF ( ( rsvec .AND. ( .NOT. lsvec ) .AND. ( .NOT. jracc ) )
1543  $ .OR.
1544  $ ( jracc .AND. ( .NOT. lsvec ) .AND. ( nr .NE. n ) ) ) then
1545 *
1546 * -> Singular Values and Right Singular Vectors <-
1547 *
1548  IF ( almort ) then
1549 *
1550 * .. in this case NR equals N
1551  DO 1998 p = 1, nr
1552  CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1553  CALL zlacgv( n-p+1, v(p,p), 1 )
1554  1998 continue
1555  CALL zlaset( 'U', nr-1,nr-1, czero, czero, v(1,2), ldv )
1556 *
1557  CALL zgesvj( 'L','U','N', n, nr, v, ldv, sva, nr, a, lda,
1558  $ cwork, lwork, rwork, lrwork, info )
1559  scalem = rwork(1)
1560  numrank = nint(rwork(2))
1561 
1562  else
1563 *
1564 * .. two more QR factorizations ( one QRF is not enough, two require
1565 * accumulated product of Jacobi rotations, three are perfect )
1566 *
1567  CALL zlaset( 'L', nr-1,nr-1, czero, czero, a(2,1), lda )
1568  CALL zgelqf( nr,n, a, lda, cwork, cwork(n+1), lwork-n, ierr)
1569  CALL zlacpy( 'L', nr, nr, a, lda, v, ldv )
1570  CALL zlaset( 'U', nr-1,nr-1, czero, czero, v(1,2), ldv )
1571  CALL zgeqrf( nr, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1572  $ lwork-2*n, ierr )
1573  DO 8998 p = 1, nr
1574  CALL zcopy( nr-p+1, v(p,p), ldv, v(p,p), 1 )
1575  CALL zlacgv( nr-p+1, v(p,p), 1 )
1576  8998 continue
1577  CALL zlaset('U', nr-1, nr-1, czero, czero, v(1,2), ldv)
1578 *
1579  CALL zgesvj( 'L', 'U','N', nr, nr, v,ldv, sva, nr, u,
1580  $ ldu, cwork(n+1), lwork-n, rwork, lrwork, info )
1581  scalem = rwork(1)
1582  numrank = nint(rwork(2))
1583  IF ( nr .LT. n ) then
1584  CALL zlaset( 'A',n-nr, nr, czero,czero, v(nr+1,1), ldv )
1585  CALL zlaset( 'A',nr, n-nr, czero,czero, v(1,nr+1), ldv )
1586  CALL zlaset( 'A',n-nr,n-nr,czero,cone, v(nr+1,nr+1),ldv )
1587  END IF
1588 *
1589  CALL zunmlq( 'L', 'C', n, n, nr, a, lda, cwork,
1590  $ v, ldv, cwork(n+1), lwork-n, ierr )
1591 *
1592  END IF
1593 * .. permute the rows of V
1594 * DO 8991 p = 1, N
1595 * CALL ZCOPY( N, V(p,1), LDV, A(IWORK(p),1), LDA )
1596 * 8991 CONTINUE
1597 * CALL ZLACPY( 'All', N, N, A, LDA, V, LDV )
1598  CALL zlapmr( .false., n, n, v, ldv, iwork )
1599 *
1600  IF ( transp ) then
1601  CALL zlacpy( 'A', n, n, v, ldv, u, ldu )
1602  END IF
1603 *
1604  ELSE IF ( jracc .AND. (.NOT. lsvec) .AND. ( nr.EQ. n ) ) THEN
1605 *
1606  CALL zlaset( 'L', n-1,n-1, czero, czero, a(2,1), lda )
1607 *
1608  CALL zgesvj( 'U','N','V', n, n, a, lda, sva, n, v, ldv,
1609  $ cwork, lwork, rwork, lrwork, info )
1610  scalem = rwork(1)
1611  numrank = nint(rwork(2))
1612  CALL zlapmr( .false., n, n, v, ldv, iwork )
1613 *
1614  ELSE IF ( lsvec .AND. ( .NOT. rsvec ) ) then
1615 *
1616 * .. Singular Values and Left Singular Vectors ..
1617 *
1618 * .. second preconditioning step to avoid need to accumulate
1619 * Jacobi rotations in the Jacobi iterations.
1620  DO 1965 p = 1, nr
1621  CALL zcopy( n-p+1, a(p,p), lda, u(p,p), 1 )
1622  CALL zlacgv( n-p+1, u(p,p), 1 )
1623  1965 continue
1624  CALL zlaset( 'U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1625 *
1626  CALL zgeqrf( n, nr, u, ldu, cwork(n+1), cwork(2*n+1),
1627  $ lwork-2*n, ierr )
1628 *
1629  DO 1967 p = 1, nr - 1
1630  CALL zcopy( nr-p, u(p,p+1), ldu, u(p+1,p), 1 )
1631  CALL zlacgv( n-p+1, u(p,p), 1 )
1632  1967 continue
1633  CALL zlaset( 'U', nr-1, nr-1, czero, czero, u(1,2), ldu )
1634 *
1635  CALL zgesvj( 'L', 'U', 'N', nr,nr, u, ldu, sva, nr, a,
1636  $ lda, cwork(n+1), lwork-n, rwork, lrwork, info )
1637  scalem = rwork(1)
1638  numrank = nint(rwork(2))
1639 *
1640  IF ( nr .LT. m ) then
1641  CALL zlaset( 'A', m-nr, nr,czero, czero, u(nr+1,1), ldu )
1642  IF ( nr .LT. n1 ) then
1643  CALL zlaset( 'A',nr, n1-nr, czero, czero, u(1,nr+1),ldu )
1644  CALL zlaset( 'A',m-nr,n1-nr,czero,cone,u(nr+1,nr+1),ldu )
1645  END IF
1646  END IF
1647 *
1648  CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
1649  $ ldu, cwork(n+1), lwork-n, ierr )
1650 *
1651  IF ( rowpiv )
1652  $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
1653 *
1654  DO 1974 p = 1, n1
1655  xsc = one / dznrm2( m, u(1,p), 1 )
1656  CALL zdscal( m, xsc, u(1,p), 1 )
1657  1974 continue
1658 *
1659  IF ( transp ) then
1660  CALL zlacpy( 'A', n, n, u, ldu, v, ldv )
1661  END IF
1662 *
1663  else
1664 *
1665 * .. Full SVD ..
1666 *
1667  IF ( .NOT. jracc ) then
1668 *
1669  IF ( .NOT. almort ) then
1670 *
1671 * Second Preconditioning Step (QRF [with pivoting])
1672 * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1673 * equivalent to an LQF CALL. Since in many libraries the QRF
1674 * seems to be better optimized than the LQF, we do explicit
1675 * transpose and use the QRF. This is subject to changes in an
1676 * optimized implementation of ZGEJSV.
1677 *
1678  DO 1968 p = 1, nr
1679  CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
1680  CALL zlacgv( n-p+1, v(p,p), 1 )
1681  1968 continue
1682 *
1683 * .. the following two loops perturb small entries to avoid
1684 * denormals in the second QR factorization, where they are
1685 * as good as zeros. This is done to avoid painfully slow
1686 * computation with denormals. The relative size of the perturbation
1687 * is a parameter that can be changed by the implementer.
1688 * This perturbation device will be obsolete on machines with
1689 * properly implemented arithmetic.
1690 * To switch it off, set L2PERT=.FALSE. To remove it from the
1691 * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1692 * The following two loops should be blocked and fused with the
1693 * transposed copy above.
1694 *
1695  IF ( l2pert ) then
1696  xsc = sqrt(small)
1697  DO 2969 q = 1, nr
1698  ctemp = dcmplx(xsc*abs( v(q,q) ),zero)
1699  DO 2968 p = 1, n
1700  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
1701  $ .OR. ( p .LT. q ) )
1702 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
1703  $ v(p,q) = ctemp
1704  IF ( p .LT. q ) v(p,q) = - v(p,q)
1705  2968 continue
1706  2969 continue
1707  else
1708  CALL zlaset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
1709  END IF
1710 *
1711 * Estimate the row scaled condition number of R1
1712 * (If R1 is rectangular, N > NR, then the condition number
1713 * of the leading NR x NR submatrix is estimated.)
1714 *
1715  CALL zlacpy( 'L', nr, nr, v, ldv, cwork(2*n+1), nr )
1716  DO 3950 p = 1, nr
1717  temp1 = dznrm2(nr-p+1,cwork(2*n+(p-1)*nr+p),1)
1718  CALL zdscal(nr-p+1,one/temp1,cwork(2*n+(p-1)*nr+p),1)
1719  3950 continue
1720  CALL zpocon('L',nr,cwork(2*n+1),nr,one,temp1,
1721  $ cwork(2*n+nr*nr+1),rwork,ierr)
1722  condr1 = one / sqrt(temp1)
1723 * .. here need a second oppinion on the condition number
1724 * .. then assume worst case scenario
1725 * R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1726 * more conservative <=> CONDR1 .LT. SQRT(DBLE(N))
1727 *
1728  cond_ok = sqrt(sqrt(dble(nr)))
1729 *[TP] COND_OK is a tuning parameter.
1730 *
1731  IF ( condr1 .LT. cond_ok ) then
1732 * .. the second QRF without pivoting. Note: in an optimized
1733 * implementation, this QRF should be implemented as the QRF
1734 * of a lower triangular matrix.
1735 * R1^* = Q2 * R2
1736  CALL zgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
1737  $ lwork-2*n, ierr )
1738 *
1739  IF ( l2pert ) then
1740  xsc = sqrt(small)/epsln
1741  DO 3959 p = 2, nr
1742  DO 3958 q = 1, p - 1
1743  ctemp=dcmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1744  $ zero)
1745  IF ( abs(v(q,p)) .LE. temp1 )
1746 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1747  $ v(q,p) = ctemp
1748  3958 continue
1749  3959 continue
1750  END IF
1751 *
1752  IF ( nr .NE. n )
1753  $ CALL zlacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1754 * .. save ...
1755 *
1756 * .. this transposed copy should be better than naive
1757  DO 1969 p = 1, nr - 1
1758  CALL zcopy( nr-p, v(p,p+1), ldv, v(p+1,p), 1 )
1759  CALL zlacgv(nr-p+1, v(p,p), 1 )
1760  1969 continue
1761  v(nr,nr)=conjg(v(nr,nr))
1762 *
1763  condr2 = condr1
1764 *
1765  else
1766 *
1767 * .. ill-conditioned case: second QRF with pivoting
1768 * Note that windowed pivoting would be equaly good
1769 * numerically, and more run-time efficient. So, in
1770 * an optimal implementation, the next call to ZGEQP3
1771 * should be replaced with eg. CALL ZGEQPX (ACM TOMS #782)
1772 * with properly (carefully) chosen parameters.
1773 *
1774 * R1^* * P2 = Q2 * R2
1775  DO 3003 p = 1, nr
1776  iwork(n+p) = 0
1777  3003 continue
1778  CALL zgeqp3( n, nr, v, ldv, iwork(n+1), cwork(n+1),
1779  $ cwork(2*n+1), lwork-2*n, rwork, ierr )
1780 ** CALL ZGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1),
1781 ** $ LWORK-2*N, IERR )
1782  IF ( l2pert ) then
1783  xsc = sqrt(small)
1784  DO 3969 p = 2, nr
1785  DO 3968 q = 1, p - 1
1786  ctemp=dcmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1787  $ zero)
1788  IF ( abs(v(q,p)) .LE. temp1 )
1789 * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) )
1790  $ v(q,p) = ctemp
1791  3968 continue
1792  3969 continue
1793  END IF
1794 *
1795  CALL zlacpy( 'A', n, nr, v, ldv, cwork(2*n+1), n )
1796 *
1797  IF ( l2pert ) then
1798  xsc = sqrt(small)
1799  DO 8970 p = 2, nr
1800  DO 8971 q = 1, p - 1
1801  ctemp=dcmplx(xsc*min(abs(v(p,p)),abs(v(q,q))),
1802  $ zero)
1803 * V(p,q) = - TEMP1*( V(q,p) / ABS(V(q,p)) )
1804  v(p,q) = - ctemp
1805  8971 continue
1806  8970 continue
1807  else
1808  CALL zlaset( 'L',nr-1,nr-1,czero,czero,v(2,1),ldv )
1809  END IF
1810 * Now, compute R2 = L3 * Q3, the LQ factorization.
1811  CALL zgelqf( nr, nr, v, ldv, cwork(2*n+n*nr+1),
1812  $ cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr, ierr )
1813 * .. and estimate the condition number
1814  CALL zlacpy( 'L',nr,nr,v,ldv,cwork(2*n+n*nr+nr+1),nr )
1815  DO 4950 p = 1, nr
1816  temp1 = dznrm2( p, cwork(2*n+n*nr+nr+p), nr )
1817  CALL zdscal( p, one/temp1, cwork(2*n+n*nr+nr+p), nr )
1818  4950 continue
1819  CALL zpocon( 'L',nr,cwork(2*n+n*nr+nr+1),nr,one,temp1,
1820  $ cwork(2*n+n*nr+nr+nr*nr+1),rwork,ierr )
1821  condr2 = one / sqrt(temp1)
1822 *
1823 *
1824  IF ( condr2 .GE. cond_ok ) then
1825 * .. save the Householder vectors used for Q3
1826 * (this overwrittes the copy of R2, as it will not be
1827 * needed in this branch, but it does not overwritte the
1828 * Huseholder vectors of Q2.).
1829  CALL zlacpy( 'U', nr, nr, v, ldv, cwork(2*n+1), n )
1830 * .. and the rest of the information on Q3 is in
1831 * WORK(2*N+N*NR+1:2*N+N*NR+N)
1832  END IF
1833 *
1834  END IF
1835 *
1836  IF ( l2pert ) then
1837  xsc = sqrt(small)
1838  DO 4968 q = 2, nr
1839  ctemp = xsc * v(q,q)
1840  DO 4969 p = 1, q - 1
1841 * V(p,q) = - TEMP1*( V(p,q) / ABS(V(p,q)) )
1842  v(p,q) = - ctemp
1843  4969 continue
1844  4968 continue
1845  else
1846  CALL zlaset( 'U', nr-1,nr-1, czero,czero, v(1,2), ldv )
1847  END IF
1848 *
1849 * Second preconditioning finished; continue with Jacobi SVD
1850 * The input matrix is lower trinagular.
1851 *
1852 * Recover the right singular vectors as solution of a well
1853 * conditioned triangular matrix equation.
1854 *
1855  IF ( condr1 .LT. cond_ok ) then
1856 *
1857  CALL zgesvj( 'L','U','N',nr,nr,v,ldv,sva,nr,u, ldu,
1858  $ cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,rwork,
1859  $ lrwork, info )
1860  scalem = rwork(1)
1861  numrank = nint(rwork(2))
1862  DO 3970 p = 1, nr
1863  CALL zcopy( nr, v(1,p), 1, u(1,p), 1 )
1864  CALL zdscal( nr, sva(p), v(1,p), 1 )
1865  3970 continue
1866 
1867 * .. pick the right matrix equation and solve it
1868 *
1869  IF ( nr .EQ. n ) then
1870 * :)) .. best case, R1 is inverted. The solution of this matrix
1871 * equation is Q2*V2 = the product of the Jacobi rotations
1872 * used in ZGESVJ, premultiplied with the orthogonal matrix
1873 * from the second QR factorization.
1874  CALL ztrsm('L','U','N','N', nr,nr,cone, a,lda, v,ldv)
1875  else
1876 * .. R1 is well conditioned, but non-square. Adjoint of R2
1877 * is inverted to get the product of the Jacobi rotations
1878 * used in ZGESVJ. The Q-factor from the second QR
1879 * factorization is then built in explicitly.
1880  CALL ztrsm('L','U','C','N',nr,nr,cone,cwork(2*n+1),
1881  $ n,v,ldv)
1882  IF ( nr .LT. n ) then
1883  CALL zlaset('A',n-nr,nr,czero,czero,v(nr+1,1),ldv)
1884  CALL zlaset('A',nr,n-nr,czero,czero,v(1,nr+1),ldv)
1885  CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1886  END IF
1887  CALL zunmqr('L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1888  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr)
1889  END IF
1890 *
1891  ELSE IF ( condr2 .LT. cond_ok ) then
1892 *
1893 * The matrix R2 is inverted. The solution of the matrix equation
1894 * is Q3^* * V3 = the product of the Jacobi rotations (appplied to
1895 * the lower triangular L3 from the LQ factorization of
1896 * R2=L3*Q3), pre-multiplied with the transposed Q3.
1897  CALL zgesvj( 'L', 'U', 'N', nr, nr, v, ldv, sva, nr, u,
1898  $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1899  $ rwork, lrwork, info )
1900  scalem = rwork(1)
1901  numrank = nint(rwork(2))
1902  DO 3870 p = 1, nr
1903  CALL zcopy( nr, v(1,p), 1, u(1,p), 1 )
1904  CALL zdscal( nr, sva(p), u(1,p), 1 )
1905  3870 continue
1906  CALL ztrsm('L','U','N','N',nr,nr,cone,cwork(2*n+1),n,
1907  $ u,ldu)
1908 * .. apply the permutation from the second QR factorization
1909  DO 873 q = 1, nr
1910  DO 872 p = 1, nr
1911  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1912  872 continue
1913  DO 874 p = 1, nr
1914  u(p,q) = cwork(2*n+n*nr+nr+p)
1915  874 continue
1916  873 continue
1917  IF ( nr .LT. n ) then
1918  CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1919  CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1920  CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1921  END IF
1922  CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1923  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1924  else
1925 * Last line of defense.
1926 * #:( This is a rather pathological case: no scaled condition
1927 * improvement after two pivoted QR factorizations. Other
1928 * possibility is that the rank revealing QR factorization
1929 * or the condition estimator has failed, or the COND_OK
1930 * is set very close to ONE (which is unnecessary). Normally,
1931 * this branch should never be executed, but in rare cases of
1932 * failure of the RRQR or condition estimator, the last line of
1933 * defense ensures that ZGEJSV completes the task.
1934 * Compute the full SVD of L3 using ZGESVJ with explicit
1935 * accumulation of Jacobi rotations.
1936  CALL zgesvj( 'L', 'U', 'V', nr, nr, v, ldv, sva, nr, u,
1937  $ ldu, cwork(2*n+n*nr+nr+1), lwork-2*n-n*nr-nr,
1938  $ rwork, lrwork, info )
1939  scalem = rwork(1)
1940  numrank = nint(rwork(2))
1941  IF ( nr .LT. n ) then
1942  CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
1943  CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
1944  CALL zlaset('A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv)
1945  END IF
1946  CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
1947  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
1948 *
1949  CALL zunmlq( 'L', 'C', nr, nr, nr, cwork(2*n+1), n,
1950  $ cwork(2*n+n*nr+1), u, ldu, cwork(2*n+n*nr+nr+1),
1951  $ lwork-2*n-n*nr-nr, ierr )
1952  DO 773 q = 1, nr
1953  DO 772 p = 1, nr
1954  cwork(2*n+n*nr+nr+iwork(n+p)) = u(p,q)
1955  772 continue
1956  DO 774 p = 1, nr
1957  u(p,q) = cwork(2*n+n*nr+nr+p)
1958  774 continue
1959  773 continue
1960 *
1961  END IF
1962 *
1963 * Permute the rows of V using the (column) permutation from the
1964 * first QRF. Also, scale the columns to make them unit in
1965 * Euclidean norm. This applies to all cases.
1966 *
1967  temp1 = sqrt(dble(n)) * epsln
1968  DO 1972 q = 1, n
1969  DO 972 p = 1, n
1970  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
1971  972 continue
1972  DO 973 p = 1, n
1973  v(p,q) = cwork(2*n+n*nr+nr+p)
1974  973 continue
1975  xsc = one / dznrm2( n, v(1,q), 1 )
1976  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
1977  $ CALL zdscal( n, xsc, v(1,q), 1 )
1978  1972 continue
1979 * At this moment, V contains the right singular vectors of A.
1980 * Next, assemble the left singular vector matrix U (M x N).
1981  IF ( nr .LT. m ) then
1982  CALL zlaset('A', m-nr, nr, czero, czero, u(nr+1,1), ldu)
1983  IF ( nr .LT. n1 ) then
1984  CALL zlaset('A',nr,n1-nr,czero,czero,u(1,nr+1),ldu)
1985  CALL zlaset('A',m-nr,n1-nr,czero,cone,
1986  $ u(nr+1,nr+1),ldu)
1987  END IF
1988  END IF
1989 *
1990 * The Q matrix from the first QRF is built into the left singular
1991 * matrix U. This applies to all cases.
1992 *
1993  CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
1994  $ ldu, cwork(n+1), lwork-n, ierr )
1995 
1996 * The columns of U are normalized. The cost is O(M*N) flops.
1997  temp1 = sqrt(dble(m)) * epsln
1998  DO 1973 p = 1, nr
1999  xsc = one / dznrm2( m, u(1,p), 1 )
2000  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2001  $ CALL zdscal( m, xsc, u(1,p), 1 )
2002  1973 continue
2003 *
2004 * If the initial QRF is computed with row pivoting, the left
2005 * singular vectors must be adjusted.
2006 *
2007  IF ( rowpiv )
2008  $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2009 *
2010  else
2011 *
2012 * .. the initial matrix A has almost orthogonal columns and
2013 * the second QRF is not needed
2014 *
2015  CALL zlacpy( 'U', n, n, a, lda, cwork(n+1), n )
2016  IF ( l2pert ) then
2017  xsc = sqrt(small)
2018  DO 5970 p = 2, n
2019  ctemp = xsc * cwork( n + (p-1)*n + p )
2020  DO 5971 q = 1, p - 1
2021 * CWORK(N+(q-1)*N+p)=-TEMP1 * ( CWORK(N+(p-1)*N+q) /
2022 * $ ABS(CWORK(N+(p-1)*N+q)) )
2023  cwork(n+(q-1)*n+p)=-ctemp
2024  5971 continue
2025  5970 continue
2026  else
2027  CALL zlaset( 'L',n-1,n-1,czero,czero,cwork(n+2),n )
2028  END IF
2029 *
2030  CALL zgesvj( 'U', 'U', 'N', n, n, cwork(n+1), n, sva,
2031  $ n, u, ldu, cwork(n+n*n+1), lwork-n-n*n, rwork, lrwork,
2032  $ info )
2033 *
2034  scalem = rwork(1)
2035  numrank = nint(rwork(2))
2036  DO 6970 p = 1, n
2037  CALL zcopy( n, cwork(n+(p-1)*n+1), 1, u(1,p), 1 )
2038  CALL zdscal( n, sva(p), cwork(n+(p-1)*n+1), 1 )
2039  6970 continue
2040 *
2041  CALL ztrsm( 'L', 'U', 'N', 'N', n, n,
2042  $ cone, a, lda, cwork(n+1), n )
2043  DO 6972 p = 1, n
2044  CALL zcopy( n, cwork(n+p), n, v(iwork(p),1), ldv )
2045  6972 continue
2046  temp1 = sqrt(dble(n))*epsln
2047  DO 6971 p = 1, n
2048  xsc = one / dznrm2( n, v(1,p), 1 )
2049  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2050  $ CALL zdscal( n, xsc, v(1,p), 1 )
2051  6971 continue
2052 *
2053 * Assemble the left singular vector matrix U (M x N).
2054 *
2055  IF ( n .LT. m ) then
2056  CALL zlaset( 'A', m-n, n, czero, czero, u(n+1,1), ldu )
2057  IF ( n .LT. n1 ) then
2058  CALL zlaset('A',n, n1-n, czero, czero, u(1,n+1),ldu)
2059  CALL zlaset( 'A',m-n,n1-n, czero, cone,u(n+1,n+1),ldu)
2060  END IF
2061  END IF
2062  CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
2063  $ ldu, cwork(n+1), lwork-n, ierr )
2064  temp1 = sqrt(dble(m))*epsln
2065  DO 6973 p = 1, n1
2066  xsc = one / dznrm2( m, u(1,p), 1 )
2067  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2068  $ CALL zdscal( m, xsc, u(1,p), 1 )
2069  6973 continue
2070 *
2071  IF ( rowpiv )
2072  $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2073 *
2074  END IF
2075 *
2076 * end of the >> almost orthogonal case << in the full SVD
2077 *
2078  else
2079 *
2080 * This branch deploys a preconditioned Jacobi SVD with explicitly
2081 * accumulated rotations. It is included as optional, mainly for
2082 * experimental purposes. It does perfom well, and can also be used.
2083 * In this implementation, this branch will be automatically activated
2084 * if the condition number sigma_max(A) / sigma_min(A) is predicted
2085 * to be greater than the overflow threshold. This is because the
2086 * a posteriori computation of the singular vectors assumes robust
2087 * implementation of BLAS and some LAPACK procedures, capable of working
2088 * in presence of extreme values, e.g. when the singular values spread from
2089 * the underflow to the overflow threshold.
2090 *
2091  DO 7968 p = 1, nr
2092  CALL zcopy( n-p+1, a(p,p), lda, v(p,p), 1 )
2093  CALL zlacgv( n-p+1, v(p,p), 1 )
2094  7968 continue
2095 *
2096  IF ( l2pert ) then
2097  xsc = sqrt(small/epsln)
2098  DO 5969 q = 1, nr
2099  ctemp = dcmplx(xsc*abs( v(q,q) ),zero)
2100  DO 5968 p = 1, n
2101  IF ( ( p .GT. q ) .AND. ( abs(v(p,q)) .LE. temp1 )
2102  $ .OR. ( p .LT. q ) )
2103 * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) )
2104  $ v(p,q) = ctemp
2105  IF ( p .LT. q ) v(p,q) = - v(p,q)
2106  5968 continue
2107  5969 continue
2108  else
2109  CALL zlaset( 'U', nr-1, nr-1, czero, czero, v(1,2), ldv )
2110  END IF
2111 
2112  CALL zgeqrf( n, nr, v, ldv, cwork(n+1), cwork(2*n+1),
2113  $ lwork-2*n, ierr )
2114  CALL zlacpy( 'L', n, nr, v, ldv, cwork(2*n+1), n )
2115 *
2116  DO 7969 p = 1, nr
2117  CALL zcopy( nr-p+1, v(p,p), ldv, u(p,p), 1 )
2118  CALL zlacgv( nr-p+1, u(p,p), 1 )
2119  7969 continue
2120 
2121  IF ( l2pert ) then
2122  xsc = sqrt(small/epsln)
2123  DO 9970 q = 2, nr
2124  DO 9971 p = 1, q - 1
2125  ctemp = dcmplx(xsc * min(abs(u(p,p)),abs(u(q,q))),
2126  $ zero)
2127 * U(p,q) = - TEMP1 * ( U(q,p) / ABS(U(q,p)) )
2128  u(p,q) = - ctemp
2129  9971 continue
2130  9970 continue
2131  else
2132  CALL zlaset('U', nr-1, nr-1, czero, czero, u(1,2), ldu )
2133  END IF
2134 
2135  CALL zgesvj( 'L', 'U', 'V', nr, nr, u, ldu, sva,
2136  $ n, v, ldv, cwork(2*n+n*nr+1), lwork-2*n-n*nr,
2137  $ rwork, lrwork, info )
2138  scalem = rwork(1)
2139  numrank = nint(rwork(2))
2140 
2141  IF ( nr .LT. n ) then
2142  CALL zlaset( 'A',n-nr,nr,czero,czero,v(nr+1,1),ldv )
2143  CALL zlaset( 'A',nr,n-nr,czero,czero,v(1,nr+1),ldv )
2144  CALL zlaset( 'A',n-nr,n-nr,czero,cone,v(nr+1,nr+1),ldv )
2145  END IF
2146 
2147  CALL zunmqr( 'L','N',n,n,nr,cwork(2*n+1),n,cwork(n+1),
2148  $ v,ldv,cwork(2*n+n*nr+nr+1),lwork-2*n-n*nr-nr,ierr )
2149 *
2150 * Permute the rows of V using the (column) permutation from the
2151 * first QRF. Also, scale the columns to make them unit in
2152 * Euclidean norm. This applies to all cases.
2153 *
2154  temp1 = sqrt(dble(n)) * epsln
2155  DO 7972 q = 1, n
2156  DO 8972 p = 1, n
2157  cwork(2*n+n*nr+nr+iwork(p)) = v(p,q)
2158  8972 continue
2159  DO 8973 p = 1, n
2160  v(p,q) = cwork(2*n+n*nr+nr+p)
2161  8973 continue
2162  xsc = one / dznrm2( n, v(1,q), 1 )
2163  IF ( (xsc .LT. (one-temp1)) .OR. (xsc .GT. (one+temp1)) )
2164  $ CALL zdscal( n, xsc, v(1,q), 1 )
2165  7972 continue
2166 *
2167 * At this moment, V contains the right singular vectors of A.
2168 * Next, assemble the left singular vector matrix U (M x N).
2169 *
2170  IF ( nr .LT. m ) then
2171  CALL zlaset( 'A', m-nr, nr, czero, czero, u(nr+1,1), ldu )
2172  IF ( nr .LT. n1 ) then
2173  CALL zlaset('A',nr, n1-nr, czero, czero, u(1,nr+1),ldu)
2174  CALL zlaset('A',m-nr,n1-nr, czero, cone,u(nr+1,nr+1),ldu)
2175  END IF
2176  END IF
2177 *
2178  CALL zunmqr( 'L', 'N', m, n1, n, a, lda, cwork, u,
2179  $ ldu, cwork(n+1), lwork-n, ierr )
2180 *
2181  IF ( rowpiv )
2182  $ CALL zlaswp( n1, u, ldu, 1, m-1, iwork(iwoff+1), -1 )
2183 *
2184 *
2185  END IF
2186  IF ( transp ) then
2187 * .. swap U and V because the procedure worked on A^*
2188  DO 6974 p = 1, n
2189  CALL zswap( n, u(1,p), 1, v(1,p), 1 )
2190  6974 continue
2191  END IF
2192 *
2193  END IF
2194 * end of the full SVD
2195 *
2196 * Undo scaling, if necessary (and possible)
2197 *
2198  IF ( uscal2 .LE. (big/sva(1))*uscal1 ) then
2199  CALL dlascl( 'G', 0, 0, uscal1, uscal2, nr, 1, sva, n, ierr )
2200  uscal1 = one
2201  uscal2 = one
2202  END IF
2203 *
2204  IF ( nr .LT. n ) then
2205  DO 3004 p = nr+1, n
2206  sva(p) = zero
2207  3004 continue
2208  END IF
2209 *
2210  rwork(1) = uscal2 * scalem
2211  rwork(2) = uscal1
2212  IF ( errest ) rwork(3) = sconda
2213  IF ( lsvec .AND. rsvec ) then
2214  rwork(4) = condr1
2215  rwork(5) = condr2
2216  END IF
2217  IF ( l2tran ) then
2218  rwork(6) = entra
2219  rwork(7) = entrat
2220  END IF
2221 *
2222  iwork(1) = nr
2223  iwork(2) = numrank
2224  iwork(3) = warning
2225  IF ( transp ) then
2226  iwork(4) = 1
2227  else
2228  iwork(4) = -1
2229  END IF
2230 
2231 *
2232  return
2233 * ..
2234 * .. END OF ZGEJSV
2235 * ..
double precision function dlamch(CMACH)
DLAMCH
Definition: dlamch.f:65
integer function idamax(N, DX, INCX)
IDAMAX
Definition: idamax.f:53
subroutine zcopy(N, ZX, INCX, ZY, INCY)
ZCOPY
Definition: zcopy.f:52
subroutine zlaswp(N, A, LDA, K1, K2, IPIV, INCX)
ZLASWP performs a series of row interchanges on a general rectangular matrix.
Definition: zlaswp.f:116
subroutine dlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
DLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: dlascl.f:145
subroutine zunmlq(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
ZUNMLQ
Definition: zunmlq.f:169
subroutine zswap(N, ZX, INCX, ZY, INCY)
ZSWAP
Definition: zswap.f:52
subroutine zlascl(TYPE, KL, KU, CFROM, CTO, M, N, A, LDA, INFO)
ZLASCL multiplies a general rectangular matrix by a real scalar defined as cto/cfrom.
Definition: zlascl.f:145
subroutine zlacpy(UPLO, M, N, A, LDA, B, LDB)
ZLACPY copies all or part of one two-dimensional array to another.
Definition: zlacpy.f:105
subroutine zungqr(M, N, K, A, LDA, TAU, WORK, LWORK, INFO)
ZUNGQR
Definition: zungqr.f:130
subroutine zgeqrf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
ZGEQRF VARIANT: left-looking Level 3 BLAS of the algorithm.
Definition: zgeqrf.f:151
subroutine dlassq(N, X, INCX, SCALE, SUMSQ)
DLASSQ updates a sum of squares represented in scaled form.
Definition: dlassq.f:105
integer function izamax(N, ZX, INCX)
IZAMAX
Definition: izamax.f:53
double precision function dznrm2(N, X, INCX)
DZNRM2
Definition: dznrm2.f:56
subroutine zunmqr(SIDE, TRANS, M, N, K, A, LDA, TAU, C, LDC, WORK, LWORK, INFO)
ZUNMQR
Definition: zunmqr.f:169
subroutine zlassq(N, X, INCX, SCALE, SUMSQ)
ZLASSQ updates a sum of squares represented in scaled form.
Definition: zlassq.f:108
subroutine xerbla(SRNAME, INFO)
XERBLA
Definition: xerbla.f:62
subroutine zlaset(UPLO, M, N, ALPHA, BETA, A, LDA)
ZLASET initializes the off-diagonal elements and the diagonal elements of a matrix to given values...
Definition: zlaset.f:108
logical function lsame(CA, CB)
LSAME
Definition: lsame.f:55
subroutine dscal(N, DA, DX, INCX)
DSCAL
Definition: dscal.f:55
subroutine zgeqp3(M, N, A, LDA, JPVT, TAU, WORK, LWORK, RWORK, INFO)
ZGEQP3
Definition: zgeqp3.f:161
subroutine zlacgv(N, X, INCX)
ZLACGV conjugates a complex vector.
Definition: zlacgv.f:76
subroutine zlapmr(FORWRD, M, N, X, LDX, K)
ZLAPMR rearranges rows of a matrix as specified by a permutation vector.
Definition: zlapmr.f:106
subroutine zdscal(N, DA, ZX, INCX)
ZDSCAL
Definition: zdscal.f:54
subroutine zgelqf(M, N, A, LDA, TAU, WORK, LWORK, INFO)
ZGELQF
Definition: zgelqf.f:137
subroutine zpocon(UPLO, N, A, LDA, ANORM, RCOND, WORK, RWORK, INFO)
ZPOCON
Definition: zpocon.f:123
subroutine zgesvj(JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, LDV, CWORK, LWORK, RWORK, LRWORK, INFO)
ZGESVJ
Definition: zgesvj.f:351
subroutine ztrsm(SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
ZTRSM
Definition: ztrsm.f:182

Here is the call graph for this function:

Here is the caller graph for this function: