5.2. Upload Internals

When a user executes the file upload procedure, the trace files from the compute nodes are copied to a server at SDSC. The user executes make -f /opt/app-trace/collect/Makefile sdsc, and the Makefile has the following contents:

TRACEID = $(shell date +%s.%N)

local:
        cluster-fork '/opt/app-trace/bin/collect.sh $(TRACEID) local'

sdsc:
        cluster-fork '/opt/app-trace/bin/collect.sh $(TRACEID) sdsc'

Note

The TRACEID is the unique value used to create a directory on SDSC's server that holds all the trace files. It is important to note that this value is just the current time (seconds since the epoch, to the nanosecond) and contains no information about the cluster (e.g., there is no IP or MAC address information in TRACEID).
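
To make the shape of TRACEID concrete, here is a minimal Python sketch that builds a value in the same format as date +%s.%N. The function name make_trace_id is hypothetical and is not part of the app-trace software; the sketch only illustrates that the value is derived from the clock and nothing else.

```python
import re
import time


def make_trace_id(now_ns=None):
    """Return 'seconds.nanoseconds', the same shape as `date +%s.%N`."""
    if now_ns is None:
        now_ns = time.time_ns()
    seconds, nanos = divmod(now_ns, 10**9)
    # %N always prints nine digits, so zero-pad the nanosecond part
    return "%d.%09d" % (seconds, nanos)


trace_id = make_trace_id()
# Only digits and a single dot: no host, IP, or MAC address information
assert re.fullmatch(r"\d+\.\d{9}", trace_id)
```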

In the Makefile above, the upload is carried out by a script on each compute node, /opt/app-trace/bin/collect.sh, which receives TRACEID as its first argument. The script passes TRACEID on to the next stage in the --id parameter.

     1  #!/bin/bash
     2  #
     3  # $Id: internals.sgml,v 1.2 2005/11/30 21:48:40 bruno Exp $
     4  #
     5  # @Copyright@
     6  # 
     7  #                               Rocks
     8  #                        www.rocksclusters.org
     9  #                          version 4.1 (fuji)
    10  # 
    11  # Copyright (c) 2005 The Regents of the University of California. All
    12  # rights reserved.
    13  # 
    14  # Redistribution and use in source and binary forms, with or without
    15  # modification, are permitted provided that the following conditions are
    16  # met:
    17  # 
    18  # 1. Redistributions of source code must retain the above copyright
    19  # notice, this list of conditions and the following disclaimer.
    20  # 
    21  # 2. Redistributions in binary form must reproduce the above copyright
    22  # notice, this list of conditions and the following disclaimer in the
    23  # documentation and/or other materials provided with the distribution.
    24  # 
    25  # 3. All advertising materials mentioning features or use of this
    26  # software must display the following acknowledgement: 
    27  # 
    28  #       "This product includes software developed by the Rocks 
    29  #       Cluster Group at the San Diego Supercomputer Center and
    30  #       its contributors."
    31  # 
    32  # 4. Neither the name or logo of this software nor the names of its
    33  # authors may be used to endorse or promote products derived from this
    34  # software without specific prior written permission.  The name of the
    35  # software includes the following terms, and any derivatives thereof:
    36  # "Rocks", "Rocks Clusters", and "Avalanche Installer".
    37  # 
    38  # THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS''
    39  # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
    40  # THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
    41  # PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS
    42  # BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
    43  # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
    44  # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
    45  # BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
    46  # WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
    47  # OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
    48  # IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
    49  # 
    50  # @Copyright@
    51  #
    52  # $Log: internals.sgml,v $
    52  # Revision 1.2  2005/11/30 21:48:40  bruno
    52  # touch ups
    52  #
    53  # Revision 1.3  2005/11/30 01:41:35  bruno
    54  # a bit more
    55  #
    56  # Revision 1.2  2005/11/18 00:54:54  bruno
    57  # only upload app trace files
    58  #
    59  # Revision 1.1  2005/11/17 16:55:56  bruno
    60  # first rev
    61  #
    62  #
    63  #
    64
    65  . /opt/app-trace/conf/variables
    66
    67  echo "" > /proc/sys/debug/traced-apps
    68
    69  for i in `find $TRACEDIR -type f -name "*app-trace.out*"`
    70  do
    71          file $i | grep 'bzip2 compressed' > /dev/null 2>&1
    72          notcompressed=$?
    73
    74          if [ $notcompressed == 1 ]
    75          then
    76                  bzip2 --force $i
    77                  filename=$i.bz2
    78          else
    79                  filename=$i
    80          fi
    81
    82          if [ $2 == 'local' ]
    83          then
    84                  CHOST=$COLLECTHOSTLOCAL
    85          else
    86                  CHOST=$COLLECTHOSTSDSC
    87          fi
    88
    89          /opt/app-trace/bin/upload.py \
    90                  --upload-server=$CHOST --filename=$filename --id=$1
    91  done
    92

For each file under TRACEDIR (line 69), the script compresses the file if it is not already compressed, then calls /opt/app-trace/bin/upload.py, which uses an HTTPS POST to copy the compressed file to your frontend (when executing make -f /opt/app-trace/collect/Makefile local) or to the SDSC server (when executing make -f /opt/app-trace/collect/Makefile sdsc).
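
The compress-if-needed step can be sketched in Python as follows. This is an illustration, not the shipped code: the function name ensure_bzip2 is hypothetical, and the check on the leading "BZh" magic bytes stands in for the file | grep 'bzip2 compressed' test that the shell script uses.

```python
import bz2
import os


def ensure_bzip2(path):
    """Compress path with bzip2 unless it already is; return the final name."""
    with open(path, "rb") as f:
        magic = f.read(3)
    if magic == b"BZh":
        # Already a bzip2 stream, so it can be uploaded as-is
        return path
    compressed = path + ".bz2"
    with open(path, "rb") as src, bz2.open(compressed, "wb") as dst:
        # Copy in chunks so large trace files are never held in memory
        for chunk in iter(lambda: src.read(64 * 1024), b""):
            dst.write(chunk)
    os.remove(path)  # like the bzip2 command, replace the original file
    return compressed
```

Calling ensure_bzip2 a second time on the returned name is a no-op, which matches the script's behavior of skipping files that are already compressed.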

Here are the contents of /opt/app-trace/bin/upload.py:

     1  #!/opt/rocks/usr/bin/python
     2  #
     3  # $Id: internals.sgml,v 1.2 2005/11/30 21:48:40 bruno Exp $
     4  #
     5  # @Copyright@
     6  # 
     7  #                               Rocks
     8  #                        www.rocksclusters.org
     9  #                          version 4.1 (fuji)
    10  # 
    11  # Copyright (c) 2005 The Regents of the University of California. All
    12  # rights reserved.
    13  # 
    14  # Redistribution and use in source and binary forms, with or without
    15  # modification, are permitted provided that the following conditions are
    16  # met:
    17  # 
    18  # 1. Redistributions of source code must retain the above copyright
    19  # notice, this list of conditions and the following disclaimer.
    20  # 
    21  # 2. Redistributions in binary form must reproduce the above copyright
    22  # notice, this list of conditions and the following disclaimer in the
    23  # documentation and/or other materials provided with the distribution.
    24  # 
    25  # 3. All advertising materials mentioning features or use of this
    26  # software must display the following acknowledgement: 
    27  # 
    28  #       "This product includes software developed by the Rocks 
    29  #       Cluster Group at the San Diego Supercomputer Center and
    30  #       its contributors."
    31  # 
    32  # 4. Neither the name or logo of this software nor the names of its
    33  # authors may be used to endorse or promote products derived from this
    34  # software without specific prior written permission.  The name of the
    35  # software includes the following terms, and any derivatives thereof:
    36  # "Rocks", "Rocks Clusters", and "Avalanche Installer".
    37  # 
    38  # THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS''
    39  # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
    40  # THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
    41  # PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS
    42  # BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
    43  # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
    44  # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
    45  # BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
    46  # WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
    47  # OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
    48  # IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
    49  # 
    50  # @Copyright@
    51  #
    52  # $Log: internals.sgml,v $
    52  # Revision 1.2  2005/11/30 21:48:40  bruno
    52  # touch ups
    52  #
    53  # Revision 1.1  2005/11/17 16:55:56  bruno
    54  # first rev
    55  #
    56  #
    57
    58  import getopt
    59  import sys
    60  import httplib
    61  import os
    62  import os.path
    63
    64  boundary = '------------Rocks-Trace-File-Upload------------'
    65
    66
    67  def file_header(filename):
    68          str = '--' + boundary + '\n'
    69          str += 'Content-Disposition: form-data; '
    70          str += 'name="filename"; '
    71          str += 'filename="%s"\n\n' % (os.path.basename(filename))
    72          return str
    73
    74
    75  def file_trailer():
    76          str = '\n--' + boundary + '\n\n\r\n'
    77          return str
    78
    79
    80  def encode_fields(fields):
    81          str = ''
    82
    83          for (key, value) in fields:
    84                  str += '--' + boundary + '\n'
    85                  str += 'Content-Disposition: form-data; ' + \
    86                                                  'name="%s"\n\n' % (key)
    87                  str += value + '\n'
    88
    89          return str
    90
    91
    92
    93  def send_file(host, req, fields, filename):
    94          str_fields = encode_fields(fields)
    95          header = file_header(filename)
    96          trailer = file_trailer()
    97          filesize = os.path.getsize(filename)
    98
    99          h = httplib.HTTPSConnection(host)
   100
   101          h.putrequest('POST', req)
   102
   103          h.putheader('content-type', 
   104                          'multipart/form-data; boundary=%s' % (boundary))
   105
   106          bodysize = len(str_fields) + len(header) + len(trailer) + filesize
   107          h.putheader('content-length', '%d' % (bodysize))
   108
   109          h.endheaders()
   110
   111          #
   112          # send the body of the message
   113          #
   114          h.send(str_fields)
   115          h.send(header)
   116
   117          file = open(filename, 'r')
   118          line = file.readline()
   119          while line:
   120                  h.send(line)
   121                  line = file.readline()
   122          file.close()
   123
   124          h.send(trailer)
   125
   126          response = h.getresponse()
   127
   128          return response
   129
   130  #
   131  # main
   132  #
   133
   134  #
   135  # get the command line arguments
   136  #
   137  opts, args = getopt.getopt(sys.argv[1:], '', ['upload-server=',
   138          'filename=', 'id=' ])
   139
   140  #
   141  # IP address of the node we care about
   142  #
   143  upload_server = ''
   144  filename = ''
   145  id = ''
   146
   147  for c in opts:
   148          if c[0] == '--upload-server':
   149                  upload_server = c[1]
   150          if c[0] == '--filename':
   151                  filename = c[1]
   152          if c[0] == '--id':
   153                  id = c[1]
   154
   155
   156  req = '/traces/sbin/store-trace.cgi'
   157  fields = [ ('username', 'rocks'), ('id', id) ]
   158
   159  send_file(upload_server, req, fields, filename)

Line 99 opens the HTTPS connection and line 101 issues the POST request.
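
The way send_file sizes the request before streaming the file can be sketched in Python 3 as below. The names build_parts and content_length are hypothetical. The sketch deliberately differs from the listing in two respects, both assumptions on my part about a standards-conforming multipart/form-data body: it uses CRLF line endings, and it ends with a closing "--boundary--" delimiter.

```python
import os

BOUNDARY = "------------Rocks-Trace-File-Upload------------"


def build_parts(fields, filename):
    """Return (preamble, trailer) bytes that surround the raw file contents."""
    lines = []
    for key, value in fields:
        lines += ["--" + BOUNDARY,
                  'Content-Disposition: form-data; name="%s"' % key,
                  "",
                  value]
    lines += ["--" + BOUNDARY,
              'Content-Disposition: form-data; name="filename"; filename="%s"'
              % os.path.basename(filename),
              "",
              ""]
    preamble = "\r\n".join(lines).encode("ascii")
    trailer = ("\r\n--" + BOUNDARY + "--\r\n").encode("ascii")
    return preamble, trailer


def content_length(fields, filename):
    """Body size = text parts + file size, computed without reading the file."""
    preamble, trailer = build_parts(fields, filename)
    return len(preamble) + os.path.getsize(filename) + len(trailer)
```

Because the Content-Length header must be sent before the body, the file size is taken from the filesystem and the file itself is streamed afterwards, as in lines 106 through 124 of the listing.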

Note

It's important to note that all trace files are transferred over an encrypted link. That is, an outside party eavesdropping on the connection will see only ciphertext, not cleartext.

The program /opt/app-trace/bin/upload.py calls the CGI program /traces/sbin/store-trace.cgi on SDSC's server. The contents of the CGI are:

     1  #!/opt/rocks/usr/bin/python
     2
     3  import cgi
     4  import os
     5
     6  tracedir = '/state/partition1/traces/'
     7
     8
     9  def savefile(infile, outfilename):
    10          outfile = open('%s' % (outfilename), 'w+')
    11
    12          if infile:
    13                  line = infile.readline()
    14                  while line:
    15                          outfile.write(line)
    16                          line = infile.readline()
    17          else:
    18                  outfile.write(f.value)
    19
    20          outfile.close()
    21          return
    22
    23  #
    24  # main
    25  #
    26  form = cgi.FieldStorage()
    27
    28  username = ''
    29  filename = ''
    30  id = ''
    31
    32  file = open('/tmp/store-file.cgi', 'w+')
    33  for key in form.keys():
    34
    35          if key == 'username':
    36                  username = form.getvalue(key)
    37                  file.write('username (%s)\n' % (username))
    38                  file.write('\n')
    39          elif key == 'id':
    40                  id = form.getvalue(key)
    41                  file.write('id (%s)\n' % (id))
    42                  file.write('\n')
    43          elif key == 'filename' and form[key].filename:
    44                  filename = form[key].filename
    45
    46  file.close()
    47
    48  if username == 'rocks' and id != '' and filename != '':
    49          dirname = os.path.join('/state/partition1/traces/', id)
    50          if not os.path.exists(dirname):
    51                  os.system('mkdir -p %s' % (dirname))
    52
    53          outfilename = os.path.join(dirname, filename)
    54
    55          savefile(form['filename'].file, outfilename)
    56
    57  print 'Content-type: application/octet-stream'
    58  print 'Content-length: %d' % (len(''))
    59  print ''                
    60  print ''

The CGI above creates a new directory named after the id passed to it by upload.py and stores the uploaded file in that directory.
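
How the id maps to a storage location can be sketched in Python 3 as follows. The function name trace_destination is hypothetical; os.makedirs is used here in place of the os.system('mkdir -p ...') call in the listing, and applying os.path.basename to the filename on the server side is an added assumption, not something the CGI does.

```python
import os


def trace_destination(tracedir, trace_id, filename):
    """Create <tracedir>/<trace_id> if needed and return the output path."""
    dirname = os.path.join(tracedir, trace_id)
    # Equivalent of `mkdir -p`: no error if the directory already exists
    os.makedirs(dirname, exist_ok=True)
    return os.path.join(dirname, os.path.basename(filename))
```

Every file uploaded with the same TRACEID therefore lands in the same directory, which is how the traces from all compute nodes of one collection run are grouped together.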

Note

Again, no cluster-specific information is stored on the SDSC server.