Advanced Job Submission and File Conversion Techniques

1. Submitting Jobs (and how to find them)
John (TJ) Knoeller
Center for High Throughput Computing

2. Overview
  • Development of an advanced submit file, using as many techniques and tricks as possible
  • Custom print formats
  • A few more random tricks (time permitting)

3. The Story
I have a lot of media files that I have collected over the years.
I want to convert them all to .mp4.
(Sounds like a high-throughput problem…)

4. Basic submit file for conversion
    Executable = ffmpeg
    Transfer_executable = false
    Should_transfer_files = YES
    file = S1E2 The Train Job.wmv
    Transfer_input_files = $(file)
    Args = "-i '$(file)' '$(file).mp4'"
    Queue

5. Converting a set of files
    Transfer_input_files = $(file)
    Args = "-i '$(file)' '$(file).mp4' "
    Queue FILE from (
     S1E1 Serenity.wmv
     S1E2 The Train Job.wmv
     S1E3 Bushwhacked.wmv
     S1E4 Shindig.wmv
    )

6. Output filename problems
The output name is $(file).mp4, so the output files are named
    S1E1 Serenity.wmv.mp4
    S1E2 The Train Job.wmv.mp4
    S1E3 Bushwhacked.wmv.mp4
    S1E4 Shindig.wmv.mp4

7. $F() to the rescue
$Fqpdnxba() expands to parts of a filename
    file = "./Video/Firefly/S1E4 Shindig.wmv"
    $Fp(file)   -> ./Video/Firefly/
    $Fqp(file)  -> "./Video/Firefly/"
    $Fqpa(file) -> './Video/Firefly/'
    $Fd(file)   -> Firefly/
    $Fdb(file)  -> Firefly
    $Fn(file)   -> S1E4 Shindig
    $Fx(file)   -> .wmv
    $Fnx(file)  -> S1E4 Shindig.wmv

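The way these modifiers carve up a path can be mimicked with the Python standard library. The following is only a rough sketch for building intuition, not HTCondor's implementation: the helper name `f_parts` is made up for this example, and the quoting modifiers (`q`, `a`) are left out.

```python
import posixpath

def f_parts(path):
    """Split a path into the pieces that the $F() modifiers select (sketch)."""
    directory, filename = posixpath.split(path)
    name, ext = posixpath.splitext(filename)
    parent = posixpath.basename(directory)
    return {
        "p": directory + "/",  # whole directory portion, trailing slash kept
        "d": parent + "/",     # last directory only, trailing slash kept
        "db": parent,          # 'b' drops the trailing slash
        "n": name,             # base name without the extension
        "x": ext,              # extension, including the dot
        "nx": name + ext,      # name plus extension
    }

parts = f_parts("./Video/Firefly/S1E4 Shindig.wmv")
for key in ("p", "d", "db", "n", "x", "nx"):
    print("$F%s(file) -> %s" % (key, parts[key]))
```

Each printed line corresponds to one row of the expansion table on the slide.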
8. $Fn() is name without extension
    Transfer_Input_Files = $(file)
    Args = "-i '$Fnx(file)' '$Fn(file).mp4'"
Resulting files are now
    S1E1 Serenity.mp4
    S1E2 The Train Job.mp4
    S1E3 Bushwhacked.mp4
    S1E4 Shindig.mp4

9. $Fq() and Arguments
$Fq(file) expands to a quoted "filename"
  • Gives a "parse error" when used in an Arguments statement
For Args use '$F(file)' instead.
  • Becomes 'filename' on Linux
  • Becomes "filename" on Windows
In 8.6 you can use $Fqa(file) instead

10. "new" Args preserves spaces
    FILE = The Train Job.wmv
    Args = "-i '$Fnx(file)' -w640 '$Fn(file).mp4' "
    # Tool tip: see it before you submit it.
    condor_submit test.sub -dump test.ads
    condor_status -ads test.ads -af Arguments
    -i The' 'Train' 'Job.wmv -w640 The' 'Train' 'Job.mp4
    # On *nix the job sees
    -i 'The Train Job.wmv' -w640 'The Train Job.mp4'
    # On Windows the job sees
    -i "The Train Job.wmv" -w640 "The Train Job.mp4"

11. Sometimes you can't use Args
Argument quoting is not portable across operating systems
  • Linux needs spaces and ' escaped
  • Windows needs double quotes around filenames that contain a space or ^
What the job sees can be hard to predict.
Transfer_input_files will not transfer a file with a comma in the name.

12. Alternative to Args
  • Use a script as your executable
  • Use custom job attributes to pass information to the script
      # these both set CustomAttr in the job ad
      +CustomAttr = "value"
      MY.CustomAttr = "value"
  • You can refer to custom attributes in $() expansion in your submit file
      transfer_input_files = $F(My.CustomAttr)

13. Add custom attributes to the job
    Executable = xcode.pl
    Args = -s 640x360
    Transfer_executable = true
    Should_transfer_files = true
    # +WantIOProxy = true
    MY.SourceDir = $Fqp(FILE)
    MY.SourceFile = $Fqnx(FILE)
    +OutFile = "$Fn(FILE).mp4"
    Batch_name = $Fdb(FILE)
    Queue FILE matching files Firefly/*.wmv

14. Use a script to query the .job.ad
    #!/usr/bin/env perl
    # xcode.pl
    # Pull filenames from job ad
    my $src = `condor_status -ads .job.ad -af SourceFile`;
    my $out = `condor_status -ads .job.ad -af OutFile`;
    # find condor_chirp (also need +WantIOProxy in job)
    my $lib = `condor_config_val libexec`;
    chomp $src; chomp $out; chomp $lib;
    # fetch the input file
    system("$lib/condor_chirp fetch '$src' '$src'");
    # do the conversion
    system("ffmpeg -i '$src' @ARGV '$out'");

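15. Use python to query the .job.ad
    #!/usr/bin/env python
    # xcode.py
    import subprocess
    import sys
    import classad
    import htchirp  # assumption: the slide does not say where 'chirp' comes from

    # load the job ad and get the source filename
    with open('.job.ad') as ad_file:
        job = classad.parseOne(ad_file.read())
    src = job['SourceFile']
    srcpath = job['SourceDir'] + src
    # fetch the input file (an htchirp.HTChirp client is assumed here)
    with htchirp.HTChirp() as chirp:
        chirp.fetch(srcpath, src)
    # do the conversion and return the exit code;
    # filenames are quoted because they contain spaces
    sys.exit(subprocess.call("ffmpeg -i '{0}' {2} '{1}'".format(
        src, job['OutFile'], job['Arguments']), shell=True))
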
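The quoting problem can be avoided altogether by building the ffmpeg command as a list rather than a single shell string. The sketch below assumes a .job.ad holding only simple `Name = "value"` lines and uses a tiny hand-rolled parser so that it runs without the HTCondor bindings; `parse_simple_ad` and `build_command` are hypothetical helper names, and a real job should use the `classad` module as on the slide.

```python
import re
import shlex

def parse_simple_ad(text):
    """Parse 'Name = value' lines; only plain quoted strings are unwrapped."""
    ad = {}
    for line in text.splitlines():
        match = re.match(r'\s*([A-Za-z_][\w.]*)\s*=\s*(.*?)\s*$', line)
        if not match:
            continue
        name, value = match.groups()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1].replace('\\"', '"')
        ad[name] = value
    return ad

def build_command(ad):
    """Build an argument list, so no shell quoting is needed."""
    extra = shlex.split(ad.get("Arguments", ""))
    return ["ffmpeg", "-i", ad["SourceFile"]] + extra + [ad["OutFile"]]

# Sample ad text modelled on the custom attributes set in the submit file.
sample = '''SourceDir = "Firefly/"
SourceFile = "S1E4 Shindig.wmv"
OutFile = "S1E4 Shindig.mp4"
Arguments = "-s 640x360"
'''
cmd = build_command(parse_simple_ad(sample))
print(cmd)
```

Passing `cmd` to `subprocess.call(cmd)` runs ffmpeg without a shell, so spaces in the filenames need no escaping.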
16. See how it's going..
    condor_q
    OWNER BATCH_NAME … DONE RUN IDLE TOTAL JOB_IDS
    Tj    Firefly         _   2    2     _ 104.0-4

    condor_q -af:jh JobStatus SourceFile SourceDir
     ID     JobStatus SourceFile         SourceDir
     104.0  2         S1E1 Serenity      Firefly/
     104.1  2         S1E2 The Train Job Firefly/
     104.2  1         S1E3 Bushwhacked   Firefly/
     104.4  1         S1E4 Shindig       Firefly/

17. The ClassAd "Pretty Printer"
  • A C++ class within HTCondor that can print information from a ClassAd
  • Used by condor_status / q / history for most of the standard output
  • -format and -autoformat control it directly
  • -print-format <format-file> gives the most control

18. Use a custom print format
-print-format <format-file>
  • Controls attributes, headings, format, constraint
  • Like -autoformat on steroids
  • Works with condor_status, condor_q, and condor_history
  • Config to make it your default output
  • An "experimental" feature, but stable
    htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=ExperimentalFeatures

19. Custom print format syntax
    SELECT [LABEL [SEPARATOR <string>]] \
           [FIELDPREFIX <string>] \
           [FIELDSUFFIX <string>] \
           [RECORDPREFIX <string>] \
           [RECORDSUFFIX <string>]
    <expr> [AS <label>] [  PRINTF <fmt> \
                         | PRINTAS <function> \
                         | WIDTH [AUTO | <int>]] \
           [LEFT | RIGHT] [NOPREFIX] [NOSUFFIX]
    .. repeat, as needed
    [GROUP BY <sort-expr> [ASCENDING | DECENDING]]

20. Custom print format xcode.cpf
    SELECT
     ClusterId AS " ID" PRINTAS JOB_ID
     JobStatus AS ST    PRINTAS JOB_STATUS
     (time()-EnteredCurrentStatus)/60.0 AS MIN PRINTF %7.2f
     JobBatchName AS BATCH
     SourceFile   AS SOURCE
     RemoteHost ?: "_" AS SLOT
    # ignore jobs without the custom SourceFile attribute
    WHERE SourceFile isnt undefined
    SUMMARY STANDARD

     ID   ST   MIN  BATCH    SOURCE             SLOT
    104.0 R   5.02  Firefly  S1E2 The Train Job slot1@crane

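What the print format does (select attributes, format each one, default a missing value, and filter on a constraint) can be imitated in a few lines of Python. This is a loose analogy rather than how the pretty printer works internally. The first sample ad copies the row shown on the slide; the other two ads, the `Minutes` field, and the `format_rows` helper are invented for illustration.

```python
# Sample job ads as plain dicts; only the first mirrors the slide's output row.
jobs = [
    {"ClusterId": 104, "ProcId": 0, "JobStatus": 2, "Minutes": 5.02,
     "JobBatchName": "Firefly", "SourceFile": "S1E2 The Train Job",
     "RemoteHost": "slot1@crane"},
    {"ClusterId": 104, "ProcId": 1, "JobStatus": 1, "Minutes": 0.50,
     "JobBatchName": "Firefly", "SourceFile": "S1E3 Bushwhacked"},
    {"ClusterId": 105, "ProcId": 0, "JobStatus": 1, "Minutes": 1.00,
     "JobBatchName": "other"},  # no SourceFile, so it is filtered out
]

STATUS = {1: "I", 2: "R"}  # JobStatus 1 is idle, 2 is running

def format_rows(ads):
    """Return one formatted line per ad that has a SourceFile attribute."""
    rows = []
    for ad in ads:
        if "SourceFile" not in ad:  # WHERE SourceFile isnt undefined
            continue
        rows.append("%d.%d %-2s %7.2f %-8s %-18s %s" % (
            ad["ClusterId"], ad["ProcId"],            # like PRINTAS JOB_ID
            STATUS.get(ad["JobStatus"], "?"),         # like PRINTAS JOB_STATUS
            ad["Minutes"],                            # like PRINTF %7.2f
            ad["JobBatchName"],
            ad["SourceFile"],
            ad.get("RemoteHost", "_")))               # like RemoteHost ?: "_"
    return rows

rows = format_rows(jobs)
for row in rows:
    print(row)
```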
21. Make it your default output
In your personal config:
  • ~/.condor/user_config
  • %USERPROFILE%\.condor\user_config
Save the xcode.cpf file and add this knob:
    # uncomment one of these depending on Linux/Windows
    # PERSONAL = $ENV(HOME)/.condor
    # PERSONAL = $ENV(USERPROFILE)\.condor
    Q_DEFAULT_PRINT_FORMAT_FILE=$(PERSONAL)/xcode.cpf

22. And now a few random tips...

23. Put Queue on command line
    condor_submit xcod.sub -q FILE matching *.wmv
(xcod.sub must NOT have a queue line)
    pick.sh | condor_submit x.sub -q FILE from -

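The slide does not show what pick.sh does. As one hypothetical example of such a generator, the sketch below selects the .wmv files that have no converted .mp4 next to them and prints one name per line, which is the form `-q FILE from -` reads on standard input. The function name `pick` and the sample listing are assumptions made for this example.

```python
import posixpath

def pick(filenames):
    """Return the .wmv names that have no matching .mp4 in the same listing."""
    existing = set(filenames)
    picked = []
    for name in sorted(filenames):
        base, ext = posixpath.splitext(name)
        if ext.lower() == ".wmv" and base + ".mp4" not in existing:
            picked.append(name)
    return picked

# A made-up directory listing; a real script might use os.listdir(".").
listing = ["S1E1 Serenity.wmv", "S1E1 Serenity.mp4",
           "S1E2 The Train Job.wmv", "S1E3 Bushwhacked.wmv"]
for name in pick(listing):
    print(name)
```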
24. Test using a subset of jobs
    Queue FILE from (
     S1E1 Serenity.wmv
    # S1E2 The Train Job.wmv
    # S1E3 Bushwhacked.wmv
    )
Or use a python-style slice to define a subset
    Queue FILE matching files [:1] *.wmv

25. Even easier if you prepare
Put $(slice) in your submit file
    Queue FILE matching files $(slice) *.wmv
Then control the slice from the command line
    condor_submit 'slice=[:1]' firefly.sub

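Because the slice notation is borrowed from Python, its effect is easy to preview in Python itself. The sketch below is an approximation: `parse_slice` and `select` are made-up helper names, and sorting the matches is done only to keep the example deterministic, not as a claim about the order HTCondor uses.

```python
import fnmatch

def parse_slice(text):
    """Turn '[start:stop:step]' text into a Python slice object."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]") and ":" in text):
        raise ValueError("expected something like [:1], got %r" % text)
    fields = text[1:-1].split(":")
    numbers = [int(field) if field.strip() else None for field in fields]
    return slice(*numbers)

def select(files, pattern, slice_text):
    """Match files against a glob pattern, then apply the slice."""
    matched = sorted(fnmatch.filter(files, pattern))
    return matched[parse_slice(slice_text)]

files = ["S1E1 Serenity.wmv", "S1E2 The Train Job.wmv",
         "S1E3 Bushwhacked.wmv", "S1E4 Shindig.wmv", "notes.txt"]
print(select(files, "*.wmv", "[:1]"))   # first match only
print(select(files, "*.wmv", "[::2]"))  # every second match
```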
26. Variable Tricks
    # transfer files starting with file001
    sequence = $(ProcId)+1
    transfer_input_files = file$INT(sequence,%03d)
    # Use the submit dir and cluster as the batch name
    batch_name = $Ffdb(SUBMIT_FILE)_$(ClusterId)
    # use the same random value for all jobs in this submit
    include command : /bin/echo rval=$RANDOM_INTEGER(1,100)
    Arguments = $(rval)

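The arithmetic behind the first and last tricks can be written out in Python. This is a sketch of the intended results only, not of how condor_submit evaluates macros; `input_file_for` is a hypothetical name, and the three-job loop is just an example size.

```python
import random

def input_file_for(proc_id):
    """Mimic sequence = $(ProcId)+1 followed by file$INT(sequence,%03d)."""
    sequence = proc_id + 1
    return "file%03d" % sequence

# One random value is drawn once per submit and shared by every job,
# in contrast to drawing a fresh value for each job.
rval = random.randint(1, 100)
job_arguments = [str(rval) for proc_id in range(3)]

print([input_file_for(proc_id) for proc_id in range(3)])
print(job_arguments)
```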
27. Submit from python
    # HTCondor 8.7.9 or later for this code
    import htcondor

    schedd = htcondor.Schedd()
    sub = htcondor.Submit(open('xcode.sub').read())
    # override some things
    sub['MY.SourceDir'] = '"/media/Firefly/"'
    sub['FILE'] = '$(Item)'
    files = ['S1E1 Serenity.wmv', 'S1E2 The Train Job.wmv',
             'S1E3 Bushwhacked.wmv', 'S1E4 Shindig.wmv']
    # submit using the files list
    # (the slide calls this method queue_with_iter; the released
    #  bindings name it queue_with_itemdata)
    with schedd.transaction() as txn:
        cluster = sub.queue_with_itemdata(txn, 1, iter(files))
    # cluster object has ClusterId, range of ProcId's
    # and "common" job ClassAd

28. Questions?

This presentation covers advanced techniques for job submission and file conversion in high-throughput computing. John (TJ) Knoeller of the Center for High Throughput Computing walks through the development of an advanced submit file, custom print formats, and a few further tricks, using the conversion of a set of media files as the running example and solving the output filename and path-handling problems that come up along the way.

  • Computing
  • High-Throughput
  • Job Submission
  • File Conversion
  • Techniques

Uploaded on Feb 19, 2025 | 0 Views

