Technical How-To’s & Notes
Stata Temporary Files and Stata Tmp
The Problem:
On not-so-rare occasions, Stata creates problems on our cluster due to its use of temporary files. When launching Stata, when running do/ado files, and when opening and writing files, Stata creates a number of files on the /tmp volume that is local to each compute node. This volume is shared by all jobs on that node, is essential for the health of the node, offers a significant speed advantage over network storage, and is usually not very large. This becomes a problem if the /tmp volume is too small for a person's code pattern when writing and/or merging a number of files at once, if Stata exits abnormally, or if a large number of Stata jobs land on one node and are doing lots of file work.
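If you suspect your jobs are running into this, a quick way to gauge how full a node's local /tmp volume is, using nothing more than standard Linux utilities, is:
# how full is the local /tmp volume on the node you are logged into?
df -h /tmp
# which of the files there belong to you?
ls -lh /tmp | grep $USER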
Does This Affect You?
You are likely affected if you meet one or more of the following criteria:
- You plan to open/write a number of files simultaneously
- Some of the file sizes may be large ( > several GB)
- You will submit a large number of jobs (> 10) to run at the same time
- You may still be testing your code or workflow, or changing parameters, which may require killing your jobs
The Solution(s):
Part of this can be remedied by making the /tmp volumes larger. We have, for the most part, increased their size to 1.7 - 2 TB (from 0.5 TB). This does not, however, guarantee that the problem will not recur.
Another approach is adjusting one's code and file usage patterns so that fewer files are open for writing at the same time. Sometimes this is possible, sometimes not, and it likewise offers no guarantee.
Our additional solution is to redirect where Stata stores its temporary files. The shell environment variable STATATMP, which must be set before Stata launches, can be set to any location, and Stata will save its temporary files there for that session (or permanently, if set in your account's login scripts .bashrc/.bash_profile; but this is not recommended). This then allows two choices for file redirection and potential manual temporary file cleanup:
- One can continue to use the local /tmp volume.
- Alternatively, redirect your temporary file usage to the larger /export/scratch filesystem (a brief example of setting this by hand follows below).
In either case, if your scripts fail, your jobs are killed for some reason (e.g., time limit, memory problems), or you manually kill your jobs, you will need to clean up the temporary files manually (see below).
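As a quick illustration of the second choice, setting STATATMP by hand in a terminal session before launching Stata might look like this (the stata_temp path simply mirrors the examples further down this page):
# set STATATMP for this shell session only; any Stata launched from it will use this folder
export STATATMP=/export/scratch/stata_temp/$USER
# make sure the folder exists before Stata starts
mkdir -p "$STATATMP"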
We recommend the following steps:
1. Create a profile.do file at ~/ado/personal/ with the following lines:
local tmpdir = c(tmpdir)            // retrieve tmpdir location
capture confirm file "`tmpdir'"     // check if tmpdir exists
if _rc {                            // if tmpdir does not exist...
    !mkdir -p "`tmpdir'"            // create it!
}
This will ensure that the temporary location and folder path exist as Stata launches. It helps with step #2 as well as with regular, default Stata temporary file usage.
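If you prefer to create this file from the terminal rather than an editor, one possible way (assuming a Bash shell; note that this overwrites any existing ~/ado/personal/profile.do) is:
# create the personal ado folder if needed, then write the profile.do shown above
mkdir -p ~/ado/personal
cat > ~/ado/personal/profile.do <<'EOF'
local tmpdir = c(tmpdir)            // retrieve tmpdir location
capture confirm file "`tmpdir'"     // check if tmpdir exists
if _rc {                            // if tmpdir does not exist...
    !mkdir -p "`tmpdir'"            // create it!
}
EOF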
2. Prefix all Stata job submissions (default wrapper submissions or custom submissions) with the Bash environment-variable assignment that changes the Stata temporary file location for that one job / application launch. Here are a few examples:
STATATMP=/tmp/\$USER/\$LSB_JOBID xstata-mp4-5g
STATATMP=/export/scratch/stata_temp/\$USER/\$LSB_JOBID stata-mp4-5g -b do myfile.do
bsub -q short -n 4 \
-R "rusage[mem=5000]" -M 5000 -hl \
STATATMP=/export/scratch/stata_temp/\$USER/\$LSB_JOBID \
stata-mp4 -b do myfile.do
(The \ at the very end of a line, with absolutely nothing after it, together with the leading spaces on the next line, allows you to write one command over multiple lines.)
The STATATMP assignment before the Stata command (in whatever form) specifies a folder on the /export/scratch volume that is both user-specific and jobID-specific. The script in step #1 then runs at Stata launch and creates this specific folder path if it does not already exist, which in this case it should not.
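As a quick sanity check once a job has started, you can confirm from a terminal that the job-specific folder was created (the path assumes the /export/scratch examples above):
# list the per-job temporary folders created under your user's stata_temp area
ls -ld /export/scratch/stata_temp/$USER/*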
Note: you will not be able to use the Application drop-down menus in NoMachine to run Stata and also redirect temporary file use for that session, as there is no way to set the STATATMP value when executing those specific wrapper scripts.
Manual Cleanup of Temporary Files:
If your scripts fail, your jobs are killed for some reason (e.g., time limit, memory problems), or you manually kill your jobs, you will need to clean up the temporary files manually, as the files will otherwise remain for 5 to 15 days and take up valuable disk storage. Here are two remedies:
1. If you redirected the temporary files to /export/scratch as in the examples above, issue the following command in the terminal:
rm -rf /export/scratch/stata_temp/$USER
Note that you'll need to adjust this path if you did not use the stata_temp/ folder.
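If you would like to see what is there before deleting it, something like the following works (again assuming the stata_temp/ path from the examples):
# how much space are the redirected temporary files using?
du -sh /export/scratch/stata_temp/$USER
# list the per-job folders before removing them
ls -l /export/scratch/stata_temp/$USER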
2. If you redirected files to the local /tmp volume, deleting them takes a bit more work. First, one must find which compute node(s) the jobs ran on. In the terminal, use the bhist command with one or more jobIDs:
bhist -l jobID
In the details that are reported, pick out the Dispatched hosts, the nodes which ran the jobs:
Job <612217>, User <jharvard>, Project <default>, Command
</usr/local/app/stata15-mp16/stata-mp -b do ../../Code/Analysis/analysis.do>, Esub <mem>
Tue Dec 31 16:56:14: Submitted from host <rhrcscli1>, to Queue <test>, CWD
</export/home/dor/jharvard/projects/bigdata1/runlogs/191231-1656-analysis>, Output
File <%J_analysis.out>, Error File <%J_analysis.err>, Notify when job begins/ends, 8
Task(s), Requested Resources <rusage[mem=60000]>, memory/swap limit enforced
per-job/per-host;
Tue Dec 31 16:56:15: Dispatched 8 Task(s) on Host(s) <8*rhrcsnod4>, Allocated 8
Slot(s) on Host(s) <8*rhrcsnod4>, Effective RES_REQ <select[type == local]
order[r15s:pg] rusage[mem=60000] span[hosts=1] affinity[core(1)*1] >;
Tue Dec 31 16:56:15: Starting (Pid 15613);
...
One can see that rhrcsnod4 ran this particular job. We can now delete the temporary files by submitting a job that deletes the files on that host:
bsub -m rhrcsnod4 -q short rm -rf /tmp/stata_temp/$USER
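If you want to confirm that the cleanup job ran, the usual LSF job queries work; for example:
# watch the cleanup job while it is pending or running
bjobs -w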
You will need to repeat these steps for all jobs that failed. Two simple examples for grabbing execution host information:
# grab execution host data for one or more jobIDs that failed
bhist -e -l jobIDs | grep -i dispatched
OR
# grab execution host data for recently failed jobs
bhist -e -l | grep -i dispatched
If there are multiple hosts, you can use a Bash loop to submit multiple jobs to delete the files:
for host in rhrcsnod3 rhrcsnod4 rhrcsnod5; do
bsub -m $host -q short rm -rf /tmp/stata_temp/$USER
done
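If many jobs are involved, you could also pull the host names out of the bhist output directly and loop over them. The sed expression below is only a rough sketch: it assumes GNU sed and the "<N*hostname>" format shown in the sample output above, so double-check the host list it produces before relying on it:
# collect the unique execution hosts of recently exited jobs, then clean /tmp on each
hosts=$(bhist -e -l | grep -i dispatched | sed -n 's/.*Host(s) <\([0-9]*\*\)\{0,1\}\([^>,]*\)>.*/\2/p' | sort -u)
for host in $hosts; do
    bsub -m "$host" -q short rm -rf /tmp/stata_temp/$USER
done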
Of course, if you have any questions, please contact RCS.
Updated 4/7/2020