Re: log file

From: Alexei Rossokhin (alrossokhin_at_yahoo.com)
Date: Fri Jan 24 2020 - 01:11:08 CST

 Hi Andrew,
Thanks for the help. I started running my MD jobs as you advised, with
  charmrun +p<procs> ++mpiexec namd2 <configfile>
and now it works correctly. However, the system reports that the GPUs are only minimally used during the run.
Could you please advise how to make the GPU usage more efficient?
Best wishes,
Alexey

    On Thursday, January 16, 2020, 12:15:37 AM GMT+3, Pang, Yui Tik <andrewpang_at_gatech.edu> wrote:
 
 Hi Alexey,
I am not familiar with sbatch, but it seems like you are using a special script "ompi" to launch your MPI jobs. I would suggest contacting your cluster support team to figure out the correct way to launch NAMD on your specific cluster. Given that the build you are using is verbs-smp, I believe this paragraph, copied from "notes.txt", will be most helpful for you and your cluster support:

For MPI-based SMP builds one would specify any mpiexec options needed
for the required number of processes and pass +ppn to the NAMD binary as:

  mpiexec -n 4 namd2 +ppn 3 <configfile>

MPI-based SMP builds have worse performance than verbs or ibverbs and
are not recommended, particularly for GPU-accelerated builds.
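
If your cluster nevertheless ends up with the MPI route, a minimal Slurm wrapper around the quoted command might look like the sketch below; the task counts, thread counts, and file names are illustrative assumptions only, and your cluster support team can confirm the right values and module setup for your machine:

  #!/bin/bash
  #SBATCH --nodes=1            # one 28-way SMP node, as in your log
  #SBATCH --ntasks=4           # MPI processes started by mpiexec (assumed)
  #SBATCH --cpus-per-task=4    # 3 worker threads (+ppn 3) plus 1 communication thread per process (assumed)
  #SBATCH --time=0-22:30:00

  # Hypothetical config/log names; load your cluster's MPI and NAMD modules before this line.
  mpiexec -n 4 namd2 +ppn 3 myconfig.namd > myrun.log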

Hope it helps!
Best,
Andrew

From: owner-namd-l_at_ks.uiuc.edu <owner-namd-l_at_ks.uiuc.edu> on behalf of Alexei Rossokhin <alrossokhin_at_REMOVE_yahoo.com>
Sent: Wednesday, January 15, 2020 5:44 AM
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>
Subject: Re: namd-l: log file

Hi Andrew,

Thank you for your attempt to help me.
>Did you see only one instance of the "Info: Running on ..." line, or multiple? If you see multiple, it is very likely that you have accidentally started multiple instances of NAMD.

No, I don't have such a line in my log files. With "Running on ...", I have the following messages:

Charm++> Running on 1 unique compute nodes (28-way SMP).
Charm++> cpu topology info is gathered in 0.019 seconds.
Charm++> Running on 1 unique compute nodes (28-way SMP).
Charm++> Running on 1 unique compute nodes (28-way SMP).
Info: NAMD 2.11 for Linux-x86_64-verbs-smp
Below is the command line I use to start the NAMD calculation:
sbatch -n 16 --time=0-22:30:00 ompi --bind-to none namd2 <configfile>
Is it right? Thank you.
Alexey

On Wednesday, January 15, 2020, 3:15:37 AM GMT+3, Pang, Yui Tik <andrewpang_at_gatech.edu> wrote:

Hi Alexey,
We will need more information to help you out. Did you see only one instance of the "Info: Running on ..." line, or multiple? If you see multiple, it is very likely that you have accidentally started multiple instances of NAMD.
Please be aware that NAMD builds for different platforms should be launched in different ways. For example, if you are using the Linux-x86_64-multicore build, you should launch NAMD with:
  namd2 +p<procs> <configfile>

However, if the NAMD build you have is Linux-x86_64-ibverbs, you should launch it with:
  charmrun namd2 ++local +p<procs> <configfile>
If you are using Linux-x86_64-verbs instead, the command to use is:
  charmrun +p<procs> ++mpiexec namd2 <configfile>

(All the above examples assume you are running NAMD on a single Linux node.) Please refer to the "notes.txt" that comes with the NAMD binaries for details. If you are not sure which NAMD build you have, you should be able to find it by looking at the first "Info:" line of your log file. In my case, it reads: "Info: NAMD 2.11 for Linux-x86_64-ibverbs".
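
For instance, a quick way to check (assuming your log file is named run.log; adjust the name to your own run) is:

  grep "Info: NAMD" run.log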

Let us know if you still have difficulties.
Andrew

From: owner-namd-l_at_ks.uiuc.edu <owner-namd-l_at_ks.uiuc.edu> on behalf of Alexei Rossokhin <alrossokhin_at_REMOVE_yahoo.com>
Sent: Tuesday, January 14, 2020 11:10 AM
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>
Subject: Re: namd-l: log file

Hi, thanks for the quick response.
I have the following line in my log file:

Info: Running on 1 processors, 1 nodes, 1 physical nodes.
and I am not trying to launch multiple copies.
Alexey

On Tuesday, January 14, 2020, 6:24:27 PM GMT+3, Giacomo Fiorin <giacomo.fiorin_at_gmail.com> wrote:

I usually get this when I have multiple NAMD instances writing to the same file.  Can you check how you launched it to make sure that it's consistent with what you want?  Look for this line in the output:

Info: Running on XX processors, YY nodes, ZZ physical nodes.

If you are explicitly running multiple copies with +replicas, use the +stdout flag as well.
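For instance (a sketch only: the PE count, replica count, and file names below are illustrative assumptions), a replica run that keeps each copy's output in its own file could be launched as:

  charmrun +p8 namd2 +replicas 4 +stdout output/job0.%d.log <configfile>

Here %d is replaced by each replica's index, so the copies no longer interleave their writes in a single log.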
Giacomo

On Tue, Jan 14, 2020 at 9:39 AM Alexei Rossokhin <alrossokhin_at_remove_yahoo.com> wrote:

Dear NAMD experts,
can anybody explain to me why, during the minimization/annealing process, I get such jumps in the step numbers in my log file (for example, see below: 34200, 45920, 43560, 33800, 43880)? Thank you in advance.
Alexey

PRESSURE: 34200 -131.805 -3.38526 -181.954 -114.61 -17.7996 60.7462 -131.747 0.842376 119.959
GPRESSURE: 34200 -77.9873 37.8937 -173.265 -145.494 6.57063 67.6804 -141.456 25.7605 147.873
ENERGY:   34200      4172.4443     15131.3118     23859.0366       488.5060        -496028.8436     35114.8672      1045.4922         0.0000     33364.0599        -382853.1256       112.7096   -416217.1855   -382641.2713       112.7096             -9.8818        25.4854   1375535.7696        -9.8818        25.4854

PRESSURE: 45920 -92.5175 -16.0369 -41.5325 -83.8353 -152.168 28.1227 -33.9996 28.2141 151.485
GPRESSURE: 45920 -91.4706 20.6981 -30.7003 -84.405 -91.3729 41.7199 -36.1052 11.5266 201.678
PRESSAVG: 45920 0.875517 3.18486 -120.187 -57.3614 -46.7854 -5.78555 -104.491 -3.40833 13.9777
GPRESSAVG: 45920 0.380058 -1.93937 -119.743 -54.8274 -50.7971 -5.32708 -103.372 -4.36064 18.9772
TIMING: 45920  CPU: 71613.8, 2.38445/step  Wall: 71590.9, 2.3836/step, 0.185391 hours remaining, 747.523438 MB of memory in use.
ENERGY:   45920      4580.4501     16499.6773     24046.8464       529.9106        -491740.4083     33722.1165      1058.4929         0.0000     39177.6548        -372125.2597       132.3489   -411302.9145   -371873.7401       132.4044            -31.0671         6.2783   1367961.6321       -10.6441       -10.4799

PRESSURE: 43560 96.8541 -39.7641 -25.2649 5.00658 318.426 -63.9513 0.783388 -93.6069 -11.7887
GPRESSURE: 43560 157.321 -23.8416 -68.6609 30.0552 339.915 -16.382 17.7373 -129.45 8.88396
PRESSAVG: 43560 -47.9912 -34.4189 28.8564 -27.2364 84.094 -61.7098 30.6092 -50.0621 -28.549
GPRESSAVG: 43560 -43.7836 -29.9769 31.0341 -28.0596 82.7295 -58.5285 32.8886 -54.4655 -32.2422
TIMING: 43560  CPU: 71610.6, 1.92252/step  Wall: 71590, 1.92194/step, 0.128129 hours remaining, 746.316406 MB of memory in use.
ENERGY:   43560      4541.2762     16094.5270     23995.0947       520.0559        -492124.8618     33756.3212      1090.0505         0.0000     38174.0456        -373953.4907       128.9586   -412127.5363   -373709.6920       128.9583            134.4972       168.7067   1368951.1044         2.5179         2.2346

PRESSURE: 33800 -79.6388 132.96 160.817 204.468 -85.1759 23.1872 140.642 38.0828 42.6884
GPRESSURE: 33800 -47.9999 126.61 186.276 204.676 -43.6118 14.7897 122.071 91.1083 78.5532
PRESSAVG: 33800 24.3857 141.494 33.4736 188.615 -43.053 64.7807 6.54744 121.856 13.4976
GPRESSAVG: 33800 24.3734 147.377 32.1721 186.81 -43.3077 67.3887 10.6156 119.282 14.0064
TIMING: 33800  CPU: 71621.1, 2.37152/step  Wall: 71595.6, 2.37067/step, 0.263408 hours remaining, 731.367188 MB of memory in use.
ENERGY:   33800      4120.2594     15040.2501     23814.4334       493.7457        -496069.9001     34946.4178      1063.4852         0.0000     33246.9564        -383344.3523       112.3140   -416591.3087   -383135.0758       112.0731            -40.7088        -4.3528   1371890.3038        -1.7232        -1.6426

PRESSURE: 43880 131.555 -126.426 97.3805 -65.9241 81.7644 139.926 -4.57359 -68.5444 -43.4502
GPRESSURE: 43880 181.451 -122.94 102.786 -58.7188 109.383 115.383 10.557 -64.6083 -17.0218
PRESSAVG: 43880 3.52469 -101.27 36.8295 -13.2392 -11.7467 121.355 25.0006 -36.616 -18.0008
GPRESSAVG: 43880 5.11272 -101.453 33.2715 -12.502 -10.8696 124.847 25.92 -41.2756 -21.1324
TIMING: 43880  CPU: 71621.2, 1.92101/step  Wall: 71598.1, 1.92032/step, 0.27738 hours remaining, 745.691406 MB of memory in use.
ENERGY:   43880      4504.5678     16140.9938     24031.9612       519.6740        -492895.7751     34264.549

-- 
Giacomo Fiorin
Associate Professor of Research, Temple University, Philadelphia, PA
Research collaborator, National Institutes of Health, Bethesda, MD
http://goo.gl/Q3TBQU
https://github.com/giacomofiorin
  
