WEBLOG – My IT experience

CentOS Nagios Dockerfile with a start script

Here is a short Dockerfile that sets up a CentOS + Nagios + Apache instance inside a container in a few minutes.

Of course, one may prefer to work with a ready-made image instead; however, a Dockerfile makes every build step explicit, which makes it easier to review for security, change, and follow.

Remember to change the Nagios version in the Dockerfile to the latest release.


SuperMicro JBOD ipmitool how to

SuperMicro JBODs can be connected directly to their server with a straight patch cable.

If no address has been set, the JBOD tries to get one from a DHCP server; if none is available, it falls back to the default address 192.168.1.99.

To access and configure the JBOD, add an IP address in the same subnet to the server interface it is attached to, and bring that interface up:

ip addr add 192.168.1.98/25 dev eno3
ip link set dev eno3 up
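Why a /25 prefix is enough here: 192.168.1.98/25 covers 192.168.1.0 to 192.168.1.127, so the JBOD's fallback address 192.168.1.99 is on the same link and no extra route is needed. A small sketch of that arithmetic in plain bash follows; the helpers `ip_to_int` and `same_subnet` are hypothetical names, not part of iproute2 or ipmitool:

```shell
#!/usr/bin/env bash
# ip_to_int ADDR: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.            # split the address on dots (local to this function)
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# same_subnet IP1 IP2 PREFIX: exit status 0 if both addresses fall in the same /PREFIX network.
same_subnet() {
  local prefix=$3
  # Build the netmask, e.g. /25 -> 0xFFFFFF80; /0 is handled explicitly.
  local mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local n1 n2
  n1=$(ip_to_int "$1")
  n2=$(ip_to_int "$2")
  (( (n1 & mask) == (n2 & mask) ))
}

if same_subnet 192.168.1.98 192.168.1.99 25; then
  echo "192.168.1.99 is reachable on-link from 192.168.1.98/25"
fi
```

By the same logic, `same_subnet 192.168.1.98 192.168.1.130 25` returns a non-zero status, because .130 lies in the upper half (192.168.1.128/25).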

CentOS Docker container su: cannot open session: Permission denied

Issue: After a software update in a CentOS 7 Docker container, `su - user` no longer works and fails with the following error:

bash-4.2# su - username
Last login: Wed Sep 13 13:20:31 UTC 2017
su: cannot open session: Permission denied

Cause:
An inappropriate nofile setting in either

/etc/security/limits.conf

or

/etc/security/limits.d/*.conf

Solution:
Several solutions have been published, including one proposed by Red Hat, which suggest editing /etc/security/limits.conf and removing the `nofile unlimited` setting.

However, files under /etc/security/limits.d/ can set nofile as well, and those entries need fixing too: change any value of unlimited, or a large number such as 500000, to 65536 or less.

bash-4.2# cat /etc/security/limits.d/50-open-files.conf
*         hard    nofile      500000
*         soft    nofile      500000

This needs to be edited to:

bash-4.2# cat /etc/security/limits.d/50-open-files.conf
*         hard    nofile      65536
*         soft    nofile      65536
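When several files are affected, editing them by hand gets tedious. The following is a minimal sketch, assuming the usual `<domain> <type> <item> <value>` layout shown above; the function name `cap_nofile` is hypothetical. It rewrites any nofile value that is unlimited (or its synonyms infinity and -1) or larger than a cap (65536 by default) to the cap, and copies every other line, including comments, unchanged:

```shell
#!/usr/bin/env bash
# cap_nofile FILE [MAX]: rewrite nofile entries that are unlimited/infinity/-1
# or larger than MAX (default 65536) to MAX. Other lines are copied unchanged.
cap_nofile() {
  local file=$1 max=${2:-65536}
  local tmp
  tmp=$(mktemp) || return 1
  awk -v max="$max" '
    # Fields: <domain> <type> <item> <value>; comment lines are skipped.
    $1 !~ /^#/ && NF >= 4 && $3 == "nofile" {
      if ($4 == "unlimited" || $4 == "infinity" || $4 == "-1" || $4 + 0 > max + 0) {
        # Re-emit the line with the same column widths as the example file.
        printf "%-10s%-8s%-12s%s\n", $1, $2, $3, max
        next
      }
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and permissions
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

It could then be run over every candidate file, for example:

for f in /etc/security/limits.conf /etc/security/limits.d/*.conf; do cap_nofile "$f"; done

Rewritten lines are re-aligned to the column widths of the example above; lines that already have a value of 65536 or less are left exactly as they were. Keep a copy of the files first if the original values may be needed later.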

Docker inspect: list ports, volumes, etc.

`docker inspect` returns detailed low-level information about Docker containers.

To filter the returned output, pass a Go template with the --format (-f) option to select only the fields of interest.

Main documentation can be found here:
https://docs.docker.com/engine/reference/commandline/inspect/

A piece of information that is not immediately visible, but is often of interest, is the list of bind-mounted volumes, which can be extracted as follows:

docker inspect --format='{{json .HostConfig.Binds}}' container_name

To retrieve a list of the binds separated by a new line:

docker inspect --format='{{json .HostConfig.Binds}}' container_name | \
sed 's/"//g;s/\[//g;s/\]//g'| \
tr ',' '\n' 
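To check what that pipeline does without a running container, it can be wrapped in a function and fed a sample of the JSON that `{{json .HostConfig.Binds}}` prints. This is a sketch: the function name `split_binds` and the sample paths are hypothetical, and the extra `/^null$/d` handles containers without binds, for which the template prints `null`:

```shell
#!/usr/bin/env bash
# split_binds: read the JSON array printed by
#   docker inspect --format='{{json .HostConfig.Binds}}' container_name
# on stdin and print one bind per line. "null" (no binds) yields no output.
split_binds() {
  sed 's/"//g;s/\[//g;s/\]//g;/^null$/d' | tr ',' '\n'
}

# Hypothetical sample of the JSON returned for two bind mounts.
printf '%s\n' '["/srv/data:/data","/etc/localtime:/etc/localtime:ro"]' | split_binds
```

In real use it sits at the end of the inspect call: docker inspect --format='{{json .HostConfig.Binds}}' container_name | split_binds. Like the original pipeline, it splits on every comma, so a host path that itself contains a comma would be broken into two lines. Mounts created with `--mount` instead of `-v` are recorded under `.HostConfig.Mounts`, so they do not appear in this list.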

For the network settings:

docker inspect --format='{{json .NetworkSettings.Ports}}' container_name

To find the command the container was started with:

docker inspect -f "{{.Name}} {{.Config.Cmd}}" container_name

To find the environment variables with which the container was started:

docker inspect --format "{{.Config.Env}}" container_name

Many more useful inspect format tricks can be found here:

Docker Inspect Template Magic

HTCondor Docker universe throws core.STARTER

This is a problem observed when using HTCondor in the Docker universe.

After reconfiguring HTCondor and Docker on one processing node, every job sent to that node dumps the following errors into the corresponding slot's StarterLog.slot1_N:

(pid:24877) Found 33 entries in docker image cache.
Stack dump for process 24877 at timestamp 1497439919 (13 frames)
/lib64/libcondor_utils_8_7_1.so(dprintf_dump_stack+0x72)[0x7fbbc3e6f0b2]
/lib64/libcondor_utils_8_7_1.so(_Z18linux_sig_coredumpi+0x24)[0x7fbbc3ffb534]
/lib64/libpthread.so.0(+0xf370)[0x7fbbc25bc370]
/lib64/libstdc++.so.6(_ZNSt8__detail15_List_node_base9_M_unhookEv+0x7)[0x7fbbc2f76077]
/lib64/libcondor_utils_8_7_1.so(_ZN9DockerAPI3runERN14compat_classad7ClassAdES2_RKSsS4_S4_RK7ArgListRK3EnvS4_St4listISsSaISsEERiPiR11CondorError+0x42e)[0x7fbbc3e31cee]
condor_starter(_ZN10DockerProc8StartJobEv+0xb66)[0x454656]
condor_starter(_ZN8CStarter8SpawnJobEv+0xc3)[0x45b753]
condor_starter(_ZN8CStarter14SpawnPreScriptEv+0x197)[0x459757]
/lib64/libcondor_utils_8_7_1.so(_ZN12TimerManager7TimeoutEPiPd+0x182)[0x7fbbc3ff9952]
/lib64/libcondor_utils_8_7_1.so(_ZN10DaemonCore6DriverEv+0x9cb)[0x7fbbc3fdb59b]
/lib64/libcondor_utils_8_7_1.so(_Z7dc_mainiPPc+0x13e8)[0x7fbbc3ffebe8]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fbbc220db35]

A core.STARTER file is also generated, and the output of `gdb /var/log/condor/core.STARTER <<< "where"` is:

Core was generated by `condor_starter -f -a slot1_1 fqdn.domain.com'.

(gdb) Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0xb1340bc0:

What led up to this was a bug in the Docker thinpool storage driver, which prompted a switch to the overlay2 driver together with a reinstall of Docker.

Solution:
There are 'hidden' dot files in the condor log directory containing cache information that can interfere with job submission. To fix this, stop condor, remove those files (keeping a copy, as below) and start condor again. Once that is done, the node starts accepting Docker universe jobs again.

systemctl stop condor
cp /var/log/condor/.s* /tmp/
rm -f /var/log/condor/.s*
systemctl start condor
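The same steps can be wrapped in a small function that keeps the backup in its own directory instead of copying straight into /tmp, and deletes the dot files only once the copy has succeeded. This is a sketch: the function name `clean_condor_dotfiles` and the backup directory naming are hypothetical, and the `.s*` pattern is taken from the commands above:

```shell
#!/usr/bin/env bash
# clean_condor_dotfiles [LOGDIR] [BACKUPDIR]: back up and remove the hidden
# ".s*" files from LOGDIR (default /var/log/condor). Stop condor first.
clean_condor_dotfiles() {
  local logdir=${1:-/var/log/condor}
  local backup=${2:-}
  local files=() f
  for f in "$logdir"/.s*; do
    # An unmatched glob stays literal, so keep only paths that exist.
    if [ -e "$f" ]; then files+=("$f"); fi
  done
  if [ "${#files[@]}" -eq 0 ]; then
    echo "no .s* files in $logdir"
    return 0
  fi
  # Default: a fresh directory under /tmp, so earlier backups are not overwritten.
  if [ -z "$backup" ]; then
    backup=$(mktemp -d /tmp/condor-dotfiles.XXXXXX) || return 1
  fi
  mkdir -p -- "$backup" || return 1
  # Remove the originals only if copying them succeeded.
  cp -p -- "${files[@]}" "$backup"/ && rm -f -- "${files[@]}"
}
```

On the node it would sit between the stop and start commands:

systemctl stop condor
clean_condor_dotfiles /var/log/condor
systemctl start condor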
