[Cialug] Columns of Data

Todd Walton tdwalton at gmail.com
Fri Jul 31 15:17:25 UTC 2020


Happy SysAdmin Day, everyone!

Here is an example bit of text coughed up by a Kubernetes command-line tool:

9m49s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1d-nvfzh    Updated machine oo-r6sr3-worker-us-east-1d-nvfzh
9m47s    Normal    Updated            machine/oo-r6sr3-master-1                   Updated machine oo-r6sr3-master-1
9m46s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1b-hsmsx    Updated machine oo-r6sr3-worker-us-east-1b-hsmsx
9m46s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1a-wk5cs    Updated machine oo-r6sr3-worker-us-east-1a-wk5cs
9m46s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1e-z9xlb    Updated machine oo-r6sr3-worker-us-east-1e-z9xlb
9m44s    Normal    Updated            machine/oo-r6sr3-master-0                   Updated machine oo-r6sr3-master-0
9m43s    Normal    Updated            machine/oo-r6sr3-master-2                   Updated machine oo-r6sr3-master-2
9m43s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1d-tfg6x    Updated machine oo-r6sr3-worker-us-east-1d-tfg6x
9m43s    Normal    Updated            machine/oo-r6sr3-worker-us-east-1c-6l42j    Updated machine oo-r6sr3-worker-us-east-1c-6l42j
59s      Normal    SuccessfulUpdate   clusterautoscaler/default                   Updated ClusterAutoscaler deployment: machine-api/cluster-autoscaler-default
4m7s     Normal    Pulled             pod/gateway-laravel-schedule-1296080-h43n6  Container image "dockerregistry:4567/group/gateway/master:alpine-nodejs-fpm" already present on machine

For the purposes of this email, never mind the semantics. This could
be anything. But do notice that the output is arranged into neat columns.
The first four columns are strings of non-space characters. The fifth
column, however, gives us trouble. It seeks to undermine the movement from
within, throwing a wrench into the works. Fifth columns, amiright?

Here's another example, this one taken from my /var/log/messages:

Jun 28 02:50:42 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device (wlp1s0): set-hw-addr: set MAC address to 9:7:8:9:2:F (scanning)
Jun 28 02:50:42 ilm01-ll-ttwalto kernel: IPv6: ADDRCONF(NETDEV_UP): wlp1s0: link is not ready
Jun 28 02:50:42 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device (wlp1s0): supplicant interface state: inactive -> disabled
Jun 28 02:50:42 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device (wlp1s0): supplicant interface state: disabled -> inactive
Jun 28 02:55:57 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device (wlp1s0): set-hw-addr: set MAC address to 62:0F:6E:7A:B3:2C (scanning)
Jun 28 02:55:57 ilm01-ll-ttwalto kernel: IPv6: ADDRCONF(NETDEV_UP): wlp1s0: link is not ready
Jun 28 02:55:57 ilm01-ll-ttwalto NetworkManager[2238]: <info>  device (wlp1s0): supplicant interface state: inactive -> disabled

Here again the fifth column is making things difficult. Also, the first
three (month, day, and time) could certainly stand to be one column, but at
least they're standard, predictable, and manipulable. Manipulable being
what I'm looking for.

This happens frequently, where a command or log outputs text in columns
that are not quite usefully arranged. How does one deal with columnar data
like this? I can't use 'cut'. What would I cut on that would capture the
first columns *and* keep the last one intact? I'm not sure how one would
easily use awk for this. Is there something like '{ print $5- }'? Meaning,
from column 5 onwards? I can't use "column -t" because it treats every
word of the last column as a column of its own, which screws everything up
royally.
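
For what it's worth, here is a rough sketch of two ways to get "column 5
onwards" with stock tools. The function names split_events and from_field
are made up for illustration, and the sketch assumes the first four columns
never contain spaces, as in the Kubernetes output above. It is one possible
approach, not the definitive answer.

```shell
# split_events and from_field are hypothetical helper names, not part of any tool.

# 'read' splits on whitespace and hands everything left over to the LAST
# variable, so the fifth column keeps its internal spaces.
split_events() {
    while read -r age type reason object message; do
        printf '%s|%s|%s|%s|%s\n' "$age" "$type" "$reason" "$object" "$message"
    done
}

# awk has no '$5-', but a loop over the remaining fields gets close.
# Caveat: fields are re-joined with single spaces, so runs of spaces
# inside the message are collapsed.
from_field() {
    awk -v n="$1" '{
        out = $n
        for (i = n + 1; i <= NF; i++) out = out " " $i
        print out
    }'
}

line='9m47s    Normal    Updated            machine/oo-r6sr3-master-1                   Updated machine oo-r6sr3-master-1'
printf '%s\n' "$line" | split_events
# prints: 9m47s|Normal|Updated|machine/oo-r6sr3-master-1|Updated machine oo-r6sr3-master-1
printf '%s\n' "$line" | from_field 5
# prints: Updated machine oo-r6sr3-master-1
```

The 'read' version is probably the more useful of the two in a script,
since each column lands in its own variable. The same trick should work on
the syslog lines with five variable names (month, day, time, host, and the
rest).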

Another thing that trips me up: sometimes I'll have a nice set of
comma-separated values, but there'll be a comma in one of the fields. The
typical way of dealing with this in CSV files is to quote the entire field.
But that doesn't help me, the bash scripter, because 'cut' and IFS
splitting know nothing about quotes.
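
One possible workaround, sketched below, is to convert the CSV to
tab-separated text first, so that 'cut', 'awk', and 'read' can take over
from there. The function name csv_to_tsv is hypothetical. The sketch
assumes fields contain no tabs and no embedded newlines, and that a literal
quote inside a quoted field is written as two double quotes, which is the
usual CSV convention.

```shell
# csv_to_tsv is a hypothetical helper: it reads CSV on stdin and writes
# the same records with tabs between fields and the quoting removed.
csv_to_tsv() {
    awk '{
        out = ""; field = ""; in_quotes = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            ch = substr($0, i, 1)
            if (in_quotes) {
                if (ch == "\"") {
                    if (substr($0, i + 1, 1) == "\"") {
                        field = field "\""   # doubled quote is a literal quote
                        i++
                    } else {
                        in_quotes = 0        # closing quote
                    }
                } else {
                    field = field ch         # commas in here are just data
                }
            } else if (ch == "\"") {
                in_quotes = 1                # opening quote
            } else if (ch == ",") {
                out = out field "\t"         # unquoted comma ends the field
                field = ""
            } else {
                field = field ch
            }
        }
        print out field
    }'
}

printf '%s\n' 'alice,"Des Moines, IA",42' | csv_to_tsv | cut -f2
# prints: Des Moines, IA
```

Walking the line one character at a time is slow on big files, but it uses
only standard awk features, so it should behave the same under gawk, mawk,
and busybox awk.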

Any suggestions for how to deal with stuff like this?

--
Todd

