Re-using the last bash command argument

Tired of re-typing the same argument twice for different commands? For bash there is an easy solution:

mkdir testdir
cd !$

The ‘!$’ expands to the last argument of the previous command, a real time saver!
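
Bash has a few more of these history ‘word designators’ that work the same way; for example (file names made up):

cp config.txt /etc/app/
ls -l !$      # expands to: ls -l /etc/app/
# Two related designators, each also referring to the previous command:
#   !^   - first argument
#   !*   - all arguments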

Extracting mp4 files from AVCHD without transcoding

My new Sony digital camera stores movies inside an AVCHD container. Luckily this format is supported on OSX natively and you can at least browse all clips inside with ease. However, if you want to export clips there is a limitation: OSX forces you to transcode the video when exporting to .mp4. This is slow and introduces quality loss. I started wondering if there is a better way and as it turns out there is :)

Internally the AVCHD container has a number of .MTS files, which in my case contain perfectly fine H264 video and AAC audio. It should be enough to re-multiplex (meaning ‘copy the data streams but don’t re-encode’) these streams into an MP4 container. MP4 is widely supported by most devices (and OSX itself). The go-to tool in this kind of situation is ffmpeg, and we will use it to re-multiplex the streams. As an added bonus, the timestamp of the MP4 file will be set to the original .MTS timestamp.

To get ffmpeg on Linux, just install the ‘ffmpeg’ package. On OSX an easy way to get ffmpeg is to install it via Homebrew.
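
Something like this should do it (assuming Homebrew is already set up on OSX, and apt on a Debian-flavoured Linux):

# OSX, via Homebrew
brew install ffmpeg

# Debian/Ubuntu
sudo apt-get install ffmpeg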

Here is the code, it should be run from the AVCHD/BDMV/STREAM directory level:

#!/bin/bash

WORKDIR=$(basename "$PWD")

# Safety check: only run from the STREAM directory
if ! [[ "$WORKDIR" == "STREAM" ]]; then
        echo "This script is supposed to run from AVCHD/BDMV/STREAM directory level, exiting now"
        exit 1
fi

# Check that ffmpeg is installed
if ! [ -x "$(command -v ffmpeg)" ]; then
        echo "ffmpeg is not installed, exiting now"
        exit 1
fi

# Re-mux all .MTS files into an mp4 container and give each mp4 the timestamp of its .MTS file
# -nostdin stops ffmpeg from swallowing the loop's stdin
for i in *.MTS; do
        out="../../../$(basename "$i" .MTS).mp4"
        ffmpeg -nostdin -i "$i" -vcodec copy -acodec copy -f mp4 "$out" && touch -r "$i" "$out"
done

The resulting MP4 files will be put in the same directory as the AVCHD folder. For added convenience, you can download the file here.

Simple UDP relay with NAT latching in Python

When you’re building a VOIP server you soon encounter the problem that a client is behind a NAT (instead of on a directly reachable public IP). In this scenario the server can’t send packets directly to the client.

However, there is a way around this, called ‘NAT latching’. Most NAT configurations will automatically forward any reply that is addressed to the same port number that was used for sending, back to the right client.

So by configuring our application to receive on the same port number it uses for sending UDP, we can set up bi-directional communication with a client as soon as it has sent a single packet to the server (and for as long as the NAT binding stays open), simply by remembering which public ip/port combination that packet came from.

On the server side, we need something called a ‘relay channel’. This channel is nothing more than a pair of sockets that remember the origin of each data stream and use that as the destination for forwarding packets to the other side. It works like this (we use RTP in this example but it can be any UDP protocol):

Precondition: Client A and B are behind a NAT (so they have a non-public IP).

  1. Client A starts sending RTP from a specific UDP port X and simultaneously binds on this same port number X to receive RTP.
  2. Client B starts sending RTP from a specific UDP port Y and simultaneously binds on this same port number Y to receive RTP.
  3. Client A sends at least one packet from port X to the ‘left’ side of the relay channel.
  4. The relay server remembers the ip/port combination that the packet originated from (external_ip_of_client_a/port_X).
  5. Client B sends at least one packet from port Y to the ‘right’ side of the relay channel.
  6. The relay server remembers the ip/port combination that the packet originated from (external_ip_of_client_b/port_Y).

Now, if a packet comes in on the ‘left’ side of the relay channel, the server knows that it can be forwarded to external_ip_of_client_b/port_Y. And vice versa, if a packet comes in on the ‘right’ side of the relay channel, the server knows that it can be forwarded to external_ip_of_client_a/port_X.

The whole trick here is that a client needs to send at least 1 packet and then things will work fine :)

Because it can be a hassle to set up a full relay server when developing, I wrote this Python script that implements the same functionality. It’s not recommended for production use, but for development it works fine! Make sure to run it on a server that has a public IP.

#!/usr/bin/env python3

# Simple script that implements a UDP relay channel
# Assumes that both sides are sending and receiving from the same port number
# Anything that comes in on the left side will be forwarded to the right side (once the right side origin is known)
# Anything that comes in on the right side will be forwarded to the left side (once the left side origin is known)

# Inspired by https://github.com/EtiennePerot/misc-scripts/blob/master/udp-relay.py

import select
import socket
import sys

def fail(reason):
    sys.stderr.write(reason + '\n')
    sys.exit(1)

if len(sys.argv) != 2 or len(sys.argv[1].split(':')) != 2:
    fail('Usage: udp-relay.py leftPort:rightPort')

leftPort, rightPort = sys.argv[1].split(':')

try:
    leftPort = int(leftPort)
except ValueError:
    fail('Invalid port number: ' + leftPort)
try:
    rightPort = int(rightPort)
except ValueError:
    fail('Invalid port number: ' + rightPort)

try:
    sl = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sl.bind(('', leftPort))
except OSError:
    fail('Failed to bind on port ' + str(leftPort))

try:
    sr = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sr.bind(('', rightPort))
except OSError:
    fail('Failed to bind on port ' + str(rightPort))

leftSource = None
rightSource = None
sys.stderr.write('All set.\n')

while True:
    ready_socks, _, _ = select.select([sl, sr], [], [])
    for sock in ready_socks:
        data, addr = sock.recvfrom(32768)
        if sock is sl:
            print('Received on left socket from', addr)
            leftSource = addr
            if rightSource is not None:
                print('Forwarding left to right', rightSource)
                sr.sendto(data, rightSource)
        elif sock is sr:
            print('Received on right socket from', addr)
            rightSource = addr
            if leftSource is not None:
                print('Forwarding right to left', leftSource)
                sl.sendto(data, leftSource)

For added convenience, you can download the file here. Happy hacking!
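
If you want to try the relay without a full VOIP setup, netcat works nicely. This is just a sketch; the hostname and ports are made up. Note the ‘-p’ flag, which pins the source port so that sending and receiving happen on the same socket, exactly the latching requirement described above:

# On the relay server (public IP):
./udp-relay.py 10000:10001

# Client A: bind local UDP port 5555 and talk to the 'left' side
nc -u -p 5555 relay.example.com 10000

# Client B: bind local UDP port 6666 and talk to the 'right' side
nc -u -p 6666 relay.example.com 10001

Once both clients have typed at least one line, anything typed on one side should appear on the other.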

Magnet handler script for Firefox on OSX

One thing I was missing when downloading torrents with ‘magnet:’ links was an easy way to transfer such a link to my BitTorrent client (which is running on a different server). After copy-pasting many magnet links I finally decided to do something about it and write a small helper application that Firefox can call when it encounters a magnet link.

This example script will save the URL to a file in your home directory called torrents.txt, but it should also serve as an example of how to invoke other commands using the shell.

Here we go!

Step 1 - Script

Open the ‘Script Editor’ application and choose ‘Create new document’. Paste this script:

on open location this_URL
   #In case you want to display the URL that is being passed, uncomment the following line
   #display dialog "I'm doing something with this URL: " & return & this_URL

   tell application "Terminal"
      activate
      # Create a shell command to append the URL to ~/torrents.txt and exit
      set run_cmd to "echo \"" & this_URL & "\" >> ~/torrents.txt && exit"
      # Execute shell command
      do script run_cmd
   end tell

   # These three lines switch you back to Firefox, might want to change to your preferred browser
   tell application "Firefox"
      activate
   end tell
end open location

Now save the file, for example on your Desktop under a name like “My magnet handler”. Be sure to choose ‘File format: Application’ in the dropdown.

Step 2 - Hack the app file so it registers as a protocol handler

OSX doesn’t know yet that this new app can handle ‘magnet:’ links, so we have to hack the Info.plist that is inside the app.

  1. Go to your Desktop
  2. Right click on ‘My magnet handler.app’ and choose ‘Show Package Contents’
  3. Navigate to the ‘Contents’ folder
  4. Right click the ‘Info.plist’ file and open it with ‘Other’ –> ‘TextEdit.app’
  5. At the bottom of the file (but before the final ‘</dict>’ and ‘</plist>’ tags) add another key/array pair by pasting this block:
<key>CFBundleURLTypes</key>
<array>
   <dict>
      <key>CFBundleURLName</key>
      <string>My magnet handler</string>
      <key>CFBundleURLSchemes</key>
      <array>
         <string>magnet</string>
      </array>
   </dict>
</array>

This tells Finder that our app can handle URLs starting with ‘magnet:’. Save the file and exit TextEdit.

Step 3 - Make Finder aware of our app

This step is very counter-intuitive: locate your ‘My magnet handler.app’ file on your Desktop and move it to another folder, perhaps your home folder. Moving the file makes Finder re-read the Info.plist file and register the app as a protocol handler.
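
If moving the file doesn’t do the trick, you can also ask Launch Services to re-register the app explicitly. This is a sketch, assuming the stock lsregister location (it has moved around between OSX versions) and that the app now lives in your home folder:

LSREGISTER="/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/LaunchServices.framework/Versions/A/Support/lsregister"
# Force (re-)registration of the app bundle
"$LSREGISTER" -f ~/"My magnet handler.app"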

Step 4 - Try it in your browser

Open your favorite torrent site and locate a magnet link. Click on it and if all went well you should be greeted with the ‘Launch Application’ dialog that already lists your application in the ‘Send to’ list. Select it, and press OK.

Your torrent URL should now be listed in a file called ‘torrents.txt’ located in your home directory!

Further expansion

Instead of echo’ing to a file, you can also run any other command you like. In my case, it’s logging into my server (using SSH keys to prevent a password prompt) and calling ‘deluge-console add’ to queue the torrent. In case you’re wondering, it looks a bit like this:

set run_ssh to "ssh 1.2.3.4 \"deluge-console add " & this_URL & "\" && exit"
do script run_ssh

Happy downloading!

A better solution to C++ enums

One of the more popular posts on this blog is about textual enums in C++. You can find it here.

I received a very friendly e-mail this weekend from Anton Bachin, the author of the better enums library. Some time has passed since I originally wrote the post and C++ has improved quite a lot in the meantime; his library seems a much nicer solution! So feel free to read along, but if you have a need for this functionality, definitely consider using his library instead. Thanks Anton for bringing it to my attention!

The original post has been updated with this remark as well.

Logging port access with iptables and logwatch

I’ve recently installed a program (let’s call it Foo) on my home server that requires one port (let’s call that 12345) to be forwarded from the public interface on my ADSL modem to my internal server (via NAT translation). I’m always a bit hesitant about this kind of thing, so why not ease my fears and log who’s accessing this port?

This idea requires two steps:

  1. Configuring iptables to log ‘socket open’ actions
  2. Making sure my daily ‘logwatch’ run does a DNS lookup on the found addresses

Step 1 - iptables configuration

Setting iptables up to log socket access is actually quite straightforward:

#log incoming Foo connections
iptables -I INPUT -p tcp --dport 12345 -m state --state NEW -j LOG --log-prefix "Foo inbound: "

This line logs any new TCP connection to port 12345 to the kernel log and /var/log/messages.

Execute the above command in a terminal (as root) and check that the rule is working with ‘nc’:

benjamin@nas:~$ nc localhost 12345
<some garbage indicating that socket was opened>
^C
benjamin@nas:~$ dmesg -T|grep "Foo inbound"|tail -n 1
[Thu Oct 22 08:30:03 2015] Foo inbound: IN=lo OUT= MAC=00:00:00:00:00:00:00:00:00:00:00:00:08:00 SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=23309 DF PROTO=TCP SPT=37456 DPT=12345 WINDOW=43690 RES=0x00 SYN URGP=0

This message indicates that the iptables rule is working! Once you’re satisfied, you can persist the rule by adding it to your ‘/etc/rc.local’ file. There are probably nicer ways to do that but this works fine :)
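
One possible refinement, a sketch rather than part of the original setup: if the port ever gets scanned aggressively, an unthrottled LOG rule can flood /var/log/messages. The standard ‘limit’ match caps the log rate:

# Same rule, but log at most 5 new connections per minute
iptables -I INPUT -p tcp --dport 12345 -m state --state NEW -m limit --limit 5/min -j LOG --log-prefix "Foo inbound: "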

Step 2 - logwatch configuration

Logwatch is an excellent tool to get a daily report about your server status. Imagine my surprise when the iptables rule was automatically processed into a neat report:

--------------------- iptables firewall Begin ------------------------

Listed by source hosts:
Logged 4 packets on interface br0
  From 1.2.3.4 - 2 packets to tcp(12345)
  From 5.6.7.8 - 2 packets to tcp(12345)

---------------------- iptables firewall End -------------------------

However, wouldn’t it be nice to see actual DNS hostnames (if available) for those addresses? After a lot of troubleshooting I found out that the ‘iptables’ service of logwatch doesn’t do lookups by default (probably for performance reasons).

Following the steps on this page you can fix that; in short it comes down to this:

# Copy default iptables module config to proper /etc directory
sudo cp /usr/share/logwatch/default.conf/services/iptables.conf /etc/logwatch/conf/services/

Now edit ‘/etc/logwatch/conf/services/iptables.conf’, search for ‘iptables_ip_lookup’ and make sure it looks like this:

# Set this to yes to lookup IPs in kernel firewall report
$iptables_ip_lookup = Yes

Now re-run logwatch manually and verify the results:

benjamin@nas:~$ sudo /usr/sbin/logwatch --hostformat split
<cut out a lot of stuff for this example>

 --------------------- iptables firewall Begin ------------------------

 Listed by source hosts:
 Logged 4 packets on interface br0
   From 1.2.3.4 (bla.blah.com) - 2 packets to tcp(12345)
   From 5.6.7.8 (some.otherdomain.com) - 2 packets to tcp(12345)

 ---------------------- iptables firewall End -------------------------

Mission accomplished, happy hunting :)

Easy chroot jail creation

While setting up an SSH jump host I had the need for a small chroot environment that users would end up in. The ‘regular’ way is to create a jail directory somewhere, set up basic directories (/bin /etc and so on) and proceed with copying the desired binaries into the jail. The next step is to use ‘ldd’ to figure out which dynamic libraries need to be copied into the jail. This is a lot of work!

Luckily (instead of getting some random script online and hoping it works fine) Debian includes a package called makejail. Makejail reads a small Python file; this is an example (let’s call it test.py):

chroot="/jail"
cleanJailFirst=1
testCommandsInsideJail=["bash", "nc" , "nologin"]

Now run this command:

makejail test.py

Makejail will now create the jail in ‘/jail’ (cleaning out any existing contents first), copy ‘bash’, ‘nc’ and ‘nologin’ into the jail and figure out the library dependencies. Easy!
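
A quick way to verify the result is to start a shell inside the new jail:

# As root: you should land in a bash prompt that only sees the jail's contents
sudo chroot /jail /bin/bash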

Running autossh with OSX automator

On my work OSX laptop I need some ports forwarded to my NAS at home. Until now I’ve been manually running the ssh command (using a script), but this becomes very annoying when connections drop etc. In an effort to automate things, I wanted to run autossh automatically in the background.

I followed this guide and everything was working; however, I was now stuck with a rotating wheel icon in the status area (near the clock). That became annoying quickly, so I found this stackexchange answer to guide me in the right direction.

Instead of running a shell script action in Automator (as the initial guide suggested), I now have an AppleScript that executes autossh directly (and in the background). Here it is for completeness:

on run {input, parameters}
   ignoring application responses
      do shell script "/opt/local/bin/autossh -M 20000 [rest of ssh parameters] -N [hostname to connect to] >/dev/null 2>&1 &"
   end ignoring
end run

This runs autossh in the background; you can check with ‘ps’ that it is actually running. No more spinning wheel!
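
For example (the [a] in the pattern keeps grep from matching its own process):

ps aux | grep "[a]utossh"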

Fixing bash tab completion in XFCE

On my headless Linux NAS I’m running a VNC server to run the occasional X11 program remotely. Because I don’t need a full desktop environment, I use XFCE. However, when using a terminal session I noticed that tab completion in bash was not working.

As it turns out, XFCE maps the Tab key as a ‘switch window’ shortcut, preventing tab completion from working properly. Luckily this post on the Ubuntu forums shows how to fix it (paraphrased here in case the original post disappears):

  • Edit the file ~/.config/xfce4/xfconf/xfce-perchannel-xml/xfce4-keyboard-shortcuts.xml
  • Find this line:
<property name="&lt;Super&gt;Tab" type="string" value="switch_window_key"/>
  • Change it to this:
<property name="&lt;Super&gt;Tab" type="empty"/>
  • Restart the VNC server:
vncserver -kill :1
vncserver

Now things should be working again!
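
If you’d rather not edit the XML by hand, the same thing can probably be achieved with xfconf-query. I haven’t tested this variant myself, and the property path is an assumption based on where the shortcut lives in the XML:

# Remove the <Super>Tab binding from the keyboard-shortcuts channel
xfconf-query -c xfce4-keyboard-shortcuts -p "/xfwm4/custom/<Super>Tab" -r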

Removing partial duplicate file names with awk

I needed to clean up a bunch of files recently that contained both a common and unique part, something like this:

Show_Episode1-ID_12345.mp4
Show_Episode1-ID_67890.mp4

Note that there are two copies of ‘Episode1’ with a different ID part. Obviously I would only like to keep one of each episode and ignore the whole -ID... part. This is how I solved it:

for i in `ls -t *mp4|awk 'BEGIN{FS="-"}{if (++dup[$1] >= 2) print}'`; do mv -v $i dup; done

So what happened here?

  • The directory listing is sorted by timestamp (newest first) so it favors the most recent versions.
  • The awk FS (field separator) is set to “-” to use the common part of the file name as the first field.
  • Now awk loops over each file name. It uses the common part of the file name (“Show_Episode1”) as an index into an array. The default counter value is 0 and any repeated file names will increase it to a value of >= 2.
  • If the counter value is >= 2, awk prints the complete file name (using the ‘print’ command). Note that this part only prints duplicates, the first file is never printed.
  • The output of the above steps are fed into a ‘for’ loop to serve as input to the ‘mv’ command that moves only the duplicate files to a separate ‘dup’ dir.
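
Two practical notes: the ‘dup’ target directory has to exist before you run the one-liner, and you can preview what would be moved by running just the ls/awk part on its own:

mkdir -p dup
# Dry run: prints only the duplicate file names, moves nothing
ls -t *mp4 | awk 'BEGIN{FS="-"}{if (++dup[$1] >= 2) print}'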

Notes on ZFS

I’ve recently upgraded my NAS to an HP N54L microserver and I decided it was time to migrate to ZFS. Luckily, ZFS on Linux became stable enough for production use with version 0.6.3, so this was good timing. ZFS is an interesting file system: it uses quite a bit of RAM, but it is very flexible and provides per-block checksumming. A nice presentation can be found here: http://www.cs.utexas.edu/users/dahlin/Classes/GradOS/papers/zfs_lc_preso.pdf

To get me started, I followed this guide: http://www.andybotting.com/zfs-on-linux. It contains the basic setup commands and also provides a fix for the potential problems you can encounter with 4096-byte-sector hard disks (which most modern drives are). Be aware that this guide doesn’t set a default mountpoint for the pool; this means specifying each filesystem mountpoint yourself (or just enabling the default pool mountpoint). Some additional tips/notes can be found here: http://www.allanjude.com/bsd/zfs-zpool.html.

To get more in-depth information, there is an excellent manual provided by Oracle (never thought I’d ever say that..) here: http://docs.oracle.com/cd/E19253-01/819-5461/index.html. It covers most scenarios and contains a lot of examples. In my case, I started out with a pool on 1 drive, moved my data to it and then converted the pool to RAID-1 using the ‘zpool attach’ syntax. All this is covered in the manual.

Overall, I’m pretty satisfied with ZFS. I’ve skipped the native ‘sharenfs’ and ‘sharesmb’ functionality and just configured my /etc/exports and /etc/samba/smb.conf files myself; I heard there are still some bugs to be worked out in this department so I went the manual route. Also, the ability to specify that some filesystems should store two copies of each file (under the hood) is pretty cool and especially valuable for important data :)

Don’t forget to ‘cron’ a weekly ‘zpool scrub’ and not to fill the pool over 80-90% (opinions vary, it seems).
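
For reference, the features mentioned above look roughly like this; pool, dataset and disk names are made up:

# Convert a single-disk pool into a mirror by attaching a second disk
zpool attach tank /dev/disk/by-id/ata-DISK_A /dev/disk/by-id/ata-DISK_B

# Keep two copies of every block for an extra-important dataset
zfs set copies=2 tank/photos

# Weekly scrub from cron (here: Sunday at 03:00)
0 3 * * 0 /sbin/zpool scrub tank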

Find the longest filename in a directory tree

Ever wondered what the longest filename is in a directory tree? This command will tell you:

ben@xyz:/srv/blog$ ls -R | awk '{ print length, $0 }' | sort -rn | head -1
73 reorganising_large_directories_with_efficient_remote_rsync_update.doctree

On a similar note, this command prints the longest path (so directories + filename) length:

ben@xyz:/srv/blog$ find | awk '{ print length, $0 }' | sort -rn | head -1
101 ./blog/html/_sources/2013/12/04/reorganising_large_directories_with_efficient_remote_rsync_update.txt

Moving large directories

I had a need to move a large directory tree on my Linux server. For this there are a number of options:

Using ‘mv’

Of course you can just issue ‘mv /sourcedir /destinationdir’ and be done with it. The downside is that if you interrupt the process, both source and target directories will be left in an inconsistent state. There is no easy way to resume the process.

Using rsync

Rsync is a Swiss army knife for a number of file-related operations and of course you can use it for local move operations as well. Rsync offers the big improvement of being able to interrupt and resume the move process in a smart and safe way. One of the limitations, however, is that even though it can delete the source files, it will leave you with a source directory full of empty subdirs. First of all, let’s move all files:

rsync -avr --remove-source-files /sourcedir/ /destinationdir/

Note the ‘--remove-source-files’ option; it does exactly what you think it does (after files have been successfully transferred). So what to do afterwards with the tree of empty subdirs? This is a nice trick I learned:

rsync -av --delete  `mktemp -d`/ /sourcedir/

This effectively syncs an empty directory to your sourcedir, and from my (and other people’s) experience this is actually the quickest way to delete a large directory tree, even if there are files in it. It is supposed to be 15% quicker than ‘rm -rf’ due to ordering advantages, but I’ll let you decide that for yourself.

Using tar and rm

While the rsync solution is nice, it sometimes is a bit slow between two local disks. You can of course do ‘cp -a /sourcedir /targetdir’ beforehand and rsync afterwards, but it seems to be even quicker to use tar for this purpose:

(cd /sourcedir ; tar cf - . ) | (cd /destinationdir ; tar xvpf -)

I read this trick on Stackoverflow and it seems to be a bit quicker indeed. I’ll let you decide this for yourself as well :)

Conclusion

For my moving task, I actually decided to combine both the ‘tar’ and ‘rsync’ tricks. This made for a quick copy, followed by rsync checking if everything was in sync and deleting the source files. Afterwards I used the ‘rsync to empty dir’ method to quickly delete all empty subdirs in the source directory.
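
Spelled out, the combined procedure looks something like this (paths are placeholders):

# 1. Fast initial copy with tar
(cd /sourcedir ; tar cf - . ) | (cd /destinationdir ; tar xpf -)

# 2. Let rsync verify the copy and delete the source files
rsync -avr --remove-source-files /sourcedir/ /destinationdir/

# 3. Wipe the leftover tree of empty subdirs
rsync -av --delete `mktemp -d`/ /sourcedir/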

Two git tricks

Two tricks I needed today and definitely want to save for future reference :)

Trick 1: How to reset a ‘master’ branch to a different branch and push it to the remote repository

Nice instructions can be found here: http://stackoverflow.com/a/3790682

Note 1: The above instructions force-push all branches to your specific version; in step 4 it would be useful to specify that you only want the ‘master’ branch pushed :)

Note 2: You might have to do a ‘git reset --hard origin/master’ afterwards on other working copies that previously checked out the ‘master’ branch to resolve the merge conflict hell that can arise :)
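
From memory, the gist of trick 1 with note 1 applied (so only ‘master’ is force-pushed) is something like this; the branch name is an example:

git checkout master
# Point master at the other branch, discarding master's own history
git reset --hard the_other_branch
# Force-push only master, not all branches
git push --force origin master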

Trick 2: Undo a force-pushed action on the remote repo

And as a result from the first note in the previous point, here’s how to use the reflog to undo a change you already pushed to remote: http://stackoverflow.com/a/12569664

Bonus trick: Diff the same file between two different branches

Use git difftool and note the ‘--’ separator, which indicates that filenames will be specified from that point on.

git difftool branchname_1 branchname_2 -- Some/Directory/File.txt

Reorganising large directories with efficient remote rsync update

I recently ran into a scenario where I wanted to re-organise my photo collection (basically move some files around). This folder is mirrored to a remote server with rsync for backup purposes. Rsync unfortunately has no way of detecting file moves and will gladly proceed to re-upload any files you moved. Pushing 40GB of redundant updates through my home ADSL was painful; I wish I had known about this beforehand :)

However, for future reference here is a nice guide on how to prepare for this scenario and let rsync actually detect the moves via an intermediate step involving hardlinks: https://lincolnloop.com/blog/detecting-file-moves-renames-rsync/
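
As I understand it, the core idea goes something like this (a sketch with made-up paths; see the guide for the full story): hardlink files into their new location instead of moving them, sync with -H so rsync treats both paths as the same file, and only then remove the old names:

# Hardlink into the new structure instead of 'mv' (no data is copied)
cp -l /photos/old/pic.jpg /photos/new/pic.jpg

# With -H, the remote end creates the new path as a hardlink: metadata only, no re-upload
rsync -aH /photos/ remote:/photos/

# Afterwards, remove the old names and sync once more with --delete
rm /photos/old/pic.jpg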