Tail Recursion

Adventures in Erlang

erlang library manager

If you’ve ever wanted to install and update all your favorite erlang libraries with a single command, you should check out elm.

I’ve grown tired of manually checking for updates and writing verbose install scripts for my servers. All you have to do is store your repository urls in a single file, and elm can figure out how to pull the source and build it. Check out the readme for complete instructions and a full list of examples.

erlsom easy install

I’ve been using erlsom on my production servers for almost a year now – it’s parsed billions of xml documents in that time without an issue. My only problem with the library is how unpleasant it is to install. The build system is overly complex, the files are all dos formatted, it compiles with warnings, it’s hosted on sourceforge, etc. I cleaned up all of these issues and added the code to github. Please see the readme for more information about what I changed.

node communication and automatic discovery on ec2

So you’ve signed up for ec2 and figured out how to create instances with erlang installed. Sweet. I’m going to show you how to get your erlang nodes to communicate – first if we know the IP addresses of both machines, and second if we want the nodes to automatically discover each other.

I’m assuming at this point that you have two instances: I1 and I2, which are both members of the security group S. The first thing you need to do is edit the security group settings to allow full port access between all members of S. If you are using the management console, just add a new entry, leave everything blank but add S as the source and hit save. After a refresh it will show that icmp, tcp, and udp are all open for all members of S.

Open ssh connections to I1 and I2 in two terminals. Note the private DNS name of each instance: it’s the hostname followed by “.ec2.internal” (on us-east). For example: ip-10-200-100-50.ec2.internal

On both instances:

$ erl -name test@`hostname`.ec2.internal -setcookie test

If you don’t know what these parameters do, see the distributed programming section from this doc.

Assuming the private DNS name of I2 is ip-10-200-100-50.ec2.internal, on I1 we can do this:

> net_adm:ping('test@ip-10-200-100-50.ec2.internal').

If you received a pong message, congratulations – your nodes know about each other and can now communicate! Try running nodes() on each.
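
If you have more than a couple of machines, pinging each one by hand gets old quickly. Here is a minimal sketch of a helper that pings a list of node names and reports which ones answered. The module name node_ping and both function names are made up for illustration; the only library call it relies on is net_adm:ping/1, which returns pong or pang.

```erlang
-module(node_ping).
-export([ping_all/1, reachable/1]).

%% Ping every node in the list and pair each name with the result,
%% which is pong if the node answered and pang otherwise.
ping_all(Nodes) ->
    [{Node, net_adm:ping(Node)} || Node <- Nodes].

%% Keep only the nodes that answered with pong.
reachable(Nodes) ->
    [Node || {Node, pong} <- ping_all(Nodes)].
```

On I1 you could then call node_ping:reachable(['test@ip-10-200-100-50.ec2.internal']). and get back the list of nodes that are actually up.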

Next let’s have a look at doing this in a more automated way. Leave I2 running, but quit out of the erl shell on I1 and install Eric Cestari’s ec2nodefinder. As his README points out, you will need to either set the environment variables AMAZON_ACCESS_KEY_ID and AMAZON_SECRET_ACCESS_KEY, or you will need to edit the ec2nodefinder.app file in the ebin directory. Be careful of the commas in the app file if you go that route – if the file isn’t syntactically correct, the app won’t start. I like Eric’s version of ec2nodefinder better than the original version from the dukes, because it’s easier to install and it doesn’t rely on ec2-describe-instances.

Start your node on I1 again:

$ erl -name test@`hostname`.ec2.internal -setcookie test

You can now do:

> ec2nodefinder:start().

> ec2nodefinder:discover().

Which, if successful, will return something like:

{ok,[{'test@ip-10-200-100-50.ec2.internal',pong}]}

Calling nodes() on either node should again show they are connected. Awesome. Now you can do all kinds of crazy stuff like have nodes which automatically become part of an mnesia cluster when they start.

ecache

I’ve been looking for a simple cache with expiring values that I can use for things like query caching – essentially something like memcached but without all the setup and the port communication. I decided to write ecache to address this need. It’s fast and incredibly easy to start using. Just call application:start(ecache) and you’re ready to start loading and storing!

string similarity and more

I added estring to github today. It’s a string library which contains many useful functions including string similarity. Check this out:

> estring:similarity("ownage", "pwnage").
0.8333333333333334

estring is also full of goodies like:

> estring:begins_with("fancy pants", "fancy").
true
> estring:is_integer("35").
true
> estring:random(32).
"fBS6EsK4ODKimQjInBzaIDysdJ9ulmc3"

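A score like the 0.83 above is what you get from one minus the edit distance divided by the length of the longer string (1 - 1/6 for "ownage" and "pwnage"). The module below is a stand-alone sketch under that assumption. It is not estring's actual code, and the names similarity_sketch, similarity/2 and distance/2 are made up for illustration.

```erlang
-module(similarity_sketch).
-export([similarity/2, distance/2]).

%% Assumed definition: 1 - distance / length of the longer string.
%% Two empty strings are treated as identical to avoid dividing by zero.
similarity([], []) -> 1.0;
similarity(A, B) ->
    1 - distance(A, B) / max(length(A), length(B)).

%% Plain Levenshtein distance, computed one matrix row at a time.
distance(A, B) ->
    Row0 = lists:seq(0, length(B)),
    LastRow = lists:foldl(fun(Ca, Prev) -> next_row(Ca, B, Prev) end, Row0, A),
    lists:last(LastRow).

%% Build the row for character Ca from the previous row.
next_row(Ca, B, [First | _] = Prev) ->
    next_row(Ca, B, Prev, [First + 1]).

%% Acc holds the new row in reverse, so its head is the left neighbour.
next_row(_Ca, [], _Prev, Acc) ->
    lists:reverse(Acc);
next_row(Ca, [Cb | B], [Diag, Up | Prev], [Left | _] = Acc) ->
    Cost = if Ca =:= Cb -> 0; true -> 1 end,
    Cell = lists:min([Left + 1, Up + 1, Diag + Cost]),
    next_row(Ca, B, [Up | Prev], [Cell | Acc]).
```

> similarity_sketch:similarity("ownage", "pwnage").
0.8333333333333334
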
tons of open sockets

If you have ever tried to open hundreds or thousands of socket connections in erlang, your efforts may have been met with some resistance. The following worked for me:

1) edit your limits configuration. On ubuntu systems it’s: /etc/security/limits.conf. You’ll want to increase the “number of files” limit to some large number. For example:

* hard nofile 16384
* soft nofile 16384

I think you need to reboot for the changes to take effect.

2) set the shell variable ERL_MAX_PORTS to another large number. In ubuntu I added export ERL_MAX_PORTS=4096 to my .bashrc. Make sure the variable is set before starting your erlang process.

3) if you are doing this to open a lot of web connections via ibrowse, be sure to call ibrowse:set_max_sessions in your code.
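
To confirm that the port setting was picked up, you can ask the running VM what its limit is. This is a small sketch; the module name port_check and the function report/0 are made up for illustration, while erlang:system_info(port_limit) and erlang:system_info(port_count) are standard calls (the latter may be missing on very old releases).

```erlang
-module(port_check).
-export([report/0]).

%% Print and return how many ports are open and how many the VM allows.
report() ->
    Limit = erlang:system_info(port_limit),
    InUse = erlang:system_info(port_count),
    io:format("ports in use: ~p of ~p~n", [InUse, Limit]),
    {InUse, Limit}.
```

If the limit it reports is still the default, the variable was probably not set in the environment that started the erlang process.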

damerau–levenshtein in erlang

The Damerau–Levenshtein algorithm is like the Levenshtein edit distance algorithm, with the exception that two-character transpositions are treated as a single edit. I needed an implementation in erlang and couldn’t find one, so I started with this and added the necessary recurrence and bookkeeping mechanisms. It’s less than 30 lines of code but it took me a while to understand and get right so I hope someone out there finds this useful. I’ll be the first to admit the code is unreadable – going through this example may be helpful.

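-module(levenshtein).
-export([distance/2]).

%% Edit distance between two strings, counting an adjacent transposition
%% as a single edit.
distance(Source, Source) -> 0;
distance(Source, []) -> length(Source);
distance([], Source) -> length(Source);
distance(Source, Target) ->
    %% Row 0 of the distance matrix: the cost of building each prefix of
    %% Target from the empty string.
    D1 = lists:seq(0, length(Target)),
    %% Both strings get a [] sentinel in front so that each loop can see
    %% the previous character next to the current one.
    outer_loop([[]|Source], [[]|Target], {D1, D1}, 1).

%% Walks Source one character at a time. S1 is the previous character,
%% S0 the current one, D2 and D1 the two most recent rows, I the row index.
outer_loop([S1|[S0|S]], T, {D2, D1}, I) ->
    %% The new row starts with I, the cost of deleting I characters.
    %% D2 is padded with [] so that its head lines up with column j-2.
    D0 = inner_loop(T, [S1, S0], {[[]|D2], D1, [I]}),
    outer_loop([S0|S], T, {D1, D0}, I + 1);
outer_loop([_S|[]], _, {_D1, D0}, _) ->
    %% Source is exhausted: the answer is the last cell of the last row.
    lists:last(D0).

%% Fills in one row. T1 is the previous character of Target, T0 the
%% current one. D0 is the row being built, in reverse order.
inner_loop([_T|[]], _, {_D2, _D1, D0}) ->
    lists:reverse(D0);
inner_loop([T1|[T0|T]], [S1, S0], {D2, D1, D0}) ->
    %% S1T1 is the diagonal neighbour, S1T0 the cell directly above.
    [S1T1|[S1T0|_]] = D1,
    Cost = if T0 =:= S0 -> 0; true -> 1 end,
    %% Plain Levenshtein step: insertion, deletion, or substitution.
    NewDist1 = lists:min([hd(D0) + 1, S1T0 + 1, S1T1 + Cost]),
    NewDist2 =
        %% Transposition: the last two characters of each string are swapped.
        if T1 =/= [] andalso S1 =/= [] andalso T1 =:= S0 andalso T0 =:= S1 ->
                lists:min([NewDist1, hd(D2) + Cost]);
           true -> NewDist1
        end,
    inner_loop([T0|T], [S1, S0], {tl(D2), tl(D1), [NewDist2|D0]}).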
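
A few small cases make the behaviour easier to see. The check module below is a sketch that assumes the levenshtein module above has been compiled and is on the code path; the name levenshtein_check is made up for illustration. Each line pattern-matches the expected distance, so run/0 crashes with a badmatch if any of them is wrong.

```erlang
-module(levenshtein_check).
-export([run/0]).

%% Assumes levenshtein:distance/2 from the module above is available.
run() ->
    0 = levenshtein:distance("abc", "abc"),  % identical strings
    3 = levenshtein:distance("abc", ""),     % three deletions
    1 = levenshtein:distance("a", "b"),      % one substitution
    1 = levenshtein:distance("ab", "ba"),    % one transposition, not two edits
    ok.
```

The last case is the interesting one: plain Levenshtein distance would report 2 for "ab" and "ba", while the transposition rule brings it down to 1.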