Python is Awesome!

Every now and then, I visit CodeWars and try some Katas to get inspired and hopefully learn some new tricks in Python.

Figuring out a solution yourself is rewarding, but comparing your solution afterwards to others is where it really shines and where you learn to really make the most out of Python as a language, and do some code review on your own code.

One of this weeks challenges was Kebabize and is about string transformation. Below is my initial solution;

def kebabize_v1(string):
    string = ['-' + char.lower() if char.isupper() else char for char in string]
    string = [char if not char.isdigit() else '' for char in string]
    string = ''.join(string)
    string = string.strip('-')

    return string

assert(kebabize_v1('myCamelCasedString') == 'my-camel-cased-string')
assert(kebabize_v1('myCamelHas3Humps') == 'my-camel-has-humps')
assert(kebabize_v1('SOS') == 's-o-s')
assert(kebabize_v1('42') == '')
assert(kebabize_v1('CodeWars') == 'code-wars')

Improvements

This solution runs fine and I don’t think it’s horrible, but after comparing to some of the other solutions on the page there is definitely room for some improvements.

  1. not char.isdigit() might be replaced by char.isalpha() for a more general check and a lighter cognitive load by removing the negated logic statement, though it wasn’t needed in this specific challenge.

  2. It’s quite verbose, let’s look at how we can reduce the loc a bit.

def kebabize_v2(string):
    string = ['-' + char.lower() if char.isupper() else char for char in string if char.isalpha()]
    string = ''.join(string).strip('-')

    return string

assert(kebabize_v2('myCamelCasedString') == 'my-camel-cased-string')
assert(kebabize_v2('myCamelHas3Humps') == 'my-camel-has-humps')
assert(kebabize_v2('SOS') == 's-o-s')
assert(kebabize_v2('42') == '')
assert(kebabize_v2('CodeWars') == 'code-wars')

if char.isalpha() added to the first statement removes the need for the second list completely! This feels like one of those chess moves that would be marked with an exclamation point for how beautiful it is.

As a second step, there’s not really a need to have two separate lines of code to join the list back as a string and then strip it when we might just as well chain the operations.

Keeping a separate return statement might be unneccesarry, but it’s something I rather like as a habit, due to the fact that when starting the skeleton of the function, I can just return the input and have my assert statement in place and then just add more transformations as needed without having the modify the return statement.

We might go even further though, and end up with only one line of code;

def kebabize_v3(string):
    return ''.join('-' + char.lower() if char.isupper() else char for char in string if char.isalpha()).strip('-')

assert(kebabize_v3('myCamelCasedString') == 'my-camel-cased-string')
assert(kebabize_v3('myCamelHas3Humps') == 'my-camel-has-humps')
assert(kebabize_v3('SOS') == 's-o-s')
assert(kebabize_v3('42') == '')
assert(kebabize_v3('CodeWars') == 'code-wars')

The Zen of Python

This is similar to the highest voted solution on CodeWars and it’s quite clever. To me though, this is adding complexity and reducing readability when too many transformation operations are nested inside one statement.

To some extent, this is of course a matter of how familiar you are with the constructs of the language, but like above with removing negated logical statements, I feel that a bit more verbosity is worth it if it reduces the cognitive load when reading the code. I would thus settle for v2 as of now at least and refer to the Zen of Python; Simple is better than complex and Readability counts, though someone might argue that v3 wins because of Beautiful is better than ugly.

import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Benchmark It

One additional concern to the readability and beauty of the code that sometimes needs to be taken into consideration is of course its performance. This three solutions are very similar so I wouldn’t expect any big differences but as it’s said; assumption makes an ass of u and, uhm, umption. Anyhow; trust but verify!

import timeit

test_string = 'myCamel1CasedAWESOME1String'
n_runs = 1000000

print("V1: {}".format(timeit.timeit(lambda: kebabize_v1(test_string), number=n_runs)))
print("V2: {}".format(timeit.timeit(lambda: kebabize_v2(test_string), number=n_runs)))
print("V3: {}".format(timeit.timeit(lambda: kebabize_v3(test_string), number=n_runs)))
V1: 7.8543304819613695
V2: 6.699023591994774
V3: 7.412952462967951

Interesting! Repeating this multiple times yields the same results so something is clearly making the first and third version perform worse than the second.

For the first one, it’s quite clear that the second list operation is causing extra computation, but what is happening with the v3?

Upon investigation, the difference is that while in v2 I’m passing a list to the join function, in v3 it’s a generator expression that then gets evaluated dynamically. This turns out it adds a not insignificant overhead.

Ensuring the list is generated before the join is called solves this problem and ensures an equivalent performance between v2 and the slightly adjusted short version;

def kebabize_v4(string):
    return ''.join(['-' + char.lower() if char.isupper() else char for char in string if char.isalpha()]).strip('-')

assert(kebabize_v4('myCamelCasedString') == 'my-camel-cased-string')
assert(kebabize_v4('myCamelHas3Humps') == 'my-camel-has-humps')
assert(kebabize_v4('SOS') == 's-o-s')
assert(kebabize_v4('42') == '')
assert(kebabize_v4('CodeWars') == 'code-wars')

print("V4: {}".format(timeit.timeit(lambda: kebabize_v4(test_string), number=n_runs)))
V4: 6.750837345025502

Takeaways

  • Generator Expressions - The lazy evaluation of generator expressions is oftentimes convenient, but seems to also cause some overhead at runtime at least in some cases which indicates it should be a deliberate decision when to use and not. On the upside for generator expressions; they consume less memory.
  • Trust but verify - On the surface trivial changes in code can turn out to have significant impact on performance. Thus, having benchmark tests ready is of great help when refactoring to avoid potentially negative surprises.

Comments