A day with mypy: Part 2

Python type checking: my experience adding type annotations to pycodestyle

Posted by Daniel Moisset 1 year, 8 months ago

This is the second part of a 3-part post about my experiences using mypy to add type information to a real-world open source project (I chose pycodestyle). I recommend reading at least the first section of Part 1 if you haven’t yet.

Last time I told you what mypy is about, which questions I was trying to answer through my experiment, and the process I was following. This part focuses on the most significant insights I found along the way. Part 3 has some conclusions and my answers to the questions that motivated the series.

The collection below is large; I start with the negative findings and move on to the positive ones. If you get bored in the middle, at least skim the later sections or the conclusions, or you might end up with a view biased in a negative direction. I tried to be exhaustive about every interesting thing I found, so this is a long post. The next part of this series will be quite a bit shorter and give a summarized conclusion; what you have here is the raw “data” behind that conclusion.

This section has a lot of examples from real code. To keep this post from being even longer than it is, I removed irrelevant code or expressions and wrote those as ...

The missing parts

Some things are obviously missing from mypy but I’m quite sure they will be there in future versions.

The first significant example is the typeshed stubs for the standard library. These cover the most popular parts, but not everything. In my case, the most blatant examples were the keyword and optparse modules. Imports of those modules threw warnings, and anything I did with objects from them (like the OptionParser) was unchecked. Not having some components may lead you into a false sense of security. At some point I had declared a function as taking an argument of type optparse.OptionParser. After some review I noticed that it actually took an optparse.Values, but mypy wasn’t able to tell me I was wrong because it treats anything inside optparse as Any, which means “everything is ok with this”. So I ended up with a misleading type label, which is arguably worse than no type declaration at all.

One function that was missing from a typeshed stub supposed to be complete was tokenize.generate_tokens. This may not exactly be a bug, because that function is not in the documented API, even if the Python code is there and perfectly usable. So if you’re using undocumented APIs you might find some false positives like this, which are hard to correct:

tokengen = tokenize.generate_tokens(self.readline)
pycodestyle.py:1695: error: "module" has no attribute "generate_tokens"

Something else that looks incomplete is the generation of error messages. Most of them are clear, but not all. At some point I got to (a more complicated version of):

z = x.strip() + y.strip()

Which produced the following error message:

error: Some element of union has no attribute "strip"

On one hand, it’s not easy to know which of the calls to strip() is producing the problem. Also, “some element of union” implies (but it’s not obvious the first time) that the expression involved (let’s say x) may have one of many possible types. This probably means that we thought x was always a string, but there’s some code path that could assign it something else, say a list. It would also be useful to know what this other type actually is in order to find the problem, but mypy does not provide this information (which is actually available inside mypy). This may be a design trade-off: showing all the information may be overwhelming, so the hard question is deciding what’s relevant.

Another part that looks a bit incomplete right now is the documentation. What’s available is actually quite nice as a tutorial, but there are gaps around some frequently used parts of the language. There is no indication of how to specify the types of variable-length (*args) or keyword arguments. For example, if I have a function product(*args) that multiplies all its integer arguments, my first intuition was to type it as product(*args: List[int]), but the correct way is to indicate the type of each element. A “reference” documentation would be nice to have in the future. Some important things that are missing from the documentation are present in PEP-484, so the PEP itself is worth a read.
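So the product example, annotated the correct way, would look something like this (a sketch; the body is what the text describes):

```python
def product(*args: int) -> int:
    # Each positional argument is typed individually as int;
    # *args itself is then a tuple of ints inside the body.
    result = 1
    for n in args:
        result *= n
    return result
```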

Another undocumented part of mypy is the included types. Even looking at the code of the typing module, it’s nice to know that a generic Generator type is available (for functions that return or take generator objects), but it has 3 generic arguments and there’s no clue what each one is. I still don’t know; my guess is that two of them are the types returned by next() and accepted by send(), but I’m not sure what the third one is or what the actual order of the three is.
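For what it’s worth, PEP 484 does pin down the order as Generator[YieldType, SendType, ReturnType]. A sketch under that reading (countdown is a made-up example):

```python
from typing import Generator

def countdown(n: int) -> Generator[int, None, str]:
    # Yields ints, accepts nothing via send(), and its
    # return value (delivered inside StopIteration) is a str.
    while n > 0:
        yield n
        n -= 1
    return "done"
```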

Some buggy behaviour

Other problems I found during my experiment were small bugs. I reported some of these to the team (when they weren’t already reported), and the community seems quite nice and willing to improve, so this part of the post will hopefully get old very quickly. However, to make my description complete, I’ll include my findings in this section:

  • Some of the provided typeshed stubs don’t seem to have the appropriate level of strictness. For example, the frozenset stub says that most operations should accept an AbstractSet, and this makes mypy complain about perfectly valid Python like

    ARITHMETIC_OP = frozenset(['**', '*', '/', '//', '+', '-'])
    WS_OPTIONAL_OPERATORS = ARITHMETIC_OP.union(['^', '&', '|', '<<', '>>', '%'])
    pycodestyle.py:99: error: Argument 1 to "union" of "frozenset" has incompatible type List[str]; expected AbstractSet[str]

    The actual argument to frozenset.union should accept any iterable, not just a set. I think issues like this will be frequent, given that the Python library is not very strictly specified and a lot of Python code tends to be fuzzy about these questions (can I pass this function any kind of iterable, or must it actually be a list?). On the other hand, I think this is one of the places where widespread usage of mypy will be good, because it will add clarity and explicitness about these decisions.

  • If a function can return, say, an int, but may also return None, the return type should be written as Optional[int]. But I forgot about that several times, and even when I noticed my mistake later, mypy wasn’t detecting it (even if it was obvious statically, i.e. the function had some return None statements). I understand that’s a work in progress, and mypy 0.4.2 already has more work on Optional that I should try.

  • I had some issues where type variable aliases are not equivalent. For example, mypy offers a type variable called AnyStr that can be byte or unicode strings. As most Python file-handling functions accept both as path names, AnyStr may be useful for those situations. And actually, being able to say “this argument is a file path” is pretty cool for documentation purposes, so I added some code at the top saying

    from typing import ..., AnyStr, ...
    # ...
    FilePath = AnyStr
    # ...
    def readlines(filename: FilePath) -> List[str]:

    But this actually produced, on the function definition:

    error: Invalid type "pycodestyle.FilePath"

    So that kind of aliasing didn’t seem to work. If I change that line to the apparently equivalent:

    def readlines(filename: AnyStr) -> List[str]:

    mypy stops complaining. But actually I ended up going for a normal type-alias

    FilePath = str

    which loses some flexibility (it states that all paths must be unicode) but also fixes the issue.

  • I found an occasion where I had a variable that could hold two different kinds of tuple (depending on the code path, which makes it a Union type in mypy terminology), and mypy was refusing to iterate on it. I couldn’t replicate it at a small scale and ended up refactoring that code, so I’m not 100% sure I wasn’t doing something wrong myself, but the error message wasn’t helping. It may have been another case of mypy reporting that it cannot do something on a union type, where the union actually had some member type that I didn’t know about.

  • Another issue that looked like a bug (even if it isn’t) was around the Callable generic type (a type used to describe callable objects like functions, methods and classes). To describe the type of a callable you need to include the types of the arguments and the result. For example, a function that accepts a string and an integer argument and returns a bool would have type Callable[[str, int], bool]. The problem is that if you add a default value for the int argument, the function can also have type Callable[[str], bool]. This looked like it was confusing mypy in this code:

    if sys.version_info < (3,):
        stdin_get_value = sys.stdin.read
    else:
        def stdin_get_value() -> str:
            return ...
    mypy then complains with:

    error: Incompatible redefinition (redefinition with type Callable[[], str], original type Callable[[int], str])

    The original type seems to indicate that mypy can’t handle the default argument for read. Actually, it is assuming one of the possibilities. If you turn this condition around

    if not sys.version_info < (3,):
        def stdin_get_value() -> str:
            return ...
    else:
        stdin_get_value = sys.stdin.read

    when it gets to sys.stdin.read it is already trying to fit it into a known type, so it chooses the correct version. When there are multiple definitions of something, mypy takes the topmost as the “true” type, and any inconsistencies below are flagged as mistakes. I actually fixed the problem by adding an explicit annotation, which tells mypy which version of the function type to use.

    if sys.version_info < (3,):
        stdin_get_value = sys.stdin.read  # type: Callable[[], str]
    else:
        def stdin_get_value() -> str:
            return ...

    The # type: X comments are the way to provide extra typing information to mypy besides the function annotations.

Some paper cuts

Many problems I found were tiny, and actually about mypy being pedantically correct, but in a way that didn’t feel like the tool was being helpful. Much of this section may also be taken as personal preference. What I’d like a type checker to do is help me provide useful information while not getting in my way, but I found several small usability problems. Nothing listed here is a show-stopper; it’s stuff that feels more like a “usability” issue:

As a first example, pycodestyle imports the OptionParser class from the optparse module, creates an instance, configures it, and parses the arguments, which optparse returns as a Values object. In the original code, pycodestyle doesn’t care that this is a Values object; it just uses it. However, my version needs to explicitly say that some functions accept/return optparse.Values, and given that I need to “talk” about this type, I need to import it. It’s a minor pollution of the namespace, but I seem to end up importing more stuff than originally, which I’d rather avoid (I’m concerned it may introduce circular imports, or force me to expose library internals).

There’s also some “visual” pollution in the code. Function definitions end up longer, and if following PEP8 they tend to break into multiple lines which look uglier. Also, there’s no defined style guide for breaking type annotations.

Something that adds to the “syntactic” issues is that there are two symbols in the syntax to say “has type”: -> for results and : for parameters. On many occasions I used : instead of ->, which was a syntax error that confused the hell out of mypy (dozens of “inconsistent indentation” error messages for a single mistake). I know this is partly Python’s fault, but it affects the experience.

Going more into the “preference” or personal “opinion” terrain, the union syntax feels bulky. Union[int, str] means “something that can be either an int or a str”. The notation (especially for more complicated types inside) can be hard to read; I probably would have preferred something like int || str, or int+str. Same for tuples; you can say Tuple[int, str] for 2 element tuples, I would have liked to write (int, str) instead.
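To make the notation concrete, here’s what both look like in a signature (describe is a made-up example):

```python
from typing import Tuple, Union

def describe(value: Union[int, str]) -> Tuple[str, str]:
    # Union[int, str]: either type is accepted.
    # Tuple[str, str]: a fixed 2-element tuple of strings.
    return (type(value).__name__, str(value))
```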

The syntax for declaring the types of attributes and locals has its quirks too. Python 3 only added annotations in function declarations, so annotations for object attributes and locals were specified as a special comment starting with # type:. This is less practical in more complicated situations: you can specify variable types in initializations like foo, bar = 1, True with a comment # type: int, bool. But if you have an initialization like foo = bar = None and you want to give foo and bar different types, you have to split the assignment; I found some examples of this in pycodestyle:

path = nrows = None

which I had to turn into

path = None  # type: Optional[str]
nrows = 0

Functions that sometimes return a value and sometimes None cannot use a plain return; you have to explicitly say return None. I may be “wrong” to complain, given that newer style standards for Python recommend this (I found several instances in pycodestyle, ironically not itself following this style guideline), but it didn’t feel like I was improving anything while fixing those. As a side note, when referring to the type of None you use None instead of NoneType. I know the latter is usually less visible to Python users, but for me the inconsistency looked weird every time I used it.
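Both points combine in a sketch like this (find_index is a made-up helper): the possible None return is declared with Optional, and the final return None is spelled out:

```python
from typing import List, Optional

def find_index(items: List[str], target: str) -> Optional[int]:
    # Optional[int] makes the possible None result explicit.
    for i, item in enumerate(items):
        if item == target:
            return i
    return None  # explicit, rather than a bare "return"
```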

There were also some usual Python patterns (which I don’t believe are against the style guide) that I was forced to break. One of them is chaining return super()... in functions with no return value, which is not allowed:

class P():
    def f(self) -> None:
        ...

class C(P):
    def f(self) -> None:
        return super().f()

foo.py: note: In member "f" of class "C":
foo.py:9: error: No return value expected

I understand why this happens (and I know that removing the return and just leaving the call to super() would fix it), but the return+super pattern is so ingrained in my finger memory, and makes the code more robust under refactors, that I don’t feel comfortable having to “fix” this.

Having a type system with type variables (which lets you say, for example, that List[T] has a method count that takes a value of type T) is fantastic and adds a lot of flexibility, but the way those type variables are implemented and created as objects is not completely intuitive. For example, mypy provides an AnyStr type variable, which is a bit confusing because it’s not a type. I understand why it’s needed (for functions that operate on both strings and bytes but need some kind of argument and/or result type matching), but I’m not sure that type variables should be exported and shared; from a user standpoint they are hard to tell apart from types, and that may lead to surprises.

Another feature of mypy that is usually great but occasionally complicated is type inference. mypy is quite good at inferring the types of local variables. If your function has an x = 3, it assumes x is an int unless you tell it otherwise. However, it doesn’t do this at all for function arguments. If you write:

def f(z: str, x=3) -> None: ...

Then it knows that z is a string, but thinks that x is Any (so you will be allowed to write x[0] in the function body, or call f("", "4")). For those functions where the default has the actual type (which happens quite often, though not always, because a default of None is also quite common), having to explicitly state it again in the declaration ends up feeling verbose and redundant.
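A made-up repeat function shows the redundancy; without the explicit int annotation, mypy would treat times as Any:

```python
def repeat(text: str, times: int = 3) -> str:
    # The default's type has to be restated in the annotation;
    # mypy does not infer "int" from the literal 3.
    return text * times
```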

For variables of container types, mypy has a harder time figuring out the types automatically when the container is initialized empty (which is a very frequent case). For example, when mypy finds code like

items = []

it won’t be able to infer the type of the items variable, so it shows a warning asking you to explicitly type it. However, it is smart enough to figure out some common situations; for example, if you have something like

def f() -> None:
    items = []
    for x in range(-10, 10):
        items.append(x)
mypy infers that the list is a List[int], because you’re appending int values to it. That’s really nice and saves some work, but it doesn’t seem to be consistently smart. Quite similar code like

def method(self) -> None:
    self.items = []
    for x in range(-10, 10):
        self.items.append(x)
makes it lose the ability to infer, forcing the user to spell out the details.
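In that case the annotation has to be supplied by hand; a sketch with a made-up Collector class, using the comment syntax described earlier:

```python
from typing import List

class Collector:
    def method(self) -> None:
        # Explicit annotation, since mypy won't infer the element
        # type for an attribute initialized as an empty list here.
        self.items = []  # type: List[int]
        for x in range(-10, 10):
            self.items.append(x)
```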

Something else that needs some usability work is how the tool responds (or doesn’t) when the user makes a mistake (which is something that will happen). I already mentioned some examples of unclear error messages and reactions to syntax errors. On some occasions I also forgot to add an annotation to a function that returned a value. When no annotation is added, the implicit return type is Any, which means that any return value is ok. Assuming that the function has no return value (None) would probably have made more sense in a partially annotated function; otherwise some obvious mistakes might slip by.

Another problem of that kind (where I spent some time thinking I had found a mypy bug) happened around this code:

CheckResult = Tuple[Union[int, SourcePosition], str]
Check = Callable[..., Iterable[CheckResult]]

def register_check(check: Check, codes: CheckCodes=None) -> None:
    ...

def init_checks_registry() -> None:
    mod = inspect.getmodule(register_check)
    functions = cast(
        Iterable[Tuple[str, 'Checker']],
        inspect.getmembers(mod, inspect.isfunction))
    for (name, function) in functions:
        ...

I added the cast because mypy has no idea what the inspect.getmembers function is going to return, so I provided the extra information. The last line was reported with the following error:

error: Argument 1 to "register_check" has incompatible type "Checker"; expected Callable[..., Iterable[tuple(length 2)]]

That sounded really silly to me because the two seem to be the same... I ended up adding a # type: ignore flag to silence it, and didn’t find my error until I reviewed this code to write this post: I had written Checker instead of Check (and Checker happens to be an existing, unrelated class). The mistake here is obviously mine, but a conclusion I draw is that even if type systems are supposed to prevent the mistakes and hard bugs that come from mistyped variable names, in this case the tool let me create new errors, this time at the level of the type system instead of the code. One issue here is that type aliases are unnamed in error messages; if I had gotten the message has incompatible type "Checker"; expected "Check", the error would have been obvious.

Dealing with Python dynamic idioms

As I expected, I encountered several occasions where a piece of code is a bit dynamic (although obviously right to me) and there’s no way within the type system to describe what I’m doing without refactoring the code into something a bit more complex/ugly. This is probably the part of mypy that worries Python fans the most.

The most idiomatic pattern in Python that uses types flexibly is duck typing. If you mix types that are roughly compatible but mypy doesn’t know which common interface you care about, it may reject code, even if you try to add explicit information. I found the following case:

valid_hangs = (4,) if indent_char != '\t' else (4, 8)
hanging_indent = hang in valid_hangs

It’s very easy to see that valid_hangs is just used as a collection of ints, so you can easily check later whether hang (which is an int) is in that collection or not. However, mypy sees that (4,) has type Tuple[int] and (4, 8) has type Tuple[int, int], so it infers that the “common type” of both for valid_hangs is object. Then it proceeds to say that hang in valid_hangs is mistyped, because an object isn’t a container. So I tried to give mypy a hint:

valid_hangs = (4,) if indent_char != '\t' else (4, 8)  # type: Tuple[int, ...]

However, this produces an error: mypy now knows the type that I explicitly gave to valid_hangs, but it’s still convinced that the expression produces an object, which makes the assignment a type error. In the end, I had to force it like this:

valid_hangs = cast(
    Tuple[int, ...],
    (4,) if indent_char != '\t' else (4, 8))

The cast function provided in the typing module doesn’t do anything at runtime; it’s just a way to force the type of an expression. In this case, though, it does nothing more than obfuscate the code a bit to placate the type checker.
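Another option I could have tried (a sketch; get_valid_hangs is a made-up helper, not pycodestyle code) is to push the expression behind a small annotated function, so the unification happens at the annotated return instead of needing a cast:

```python
from typing import Tuple

def get_valid_hangs(indent_char: str) -> Tuple[int, ...]:
    # The return annotation gives mypy the unified type directly,
    # avoiding cast() at the call site.
    return (4,) if indent_char != '\t' else (4, 8)
```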

Having mixed types happens more often than is apparent, particularly around conditional statements. Python has a bool type, but any value can be used as a condition in if or while statements. So many functions in my sample codebase that return something meant to be used as a condition do not actually return a bool but something else. I ended up declaring:

noqa = re.compile(...).search  # type: Callable[[str], object]

Using object is a reasonable way in Python to say “something that can be used as a bool”, but it doesn’t add clarity about how this function is going to be used. It might be useful to be able to say “something that can/will be used as a bool”; there’s nothing like that yet.
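If you do want to be more explicit, one option (a sketch; the regex here is made up, not pycodestyle’s) is to name the real return type rather than object:

```python
import re
from typing import Callable, Optional

# Declares that the function returns an optional match object,
# which is still perfectly usable as a condition.
noqa = re.compile(r'#\s*noqa').search  # type: Callable[[str], Optional[re.Match]]
```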

As a generalization of the above, it’s not possible to talk about a protocol instead of a specific type. There are some standard protocols defined (like Sequence or Iterator), but there is no way to express something more specific (like “something with a read method returning str”) or related to your own classes.

Short-circuit evaluation in boolean operators produces union types that tend to confuse the checker down the line. Take a look at this:

match = noqa and COMPARE_SINGLETON_REGEX.search(logical_line)
if match:
    ... match.group(...)

mypy infers the type of match to be Union[bool, re.Match]. So when, two lines below, the code tries to access the group attribute, mypy complains that the attribute might not be present if the value is a boolean. Actually the if should guarantee that it isn’t, but realizing that is beyond the power of mypy (at least the version I used). So I had to refactor to an uglier form of the code to avoid the error:

if noqa:
    match = COMPARE_SINGLETON_REGEX.search(logical_line)
    if match:
        ... match.group(...)

Another Python pattern that mypy doesn’t understand is dynamically checking for attributes. When faced with the following code, where line is a str:

if hasattr(line, 'decode'):   # Python 2
    length = len(line.decode('utf-8'))

you get a message saying error: "str" has no attribute "decode"; maybe "encode"?. For a human reader it’s quite obvious that there is a decode attribute in line (less obvious that it’s being used correctly). So in the end you have to silence this line with # type: ignore. You may argue that this is a quite non-standard way of checking Python versions (PEP-484 recommends that checkers recognize the common ways to check for version differences), but I can imagine similar scenarios unrelated to Python 3 vs 2.
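For comparison, one narrowing pattern that mypy does understand is isinstance(); a made-up sketch:

```python
from typing import Union

def text_length(line: Union[str, bytes]) -> int:
    # Unlike hasattr(), an isinstance() check is something mypy
    # understands and uses to narrow the type of "line".
    if isinstance(line, bytes):
        return len(line.decode('utf-8'))
    return len(line)
```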

Another problem with conditionals arises when multiple definitions of something (like a variable or even a function) appear in different paths. mypy actually handles the usual case, where the multiple definitions have the same type, quite well. But in pycodestyle I found the following function, which is conditionally redefined depending on a setting; and there is something special about it:

def _is_eol_token(token: Token) -> bool:
    return ...

    def _is_eol_token(token: Token, _eol_token=_is_eol_token) -> bool:
        return _eol_token(token) or ...

error: All conditional function variants must have identical signatures

The actual problem here is that the author used the “capture value in a default argument” trick, just to be able to call the “old” version of the function from the new one, but that means there is no possible type signature for _is_eol_token that is valid for both. I’ve seen this kind of trick with default arguments before (i.e. using them to store stuff that isn’t actually an argument), and they will probably be an issue. For this situation, what I ended up doing was to add the extra argument to the original definition too, and type it as Any.

def _is_eol_token(token: Token, _eol_token: Any=None) -> bool: ...

    def _is_eol_token(token: Token, _eol_token: Any=_is_eol_token) -> bool: ...

Actually, any kind of function that behaves differently (in terms of accepted/returned types) depending on argument count or value is hard to specify. Type variables may help here, but there’s a limit to the expressiveness of the type language; however, I didn’t come across any more of these problems in pycodestyle. mypy apparently has some support for this and uses it in the standard library stubs, but it’s not clear how usable the mechanism is for other users (the documentation says that “this feature is not supported for user code, at least not yet”).

Not all the problems here happen on highly dynamic code. Tricks like throwaway variables in tuple unpacking like this:

prev_type, prev_text, __, prev_end, __ = tokens[0]

produce an error, because mypy tries to type __ according to the elements of the tuple, but the 3rd and 5th elements have different types in this case. It shouldn’t matter, because the placeholder variable __ is never used, but mypy complains anyway.

Another not-so-dynamic behaviour that seems to be prohibited is accessing some methods directly. I got an error: Cannot access "__init__" directly on:

elif inspect.isclass(check):
    if (_get_parameters(check.__init__)[:2] == ['self', 'tree']):

which seems a bit restrictive; all I could do with this message was ignore the line.

Something that is dynamic but that I didn’t think would be a problem is having attributes defined in methods other than __init__. pycodestyle has many classes with attributes like that. It was possible to add some # type: xxx declarations at the points where they were defined, but it wasn’t very easy to follow, so I ended up forcing a declaration in __init__. It seems hard to specify the type of such attributes without changing the semantics. I was a bit surprised that the type wasn’t inferred, given that the method that actually assigned the initial value had an obvious type for them. For example, Checker.line_number is defined in the check_all method as:

self.line_number = 0

but that isn’t enough, so I had to add that assignment in __init__ to make the inference work. Another situation with a similar problem was the Checker.indent_char attribute: even though all assignments to that attribute give it a str or None, mypy requests an explicit type declaration in the middle of a method that uses it. I think it picks the first line in the code (top to bottom) that looks like an assignment, even if that’s not the best place to define the type.

Another usual Python pattern I had problems with is accepting arbitrary **kwargs keyword arguments and then passing them through to another API. This allows exposing part of a third-party API instead of having to reimplement it (and follow it when it changes). However, that flexible typing is hard to do with mypy, because there is no way to say “this function accepts the same keyword arguments as this other third-party API”. Even if you can end up typing keyword arguments as Any, the problem is also compounded by buggy behaviour in cases like the following (this is an artificial example based on a quite more complicated case I found in pycodestyle, which was too long):

def f(*, a: int=1, b: str="") -> None:
    ...

def g(**kwargs: Any) -> None:
    f(**kwargs)
which makes mypy complain about Too many positional arguments for "f". This problem appears whenever there are keyword-only arguments (the most usual case being when you have both variable-length positional arguments and keyword arguments together).

Things that mypy helped me fix or improve

Until now we’ve seen many cases of mypy complaining about stuff I didn’t think was a real problem. Now we’ll start seeing the opposite side of that coin: things that mypy asked me to change where the end result was better than the original. These are the kind of promises that static-typing proponents always make, and I was happy to see some of them being delivered.

  • I wrote the typeshed stub for tokenize.generate_tokens, and I declared the return type to be an iterator over tokens. However, the function is implemented as a generator, and pycodestyle tries to use it as such in this line

    COMMENT_WITH_NL = tokenize.generate_tokens(['#\n'].pop).send(None)[1] == '#\n'

    The send(None) here is a strange idiom, and the next(g) builtin is available even in Python 2.6, so I replaced this with

    COMMENT_WITH_NL = next(tokenize.generate_tokens(['#\n'].pop))[1] == '#\n'

    And I think the end result is more readable and depends less on how generate_tokens is implemented. Here the type checker found that the code was using something that could be argued to be an implementation detail, a sin that I have often seen in Python even when more elegant solutions exist.

  • pycodestyle operates a lot on tokens. Tokens are stored as a heterogeneous 5-element tuple that includes information on the token text, kind and position. As a tuple it’s immutable, so there’s a piece of code that tries to create a new artificial token based on a transformation of an existing one (in a variable token), and put it into a list called self.tokens. The original code did this:

    token = list(token)
    token[1] = text.rstrip('\r\n')
    token[3] = (token[2][0], token[2][1] + len(token[1]))
    self.tokens = [tuple(token)]

    mypy was quite confused by this. On one hand, the token variable changed types (it enters as a token, but ends up being a heterogeneous list, which is hard to describe). I tried to fix this by adding a variable called new_token and using that to avoid overloading the semantics of the token variable, but there were still problems with the mixed types inside the list. Finally, I rewrote it like this:

    toktype, string, start, end, line = token
    string = text.rstrip('\r\n')
    end = (start[0], start[1] + len(string))
    self.tokens = [(toktype, string, start, end, line)]

    I was about to put this example into the “mypy doesn’t get dynamic code” bin, but then I saw the two pieces of code side by side, and the new one is much more readable. There are names instead of indices, and it’s much clearer what’s happening: how the new end line/column is calculated, and how the new token is rebuilt. So again, forcing me to avoid the dynamism actually resulted in better, more readable code.

  • pycodestyle has a different checking function for each kind of style error. They return an iterator over errors, and each error is a pair with a message and the source position. The positions are sometimes just an int (the line number), other times a tuple of 2 int values (line and column). On my first pass I saw a few functions that all returned line numbers only, and assumed that was true for all of them. This is what I wrote:

    CheckResult = Tuple[int, str]
    def some_checker(...) -> Iterable[CheckResult]:

    When I typed it that way, the checker immediately told me that there was something wrong. I can imagine myself working on this project without the type information, making a refactor based on my incorrect assumption, and I’m quite sure it would have led to some painful debugging. Having my assumptions validated makes things much easier. This is how the declarations ended up looking:

    SourcePosition = Tuple[int, int]
    CheckResult = Tuple[Union[int, SourcePosition], str]
    def some_checker(...) -> Iterable[CheckResult]:

  • I also noted that some checks directly return a CheckResult instead of an iterable, which needs some ad-hoc handling elsewhere. I didn’t refactor this, but the typing information makes the inconsistency more obvious, so it’s good to have some help detecting potential improvements to the codebase.

  • Having to add types to arguments helped me notice several unused function arguments that could simply be removed, simplifying the code for free. They were probably left behind by some refactor. This was more a benefit of going through the code to add type info than of the type checking itself.

  • While adding type info I found a function called continued_indentation that had a variable called indent_chances initialized as an empty dict. Mypy asked me to specify the type for that dict, and this led me to realize that doing so was quite hard, because the code was very complicated. The keys of the dict seemed to be numbers (representing source code columns), but the values were sometimes True, other times a string, and other times the str type itself. This in turn makes the code that uses the contents of this dictionary quite complicated, but given that the values are so heterogeneous, the only thing it can do is checks with is or in (i.e. value is not True, value in (some_text, str), etc.). In the end, just typing the dictionary as Dict[int, object] worked perfectly. This is a case where the dynamic nature caused no problem, but at the same time the type system makes something strange more visible (my guess is that this code could be made simpler).
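    As a sketch of what mypy accepts here (the specific values and their meanings are my guesses, not pycodestyle’s actual logic), the Dict[int, object] typing allows the mixed values while still permitting identity and membership tests:

```python
from typing import Dict

# my sketch of the heterogeneous values; the meanings in the
# comments are guesses, not pycodestyle's real logic
indent_chances = {}  # type: Dict[int, object]
indent_chances[4] = True    # a definite indent chance at column 4
indent_chances[8] = '"'     # a token text that would match
indent_chances[12] = str    # the str type itself used as a marker

# with values this mixed, only identity/membership tests are safe:
def classify(col: int) -> str:
    value = indent_chances.get(col)
    if value is True:
        return 'definite'
    elif value is str:
        return 'string-marker'
    elif isinstance(value, str):
        return 'token'
    return 'none'
```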

  • The missing_whitespace_around_operator function also had an unusually typed local called need_space. It could be True, False, None, or a pair of ints, depending on which branch of a deeply nested set of ifs the code went through. I ended up refactoring this function a bit and was actually happier with the result. In the end I turned need_space into a bool, added a needed_at variable holding a source position (the tuple of int values that was used only in some branches), and a space_is_optional variable that was used deep inside one of the ifs (the original code used None in that case). The function is a bit too big to include here, but you can see the changes in my pycodestyle repo. Making this change also allowed me to find further simplifications of the code.

  • The normalize_paths function ended up declared as:

    def normalize_paths(value: Union[List[FilePath], str],
                        parent: FilePath=os.curdir) -> List[FilePath]:
        """Parse a comma-separated list of paths.

        Return a list of absolute paths.
        """
        if not value:
            return []
        if isinstance(value, list):
            return value
        paths = []
        for path in value.split(','):
            ...

    The type of the value argument looks unusual, but it only highlights that the implementation is a bit “un-pythonic” (using isinstance checks to take different code paths tends to be a smell). Again, here the typing merely highlights that something could be cleaner.

  • Thanks to mypy, I also detected several functions that appeared to return booleans but in fact could return a mix of types (with the correct truthiness). For example:

    def is_string_literal(line: str) -> bool:
        if line[0] in 'uUbB':
            line = line[1:]
        if line and line[0] in 'rR':
            line = line[1:]
        return line and (line[0] == '"' or line[0] == "'")

    which mypy detects as a problem, given that this function can also return "". That may not be a problem in practice, but I like having it detected explicitly, because in many situations "" is not the same as False (for example, when passing the result to a JSON serializer, or comparing booleans with ==). I fixed this by adding an explicit call to bool(), and I actually like being sure that this can only return True or False.
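    The fixed version, with the explicit bool() call, looks like this:

```python
def is_string_literal(line: str) -> bool:
    if line[0] in 'uUbB':
        line = line[1:]
    if line and line[0] in 'rR':
        line = line[1:]
    # bool() guarantees the declared return type: True or False, never ''
    return bool(line and (line[0] == '"' or line[0] == "'"))
```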

  • The same happened with some object fields, where I think the problem may be larger (because the not-exactly-boolean value can propagate to unexpected places). Lines like

    self.noqa = comments and noqa(''.join(comments))

    where noqa had been declared as bool were detected as errors, and this helped me add an explicit call to bool() around the expression, which is a good idea.

  • The BaseReport.elapsed field, which measures time, is initialized to 0, and mypy inferred it was an int. However, it then ends up holding floats (because it is based on time.time(), which returns floats). I found it nice that mypy told me about this; I changed the initialization to 0.0, and now when reading the code it’s more obvious what to expect from this variable.

  • The StyleGuide class had the following initialization method:

    def __init__(self, *args, **kwargs):
        # build options from the command line
        self.checker_class = kwargs.pop('checker_class', Checker)
        parse_argv = kwargs.pop('parse_argv', False)
        config_file = kwargs.pop('config_file', False)
        parser = kwargs.pop('parser', None)

    When I started adding explicit types for each of the extracted kwargs, this actually helped me notice that this way of extracting the arguments is redundant. I ended up rewriting it as:

    def __init__(self, *args: object,
                 checker_class: Callable[..., 'Checker']=Checker,
                 parse_argv: bool=False, config_file: FilePath=None,
                 parser: OptionParser=None, **kwargs: object) -> None:
        self.checker_class = checker_class

    Which provides a clearer interface, both in the names and the types of the accepted arguments.

  • There is a StyleGuide.input_dir method that traverses a tree checking files. When checking its return type I noted that it has an early return of 0 when the directory being checked is listed as “excluded” in the command line options, and otherwise does all the checking with no further return statements (i.e., implicitly returning None at the end). So, does it return an Optional[int]? Given that this looked unusual, I decided to take a closer look and eventually found that nobody uses that return value, so I ended up removing it.

  • An auxiliary method in the code reads:

    def _parse_multi_options(options, split_token=','):
        r"""Split and strip and discard empties.

        Turns the following:

        into ["A", "B"]
        """
        if options:
            return [o.strip() for o in options.split(split_token) if o.strip()]
        else:
            return options

    This function may return a list of strings or an empty string. That looked curious and forced me to check the calling site, where I saw that the reasonable thing to return is an empty list. For the use it has, it makes no difference (iterating over an empty string or an empty list does nothing in either case), but it’s easier to understand the method that way, and the whole if can then be removed, making it simpler.

Using mypy to understand the codebase

A large part of the experience for me was understanding what each function was doing in order to figure out the correct types. The interesting thing about adding type annotations is that a fairly large part of understanding a new piece of code is getting an idea of its inputs and outputs, and a large part of that answer is knowing the types. Sometimes figuring out a type is quite obvious from local information or from conventions, but other times you need to trace a long chain of calls before being able to do it. The feeling after adding each type annotation (especially the difficult ones) was that not only had I understood more about the code, but the code had become more understandable for the next person coming in and reading it with the annotation.

In the case of pycodestyle, the types of the arguments of the checker functions were one of the hardest parts. Each of those functions takes a different set of arguments (they don’t have a uniform call API), but the code does not have explicit calls to each. Instead, they are collected through introspection: a piece of code inspects the argument names and pulls the value for each check argument from the equally named attribute of a Checker instance. So the flow of information is much less explicit than a direct function call. In that case, learning how each name mapped to a type was a great aid in understanding how the different checkers fit together.

Another question I had to ask myself in many places was whether a given argument/return value was actually a list, a sequence, or an iterable. In many cases any of those looked valid locally, but when I tried one, mypy complained that I was making too many or too few assumptions. A bit of trial and error aided by the error messages allowed me to easily get it right, and to have clearer expectations about what a function will provide me and what it accepts. I believe a lot of clarity was added here, and this was a positive surprise. For example, I now know that most checks return an Iterable[CheckResult]. Even if a few return a list, I know for sure that trying to index the result of a check function is a bad idea.
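A small, generic example of the kind of feedback involved (my sketch, not pycodestyle code): declaring the looser Iterable return type is exactly what stops callers from indexing the result:

```python
from typing import Iterable, List

def as_iterable(xs: List[int]) -> Iterable[int]:
    # the declared type promises iteration only, nothing more
    return xs

results = as_iterable([1, 2, 3])
# results[0] would be rejected by mypy: "Iterable" has no __getitem__
first = next(iter(results))  # valid for any Iterable
```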

One large piece of the puzzle was the type of this global:

_checks = {'physical_line': {}, 'logical_line': {}, 'tree': {}}

Many functions work with it, with many levels of indexing, so after typing it I was quite happy with the result:

Check = Callable[..., Iterable[CheckResult]]
CheckCodes = List[str]
CheckArgs = Sequence[str]
CheckKind = str
CheckSet = Dict[Check, Tuple[CheckCodes, CheckArgs]]
_checks = {
    'physical_line': {},
    'logical_line': {},
    'tree': {}}  # type: Dict[CheckKind, CheckSet]

The auxiliary definitions made it much easier to understand and describe not only this structure but also the many functions that operate on it or on parts of it.

Of course, not everything is made completely clear; functions like parse_udiff (source) manipulate complex structures, and even with annotations they are a bit hard to follow, but having the added information makes things somewhat better.

The type checker itself can be used as a tool to figure out information about types in an unknown code base: you can declare something as object, and if mypy finds an error (for example, an attempt to apply len to it), you know some more information (it’s probably some kind of sequence), and you can iterate and refine. I used this idea a lot, although sometimes it’s not enough. For example, pycodestyle has a StyleGuide.input_file() method which is supposed to check a single file and takes an optional expected argument. It took me a while to trace when this is called, because it is never called directly but aliased into a self.runner = self.input_file attribute, which in turn is never called but aliased as a local variable runner in the check_files method. I was looking at the expected argument there to figure out its type; however, runner is never called with an expected argument, so I had no information. The argument seems to be available only for users driving pycodestyle through its API. So what I had to do was dig down: the expected argument is passed down to Checker.check_all, which in turn passes it down to BaseReport.init_file, which stores it in the BaseReport.expected attribute, and the BaseReport.error() method reads it in a line that says if code in self.expected. So finally we know that this is some kind of container of error codes. In this case I had to do all the analysis myself, and as you can see, that can mean diving through a lot of code.
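The conclusion of that dig can at least be captured in an annotation; here is a sketch of the relationship (with a hypothetical helper name, since the real check lives inside BaseReport.error()):

```python
from typing import Container, Optional

# `expected` is only ever read as `code in self.expected`,
# so any Container of error-code strings satisfies its usage
def is_expected(code: str, expected: Optional[Container[str]]) -> bool:
    return expected is not None and code in expected
```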

Other random thoughts

Adding some custom type aliases made several declarations more readable. It would probably make more sense if libraries provided them; otherwise you’re stuck with what you get from the stdlib. But saying “this argument is a FilePath” is much more informative than “this is a str”, even if one is just an alias of the other. It would be interesting if the core team provided conventions for these.
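A sketch of the kind of alias I mean (FilePath and normalize_path here are my own illustrative definitions, not stdlib or pycodestyle names):

```python
import os

FilePath = str  # just an alias, but it documents intent at every use

def normalize_path(path: FilePath, parent: FilePath = os.curdir) -> FilePath:
    # a reader now knows these strings are paths, not arbitrary text
    return os.path.abspath(os.path.join(parent, path.strip()))
```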

In general, as this is something new, there are a lot of naming and style conventions still missing that I think will be developed. What are appropriate names for type variables? How should lines be split in type declarations to follow style conventions? How should # type: xxx comments be inserted in long lines? Should the auxiliary type definitions be placed near their use, or at the top of the module?

It would have been nice to have some kind of automated assistance during the “transformation” process. I feel that a lot of the work I did could have been assisted by mypy or some related tool, given that it can already infer a lot.

I already knew when I started that the Python language has some aspects that would be hard for a type checker, but I didn’t foresee that booleans would be one of them. The semantics of the and and or operators, and the ability to use anything as a truth value, tend to create mixed-type scenarios quite easily.
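A minimal illustration of why: and and or return one of their operands rather than a bool, so the static type of the result mixes the operand types:

```python
# `or` returns the first truthy operand (or the last one);
# `and` returns the first falsy operand (or the last one)
x = '' or 0         # 0: an int, because '' is falsy
y = 'abc' and 3.5   # 3.5: a float, because 'abc' is truthy
z = [] and 'text'   # []: a list, not a str!
```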

I made some mistakes during the process that I expected mypy to catch, but it didn’t. I initially mixed up many type declarations related to StyleGuide.reporter (which is a class) and StyleGuide.report (which is an instance of that class), and never got an error related to it (although seeing the type annotations led me to realize they were two different things).


After all the changes here (and a lot more routine updates) I had a fully typed version of pycodestyle. Even though mypy is optional, I covered all arguments and return values (with only one use of Any, which I described above in _is_eol_token). The program still runs and passes all tests, and mypy accepts it with no errors (I’m using a modified version of the stdlib typesheds to remove bugs in the typesheds for frozenset). It took me roughly a full day of work (~10 hours, split across 3 different days). I found several problems, but I also got several insights about the code by using mypy.

In the next post I’ll give a shorter write-up of my findings and conclusions, including the answers to:

  1. Does it help me discover actual bugs?
  2. Does the process of adding types help make the code base more understandable?
  3. How mature is mypy itself? Is it usable right now, or does it have a lot of bugs?
  4. Is the type system flexible enough to express the kind of dynamic tricks that real developers like us use in actual, production Python code?
  5. Does it feel practical/usable?
  6. What other things that I didn’t expect can be learned from the experience?

See you next time! Update: Part 3 is up now.


  • My version of pycodestyle with type annotations can be found on GitHub. That is the exact version described in this post.
