How does Go parseInt?

Understanding strconv.Atoi() as a JS Dev learning Go.

Intro and Motivation.

Hello there, it's Tired Dev! I've been learning Go in my spare time (as a JS Dev). It's been very scattered, partly because I had a hard time "adjusting" to the syntax and partly because I got sidetracked by life and work, amongst other things. However, I recently picked it up again after I started flirting with the idea of building tools in Go. This was inspired by my fascination with esbuild, a JS bundler written in Go. Vite uses it under the hood, and it's blazingly fast! (lol, ThePrimeagen).

It's got me thinking Go is worth the investment! If I don't fall off the wagon again, I intend to build very interesting things once I become proficient with the language, but for now, baby steps!

This is a naive, piece-by-piece breakdown of what the strconv.Atoi() function does, based on reading the Go source code on GitHub.

What is strconv.Atoi (as a JS Dev)?

I'm not one for big technical terms, so I'll explain it lazily. For JS Devs, strconv.Atoi is basically JavaScript's parseInt: it converts a string of decimal digits (optionally starting with a + or - sign) into an int, and that's it. Atoi is a function in Go's strconv package.
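
To see the comparison in practice, here's a minimal program that uses only the standard strconv package (the describe helper is just for illustration). One behavioral difference from JavaScript is worth knowing: parseInt("42px") stops at the first non-digit and returns 42, while Atoi rejects the whole string with an error.

```go
package main

import (
	"fmt"
	"strconv"
)

// describe runs strconv.Atoi on in and formats both of its return values.
func describe(in string) string {
	n, err := strconv.Atoi(in)
	return fmt.Sprintf("Atoi(%q) = %d, err = %v", in, n, err)
}

func main() {
	// Plain digits, explicit signs, trailing junk, and the empty string.
	for _, in := range []string{"42", "-7", "+15", "42px", ""} {
		fmt.Println(describe(in))
	}
}
```

For "42px" this prints `Atoi("42px") = 0, err = strconv.Atoi: parsing "42px": invalid syntax`.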

But how does it do this? Let's take a look at its definition in the source code and go through it step by step. I'll try to break it down based on my basic understanding of Go and leave out the parts I don't completely understand (comments are welcome!). Let's dive in. Here's the whole function from GitHub:

// Atoi is equivalent to ParseInt(s, 10, 0), converted to type int.
func Atoi(s string) (int, error) {
    const fnAtoi = "Atoi"

    sLen := len(s)
    if intSize == 32 && (0 < sLen && sLen < 10) ||
        intSize == 64 && (0 < sLen && sLen < 19) {
        // Fast path for small integers that fit int type.
        s0 := s
        if s[0] == '-' || s[0] == '+' {
            s = s[1:]
            if len(s) < 1 {
                return 0, syntaxError(fnAtoi, s0)
            }
        }

        n := 0
        for _, ch := range []byte(s) {
            ch -= '0'
            if ch > 9 {
                return 0, syntaxError(fnAtoi, s0)
            }
            n = n*10 + int(ch)
        }
        if s0[0] == '-' {
            n = -n
        }
        return n, nil
    }

    // Slow path for invalid, big, or underscored integers.
    i64, err := ParseInt(s, 10, 0)
    if nerr, ok := err.(*NumError); ok {
        nerr.Func = fnAtoi
    }
    return int(i64), err
}

Let's break it down.

  1. Function Definition:
func Atoi(s string) (int, error) { ... }

All this part says is that Atoi is a function that takes a string s and returns two values: the converted integer (int) and an error. When the conversion succeeds, the error is nil; when it fails, the error explains why. It's kind of like the error-first callback pattern in Node.js. Whoever calls this function has to handle both scenarios, the happy path and the error path, like so:

num, err := strconv.Atoi("string-to-be-converted")
if err != nil {
    // handle the error
}
// use num if all goes well

  2. Handling smaller integer conversions:

        const fnAtoi = "Atoi"

        sLen := len(s)
        if intSize == 32 && (0 < sLen && sLen < 10) ||
            intSize == 64 && (0 < sLen && sLen < 19) {
            ...
        }

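    This block comprises the following steps:

    • a constant string declaration, fnAtoi, which is used later to tag any error with the name of the function that produced it,

    • finding the length of the string with Go's built-in len function (which counts bytes, so a leading sign counts too),

    • and deciding, based on the integer size and that length, whether the string can take the fast path or has to be handed off to a more general function.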

      You see, there are really two code paths. Atoi handles small inputs itself: strings of 1 to 9 bytes (sign included) when int is 32 bits wide, or 1 to 18 bytes when int is 64 bits wide. Those limits mean the fast path never has to worry about overflow: the largest int32 (2147483647) has 10 digits and the largest int64 (9223372036854775807) has 19, so any number with fewer digits fits. Anything else, such as an empty string or a longer number, goes down the slow path through the more general ParseInt function, which I won't cover in this article. So the following strings take the fast path in the if block:

    "100000000"          // length 9: fast path whether int is 32 or 64 bits
    "100000000000000000" // length 18: fast path only when int is 64 bits

intSize itself is defined elsewhere in the strconv package, outside the Atoi function:

    const intSize = 32 << (^uint(0) >> 63)

    // IntSize is the size in bits of an int or uint value.
    const IntSize = intSize

Naively explained without getting too technical, intSize is the size of Go's int type in bits: 32 on 32-bit platforms and 64 on 64-bit platforms. I don't quite understand those bitwise operators yet, but this Stack Overflow answer is super great (the one with over 300 upvotes).

  3. Enter the fast path:

        if intSize == 32 && (0 < sLen && sLen < 10) ||
            intSize == 64 && (0 < sLen && sLen < 19) {
            // Fast path for small integers that fit int type.
            s0 := s
            if s[0] == '-' || s[0] == '+' {
                s = s[1:]
                if len(s) < 1 {
                    return 0, syntaxError(fnAtoi, s0)
                }
            }

            n := 0
            for _, ch := range []byte(s) {
                ch -= '0'
                if ch > 9 {
                    return 0, syntaxError(fnAtoi, s0)
                }
                n = n*10 + int(ch)
            }
            if s0[0] == '-' {
                n = -n
            }
            return n, nil
        }

    Inside the fast path, Atoi first saves the original string in s0, which it needs later for error messages and for the final sign check. Then it checks whether the number is negative or positive by looking at the first byte of the string: if it's a sign (- or +), s is reassigned to the rest of the string, which is assumed to be the number. If nothing is left after slicing off the sign, it returns an error. If the first byte isn't a sign, this block is skipped:

        s0 := s
        if s[0] == '-' || s[0] == '+' {
            // s[1:] means if s is "-42" or "+42", it gets reassigned to "42"
            s = s[1:]

            // if only "-" or "+" was passed without a number, return an error
            if len(s) < 1 {
                return 0, syntaxError(fnAtoi, s0)
            }
        }
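
Here's the same sign handling pulled out into a standalone function, as a sketch. splitSign is a hypothetical helper, not part of strconv; unlike Atoi, it also guards against an empty string, because nothing upstream has checked the length for it.

```go
package main

import "fmt"

// splitSign is a hypothetical helper mirroring Atoi's sign handling: it strips
// one leading '+' or '-', reports whether the number is negative, and returns
// ok == false when nothing is left after the sign.
func splitSign(s string) (neg bool, digits string, ok bool) {
	if s == "" {
		// Atoi only reaches its sign check when len(s) > 0; a standalone helper must check.
		return false, "", false
	}
	digits = s
	if s[0] == '-' || s[0] == '+' {
		digits = s[1:] // "-42" and "+42" both become "42"
		if len(digits) < 1 {
			return false, "", false // the input was just "-" or "+"
		}
	}
	return s[0] == '-', digits, true
}

func main() {
	for _, s := range []string{"-42", "+42", "42", "-"} {
		neg, digits, ok := splitSign(s)
		fmt.Printf("%q -> neg=%v digits=%q ok=%v\n", s, neg, digits, ok)
	}
}
```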
    

    The next part of the code is where the magic happens. If you're a JS dev reading this, remember String.prototype.charCodeAt()? Check it out before proceeding!

        n := 0
        for _, ch := range []byte(s) {
            ch -= '0'
            if ch > 9 {
                return 0, syntaxError(fnAtoi, s0)
            }
            n = n*10 + int(ch)
        }

    What's happening here is that the string is first converted to a slice of bytes and then looped over using the range form of the for loop. On each iteration, range yields two values: the index and the element at that index. The index isn't needed here, so it's discarded with the blank identifier _ (otherwise the compiler would yell at us about an unused variable). On each iteration, the character code of '0' is subtracted from the character code of the current byte:

        ch -= '0'

    Why this? It's a neat way to validate each character and get its numeric value at the same time, without calling any conversion function. The character code of '0' is 48 and the character code of '9' is 57, so subtracting 48 maps the characters '0' through '9' to the values 0 through 9. Anything else ends up outside that range: characters after '9' (such as 'a', code 97) land above 9, and because ch is a byte (an unsigned 8-bit integer), characters before '0' (such as '-', code 45) wrap around to large values instead of going negative (45 - 48 comes out as 253). So a single check, ch > 9, catches every non-digit, and if it's true, an error is returned.

            if ch > 9 {
                return 0, syntaxError(fnAtoi, s0)
            }
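
To watch the byte wraparound happen, here's a small sketch that applies the same subtract-and-compare trick to individual characters (digitValue is a hypothetical name, not a strconv function).

```go
package main

import "fmt"

// digitValue applies Atoi's trick to a single byte: subtract '0', then use one
// unsigned comparison to decide whether the byte was a digit.
func digitValue(c byte) (byte, bool) {
	c -= '0' // byte arithmetic wraps around modulo 256 instead of going negative
	return c, c <= 9
}

func main() {
	// '7' is a digit; 'a' and ':' come after '9'; ' ', '-' and '/' come before '0'.
	for _, c := range []byte("7a: -/") {
		v, ok := digitValue(c)
		fmt.Printf("%q (code %d) -> %d, digit: %v\n", c, c, v, ok)
	}
}
```

Running it shows, for example, that '-' (code 45) becomes 253 and '/' (code 47) becomes 255, so both are rejected by the same ch > 9 comparison that rejects 'a'.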
    

    If all goes well and ch is not greater than 9, it now holds the value of the current digit. That byte is converted to an int and added to n*10, where n is an int variable initialized to 0. Multiplying n by 10 before adding shifts the digits collected so far one decimal place to the left, making room for the new digit. Here's an example: https://go.dev/play/p/bjJ6uzGI2xH, run it and play with it!
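
The accumulation can also be traced step by step. This sketch uses a hypothetical accumulate helper, which assumes its input is already all digits, and records n after each iteration of the same n = n*10 + digit update:

```go
package main

import "fmt"

// accumulate repeats Atoi's loop body, n = n*10 + digit, and records n after
// every step. It assumes s contains only the digits '0'..'9'.
func accumulate(s string) (int, []int) {
	n := 0
	var steps []int
	for _, ch := range []byte(s) {
		n = n*10 + int(ch-'0')
		steps = append(steps, n)
	}
	return n, steps
}

func main() {
	n, steps := accumulate("123")
	fmt.Println(steps) // [1 12 123]
	fmt.Println(n)     // 123
}
```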

    When the loop is finished, Atoi checks the first byte of the original string, s0, once more. If it's a negative sign (-), n, which holds our newly converted integer, is negated. Otherwise, the integer is returned as is.

        if s0[0] == '-' {
            n = -n
        }
        return n, nil
    

    PS: If you're wondering why every return statement returns two values, refer back to 1. Function Definition above: the error is nil whenever the conversion succeeds.

And there we have it! Now you know how Golang's strconv.Atoi works under the hood for smaller numbers!

Thank you for reading! If you found this article helpful please share!