LeetCode MEDIUM 8 String to Integer (atoi) Summary

Generated by Codex with GPT-5

Quick facts

Difficulty: MEDIUM
Problem: String to Integer (atoi)
Main tags: String

What the problem is really asking

The input is a string, and the task is to extract the integer prefix that follows a very specific parsing rule:

ignore leading spaces
read at most one optional + or -
read as many consecutive digits as possible
stop as soon as the next character is not a digit
clamp the result into the 32-bit signed integer range

So this is not a math problem. It is a parsing problem with a few edge cases that must be handled in the correct order.

That order matters a lot. For example:

"42" should become 42
" -42" should become -42
"4193 with words" should become 4193
"words and 987" should become 0
"-91283472332" should become -2147483648 because the true value is below the allowed range

The tempting mistakes

The common wrong approaches are predictable:

trimming too aggressively instead of only skipping leading spaces
accepting signs in the middle of the string
continuing after the first non-digit character
converting the whole digit block first and only later thinking about overflow

In Python, the last mistake is easy to miss because Python integers do not overflow. In Java, C++, or Go, that can break the solution. A strong interview answer handles overflow while the number is being built.

The key insight

The problem statement already describes the algorithm.

Once the rules are read carefully, the optimal solution is a deterministic left-to-right scan with a small amount of state:

a pointer into the string
the sign
the integer value built so far

There is no need for backtracking, recursion, regular expressions, or storing substrings.

The only subtle part is overflow handling. Before doing:

value = value * 10 + digit

the parser should check whether that operation would push the result outside the 32-bit signed range.

How to derive the optimal solution

The easiest way to derive the solution is to turn the English spec into phases.

First, the parser skips spaces because nothing meaningful can happen before the first non-space character.

Second, it checks for a single optional sign. If that sign is present, it fixes whether the final answer is positive or negative. If there is no sign, the number is positive by default.

Third, it consumes digits one by one. This is the real work of the problem. Every new digit shifts the existing number left by one decimal place and adds the new digit.

Finally, the parser stops at the first non-digit. That stopping rule is what makes inputs like "4193 with words" straightforward: the parser never needs to understand the trailing text. It simply stops reading when the integer token ends.

Because each character is inspected at most once, the complexity is:

Time: O(n)
Extra space: O(1)

Best approaches

The cleanest optimal approach is a one-pass parser with explicit phases. It is easy to explain, easy to implement, and maps directly onto the wording of the problem.

Another equally optimal approach is to model the parser as a finite-state machine with states like START, SIGNED, IN_NUMBER, and END. That version has the same O(n) time and O(1) extra space complexity, but it is usually more useful when the interviewer wants a reusable parsing framework rather than the shortest correct answer.

For a normal LeetCode interview, the direct one-pass parser is the better default.

Why the overflow check works

Suppose the current accumulated value is value and the next digit is digit.

If the parser is building a positive number, then overflow happens exactly when:

value > INT_MAX // 10, or
value == INT_MAX // 10 and digit > INT_MAX % 10

The same idea works for negative numbers, except the absolute-value limit is 2147483648, because INT_MIN is -2147483648.

That check is better than multiplying first and hoping to recover later. It proves that the parser never steps outside the allowed range, even in languages where overflow is unsafe.

Python solution

INT_MAX = 2**31 - 1
INT_MIN = -(2**31)


class _AtoiParser:
    def __init__(self, text: str) -> None:
        self.text = text
        self.index = 0
        self.length = len(text)

    def parse(self) -> int:
        self._skip_leading_spaces()
        sign = self._read_optional_sign()
        return self._consume_digits_with_clamp(sign)

    def _skip_leading_spaces(self) -> None:
        while self.index < self.length and self.text[self.index] == " ":
            self.index += 1

    def _read_optional_sign(self) -> int:
        if self.index >= self.length:
            return 1

        current_character = self.text[self.index]
        if current_character == "+":
            self.index += 1
            return 1
        if current_character == "-":
            self.index += 1
            return -1

        return 1

    def _consume_digits_with_clamp(self, sign: int) -> int:
        value = 0
        absolute_limit = INT_MAX if sign == 1 else -INT_MIN

        while self.index < self.length:
            current_character = self.text[self.index]
            if not self._is_ascii_digit(current_character):
                break

            digit = ord(current_character) - ord("0")

            # Clamp before multiplying so the logic is safe in fixed-width
            # integer languages too.
            if self._would_overflow(
                value=value,
                digit=digit,
                absolute_limit=absolute_limit,
            ):
                return INT_MAX if sign == 1 else INT_MIN

            value = value * 10 + digit
            self.index += 1

        return sign * value

    @staticmethod
    def _would_overflow(value: int, digit: int, absolute_limit: int) -> bool:
        return (
            value > absolute_limit // 10
            or (
                value == absolute_limit // 10
                and digit > absolute_limit % 10
            )
        )

    @staticmethod
    def _is_ascii_digit(character: str) -> bool:
        return "0" <= character <= "9"


class Solution:
    def myAtoi(self, s: str) -> int:
        parser = _AtoiParser(s)
        return parser.parse()

Why this solution works

The parser has a simple invariant: before digit parsing starts, the index points at the first character that could begin the numeric part. After that, the parser only moves forward, and every consumed digit becomes part of the answer.

That means the solution matches the problem statement exactly:

leading spaces are skipped once
at most one sign is consumed
only a contiguous digit run is parsed
trailing garbage is ignored because parsing stops immediately
overflow is clamped before it can happen

No valid prefix is skipped, and no invalid character is accidentally absorbed into the number.

Interview follow-ups

What changes if the interviewer wants the whole string to be a valid integer token?

Then the parser can no longer stop quietly at the first non-digit and return the prefix. Instead, after consuming the optional sign and the digit run, it must verify that no extra non-space characters remain. If extra characters do remain, the function should reject the input according to whatever error contract the interviewer wants, such as returning None, raising an exception, or returning a success flag plus value.

The parsing core stays almost the same, so the time complexity is still O(n) and the extra space is still O(1). The only real change is the post-processing rule: in this version, trailing junk is an error instead of a stopping point.

How would this be implemented safely in Java or C++ where integer overflow is real?

The structure would be the same, but the overflow check becomes mandatory instead of optional. The parser must compare the current value against the allowed limit before doing the multiply-and-add step. That is exactly why the solution above checks the threshold first rather than relying on Python’s big integers.

This works because overflow can be predicted from the current prefix and the next digit. The tradeoff is only a constant amount of extra logic per digit, so the algorithm remains O(n) time and O(1) extra space.

What if the input arrives as a stream of characters instead of a fully built string?

The same parser idea still works. Instead of indexing into a string, the code reads one character at a time from the stream, keeps track of whether it is still in the leading-space phase, whether a sign has already been seen, and what the current numeric value is.

This is a good follow-up because it shows that the algorithm is fundamentally online. It does not need random access or buffering the whole input. The complexity stays linear in the number of consumed characters and uses constant extra memory.

When would a finite-state machine be a better answer than the direct parser?

A finite-state machine is useful when the interviewer wants a general parsing pattern that can be extended to more token types or more validation rules. For example, if later versions of the parser need decimal points, exponents, or custom separators, an explicit state table can make the control flow easier to maintain.

For this exact problem, the direct parser is shorter and usually easier to explain under interview pressure. The tradeoff is not asymptotic complexity, since both versions are O(n) time and O(1) extra space. The tradeoff is mostly about clarity versus extensibility.

Practical takeaway

atoi looks messy at first, but the core idea is simple: treat the prompt as a tiny language spec and implement the parser in the same order the rules are written.

Once the solution is broken into phases, the only genuinely important technical detail is overflow handling. Everything else is just disciplined left-to-right parsing.