Topic: Why is my code smaller when compiled in speed optimization mode?

The Raisonance RCSTM8 compiler for ST7/STM8 has two optimization modes:
- Optimize for size (default), where the compiler tries to produce the smallest possible code.
- Optimize for speed, where the compiler tries to reduce the number of cycles executed for any given code.

The differences between these two optimization modes are as follows:
1) Switch statements are compiled as binary trees in speed mode, whereas they use jump tables in size mode. Note that 5 different switch algorithms may be used, depending on the code.
2) Long (32-bit) division, which is a very slow operation, follows a simple algorithm in size mode, but this requires 32 iterations of a loop for each division. In speed mode the division routine tries to minimize the time taken: it recognizes the cases where the dividend is smaller than the divisor (and directly returns 0), where the divisor fits in an 8-bit char, and where both dividend and divisor fit in short integers, and uses specific (fast) division algorithms in these cases.
3) In speed mode, a specialized 32-bit + 8-bit addition is used when possible (for instance for "long += 10").
4) In size mode, the compiler uses internal helper routines to perform specific operations on 32-bit data objects (size conversions, pushing a 32-bit object on the stack, passing objects by reference (8- or 16-bit) instead of by value (32-bit)).
5) In speed mode on STM8, functions are aligned on 16-bit boundaries (for faster fetches from flash).
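The division trade-off in point 2) can be illustrated with a C sketch. It is not Raisonance's runtime code: the function names are hypothetical, and the fast path simply mirrors the special cases listed above. The generic version always runs its loop 32 times; the 8-bit-divisor path processes the dividend one byte at a time, so each step is a 16-bit by 8-bit division (an operation the STM8 provides as a single DIV instruction).

```c
#include <assert.h>
#include <stdint.h>

/* "Size mode" style (hypothetical): restoring shift-and-subtract division.
   The loop always runs 32 times, one iteration per dividend bit. */
static uint32_t div32_loop(uint32_t dividend, uint32_t divisor)
{
    uint32_t quotient = 0;
    uint32_t rem = 0;
    assert(divisor != 0);
    for (int bit = 31; bit >= 0; bit--) {
        rem = (rem << 1) | ((dividend >> bit) & 1u);  /* bring down next bit */
        if (rem >= divisor) {                          /* divisor fits: subtract */
            rem -= divisor;
            quotient |= (uint32_t)1u << bit;
        }
    }
    return quotient;
}

/* 32-bit dividend, 8-bit divisor: four 16-by-8 steps, one per byte. */
static uint32_t div32_by8(uint32_t dividend, uint8_t divisor)
{
    uint32_t quotient = 0;
    uint16_t rem = 0;                  /* always < divisor, so < 256 */
    assert(divisor != 0);
    for (int shift = 24; shift >= 0; shift -= 8) {
        /* remainder * 256 + next byte always fits in 16 bits */
        uint16_t chunk = (uint16_t)(((uint16_t)rem << 8) | ((dividend >> shift) & 0xFFu));
        quotient = (quotient << 8) | (uint32_t)(chunk / divisor);
        rem = (uint16_t)(chunk % divisor);
    }
    return quotient;
}

/* "Speed mode" style (hypothetical): cheap special cases first, generic loop last. */
static uint32_t div32_fast(uint32_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    if (dividend < divisor)            /* quotient is known to be 0 */
        return 0;
    if (dividend <= 0xFFFFu)           /* divisor <= dividend, so both fit in 16 bits */
        return (uint16_t)dividend / (uint16_t)divisor;
    if (divisor <= 0xFFu)              /* 8-bit divisor */
        return div32_by8(dividend, (uint8_t)divisor);
    return div32_loop(dividend, divisor);
}
```

Every special case adds code of its own, which is why a speed-oriented division routine ends up larger than the simple 32-iteration loop.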


So how can a given piece of code become bigger in size optimization mode?
The answer lies in point 4) above: in size mode, some compiler helper routines are called instead of generating the code inline.

For example, if the compiler needs to push a 32-bit null constant on the stack, in size optimization mode it calls a routine "?C?mv4_null2sk" that does it, instead of generating the code inline.
This replaces the 5-byte "CLR A/PUSH A/PUSH A/PUSH A/PUSH A" sequence with a 3-byte "CALL ?C?mv4_null2sk", saving 2 bytes of code.
So each time a 32-bit null constant is pushed on the stack, size optimization makes your code 2 bytes shorter.

However, the "?C?mv4_null2sk" routine itself must then be linked into your project, and it is 6 bytes long ("CLR A/PUSH A/PUSH A/PUSH A/PUSH A/RET").

The net size gain of this optimization is therefore (2 * number_of_null32 - 6) bytes. It only pays off if the routine is called 4 or more times in your whole project (3 calls merely break even). For instance, if a 32-bit null push is used only once in your whole project, this optimization results in a loss of 4 bytes.
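The arithmetic above can be checked with a small C helper. The function names are hypothetical and the byte counts are simply the figures quoted in this post; the sketch also assumes that the helper routine is linked in only when at least one call refers to it.

```c
#include <assert.h>

/* Net bytes saved (positive) or lost (negative) when every inline sequence
   is replaced by a CALL to one shared helper routine.
   Assumption: the helper is not linked in at all when uses == 0. */
static int helper_net_gain(int uses, int inline_bytes, int call_bytes, int helper_bytes)
{
    assert(uses >= 0);
    if (uses == 0)
        return 0;
    return uses * (inline_bytes - call_bytes) - helper_bytes;
}

/* Smallest number of uses for which the helper gives a strict saving. */
static int helper_break_even(int inline_bytes, int call_bytes, int helper_bytes)
{
    int saved_per_use = inline_bytes - call_bytes;
    assert(saved_per_use > 0);
    /* smallest n with n * saved_per_use > helper_bytes */
    return helper_bytes / saved_per_use + 1;
}
```

With the figures from the "?C?mv4_null2sk" example (5-byte inline code, 3-byte CALL, 6-byte routine), one use loses 4 bytes, three uses save nothing, and the first strict saving appears at four uses.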
This is why your project may be larger in size optimization mode than in speed mode.

Note that the compiler optimizes one compilation unit (a .c file) at a time, so it does not know how many times a specific optimization will be used in the whole project. Hence it cannot decide to "skip" a size optimization that is used only once in a given unit, as the same optimization may also be used in other units. Only the linker knows how many times a specific optimization is used in the whole application.
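To see why the decision has to be made project-wide, the sketch below (hypothetical names, same 2-byte saving per use and 6-byte routine as above) spreads four 32-bit null pushes over four .c files: judged one unit at a time, the helper never pays off, while the project total makes it worthwhile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Figures from the ?C?mv4_null2sk example in this post. */
enum { SAVED_PER_USE = 2, HELPER_BYTES = 6 };

/* Does a shared helper pay off for this many uses? */
static bool helper_pays_off(int uses)
{
    return uses * SAVED_PER_USE > HELPER_BYTES;
}

/* Project-wide total: only the linker sees every unit. */
static int total_uses(const int *uses_per_unit, size_t units)
{
    int total = 0;
    assert(uses_per_unit != NULL || units == 0);
    for (size_t i = 0; i < units; i++)
        total += uses_per_unit[i];
    return total;
}
```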

The bottom line: Should you use speed or size optimization?
- If you have a small project (a few KB of code), use speed optimization.
- For larger projects, use size optimization if you are limited in code space; otherwise keep speed optimization.
- If your project uses 32-bit division, speed mode will make your application code much larger (because of the more complex division algorithms). First try to avoid such divisions! If that is not possible, choose size optimization at least for the functions that perform such divisions.
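A couple of common ways to avoid a generic 32-bit division are sketched below in C. They are general techniques, not RCSTM8-specific features, and the function names are hypothetical: a shift replaces an unsigned division by a power of two, and an explicit cast to 16 bits allows a 16-bit division when both operands are known to fit. The cast only narrows the operation where int is 16 bits wide (as is usual on STM8 compilers); with a 32-bit int the operands are promoted back to 32 bits, so the result stays correct but the division is not narrowed.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned division by 2^shift done as a shift: no division routine needed.
   Compilers usually do this themselves for constant powers of two; writing it
   explicitly helps when the exponent is only known at run time. */
static uint32_t div_by_pow2(uint32_t value, unsigned shift)
{
    assert(shift < 32);
    return value >> shift;             /* same result as value / (1UL << shift) */
}

/* Both operands are known to fit in 16 bits: divide them as 16-bit values. */
static uint16_t div_narrow(uint32_t dividend, uint32_t divisor)
{
    assert(dividend <= 0xFFFFu);
    assert(divisor != 0 && divisor <= 0xFFFFu);
    return (uint16_t)((uint16_t)dividend / (uint16_t)divisor);
}
```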

And remember to read your linker map file, which gives you plenty of useful information about what happens, for example which helper routines were linked into your application and how much space they take.

Regards,
Bruno

Bruno Richard, PhD.
RAISONANCE - 17, avenue Jean Kuntzmann - F-38330 Montbonnot St Martin - FRANCE
There are 10 types of people in the world: Those who understand binary, and those who don't.