Getting (possibly) 500% speed gain on divisions
This post was migrated from My old blog delphi-snippets.blogspot.com, for explanation about this switch see my introduction post. One of my fellow students once made a remark about the old days when a division was much faster if it was written as multiplication.
This means, instead of devising with a 100, you multiply with 0.01 (1/100). Because some programs of mine have a lot of divisions in their core loops, I investigated the difference.
I created the following test program.
program DivVsMult;
{$APPTYPE CONSOLE}
uses
SysUtils,
Windows;
var
Start,Stop, Start2, Stop2, Freq:int64;
i : integer;
t : real;
CpuSpeed : integer;
begin
{ TODO -oUser -cConsole Main : Insert code here }
{ Cpu Speed fastes cpu = 1 slower => 10
it's just to determin the number of time to do the loop
Maxint div CpuSpeed is calculated }
if ParamCount = 1 then
CpuSpeed := StrToIntDef(ParamStr(1),1)
else
CpuSpeed := 10;
Writeln('Simple Number division:');
Writeln('Calculating');
QueryPerformanceFrequency(freq);
QueryPerformanceCounter(Start);
for i:=0 to MaxInt div CpuSpeed do
t := i / 100;
QueryPerformanceCounter(Stop);
Writeln(Format('First Pass Result: %f',[t]));
{ This is needed because the compiler would optimize,
and would notice the result of the loop isn't used at all,
so therefor the result is useless.. so depending on the compiler, it will
choose what to do with it, this disables that optimization }
QueryPerformanceCounter(Start2);
for i:=0 to MaxInt div CpuSpeed do
t := i * (1/100);
QueryPerformanceCounter(Stop2);
Writeln(Format('Second Pass Result: %15.6f',[t]));
{ This is needed because the compiler would optimize,
and would notice the result of the loop isn't used at all,
so therefor the result is useless.. so depending on the compiler, it will
choose what to do with it, this disables that optimization }
Writeln('Done, Results:');
Writeln(Format('/ 100 Time: %6.4f seconds'+#13#10+
'/ 100 Clock: %d ticks'+#13#10+
'* 0.01 Time: %6.4f seconds'+#13#10+
'* 0.01 Clock: %d ticks',[(Stop-Start) / freq, (Stop-Start), (Stop2-Start2) / freq, (Stop2-Start2)]));
Writeln;
Writeln('Odd Number division:');
QueryPerformanceCounter(Start);
for i:=0 to high(i) div CpuSpeed do
t := i / 556;
QueryPerformanceCounter(Stop);
Writeln(Format('First Pass Result: %15.6f',[t]));
{ This is needed because the compiler would optimize,
and would notice the result of the loop isn't used at all,
so therefor the result is useless.. so depending on the compiler, it will
choose what to do with it, this disables that optimization }
QueryPerformanceCounter(Start2);
for i:=0 to high(i) div CpuSpeed do
t := i * (1/556);
QueryPerformanceCounter(Stop2);
Writeln(Format('Second Pass Result: %15.6f',[t]));
Writeln(Format('/ 556 Time: %6.4f seconds'+#13#10+
'/ 556 Clock: %d ticks'+#13#10+
'* (1/556) Time: %6.4f seconds'+#13#10+
'* (1/556) Clock: %d ticks',[(Stop-Start) / freq, (Stop-Start), (Stop2-Start2) / freq, (Stop2-Start2)]));
Writeln(Format(' (1/556) = %15.14f (approximate)',[1/556]));
Readln;
end.
On an old P3 900Mhz:
Simple Number division:
Calculating
First Pass Result: 2147483,64
Second Pass Result: 2147483,64
Done, Results:
/ 100 Time: 10,3319 seconds
/ 100 Clock: 36983482 ticks
* 0.01 Time: 2,0378 seconds
* 0.01 Clock: 7294251 ticks
Odd Number division:
First Pass Result: 386238,0647482014610000
Second Pass Result: 386238,0647482014610000
Done, Results:
/ 556 Time: 10,0735 seconds
/ 556 Clock: 36058581 ticks
* (1/556) Time: 2,0446 seconds
* (1/556) Clock: 7318775 ticks
(1/556) = 0,0017985611510791 (approximate)
On a new P4 2.3 Ghz:
Simple Number division:
Calculating
First Pass Result: 2147483.64
Second Pass Result: 2147483.64
Done, Results:
/ 100 Time: 4.6227 seconds
/ 100 Clock: 16547055 ticks
* 0.01 Time: 1.0782 seconds
* 0.01 Clock: 3859508 ticks
Odd Number division:
First Pass Result: 386238.064748
Second Pass Result: 386238.064748
Done, Results:
/ 556 Time: 4.5820 seconds
/ 556 Clock: 16401425 ticks
* (1/556) Time: 12.1746 seconds
* (1/556) Clock: 43579366 ticks
(1/556) = 0.00179856115108 (approximate)
The results are variating, on simple numbers like 0.01 the speedup is allways working, but somehow the very complex numbers tend to be slower sometimes.
I use this tip a lot when working with percentage.
Automatic Cropping of a Bitmap
This post was migrated from my old blog delphi-snippets.blogspot.com, for explanation about this switch see my introduction post. I have created a lot of program’s which needed to do some operations on Bitmaps, most of them are pretty simple, but recently I needed some auto cropping due to the fact I could not calculate the width and height without a lot of calculation, but the maximum could easily be calculated/estimated, therefor I needed an algorithm for cropping these bitmap automatically.
Everyone who uses Delphi and Bitmaps should have at least heard about scanlines, and in particularly efg’s pages on the topic. In short:
The ScanLine property, new in Delphi 3, allows quick access to individual pixels, but you must know what PixelFormat you're working with before you can access the pixels correctly.
Because I almost always work with 24bit bitmaps, I didn’t adapt the code for other pixel formats, but it should really just be editing the “procedure AutoCropBitmap(InputBitmap, OutputBitmap: TBitmap; iBleeding : Integer; BackColor: TColor);” overload to start the right sub functions for each pixel format.
The current function will always translate it to an 24bit bitmap, which can be a processor and memory heavy job, the advise is therefor directly after creating the bitmap set the PixelFormat to pf24bit.
You might notice the overloads, well thats just one part of me being lazy again, sometimes I haven’t got the time to create an extra variable and assign the property’s etc. The overloads allow the choice of which input and output you’ll like.
unit unBitmapCropping;
interface
uses
Windows, Graphics, Dialogs, SysUtils, Math, Classes;
const
PixelCountMax = 32768;
type
pRGBArray = ^TRGBArray;
TRGBArray = array[0..PixelCountMax-1] of TRGBTriple;
procedure AutoCropBitmap(InputBitmap, OutputBitmap: TBitmap; iBleeding : Integer); overload;
procedure AutoCropBitmap(InputBitmap, OutputBitmap: TBitmap; iBleeding : Integer; BackColor: TColor); overload;
procedure AutoCropBitmap(BitMapToCrop: TBitmap; iBleeding : Integer); overload;
procedure AutoCropBitmap(BitMapToCrop: TBitmap; iBleeding : Integer; BackColor: TColor); overload;
procedure AutoCropBmp(const sFileName : String; iBleeding : Integer); overload;
procedure AutoCropBmp(const sFileName : String; iBleeding : Integer; BackColor: TColor); overload;
implementation
procedure AutoCropBitmap(BitMapToCrop: TBitmap; iBleeding : Integer);
var bmpTmp : TBitmap;
begin
bmpTmp := TBitmap.Create;
try
AutoCropBitmap(BitMapToCrop,bmpTmp,iBleeding);
BitMapToCrop.Assign(bmpTmp);
finally
bmpTmp.Free;
end;
end;
procedure AutoCropBitmap(BitMapToCrop: TBitmap; iBleeding : Integer; BackColor: TColor);
var bmpTmp : TBitmap;
begin
bmpTmp := TBitmap.Create;
try
AutoCropBitmap(BitMapToCrop,bmpTmp,iBleeding, BackColor);
BitMapToCrop.Assign(bmpTmp);
finally
bmpTmp.Free;
end;
end;
procedure AutoCropBitmap(InputBitmap, OutputBitmap: TBitmap; iBleeding : Integer);
begin
AutoCropBitmap(InputBitmap,OutputBitmap, iBleeding, InputBitmap.Canvas.Pixels[0,0]);
end;
procedure AutoCropBitmap(InputBitmap, OutputBitmap: TBitmap; iBleeding : Integer; BackColor: TColor);
var Row : pRGBArray;
MyTop, MyBottom, MyLeft,
i, j, MyRight : Integer;
begin
MyTop := InputBitmap.Height;
MyLeft := InputBitmap.Width;
MyBottom := 0;
MyRight := 0;
InputBitmap.PixelFormat := pf24bit;
OutputBitmap.PixelFormat := pf24Bit;
{ Find Top }
for j := 0 to InputBitmap.Height-1 do
begin
if j > MyTop then
Break;
Row := pRGBArray(InputBitmap.Scanline[j]);
for i:= InputBitmap.Width - 1 downto 0 do
if ((Row[i].rgbtRed <> GetRvalue(BackColor)) or
(Row[i].rgbtGreen <> GetGvalue(BackColor)) or
(Row[i].rgbtBlue <> GetBvalue(BackColor))) then
begin
MyTop := j;
Break;
end;
end;
if MyTop = InputBitmap.Height then
{ Empty Bitmap }
MyTop := 0;
{ Find Bottom }
for j := InputBitmap.Height-1 Downto MyTop do
begin
if (j + 1) < MyBottom then
Break;
Row := pRGBArray(InputBitmap.Scanline[j]);
for i:= InputBitmap.Width - 1 downto 0 do
if ((Row[i].rgbtRed <> GetRvalue(BackColor)) or
(Row[i].rgbtGreen <> GetGvalue(BackColor)) or
(Row[i].rgbtBlue <> GetBvalue(BackColor))) then
begin
MyBottom := j+1;
Break;
end;
end;
{ Find Left }
for j := MyTop to MyBottom-1 do
begin
Row := pRGBArray(InputBitmap.Scanline[j]);
for i:= 0 to MyLeft-1 do
if ((Row[i].rgbtRed <> GetRvalue(BackColor)) or
(Row[i].rgbtGreen <> GetGvalue(BackColor)) or
(Row[i].rgbtBlue <> GetBvalue(BackColor))) then
begin
MyLeft := i;
Break;
end;
end;
if MyLeft = InputBitmap.Width then
{ Empty Bitmap }
MyLeft := 0;
{ Find Right }
for j := MyTop to MyBottom -1 do
begin
Row := pRGBArray(InputBitmap.Scanline[j]);
for i:= InputBitmap.Width-1 downto MyRight do
if ((Row[i].rgbtRed <> GetRvalue(BackColor)) or
(Row[i].rgbtGreen <> GetGvalue(BackColor)) or
(Row[i].rgbtBlue <> GetBvalue(BackColor))) then
begin
MyRight := i+1;
Break;
end;
end;
if (MyRight = 0) or (MyBottom = 0) then
{ Empty Bitmap }
iBleeding := 0;
OutputBitmap.Width := MyRight - MyLeft + (iBleeding * 2);
OutputBitmap.Height := MyBottom - MyTop + (iBleeding * 2);
OutputBitmap.Canvas.Brush.Color := BackColor;
OutputBitmap.Canvas.FillRect(Rect(0,0,OutputBitmap.Width,OutputBitmap.Height));
BitBlt(OutputBitmap.canvas.Handle, -MyLeft + iBleeding,
-MyTop + iBleeding,MyLeft + MyRight,MyTop + MyBottom,
InputBitmap.Canvas.Handle, 0, 0, SRCCOPY);
end;
procedure AutoCropBmp(const sFileName : String; iBleeding : Integer);
var InputBitmap, OutputBitmap : TBitmap;
begin
if not FileExists(sFileName) then
raise Exception.Create('File doesn''s exists.');
InputBitmap := TBitmap.Create;
OutputBitmap := TBitmap.Create;
try
InputBitmap.LoadFromFile(sFileName);
OutputBitmap.PixelFormat := InputBitmap.PixelFormat;
AutoCropBitmap(InputBitmap, OutputBitmap,iBleeding);
OutputBitmap.SaveToFile(sFileName);
finally
OutputBitmap.Free;
InputBitmap.Free;
end;
end;
procedure AutoCropBmp(const sFileName : String; iBleeding : Integer; BackColor: TColor);
var InputBitmap, OutputBitmap : TBitmap;
begin
if not FileExists(sFileName) then
raise Exception.Create('File doesn''s exists.');
InputBitmap := TBitmap.Create;
OutputBitmap := TBitmap.Create;
try
InputBitmap.LoadFromFile(sFileName);
OutputBitmap.PixelFormat := InputBitmap.PixelFormat;
AutoCropBitmap(InputBitmap, OutputBitmap,iBleeding, BackColor);
OutputBitmap.SaveToFile(sFileName);
finally
OutputBitmap.Free;
InputBitmap.Free;
end;
end;
end.
I started off with an example found on efg’s site, and started optimizing it’s algorithm:
- Dismissed the Temp variables and the counter variable.
- Making the loops downto 0 as many as possible (this will make the loop slightly faster).
- Making sure no extra round on the loop is used (adding breaks).
- Decreasing the number of core operations (removing if’s)
With this I created a >400% speed gain.
Writing this article, I got the following ideas for a little more speedup:
- Storing the GetXvalue results, as to decrease the recalculation in each for loop.
- Maybe running loops for 0 till the end will make it faster because of better page alignment
But I will try that out an other time.
Tags: delphi
CPU and Form Friendly (Long) Sleep
This post was migrated from my old blog delphi-snippets.blogspot.com, for explanation about this switch see my introduction post. Because a Sleep(1000) will seriously freeze your form for a second, the you’ll see that the solution will often be using a Application.ProcessMessages loop until the second has passed, but the problem with that loop is, it will create 100% cpu usage.
Let’s say your waiting for a response from a printer your controlling, than the 100% usage might slow down the complete process, not to mention that on a laptop you’ll be seriously eating the battery.
The following source offers a nice solution to this problem.
procedure TForm.LongDelay(DelayMs : Cardinal);
var
StopTime : Cardinal;
begin
StopTime := GetTickCount + DelayMs;
while (GetTickCount < StopTime) do
begin
Application.ProcessMessages;
Sleep(1);
end;
end;
It’s pretty straight forward, offcourse when using a basic windows function, you should check out MSDN for the arguments and remarks. There was one thing that was interresting.
A value of zero causes the thread to relinquish the remainder of its time slice to any other thread of equal priority that is ready to run. If there are no other threads of equal priority ready to run, the function returns immediately, and the thread continues execution.
This fixes one problem, you will only use the cpu when it’s idle. But that still makes this a battery eater on laptops.
I hope you liked this first post, this was just a minor subject.. But I got to start somewhere.