|
Post by dragonjim on Mar 7, 2024 13:07:03 GMT 1
Not sure if this has been done already, but here is a short function to find a string in a string array.
If anyone can improve it, especially for unsorted or case insensitive searches, then please do and post below.
Dim s$(10000), n%, p%, t1# For n% = 0 To 10000 : s$(n%) = RandomString : Next n% p% = Int(Rnd * 10001) s$(p%) = "Hello" Trace p% Trace s$(p%) Debug "Find Unsorted" t1# = Timer p% = FindString(s$(), "Hello", FIND_UNSORTED) Trace Timer - t1# Trace p% If p% = -1 : Debug "Not found" Else : Trace s$(p%) EndIf Debug "Find Unsorted and Case Insensitive" t1# = Timer p% = FindString(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE) Trace Timer - t1# Trace p% If p% = -1 : Debug "Not found" Else : Trace s$(p%) EndIf QSort s$() Debug "Find Sorted" t1# = Timer p% = FindString(s$(), "Hello") Trace Timer - t1# Trace p% If p% = -1 : Debug "Not found" Else : Trace s$(p%) EndIf Debug "Find Sorted and Case Insensitive" t1# = Timer p% = FindString(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE) Trace Timer - t1# Trace p% If p% = -1 : Debug "Not found" Else : Trace s$(p%) EndIf Debug.Show Function FindString(ByRef a$(), s$, Optional options%) Global Const FIND_SORTED = 0, FIND_UNSORTED = 1, FIND_CASESENSITIVE = 0, FIND_CASEINSENSITIVE = 2 Local hpos% = UBound(a$()), lpos% = LBound(a$()), n%, ret% If options% = 0 Return FindStringSub(a$(), s$) Else Local sort$(lpos% To hpos%), sort%(lpos% To hpos%) If Flag(options%, FIND_CASEINSENSITIVE) s$ = Upper(s$) For n% = lpos% To hpos% : sort$(n%) = Upper(a$(n%)) : sort%(n%) = n% : Next n% Else For n% = lpos% To hpos% : sort$(n%) = a$(n%) : sort%(n%) = n% : Next n% EndIf QSort sort$(+), (hpos% - lpos%) + 1, sort%() ret% = FindStringSub(sort$(), s$) Return Iif(ret% = -1, -1, sort%(ret%)) EndIf EndFunction Function FindStringSub(ByRef a$(), s$) Local hpos% = UBound(a$()), lpos% = LBound(a$()), pos%, rpos% = (hpos% - lpos% + 1) / 2 pos% = lpos% + rpos% While a$(pos%) <> s$ If a$(pos%) > s$ If pos% = LBound(a$()) || a$(pos% - 1) < s$ Then Return -1 rpos% = Int(rpos% / 2) : pos% = lpos% + rpos% Else If pos% = UBound(a$()) || a$(pos% + 1) > s$ Then Return -1 lpos% = pos% + 1 : rpos% = Int(rpos% / 2) : pos% = MinI(lpos% + rpos%, hpos%) EndIf Wend Return pos% EndFunction Function Flag(parent%, flag%) // v2 If parent% < 0 Then parent% = - parent% : flag% = -flag% Return (And(parent%, flag%) = flag%) EndFunction Function RandomString Local ct% = Int(Rnd * 10) + Int(Rnd * 4) + 1, n%, t$ = Space(ct%) For n% = 1 To ct% : Mid(t$, n%, 1) = Chr(Int(Rnd * 57) + 64) : Next n% Return t$ EndFunction
|
|
|
Post by (X) on Mar 7, 2024 13:43:51 GMT 1
I wonder what difference a Compute Shader might make.
|
|
|
Post by dragonjim on Mar 7, 2024 15:03:48 GMT 1
Here is an update using a local Hash array, which cuts the speed on all processes to between a half and a third of the time.
Dim s$(100000), n%, p%, t1# For n% = 0 To 100000 : s$(n%) = RandomString : Next n% p% = Int(Rnd * 100001) s$(p%) = "Hello" Trace p% Trace s$(p%) Debug "Find Unsorted" t1# = Timer : p% = FindStringOld(s$(), "Hello", FIND_UNSORTED) : Trace Timer - t1# Trace p% : Debug Iif(p%, -1, "Not found"),s$(p%) t1# = Timer : p% = FindString(s$(), "Hello", FIND_UNSORTED) : Trace Timer - t1# Trace p% : Debug Iif(p% = -1, "Not found", s$(p%)) Debug "Find Unsorted and Case Insensitive" t1# = Timer : p% = FindStringOld(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE) : Trace Timer - t1# Trace p% : Debug Iif(p% = -1, "Not found", s$(p%)) t1# = Timer : p% = FindString(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE) : Trace Timer - t1# Trace p% : Debug Iif(p% = -1, "Not found", s$(p%)) QSort s$() Debug "Find Sorted" t1# = Timer : p% = FindStringOld(s$(), "Hello") : Trace Timer - t1# Trace p% : Debug Iif(p% = -1, "Not found", s$(p%)) t1# = Timer : p% = FindString(s$(), "Hello") : Trace Timer - t1# Trace p% : Debug Iif(p% = -1, "Not found", s$(p%)) Debug "Find Sorted and Case Insensitive" t1# = Timer : p% = FindStringOld(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE) : Trace Timer - t1# Trace p% : Debug Iif(p% = -1, "Not found", s$(p%)) t1# = Timer : p% = FindString(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE) : Trace Timer - t1# Trace p% : Debug Iif(p% = -1, "Not found", s$(p%)) Function FindString(ByRef a$(), s$, Optional options%) Global Const FIND_SORTED = 0, FIND_UNSORTED = 1, FIND_CASESENSITIVE = 0, FIND_CASEINSENSITIVE = 2 Local hpos% = UBound(a$()), lpos% = LBound(a$()), n%, pos%, ret%, rpos% = (hpos% - lpos% + 1) / 2 If options% = 0 pos% = lpos% + rpos% While a$(pos%) <> s$ If a$(pos%) > s$ If pos% = LBound(a$()) || a$(pos% - 1) < s$ Then Return -1 rpos% = Int(rpos% / 2) : pos% = lpos% + rpos% Else If pos% = UBound(a$()) || a$(pos% + 1) > s$ Then Return -1 lpos% = pos% + 1 : rpos% = Int(rpos% / 2) : pos% = MinI(lpos% + rpos%, hpos%) EndIf Wend Return pos% Else Local sort As Hash Int32 For n% = lpos% To hpos% If ! sort[? a$(n%)] Then Hash Add sort[a$(n%)], n% Next n% If sort[? s$] Then Return sort[s$] Return -1 EndIf Function FindStringOld(ByRef a$(), s$, Optional options%) Global Const FIND_SORTED = 0, FIND_UNSORTED = 1, FIND_CASESENSITIVE = 0, FIND_CASEINSENSITIVE = 2 Local hpos% = UBound(a$()), lpos% = LBound(a$()), n%, ret% If options% = 0 Return FindStringSub(a$(), s$) Else Local sort$(lpos% To hpos%), sort%(lpos% To hpos%) If Flag(options%, FIND_CASEINSENSITIVE) s$ = Upper(s$) For n% = lpos% To hpos% : sort$(n%) = Upper(a$(n%)) : sort%(n%) = n% : Next n% Else For n% = lpos% To hpos% : sort$(n%) = a$(n%) : sort%(n%) = n% : Next n% EndIf QSort sort$(+), (hpos% - lpos%) + 1, sort%() ret% = FindStringSub(sort$(), s$) Return Iif(ret% = -1, -1, sort%(ret%)) EndIf EndFunction Function FindStringSub(ByRef a$(), s$) Local hpos% = UBound(a$()), lpos% = LBound(a$()), pos%, rpos% = (hpos% - lpos% + 1) / 2 pos% = lpos% + rpos% While a$(pos%) <> s$ If a$(pos%) > s$ If pos% = LBound(a$()) || a$(pos% - 1) < s$ Then Return -1 rpos% = Int(rpos% / 2) : pos% = lpos% + rpos% Else If pos% = UBound(a$()) || a$(pos% + 1) > s$ Then Return -1 lpos% = pos% + 1 : rpos% = Int(rpos% / 2) : pos% = MinI(lpos% + rpos%, hpos%) EndIf Wend Return pos% EndFunction Function Flag(parent%, flag%) // v2 If parent% < 0 Then parent% = - parent% : flag% = -flag% Return (And(parent%, flag%) = flag%) EndFunction Function RandomString Local ct% = Int(Rnd * 10) + Int(Rnd * 4) + 1, n%, t$ = Space(ct%) For n% = 1 To ct% : Mid(t$, n%, 1) = Chr(Int(Rnd * 57) + 64) : Next n% Return t$ EndFunction
|
|
|
Post by dragonjim on Mar 7, 2024 15:06:43 GMT 1
I wonder what difference a Compute Shader might make. A compute shader may be useful in converting strings to Upper case; sadly, as the search is linear, I doubt it would have much affect on the actual search time.
|
|
|
Post by Roger Cabo on Mar 11, 2024 19:06:47 GMT 1
Hi everyone,
transferring a large string array to GPU memory and conducting the search with a compute shader could significantly speed up 100 times certain search algorithms that are well-suited for parallelization.
With w100,000 entries and a total size of 500 MB, it's important to consider the overhead of data transfer, but if the data is transferred once followed by intensive searching, this approach could be beneficial.
The key is optimizing the search algorithm for parallel processing on the GPU.
But this is a lot work because shader code can't compare strings, but only bytes.
|
|
|
Post by dragonjim on Mar 11, 2024 23:37:54 GMT 1
Multithreading would be easier to enact but only if you have enough cores to make it worth it.
|
|
|
Post by Roger Cabo on Mar 12, 2024 10:08:45 GMT 1
Multithreading would be easier to enact but only if you have enough cores to make it worth it. Small code to compare in sse2: section .text global _start
_start: ; Assumptions: ; rsi = address of the first string (search string) ; rdi = address of the second string (memory string) ; rcx = length of the first string (search string) ; rdx = length of the second string (memory string)
; Compare lengths, exit if search string is longer than memory string cmp rcx, rdx ja exit_program ; if search string length > memory string length, exit
; Calculate the number of 16-byte blocks and the remaining bytes for comparison mov rax, rcx shr rax, 4 ; Number of full 16-byte blocks mov rbx, rcx and rbx, 15 ; Remaining bytes after the full blocks
compare_loop: test rax, rax jz compare_remainder ; Jump to compare the remainder if no full blocks left
; Load 16 bytes from both strings movdqu xmm0, [rsi] ; Unaligned load from the search string movdqu xmm1, [rdi] ; Unaligned load from the memory string
; Compare byte-wise pcmpeqb xmm0, xmm1
; Move the comparison result to a general-purpose register pmovmskb eax, xmm0
; Check if all bytes were equal cmp eax, 0xFFFF jne mismatch_found ; If not all bytes match, jump to mismatch handling
; Prepare for the next iteration add rsi, 16 add rdi, 16 dec rax jmp compare_loop
compare_remainder: ; Code to compare the remaining bytes (if rbx > 0) ; Similar logic as above, but for the remainder less than 16 bytes ; This part is left as an exercise or can be implemented using scalar comparisons
mismatch_found: ; Handle the case where a mismatch is detected ; For simplicity, we'll just exit the program jmp exit_program
exit_program: ; Exit the program (Placeholder, actual implementation depends on the environment) ; For Linux, you could use: ; mov eax, 60 ; syscall: exit ; xor edi, edi ; status: 0 ; syscall
-----------+----------- This example outlines the core logic for comparing two strings using SSE2 instructions while considering the lengths of both the search string and the memory string. If the search string is longer than the memory string, the program exits early. For full functionality, additional code would be needed to handle the comparison of the remaining bytes when the string length is not a multiple of 16. Also, keep in mind that this example assumes a Linux environment for the exit routine, and actual implementation details might vary depending on your specific requirements and operating win system. -----+------ ; Assuming rcx holds the address of the first string (str1), ; rdx holds the address of the second string (str2), ; and r8 holds the length of the strings to be compared.
section .text global StringCompare StringCompare: ; Initialize r9 with the number of 16-byte blocks to process mov r9, r8 shr r9, 4
compare_loop: test r9, r9 jz compare_remainder ; Jump to remainder if no full blocks left
; Load 16 bytes from both strings movdqu xmm0, [rcx] ; Unaligned load from the first string movdqu xmm1, [rdx] ; Unaligned load from the second string
; Compare byte-wise pcmpeqb xmm0, xmm1
; Move the comparison result to a general-purpose register pmovmskb eax, xmm0
; Check if all bytes were equal cmp eax, 0xFFFF jne mismatch_found ; If not all bytes match, jump to mismatch handling
; Prepare for the next iteration add rcx, 16 add rdx, 16 dec r9 jmp compare_loop
compare_remainder: ; Code to compare the remaining bytes (if any), handling less than 16 bytes ; This part is left as an exercise or can be implemented using scalar comparisons
mismatch_found: ; Handle the case where a mismatch is detected ; For simplicity, we'll just return here. Actual implementation might signal ; the mismatch to the calling function through a register or a flag.
ret
For sure you need an array layout that is completely free of string descriptors.. Works on amd and Intel. ================= The provided assembly code is designed to compare two strings byte-by-byte using SSE2 instructions, an efficient method for parallel data processing on x86-64 architectures. It assumes the first string's address is in the rcx register, the second string's address in rdx, and the length of the strings to be compared in r8. The code works by dividing the comparison into 16-byte blocks, leveraging the SIMD (Single Instruction, Multiple Data) capabilities of SSE2 to compare these blocks simultaneously. Here's a brief rundown of the process: Initialization: The code starts by calculating the number of 16-byte blocks that need to be compared, storing this count in r9. Comparison Loop: It then enters a loop (compare_loop) where it loads 16 bytes from each string into SSE2 registers (xmm0 and xmm1) and compares them using the pcmpeqb instruction. This instruction compares each corresponding byte in two 128-bit registers and sets each byte in the destination to all 1s if they are equal, or all 0s if not. Result Checking: After each block comparison, the pmovmskb instruction is used to move the comparison result into a general-purpose register (eax), effectively converting the result into a bitmask where each bit represents the comparison result of each byte. If all bytes match, this value will be 0xFFFF, indicating a perfect match for that block. Mismatch Handling: If any byte within a block doesn't match (detected by cmp eax, 0xFFFF and jne mismatch_found), the code jumps to the mismatch_found section, where it would typically signal a mismatch. In this simplified example, it just returns, but in a real-world application, you might handle mismatches differently. * Remainder Handling: After all full 16-byte blocks are processed, any remaining bytes (less than 16) would need to be compared separately. This part of the code is noted but not implemented in the example, as handling would depend on specific requirements and could involve scalar comparison instructions.
|
|
|
Post by Roger Cabo on Mar 12, 2024 13:44:15 GMT 1
I thought string contents are packed sequentially +1 in an string array. adr: 008(4Bbytes): Len 100 (4Bbytes) : ary$(0) = String$(100, 65) <--- 008 adr: 116(4Bbytes): Len 100 (4Bbytes) : ary$(1) = String$(100, 65) <--- 116
--------------------------------------------------------------
I know there is also a string descriptor and the length in 4Bytes stored prior the string data. But it seems each element is stored randomly anywhere? If yes, I know why I never read any arrays in the memory directly. // restart me several times and watch the offsets
Dim ary$(1) Dim i%
For i% = 0 To UBound(ary$()) ary$(i%) = String$(100, 65) Next i%
MsgBox ary$(1) & " <> " & ary$(0) & " --- what??? -->> V:ary$(1) - V:ary$(0) = " & V:ary$(0) - V:ary$(1)
|
|
|
Post by dragonjim on Mar 12, 2024 15:50:45 GMT 1
I thought string contents are packed sequentially +1 in an string array. adr: 008(4Bbytes): Len 100 (4Bbytes) : ary$(0) = String$(100, 65) <--- 008 adr: 116(4Bbytes): Len 100 (4Bbytes) : ary$(1) = String$(100, 65) <--- 116 --------------------------------------------------------------
I know there is also a string descriptor and the length in 4Bytes stored prior the string data. But it seems each element is stored randomly anywhere? If yes, I know why I never read any arrays in the memory directly. You are correct. Unlike almost every other type of array (including fixed strings) where the data is stored as a block of sequential data (reducing the need for an element address table), variable length string arrays have an element address table (which is a sequential data block) but each element (except null strings which are stored as 0 in the element address block) has memory allocated to them from the program heap depending on their length and when they were created. What is more, the assigned memory location can change every time the element's contents are changed. The only other array types that I can think of that work in a similar fashion are Variants and Objects, although in those cases, the 16-byte variable is stored in a sequential data block (so there is no need for an element address block) BUT for any strings, objects or other large, variable-length data types, the 16-byte element serves purely as a address pointer to the data rather than a data container. Therefore, to enact your machine code example, you would need to access the element address table and use those values as pointers to the string data.
|
|
|
Post by Roger Cabo on Mar 12, 2024 15:53:48 GMT 1
Here is another example that use Instr(). It was not possible to speed-beat this source code with any custom written assembler.. excluding sse2. Compare this to the other ones, plz. The string "Hello" is found at the position 100000 everytime
' Declare Dim i% Dim _s$ = "Hello" Dim slen% = Len(_s$)
Dim a$(100000) Dim peak% = UBound(a$()) Dim t# = Timer
' Fill For i% = 0 To peak% a$(i%) = String$(Random(50), 65 + Random(32)) Next i% a$(peak%) = _s$
' Compare Dim t# = Timer For i% = 0 To peak% If Len(a$(i%)) = slen% If InStr(a$(i%), _s$) t# = Timer - t# EndIf EndIf Next i%
MsgBox t# & " Position: " & peak%
|
|
|
Post by Roger Cabo on Mar 12, 2024 16:11:33 GMT 1
You are correct. Unlike almost every other type of array (including fixed strings) where the data is stored as a block of sequential data (reducing the need for an element address table), variable length string arrays have an element address table (which is a sequential data block) but each element (except null strings which are stored as 0 in the element address block) has memory allocated to them from the program heap depending on their length and when they were created. What is more, the assigned memory location can change every time the element's contents are changed. The only other array types that I can think of that work in a similar fashion are Variants and Objects, although in those cases, the 16-byte variable is stored in a sequential data block (so there is no need for an element address block) BUT for any strings, objects or other large, variable-length data types, the 16-byte element serves purely as a address pointer to the data rather than a data container. Therefore, to enact your machine code example, you would need to access the element address table and use those values as pointers to the string data. I think Frank Ostrowsky thought about to prevent memory moving, while using data is stored as a block of sequential data. a$(1 to 1000000) = String$(Random(50000), "+") Then a$(0) = "ouch!" Now you have to update all 1000000 descriptors and MOVE the complete block for 5 bytes. Not sure how c#/++ does, but I'm sure they use sse2 and other freaky stuff to get this speed for string arrays. Or?
|
|
|
Post by dragonjim on Mar 12, 2024 16:28:33 GMT 1
I agree, it is not impossible to store the string data as block: you could simply concatenate the strings into one long memory block which could be increased or decreased as needed, although you will still need the element table to reference where each string starts. But, as you say, you would then need to update all descriptors beyond the string which has changed which would slow things down, especially back in the days of the Atari ST, Amiga or the old PCs running DOS and early Windows.
One other issue in earlier machines was that, if you had a large array of long strings, you may not actually have a single memory block big enough to store it, whereas slotting in the separate strings where there was space was far more practical and potentially allowed for larger arrays.
As for C++ and the rest, they rely upon the system they are running on and, sometimes, the compiler they are using, to determine how and where string arrays (or char arrays) are stored, as far as I understand. Certainly - and again, to the best of my knowledge - byte level hacking of strings is strongly discouraged as it can give some very odd results.
|
|
|
Post by dragonjim on Mar 12, 2024 16:37:28 GMT 1
Here is another example that use Instr(). It was not possible to speed-beat this source code with any custom written assembler.. excluding sse2. Compare this to the other ones, plz. The string "Hello" is found at the position 100000 everytime ' Declare Dim i% Dim _s$ = "Hello" Dim slen% = Len(_s$)
Dim a$(100000) Dim peak% = UBound(a$()) Dim t# = Timer
' Fill For i% = 0 To peak% a$(i%) = String$(Random(50), 65 + Random(32)) Next i% a$(peak%) = _s$
' Compare Dim t# = Timer For i% = 0 To peak% If Len(a$(i%)) = slen% If InStr(a$(i%), _s$) t# = Timer - t# EndIf EndIf Next i%
MsgBox t# & " Position: " & peak%
Good tip - checking the length before comparing the strings gives a small increase in speed (the bottom of these timings): TRACE:(13):Timer - t1# = 9.4000333774602e-06 TRACE:(14):p% = 15396 Hello TRACE:(15):Timer - t1# = 9.00002521575516e-06 TRACE:(16):p% = 15396 Hello Simply change While a$(pos%) <> s$ to While Len(a$(pos%)) <> Len(s$) || a$(pos%) <> s$
|
|
|
Post by (X) on Mar 12, 2024 20:18:17 GMT 1
I am getting marginally faster results with F_FindStr() Function F_FindStr(ByRef a$(), s$, Optional options%) As Long 'Naked Print "Option:"; options% Dim U% = UBound(a$()) Dim L% = LBound(a$()) Dim i As Register Int If (options% == 0) // Sorted & Case Sensitive Do If (s$ < a$(i)) U = i i = L Add ((U Sub L) Div 2) ElseIf (s$ > a$(i)) L = i i = L Add ((U Sub L) Div 2) Else // Match Return i EndIf Loop While (L < U) Return -1 Else Local sort As Hash Int32 Local n% For n = L To U If ! sort[? a$(n)] Then Hash Add sort[a$(n)], n Next n If sort[? s$] Then Return sort[s$] Return -1 EndIf EndFunc Attachments:Demo String Search.G32 (8.16 KB)
|
|
|
Post by (X) on Mar 13, 2024 16:31:22 GMT 1
There is some variance in results due to the random placement of the search string in the string array, but, overall this solution seems up to twice as fast. I've only tried to improve option%==0: Sorted and Case_Sensitive
Function F_FindStr(ByRef a$(), s$, Optional options%) As Long 'Print "Option:"; options% Local U% = UBound(a$()) //: Print "U:"; U Local L% = LBound(a$()) //: Print "L:"; L Local D% = (U Sub L Add 1) Div 2 //: Print "D:"; D Local i% = D //: Print "i:"; i If options% == 0 // Sorted & Case Sensitive Do If (s$ < a$(i)) D = D / 2 // "D Div 2" and "D \ 2" and "D Shr 1" and "Floor(D/2)" sometimes leads to not found... Sub i, D ElseIf (s$ > a$(i)) D = D / 2 Add i, D Else //If (s$ == a$(i)) Return i EndIf Loop While D <> 0 Return -1 Else Local sort As Hash Long For i = L% To U% 'If ! sort[? a$(i%)] Then Hash Add sort[a$(i%)], i% If !(sort[? a$(i)]) sort[a$(i)] = i // A tiny bit faster... Next i If sort[? s$] Then Return sort[s$] Return -1 EndIf EndFunc Debug window...
And for a million and 1 elements...
And for 10 million... there is a significant improvement.
|
|