Find String in an Array

dragonjim
Senior Member

Posts: 489

Find String in an Array Mar 7, 2024 13:07:03 GMT 1

Quote

Post by dragonjim on Mar 7, 2024 13:07:03 GMT 1

Not sure if this has been done already, but here is a short function to find a string in a string array.

If anyone can improve it, especially for unsorted or case insensitive searches, then please do and post below.

Dim s$(10000), n%, p%, t1#
For n% = 0 To 10000 : s$(n%) = RandomString : Next n%
p% = Int(Rnd * 10001)
s$(p%) = "Hello"
Trace p%
Trace s$(p%)
Debug "Find Unsorted"
t1# = Timer
p% = FindString(s$(), "Hello", FIND_UNSORTED)
Trace Timer - t1#
Trace p%
If p% = -1 : Debug "Not found"
Else : Trace s$(p%)
EndIf
Debug "Find Unsorted and Case Insensitive"
t1# = Timer
p% = FindString(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE)
Trace Timer - t1#
Trace p%
If p% = -1 : Debug "Not found"
Else : Trace s$(p%)
EndIf
QSort s$()
Debug "Find Sorted"
t1# = Timer
p% = FindString(s$(), "Hello")
Trace Timer - t1#
Trace p%
If p% = -1 : Debug "Not found"
Else : Trace s$(p%)
EndIf
Debug "Find Sorted and Case Insensitive"
t1# = Timer
p% = FindString(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE)
Trace Timer - t1#
Trace p%
If p% = -1 : Debug "Not found"
Else : Trace s$(p%)
EndIf
Debug.Show
Function  FindString(ByRef a$(), s$, Optional options%)
  Global Const FIND_SORTED = 0, FIND_UNSORTED = 1, FIND_CASESENSITIVE = 0, FIND_CASEINSENSITIVE = 2
  Local hpos% = UBound(a$()), lpos% = LBound(a$()), n%, ret%
  If options% = 0
    Return FindStringSub(a$(), s$)
  Else
    Local sort$(lpos% To hpos%), sort%(lpos% To hpos%)
    If Flag(options%, FIND_CASEINSENSITIVE)
      s$ = Upper(s$)
      For n% = lpos% To hpos% : sort$(n%) = Upper(a$(n%)) : sort%(n%) = n% : Next n%
    Else
      For n% = lpos% To hpos% : sort$(n%) = a$(n%) : sort%(n%) = n% : Next n%
    EndIf
    QSort sort$(+), (hpos% - lpos%) + 1, sort%()
    ret% = FindStringSub(sort$(), s$)
    Return Iif(ret% = -1, -1, sort%(ret%))
  EndIf
EndFunction
Function  FindStringSub(ByRef a$(), s$)
  Local hpos% = UBound(a$()), lpos% = LBound(a$()), pos%, rpos% = (hpos% - lpos% + 1) / 2
  pos% = lpos% + rpos%
  While a$(pos%) <> s$
    If a$(pos%) > s$
      If pos% = LBound(a$()) || a$(pos% - 1) < s$ Then Return -1
      rpos% = Int(rpos% / 2) : pos% = lpos% + rpos%
    Else
      If pos% = UBound(a$()) || a$(pos% + 1) > s$ Then Return -1
      lpos% = pos% + 1 : rpos% = Int(rpos% / 2) : pos% = MinI(lpos% + rpos%, hpos%)
    EndIf
  Wend
  Return pos%
EndFunction
Function  Flag(parent%, flag%)                                                                  // v2
  If parent% < 0 Then parent% = - parent% : flag% = -flag%
  Return (And(parent%, flag%) = flag%)
EndFunction
Function  RandomString
  Local ct% = Int(Rnd * 10) + Int(Rnd * 4) + 1,  n%, t$ = Space(ct%)
  For n% = 1 To ct% : Mid(t$, n%, 1) = Chr(Int(Rnd * 57) + 64) : Next n%
  Return t$
EndFunction

(X) God I am not really God. I did not pick this "label". It just means I posted > 500 times. Posts: 1,363	Find String in an Array Mar 7, 2024 13:43:51 GMT 1 Quote Select Post Deselect Post Link to Post Member Give Gift Back to Top Post by (X) on Mar 7, 2024 13:43:51 GMT 1 I wonder what difference a Compute Shader might make.
	I code for FUN...

dragonjim
Senior Member

Posts: 489

Find String in an Array Mar 7, 2024 15:03:48 GMT 1

Quote

Post by dragonjim on Mar 7, 2024 15:03:48 GMT 1

Here is an update using a local Hash array, which cuts the speed on all processes to between a half and a third of the time.

Dim s$(100000), n%, p%, t1#
For n% = 0 To 100000 : s$(n%) = RandomString : Next n%
p% = Int(Rnd * 100001)
s$(p%) = "Hello"
Trace p%
Trace s$(p%)
Debug "Find Unsorted"
t1# = Timer : p% = FindStringOld(s$(), "Hello", FIND_UNSORTED) : Trace Timer - t1#
Trace p% : Debug Iif(p%, -1, "Not found"),s$(p%)
t1# = Timer : p% = FindString(s$(), "Hello", FIND_UNSORTED) : Trace Timer - t1#
Trace p% : Debug Iif(p% = -1, "Not found", s$(p%))
Debug "Find Unsorted and Case Insensitive"
t1# = Timer : p% = FindStringOld(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE) : Trace Timer - t1#
Trace p% : Debug Iif(p% = -1, "Not found", s$(p%))
t1# = Timer : p% = FindString(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE) : Trace Timer - t1#
Trace p% : Debug Iif(p% = -1, "Not found", s$(p%))
QSort s$()
Debug "Find Sorted"
t1# = Timer : p% = FindStringOld(s$(), "Hello") : Trace Timer - t1#
Trace p% : Debug Iif(p% = -1, "Not found", s$(p%))
t1# = Timer : p% = FindString(s$(), "Hello") : Trace Timer - t1#
Trace p% : Debug Iif(p% = -1, "Not found", s$(p%))
Debug "Find Sorted and Case Insensitive"
t1# = Timer : p% = FindStringOld(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE) : Trace Timer - t1#
Trace p% : Debug Iif(p% = -1, "Not found", s$(p%))
t1# = Timer : p% = FindString(s$(), "HelLo", FIND_UNSORTED | FIND_CASEINSENSITIVE) : Trace Timer - t1#
Trace p% : Debug Iif(p% = -1, "Not found", s$(p%))
Function  FindString(ByRef a$(), s$, Optional options%)
  Global Const FIND_SORTED = 0, FIND_UNSORTED = 1, FIND_CASESENSITIVE = 0, FIND_CASEINSENSITIVE = 2
  Local hpos% = UBound(a$()), lpos% = LBound(a$()), n%, pos%, ret%, rpos% = (hpos% - lpos% + 1) / 2
  If options% = 0
    pos% = lpos% + rpos%
    While a$(pos%) <> s$
      If a$(pos%) > s$
        If pos% = LBound(a$()) || a$(pos% - 1) < s$ Then Return -1
        rpos% = Int(rpos% / 2) : pos% = lpos% + rpos%
      Else
        If pos% = UBound(a$()) || a$(pos% + 1) > s$ Then Return -1
        lpos% = pos% + 1 : rpos% = Int(rpos% / 2) : pos% = MinI(lpos% + rpos%, hpos%)
      EndIf
    Wend
    Return pos%
  Else
    Local sort As Hash Int32
    For n% = lpos% To hpos%
      If ! sort[? a$(n%)] Then Hash Add sort[a$(n%)], n%
    Next n%
    If sort[? s$] Then Return sort[s$]
    Return -1
  EndIf
Function  FindStringOld(ByRef a$(), s$, Optional options%)
  Global Const FIND_SORTED = 0, FIND_UNSORTED = 1, FIND_CASESENSITIVE = 0, FIND_CASEINSENSITIVE = 2
  Local hpos% = UBound(a$()), lpos% = LBound(a$()), n%, ret%
  If options% = 0
    Return FindStringSub(a$(), s$)
  Else
    Local sort$(lpos% To hpos%), sort%(lpos% To hpos%)
    If Flag(options%, FIND_CASEINSENSITIVE)
      s$ = Upper(s$)
      For n% = lpos% To hpos% : sort$(n%) = Upper(a$(n%)) : sort%(n%) = n% : Next n%
    Else
      For n% = lpos% To hpos% : sort$(n%) = a$(n%) : sort%(n%) = n% : Next n%
    EndIf
    QSort sort$(+), (hpos% - lpos%) + 1, sort%()
    ret% = FindStringSub(sort$(), s$)
    Return Iif(ret% = -1, -1, sort%(ret%))
  EndIf
EndFunction
Function  FindStringSub(ByRef a$(), s$)
  Local hpos% = UBound(a$()), lpos% = LBound(a$()), pos%, rpos% = (hpos% - lpos% + 1) / 2
  pos% = lpos% + rpos%
  While a$(pos%) <> s$
    If a$(pos%) > s$
      If pos% = LBound(a$()) || a$(pos% - 1) < s$ Then Return -1
      rpos% = Int(rpos% / 2) : pos% = lpos% + rpos%
    Else
      If pos% = UBound(a$()) || a$(pos% + 1) > s$ Then Return -1
      lpos% = pos% + 1 : rpos% = Int(rpos% / 2) : pos% = MinI(lpos% + rpos%, hpos%)
    EndIf
  Wend
  Return pos%
EndFunction
Function  Flag(parent%, flag%)                                                                  // v2
  If parent% < 0 Then parent% = - parent% : flag% = -flag%
  Return (And(parent%, flag%) = flag%)
EndFunction
Function  RandomString
  Local ct% = Int(Rnd * 10) + Int(Rnd * 4) + 1,  n%, t$ = Space(ct%)
  For n% = 1 To ct% : Mid(t$, n%, 1) = Chr(Int(Rnd * 57) + 64) : Next n%
  Return t$
EndFunction

dragonjim
Senior Member

Posts: 489

Find String in an Array Mar 7, 2024 15:06:43 GMT 1

Quote

Post by dragonjim on Mar 7, 2024 15:06:43 GMT 1

Mar 7, 2024 13:43:51 GMT 1 (X) said:

I wonder what difference a Compute Shader might make.

A compute shader may be useful in converting strings to Upper case; sadly, as the search is linear, I doubt it would have much affect on the actual search time.

Roger Cabo
God

Posts: 711

Find String in an Array Mar 11, 2024 19:06:47 GMT 1 via mobile (X) likes this

Quote

Post by Roger Cabo on Mar 11, 2024 19:06:47 GMT 1

Hi everyone,

transferring a large string array to GPU memory and conducting the search with a compute shader could significantly speed up 100 times certain search algorithms that are well-suited for parallelization.

With w100,000 entries and a total size of 500 MB, it's important to consider the overhead of data transfer, but if the data is transferred once followed by intensive searching, this approach could be beneficial.

The key is optimizing the search algorithm for parallel processing on the GPU.

But this is a lot work because shader code can't compare strings, but only bytes.

Last Edit: Mar 11, 2024 19:19:12 GMT 1 by Roger Cabo

dragonjim Senior Member Posts: 489	Find String in an Array Mar 11, 2024 23:37:54 GMT 1 via mobile Quote Select Post Deselect Post Link to Post Member Give Gift Back to Top Post by dragonjim on Mar 11, 2024 23:37:54 GMT 1 Multithreading would be easier to enact but only if you have enough cores to make it worth it.

Roger Cabo
God

Posts: 711

Find String in an Array Mar 12, 2024 10:08:45 GMT 1 via mobile

Quote

Post by Roger Cabo on Mar 12, 2024 10:08:45 GMT 1

Mar 11, 2024 23:37:54 GMT 1 dragonjim said:

Multithreading would be easier to enact but only if you have enough cores to make it worth it.

Small code to compare in sse2:

section .text
global _start

_start:
; Assumptions:
; rsi = address of the first string (search string)
; rdi = address of the second string (memory string)
; rcx = length of the first string (search string)
; rdx = length of the second string (memory string)

; Compare lengths, exit if search string is longer than memory string
cmp rcx, rdx
ja exit_program ; if search string length > memory string length, exit

; Calculate the number of 16-byte blocks and the remaining bytes for comparison
mov rax, rcx
shr rax, 4 ; Number of full 16-byte blocks
mov rbx, rcx
and rbx, 15 ; Remaining bytes after the full blocks

compare_loop:
test rax, rax
jz compare_remainder ; Jump to compare the remainder if no full blocks left

; Load 16 bytes from both strings
movdqu xmm0, [rsi] ; Unaligned load from the search string
movdqu xmm1, [rdi] ; Unaligned load from the memory string

; Compare byte-wise
pcmpeqb xmm0, xmm1

; Move the comparison result to a general-purpose register
pmovmskb eax, xmm0

; Check if all bytes were equal
cmp eax, 0xFFFF
jne mismatch_found ; If not all bytes match, jump to mismatch handling

; Prepare for the next iteration
add rsi, 16
add rdi, 16
dec rax
jmp compare_loop

compare_remainder:
; Code to compare the remaining bytes (if rbx > 0)
; Similar logic as above, but for the remainder less than 16 bytes
; This part is left as an exercise or can be implemented using scalar comparisons

mismatch_found:
; Handle the case where a mismatch is detected
; For simplicity, we'll just exit the program
jmp exit_program

exit_program:
; Exit the program (Placeholder, actual implementation depends on the environment)
; For Linux, you could use:
; mov eax, 60 ; syscall: exit
; xor edi, edi ; status: 0
; syscall

-----------+-----------

This example outlines the core logic for comparing two strings using SSE2 instructions while considering the lengths of both the search string and the memory string.

If the search string is longer than the memory string, the program exits early. For full functionality, additional code would be needed to handle the comparison of the remaining bytes when the string length is not a multiple of 16.

Also, keep in mind that this example assumes a Linux environment for the exit routine, and actual implementation details might vary depending on your specific requirements and operating win system.

-----+------

; Assuming rcx holds the address of the first string (str1),
; rdx holds the address of the second string (str2),
; and r8 holds the length of the strings to be compared.

section .text
global StringCompare
StringCompare:
; Initialize r9 with the number of 16-byte blocks to process
mov r9, r8
shr r9, 4

compare_loop:
test r9, r9
jz compare_remainder ; Jump to remainder if no full blocks left

; Load 16 bytes from both strings
movdqu xmm0, [rcx] ; Unaligned load from the first string
movdqu xmm1, [rdx] ; Unaligned load from the second string

; Compare byte-wise
pcmpeqb xmm0, xmm1

; Move the comparison result to a general-purpose register
pmovmskb eax, xmm0

; Check if all bytes were equal
cmp eax, 0xFFFF
jne mismatch_found ; If not all bytes match, jump to mismatch handling

; Prepare for the next iteration
add rcx, 16
add rdx, 16
dec r9
jmp compare_loop

compare_remainder:
; Code to compare the remaining bytes (if any), handling less than 16 bytes
; This part is left as an exercise or can be implemented using scalar comparisons

mismatch_found:
; Handle the case where a mismatch is detected
; For simplicity, we'll just return here. Actual implementation might signal
; the mismatch to the calling function through a register or a flag.

ret

For sure you need an array layout that is completely free of string descriptors..

Works on amd and Intel.

=================

The provided assembly code is designed to compare two strings byte-by-byte using SSE2 instructions, an efficient method for parallel data processing on x86-64 architectures. It assumes the first string's address is in the rcx register, the second string's address in rdx, and the length of the strings to be compared in r8. The code works by dividing the comparison into 16-byte blocks, leveraging the SIMD (Single Instruction, Multiple Data) capabilities of SSE2 to compare these blocks simultaneously.

Here's a brief rundown of the process:

Initialization: The code starts by calculating the number of 16-byte blocks that need to be compared, storing this count in r9.

Comparison Loop: It then enters a loop (compare_loop) where it loads 16 bytes from each string into SSE2 registers (xmm0 and xmm1) and compares them using the pcmpeqb instruction. This instruction compares each corresponding byte in two 128-bit registers and sets each byte in the destination to all 1s if they are equal, or all 0s if not.

Result Checking: After each block comparison, the pmovmskb instruction is used to move the comparison result into a general-purpose register (eax), effectively converting the result into a bitmask where each bit represents the comparison result of each byte. If all bytes match, this value will be 0xFFFF, indicating a perfect match for that block.

Mismatch Handling: If any byte within a block doesn't match (detected by cmp eax, 0xFFFF and jne mismatch_found), the code jumps to the mismatch_found section, where it would typically signal a mismatch. In this simplified example, it just returns, but in a real-world application, you might handle mismatches differently.

* Remainder Handling: After all full 16-byte blocks are processed, any remaining bytes (less than 16) would need to be compared separately. This part of the code is noted but not implemented in the example, as handling would depend on specific requirements and could involve scalar comparison instructions.

Last Edit: Mar 12, 2024 13:03:13 GMT 1 by Roger Cabo

Roger Cabo
God

Posts: 711

Find String in an Array Mar 12, 2024 13:44:15 GMT 1

Quote

Post by Roger Cabo on Mar 12, 2024 13:44:15 GMT 1

I thought string contents are packed sequentially +1 in an string array.

adr: 008(4Bbytes): Len 100 (4Bbytes) : ary$(0) = String$(100, 65) <--- 008

adr: 116(4Bbytes): Len 100 (4Bbytes) : ary$(1) = String$(100, 65) <--- 116

--------------------------------------------------------------

I know there is also a string descriptor and the length in 4Bytes stored prior the string data.
But it seems each element is stored randomly anywhere?
If yes, I know why I never read any arrays in the memory directly.

// restart me several times and watch the offsets

Dim ary$(1)
Dim i%

For i% = 0 To UBound(ary$())
  ary$(i%) = String$(100, 65)
Next i%

MsgBox ary$(1) & " <> " & ary$(0) & " --- what??? -->> V:ary$(1) - V:ary$(0) =  " &  V:ary$(0) - V:ary$(1)

dragonjim
Senior Member

Posts: 489

Find String in an Array Mar 12, 2024 15:50:45 GMT 1

Quote

Post by dragonjim on Mar 12, 2024 15:50:45 GMT 1

Mar 12, 2024 13:44:15 GMT 1 Roger Cabo said:

I thought string contents are packed sequentially +1 in an string array.

adr: 008(4Bbytes): Len 100 (4Bbytes) : ary$(0) = String$(100, 65) <--- 008

adr: 116(4Bbytes): Len 100 (4Bbytes) : ary$(1) = String$(100, 65) <--- 116

--------------------------------------------------------------

I know there is also a string descriptor and the length in 4Bytes stored prior the string data.
But it seems each element is stored randomly anywhere?
If yes, I know why I never read any arrays in the memory directly.

You are correct. Unlike almost every other type of array (including fixed strings) where the data is stored as a block of sequential data (reducing the need for an element address table), variable length string arrays have an element address table (which is a sequential data block) but each element (except null strings which are stored as 0 in the element address block) has memory allocated to them from the program heap depending on their length and when they were created. What is more, the assigned memory location can change every time the element's contents are changed.

The only other array types that I can think of that work in a similar fashion are Variants and Objects, although in those cases, the 16-byte variable is stored in a sequential data block (so there is no need for an element address block) BUT for any strings, objects or other large, variable-length data types, the 16-byte element serves purely as a address pointer to the data rather than a data container.

Therefore, to enact your machine code example, you would need to access the element address table and use those values as pointers to the string data.

Roger Cabo
God

Posts: 711

Find String in an Array Mar 12, 2024 15:53:48 GMT 1

Quote

Post by Roger Cabo on Mar 12, 2024 15:53:48 GMT 1

Here is another example that use Instr(). It was not possible to speed-beat this source code with any custom written assembler.. excluding sse2.
Compare this to the other ones, plz. The string "Hello" is found at the position 100000 everytime

' Declare
Dim i%
Dim _s$ = "Hello"
Dim slen% = Len(_s$)

Dim a$(100000)
Dim peak% = UBound(a$())
Dim t# = Timer

' Fill
For i% = 0 To peak%
  a$(i%) = String$(Random(50), 65 + Random(32))
Next i%
a$(peak%) = _s$

' Compare
Dim t# = Timer
For i% = 0 To peak%
  If Len(a$(i%)) = slen%
    If InStr(a$(i%), _s$)
      t# = Timer - t#
    EndIf
  EndIf
Next i%

MsgBox t# &  "   Position: " & peak%

Roger Cabo
God

Posts: 711

Find String in an Array Mar 12, 2024 16:11:33 GMT 1

Quote

Post by Roger Cabo on Mar 12, 2024 16:11:33 GMT 1

Mar 12, 2024 15:50:45 GMT 1 dragonjim said:

You are correct. Unlike almost every other type of array (including fixed strings) where the data is stored as a block of sequential data (reducing the need for an element address table), variable length string arrays have an element address table (which is a sequential data block) but each element (except null strings which are stored as 0 in the element address block) has memory allocated to them from the program heap depending on their length and when they were created. What is more, the assigned memory location can change every time the element's contents are changed.

The only other array types that I can think of that work in a similar fashion are Variants and Objects, although in those cases, the 16-byte variable is stored in a sequential data block (so there is no need for an element address block) BUT for any strings, objects or other large, variable-length data types, the 16-byte element serves purely as a address pointer to the data rather than a data container.

Therefore, to enact your machine code example, you would need to access the element address table and use those values as pointers to the string data.

I think Frank Ostrowsky thought about to prevent memory moving, while using data is stored as a block of sequential data.

a$(1 to 1000000) = String$(Random(50000), "+")

Then a$(0) = "ouch!"
Now you have to update all 1000000 descriptors and MOVE the complete block for 5 bytes.

Not sure how c#/++ does, but I'm sure they use sse2 and other freaky stuff to get this speed for string arrays.
Or?

dragonjim
Senior Member

Posts: 489

Find String in an Array Mar 12, 2024 16:28:33 GMT 1

Quote

Post by dragonjim on Mar 12, 2024 16:28:33 GMT 1

I agree, it is not impossible to store the string data as block: you could simply concatenate the strings into one long memory block which could be increased or decreased as needed, although you will still need the element table to reference where each string starts. But, as you say, you would then need to update all descriptors beyond the string which has changed which would slow things down, especially back in the days of the Atari ST, Amiga or the old PCs running DOS and early Windows.

One other issue in earlier machines was that, if you had a large array of long strings, you may not actually have a single memory block big enough to store it, whereas slotting in the separate strings where there was space was far more practical and potentially allowed for larger arrays.

As for C++ and the rest, they rely upon the system they are running on and, sometimes, the compiler they are using, to determine how and where string arrays (or char arrays) are stored, as far as I understand. Certainly - and again, to the best of my knowledge - byte level hacking of strings is strongly discouraged as it can give some very odd results.

dragonjim
Senior Member

Posts: 489

Find String in an Array Mar 12, 2024 16:37:28 GMT 1

Quote

Post by dragonjim on Mar 12, 2024 16:37:28 GMT 1

Mar 12, 2024 15:53:48 GMT 1 Roger Cabo said:

Here is another example that use Instr(). It was not possible to speed-beat this source code with any custom written assembler.. excluding sse2.
Compare this to the other ones, plz. The string "Hello" is found at the position 100000 everytime

' Declare
Dim i%
Dim _s$ = "Hello"
Dim slen% = Len(_s$)

Dim a$(100000)
Dim peak% = UBound(a$())
Dim t# = Timer

' Fill
For i% = 0 To peak%
  a$(i%) = String$(Random(50), 65 + Random(32))
Next i%
a$(peak%) = _s$

' Compare
Dim t# = Timer
For i% = 0 To peak%
  If Len(a$(i%)) = slen%
    If InStr(a$(i%), _s$)
      t# = Timer - t#
    EndIf
  EndIf
Next i%

MsgBox t# &  "   Position: " & peak%

Good tip - checking the length before comparing the strings gives a small increase in speed (the bottom of these timings):

TRACE:(13):Timer - t1# =  9.4000333774602e-06
TRACE:(14):p% =  15396
Hello
TRACE:(15):Timer - t1# =  9.00002521575516e-06
TRACE:(16):p% =  15396
Hello

Simply change

While a$(pos%) <> s$

to

While Len(a$(pos%)) <> Len(s$) || a$(pos%) <> s$

(X)
God

I am not really God. I did not pick this "label". It just means I posted > 500 times.

Posts: 1,363

Find String in an Array Mar 12, 2024 20:18:17 GMT 1 dragonjim likes this

Quote

Post by (X) on Mar 12, 2024 20:18:17 GMT 1

I am getting marginally faster results with F_FindStr()

Function F_FindStr(ByRef a$(), s$, Optional options%) As Long 'Naked
  Print "Option:"; options%
  
  Dim U% = UBound(a$())
  Dim L% = LBound(a$())
  Dim i As Register Int
  
  If (options% == 0)  // Sorted & Case Sensitive
    Do
      If (s$ < a$(i))
        U = i
        i = L Add ((U Sub L) Div 2)
      ElseIf (s$ > a$(i))
        L = i
        i = L Add ((U Sub L) Div 2)
      Else  // Match
        Return i
      EndIf
    Loop While (L < U)
    Return -1
  Else
    Local sort As Hash Int32
    Local n%
    For n = L To U
      If ! sort[? a$(n)] Then Hash Add sort[a$(n)], n
    Next n
    If sort[? s$] Then Return sort[s$]
    Return -1
  EndIf
EndFunc

Attachments:

Demo String Search.G32 (8.16 KB)

Last Edit: Mar 12, 2024 20:23:58 GMT 1 by (X)

I code for FUN...

(X)
God

I am not really God. I did not pick this "label". It just means I posted > 500 times.

Posts: 1,363

Find String in an Array Mar 13, 2024 16:31:22 GMT 1 Roger Cabo likes this

Quote

Post by (X) on Mar 13, 2024 16:31:22 GMT 1

There is some variance in results due to the random placement of the search string in the string array, but, overall this solution seems up to twice as fast. I've only tried to improve option%==0: Sorted and Case_Sensitive

Function F_FindStr(ByRef a$(), s$, Optional options%) As Long
  'Print "Option:"; options%
  Local U% = UBound(a$())               //: Print "U:"; U
  Local L% = LBound(a$())               //: Print "L:"; L
  Local D% = (U Sub L Add 1)  Div 2     //: Print "D:"; D
  Local i% = D                          //: Print "i:"; i
  If options% == 0   // Sorted & Case Sensitive
    Do
      If (s$ < a$(i))
        D = D / 2   // "D Div 2" and "D \ 2" and "D Shr 1" and "Floor(D/2)" sometimes leads to not found...
        Sub i, D
      ElseIf (s$ > a$(i))
        D = D / 2
        Add i, D
      Else //If (s$ == a$(i))
        Return i
      EndIf
    Loop While D <> 0
    
    Return -1
    
  Else
    Local sort As Hash Long
    For i = L% To U%
      'If ! sort[? a$(i%)] Then Hash Add sort[a$(i%)], i%
      If !(sort[? a$(i)]) sort[a$(i)] = i  // A tiny bit faster...
    Next i
    If sort[? s$] Then Return sort[s$]
    Return -1
  EndIf
EndFunc

Debug window...

The string : "Hello" was randomly placed at index: 57934 out of 100001 possible indices from: 0 to: 100000.

Find Sorted & Case_Sensitive
0.0000250 15456 Hello
0.0000118 15456 Hello
0.0000086 15456 Hello

And for a million and 1 elements...

The string : "Hello" was randomly placed at index: 591716 out of 1000001 possible indices from: 0 to: 1000000.

Find Sorted & Case_Sensitive
0.0000232 152126 Hello
0.0000150 152126 Hello
0.0000087 152126 Hello

And for 10 million... there is a significant improvement.

The string : "Hello" was randomly placed at index: 4652507 out of 10000001 possible indices from: 0 to: 10000000.

Find Sorted & Case_Sensitive
0.0044636 1521205 Hello
0.0007453 1521205 Hello
0.0000544 1521205 Hello

Attachments:

Demo String Search 2.G32 (8.67 KB)

Last Edit: Mar 14, 2024 14:44:50 GMT 1 by (X)

I code for FUN...

Post by dragonjim on Mar 7, 2024 13:07:03 GMT 1

Post by (X) on Mar 7, 2024 13:43:51 GMT 1

Post by dragonjim on Mar 7, 2024 15:03:48 GMT 1

Post by dragonjim on Mar 7, 2024 15:06:43 GMT 1

Post by Roger Cabo on Mar 11, 2024 19:06:47 GMT 1

Post by dragonjim on Mar 11, 2024 23:37:54 GMT 1

Post by Roger Cabo on Mar 12, 2024 10:08:45 GMT 1

Post by Roger Cabo on Mar 12, 2024 13:44:15 GMT 1

Post by dragonjim on Mar 12, 2024 15:50:45 GMT 1

Post by Roger Cabo on Mar 12, 2024 15:53:48 GMT 1

Post by Roger Cabo on Mar 12, 2024 16:11:33 GMT 1

Post by dragonjim on Mar 12, 2024 16:28:33 GMT 1

Post by dragonjim on Mar 12, 2024 16:37:28 GMT 1

Post by (X) on Mar 12, 2024 20:18:17 GMT 1

Post by (X) on Mar 13, 2024 16:31:22 GMT 1