undefinedfix
Sign in

Two lists, how to judge whether the elements in a exist in the elements in B

Ganesh edited in Sat, 13 Aug 2022

Question: List_ a = [key1,key2,key3...keyn]list_ b = ['1234343key1','weewqsfsdfkey2',........'lkadsadsadsa']

list_ a. Can be understood as a set of keywords; list_ b. It can be understood as a collection of long strings. Each element is relatively long and may include a list_ The elements in a may not be included.

How to judge a list_ Is the element in a included in the list_ B. The most common method is the double-layer for loop

    for b in list_b:
        for a in list_a:
            if a in b:
                print(a)

The time complexity of this method is O (m * n), and large amount of data is GG. So it's a fast algorithm.

8 Replies
mortgagemanagement
commented on Sun, 14 Aug 2022

It seems that most of the answers do not solve the problem of time complexity. We ignore the complexity of string matching when we consider reducing the loop level. Hypothesis:

len(key) == 3
len(long_str1) == 10
len(long_str2) == 10

Which group is more time-consuming? one

key in long_str1
key in long_str2

two

key in (long_str1 + long_str2)

You must have the answer in your mind. The first group matches two strings for (10-3) * 2 rounds of comparison. The second group is (10 * 2) - 3 rounds, and long_ str1 + long_ STR2 produces a new substring, which is logically wrong.

I think there is a solution to the problem of building owners with lower time complexity, but space should be sacrificed. At worst, list_ Each long in B_ The str element has to be key in long n times_ STR judges that the larger the n is, the more time-consuming it is; conversely, the key should be the same as each long_ STR for key in long_ STR operation

Put all the long in advance_ All the substrings of STR are put into the hash table to traverse a list_ B and the key calculation is matched one by one to reduce the complexity to o (1) hash query operation. Let's prepare for Python to find all the substrings of the string set

After reading the above link, you know how to put the list_ B to a hash is not difficult (we use set)

set_a = {s[i:j+1] for s in list_b for i in range(len(s)) for j in range(i, len(s))}
for i in list_a:
    if i in set_a:
        print(i)

Because for each long_ STR, all of which have factorial substrings of their length_ A will be big. If the problem is fixed length, it's better.

FactorVIII
commented on Sun, 14 Aug 2022
list(set(a) & set(b))
wareyon
commented on Sun, 14 Aug 2022

We can consider using HashSet to store keyword set, and the time complexity can be reduced to o (n). Or if the keyword set can guarantee order, you can use dichotomy. It's a little better than direct traversal.

If you just judge whether it contains the same element, you can return it immediately as long as you detect the existence of the same element. You don't need to go through all the remaining elements

Mahdi
commented on Sun, 14 Aug 2022

If the elements are judged to be the same, set can be used naturally, but it is not applicable here. If you need to match keywords, simply merge the strings first (regardless of the case that each continuous string has a part to form keywords.)

res = [i for i in list_a if i in ''.join(list_b)]

Or use regular:

import re

res = re.findall('|'.join(list_a), ''.join(list_b))

If the string is too long or needs to be semantically accurate, you can start with the list_ For example, ['mayor '] does not match ['nanjing Yangtze River Bridge'] above, only from the perspective of application

IamTrying
commented on Sun, 14 Aug 2022

The python level operation is transferred to the built-in function contain (i.e. in keyword) compiled by C. the code is as follows.

>>> list_a = ['key1', 'key2', 'key3']
>>> list_b = ['1234343key1', 'weewqsfsdfkey2', 'lkadsadsadsakey3']
>>> long_str=', '.join(list_b)
>>> long_str
'1234343key1, weewqsfsdfkey2, lkadsadsadsakey3'
>>> for key in list_a:
...     if key in long_str:
...         print(key)
... 
key1
key2
key3

It should be noted that the above code may not be efficient in algorithm, but the actual execution efficiency should be higher.

butch7903
commented on Mon, 15 Aug 2022

For comparison of array elements, please refer to arraydiffby

comul
commented on Mon, 15 Aug 2022

You can add the list_ B to set; then traverse the list_ A call set.add If it is false, the element of a exists in B.

Set<String> set = new HashSet(list_b);
for(String str : list_a) {
    if (!set.add(str)) {
      // 说明存在,直接break
      break;
    }
}
lfgup
commented on Mon, 15 Aug 2022

You can add the list_ B = ['1234343key1 ','weewqsfsdfkey2',...'lkadsadsadsadsa '] all strings of digits are connected: "1234343key1, weewqsfsdfkey2,...." then lista uses n elements to traverse listb string from front to back at the same time, and matches by characters