How can I use Python to quickly check the validity of 500,000+ URLs and resolve their IP addresses and regions?

bmartindcs edited on Tue, 24 May 2022

The URLs are stored in a text (CSV) file. For each URL I need to check whether it is valid, resolve its IP address and the corresponding physical location, and then append the results to that URL's line.

Sample data

1,www.qq.com,腾讯
2,www.baidu.com,百度
...

Expected results

1,www.qq.com,腾讯,valid,61.129.7.47,Shanghai Telecom
2,www.baidu.com,百度,valid,14.215.177.39,Guangzhou Guangdong (Beijing Baidu Netcom Science and Technology Co. Ltd. telecom node)
...
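A minimal sequential sketch of the task using only the standard library. It assumes "valid" means "the domain resolves in DNS" (an HTTP reachability check would be a different test), and `lookup_region` is a hypothetical placeholder, since the question does not say which IP-geolocation source to use. The resolver and locator are parameters so they can be swapped out.

```python
import csv
import io
import socket


def lookup_region(ip):
    """Hypothetical placeholder: plug a real IP-geolocation source in here."""
    return "unknown"


def process_row(row, resolve=socket.gethostbyname, locate=lookup_region):
    """Extend one [id, domain, name] row with validity, IP and region."""
    domain = row[1].strip()
    try:
        ip = resolve(domain)
    except OSError:  # socket.gaierror is a subclass: the name did not resolve
        return row + ["invalid", "", ""]
    return row + ["valid", ip, locate(ip)]


def process_csv(src, dst, **kwargs):
    """Read rows from file-like `src`, write the extended rows to `dst`."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        if row:  # skip blank lines
            writer.writerow(process_row(row, **kwargs))


if __name__ == "__main__":
    out = io.StringIO()
    # an IPv4 literal "resolves" to itself without a DNS query
    process_csv(io.StringIO("1,127.0.0.1,loopback\n"), out)
    print(out.getvalue(), end="")  # 1,127.0.0.1,loopback,valid,127.0.0.1,unknown
```

With real files, open both with `newline=""` as the `csv` module documentation recommends (and an explicit encoding, assumed here to be UTF-8). This version handles one row at a time, which is exactly the slow part for 500,000+ rows.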

My first idea was to read the URLs in chunks with Python, then check each URL's validity and resolve its IP. But checking them one at a time like this is too slow for 500,000+ URLs. I also don't understand multiprocessing and coroutines very well, and I hope to learn both through this example.

I'd really appreciate it if an expert could share some ideas, along with commented code!
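Because each lookup spends most of its time waiting on the network, a thread pool is one simple way to overlap many lookups. The following is a sketch with `concurrent.futures`, not a tuned solution; `resolve`, `resolve_all` and `workers=100` are illustrative choices, not values from the question.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve(domain):
    """Return (domain, ip), or (domain, None) if the name does not resolve."""
    try:
        return domain, socket.gethostbyname(domain)
    except OSError:  # socket.gaierror and friends
        return domain, None


def resolve_all(domains, workers=100):
    """Resolve many domains concurrently; results keep the input order."""
    # DNS lookups mostly wait on I/O, so threads overlap that waiting
    # even though the GIL lets only one thread run Python code at a time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(resolve, domains))


if __name__ == "__main__":
    print(resolve_all(["127.0.0.1", "10.1.2.3"]))
    # [('127.0.0.1', '127.0.0.1'), ('10.1.2.3', '10.1.2.3')]
```

`ProcessPoolExecutor` offers the same `map` interface if you want to experiment with multiple processes, but extra processes mainly help CPU-bound work; for I/O-bound lookups, threads or coroutines are usually enough. `Executor.map` submits every item up front, so for 500,000 rows it is reasonable to call `resolve_all` one chunk (for example 10,000 domains) at a time.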

2 Replies
Luis
commented on Tue, 24 May 2022

This answer may help you

https://stackoverflow.com/a/1...

And this

https://github.com/lorenzog/d...

Mahdi
commented on Tue, 24 May 2022

First of all, the domains of big companies like these often map to many IP addresses. If that doesn't matter to you, you can do it like this:

1. Write a function that takes a URL and returns its city information.
2. Parse the file, extract the URLs, and store them in a list.
3. Collect the return values and write them to the file.

Assuming steps 2 and 3 are already done, the rest can look like this:

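```python
import asyncio

import aiohttp


async def foo(session, url):
    # Stand-in for step 1: httpbin.org just echoes the request back;
    # replace it with a real IP/region lookup service.
    params = {'url': url}
    async with session.get('http://httpbin.org/get', params=params) as resp:
        return await resp.json()


async def main(urls):
    # Share one ClientSession across all requests instead of one per URL.
    async with aiohttp.ClientSession() as session:
        tasks = [foo(session, url) for url in urls]
        # gather() accepts coroutines and returns results in input order;
        # return_exceptions=True keeps one failure from aborting the rest.
        return await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    urls = list()  # fill with the URLs parsed from the file (step 2)
    results = asyncio.run(main(urls))
    # write `results` back to the file (step 3)
```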
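The code above starts every task at once, which for 500,000 URLs means 500,000 simultaneous requests. A minimal variant, assuming only DNS resolution is needed (no HTTP request), caps concurrency with `asyncio.Semaphore`, resolves through `loop.getaddrinfo`, and keeps every distinct IPv4 address to reflect the point that big sites map to many IPs. `resolve_async`, `resolve_all_async` and `limit=100` are hypothetical names and values.

```python
import asyncio
import socket


async def resolve_async(domain, sem):
    """Return (domain, sorted distinct IPv4 addresses); [] if it fails."""
    async with sem:  # at most `limit` lookups in flight at once
        loop = asyncio.get_running_loop()
        try:
            infos = await loop.getaddrinfo(domain, None, family=socket.AF_INET,
                                           type=socket.SOCK_STREAM)
        except OSError:  # includes socket.gaierror (name did not resolve)
            return domain, []
        # each entry is (family, type, proto, canonname, (address, port))
        return domain, sorted({info[4][0] for info in infos})


async def resolve_all_async(domains, limit=100):
    """Resolve many domains concurrently; results keep the input order."""
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(resolve_async(d, sem) for d in domains))


if __name__ == "__main__":
    print(asyncio.run(resolve_all_async(["127.0.0.1", "10.1.2.3"])))
    # [('127.0.0.1', ['127.0.0.1']), ('10.1.2.3', ['10.1.2.3'])]
```

In the standard asyncio event loop, `loop.getaddrinfo` runs `socket.getaddrinfo` in the loop's default thread pool, so real DNS parallelism is also capped by that pool's size; the semaphore matters most once you add HTTP checks with aiohttp, where it bounds the number of open connections. As with the thread version, processing the file in chunks keeps the number of pending tasks manageable.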