I need to crawl articles from a site similar to Baidu Library. The articles are formatted and contain tables and so on, but I can only get the plain text.
Reading the CSS styles to work out the formatting of each line is troublesome.
Is there a better way? Is there a tool available that converts HTML to Word?