Think about design choices that would make the library easy to extend. For example, the query function could accept, as an argument, a list of functions for processing text and images. A text handler could take the page's HTML and its URL as input and return key-value pairs to be added to the dataset.
With that approach, a user who wants to parse an additional field would only need to define a function with the appropriate parsing logic and pass it as a parameter to the query function, where all the heavy, common processing is done. The user could also select what to download by modifying the list of pre-built handlers, such as those for wikitext or caption parsing. We could also design a uniform way to pass cache-related parameters to such functions.
This might be a very good idea, but it requires a lot of work. It will probably be postponed until there is reasonable interest in the script.
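The handler idea described above could look roughly like the following. This is only a minimal sketch under assumptions: the names `query`, `fetch`, `title_handler` and `url_handler`, as well as the exact handler signature, are hypothetical and do not describe the library's current API. A stand-in fetcher is used so the example runs without network access.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical handler type: (page HTML, page URL) -> key-value pairs for the dataset.
TextHandler = Callable[[str, str], Dict[str, Optional[str]]]


def title_handler(html: str, url: str) -> Dict[str, Optional[str]]:
    """Example handler: extract the contents of the <title> tag, if any."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return {"title": match.group(1).strip() if match else None}


def url_handler(html: str, url: str) -> Dict[str, Optional[str]]:
    """Example handler: record the page URL itself."""
    return {"url": url}


def query(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    text_handlers: Iterable[TextHandler],
) -> List[Dict[str, Optional[str]]]:
    """Fetch each page once, then let every handler add fields to its row."""
    handlers = list(text_handlers)  # materialize so the list is reused for every page
    rows = []
    for url in urls:
        html = fetch(url)  # the common processing lives here, not in the handlers
        row: Dict[str, Optional[str]] = {}
        for handler in handlers:
            row.update(handler(html, url))
        rows.append(row)
    return rows


# Usage with a stand-in fetcher (a dict lookup) instead of a real download.
fake_pages = {"https://example.org/a": "<html><title> Page A </title></html>"}
rows = query(fake_pages, fake_pages.__getitem__, [url_handler, title_handler])
print(rows)
```

Under this design, adding a new field means writing one more small function and appending it to the list. Cache-related parameters could be forwarded to handlers in the same uniform way, for instance as an extra keyword argument, though that is left out here.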