We could consume various analyses of Stack Exchange data to make recommendations.
Possible implementation strategy: build on a version of GPT fine-tuned on SO Q&A tasks
Could we set up a simple version of GPT trained on Stack Overflow data, just to get it working? Then think about how to get a learning loop set up to improve the results...
Could this at least help a human navigate the questions on Stack Exchange?
Rather than just answering the question, generate the answer and use that to guide search (by combining generation with document similarity)
Use a distance to set up a margin of tolerance
Stack Roboflow creates ersatz Q&A using
AWD_LTSM. Surely we can do better?
In Google Books, they use crappy OCR which is good enough for search, but you wouldn't want to read the output. For search, they use something like rewrite distance, finding something ‘within 5 errors’.
In parsing, it's not just edit distance but has to involve the grammar
Case against going too deep:
Code generation is hard
Case against worrying about that:
Worry instead about applications like generating learning packets
E.g., learn everything there is to know about
gitfrom Stack Overflow in a nicely organised way.
E.g., compare the Schuam’s Outline series: could we reassemble open source clones of Schuam’s Outlines by retrieving contents from Math.Stack Exchange?
Application of the model: Display SO with similarity graph
E.g., use generated answers to help identify ‘similarity’.
https://github.com/stared/tag-graph-map-of-stackexchange/wiki presents a nice-looking map of the relationship between tags.