Things I need to do or think about:

  • Politeness
    • Robots.txt (fetch and save each site's robots.txt once per running session)
    • URL prioritization
      • Random
      • Mercator
        • Front queue
        • Back queue: keeps track of when I last requested each host, so I only grab URLs I am allowed to request right now. That way I can crawl as fast as I can, as long as I respect each host's timer
      • Breadth-first search (bad because it DDoSes the website)
      • Depth-first search (bad because it can still DDoS, and it also grabs bad results unless you index the entire web)
      • [ ]
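
Rough sketch of the back-queue idea from the Mercator bullet, to check that the timer logic holds up. It keeps one queue per host plus a heap of "earliest allowed fetch time" entries, so next_url() only hands out a URL whose host has waited long enough. Everything named here (Frontier, delay_seconds, the injectable clock, the example hosts) is my own placeholder, not something from Mercator itself. The delay is an assumed fixed value per host, and the front queues (the prioritization half) are left out.

```python
import heapq
import time
from collections import deque
from urllib.parse import urlparse


class Frontier:
    """Back-queue sketch: one FIFO queue per host plus a heap of host timers."""

    def __init__(self, delay_seconds=1.0, clock=time.monotonic):
        self.delay = delay_seconds   # assumed fixed politeness delay per host
        self.clock = clock           # injectable so the timer can be faked in tests
        self.back_queues = {}        # host -> deque of pending URLs
        self.next_allowed = {}       # host -> earliest time of the next request
        self.ready_heap = []         # (earliest allowed fetch time, host)

    def add(self, url):
        host = urlparse(url).netloc
        queue = self.back_queues.get(host)
        if queue is None:
            # Host has no pending URLs, so it is not in the heap yet.
            # Reuse its old timer if it was requested recently.
            queue = self.back_queues[host] = deque()
            ready_at = self.next_allowed.get(host, self.clock())
            heapq.heappush(self.ready_heap, (ready_at, host))
        queue.append(url)

    def next_url(self):
        """Return a URL whose host may be requested now, else None."""
        if not self.ready_heap:
            return None
        ready_at, host = self.ready_heap[0]
        if ready_at > self.clock():
            return None              # every pending host is still cooling down
        heapq.heappop(self.ready_heap)
        queue = self.back_queues[host]
        url = queue.popleft()
        self.next_allowed[host] = self.clock() + self.delay
        if queue:
            # More work for this host: re-arm its timer.
            heapq.heappush(self.ready_heap, (self.next_allowed[host], host))
        else:
            del self.back_queues[host]
        return url
```

When next_url() returns None the crawler should sleep or do other work instead of spinning. A per-host delay taken from robots.txt (Crawl-delay, where a site sets it) could replace the fixed delay_seconds.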