programminghistorian
diff --git a/en/lessons/basic-text-processing-in-r.md b/en/lessons/basic-text-processing-in-r.md
Lines changed: 1 addition & 1 deletion
diff --git a/en/lessons/beginners-guide-to-twitter-data.md b/en/lessons/beginners-guide-to-twitter-data.md
Lines changed: 1 addition & 1 deletion
diff --git a/en/lessons/correspondence-analysis-in-R.md b/en/lessons/correspondence-analysis-in-R.md
Lines changed: 1 addition & 1 deletion
diff --git a/en/lessons/creating-apis-with-python-and-flask.md b/en/lessons/creating-apis-with-python-and-flask.md
Lines changed: 1 addition & 1 deletion
diff --git a/en/lessons/detecting-text-reuse-with-passim.md b/en/lessons/detecting-text-reuse-with-passim.md
Lines changed: 1 addition & 1 deletion
diff --git a/en/lessons/installing-python-modules-pip.md b/en/lessons/installing-python-modules-pip.md
Lines changed: 1 addition & 1 deletion
diff --git a/en/lessons/research-data-with-unix.md b/en/lessons/research-data-with-unix.md
Lines changed: 2 additions & 2 deletions
diff --git a/en/lessons/sonification.md b/en/lessons/sonification.md
Lines changed: 3 additions & 3 deletions
diff --git a/en/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md b/en/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown.md
Lines changed: 1 addition & 1 deletion
diff --git a/en/lessons/topic-modeling-and-mallet.md b/en/lessons/topic-modeling-and-mallet.md
Lines changed: 1 addition & 1 deletion
@@ -646,7 +646,7 @@ Many generic tutorials exist for all three of these, as well as extensive packag
 
 [^3]: All Presidential State of the Union Addresses were downloaded from The American Presidency Project at the University of California Santa Barbara. (Accessed 2016-11-11) [http://www.presidency.ucsb.edu/sou.php](http://www.presidency.ucsb.edu/sou.php).
 
-[^4]: Peter Norvig. "Google Web Trillion Word Corpus". (Accessed 2016-11-11) [http://norvig.com/ngrams/](http://norvig.com/ngrams/).
+[^4]: Peter Norvig. "Google Web Trillion Word Corpus". (Accessed 2016-11-11) [http://norvig.com/ngrams/](https://web.archive.org/web/20260326183858/http://norvig.com/ngrams/).
 
 [^5]: This does happen for a few written State of the Union addresses, where a long bulleted list gets parsed into one very long sentence.
 
 
@@ -40,7 +40,7 @@ While this walkthrough proposes a specific workflow that we think is suitable fo
 
 First, we need to gather some data. George Washington University’s [TweetSets](https://tweetsets.library.gwu.edu/) allows you to create your own data queries from existing Twitter datasets they have compiled. The datasets primarily focus on the biggest (mostly American) geopolitical events of the last few years, but the TweetSets website states they are also open to queries regarding the construction of new datasets.  We chose TweetSets because it makes narrowing and cleaning your dataset very easy, creating stable, archivable datasets through a relatively simple graphical interface. Additionally, this has the benefit of allowing you to search and analyze the data with your own local tools, rather than having your results shaped by Twitter search algorithms that may prioritize users you follow, etc.
 
-You could, however, substitute any tool that gives you a set of dehydrated tweets. Because tweets can be correlated to so much data, it’s more efficient to distribute dehydrated data sets consisting of unique tweet IDs, and then allow users to “hydrate” the data, linking retweet counts, geolocation info, etc., to unique IDs. More importantly, [Twitter's terms for providing downloaded content to third parties](https://web.archive.org/web/20190927151316/https://developer.twitter.com/en/developer-terms/agreement-and-policy), as well as research ethics, are at play.   Other common places to acquire dehydrated datasets include Stanford’s [SNAP](https://snap.stanford.edu/data/) collections, the [DocNow Project](https://www.docnow.io) and data repositories, or the [Twitter Application Programming Interface (API)](https://developer.twitter.com/), directly. (If you wonder what an API is, please check this [lesson](/en/lessons/introduction-to-populating-a-website-with-api-data#what-is-application-programming-interface-api).) This latter option will require some coding, but Justin Littman, one of the creators of TweetSets, does a good job summarizing some of the higher-level ways of interacting with the API in this [post](https://gwu-libraries.github.io/sfm-ui/posts/2017-09-14-twitter-data).
+You could, however, substitute any tool that gives you a set of dehydrated tweets. Because tweets can be correlated to so much data, it’s more efficient to distribute dehydrated data sets consisting of unique tweet IDs, and then allow users to “hydrate” the data, linking retweet counts, geolocation info, etc., to unique IDs. More importantly, [Twitter's terms for providing downloaded content to third parties](https://web.archive.org/web/20190927151316/https://developer.twitter.com/en/developer-terms/agreement-and-policy), as well as research ethics, are at play.   Other common places to acquire dehydrated datasets include Stanford’s [SNAP](https://snap.stanford.edu/data/) collections, the [DocNow Project](https://web.archive.org/web/20260316082621/https://www.docnow.io/) and data repositories, or the [Twitter Application Programming Interface (API)](https://developer.twitter.com/), directly. (If you wonder what an API is, please check this [lesson](/en/lessons/introduction-to-populating-a-website-with-api-data#what-is-application-programming-interface-api).) This latter option will require some coding, but Justin Littman, one of the creators of TweetSets, does a good job summarizing some of the higher-level ways of interacting with the API in this [post](https://gwu-libraries.github.io/sfm-ui/posts/2017-09-14-twitter-data).
 
 We find that the graphical, web-based nature of TweetSets, however, makes it ideal for learning this process. That said, if you want to obtain a dehydrated dataset by other means, you can just start at the [Hydrating](/en/lessons/beginners-guide-to-twitter-data#hydrating) section.
 
 
@@ -398,7 +398,7 @@ JUST    0.408    0.000    0.000
 
 The normalisation process does something interesting. Those who are members of multiple committees and/or who belong to committees with many members will tend to have normalisation scores that are lower, suggesting that they are more central to the network. These members will be put closer to the centre of the matrix. For example, the cell belonging to S Ambler and IWFA has the lowest score of 0.192 because S Ambler is a member of three committees and the IWFA committee has nine members in the graph represented.
 
-The next stage is to find the singular value decomposition of this normalised data. This involves fairly complex linear algebra that will not be covered here, but you can learn more from [these Singular Value Decomposition lecture notes](https://math.mit.edu/classes/18.095/2016IAP/lec2/SVD_Notes.pdf) or in more detail from [this pdf file on SVD](http://davetang.org/file/Singular_Value_Decomposition_Tutorial.pdf). I will try to summarize what happens in lay terms.
+The next stage is to find the singular value decomposition of this normalised data. This involves fairly complex linear algebra that will not be covered here, but you can learn more from [these Singular Value Decomposition lecture notes](https://math.mit.edu/classes/18.095/2016IAP/lec2/SVD_Notes.pdf) or in more detail from [this pdf file on SVD](https://perma.cc/F7MJ-EGET). I will try to summarize what happens in lay terms.
 
 * Two new matrices are created that show "dimension" scores for the rows (committees) and the columns (MPs) based on eigenvectors.
 * The number of dimensions is equal to the size of the columns or rows minus 1, which ever is smaller. In this case, there are five committees compared to the MPs eleven, so the number of dimensions is 4.
 
@@ -492,7 +492,7 @@ After incorporating these design improvements, a request to our API might look l
 
 Without documentation, even the best-designed API will be unusable. Your API should have documentation describing the resources or functionality available through your API that also provides concrete working examples of request URLs or code for your API. You should have a section for each resource that describes which fields, such as `id` or `title`, it accepts. Each section should have an example in the form of a sample HTTP request or block of code.
 
-A fairly common practice in documenting APIs is to provide annotations in your code that are then automatically collated into documentation using a tool such as [Doxygen](http://www.doxygen.org/) or [Sphinx](http://www.sphinx-doc.org/en/stable/). These tools create documentation from **docstrings**—comments you make on your function definitions. While this kind of documentation is a good idea, you shouldn't consider your job done if you've only documented your API to this level. Instead, try to imagine yourself as a potential user of your API and provide working examples. In an ideal world, you would have three kinds of documentation for your API: a reference that details each route and its behavior, a guide that explains the reference in prose, and at least one or two tutorials that explain every step in detail.
+A fairly common practice in documenting APIs is to provide annotations in your code that are then automatically collated into documentation using a tool such as [Doxygen](https://www.doxygen.nl/) or [Sphinx](http://www.sphinx-doc.org/en/stable/). These tools create documentation from **docstrings**—comments you make on your function definitions. While this kind of documentation is a good idea, you shouldn't consider your job done if you've only documented your API to this level. Instead, try to imagine yourself as a potential user of your API and provide working examples. In an ideal world, you would have three kinds of documentation for your API: a reference that details each route and its behavior, a guide that explains the reference in prose, and at least one or two tutorials that explain every step in detail.
 
 For inspiration on how to approach API documentation, see the [New York Public Library Digital Collections API](http://api.repo.nypl.org/), which sets a standard of documentation achievable for many academic projects. For an extensively documented (though sometimes overwhelming) API, see the [MediaWiki Action API](https://www.mediawiki.org/wiki/API:Main_page), which provides documentation to users who pass partial queries to the API. (In our example above, we returned an error on a partial query.) For other professionally maintained API documentation examples, consider the [World Bank API](https://datahelpdesk.worldbank.org/knowledgebase/articles/889392-api-documentation), the various [New York Times APIs](https://developer.nytimes.com/), or the [Europeana Pro API](https://pro.europeana.eu/resources/apis).
 
 
@@ -906,4 +906,4 @@ MR gratefully acknowledges the financial support of the Swiss National Science F
 9. Hannu Salmi, Heli Rantala, Aleksi Vesanto, Filip Ginter. The long-term reuse of text in the Finnish press, 1771–1920. **2364**, 394–544 In *CEUR Workshop Proceedings*. (2019).
 10. Axel J Soto, Abidalrahman Mohammad, Andrew Albert, Aminul Islam, Evangelos Milios, Michael Doyle, Rosane Minghim, Maria Cristina de Oliveira. Similarity-Based Support for Text Reuse in Technical Writing. 97–106 In *Proceedings of the 2015 ACM Symposium on Document Engineering*. ACM, 2015. [Link](http://dx.doi.org/10.1145/2682571.2797068)
 11. Alexandra Schofield, Laure Thompson, David Mimno. Quantifying the Effects of Text Duplication on Semantic Models. 2737–2747 In *Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing*. Association for Computational Linguistics, 2017. [https://doi.org/10.18653/v1/D17-1290](https://perma.cc/KSK6-5TXP)
-12. Matteo Romanello, Aurélien Berra, Alexandra Trachsel. Rethinking Text Reuse as Digital Classicists. *Digital Humanities conference*, 2014. [Link](https://wiki.digitalclassicist.org/Text_Reuse)
+12. Matteo Romanello, Aurélien Berra, Alexandra Trachsel. Rethinking Text Reuse as Digital Classicists. *Digital Humanities conference*, 2014. [Link](https://web.archive.org/web/20140829121705/https://wiki.digitalclassicist.org/Text_Reuse)
@@ -30,7 +30,7 @@ Lesson Goals
 
 This lesson shows you how to download and install Python modules. There
 are many ways to install external modules, but for the purposes of this
-lesson, we’re going to use a program called pip, easily installable on [mac/linux](https://pip.pypa.io/en/stable/) and [windows]( https://sites.google.com/site/pydatalog/python/pip-for-windows). As of Python 2.7.9 and newer, pip is installed by default. This tutorial will be helpful for anyone using older versions of Python (which are still quite common).
+lesson, we’re going to use a program called pip, easily installable on [mac/linux](https://pip.pypa.io/en/stable/) and [windows](https://pydatalog.readthedocs.io/en/latest/installation/#using-pip). As of Python 2.7.9 and newer, pip is installed by default. This tutorial will be helpful for anyone using older versions of Python (which are still quite common).
 
 Introducing Modules
 -------------------
 
@@ -128,8 +128,8 @@ _____
 
 In this lesson you have learnt to undertake some basic file counting, to query across research data for common strings, and to save results and derived data. Though this lesson is restricted to using the Unix shell to count and mine tabulated data, the processes can be easily extended to free text. For this we recommend two guides written by William Turkel:
 
-- William Turkel, '[Basic Text Analysis with Command Line Tools in Linux](http://williamjturkel.net/2013/06/15/basic-text-analysis-with-command-line-tools-in-linux/)' (15 June 2013)
-- William Turkel, '[Pattern Matching and Permuted Term Indexing with Command Line Tools in Linux](http://williamjturkel.net/2013/06/20/pattern-matching-and-permuted-term-indexing-with-command-line-tools-in-linux/)' (20 June 2013)
+- William Turkel, '[Basic Text Analysis with Command Line Tools in Linux](https://web.archive.org/web/20140925220046/http://williamjturkel.net/2013/06/15/basic-text-analysis-with-command-line-tools-in-linux/)' (15 June 2013)
+- William Turkel, '[Pattern Matching and Permuted Term Indexing with Command Line Tools in Linux](https://web.archive.org/web/20200925054120/http://williamjturkel.net/2013/06/20/pattern-matching-and-permuted-term-indexing-with-command-line-tools-in-linux/)' (20 June 2013)
 
 As these recommendations suggest, the present lesson only scratches the surface of what the Unix shell environment is capable of. It is hoped, however, that this lesson has provided a taster sufficient to prompt further investigation and productive play.
 
 
@@ -195,7 +195,7 @@ Let us assume that you have a historic diary to which you've fitted a [topic mod
 Installing miditime is straightforward using [pip](/lessons/installing-python-modules-pip):
 
 `$ pip install miditime` or `$ sudo pip install miditime` for a Mac or Linux machine;
-`> python pip install miditime` on a Windows machine. (Windows users, if the instructions above didn't quite work for you, you might want to try [this helper program](https://sites.google.com/site/pydatalog/python/pip-for-windows) instead to get Pip working properly on your machine).
+`> python pip install miditime` on a Windows machine. (Windows users, if the instructions above didn't quite work for you, you might want to try [this helper program](https://pydatalog.readthedocs.io/en/latest/installation/#using-pip) instead to get Pip working properly on your machine).
 
 ### Practice
 Let us look at the sample script provided. Open your text editor, and copy and paste the sample script in:
@@ -240,7 +240,7 @@ Baa, Baa, black, sheep, have, you, any, wool?
 
 Can you make your computer play this song? (This [chart](https://web.archive.org/web/20171211192102/http://www.electronics.dit.ie/staff/tscarff/Music_technology/midi/midi_note_numbers_for_octaves.htm) will help).
 
-**By the way** There is a text file specification for describing music called '[ABC Notation](http://abcnotation.com/wiki/abc:standard:v2.1)'. It is beyond us for now, but one could write a sonification script in say a spreadsheet, mapping values to note names in the ABC specification (if you've ever used an IF - THEN in Excel to convert percentage grades to letter grades, you'll have a sense of how this might be done) and then using a site like [this one](http://trillian.mit.edu/~jc/music/abc/ABCcontrib.html) to convert the ABC notation into a .mid file.
+**By the way** There is a text file specification for describing music called '[ABC Notation](https://web.archive.org/web/20160617203735/http://abcnotation.com/wiki/abc:standard:v2.1)'. It is beyond us for now, but one could write a sonification script in say a spreadsheet, mapping values to note names in the ABC specification (if you've ever used an IF - THEN in Excel to convert percentage grades to letter grades, you'll have a sense of how this might be done) and then using a site like [this one](http://trillian.mit.edu/~jc/music/abc/ABCcontrib.html) to convert the ABC notation into a .mid file.
 
 ### Getting your own data in
 
@@ -439,7 +439,7 @@ The code is pretty clear: loop the 'bd_boom' sample with the reverb sound effect
 
 By the way, 'live-coding'? What makes this a 'live-coding' environment is that you can make changes to the code _while Sonic Pi is turning it into music_. Don't like what you're hearing? Change the code up on the fly!
 
-For more on Sonic Pi, [this workshop website](https://www.miskatonic.org/music/access2015/) is a good place to start. See also Laura Wrubel's [report on attending that workshop, and her and her colleague's work in this area](http://library.gwu.edu/scholarly-technology-group/posts/sound-library-work).
+For more on Sonic Pi, [this workshop website](https://web.archive.org/web/20150907155822/https://www.miskatonic.org/music/access2015/) is a good place to start. See also Laura Wrubel's [report on attending that workshop, and her and her colleague's work in this area](http://library.gwu.edu/scholarly-technology-group/posts/sound-library-work).
 
 # Nihil Novi Sub Sole
 Again, lest we think that we are at the cutting edge in our algorithmic generation of music, a salutary reminder was published in 1978 on 'dice music games' of the eighteenth century, where rolls of the dice determined the recombination of pre-written snippets of music. [Some of these games have been explored and re-coded for the Sonic-Pi by Robin Newman](https://rbnrpi.wordpress.com/project-list/mozart-dice-generated-waltz-revisited-with-sonic-pi/). Newman also uses a tool that could be described as Markdown+Pandoc for musical notation, [Lilypond](http://www.lilypond.org/) to score these compositions. The antecedents for everything you will find at _The Programming Historian_ are deeper than you might suspect!
 
@@ -704,7 +704,7 @@ according to [this blog
 post](http://web.archive.org/web/20140120195538/http://mashable.com/2013/06/24/markdown-tools/))
 other, Markdown-specific alternatives to MS Word are available online,
 and often free of cost. From the standalone ones, we liked
-[Mou](http://mouapp.com/), [Write Monkey](http://writemonkey.com), and
+[Mou](http://mouapp.com/), [Write Monkey](https://web.archive.org/web/20260327163157/http://writemonkey.com/), and
 [Sublime Text](http://www.sublimetext.com/). Several web-based platforms
 have recently emerged that provide slick, graphic interfaces for
 collaborative writing and version tracking using Markdown. These
 
@@ -468,7 +468,7 @@ This command
 -   opens your `tutorial.mallet` file
 -   trains MALLET to find 20 topics
 -   outputs every word in your corpus of materials and the topic it
-    belongs to into a compressed file (`.gz`; see www.gzip.org on how to
+    belongs to into a compressed file (`.gz`; see [www.gzip.org](https://web.archive.org/web/20260607182747/https://gzip.org/) on how to
     unzip this)
 -   outputs a text document showing you what the top key words are for
     each topic (`tutorial_keys.txt`)