<p style="font-size: 80%"><em>Gurarie Lab: research and teaching hub for the Gurarie Lab at SUNY-ESF. Elie Gurarie, egurarie@esf.edu.</em></p>
<p><strong>Circumpolar mapping (by Chloe Beaupré)</strong>, 2023-09-23. <a href="https://eligurarie.github.io/Arctic-Projection">https://eligurarie.github.io/Arctic-Projection</a></p>
<blockquote style="font-size: 80%">
<p>It’s been over three years since a blog post. Does that even count as a blog? Since the last entry, I’ve become a professor. Which (a) takes a big old chunk out of time to do anything else, but (b) provides access to a hitherto unavailable resource known as <strong>graduate students</strong> (a subset of the spectacular <a href="https://eligurarie.github.io/_pages/labmembers/">lab members</a> phenomenon). One of these, <strong>Chloe Beaupré</strong>, provides the following hot tips for dealing with some very particular mapping issues in R.</p>
</blockquote>
<h2 id="background">background</h2>
<p>For a presentation I wanted to create a plot of the Arctic, showing the circumpolar distribution of <em>Rangifer</em> populations next to a map of Arctic Indigenous languages. It turns out that mapping a circumpolar projection in R is not as easy as making a pretty map of a smaller area at lower latitudes. Here is an outline for recreating (one of) these plots.</p>
<p><img src="../assets/post04/HerdsLanguages.png" alt="" /></p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">packages</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="s2">"dplyr"</span><span class="p">,</span><span class="s2">"sf"</span><span class="p">,</span><span class="w"> </span><span class="s2">"ggplot2"</span><span class="p">,</span><span class="s2">"mapview"</span><span class="p">,</span><span class="w"> </span><span class="s2">"viridis"</span><span class="p">,</span><span class="s2">"maptools"</span><span class="p">,</span><span class="w"> </span><span class="s2">"raster"</span><span class="p">)</span><span class="w">
</span><span class="n">sapply</span><span class="p">(</span><span class="n">packages</span><span class="p">,</span><span class="w"> </span><span class="n">require</span><span class="p">,</span><span class="w"> </span><span class="n">character</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<h2 id="shapefiles">shapefiles</h2>
<p>We’ll plot the map of arctic Indigenous Peoples languages and dialects since the data are made publicly available by the <a href="https://arctic-indigenous-languages-uito.hub.arcgis.com/">Arctic Indigenous languages and revitalization project</a>. Shapefiles can be downloaded <a href="https://arctic-indigenous-languages-uito.hub.arcgis.com/datasets/UITO::arctic-indigenous-peoples-languages-and-revitalization-languages-and-dialects/explore">here</a>.</p>
<p>After downloading the shapefiles, read them into your environment and re-project to EPSG 3995 (Arctic Polar Stereographic) using the <code class="language-plaintext highlighter-rouge">sf</code> package.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">lang</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">st_read</span><span class="p">(</span><span class="s2">"data/Arctic_Indigenous_Peoples_languages_and_revitalization_-_Languages_and_Dialects.shp"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_transform</span><span class="p">(</span><span class="n">st_crs</span><span class="p">(</span><span class="m">3995</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>
<h2 id="mapview-fails-us">mapview fails us</h2>
<p>We usually love plotting spatial data with the <code class="language-plaintext highlighter-rouge">mapview</code> package because it’s fast and interactive, but it defaults to the dreaded Mercator projection.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mapview</span><span class="p">(</span><span class="n">lang</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="../assets/post04/mapview1.png" alt="" /></p>
<p><em>[ed. note - interactivity of mapview disabled on blog]</em></p>
<p>Looks horrifying.</p>
<p><img src="../assets/post04/DreadedMercator.jpg" alt="" /></p>
<p>There is an argument in <code class="language-plaintext highlighter-rouge">mapview</code> to use the native CRS (which we’ve transformed to Arctic Polar Stereographic, EPSG 3995). Let’s try it out.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mapview</span><span class="p">(</span><span class="n">lang</span><span class="p">,</span><span class="w"> </span><span class="n">native.crs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">map.types</span><span class="o">=</span><span class="s2">"Esri.WorldImagery"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="../assets/post04/mapview2.png" alt="" /></p>
<p>This looks better! But there isn’t a basemap for the North Pole, so we have our language polygons floating in the ether. We can do better.</p>
<h2 id="coastlines">coastlines</h2>
<p>The <code class="language-plaintext highlighter-rouge">maptools</code> package provides a simple whole-world coastline data set. We set the lower y-limit to 45 degrees North, so that we keep only the northern coastlines.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="p">(</span><span class="s2">"wrld_simpl"</span><span class="p">,</span><span class="w"> </span><span class="n">package</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"maptools"</span><span class="p">)</span><span class="w">
</span><span class="n">w</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">crop</span><span class="p">(</span><span class="n">wrld_simpl</span><span class="p">,</span><span class="w"> </span><span class="n">extent</span><span class="p">(</span><span class="m">-180</span><span class="p">,</span><span class="w"> </span><span class="m">180</span><span class="p">,</span><span class="w"> </span><span class="m">45</span><span class="p">,</span><span class="w"> </span><span class="m">90</span><span class="p">))</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">w</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="../assets/post04/worldCoastlines-1.png" style="display: block; margin: auto;" /></p>
<p>After cropping to the northern coastlines, convert the result to a simple feature multipolygon, then re-project to Arctic Polar Stereographic (EPSG 3995).</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># make it into an sf object, transform the crs</span><span class="w">
</span><span class="n">w_sf</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_as_sf</span><span class="p">(</span><span class="n">coords</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"long"</span><span class="p">,</span><span class="w"> </span><span class="s2">"lat"</span><span class="p">),</span><span class="w"> </span><span class="n">crs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4326</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_transform</span><span class="p">(</span><span class="n">st_crs</span><span class="p">(</span><span class="m">3995</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>
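<p><em>[ed. note]</em>: the <code class="language-plaintext highlighter-rouge">maptools</code> package has since been archived on CRAN. As a sketch of an alternative (not part of the original workflow), the same northern coastlines can be pulled from the <code class="language-plaintext highlighter-rouge">rnaturalearth</code> package; the object names below are illustrative.</p>

```r
# Alternative to maptools::wrld_simpl, which has been archived on CRAN.
# Assumes the rnaturalearth (plus rnaturalearthdata) and sf packages are installed.
library(rnaturalearth)
library(sf)

# World country polygons as an sf object
world <- ne_countries(scale = "medium", returnclass = "sf")

# Keep 45 degrees North and up, then re-project to Arctic Polar Stereographic
north <- st_crop(world, xmin = -180, ymin = 45, xmax = 180, ymax = 90)
north <- st_transform(north, st_crs(3995))
```

<p>From here, <code class="language-plaintext highlighter-rouge">north</code> drops into the plotting code wherever <code class="language-plaintext highlighter-rouge">w_sf</code> is used.</p>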
<h2 id="putting-it-all-together">putting it all together</h2>
<p>The following code uses <code class="language-plaintext highlighter-rouge">ggplot2</code> to plot the arctic coastlines and overlays the arctic Indigenous Peoples languages shapefile.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ggplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">w_sf</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"grey"</span><span class="p">,</span><span class="w"> </span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">,</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lang</span><span class="p">[</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">lang</span><span class="o">$</span><span class="n">LangFamily</span><span class="p">),],</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">LangFamily</span><span class="p">),</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.8</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">scale_fill_viridis_d</span><span class="p">()</span><span class="o">+</span><span class="w">
</span><span class="n">scale_x_continuous</span><span class="p">(</span><span class="n">breaks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Language family"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
</span><span class="n">theme</span><span class="p">(</span><span class="n">axis.text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">axis.ticks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">panel.background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_blank</span><span class="p">(),</span><span class="w">
</span><span class="n">text</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span><span class="p">),</span><span class="w">
</span><span class="n">legend.key.size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">unit</span><span class="p">(</span><span class="m">.2</span><span class="p">,</span><span class="w"> </span><span class="s1">'cm'</span><span class="p">))</span><span class="w"> </span><span class="c1"># make the legend smaller</span><span class="w">
</span></code></pre></div></div>
<p><img src="../assets/post04/plotLanguages-1.png" style="display: block; margin: auto;" /></p>
<p>The remaining code creates a similar plot using base R if that’s more your style (<em>cough</em> owner-of-this-blog-Elie <em>cough</em>).</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">cols</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">viridis</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">lang</span><span class="o">$</span><span class="n">LangFamily</span><span class="p">)))</span><span class="w">
</span><span class="n">labs</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">levels</span><span class="p">(</span><span class="n">as.factor</span><span class="p">(</span><span class="n">lang</span><span class="o">$</span><span class="n">LangFamily</span><span class="p">))</span><span class="w">
</span><span class="c1"># set margins</span><span class="w">
</span><span class="n">par</span><span class="p">(</span><span class="n">mar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">.1</span><span class="p">,</span><span class="m">4</span><span class="p">))</span><span class="w">
</span><span class="n">layout</span><span class="p">(</span><span class="n">matrix</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="m">2</span><span class="p">),</span><span class="w"> </span><span class="n">widths</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1.5</span><span class="p">,</span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="c1"># plot the coastlines, to initiate the area</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">st_geometry</span><span class="p">(</span><span class="n">w_sf</span><span class="p">),</span><span class="w"> </span><span class="n">lwd</span><span class="o">=</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">lty</span><span class="o">=</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="w"> </span><span class="n">alpha</span><span class="p">(</span><span class="s2">"grey"</span><span class="p">,</span><span class="w"> </span><span class="m">.5</span><span class="p">),</span><span class="w"> </span><span class="n">border</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"darkgrey"</span><span class="p">)</span><span class="w">
</span><span class="c1"># plot the language data</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">lang</span><span class="p">[</span><span class="s2">"LangFamily"</span><span class="p">],</span><span class="w"> </span><span class="n">lwd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">lty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">pal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">alpha</span><span class="p">(</span><span class="n">cols</span><span class="p">,</span><span class="w"> </span><span class="m">.8</span><span class="p">),</span><span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">border</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NA</span><span class="p">)</span><span class="w">
</span><span class="n">par</span><span class="p">(</span><span class="n">mar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">6</span><span class="p">,</span><span class="m">0</span><span class="p">),</span><span class="w"> </span><span class="n">xpd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NA</span><span class="p">);</span><span class="w"> </span><span class="n">plot.new</span><span class="p">()</span><span class="w">
</span><span class="n">mtext</span><span class="p">(</span><span class="n">side</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="s2">"Language family"</span><span class="p">,</span><span class="w"> </span><span class="n">font</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">adj</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w">
</span><span class="n">legend</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"topleft"</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">cex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.8</span><span class="p">,</span><span class="w"> </span><span class="n">legend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">labs</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cols</span><span class="p">,</span><span class="w"> </span><span class="n">pch</span><span class="o">=</span><span class="m">15</span><span class="p">,</span><span class="w"> </span><span class="n">pt.cex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1.5</span><span class="p">,</span><span class="w"> </span><span class="n">bty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"n"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="../assets/post04/baseRPlot-1.png" style="display: block; margin: auto;" /></p>
<ul>
<li>Chloe Beaupré</li>
</ul>
<h2 id="editorial-cough-cough-comment">editorial (<em>cough-cough</em>) comment</h2>
<p>Maybe you’re bothered by the weirdly straight polygon edge in Chloe’s otherwise lovely map? That edge is the 45-degree-North crop line, which the polar projection draws straight through the clipped coastlines. One fix: crop the coastlines at the equator instead, then hide everything between the equator and 45 degrees North behind a white “donut” polygon, framed by a circle at 45 degrees North. Try this:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data</span><span class="p">(</span><span class="s2">"wrld_simpl"</span><span class="p">,</span><span class="w"> </span><span class="n">package</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"maptools"</span><span class="p">)</span><span class="w">
</span><span class="n">w.equator</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">raster</span><span class="o">::</span><span class="n">crop</span><span class="p">(</span><span class="n">wrld_simpl</span><span class="p">,</span><span class="w"> </span><span class="n">extent</span><span class="p">(</span><span class="m">-180</span><span class="p">,</span><span class="w"> </span><span class="m">180</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">90</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_as_sf</span><span class="p">(</span><span class="n">coords</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"long"</span><span class="p">,</span><span class="w"> </span><span class="s2">"lat"</span><span class="p">),</span><span class="w"> </span><span class="n">crs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4326</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_transform</span><span class="p">(</span><span class="n">st_crs</span><span class="p">(</span><span class="m">3995</span><span class="p">))</span><span class="w">
</span><span class="n">donut</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">cbind</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="o">:</span><span class="m">360</span><span class="p">,</span><span class="w"> </span><span class="m">360</span><span class="o">:</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">),</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="m">45</span><span class="p">,</span><span class="w"> </span><span class="m">361</span><span class="p">),</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">361</span><span class="p">)),</span><span class="w"> </span><span class="m">45</span><span class="p">)))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">st_polygon</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_sfc</span><span class="p">(</span><span class="n">crs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4326</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">st_transform</span><span class="p">(</span><span class="n">st_crs</span><span class="p">(</span><span class="m">3995</span><span class="p">))</span><span class="w">
</span><span class="n">fortyfive</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="o">:</span><span class="m">360</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">45</span><span class="p">,</span><span class="w"> </span><span class="m">361</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">st_linestring</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
</span><span class="n">st_sfc</span><span class="p">(</span><span class="n">crs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">4326</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">st_transform</span><span class="p">(</span><span class="n">st_crs</span><span class="p">(</span><span class="m">3995</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>
<p>Now … put it all together</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">par</span><span class="p">(</span><span class="n">mar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">.1</span><span class="p">,</span><span class="m">4</span><span class="p">))</span><span class="w">
</span><span class="n">layout</span><span class="p">(</span><span class="n">matrix</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="m">2</span><span class="p">),</span><span class="w"> </span><span class="n">widths</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1.5</span><span class="p">,</span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">fortyfive</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"darkgrey"</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">st_geometry</span><span class="p">(</span><span class="n">w.equator</span><span class="p">),</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"antiquewhite"</span><span class="p">,</span><span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">border</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"darkgrey"</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">donut</span><span class="p">,</span><span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">,</span><span class="w"> </span><span class="n">border</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NA</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">lang</span><span class="p">[</span><span class="s2">"LangFamily"</span><span class="p">],</span><span class="w"> </span><span class="n">lwd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">lty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">pal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">alpha</span><span class="p">(</span><span class="n">cols</span><span class="p">,</span><span class="w"> </span><span class="m">.8</span><span class="p">),</span><span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">border</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NA</span><span class="p">)</span><span class="w">
</span><span class="n">par</span><span class="p">(</span><span class="n">mar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">6</span><span class="p">,</span><span class="m">0</span><span class="p">),</span><span class="w"> </span><span class="n">xpd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NA</span><span class="p">);</span><span class="w"> </span><span class="n">plot.new</span><span class="p">()</span><span class="w">
</span><span class="n">mtext</span><span class="p">(</span><span class="n">side</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="s2">"Language family"</span><span class="p">,</span><span class="w"> </span><span class="n">font</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="n">adj</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w">
</span><span class="n">legend</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"topleft"</span><span class="p">,</span><span class="w"> </span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">cex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.8</span><span class="p">,</span><span class="w"> </span><span class="n">legend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">labs</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cols</span><span class="p">,</span><span class="w"> </span><span class="n">pch</span><span class="o">=</span><span class="m">15</span><span class="p">,</span><span class="w"> </span><span class="n">pt.cex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1.5</span><span class="p">,</span><span class="w"> </span><span class="n">bty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"n"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p><img src="../assets/post04/finalmap.png" alt="" /></p>
<p>Do <em>that</em> in ggplot!</p>Chloe BeaupréIt’s been over three years since a blog post. Does that even count as a blog? Since the last entry, I’ve become a professor. Which (a) takes a big old chunk out of time to do anything else, but (b) provides access to a hitherto unavailable resource known as graduate students (a subset of the spectacular lab members phenomenon). One of these, Chloe Beaupré, provides the following hot tips for dealing with some very particular mapping issues in R.The trees will be our eyes2020-07-14T00:00:00-05:002020-07-14T00:00:00-05:00https://eligurarie.github.io/Trees-Will-Be-Our-Eyes<blockquote style="font-size: 80%">
<p>The following is reprinted (with permission) from the <a href="https://www.huyckpreserve.org/uploads/2/4/5/6/24560510/spring_newsletter_2019.pdf"><em>Spring 2019 edition of the Myosotis Messenger</em></a> - the bi-annual newsletter of the <a href="https://www.huyckpreserve.org/">Edmund Niles Huyck Preserve</a>, a lovely tucked-away protected area in upstate New York with a <a href="https://www.huyckpreserve.org/our-history.html">long history as one of the oldest biological research stations in the U.S.</a> The preserve also offers some <a href="https://www.huyckpreserve.org/huyck-research-grants.html">research grants</a> - one of which was awarded to <a href="https://scottlapoint.weebly.com/">Scott LaPoint</a> and myself for a camera-trapping and snow-tracking study. This short essay was written as the outreach component of receiving that grant.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
</blockquote>
<p style="text-align: center;"><img src="../../assets/post03/waterfall.jpg" alt="waterfalls" height="100px" width="400px" /></p>
<p style="text-align: center; font-size: 80%"><strong>waterfalls</strong></p>
<p>When I first set foot in the Edmund Niles Huyck Preserve (or, more precisely, set ski) it is on a clear day in February after several days of steady snowfall. There’s nothing quite like being on new snow on a new day in a new place, the quiet of the forest brushed only by the swoosh of the skis, the never-quite-knowing what comes after the next curve.</p>
<p>After a steep and woolly ascent along the waterfall and some bobbing through hardwoods, I find myself cutting across Lake Myosotis towards yellow cattail stems poking through the ice on the northern shore. Near the bank, my ski suddenly sticks, and – after a brief moment – I feel cold fingers of water tickling my ankles. Trying to steady myself, I topple. Now my gloved hand is firmly shaking hands with the water below. This, the north end of the lake, is where the feeder creek flows in, and it hasn’t frozen quite as solidly as the rest.</p>
<p>It is at that moment – awkwardly splayed and mainly immobile – that I glance up into the trees and lock eyes with a coyote. Larger than any I’ve seen before, its mane is broad, its coat is thick – a pale grey blended with tan – its ears are turned forward, its body more still than the trees.</p>
<p>And, as my twisting self-extraction out of the sludge turns into a no-less desperate dance to ready and aim my camera, the coyote swivels and lopes, easily, straight up the steep snowed-in bank. The shutter clicks at nothing but trees.</p>
<p style="text-align: center;"><img src="../../assets/post03/hemlocks_in_snow_by_scott.jpg" alt="hemlocks" height="300px" width="400px" /></p>
<p style="text-align: center; font-size: 80%"><strong>hemlock stand (photo by Scott)</strong></p>
<p>Now two years later, I am back in the Huyck Preserve, setting out with my friend Dr. Scott LaPoint (an expert in the mesopredator fauna of New York State) and Dr. Anne Rhodes (an expert in everything Huyck Preserve) from the sturdy old research center on Lincoln Pond to find a suitable site for a winter field study.</p>
<p>Our plan is to find a patch of woods, strap 54 cameras onto trees in a dense grid, track animals in the snow, match the tracks to photos, and learn what we can about the coyotes and the fishers and the foxes (and if they don’t show up, about the squirrels and the deer). But between the lines of our scientific plan is a simple, selfish, desire to tromp around the woods in winter. If we cross tracks with my old friend the coyote, so much the better.</p>
<p>There are no deep drifts this bright mid-December morning, just a dusting of snow. The Preserve is like an overly powdered sugar donut in a glowing display case, the dusting slowly crusting and fading in the bright sun.</p>
<p>Even still, the snow has stories to tell.</p>
<p>Here, for example, along the creek by a bridge, an otter struts along the bank, betrayed by the wide splay of its feet, the furrow of its tail, and the eventual belly slide into an exposed riffle. Along the wood railing of that bridge, the tidy tracks of several mice (<em>deer mice</em>? <em>white-footed mice</em>?): tiny clusters of feet, each foot a cluster of four round-looking toes. Just below, the small, close predatory steps of a weasel (<em>short-tailed</em>? <em>long-tailed</em>?), entering the rushes at the pond’s edge.</p>
<p style="text-align: center;"><img src="../../assets/post03/mouse.jpg" alt="mouse" height="200px" width="400px" /></p>
<p style="text-align: center; font-size: 80%"><strong>mouse</strong></p>
<p>Nearby, a foot-wide boulder is covered with a tidy toupée of snow. In the middle of it four squirrel feet. The rear haunches wide, the front feet prim and forward. There are no tracks leading to it or away, just a birch creaking lightly in the breeze.</p>
<p>Tucked between remnants of three stone pasture walls, centuries old, we find a stand of slender sugar maples. The area is flat, neatly delineated, open to sky and snowfall. The trees are almost uniform in their layout. A coyote (the coyote?) has been here recently, has sliced the field diagonally, northeast to southwest, straight as an arrow, not pausing to sniff or mark or tarry under the sky watching through the sparse and naked maples.</p>
<p>We have found our site.</p>
<p>Conditions are perfect for dragging stakes, tying bright pink ribbons to trunks, measuring distances between trees that will soon be strapped with cameras. The day is long and tiring and wholly satisfying. But by the time we are done, the snow is also very nearly gone. All the life in the forest that scurries and hops and saunters and trots and sniffs at this and snorts at that now leaves no imprint on the bare ground.</p>
<p>It feels tragic, somehow. As if a great civilization had lost the art of recording its own history.</p>
<p style="text-align: center;"><img src="../../assets/post03/stories_in_the_snow.jpg" alt="stories" height="100px" width="300px" /></p>
<p>In the sturdy old laboratory at the south end of the pond, we are camped on cots next to dusty sample cabinets of small mammal skulls, beetles and damselflies pinned to frames by ecological collectors past. At the end of a long day, we, at least, have not yet lost the ability to write. I jot down some preliminary “findings”:</p>
<p><em>Creatures – abundant. Snow – unreliable. Soon – the trees will be our eyes.</em></p>
<p>Outside, the night is dark and moonless and silent, except for the occasional shriek of a barred owl.</p>
<p><strong><em>Post scriptum:</em></strong></p>
<p style="text-align: center;"><img src="../../assets/post03/coyote.jpg" alt="coyote" height="200px" width="500px" /></p>
<p style="text-align: center; font-size: 80%"><strong>a camera “traps” a coyote</strong></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Another component of the grant, of course, is to actually analyze the results of our study. That bit, like so much else, is a work in progress - but might be the subject of a future post. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Elie Gurarieegurarie@esf.eduTales told in tracks in the snowy woods.What is co-occurrence in continuous space?2020-07-05T00:00:00-05:002020-07-05T00:00:00-05:00https://eligurarie.github.io/CoOccurrence-And-RSFs<p>An interesting question came up the other day from a colleague who is modeling the spatial interaction between two animals (caribou and muskox) whose ranges overlap in a study area in a portion of the Canadian Arctic. She has fitted two resource selection functions (RSF’s) - one for the caribou, one for the muskox - identified some relevant differences in their preferences for different elevations, vegetation types, distance from water, etc., and is now wondering if she can use those results to create a map of “co-occurrence” of the two species on the landscape. This question is not entirely abstract - there is local concern in this particular region that the muskox (which had been extirpated and more recently reintroduced) are competing with the caribou, which are an important local resource for subsistence. The socio-ecological context - as always - is interesting and complicated. But the statsy question itself is also interesting, as it forces us to think about what an RSF really tells us, and what co-occurrence really means.</p>
<p><strong>Muskox are amazing. When you see them, they look like paleolithic aliens that have been teleported from Pluto. Here are some that I saw on a caribou survey (photo: Dean Cluff).</strong></p>
<p><img src="../../assets/post02/muskox_cluff.jpg" alt="" /></p>
<h2 id="an-aside-to-grumble-about-rsfs">An aside to grumble about RSFs</h2>
<p>To be honest, I have some (maybe many) issues with RSF’s, even though I have dealt with them quite a bit (<a href="https://terpconnect.umd.edu/~egurarie/research/NWT/">here’s even a link to a series on primers</a>), especially when applied to movement data. They can be useful for generating easy to interpret maps of habitat suitability, and for identifying some general patterns of preference or avoidance, and those maps have real value, both for communication and decision-making. And, since movement data from GPS collared animals is one of the most commonly available kinds of data for “observing” animals in the wild, it is tempting to use those data to make habitat suitability maps - as well as some inferences on habitat-specific preference or avoidance.</p>
<p>However, there are a lot of weird assumptions that underlie them. For example: that an animal walking around a landscape has a bird’s-eye, GIS-layer type knowledge of accessible habitats and can spread itself around space. Or that an “availability” set - which is usually just sampled randomly from some landscape - meaningfully reflects “lack of use”. Also, in my experience, RSF’s are just fussy and slippery and (computer) resource intensive.</p>
<p>Finally, a problem with RSF’s is confusion regarding how to interpret the predictions. You perform some kind of logistic regression, which gives predictions on a “logit” (or log-odds) scale, which - in a normal logistic regression - you can just convert to a probability. But probabilities aren’t really meaningful if you yourself artificially set the availability data set! If you have as many available points as used points, the overall probability is 1/2. If you have 10 times as many “available” points, it is 1/11. So rather than back-transform, RSF’s are usually defined as JUST the exponential bit.</p>
<p>All that said, my own interpretation of RSF’s has been very enjoyably “paradigm shifted”<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> by some recent reinterpretation of RSF’s in terms of <strong>point process intensities</strong>. And that reinterpretation really helps answer a question like this one. Though - as usual - there were a few unexpected twists along the way.</p>
<h2 id="point-processes">Point processes</h2>
<p>But first … what does “co-occurrence” mean for a bunch of points in space?</p>
<p>In theory - a point takes up zero space. So the probability that two points “co-occur” is, most strictly speaking, zero. So the question only means anything in terms of some area of co-occurrence (or alternatively, but really very similarly, some distance of interaction). At the scale of the study area in question (the portion of the Canadian Arctic with caribou and muskox) co-occurrence is 100%, and at the extreme spatial scale of the entire world, we are <em>all</em> co-occurring. While that makes for a terrific <em>bumper sticker</em>-worthy sentiment, it’s not actually that useful.</p>
<p>So let’s go back and think about points. Below, we have two species, 100 individuals each, sharing 100 square units of space. The density of each is 1 animal per square unit, but the spatial distribution is completely random.</p>
<p><img src="../../assets/post02/Fig1-1.png" alt="" /><!-- --></p>
<p>This is the simplest (homogeneous) two-dimensional <em><a href="https://en.wikipedia.org/wiki/Poisson_point_process">Poisson point process</a></em>. Why Poisson - which many associate with a <a href="https://en.wikipedia.org/wiki/Poisson_distribution">discrete distribution</a> frequently used to model count data? Because if we subset the space equally, the number of these points per unit area will be distributed as a Poisson random variable. Here’s an illustration, where we took the data above and broke it down into 100 square units and counted the number of red and blue points in each.</p>
<p><img src="../../assets/post02/Fig2-1.png" alt="" /><!-- --></p>
<p>Because the <em>intensity parameter</em> is equivalent to the mean density of points, $\lambda = 1$, and the resulting count distribution is a <a href="https://en.wikipedia.org/wiki/Poisson_distribution">Poisson distribution</a> with intensity 1. We can write that in terms of the probability mass function as: \(P(A = k \mid \lambda_a) = {\lambda_a^k e^{-\lambda_a} \over k!}\).</p>
<h2 id="co-occurring-point-processes">Co-occurring point processes</h2>
<p>We can define co-occurrence - reasonably - as the probability that there is at least one of both species in a given cell. This definition is, of course, contingent on the area of that cell. We might write that something like:
\({\cal C_{occ}} = P(A \geq 1|\lambda_a) \, P(B \geq 1|\lambda_b).\)
The probability that there is <em>at least one</em> of something is, conveniently, equal to 1 minus the probability that there is <em>none</em> of that something, so the probability of co-occurrence is \({\cal C_{occ}} = (1 - P(A = 0)) \, (1 - P(B = 0)) = (1 - e^{-\lambda_a})(1 - e^{-\lambda_b}).\) Note, we’re letting the $\lambda$ be unique for each of the two species.</p>
<p>This is a nice tidy result. In our case, $(1 - 1/e)^2 \approx 0.4$ and, sure enough, there are 38 out of 100 co-occupied squares in the simulation.</p>
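<p>As a quick sanity check (a sketch of mine, not part of the original analysis), we can simulate many such grids directly - counts per unit square of a homogeneous Poisson process with intensity 1 are simply Poisson(1) draws - and compare the observed share of co-occupied squares to $(1 - e^{-\lambda_a})(1 - e^{-\lambda_b})$:</p>

```r
# Simulate counts of two independent species on a 10 x 10 grid of unit
# squares, and compare the proportion of co-occupied squares to the
# analytical co-occurrence probability (1 - exp(-1))^2, which is about 0.4.
set.seed(2020)
lambda.a <- 1; lambda.b <- 1
co.occupied <- replicate(1e4, {
  A <- rpois(100, lambda.a)   # per-square counts of species A
  B <- rpois(100, lambda.b)   # per-square counts of species B
  mean(A >= 1 & B >= 1)       # proportion of squares with at least one of each
})
mean(co.occupied)                              # simulated co-occurrence probability
(1 - exp(-lambda.a)) * (1 - exp(-lambda.b))    # analytical value
```

The simulated proportion settles right on the analytical value, which is reassuring.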
<p>Even more conveniently, if $\lambda$ is very small (which would happen if, say, the area of our squares were smaller, or the density of points were lower), then $1 - e^{-\lambda}$ looks an awful lot like just $\lambda$, which gives the even simpler result that the co-occurrence probability is $\lambda^2$ (or, for distinct species, $\lambda_a \lambda_b$).</p>
<p><img src="../../assets/post02/ExpLambda-1.png" alt="" /><!-- --></p>
<p>The intensity / density parameter $\lambda$ captures the general likelihood of an animal being found in a general location. It is “scale robust”, because it is a nice, useful, meaningful measure despite the fact that the probability of being in any one (infinitesimally small) location is always 0, and being anywhere in the universe is 1.</p>
<p>So, <em>in a very similar way</em>, that $\lambda^2$ (which has weird units of density² - or, e.g., $n_A n_B/km^4$) is a legitimate measure of “co-occurrence intensity” that can be considered similarly “scale-robust” and meaningful for any location in space.</p>
<h2 id="what-does-this-have-to-do-with-rsfs">What does this have to do with RSF’s?</h2>
<p>The “paradigm shift” I alluded to earlier is to abandon any pretense or interest in RSF’s as modeling a probability (as <a href="https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2656.12132">McDonald 2013</a> would suggest - don’t even think of RSF’s as a “logistic regression” at all), or even as a “weighted spatial distribution” (as per <a href="https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1890/0012-9658(2006)87[3021:WDAEOR]2.0.CO;2">Lele and Keim 2006</a>), but to embrace them as an estimate - specifically - of the <em>spatial intensity of an inhomogeneous Poisson process</em>, i.e. as spatially explicit estimates of $\lambda$ above.</p>
<p>I’ll unpack this a bit below, but really, I can’t recommend enough this excellent video lecture by <a href="https://fwcb.cfans.umn.edu/personnel/john-fieberg">John Fieberg</a> at <a href="https://www.youtube.com/watch?v=6IXM8DZ6qVc">this link</a>, along with associated online materials <a href="https://movebankworkshopraleighnc.netlify.com/presentations">here</a>, especially <a href="https://movebankworkshopraleighnc.netlify.app/presentations2019/IntroRSFandSDM.pdf">Fieberg’s slides</a>. It’s an “introductory lecture”, but one that presents the RSF machinery explicitly in the context of inhomogeneous point processes (what Fieberg rightly calls “The Great Unifier”).</p>
<p>So, when we fit RSF’s we usually have a Used / Available (ideal would be <em>“Not Used”</em> … but that’s far too rare) design where environmental variables are collected for the observed set and the “available” set, which is usually some sort of sampling reflecting a null hypothesis. We then do what looks like a logistic regression, which fits the following model</p>
\[P(Y = 1) = {\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + ...) \over {1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + ...)}}\]
<p>where the $X_i$’s are the covariates we think are useful predictors, and the corresponding $\beta$’s are the regression coefficients. Positive values of $\beta$ indicate higher probabilities of use, negative values indicate lower probabilities of use.</p>
<p>However, since we are not actually interested in probabilities (e.g. in the intercept of the logistic regression above), the Resource Selection <em>Function</em> itself is usually defined as just the numerator, with the intercept simply tossed out:</p>
\[w_i(x, \beta) = \exp(\beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + ...)\]
<p>This has the advantage of being simpler to look at, but opens up the question of what <em>is</em> $w(x,\beta)$.</p>
<p>So, bypassing other interpretations, approaches and debates, the bit of weird, deep, almost magical insight is that if you give the “available” subset <em>arbitrarily large weights</em>, the function $w$ leads directly to a good estimate of $\lambda(x)$! Specifically:
\(\widehat{\lambda}(x) = {n \, w(x, \beta) \over \int_A w(x, \beta) \, dx}\)</p>
<p>Let’s illustrate this. We’re going to simulate species $A$, but with a single covariate that’s just the $X$ coordinate, i.e. the points are more concentrated to the east than in the west:</p>
<p><img src="../../assets/post02/Simulation1-1.png" alt="" /><!-- --></p>
<p>Now, I <em>know</em> the density increases <em>linearly</em> with <em>x</em>, and to capture that we actually have to fit the logistic regression against $\log(x)$ as the covariate. This is a strong argument, by the way, for taking the log transform of any “distance-to” variable in an RSF if you want to model a linear relationship between densities and distances, AND if you want the density to be 0 at the 0 edge. Otherwise, that modeled relationship will always increase exponentially, which causes all sorts of problems.</p>
<p>Below, in a few lines of code, I simulate the inhomogeneous point process and fit the model. NOTE the arbitrarily high weight (1000) for the null locations, even though in reality there are only three times as many null points as used points:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">z1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="m">10</span><span class="o">*</span><span class="n">rbeta</span><span class="p">(</span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1i</span><span class="o">*</span><span class="n">runif</span><span class="p">(</span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="p">)</span><span class="w">
</span><span class="n">null</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">runif</span><span class="p">(</span><span class="m">300</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1i</span><span class="o">*</span><span class="n">runif</span><span class="p">(</span><span class="m">300</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">10</span><span class="p">)</span><span class="w">
</span><span class="n">df</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">rbind</span><span class="p">(</span><span class="n">data.frame</span><span class="p">(</span><span class="n">Used</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">,</span><span class="w"> </span><span class="n">X</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">Re</span><span class="p">(</span><span class="n">z1</span><span class="p">),</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">),</span><span class="w">
</span><span class="n">data.frame</span><span class="p">(</span><span class="n">Used</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">X</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">Re</span><span class="p">(</span><span class="n">null</span><span class="p">),</span><span class="w"> </span><span class="n">w</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1000</span><span class="p">))</span><span class="w">
</span><span class="n">fit1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glm</span><span class="p">(</span><span class="n">Used</span><span class="o">~</span><span class="nf">log</span><span class="p">(</span><span class="n">X</span><span class="p">),</span><span class="w"> </span><span class="n">weights</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">w</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">df</span><span class="p">,</span><span class="w"> </span><span class="n">family</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"binomial"</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">fit1</span><span class="p">)</span><span class="o">$</span><span class="n">coef</span><span class="w">
</span></code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -9.826814 0.3838600 -25.599998 1.52561e-144
## log(X) 1.140449 0.2049926 5.563366 2.64620e-08
</code></pre></div></div>
<p>The estimated coefficient on the $\log(X)$ variable is 1.14 (s.e. 0.2). Equating that (intercept-free) exponential bit with the intensity function, the inhomogeneous intensity $\lambda$ (recall, <em>inhomogeneous</em> here just means <em>a function of x and y</em>) becomes:
\(\widehat{\lambda}(x,y) ={ n \, \exp(\beta \log(x)) \over \int_0^{Y} \int_0 ^ {X} \exp(\beta \log(x)) \, dx \, dy}\)
where <em>X</em> and <em>Y</em> are the dimensions of the rectangular area. For this particular log-distance model, the whole thing breaks down into the following result:</p>
\[\widehat{\lambda}(x,y) = { n \, x^\beta \over Y \int_0^X x^\beta \, dx} = {1 + \beta \over A} \left({x \over X}\right)^\beta n\]
<p>where $A$ is the overall area <em>XY</em>.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
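<p>To convince ourselves of that closed form, here is a small self-contained check (my sketch, re-simulating data much as in the chunk above) that the weighted logistic fit, normalized as an intensity, matches $\widehat{\lambda}(x,y) = {1 + \beta \over A} \left({x \over X}\right)^\beta n$:</p>

```r
# Re-simulate the linearly-increasing point process, refit the weighted
# logistic regression, and compare the numerically normalized intensity
# to the closed form (1 + beta)/A * (x/X)^beta * n.
set.seed(1)
n <- 100; X <- 10; Y <- 10; A <- X * Y
x.used <- 10 - 10 * rbeta(n, 1, 2)    # density proportional to x
x.null <- runif(300, 0, 10)
df <- rbind(data.frame(Used = TRUE,  x = x.used, w = 1),
            data.frame(Used = FALSE, x = x.null, w = 1000))
fit <- glm(Used ~ log(x), weights = w, data = df, family = "binomial")
beta.hat <- unname(coef(fit)["log(x)"])   # should land near the true value of 1

xx <- seq(0.05, X - 0.05, 0.1)            # grid-cell midpoints along x
w.hat <- xx^beta.hat                      # the intercept-free exponential bit
lambda.numeric <- n * w.hat / (sum(w.hat) * 0.1 * Y)   # normalize by the integral
lambda.closed  <- (1 + beta.hat) / A * (xx / X)^beta.hat * n
max(abs(lambda.numeric - lambda.closed))  # difference is numerically negligible
```

The only discrepancy between the two curves comes from approximating the normalizing integral by a sum over grid cells.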
<p>You can see our prediction below against the observations:</p>
<p><img src="../../assets/post02/IllustratingRSFLambda-1.png" alt="" /><!-- --></p>
<p>It’s a good model! And it shows how to directly link an RSF result to an intensity / density.</p>
<h2 id="nice-rsf-but-what-about-co-occurrence">Nice RSF, but what about co-occurrence?</h2>
<p>We’re just about there.</p>
<p>Here’s another - final - simulated data set. In this version, there are two species: 100 muskAxe (A) and 400 cariBoo (B)<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>. They have somewhat different responses to two covariates, which (for simplicity) are again just the geographical coordinates X and Y. Specifically, the muskaxes are more likely to be found towards the north and east, and the cariboos are concentrated near the south - in both cases in somewhat non-linear ways (quiet shout-out to the all-versatile <a href="https://en.wikipedia.org/wiki/Beta_distribution">Beta distribution</a>). We’ll put these animals on a raster, which has 100 m x 100 m resolution over an area of 100x100 km².</p>
<p><img src="../../assets/post02/Simulation2-1.png" alt="" /><!-- --></p>
<p>We fit some (logistic) GAM’s with, say, 1000 uniformly sampled null points from the availability set:</p>
<p><img src="../../assets/post02/modelFitting-1.png" alt="" /><!-- --></p>
<p>In an RSF analysis, people usually work with rasters of the covariates (in this case, just X and Y values). We can convert our GAM’s into a surface of <em>intensities</em> with the following steps:</p>
<ul>
<li>(a) predict over the raster,</li>
<li>(b) subtract away the intercept and place that in the exponent (that’s $w$),</li>
<li>(c) “normalize” that $w$ by dividing by its sum times the grid-cell area $\Delta x \Delta y$<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></li>
<li>(d) multiply the whole thing by the number of total individuals.</li>
</ul>
<p>The code will look something like this:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">A.intercept</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glm.A</span><span class="o">$</span><span class="n">coefficients</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w">
</span><span class="n">B.intercept</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">glm.B</span><span class="o">$</span><span class="n">coefficients</span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w">
</span><span class="n">w.A</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="n">predict</span><span class="p">(</span><span class="n">xy.brick</span><span class="p">,</span><span class="w"> </span><span class="n">glm.A</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">A.intercept</span><span class="p">)</span><span class="w">
</span><span class="n">w.B</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="n">predict</span><span class="p">(</span><span class="n">xy.brick</span><span class="p">,</span><span class="w"> </span><span class="n">glm.B</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">B.intercept</span><span class="p">)</span><span class="w">
</span><span class="n">lambda.brick</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">w.A</span><span class="o">/</span><span class="p">(</span><span class="nf">sum</span><span class="p">(</span><span class="n">getValues</span><span class="p">(</span><span class="n">w.A</span><span class="p">))</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">res</span><span class="p">(</span><span class="n">w.A</span><span class="p">)[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">res</span><span class="p">(</span><span class="n">w.A</span><span class="p">)[</span><span class="m">2</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">n.A</span><span class="w">
</span><span class="n">lambda.brick</span><span class="p">[[</span><span class="m">2</span><span class="p">]]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">w.B</span><span class="o">/</span><span class="p">(</span><span class="nf">sum</span><span class="p">(</span><span class="n">getValues</span><span class="p">(</span><span class="n">w.B</span><span class="p">))</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">res</span><span class="p">(</span><span class="n">w.B</span><span class="p">)[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">res</span><span class="p">(</span><span class="n">w.B</span><span class="p">)[</span><span class="m">2</span><span class="p">])</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">n.B</span><span class="w">
</span></code></pre></div></div>
<p>And the resulting density predictions:</p>
<p><img src="../../assets/post02/Lambda_Fitted-1.png" alt="" /><!-- --></p>
<p>We can confirm that the numbers are “correct” by making sure that the mean densities are 100 ind./(100 km x 100 km) = 0.01 ind./km² and 400 ind./(100 km x 100 km) = 0.04 ind./km²:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mean</span><span class="p">(</span><span class="n">getValues</span><span class="p">(</span><span class="n">lambda.brick</span><span class="p">[[</span><span class="m">1</span><span class="p">]]))</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">0.01</span><span class="w">
</span><span class="n">mean</span><span class="p">(</span><span class="n">getValues</span><span class="p">(</span><span class="n">lambda.brick</span><span class="p">[[</span><span class="m">2</span><span class="p">]]))</span><span class="w">
</span><span class="o">></span><span class="w"> </span><span class="p">[</span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="m">0.04</span><span class="w">
</span></code></pre></div></div>
<p>Looks good!</p>
<p>Finally, we’re ready to plot a <em>co-occurrence intensity</em> plot, and all we have to do is multiply the two intensities!</p>
<p><img src="../../assets/post02/CooccurrencePlot-1.png" alt="" /><!-- --></p>
<p>This, then, is a map of the “intensity” of co-occurrence, which - again - is in weird units of ind² / km⁴, but is actually a fairly straightforward measure. It says that, per km², you can expect at most about 0.0016 muskax and cariboo to share that unit of space (compared to an overall co-occurrence density of 0.004). Or - you can state that as a probability (and expand the geographic range) and say that in a 10 km x 10 km area, the probability of encountering at least one cariboo and one muskax peaks at something like 16%.</p>
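<p>To make that probability conversion concrete: if the two species are treated as independent Poisson processes with local intensities $\lambda_1$ and $\lambda_2$, the probability of at least one of each in a window of area $A$ is $(1 - e^{-\lambda_1 A})(1 - e^{-\lambda_2 A})$. Here is a sketch in Python (not this post’s R workflow, and with illustrative intensities rather than the fitted surfaces):</p>

```python
import math

def p_at_least_one_of_each(lam1, lam2, area):
    """Probability of at least one individual of each species in a window
    of the given area, treating the species as independent homogeneous
    Poisson processes with intensities lam1, lam2 (ind. per unit area)."""
    return (1 - math.exp(-lam1 * area)) * (1 - math.exp(-lam2 * area))

# illustrative values only: overall mean densities of 0.01 and 0.04
# ind./km^2, evaluated over a 10 km x 10 km (100 km^2) window
p = p_at_least_one_of_each(0.01, 0.04, 100)
```

<p>Plugging the local peaks of the fitted intensity surfaces into the same formula gives the kind of peak co-occurrence probability quoted above.</p>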
<blockquote>
<p>Note that this co-occurrence intensity is weighted more towards the cariboo, a reflection of the fact that there are 4 times more caribou <em>in the data</em>. However (and this is very important), presumably what’s of actual interest is co-occurrence <em>across the entire population</em>. And for that, you need an important piece of information that is not always available, namely <em>the population estimate of each species</em>!</p>
</blockquote>
<p>With that important piece, one can now take two (or more) RSFs and turn those into a co-occurrence intensity map. Easy peasy.</p>
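<p>As a sketch of that last step (in Python rather than this post’s R/raster workflow; <code>rsf_to_intensity</code> is a hypothetical helper name), the normalization is just: divide the RSF weights by their integral over the grid, then multiply by the population estimate:</p>

```python
import numpy as np

def rsf_to_intensity(w, dx, dy, n_total):
    """Rescale a grid of RSF weights `w` into an intensity surface
    that integrates to `n_total` individuals (units: ind. per unit area)."""
    return w / (w.sum() * dx * dy) * n_total

rng = np.random.default_rng(1)
w = rng.random((100, 100))                # hypothetical RSF weight surface
lam = rsf_to_intensity(w, 1.0, 1.0, 400)  # e.g. 400 animals, 1 km x 1 km cells
total = lam.sum() * 1.0 * 1.0             # integrates back to the estimate
```

<p>A co-occurrence surface is then just the cell-wise product of two such intensity grids.</p>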
<p><strong>Higher-order multi-species Arctic megafauna co-occurrence, at least in the imagination of a carved woolly mammoth, is illustrated below.<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup></strong></p>
<p><img src="../../assets/post02/dreamingmammoth.jpg" alt="" /></p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Can paradigm shifts even be personal? <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p><em>Actually</em> - we know that the “true” answer is $\beta = 1$ (which fits within the standard error of the regression estimate), which makes the whole thing reduce to simply: $\widehat{\lambda}(x,y) = {2n \over AX} x$, i.e. a linear density that ranges from 0 to $2n/A$, averaging out to $n/A$. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Out of something resembling principle, I refuse to give simulated animals names of actual species! <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Recall that the discrete version of $\int_0^{Y} \int_0^{X} f(x,y) \, dx \, dy$ is $\sum_{i = 1}^{n_x} \sum_{j = 1}^{n_y} f(x_i,y_j) \Delta x \Delta y$ <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>With an essential shout-out to the anonymous carvers from Taymir, Russia … this picture came from somewhere within this website: <a href="http://www.tdnt.org">http://www.tdnt.org</a> <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<p><em>Elie Gurarie (egurarie@esf.edu)</em></p>
<blockquote style="font-size: 80%">
<p>An interesting question came up the other day from a colleague who is modeling the spatial interaction between two animals (caribou and muskox) whose ranges overlap in a study area in a portion of the Canadian Arctic. She has fitted two resource selection functions (RSFs) - one for the caribou, one for the muskox - identified some relevant differences in their preferences for different elevations, vegetation types, distance from water, etc., and is now wondering if she can use those results to create a map of “co-occurrence” of the two species on the landscape. This question is not entirely abstract - there is local concern in this particular region that the muskox (which had been extirpated and more recently reintroduced) are competing with the caribou, which are an important local resource for subsistence. The socio-ecological context - as always - is interesting and complicated. But the statsy question itself is also interesting, as it forces us to think about what an RSF really tells us, and what co-occurrence really means.</p>
</blockquote>
<h1 id="putting-p-values-on-social-aggregations">Putting <em>p</em>-values on social aggregations</h1>
<p><em>2020-07-01 · <a href="https://eligurarie.github.io/Social-P-Values">https://eligurarie.github.io/Social-P-Values</a></em></p>
<p>This is my first blog post on this site. It is therefore, in large part, an experiment to see if this whole <a href="https://mmistakes.github.io/minimal-mistakes/about/">blogging mechanism</a> works. Foolishly, it is a long post, and an ambitious one (technically speaking). Also - foolishly - it is laden with both content and context.</p>
<p>The context is that I’ve spent too much of the past few days trying to answer a question that I thought would be fairly straightforward. Now that I have <em>a</em> solution - which is only mostly satisfactory - it seems worthwhile to record some of the twists and lessons along the way (<em>never mind that it took an</em> <strong><em>additional</em></strong> <em>few days trying to get this blog to post!</em>). There are some fun themes here: randomness, constructing null hypotheses, spatial point processes, and a teeny bit of R code. In the end, I have a tool which (I hope) will be useful for getting real insights into animal behavior (in this case - caribou), but that might be of broader interest as well. Perhaps others can suggest better ways to get there (<em>assuming the comments feature below works!</em>)</p>
<h2 id="the-context">The context</h2>
<p>I spend a lot of time studying movement data on caribou in North America. Caribou are mysterious in many, many ways, and there are many unique challenges in analyzing their data - grist for many (<em>almost surely never-to-be-written</em>) blog posts. Not least of these challenges is that in a herd that can contain tens or even hundreds of thousands of animals, there are usually only around 20 animals collared at a time. At best - 50-ish. So broad inferences have to be made (carefully) from very small samples.</p>
<p><strong>This, for the record, is what 50 (red) points out of 100,000 looks like:</strong></p>
<p><img src="../../assets/post01/SmallSample-1.png" alt="" /></p>
<p><strong>And this is what congregating caribou look like:</strong> <br /></p>
<p><img src="../../assets/post01/Joly1.png" alt="" /></p>
<p><em>(Image by K. Joly - from a recent <a href="https://www.mdpi.com/2072-6651/12/5/334">paper</a> published, unexpectedly, in the journal</em> Toxins).</p>
<p>Obviously caribou are social and aggregate. The question is: if we observe only a very few of them, can we detect <em>significant</em> social aggregation? And - more relevantly - see how those aggregations vary over time?</p>
<h2 id="a-super-straightforward-estimate">A super-straightforward estimate</h2>
<p>Let’s take a look at a very straightforward measure: how many really <em>close</em> encounters are there in some subset of data? Below is a figure of the number of pair-wise distances among all caribou (from one particular herd, in one particular year) that are less than 200, 100 and 50 m. Note - the range size for this particular herd is on the order of hundreds of km in a given dimension, so these distances are really quite small.</p>
<p>Ok, here’s the graph:</p>
<p><img src="../../assets/post01/EncountersPlot-1.png" alt="" /><!-- --></p>
<p>A lot of variation here - plenty of days with no animals within those encounter radii, and then some days when there are 30 or more close encounters. And a lot of biologically interesting patterns. For example, inter-individual distances are (surprisingly) very low during the spring migration period (essentially - May), peak during the early summer calf-raising period, but are much lower during late summer.</p>
<p>Let’s zoom in on just a 10-day period and see what’s going on. The red crosses indicate pairs of animals within 200 m of each other, with the count (in red) at the bottom of each panel. The grey blob indicates the 80% kernel density of the complete set of points (which neatly excludes those animals that are far off to the north):</p>
<p><img src="../../assets/post01/ZoomingIn-1.png" alt="" /><!-- --></p>
<p>Two things to note: The size of the blob stays pretty constant - and the number of individuals is the same. But the number of encounters varies A LOT, peaking at 27 on June 30 and crashing to 0 a week later on July 7.</p>
<p>Again - there’s a super interesting behavioral question here, and intriguing ecological hypotheses to explore. But the main question here is statistical, namely: are those numbers of encounters <em>more</em> than expected? Are others <em>less</em> than expected? Can we get to that oh so hotly desired crutch of all inference … a <em>p</em>-value from these observations?</p>
<h2 id="some-point-processes">Some point processes</h2>
<p>Before we get to a <em>p</em>-value, it’s helpful to simulate data so we can really, really know what’s going on. I used various <a href="https://www.rdocumentation.org/packages/spatstat/versions/1.64-1/topics/runifpoint">random</a> <a href="https://www.rdocumentation.org/packages/spatstat/versions/1.64-1/topics/rMatClust">point-generation</a> <a href="https://www.rdocumentation.org/packages/spatstat/versions/1.64-1/topics/rSSI">functions</a> in the package <a href="https://cran.r-project.org/web/packages/spatstat/index.html"><code class="language-plaintext highlighter-rouge">spatstat</code></a> to create three distributions of 32 points each, just as with the caribou data above:</p>
<p><img src="../../assets/post01/ThreeProcessesPlots-1.png" alt="" /><!-- --></p>
<p>The “clustered” process looks like there are a bunch of aggregations. The “inhibited” process looks like everyone is a superchamp social distancer. The third one - is, well, perfectly random (which - maybe to many people - looks like it’s bunched up in funny ways, but that’s to be attributed to our miserable human intuition for randomness).</p>
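<p>For anyone outside of R, the three flavors can be mimicked in a few lines. Here is a Python sketch (plain stdlib; <code>csr</code>, <code>clustered</code> and <code>inhibited</code> are my crude stand-ins for <code>runifpoint</code>, <code>rMatClust</code> and <code>rSSI</code>, not their actual algorithms):</p>

```python
import math
import random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def csr(n, side):
    """Complete spatial randomness: uniform points in a square."""
    return [(random.uniform(0, side), random.uniform(0, side)) for _ in range(n)]

def clustered(n, side, n_parents=5, spread=0.3):
    """Crude Matern-style clustering: offspring scattered around parent points."""
    parents = csr(n_parents, side)
    pts = []
    while len(pts) < n:
        px, py = random.choice(parents)
        pts.append((px + random.gauss(0, spread), py + random.gauss(0, spread)))
    return pts

def inhibited(n, side, r_min=0.5):
    """Simple sequential inhibition: reject candidates closer than r_min."""
    pts = []
    while len(pts) < n:
        cand = (random.uniform(0, side), random.uniform(0, side))
        if all(dist(cand, q) >= r_min for q in pts):
            pts.append(cand)
    return pts

side = math.sqrt(32)  # unit density: 32 points on an area of 32
pts_random = csr(32, side)
pts_clustered = clustered(32, side)
pts_inhibited = inhibited(32, side)
```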
<p>If we count the number of encounters - which we’ll define as the number of unique pairs within 0.3 distance units of each other - we get the following results:</p>
<ul>
<li><strong>clustered:</strong> 28</li>
<li><strong>inhibited:</strong> 0</li>
<li><strong>random:</strong> 4</li>
</ul>
<p>So the “random” number is a Goldilocks number, not too few, not too many. And the process that generated it is a decent working <em>null hypothesis</em>. That process is <a href="https://en.wikipedia.org/wiki/Complete_spatial_randomness">complete spatial randomness (sometimes CSR)</a>, or, more wonkily, a <a href="https://en.wikipedia.org/wiki/Poisson_point_process">homogeneous spatial Poisson point process</a>. This is quick to generate, and quick to summarize, so a super straightforward approach is just to simulate the process a bunch of times and count the encounters.</p>
<p>I don’t - generally - want to clutter these blog posts (<em>note how ambitiously I anticipate future posts!</em>) with too much R code, but the following code is maybe worth sharing (despite the horrors of a loop!) because it relies entirely on very <a href="https://www.shakespeareswords.com/Public/Glossary.aspx?id=1414">“base”</a> functions:</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># count the unique pairs of points (stored as complex numbers) within distance r</span><span class="w">
</span><span class="n">getNwithinR</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">z</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="p">){</span><span class="w">
  </span><span class="n">D</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">outer</span><span class="p">(</span><span class="n">z</span><span class="p">,</span><span class="w"> </span><span class="n">z</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">z1</span><span class="p">,</span><span class="n">z2</span><span class="p">)</span><span class="w"> </span><span class="nf">Mod</span><span class="p">(</span><span class="n">z2</span><span class="o">-</span><span class="n">z1</span><span class="p">))</span><span class="w"> </span><span class="c1"># pairwise distance matrix</span><span class="w">
  </span><span class="nf">sum</span><span class="p">(</span><span class="n">D</span><span class="p">[</span><span class="n">upper.tri</span><span class="p">(</span><span class="n">D</span><span class="p">)]</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">r</span><span class="p">)</span><span class="w"> </span><span class="c1"># upper triangle counts each pair once</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">density</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1</span><span class="p">;</span><span class="w"> </span><span class="n">n.ind</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">32</span><span class="p">;</span><span class="w"> </span><span class="n">area</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">n.ind</span><span class="o">/</span><span class="n">density</span><span class="w">
</span><span class="c1"># simulate complete spatial randomness 10,000 times, counting encounters each time</span><span class="w">
</span><span class="n">N.enc.null</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="kc">NA</span><span class="p">,</span><span class="w"> </span><span class="m">1e4</span><span class="p">)</span><span class="w">
</span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="nf">length</span><span class="p">(</span><span class="n">N.enc.null</span><span class="p">)){</span><span class="w">
  </span><span class="n">z</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">runif</span><span class="p">(</span><span class="n">n.ind</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="nf">sqrt</span><span class="p">(</span><span class="n">area</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1i</span><span class="o">*</span><span class="n">runif</span><span class="p">(</span><span class="n">n.ind</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="nf">sqrt</span><span class="p">(</span><span class="n">area</span><span class="p">))</span><span class="w">
  </span><span class="n">N.enc.null</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">getNwithinR</span><span class="p">(</span><span class="n">z</span><span class="p">,</span><span class="w"> </span><span class="m">.3</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>Here’s the resulting histogram, with our three observations (inhibited, random, clustered):</p>
<p><img src="../../assets/post01/HistogramNull-1.png" alt="" /><!-- --></p>
<p>We can get pretty precise empirical (randomization) <em>p</em>-values on our three observations: the 28 encounters in the clustered pattern (red line) are WAY HIGHER than expected: <em>p</em>-value = 0. The <em>inhibited</em> pattern (with 0 encounters) is also unlikely under the null, though at least detectable: exactly 0 encounters occurred in 116 of 10,000 simulations, for a <em>p</em>-value of 0.0116. The random result is smack dab in the middle of the null distribution.</p>
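<p>For the record, those empirical <em>p</em>-values are nothing more than tail proportions of the simulated null counts. A minimal sketch (Python, with a toy null sample standing in for the 10,000 CSR simulations; <code>randomization_p</code> is my name for it):</p>

```python
def randomization_p(observed, null_counts, alternative="greater"):
    """One-sided empirical p-value: the fraction of null simulations
    at least as extreme as the observed encounter count."""
    if alternative == "greater":
        extreme = sum(x >= observed for x in null_counts)
    else:  # alternative == "less"
        extreme = sum(x <= observed for x in null_counts)
    return extreme / len(null_counts)

null = [3, 4, 5, 2, 6, 4, 3, 5, 4, 4]           # toy stand-in for N.enc.null
p_clustered = randomization_p(28, null)          # 0.0: nothing reaches 28
p_inhibited = randomization_p(0, null, "less")   # 0.0 in this toy sample
```

<p>With the real simulated sample, the inhibited case above corresponds to <code>randomization_p(0, N.enc.null, "less")</code> = 116/10000 = 0.0116.</p>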
<h2 id="analytical-results">Analytical results!</h2>
<p>Randomization can get you really, really far in inferential life. It is, unfortunately, not really practical with very large amounts of data. And, it turns out, we can - with not too much difficulty - derive a formula that takes the number of individuals (<em>n</em>), an area of use (<em>a</em>), and an encounter radius (<em>r</em>) and provides a good null distribution, against which observations of encounters can be compared.</p>
<p>The homogeneous Poisson process is defined by only one parameter, $\lambda$, called the <em>intensity</em> (generically). In two dimensions, the intensity is just the <em>density</em> of points. That’s the process, but the <em>statistic</em> we observed is the Number of Encounters, given <em>r</em>, <em>a</em>, and <em>n</em> (let’s call that $E_{total}(r,n,a)$), and the real question is how is THAT random variable distributed.</p>
<p>The trick is to think of the pair-wise encounters as (independent) events, and compute their probability. The <em>number</em> of unique pairs is <em>n(n-1)/2</em> … that’s maybe most easily seen as the upper triangle of a matrix of all links:</p>
<p><img src="../../assets/post01/DMatrix-1.png" alt="" /><!-- --></p>
<p>Every cell of the upper half of this matrix represents one pair-wise link among 32 individuals, and - in this case - the smattering of dots represents those pairs whose distance is less than 1. Those pairwise distances (and this is a very important point) are <em>themselves</em> completely random. And their probability is easy to calculate. If you randomly pick a point in space - anywhere - the probability that another individual is within radius <em>r</em> of that point is the area of a circle of radius <em>r</em> divided by the total area <em>a</em>, so: $p = \pi r^2 / a$.</p>
<p>So the distribution of the number of unique encounters is just the sum of a bunch of (low) probability events of probability $p$ over $n(n-1)/2$ possible pairs, i.e. a <a href="https://en.wikipedia.org/wiki/Binomial_distribution"><em>Binomial distribution</em></a>, with the two parameters $p = \pi r^2 / a$ and $n’ = n(n-1)/2$. This is a really easy distribution to work with!</p>
\[E_{total}(r,n,a) \sim Binomial\left(\pi r^2 / a,\ {n(n-1)\over2}\right)\]
<p>Furthermore, if $p$ is very small (which it generally will be) and $n’$ is reasonably large, which it ought to be, we can <a href="https://en.wikipedia.org/wiki/Binomial_distribution#Poisson_approximation">approximate it as a Poisson distribution</a>, with intensity $\lambda = p n’$, so, to a very good approximation:</p>
\[E_{total}(r,n,a) \sim Poisson\left(\lambda = {\pi r^2 \, n(n-1) \over 2 a}\right)\]
<p>The expected value (mean) of both of these distributions is $\lambda$.</p>
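<p>Plugging the simulation’s settings (<em>n</em> = 32, <em>r</em> = 0.3, <em>a</em> = 32) into the Poisson version makes the null concrete (a quick Python check, stdlib only):</p>

```python
import math

def null_lambda(n, r, a):
    """Expected number of pairwise encounters under CSR:
    lambda = pi r^2 n(n-1) / (2a)."""
    return math.pi * r**2 * n * (n - 1) / (2 * a)

lam = null_lambda(n=32, r=0.3, a=32)  # ~ 4.38 expected encounters
p_zero = math.exp(-lam)               # Poisson P(0 encounters), ~ 0.0125
```

<p>That $P(0) \approx 0.0125$ lands right next to the empirical 0.0116 from the randomization above.</p>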
<p>A very tidy and simple result! Let’s compare these with our simulated result:</p>
<p><img src="../../assets/post01/ComparingModels-1.png" alt="" /><!-- --></p>
<p>You can see pretty darned good correspondence (the binomial and Poisson models are indistinguishable), though there is a slight shift towards fewer observations than predicted. In fact, the expected number of encounters (according to our model - with <em>n</em> = 32, r = 0.3, area = 32) is 4.38, whereas the actual (simulated) mean is somewhat lower at 4.21. This is almost certainly because of edge effects: points closer to the edge of the area are somewhat less likely to have a close neighbor. But, that seems like a minor effect (and certainly a very difficult one to correct for).</p>
<h2 id="revisiting-the-data">Revisiting the data</h2>
<p>So … we now have all the tools needed to apply a statistical test to observations of encounters! The steps are as follows:</p>
<ol>
<li>Find all of the encounters on a given day for a given radius.</li>
<li>Compute a “ranging area” (which - in the most hand-wavy bit of this analysis - I’ll set to an 80% kernel density estimate … because I <em>know</em> using my familiarity with the data and all-powerful and too-little-used “biological intuition” that there are always a few stragglers and idiosyncratic, free-thinking ne’er-do-wells in <em>every</em> animal population that are best left out of the whole computation).</li>
<li>Use the binomial distribution under the null assumption of random uniform distribution within that ranging area to obtain an expected number of encounters and a <em>p</em>-value for (either) the hypothesis of too few encounters, or too many encounters.</li>
</ol>
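<p>The three steps above can be sketched end to end. The snippet below (Python stdlib; <code>encounter_test</code> and the specific radius/area inputs are illustrative, and the kernel-density step is assumed to have already produced the ranging area) returns both one-sided binomial <em>p</em>-values:</p>

```python
import math

def encounter_test(n_enc, n_ind, r, area):
    """One-sided p-values for an observed number of pairwise encounters,
    under the Binomial(n' = n(n-1)/2, p = pi r^2 / area) null."""
    pairs = n_ind * (n_ind - 1) // 2
    p = math.pi * r**2 / area
    def pmf(k):
        return math.comb(pairs, k) * p**k * (1 - p)**(pairs - k)
    p_too_few = sum(pmf(k) for k in range(0, n_enc + 1))       # P(X <= obs)
    p_too_many = sum(pmf(k) for k in range(n_enc, pairs + 1))  # P(X >= obs)
    return p_too_few, p_too_many

# illustrative inputs: 27 encounters among 32 collared animals with a
# 200 m (0.2 km) radius, in a ~3000 km^2 late-June ranging area
too_few, too_many = encounter_test(27, 32, r=0.2, area=3000)
```

<p>With these inputs the expected count is only about 0.02, so 27 observed encounters is, as the plot below shows, wildly significant.</p>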
<p>Here’s how that looks:</p>
<p><img src="../../assets/post01/Results-1.png" alt="" /><!-- --></p>
<h2 id="what-to-make-of-this">What to make of this!?</h2>
<p>There are a few things to unpack here. The “ranging area” - as inferred from this group of individuals - fluctuates a LOT over these months, and the peak of encounters corresponds to a period in late June / early July when that area is particularly small (about 3000 km$^2$ compared to over 30,000 km$^2$).</p>
<p>Most dramatically: the expected number of encounters is VERY VERY SMALL! It never even reaches 0.1. That means that even a single close encounter is going to be statistically significant, and 30 encounters is astronomical. The variation in that curve, however, is very interesting, as it does reflect the shift in total range area - but the two don’t entirely line up, and that plummeting cliff in the number of encounters is unexplained by that variation.</p>
<p>This suggests that the “forces” that drive the caribou - in general - to a smaller area might also be related to forces that drive caribou to become exceptionally “close,” though the immediate drivers of high encounters remain unexplained (though we have some ideas!).</p>
<p>More relevant to this post, these results do strongly suggest that even a rather small sample of data from an enormous population is enough to reveal a very, very strong signal of social aggregation, which (frankly) I didn’t necessarily expect. On the other hand, it is important to consider whether the null hypothesis is just too unrealistic. It might be that the 80% kernel density is just too darned large, though even picking a very small core range won’t affect the results <em>that</em> much. It also might be that spatial randomness is not the best way to distribute null points, and that it would be better to account for higher densities in core areas of the range. That could have a stronger effect, though it would be hard to know how to generate a null distribution from an inhomogeneous process, or to trust any inference from the idiosyncratic sampling on higher-level properties of the distribution.</p>
<p>It is likely, for example, that not all space within the “ranging area” is similarly available and that there are topographic and geographic features which will tend to cluster the caribou. I am a little bit familiar with the portions of the Canadian Arctic that these animals hail from … and, while there are NO mountains or valleys, there might be significant patches which are simply too barren to be of any use. To account for that, we would need a good resource selection model as a foundation for the null distribution. This is a whole step more complex but - in this context - perhaps a worthwhile direction to pursue.</p>
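<p>For what it’s worth, drawing a null sample from an inhomogeneous surface is at least mechanically straightforward: pick grid cells with probability proportional to the fitted intensity, then jitter the point uniformly within the chosen cell. A sketch (Python; my illustration, not something implemented here), though the harder problem flagged above - trusting the fitted surface itself - remains:</p>

```python
import random

def sample_inhomogeneous(weights, n, cell_size=1.0):
    """Draw n points from a gridded intensity surface: choose cells with
    probability proportional to weight, then place the point uniformly
    within the chosen cell. `weights` is a list of rows of non-negative
    numbers; zero-weight cells can never be chosen."""
    cells = [(i, j, w) for i, row in enumerate(weights)
                       for j, w in enumerate(row) if w > 0]
    total = sum(w for _, _, w in cells)
    pts = []
    for _ in range(n):
        u = random.uniform(0, total)
        acc = 0.0
        for i, j, w in cells:
            acc += w
            if u <= acc:
                pts.append(((j + random.random()) * cell_size,   # x
                            (i + random.random()) * cell_size))  # y
                break
    return pts

# all weight in one cell, so every null point lands in that cell
pts = sample_inhomogeneous([[0, 0], [0, 5]], 10)
```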
<p>But, for now, I think the Binomial Aggregation Distance test (BAD) is - well - not <em>too</em> bad for these purposes.</p>
<p>(Sorry about that. I should quit while I’m ahead.)</p>
<p><em>Elie Gurarie (egurarie@esf.edu)</em></p>