R: How to efficiently identify overlapping time intervals with two more criteria?

I have to check bird observations made over a long period for duplicate/overlapping entries.

Observers at different points (A, B, C) made observations and marked them on paper maps. Those lines were then brought into a line feature with additional data for the species, the observation point, and the time interval in which the bird was seen.

Normally, the observers communicate with each other by phone while observing, but sometimes they forget, so I end up with duplicate lines.

I have already reduced the data to those lines which touch the circle, so I do not need a spatial analysis. I only have to compare the time intervals for each species, and can be quite sure that a match found by this comparison is the same individual.

I'm now looking for a way in R to identify those entries which:

  • are made on the same day with an overlapping interval
  • and where it is the same species
  • and which were made from different observation points (A or B or C or ...)
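The three criteria above can be sketched as a self-join in base R. This is only a minimal sketch under assumptions: the column names (`id`, `point`, `species`, `date`, `start`, `end`) and all values are made up for illustration and are not the original testdata. Intervals that merely touch at an endpoint count as overlapping here; use `<` instead of `<=` if that is not wanted.

```r
# Hypothetical toy data; column names and values are assumptions.
obs <- data.frame(
  id      = 1:5,
  point   = c("A", "B", "A", "C", "B"),
  species = c("Sst", "Sst", "Sst", "Rm", "Rm"),
  date    = as.Date(c("2017-03-14", "2017-03-14", "2017-03-14",
                      "2017-03-14", "2017-03-15")),
  start   = c("10:05", "10:07", "11:30", "10:06", "10:06"),
  end     = c("10:12", "10:15", "11:40", "10:10", "10:10"),
  stringsAsFactors = FALSE
)

# Combine date and clock time into POSIXct so intervals can be compared.
obs$start_dt <- as.POSIXct(paste(obs$date, obs$start),
                           format = "%Y-%m-%d %H:%M", tz = "UTC")
obs$end_dt   <- as.POSIXct(paste(obs$date, obs$end),
                           format = "%Y-%m-%d %H:%M", tz = "UTC")

# Self-join on species and date: all pairs of rows sharing both keys.
pairs <- merge(obs, obs, by = c("species", "date"), suffixes = c(".x", ".y"))

# Keep each unordered pair once, and only pairs from different points
# whose intervals overlap (start of one <= end of the other, both ways).
pairs <- pairs[
  pairs$id.x < pairs$id.y &
    pairs$point.x != pairs$point.y &
    pairs$start_dt.x <= pairs$end_dt.y &
    pairs$start_dt.y <= pairs$end_dt.x,
  c("species", "date", "id.x", "point.x", "id.y", "point.y")
]
print(pairs)
```

Joining on species and date avoids an explicit loop over species, at the cost of building every within-group pair first. For large data, a non-equi join (for example `foverlaps()` in data.table) may scale better.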


In this example, I manually found possibly duplicated entries of the same individual. The observation points differ (A and B), the species is the same (Sst), and the intervals between start and end times overlap.



I would now like to create a new field "duplicate" in my data.frame that gives both rows a common id, so that I can export them and decide later what to do with them.
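One possible way to derive such a common id, sketched in base R: given a table of row pairs flagged as overlapping, labels are propagated until every chain of linked rows (1 overlaps 2, 2 overlaps 3) shares one group. The `obs` and `pairs` objects and the columns `id`, `id.x`, `id.y` are hypothetical stand-ins, not the original data.

```r
# Hypothetical input: observation row ids and the pairs flagged as overlapping.
obs   <- data.frame(id = 1:6)
pairs <- data.frame(id.x = c(1, 2, 5), id.y = c(2, 3, 6))

# Start with each row labelled by its own id, then repeatedly pull both
# members of every pair down to the smaller label until nothing changes.
label <- setNames(obs$id, obs$id)
repeat {
  old <- label
  for (i in seq_len(nrow(pairs))) {
    a <- as.character(pairs$id.x[i])
    b <- as.character(pairs$id.y[i])
    label[c(a, b)] <- min(label[c(a, b)])
  }
  if (identical(old, label)) break
}

# Rows that never appear in a pair stay NA; the rest get consecutive group ids.
flagged <- obs$id %in% c(pairs$id.x, pairs$id.y)
obs$duplicate <- NA_integer_
obs$duplicate[flagged] <- match(label[flagged], unique(label[flagged]))
print(obs)
```

Here rows 1 to 3 end up in one group and rows 5 and 6 in another, while row 4 stays `NA`. The loop runs over pairs, not over species, so it stays short as long as true duplicates are rare.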

I searched a lot for existing solutions, but did not find any that deal with the fact that I have to run the comparison separately for each species (preferably without a loop) and have to compare the rows for 2 + x observation points.

Some data to play around with:

testdata
 