The problem is resolved, and I'll explain what I found in case it benefits anyone else.
It was due to threading (as Smarx alluded to), the fact that I was thinking like a SQL developer, some rather strange / unexpected behavior in the Azure code, and a real lack of in-depth examples!
So, to solve it, I simplified the problem as much as possible.
I created a table containing one object, PartitionKey 'a', RowKey '1'.
I created a console application that selected the entity from partition 'a', deleted it, changed its PartitionKey to 'b', and inserted it again.
I ran the code in a loop, thousands of times, and everything worked fine.
Then I moved the code to a thread and started two threads. Immediately, I ended up with errors and lost an entity somewhere between deletion and insertion.
Problem 1: Two threads can select the entity at the same time, both can see that it exists, and both can try to delete it. Only one delete will succeed. So you need to catch the error, check that it contains "ResourceNotFound", and if so, trust that the entity really is gone and continue as normal.
Problem 2: The tableContext remembers the last failed action, so the thread whose delete failed will throw another error from SaveChangesWithRetries after calling AddObject. So you need to use a new tableContext for the AddObject.
Problem 3: Both threads get a chance to add the entity, but only one will succeed. Even if both threads check whether the entity exists before adding it, they may both conclude that it does NOT exist and both try to add it. So, for simplicity, let both threads try to add it: one will succeed and the other will throw an "EntityAlreadyExists" error. Just catch this error and continue.
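The tolerant delete and insert from Problems 1 and 3 can be sketched without the storage library. This is only a simulation under my own assumptions, not Azure code: a `ConcurrentDictionary` stands in for the table, and `RaceDemo`, `HasErrorCode`, `Move` and `RunTwoThreads` are hypothetical names made up for illustration. `HasErrorCode` also walks the whole exception chain, so it does not fall over when there is no inner exception.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class RaceDemo
{
    // Hypothetical in-memory stand-in for the table, keyed by "PartitionKey|RowKey".
    static readonly ConcurrentDictionary<string, string> Table =
        new ConcurrentDictionary<string, string>();

    // Walks the exception chain looking for an error code in any message.
    public static bool HasErrorCode(Exception ex, string code)
    {
        for (Exception e = ex; e != null; e = e.InnerException)
        {
            if (e.Message != null && e.Message.Contains(code)) return true;
        }
        return false;
    }

    // Simulated delete: fails if another thread already removed the entity.
    static void Delete(string key)
    {
        string ignored;
        if (!Table.TryRemove(key, out ignored))
            throw new InvalidOperationException("request failed",
                new Exception("ResourceNotFound"));
    }

    // Simulated insert: fails if another thread already added the entity.
    static void Insert(string key, string value)
    {
        if (!Table.TryAdd(key, value))
            throw new InvalidOperationException("request failed",
                new Exception("EntityAlreadyExists"));
    }

    // Moves the entity between partitions, tolerating a lost race on either step.
    public static void Move(string fromKey, string toKey)
    {
        string value;
        if (!Table.TryGetValue(fromKey, out value)) return; // nothing to move

        try { Delete(fromKey); }
        catch (Exception ex)
        {
            // Someone else deleted it first: treat as done, rethrow anything else.
            if (!HasErrorCode(ex, "ResourceNotFound")) throw;
        }

        try { Insert(toKey, value); }
        catch (Exception ex)
        {
            // Someone else inserted it first: treat as done, rethrow anything else.
            if (!HasErrorCode(ex, "EntityAlreadyExists")) throw;
        }
    }

    // Runs two threads that shuffle the same entity back and forth.
    // Returns the number of copies left in the table afterwards.
    public static int RunTwoThreads(int iterations)
    {
        Table.Clear();
        Table["a|1"] = "payload";
        ThreadStart work = () =>
        {
            for (int i = 0; i < iterations; i++)
            {
                if (i % 2 == 0) Move("a|1", "b|1");
                else Move("b|1", "a|1");
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        return Table.Count;
    }

    public static void Main()
    {
        Console.WriteLine("Copies left: " + RunTwoThreads(10000));
    }
}
```

In this sketch neither thread dies on a lost race and the entity is never lost. Note that, depending on how the threads interleave, the simulation can still finish with a copy in both partitions, so treat it as an illustration of the error handling rather than a guarantee of exactly one copy.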
Here is my working code for this simple example. I adapted it for the more complex case in my original question and now I get no errors at all.
    //method for shifting an entity backwards and forwards between two partitions, a and b
    private static void Shift(int threadNumber)
    {
        Console.WriteLine("Launching shift thread " + threadNumber);

        //set up access to the tables
        _storageAccount = CloudStorageAccount.Parse(_connectionString);
        _tableClient = new CloudTableClient(_storageAccount.TableEndpoint.AbsoluteUri, _storageAccount.Credentials);
        _tableClient.RetryPolicy = RetryPolicies.Retry(_retryAmount, TimeSpan.FromSeconds(_retrySeconds));

        int lowerLimit = threadNumber * _limit;
        int upperLimit = (threadNumber + 1) * _limit;

        for (int i = lowerLimit; i < upperLimit; i++)
        {
            try
            {
                TableServiceContext tableServiceContextDelete = _tableClient.GetDataServiceContext();
                tableServiceContextDelete.IgnoreResourceNotFoundException = true;

                string partitionKey = "a";
                if (i % 2 == 1)
                {
                    partitionKey = "b";
                }

                //find the object with this partition key
                var results = from table in tableServiceContextDelete.CreateQuery<TableEntity>(_tableName)
                              where table.PartitionKey == partitionKey
                                    && table.RowKey == "1"
                              select table;
                TableEntity tableEntity = results.FirstOrDefault();

                //shallow copy it
                if (tableEntity != null)
                {
                    TableEntity tableEntityShallowCopy = new TableEntity(tableEntity);
                    if (tableEntityShallowCopy.PartitionKey == "a")
                    {
                        tableEntityShallowCopy.PartitionKey = "b";
                    }
                    else
                    {
                        tableEntityShallowCopy.PartitionKey = "a";
                    }

                    //delete original
                    try
                    {
                        tableServiceContextDelete.Detach(tableEntity);
                        tableServiceContextDelete.AttachTo(_tableName, tableEntity, "*");
                        tableServiceContextDelete.DeleteObject(tableEntity);
                        tableServiceContextDelete.SaveChangesWithRetries();
                        Console.WriteLine("Thread " + threadNumber + ". Successfully deleted. PK: " + tableEntity.PartitionKey);
                    }
                    catch (Exception ex1)
                    {
                        //fall back to the outer message so a missing inner exception cannot throw here
                        string message = ex1.InnerException != null ? ex1.InnerException.Message : ex1.Message;
                        if (message.Contains("ResourceNotFound"))
                        {
                            //trying to delete an object that has already been deleted, so just continue
                        }
                        else
                        {
                            Console.WriteLine("Thread " + threadNumber + ". WTF?! Unexpected error during delete. Code: " + message);
                        }
                    }

                    //move into new partition (a or b depending on where it was taken from)
                    //a new context is used here because the delete context remembers its failed action
                    TableServiceContext tableServiceContextAdd = _tableClient.GetDataServiceContext();
                    tableServiceContextAdd.IgnoreResourceNotFoundException = true;
                    try
                    {
                        tableServiceContextAdd.AddObject(_tableName, tableEntityShallowCopy);
                        tableServiceContextAdd.SaveChangesWithRetries();
                        Console.WriteLine("Thread " + threadNumber + ". Successfully inserted. PK: " + tableEntityShallowCopy.PartitionKey);
                    }
                    catch (Exception ex1)
                    {
                        string message = ex1.InnerException != null ? ex1.InnerException.Message : ex1.Message;
                        if (message.Contains("EntityAlreadyExists"))
                        {
                            //trying to add an object that already exists, so continue as normal
                        }
                        else
                        {
                            Console.WriteLine("Thread " + threadNumber + ". WTF?! Unexpected error during add. Code: " + message);
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                string inner = ex.InnerException != null ? ex.InnerException.Message : "";
                Console.WriteLine("Error shifting: " + i + ". Error: " + ex.Message + ". " + inner + ". " + ex.StackTrace);
            }
        }

        Console.WriteLine("Done shifting");
    }
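The reason for the second context (Problem 2) can be pictured with a toy model. To be clear, this is NOT the Azure API and I have not looked at its internals: `ToyContext` and `ContextDemo` are hypothetical classes that simply assume, as described above, that a context keeps a failed change queued and replays it on the next save.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: mimics a context that queues changes
// and keeps them queued when a save fails.
public class ToyContext
{
    private readonly HashSet<string> _table;
    private readonly List<KeyValuePair<string, string>> _pending =
        new List<KeyValuePair<string, string>>();

    public ToyContext(HashSet<string> table) { _table = table; }

    public void DeleteObject(string key)
    {
        _pending.Add(new KeyValuePair<string, string>("delete", key));
    }

    public void AddObject(string key)
    {
        _pending.Add(new KeyValuePair<string, string>("add", key));
    }

    // Applies pending changes in order; a change that fails stays queued.
    public void SaveChanges()
    {
        while (_pending.Count > 0)
        {
            KeyValuePair<string, string> change = _pending[0];
            if (change.Key == "delete" && !_table.Remove(change.Value))
                throw new InvalidOperationException("ResourceNotFound");
            if (change.Key == "add" && !_table.Add(change.Value))
                throw new InvalidOperationException("EntityAlreadyExists");
            _pending.RemoveAt(0);
        }
    }
}

public static class ContextDemo
{
    // Returns true if the add succeeds after a delete that lost the race.
    public static bool AddAfterFailedDelete(bool useFreshContext)
    {
        // Empty table: the other thread has already deleted "a|1".
        HashSet<string> table = new HashSet<string>();

        ToyContext deleteContext = new ToyContext(table);
        deleteContext.DeleteObject("a|1");
        try { deleteContext.SaveChanges(); }
        catch (InvalidOperationException) { /* lost the race, carry on */ }

        // Reusing deleteContext replays the failed delete before the add.
        ToyContext addContext = useFreshContext ? new ToyContext(table) : deleteContext;
        addContext.AddObject("b|1");
        try { addContext.SaveChanges(); return true; }
        catch (InvalidOperationException) { return false; }
    }

    public static void Main()
    {
        Console.WriteLine("Reused context: " + AddAfterFailedDelete(false));
        Console.WriteLine("Fresh context:  " + AddAfterFailedDelete(true));
    }
}
```

In this model the reused context fails again on the stale delete and never reaches the add, while the fresh context inserts cleanly, which matches what I saw with the real contexts.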
I am sure that there are much better ways to do this, but due to the lack of good examples, I am just going with something that works for me!
Thanks