I test my model on the internal test data (about 750 samples), where it reaches about 88% accuracy, and then on the external data (about 180 samples).
The problem is that the performance is completely reversed on the external data. For example, the kinds of cases that were true positives and true negatives on the internal data become false positives and false negatives on the external data.
I am wondering why this happens.
I can think of two possible reasons. First, the external data may have different local minima from the internal data set. Second, there may not be enough data.
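
To make the two ideas checkable, here is a rough sketch, not a definitive diagnosis. It assumes the first reason can be read loosely as "the external data is distributed differently from the internal data" and compares one numeric feature between the two sets, and it assumes the second reason can be probed with an approximate 95% confidence interval on the external accuracy. The function names and all numbers passed in are hypothetical placeholders, not results from my experiment.

```python
import math
import statistics


def wilson_interval(correct, n, z=1.96):
    """Approximate 95% Wilson score interval for an accuracy of correct / n."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def standardized_mean_diff(internal, external):
    """Shift in the mean of one feature, in units of the pooled standard deviation."""
    pooled_sd = math.sqrt(
        (statistics.variance(internal) + statistics.variance(external)) / 2
    )
    return (statistics.mean(external) - statistics.mean(internal)) / pooled_sd


if __name__ == "__main__":
    # Placeholder: pretend 90 of the 180 external samples were classified correctly.
    low, high = wilson_interval(correct=90, n=180)
    print(f"external accuracy interval: {low:.3f} to {high:.3f}")

    # Placeholder feature values; in practice, pass one real feature column per set.
    internal_feature = [1.0, 2.0, 3.0]
    external_feature = [2.0, 3.0, 4.0]
    print(f"mean shift: {standardized_mean_diff(internal_feature, external_feature):.2f}")
```

If the interval on the external accuracy is narrow and still far below 88%, sample size alone is unlikely to be the explanation. A large standardized difference on some feature would point more toward the two data sets being distributed differently.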