4月1号收到兼职端的反馈,说进入“已分配“页面失败,显示系统错误。使用我自己账号登录,无法复现问题,但是使用兼职同学提供的账号进行测试,确实存在这个问题。

定位问题

首先查看日志,有这样一段异常信息:

java.lang.IllegalArgumentException: Comparison method violates its general contract!
        at java.util.TimSort.mergeLo(TimSort.java:773) ~[na:1.8.0_51]
        at java.util.TimSort.mergeAt(TimSort.java:510) ~[na:1.8.0_51]
        at java.util.TimSort.mergeForceCollapse(TimSort.java:453) ~[na:1.8.0_51]
        at java.util.TimSort.sort(TimSort.java:250) ~[na:1.8.0_51]
        at java.util.Arrays.sort(Arrays.java:1512) ~[na:1.8.0_51]
        at java.util.ArrayList.sort(ArrayList.java:1454) ~[na:1.8.0_51]
        at java.util.Collections.sort(Collections.java:175) ~[na:1.8.0_51]
        at xyz.hlj.pttms.service.TaskService.polymerizeTaskByRoom(TaskService.java:923) ~[pttms-server.jar!/:na]

感觉很奇怪,这是一个从来没见过的异常。定位到代码看一下

Collections.sort(resList, new Comparator<Map<String, Object>>() {
    @Override
    public int compare(Map<String, Object> o1, Map<String, Object> o2) {
        String o1Room = String.valueOf(o1.get("room"));
        String o2Room = String.valueOf(o2.get("room"));
        // 纯数字寝室号排序,由小到大排序
        if (RegexUtils.checkDigit(o1Room) && RegexUtils.checkDigit(o2Room)) {
            return new BigInteger(o1Room).compareTo(new BigInteger(o2Room));
        }
        return o1Room.compareTo(o2Room);
    }
});

抛出异常的地方是一个使用自定义比较器对集合进行排序的方法,逻辑也比较简单:如果两个比较对象都是纯数字,按照数字大小排;否则按字符串顺序排。

仔细看了一遍比较器的实现代码,发现这段代码没有完全满足业务上的要求,但是并没有看出异常的原因在哪里。

只好再去查异常,找到了jdk的一个不兼容说明(这个很关键,看到它之前被其他信息误导过):

Area: API: Utilities Synopsis: Updated sort behavior for Arrays and Collections may throw an IllegalArgumentException Description: The sorting algorithm used by java.util.Arrays.sort and (indirectly) by java.util.Collections.sort has been replaced. The new sort implementation may throw an IllegalArgumentException if it detects a Comparable that violates the Comparable contract. The previous implementation silently ignored such a situation. If the previous behavior is desired, you can use the new system property, java.util.Arrays.useLegacyMergeSort, to restore previous mergesort behavior. Nature of Incompatibility: behavioral RFE: 6804124

这份说明的意思是JDK比较器的排序算法更换了实现方式,新的实现在自定义比较器违背比较规则的情况下有可能会抛出异常,原来的实现忽略了这个异常。 如果需要原实现,可以通过增加系统属性java.util.Arrays.useLegacyMergeSort 恢复使用原来的排序规则。 看到这个,就确定了问题的大概原因,是我们的比较器违反了比较规则。但是因为是线上错误,所以暂不分析具体原因,先按照文档给的方法快速解决问题。

解决

在系统启动时,设置useLegacyMergeSort 属性为true

System.setProperty("java.util.Arrays.useLegacyMergeSort", "true");
SpringApplication.run(PttmsServerApp.class, args);

重新发布线上环境,测试通过。

分析原因

问题的原因已经基本确定在违反比较器规则了,但是具体规则是什么?又是怎么违反的呢?摘抄一段java8 compare方法的java doc,我们就可以清楚的知道规则是什么:

Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.In the foregoing description, the notation sgn(expression) designates the mathematical signum function, which is defined to return one of -1, 0, or 1 according to whether the value ofexpression is negative, zero or positive. The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y. (This implies that compare(x, y) must throw an exception if and only if compare(y, x) throws an exception.) The implementor must also ensure that the relation is transitive: ((compare(x, y)>0) && (compare(y, z)>0)) implies compare(x, z)>0. Finally, the implementor must ensure that compare(x, y)==0 implies that sgn(compare(x, z))==sgn(compare(y, z)) for all z. It is generally the case, but not strictly required that (compare(x, y)==0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is “Note: this comparator imposes orderings that are inconsistent with equals.”

对任意对象x,y,z必须满足: 1、sgn(compare(x, y)) == -sgn(compare(y, x)) 2、如果compare(x, y)>0,并且compare(y, z)>0, 意味着 compare(x, z)>0 3、如果compare(x, y)==0 意味着 sgn(compare(x, z))==sgn(compare(y, z))

符号函数sgn, x>0时,sgn(x)=1; x<0时,sgn(x)=-1; x=0时,sgn(x)0

那么,我们是怎么违反规则的呢?

查出问题楼栋的所有宿舍名观察数据,

803,1711,66666666636,429,211,345,1813,353,815,820,1403,1702,1306,439,1618,1514,355,355,1528,1528,401,1603,1111,1040,211,803,603,352,803,603,1015,1603,1040,1418,1426,102,1418,5~216,1731,351,901,703,1113,805,1104,1007,888888888888888888,833,1407,636,704,652,812,102,1714,843,944,412,209,1611,1809,1418,410,1701,1404,351,5-427,740,812,1408,1531,1530,1828,809,1808,809,234

从中可以看到有5-427、5~216这样的数据,Get。 说明一下问题,摘出3条数据803,1711和5-427分别记为x, y, z,有compare(x, y)<0, compare(y, z)<0,按照规则应该得出compare(x, z)<0,但是实际上compare(x, z)>0。

更好的解决

重新实现比较器方法: 二者都是数字,按数字大小排序 二者都不是数字,按字符串大小排序 一个是数字,一个不是数字,始终保持数字在前

总结

1、为什么说原来的实现并没有完全满足业务的需求? 原实现会排列出“2栋301,505,5-427,803,1711”这样的结果,实际需要的是“505,803,1711,2栋301,5-427”。

2、这个问题能不能避免? 能。原代码是有逻辑冲突的,想到就能避免。