性能数据迁移纪要

Published on
9

背景

性能数据平台的更新,后续将一直在新平台开展性能测试数据分析工作。当前的性能平台积累了大量的版本性能数据,如果不用起来实在可惜。但是和新平台的数据库表字段的不同,以及字段的缺失,需要在本次的迁移任务中进行一系列的数据清洗和补全。特此整理此文档,分享过程中的思路。

数据筛选

新的性能报告平台和现有的报告平台仔细对比还有一些不同,引入了一些新的功能点,比如:taskID模块机器等级选择去异常值分析最小值分析等等。

要想历史数据能在新平台上正常使用,就需要对缺失的字段进行弥补。现有数据缺失2020年10月份前的性能设备的信息,且无法查证。设备的差异性导致性能数据的结果偏差是很大的,因此不得不忍痛割爱,舍弃掉10月份之前的数据。

不过好在10月份后有记载性能设备信息的数据也不少,于是本次迁移决定只取在性能测试平台运行,且记录有设备信号的数据。于是写下了第一行sql:

SELECT * from performance_performancedata where product ="GQI" and host_name not in ("","null");

补全taskID

结合现有的数据分析,决定使用相同version, submodule,host_name的数据作为同一批次的任务数据,将三条数据拼接就可以得到唯一的批次,版本,设别对应关系。所以,初步的定义任务名称 taskName:

select version || submodule || host_name as taskName from performance_performancedata where product =  "GQI" and host_name in ("gtcautotest", "GQI-PERFORMANCE") order by taskName;

查询结果如下:

7.0.0.5732大量级测试重跑1GQI-PERFORMANCE
7.0.0.5732大量级测试重跑1GQI-PERFORMANCE
7.0.0.5774大量级测试GQI-PERFORMANCE
7.0.0.5774大量级测试GQI-PERFORMANCE
7.0.0.5774大量级测试GQI-PERFORMANCE
7.0.0.5774大量级测试重跑1GQI-PERFORMANCE
7.0.0.5774大量级测试重跑1GQI-PERFORMANCE
7.0.0.5774大量级测试重跑2GQI-PERFORMANCE
7.0.0.5774大量级测试重跑2GQI-PERFORMANCE
7.0.0.5800大量级测试gtcautotest
7.0.0.5800大量级测试gtcautotest
7.0.0.5800大量级测试gtcautotest
... ...

上述数据可见,在相同设备,相同版本号下,重复运行,就产生了不同任务批次。所以,我们只需要定义这种对应关系就能定义出任务号。

-- 创建临时表,建立taskName和taskID的对应关系
BEGIN TRANSACTION;
DROP TABLE
if EXISTS ptemp;
CREATE TEMPORARY TABLE ptemp(
    taskName VARCHAR,
    id integer PRIMARY KEY autoincrement
);
INSERT INTO ptemp(taskName) select distinct(a.taskName) from(select version ||
    submodule || host_name as taskName from performance_performancedata where product =
    "GQI"
    and host_name in ("gtcautotest", "GQI-PERFORMANCE") group BY taskName,
    version, submodule, host_name order by version) AS a;
DROP TABLE ptemp;
COMMIT;

新建表,再将taskID分配给,对应性能数据。于是,就有了完整的清洗脚本:

BEGIN TRANSACTION;
DROP TABLE
if EXISTS ptemp;
CREATE TEMPORARY TABLE ptemp(taskName VARCHAR, id integer PRIMARY KEY autoincrement);
DROP TABLE
if EXISTS res;
CREATE TABLE res(taskID integer, taskName VARCHAR, version VARCHAR,
    subfunction VARCHAR, operateTime integer, begin_mem integer,
    end_mem integer, high_mem integer, host_name VARCHAR, product VARCHAR
);
INSERT INTO ptemp(taskName)
SELECT distinct(a.taskName) from(SELECT version || submodule ||
    host_name AS taskName FROM performance_performancedata WHERE product =
    "GQI"
    AND host_name IN("gtcautotest", "GQI-PERFORMANCE") GROUP BY taskName,
    version, submodule, host_name ORDER BY version) AS a;
insert INTO resSELECT c.id AS taskID,
    b.taskName,
    b.version,
    b.subfunction,
    b.`time` AS operateTime,
	b.begin_mem,
    b.end_mem,
    b.high_mem,
    b.host_name,
    'GQI'
AS product
FROM ptemp AS c left join
    (SELECT version || submodule || host_name AS taskName,
        version,
        submodule,
        subfunction,
        `time`,
        begin_mem,
        end_mem,
        high_mem,
        host_name FROM performance_performancedata WHERE product =
        "GQI"
        AND host_name IN("gtcautotest", "GQI-PERFORMANCE") GROUP BY taskName,
        version, submodule, subfunction, `time`, begin_mem, end_mem,
        high_mem, host_name ORDER BY version) AS b
ON c.taskName = b.taskName;
DROP TABLE ptemp;
COMMIT;

清洗的数据结果:

1.png

补全moudle

清洗完成后还不是最中结果,测试点需要按照测试点进行划分,因此还需要根据原始脚本进行提取,然后再补全各个模块名称。

#读取rb文件,提取模块名和测试点之间的对应关系
def func(fileName):
	f = codecs.open(rf"{fileName}","r","utf-8")
	content = f.read()

	p = re.compile('GTF.get_bench_time\("(.*?)"\){', re.S)
	cases = re.findall(p , content) 
	return cases 



Sql = f"""SELECT taskID,
         taskName,
         version,
         subfunction,
         operateTime,
         begin_mem,
         end_mem,
         high_mem,
         host_name,
         '{someModule}' AS module,'GQI' AS product
		FROM res
		WHERE product ="GQI"
        AND subfunction IN({caseForSomeModule})
		"""
	

写入新的库即得到:
2.png

数据写入新的性能平台库

通过接口掉用,直接写入新的平台。这里为了避免重复设置了团队的起始taskID为-100000,新写入taskID = 起始taskID-历史数据taskID,这样就避免和新数据重复,也避免和其他团队重复。

for i in select():
		#安装团队的taskID初始值
		taskIDAZ = -100000
		
		moduleName = i[7]
		version = i[1]
		subfunction = i[2]
		startMemory = i[4]
		endMemory = i[6]
		peakMemory =i[5]
		timeCost=i[3]
		taskId = taskIDAZ - i[9]

		data = f"""{{
		    "productName": "GQI2021",
		    "moduleName": "{moduleName}",
		    "version": "{version}",
		    "computerName": "GQI-PERFORMANCE",
		    "taskId": {taskId},
		    "tester": "historyData",
		    "verificationPointsData": [
		        {{
		            "verificationPointsName": "{subfunction}",
		            "startMemory": {startMemory},
		            "endMemory": {endMemory},
		            "peakMemory": {peakMemory},
		            "timeCost": {timeCost}
		        }}
		    ],
		    "subfunction": "{subfunction}"
			}}"""
		
		insertApi(data)

至此,就完成了团队内的数据迁移任务。
13.png