Python_Machine Learning_Professor Hung-yi Lee's Homework 1
2021/6/7 12:23:18
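This article presents a Python solution to Homework 1 of Professor Hung-yi Lee's machine learning course: a linear regression model that predicts PM2.5 from the previous nine hours of air-quality readings. It consists of a training script followed by a prediction script.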
# By Richard
# Training script
import math

import numpy as np
import pandas as pd

"""
pandas: a toolkit for analyzing structured data
numpy: a Python extension library for large multi-dimensional arrays and matrix operations
math: the standard math library

Homework 1: use 18 air-quality features (e.g. NO, CO, SO2, PM2.5) from the previous 9 hours
to predict PM2.5 in the 10th hour. train.csv holds one year of data: 20 days per month, 24 hours per day.
"""

# big5 is the encoding used by the Taiwanese data file
data = pd.read_csv(r"G:\课程学习\机器学习\Mr_Li_ML\HomeWorks\数据\hw1\train.csv", encoding="big5")

# iloc selects by row/column position; keep the 24 hourly columns (fourth column onward)
data_train = data.iloc[:, 3:].copy()
data_train[data_train == "NR"] = 0  # "NR" means no rainfall
data_train_row = data_train.to_numpy()  # DataFrame -> numpy array, shape 4320 x 24

# Reshape 4320 x 24 into 12 months, each an 18 x 480 array (18 features, 20 days * 24 hours)
month_data = {}
for month in range(12):
    sample = np.zeros([18, 480])
    for day in range(20):
        sample[:, 24 * day:(day + 1) * 24] = data_train_row[18 * (20 * month + day): 18 * (20 * month + day + 1), :]
    month_data[month] = sample

# Treat the 480 hours of each month as one continuous series, so each month yields
# 480 - 9 = 471 samples and the year yields 471 * 12 = 5652 samples, each with one
# label (PM2.5 in the 10th hour). Sliding the window this way produces more data.
x = np.empty([12 * 471, 18 * 9], dtype=float)
y = np.empty([12 * 471, 1], dtype=float)
for month in range(12):
    for day in range(20):
        for hour in range(24):
            if day == 19 and hour > 14:  # the last day has no 10th hour beyond hour 14
                continue
            # flatten the 18 x 9 window into one row
            x[471 * month + day * 24 + hour, :] = month_data[month][:, day * 24 + hour:day * 24 + hour + 9].reshape(1, -1)
            # row 9 of the feature block is PM2.5
            y[471 * month + day * 24 + hour, 0] = month_data[month][9, day * 24 + hour + 9]

# Standardize each feature column; columns with zero std are left unchanged
x_mean = np.mean(x, axis=0)
x_std = np.std(x, axis=0)
nonzero = x_std != 0
x[:, nonzero] = (x[:, nonzero] - x_mean[nonzero]) / x_std[nonzero]

# Split into training set (first 80%) and validation set (last 20%)
split = math.floor(len(x) * 0.8)  # 4521
x_train_set = x[:split, :]
y_train_set = y[:split, :]
x_validation = x[split:, :]
y_validation = y[split:, :]

# Prepare for training: one extra dimension for the bias term (y = ax + b)
dim = 18 * 9 + 1
w = np.zeros([dim, 1])
# Prepend a column of ones, so x_train_set becomes 4521 x 163
x_train_set = np.concatenate((np.ones([len(x_train_set), 1]), x_train_set), axis=1).astype(float)

learning_rate = 1  # learning rate
iter_time = 30000  # number of iterations
# Adagrad state: running sum of squared gradients (this is Adagrad, not RMSprop)
adagrad = np.zeros([dim, 1])
eps = 0.00000001

for t in range(iter_time):
    # RMSE loss; np.dot performs the matrix product here
    loss = np.sqrt(np.sum(np.power(np.dot(x_train_set, w) - y_train_set, 2)) / len(x_train_set))
    if t % 100 == 0:  # report every 100 iterations
        print("Iteration: %i, loss: %f" % (t, loss))
    # Gradient of the RMSE with respect to w: X^T (Xw - y) / (N * loss)
    gradient = np.dot(x_train_set.transpose(), np.dot(x_train_set, w) - y_train_set) / (loss * len(x_train_set))
    adagrad += gradient ** 2
    # Update the parameters
    w = w - learning_rate * gradient / np.sqrt(adagrad + eps)

# Save the parameters
np.save("weight.npy", w)

# Validation: the loss only needs to be computed once
x_validation = np.concatenate((np.ones([len(x_validation), 1]), x_validation), axis=1).astype(float)
Loss = np.sqrt(np.sum(np.power(np.dot(x_validation, w) - y_validation, 2)) / len(x_validation))
print("The Loss on val data is %f" % (Loss))
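The per-parameter update in the training loop (accumulate squared gradients, then scale each step by the inverse square root of that sum) is the Adagrad rule. It can be checked in isolation on synthetic data. The following is a minimal sketch rather than the author's code: the helper names `adagrad_fit` and `rmse`, the synthetic data, and the iteration count are assumptions made for illustration. It uses the same RMSE loss and the same gradient expression as the script.

```python
import numpy as np


def adagrad_fit(x, y, learning_rate=1.0, iter_time=2000, eps=1e-8):
    """Fit linear weights by minimizing RMSE with Adagrad (hypothetical helper)."""
    n, dim = x.shape
    w = np.zeros([dim, 1])
    adagrad = np.zeros([dim, 1])  # running sum of squared gradients
    for _ in range(iter_time):
        residual = np.dot(x, w) - y
        loss = np.sqrt(np.sum(residual ** 2) / n)
        if loss == 0:  # perfect fit; the RMSE gradient is undefined here
            break
        # Gradient of the RMSE with respect to w: X^T (Xw - y) / (N * loss)
        gradient = np.dot(x.T, residual) / (loss * n)
        adagrad += gradient ** 2
        w = w - learning_rate * gradient / np.sqrt(adagrad + eps)
    return w


def rmse(x, y, w):
    """Root-mean-square error of the linear prediction x @ w."""
    return float(np.sqrt(np.mean((np.dot(x, w) - y) ** 2)))


# Synthetic problem (assumed for illustration): 3 features plus a bias column
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
x_demo = np.concatenate((np.ones([200, 1]), features), axis=1)
true_w = np.array([[2.0], [1.0], [-3.0], [0.5]])
y_demo = np.dot(x_demo, true_w) + rng.normal(scale=0.1, size=(200, 1))

w_demo = adagrad_fit(x_demo, y_demo)
print("RMSE before:", rmse(x_demo, y_demo, np.zeros([4, 1])))
print("RMSE after: ", rmse(x_demo, y_demo, w_demo))
```

Because Adagrad divides each step by the accumulated gradient magnitude, the effective step size shrinks over time, which is why a nominal learning rate as large as 1 can still be usable.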
# Author: Richard
# Prediction script
import numpy as np
import pandas as pd

testdata = pd.read_csv(r"G:\课程学习\机器学习\Mr_Li_ML\HomeWorks\数据\hw1\test.csv", header=None, encoding="big5")
test_data = testdata.iloc[:, 2:].copy()  # keep the 9 hourly columns
test_data[test_data == "NR"] = 0
test_data = test_data.to_numpy()

# 240 test samples, each an 18 x 9 block flattened into one row
test_x = np.empty([240, 18 * 9], dtype=float)
for i in range(240):
    test_x[i, :] = test_data[18 * i:18 * (i + 1), :].reshape(1, -1)

# Compute the statistics only after test_x has been filled (np.empty holds arbitrary values).
# Note: reusing the training mean and std instead would keep the scaling consistent with the weights.
mean_x = np.mean(test_x, axis=0)
std_x = np.std(test_x, axis=0)
nonzero = std_x != 0
test_x[:, nonzero] = (test_x[:, nonzero] - mean_x[nonzero]) / std_x[nonzero]

# Prepend the bias column
test_x = np.concatenate((np.ones([240, 1]), test_x), axis=1).astype(float)

# Prediction
w = np.load("weight.npy")
ans_y = np.dot(test_x, w)
print("---- Predicted PM2.5 values ----")
print(ans_y)
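Feature standardization at prediction time is commonly done with the mean and standard deviation measured on the training set, so that the saved weights see inputs on the same scale they were trained on. A minimal sketch of that convention, assuming hypothetical helpers `fit_standardizer` and `apply_standardizer` and tiny made-up arrays (none of these names or values come from the original scripts):

```python
import numpy as np


def fit_standardizer(x):
    """Return the per-column mean and std of the training features."""
    return np.mean(x, axis=0), np.std(x, axis=0)


def apply_standardizer(x, mean, std):
    """Standardize columns whose std is non-zero; leave constant columns unchanged."""
    out = np.array(x, dtype=float)  # work on a copy so the input is not modified
    nonzero = std != 0
    out[:, nonzero] = (out[:, nonzero] - mean[nonzero]) / std[nonzero]
    return out


# Made-up data: the third column is constant in training, so it is left as-is
train_demo = np.array([[1.0, 10.0, 5.0],
                       [3.0, 30.0, 5.0]])
test_demo = np.array([[2.0, 40.0, 5.0]])

mean_demo, std_demo = fit_standardizer(train_demo)
test_scaled = apply_standardizer(test_demo, mean_demo, std_demo)
print(test_scaled)
```

With two separate scripts, one way to apply this would be to store the training statistics with `np.save` next to `weight.npy` and read them back with `np.load` before predicting.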