Python Study Notes

2025-06-01

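Python Basic Syntax
- Lines and Indentation
- Multi-line Statements
- Identifiers and Reserved Words
- Comments
- Input and Output
- Operators
- Ternary Operator
- Multiple Assignment
- Special Uses of the * and ** Operators
- The Unique Variable id
Python Basic Data Types
- Integers
  - Integers in Different Bases
  - Digit Separators
  - Integer Constant Pool
- Floats
  - Basic Operations
  - Converting Floats to Fractions
  - Truncating Floats
  - Rounding Floats
  - The decimal Module
- Booleans
  - Boolean Evaluation
  - Truth Values of Special Values
- Strings
  - Defining and Manipulating Strings
  - String Indexing and Slicing
  - Escape Characters
  - String Prefixes
  - String Concatenation
  - Basic String Functions
  - String Search and Replace
  - str.maketrans() and str.translate()
  - String Comparison and Interning
  - Compile-time String Concatenation
- Lists
  - Creating Lists
  - List Comprehensions
  - List Element Operations
  - Sorting Lists
  - Numeric Lists and Statistical Functions
  - Slicing
  - Copying Lists
  - Reversing Lists
- Tuples
  - Differences Between Tuples and Lists
  - Creating and Using Tuples
  - Built-in Tuple Functions
- Dictionaries
  - Creating Dictionaries by Initialization
  - Dictionary Operations
  - Nesting Dictionaries and Lists
  - Sorting Dictionaries
- Sets
  - Set Properties and Operations
- The collections Module
  - Named Tuple (namedtuple)
  - Counter (Counter)
  - Double-ended Queue (deque)
  - Ordered Dictionary (OrderedDict)
  - Default Dictionary (defaultdict)
Python Functions
- Function Parameters
  - Required Parameters
  - Default Parameters
  - Variable Positional Parameters
  - Keyword Parameters
  - Combining Parameters
- Function Return Values
- Python Scope
- Python Loops
- Iterators
- Built-in Functions
- The globals() and locals() Functions
- The eval() Function
- Order of Defining and Calling Functions
- Defining and Using Parameters Correctly
Modules
- pip Usage Guide
- Importing Modules
- Custom Modules
- Built-in Modules
- The reload() Function
- Recursive Functions
- Anonymous Functions (lambda Functions)
- Python's Built-in Higher-order Functions
Object-oriented Programming
- Declaring a Class
- Defining a Class
- Class Variables (Class Attributes)
- Methods of a Class
- About self
- Bound and Unbound Methods
- Instantiating a Class
- @staticmethod and @classmethod
- Class Inheritance
- Parent and Child Classes
- Inheritance
- The super() Function
- Object Creation
- Built-in Functions Related to Object-oriented Programming
Web Crawlers
- Introduction to Crawlers
- Using the urllib Library
  - The urllib.request Module
  - The urlopen() Function
  - Functions of the HttpRequest Class
  - The urlretrieve() Function
  - Customizing the Request Object
- HTTP and HTTPS
- The urllib.parse Module
- HTTP Requests
- Introduction to AJAX
- Handlers
- Proxy Servers
- Parsing
- XPath
- HTML and XML
- The lxml Library
- JSONPath
- Using the Selenium Library
- CAPTCHA Recognition
- The Requests Library
- Session Persistence
- Sessions and Cookies
- Case Study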

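Main Text

1. Python Basic Syntax

Lines and Indentation

Python uses indentation to delimit code blocks; every statement in the same block must be indented by the same amount. For example:

if True:
    print("true")
else:
    print("false")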

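Multi-line Statements

Python code normally has one statement per line. A trailing semicolon ; is allowed but unnecessary, and several statements on one line are separated with ;. A long statement can be continued on the next line with a backslash \, and it continues automatically inside parentheses, brackets, or braces. For example:

print("hello")
print("world");  # a trailing semicolon is legal but unnecessary

def add_numbers(x, y):
    return x + y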
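The three ways of splitting or joining statements mentioned above can be sketched as follows. The variable names are only illustrative.

```python
# Several statements on one line, separated by semicolons (legal, rarely recommended).
a = 1; b = 2; c = 3

# Explicit continuation: a backslash at the very end of the line.
total_backslash = a + \
    b + \
    c

# Implicit continuation: the statement runs on until the bracket closes.
total_parens = (
    a
    + b
    + c
)

print(total_backslash, total_parens)  # 6 6
```

Implicit continuation inside brackets is usually preferred, because a stray space after a backslash is a syntax error.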

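Identifiers and Reserved Words

Identifiers name variables, functions, and other objects. An identifier may contain letters, digits, and underscores, must not start with a digit, and is case-sensitive. Reserved words are Python's built-in keywords and cannot be used as identifiers. For example:

num1 = 1
float1 = 0.5
true = True  # legal: Python is case-sensitive, so true is not the keyword True

The keyword module lists the reserved words:

import keyword
print(keyword.kwlist)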

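Comments

A single-line comment starts with #. A multi-line comment can be written as several # lines or as a pair of triple quotes, ''' or """ (strictly speaking a string literal that is evaluated and discarded). For example:

# single-line comment
'''multi-line comment'''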

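Input and Output

The print() function writes output. It accepts several values and separates them with a space by default. The % operator formats strings. For example:

print("hello", "world")  # prints: hello world
print("%d + %d = %d" % (1, 2, 3))  # prints: 1 + 2 = 3

The input() function reads user input and always returns a string. Use a conversion function to turn it into another type. For example:

str1 = input("Enter a value: ")
num1 = int(str1)  # raises ValueError if the text is not an integer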

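Operators

Python provides arithmetic, comparison, assignment, logical, bitwise, membership, and identity operators. For example:

# arithmetic operators
result = 2 + 3 * 4  # result is 14
# logical operators
print(True and False)  # prints False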

Ternary Operator

The basic form of the ternary operator is result = x if condition else y. For example:

result = 5 if 3 > 2 else 2  # result is 5

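Multiple Assignment

Several variables can be assigned at once, and two variables can be swapped the same way. For example:

a, b = 1, 2
a, b = b, a  # now a == 2 and b == 1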

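Special Uses of the `*` and `**` Operators

* performs multiplication and repeats the elements of a sequence. ** performs exponentiation. In a function definition, **kwargs collects keyword arguments into a dictionary, and in a function call, ** unpacks a dictionary into keyword arguments. For example:

print(2 * 3)  # prints 6
print('a' * 3)  # prints aaa
def f(**kwargs):
    print(kwargs)
f(x=1, y=2)  # prints {'x': 1, 'y': 2}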
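As a complement, here is a small sketch of the unpacking side of * and **. The function describe and the sample data are made up for illustration.

```python
def describe(name, age, city="unknown"):
    return f"{name}, {age}, {city}"

args = ("Alice", 25)
options = {"city": "Paris"}

# In a call, * unpacks a sequence into positional arguments
# and ** unpacks a dictionary into keyword arguments.
print(describe(*args, **options))  # Alice, 25, Paris

# In an assignment, * collects the remaining items into a list.
first, *rest = [1, 2, 3, 4]
print(first, rest)  # 1 [2, 3, 4]

# Inside a dict literal, ** merges dictionaries; later keys win.
merged = {**{"a": 1, "b": 2}, **{"b": 3}}
print(merged)  # {'a': 1, 'b': 3}

print(2 ** 10)  # 1024
```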

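The Unique Variable `id`

Every object has an identity, returned by id(), that is unique for as long as the object exists; in CPython it is the object's memory address. A variable is a name bound to an object, so two names bound to the same object report the same id. For example:

a = 1
b = 2
print(id(a), id(b))  # prints two different ids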
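The difference between identity and equality is easier to see with a mutable object. A minimal sketch, with illustrative names:

```python
a = [1, 2, 3]
b = a          # b is a second name for the same list object
c = list(a)    # c is a new list with equal contents

print(id(a) == id(b))  # True
print(a is b)          # True: `is` compares identity
print(a == c, a is c)  # True False: equal values, different objects

b.append(4)            # the change is visible through both names
print(a)               # [1, 2, 3, 4]
print(c)               # [1, 2, 3]
```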

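2. Python Basic Data Types

Integers

Python integers represent positive numbers, negative numbers, and zero with arbitrary precision, and can be written in several bases. For example:

hex1 = 0x45  # hexadecimal, equals 69
bin1 = 0b101  # binary, equals 5

The bin(), oct(), and hex() functions convert an integer to a string in the corresponding base.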
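A short sketch of converting in both directions, plus the digit separators listed in the outline. The value 69 is reused from the example above; everything else is illustrative.

```python
n = 69
print(bin(n), oct(n), hex(n))   # 0b1000101 0o105 0x45

# int() parses a string in a given base; base 0 infers it from the prefix.
print(int("45", 16))            # 69
print(int("0b101", 0))          # 5

# format() gives the digits without a prefix and can pad them.
print(format(5, "08b"))         # 00000101

# Underscores are digit separators and do not change the value.
million = 1_000_000
print(million == 1000000)       # True
```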

Floats

Floats represent numbers with a fractional part, written in decimal or scientific notation. float() converts a string to a float. For example:

x = float("3.14")  # x is 3.14
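The outline lists fractions, truncation, rounding, and the decimal module under floats. The following sketch touches each of them using only the standard library; the sample values are chosen for illustration.

```python
import math
from decimal import Decimal
from fractions import Fraction

# Binary floats cannot represent 0.1 exactly, so compare with a tolerance.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True

# Decimal does exact decimal arithmetic when built from strings.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Converting a float to a fraction.
print(Fraction(0.75))                # 3/4
print((0.5).as_integer_ratio())      # (1, 2)

# Truncation versus floor and ceiling.
print(int(-2.7), math.floor(-2.7), math.ceil(-2.7))  # -2 -3 -2

# round() rounds exact halves to the nearest even integer.
print(round(2.5), round(3.5))        # 2 4
```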

Booleans

The Boolean type has only two values, True and False. For example:

print(True and False)  # prints False
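The outline also mentions the truth values of special values. A minimal sketch of which values count as false, with an illustrative list name:

```python
# Zero, empty containers, and None are all treated as false.
falsy_values = [0, 0.0, "", [], (), {}, set(), None]
print(any(bool(v) for v in falsy_values))  # False

# Any non-empty string or container is true, whatever it holds.
print(bool("False"), bool([0]), bool(-1))  # True True True

# bool is a subclass of int: True behaves as 1 and False as 0.
print(True + True, isinstance(True, int))  # 2 True

# `or` returns the first truthy operand, which gives a simple default.
print(0 or "default")                      # default
```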

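Strings

A string is a sequence of characters and supports indexing and slicing. For example:

s = "hello world"
print(s[0])  # prints h
print(s[6:11])  # prints world

Strings support many operations, such as concatenation, formatting, and searching. For example:

str1 = "hello"
str2 = "world"
result = str1 + " " + str2  # result is "hello world"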
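A few of the string methods named in the outline (searching, replacing, and str.maketrans() with str.translate()) can be sketched like this. The sample string is reused from above and the translation table is illustrative.

```python
s = "hello world"

print(s.find("world"))               # 6 (index of the first match, -1 if absent)
print(s.replace("world", "Python"))  # hello Python
print(s.upper(), s.title())          # HELLO WORLD Hello World
print(s[::-1])                       # dlrow olleh

# maketrans builds a character mapping: l -> 0 and o -> 1.
table = str.maketrans("lo", "01")
print(s.translate(table))            # he001 w1r0d

# Strings are immutable, so every method returns a new string.
print(s)                             # hello world
```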

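Lists

A list is an ordered, mutable collection of elements. For example:

lst = [1, 2, 3]
lst.append(4)  # add an element at the end: [1, 2, 3, 4]
lst.remove(2)  # remove the first element equal to 2: [1, 3, 4]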
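The outline lists comprehensions, sorting, slicing, and copying under lists. A compact sketch of each, with made-up data:

```python
import copy

# List comprehension and slicing.
squares = [x * x for x in range(5)]
print(squares)        # [0, 1, 4, 9, 16]
print(squares[1:4])   # [1, 4, 9]
print(squares[::-1])  # [16, 9, 4, 1, 0]

# sorted() returns a new list; list.sort() sorts in place.
words = ["pear", "fig", "banana"]
print(sorted(words, key=len))  # ['fig', 'pear', 'banana']
words.sort(reverse=True)
print(words)                   # ['pear', 'fig', 'banana']

# A shallow copy shares the inner lists; a deep copy does not.
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
deep = copy.deepcopy(nested)
nested[0].append(99)
print(shallow[0])  # [1, 2, 99]
print(deep[0])     # [1, 2]
```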

Tuples

A tuple is similar to a list but immutable. For example:

tup = (1, 2, 3)
print(tup[0])  # prints 1

Dictionaries

A dictionary is a collection of key-value pairs, looked up by key. For example:

dict1 = {"name": "Alice", "age": 25}
print(dict1["name"])  # prints Alice
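Common dictionary operations, sorting by value, and nesting with lists (all listed in the outline) might look like the sketch below. The names and scores are invented for illustration.

```python
scores = {"Alice": 82, "Bob": 95}

# get() returns a default instead of raising KeyError.
print(scores.get("Dave", 0))  # 0

scores["Dave"] = 60            # add a key
scores.update({"Alice": 85})   # overwrite an existing key
print(list(scores.items()))    # [('Alice', 85), ('Bob', 95), ('Dave', 60)]

# Dictionaries keep insertion order, so sorting the items and
# rebuilding the dict gives a dict ordered by value.
by_score = dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))
print(by_score)                # {'Bob': 95, 'Alice': 85, 'Dave': 60}

# A list of dictionaries that themselves contain lists.
users = [{"name": "Alice", "tags": ["admin", "dev"]}, {"name": "Bob", "tags": []}]
print([u["name"] for u in users if "admin" in u["tags"]])  # ['Alice']
```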

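Sets

A set is an unordered collection of unique elements and supports operations such as union, intersection, and difference. For example:

set1 = {1, 2, 3}
set2 = {3, 4, 5}
print(set1.union(set2))  # prints {1, 2, 3, 4, 5}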

The `collections` Module

The collections module provides additional data structures, including namedtuple, Counter, deque, OrderedDict, and defaultdict. For example:

from collections import namedtuple
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)
print(p.x, p.y)  # prints 1 2
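The other structures named above can be sketched in the same spirit. This is a minimal illustration with invented data, not a full tour of the module.

```python
from collections import Counter, defaultdict, deque

# Counter tallies hashable items.
counts = Counter("mississippi")
print(counts["s"], counts["p"])  # 4 2
print(counts.most_common(1))     # [('i', 4)]

# deque supports fast appends and pops at both ends.
queue = deque([1, 2, 3])
queue.appendleft(0)
queue.append(4)
print(queue.popleft())           # 0
queue.rotate(1)
print(queue)                     # deque([4, 1, 2, 3])

# defaultdict creates a missing value on first access.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)
print(dict(groups))              # {'a': ['apple', 'avocado'], 'b': ['banana']}
```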

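3. Python Functions

Function Parameters

A function can take several kinds of parameters: required parameters, default parameters, variable positional parameters (*args), and keyword parameters (**kwargs). For example:

def func(a, b=2, *args, **kwargs):
    print(a, b, args, kwargs)
func(1, 3, 4, 5, x=6, y=7)  # prints 1 3 (4, 5) {'x': 6, 'y': 7}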
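Two details about default and keyword parameters are worth a sketch: a mutable default value is created once and then shared between calls, and a bare * makes the parameters after it keyword-only. The function names here are hypothetical.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # Use None as a sentinel and build a new list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

def connect(host, *, timeout=10):
    # timeout can only be passed by keyword.
    return f"{host} (timeout={timeout})"

append_shared(1)
shared = append_shared(2)
print(shared)                             # [1, 2]
print(append_fresh(1), append_fresh(2))   # [1] [2]
print(connect("example.com", timeout=5))  # example.com (timeout=5)
```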

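Function Return Values

A function can return a value or another function; a function without a return statement returns None. For example:

def add(a, b):
    return a + b
def get_func():
    def inner():
        print("inner function")
    return inner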

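Python Scope

Scope rules determine where a name is visible. Names are looked up in the local, enclosing, global, and built-in scopes, in that order, and assigning to a global name inside a function requires the global statement. For example:

x = 1
def func():
    global x
    x = 2
func()
print(x)  # prints 2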
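The enclosing scope and the effect of leaving out global can be sketched as follows. make_counter and shadow are illustrative names.

```python
def make_counter():
    count = 0
    def increment():
        nonlocal count   # rebind the variable of the enclosing function
        count += 1
        return count
    return increment

counter = make_counter()
print(counter(), counter(), counter())  # 1 2 3

label = "global"
def shadow():
    label = "local"      # without `global`, this creates a new local name
    return label

print(shadow(), label)   # local global
```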

Python Loops

for loops and while loops iterate over a sequence or repeat an operation. For example:

for i in range(5):
    print(i)

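Iterators

An iterator yields the elements of a sequence one at a time. iter() creates an iterator and next() fetches the next element. For example:

lst = [1, 2, 3]
iter_lst = iter(lst)
print(next(iter_lst))  # prints 1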
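Any object that implements __iter__() and __next__() follows the iterator protocol. A minimal sketch with a hypothetical Countdown class:

```python
class Countdown:
    """Iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that the iterator is exhausted
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # [3, 2, 1]

# next() accepts a default that is returned instead of raising StopIteration.
it = iter([1])
print(next(it))            # 1
print(next(it, "done"))    # done
```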

Built-in Functions

Python provides many built-in functions, such as abs(), divmod(), max(), min(), sum(), and round(). For example:

print(abs(-5))  # prints 5

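The `globals()` and `locals()` Functions

globals() returns the dictionary of global variables of the current module, and locals() returns the dictionary of local variables of the current function. For example:

x = 10
def func():
    y = 20
    print(locals())  # prints {'y': 20}
func()
print(globals())  # prints a dictionary that includes x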

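The `eval()` Function

eval() evaluates a Python expression given as a string and returns the result. Never pass it untrusted input, because it can run arbitrary code. For example:

result = eval("2 + 3")  # result is 5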
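When the string only needs to hold a literal such as a number, list, or dictionary, ast.literal_eval() from the standard library is a safer choice than eval(). A brief sketch comparing the two:

```python
import ast

print(eval("2 + 3"))             # 5
print(eval("x * 2", {"x": 21}))  # 42 (names are looked up in the given dict)

# literal_eval accepts only literals, not names or function calls.
print(ast.literal_eval("[1, 2, {'a': 3}]"))  # [1, 2, {'a': 3}]

try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", type(exc).__name__)   # rejected: ValueError
```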

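Order of Defining and Calling Functions

A function must be defined before it is called: the def statement has to run before the call does. For example:

def func():
    print("hello")
func()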

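Defining and Using Parameters Correctly

When a function is called, the number of arguments must match its parameters. Python does not check argument types at call time. For example:

def func(a, b):
    print(a, b)
func(1, 2)  # correct
func(1)  # TypeError: missing 1 required positional argument: 'b'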

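4. Modules

`pip` Usage Guide

pip is Python's package management tool. Common commands include:

pip install package_name  # install a package
pip uninstall package_name  # uninstall a package
pip freeze  # list installed packages and their versions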

Importing Modules

Use the import keyword to import a module. For example:

import math
print(math.sqrt(4))  # prints 2.0

Custom Modules

Creating a .py file defines a module. For example:

# mymodule.py
def greet(name):
    print("Hello, " + name)

Import it in another file:

from mymodule import greet
greet("Alice")

Built-in Modules

Python ships with many standard library modules, such as os, sys, json, logging, time, datetime, hashlib, and random. For example:

import os
print(os.getcwd())  # prints the current working directory

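The `reload()` Function

reload() reloads a module that has already been imported. In Python 3 it lives in the importlib module. For example:

import importlib
import mymodule
importlib.reload(mymodule)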

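Recursive Functions

A recursive function is a function that calls itself. For example, a recursive solution to the Tower of Hanoi problem:

def hanoi(n, x, y, z):
    # move n disks from peg x to peg z, using peg y as the spare
    if n == 1:
        print(x, "-->", z)
    else:
        hanoi(n-1, x, z, y)  # move the top n-1 disks from x to y
        print(x, "-->", z)  # move the largest disk from x to z
        hanoi(n-1, y, x, z)  # move the n-1 disks from y to z
hanoi(3, 'A', 'B', 'C')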

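Anonymous Functions (`lambda` Functions)

A lambda expression creates an anonymous function and is used for simple, single-expression functions. For example:

func = lambda x: x + 1
print(func(2))  # prints 3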

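Python's Built-in Higher-order Functions

Higher-order functions take other functions as arguments. map() and filter() are built in, while reduce() is provided by the functools module. For example:

list1 = [1, 2, 3, 4]
squared = list(map(lambda x: x**2, list1))  # squared is [1, 4, 9, 16]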
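filter() and reduce() follow the same pattern as map(). A short sketch with illustrative data:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# filter keeps the items for which the function returns a true value.
evens = list(filter(lambda n: n % 2 == 0, numbers))
print(evens)    # [2, 4, 6]

# reduce folds the sequence into one value, starting from the initial value.
total = reduce(lambda acc, n: acc + n, numbers, 0)
product = reduce(lambda acc, n: acc * n, numbers, 1)
print(total, product)  # 21 720

# A comprehension often expresses map plus filter more readably.
print([n * n for n in numbers if n % 2 == 0])  # [4, 16, 36]
```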

5. Object-oriented Programming

Declaring a Class

Use the class keyword to declare a class. For example:

class MyClass:
    """Class docstring"""
    pass

Defining a Class

A class can define variables and methods. For example:

class Book:
    def __init__(self, name, author):
        self.name = name
        self.author = author
    def display(self):
        print(self.name, self.author)
book = Book("Python", "Guido")
book.display()

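Class Variables (Class Attributes)

A class variable is shared by all instances, while an instance variable belongs to a single instance. For example:

class MyClass:
    class_var = 100  # class variable, shared by every instance
    def __init__(self, instance_var):
        self.instance_var = instance_var  # instance variable, one per instance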

Methods of a Class

A method is a function defined inside a class. For example:

class MyClass:
    def my_method(self):
        print("method called")

About `self`

self is a reference to the instance itself and is used to access the instance's variables and methods. For example:

class MyClass:
    def __init__(self, value):
        self.value = value

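Bound and Unbound Methods

An instance method accessed through an instance is a bound method: the instance is passed automatically as self. A class method is bound to the class, which is passed as cls. A static method is not bound to anything and receives no implicit first argument. For example:

class MyClass:
    def instance_method(self):
        print("instance method")
    @classmethod
    def class_method(cls):
        print("class method")
    @staticmethod
    def static_method():
        print("static method")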
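What a method is bound to can be inspected through its __self__ attribute. The sketch below uses a hypothetical Greeter class rather than the class above.

```python
class Greeter:
    def hello(self):
        return "hello from instance"

    @classmethod
    def create(cls):
        return cls()

    @staticmethod
    def shout(text):
        return text.upper()

g = Greeter()

bound = g.hello                  # bound method: g is supplied as self
print(bound.__self__ is g)       # True
print(bound())                   # hello from instance

# Accessed through the class, hello is a plain function,
# so the instance must be passed explicitly.
print(Greeter.hello(g))          # hello from instance

# A class method is bound to the class itself.
print(Greeter.create.__self__ is Greeter)       # True
print(isinstance(Greeter.create(), Greeter))    # True

# A static method is an ordinary function with no implicit argument.
print(Greeter.shout("hi"))       # HI
```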

Instantiating a Class

Call the class name with parentheses to create an instance. For example:

obj = MyClass()

`@staticmethod` and `@classmethod`

@staticmethod defines a static method and @classmethod defines a class method. For example:

class MyClass:
    @staticmethod
    def static_method():
        print("static method")
    @classmethod
    def class_method(cls):
        print("class method")

Class Inheritance

A child class inherits the attributes and methods of its parent class. For example:

class Parent:
    def parent_method(self):
        print("parent method")
class Child(Parent):
    def child_method(self):
        print("child method")
child = Child()
child.parent_method()
child.child_method()

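Parent and Child Classes

The parent class is the base class and the child class is the derived class. A child class can override methods of its parent. For example:

class Parent:
    def my_method(self):
        print("parent method")
class Child(Parent):
    def my_method(self):
        print("child method")
child = Child()
child.my_method()  # prints: child method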

Inheritance

Python supports multiple inheritance. For example:

class A:
    pass
class B:
    pass
class C(A, B):
    pass
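With multiple inheritance, Python looks attributes up along the method resolution order (MRO), which lists the class itself first and then its bases from left to right. A minimal sketch, with a method who() added purely for illustration:

```python
class A:
    def who(self):
        return "A"

class B:
    def who(self):
        return "B"

class C(A, B):
    pass

# A comes before B in the MRO, so A.who is found first.
print(C().who())                            # A
print([cls.__name__ for cls in C.__mro__])  # ['C', 'A', 'B', 'object']
```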

The `super()` Function

super() calls a method of the parent class. For example:

class Parent:
    def my_method(self):
        print("parent method")
class Child(Parent):
    def my_method(self):
        super().my_method()
        print("child method")
child = Child()
child.my_method()

Object Creation

When an object is created, __new__() creates the object and __init__() then initializes it. For example:

class MyClass:
    def __new__(cls):
        print("creating object")
        return super().__new__(cls)
    def __init__(self):
        print("initializing object")
obj = MyClass()

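Built-in Functions Related to Object-oriented Programming

Relevant built-in functions include issubclass(), isinstance(), hasattr(), getattr(), setattr(), delattr(), dir(), super(), and vars(). For example, with the Parent and Child classes defined earlier:

print(issubclass(Child, Parent))  # prints True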

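6. Web Crawlers

Introduction to Crawlers

A crawler is a program that fetches information from web pages automatically. Common types are general-purpose crawlers, focused crawlers, incremental crawlers, and deep-web crawlers.

Using the `urllib` Library

urllib is a package in Python's standard library for working with URLs. It contains several modules: urllib.request, urllib.parse, urllib.error, and urllib.robotparser.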

The `urllib.request` Module

The urllib.request module opens and reads URLs. For example:

import urllib.request
response = urllib.request.urlopen("https://www.python.org/")
html = response.read().decode("utf-8")
print(html)

The `urlopen()` Function

urlopen() opens a URL and returns a response object. For example:

import urllib.request
response = urllib.request.urlopen("https://www.python.org/")

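Functions of the `HttpRequest` Class

urllib itself has no class named HttpRequest: a request object is created with urllib.request.Request, and urlopen() returns an http.client.HTTPResponse object for http and https URLs. For example:

from urllib import request
req = request.Request("https://www.python.org/")  # build the request object
response = request.urlopen(req)  # send it and receive the response
print(response.status)  # HTTP status code, for example 200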

The `urlretrieve()` Function

urlretrieve() downloads a URL to a local file. For example:

import urllib.request
urllib.request.urlretrieve("https://example.com/image.jpg", "image.jpg")

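Customizing the Request Object

Request headers can be customized to imitate a browser. For example:

from urllib import request
headers = {
    "User-Agent": "Mozilla/5.0"
}
req = request.Request("https://www.python.org/", headers=headers)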

HTTP and HTTPS

HTTP transmits data in plain text. HTTPS encrypts the traffic with TLS, so it is more secure.

The `urllib.parse` Module

The urllib.parse module parses URLs into their components. For example:

from urllib.parse import urlparse
result = urlparse("https://www.python.org:8080/path/to/page?name=python&age=30#fragment")
print(result)
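The individual components of the parsed result can be read as attributes, and parse_qs() splits the query string further. A sketch using the same URL:

```python
from urllib.parse import parse_qs, urlparse, urlunparse

result = urlparse("https://www.python.org:8080/path/to/page?name=python&age=30#fragment")

print(result.scheme)                 # https
print(result.netloc)                 # www.python.org:8080
print(result.hostname, result.port)  # www.python.org 8080
print(result.path)                   # /path/to/page
print(result.query)                  # name=python&age=30
print(result.fragment)               # fragment

# parse_qs turns the query string into a dict of lists.
print(parse_qs(result.query))        # {'name': ['python'], 'age': ['30']}

# The result is a named tuple, so a modified URL can be rebuilt from it.
print(urlunparse(result._replace(query="name=java")))
# https://www.python.org:8080/path/to/page?name=java#fragment
```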

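HTTP Requests

An HTTP request consists of a request method, a request URL, request headers, and a request body. The most common request methods are GET and POST.

GET and POST Requests

A GET request retrieves data and carries its parameters in the URL. A POST request submits data in the request body. For example:

import urllib.parse
import urllib.request

# GET request
response = urllib.request.urlopen("https://www.python.org/s?wd=python")

# POST request: passing data makes urlopen() send a POST
data = urllib.parse.urlencode({"wd": "python"}).encode("utf-8")
response = urllib.request.urlopen("https://www.python.org/s", data=data)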
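How urllib chooses between GET and POST can be checked without sending anything, by building Request objects and inspecting them. The example.com URL and the parameters are placeholders.

```python
from urllib import parse, request

params = {"wd": "python", "page": 2}
query = parse.urlencode(params)
print(query)  # wd=python&page=2

# Without data the request is a GET and the parameters go in the URL.
get_req = request.Request("https://www.example.com/search?" + query)
print(get_req.get_method(), get_req.full_url)
# GET https://www.example.com/search?wd=python&page=2

# With data (bytes) the request becomes a POST and the parameters go in the body.
post_req = request.Request("https://www.example.com/search", data=query.encode("utf-8"))
print(post_req.get_method(), post_req.data)
# POST b'wd=python&page=2'
```

Passing either object to urlopen() would then send the request.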

Introduction to AJAX

AJAX is a technique for building interactive web applications that exchange data with the server without reloading the page.

Handlers

Handlers process URL requests. For example, using a proxy handler:

import urllib.request
proxy_handler = urllib.request.ProxyHandler({"http": "http://proxy.example.com:8080"})
opener = urllib.request.build_opener(proxy_handler)
response = opener.open("http://www.example.com/")

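Proxy Servers

A proxy server forwards network requests. It can hide the client's IP address, get around access restrictions, and so on.

Parsing

Page content can be parsed with XPath, regular expressions, and similar techniques.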

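XPath

XPath locates information in XML or HTML documents. For example:

from lxml import etree
page_source = "<html><head><title>Example</title></head></html>"  # sample HTML; normally the downloaded page
html = etree.HTML(page_source)
titles = html.xpath("//title/text()")  # titles is ['Example']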

HTML and XML

HTML is used to build web pages. XML is used to store and transport data.

The `lxml` Library

The lxml library parses HTML and XML documents. For example, with page_source holding the HTML text of a page:

from lxml import etree
html = etree.HTML(page_source)

JSONPath

JSONPath extracts values from JSON data. For example, with the third-party jsonpath package:

import jsonpath
data = {"store": {"book": [{"title": "Book1"}, {"title": "Book2"}]}}
titles = jsonpath.jsonpath(data, "$.store.book[*].title")
print(titles)  # prints ['Book1', 'Book2']

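Using the Selenium Library

Selenium automates browsers. It is used for automated testing and for crawling, because it can simulate what a user does in the browser.

Browser Drivers

Each browser needs its matching driver program. For example, Chrome requires chromedriver.

Installing Selenium

pip install selenium

Writing Test Code

from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.python.org/")
driver.quit()  # close the browser when finished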

Headless Mode

Headless mode runs the browser without a visible window. For example:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

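Locating Elements

Selenium can locate elements by ID, name, class name, tag name, link text, partial link text, XPath, or CSS selector. Selenium 4 replaces the older find_element_by_* helpers with find_element() and the By class. For example:

from selenium.webdriver.common.by import By
element = driver.find_element(By.ID, "element_id")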

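CAPTCHA Recognition

CAPTCHAs can be recognized with a paid recognition service or with an OCR library.

Python OCR Libraries

pytesseract and ddddocr are commonly used OCR libraries. For example, using pytesseract:

from PIL import Image
import pytesseract
image = Image.open("captcha.png")
code = pytesseract.image_to_string(image)
print(code)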

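The Requests Library

Requests is a third-party HTTP client library for Python, used to send HTTP requests.

Differences from the `urllib` Library

The Requests API is simpler and easier to use, and it supports features such as connection pooling and session persistence.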

Session Persistence

Use a Session object to keep a session across requests. For example:

import requests
session = requests.Session()
response1 = session.get("https://www.example.com/page1")
response2 = session.get("https://www.example.com/page2")

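Sessions and Cookies

A Session object manages cookies automatically: cookies set by one response are sent with later requests, which keeps the session state.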

Case Study

For example, a crawler that logs in to gushiwen.cn (古诗文网). A single Session is used for every request so that the cookie issued with the CAPTCHA image is sent back with the login request:

import requests
from lxml import etree
from PIL import Image
import pytesseract

# Use one session so cookies persist between requests
session = requests.Session()

# Fetch the login page source
url = "https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.aspx"
headers = {
    "User-Agent": "Mozilla/5.0"
}
response = session.get(url, headers=headers)
content = response.text

# Parse the hidden form fields and the CAPTCHA image URL
html_tree = etree.HTML(content)
VIEWSTATE = html_tree.xpath('//*[@id="__VIEWSTATE"]/@value')[0]
VIEWSTATEGENERATOR = html_tree.xpath('//*[@id="__VIEWSTATEGENERATOR"]/@value')[0]
code_url = html_tree.xpath('//*[@id="imgCode"]/@src')[0]
code_url = "https://so.gushiwen.cn" + code_url

# Download the CAPTCHA image and recognize it
code_response = session.get(code_url, headers=headers)
with open("captcha.png", "wb") as f:
    f.write(code_response.content)
image = Image.open("captcha.png")
code = pytesseract.image_to_string(image).strip()  # drop surrounding whitespace from the OCR result

# Log in
login_url = "https://so.gushiwen.cn/user/login.aspx?from=http%3a%2f%2fso.gushiwen.cn%2fuser%2fcollect.aspx"
data = {
    "__VIEWSTATE": VIEWSTATE,
    "__VIEWSTATEGENERATOR": VIEWSTATEGENERATOR,
    "from": "http://so.gushiwen.cn/user/collect.aspx",
    "email": "your_email",
    "pwd": "your_password",
    "code": code,
    "denglu": "登录"
}
response = session.post(login_url, headers=headers, data=data)
print(response.text)
