Build Your Own BGM with Python Deep Learning

Contributor: AI100, 2021-04-08 21:30:51

Author | Li Qiujian (李秋键)

Header image | Downloaded from Visual China Group (视觉中国)

Produced by | AI科技大本营 (ID: rgznai100)

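Music plus text: they work best when enjoyed together.

Introduction:

"Those who were seen dancing were thought to be insane by those who could not hear the music." This line, attributed to Nietzsche, is a fun way of saying how indispensable music is to everyday life. For most people, though, mastering an instrument is hard. So today we build a music project that anyone can try: using deep learning to make the computer generate the music we want. The complete code is linked at the end of the article.

A sample of the generated result was embedded here in the original post.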

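Model Building

1.1 Environment requirements

This project was developed with Python 3.6.5 on Windows. The main libraries used are:

argparse is the command-line argument parsing module that ships with Python; it makes reading command-line arguments easy;

glob finds local files by pattern; here it is used to collect the training dataset quickly;

pickle is commonly used in machine learning to store a trained model so that it can be loaded directly at decision time instead of being retrained, which saves a lot of time. It serializes objects to disk and reads them back when needed; most Python objects can be serialized this way. In this project it stores the list of parsed notes.

Keras is a high-level neural network API written in pure Python that runs on top of the TensorFlow, Theano and CNTK backends. Its core data structure is the "model", a way of organizing network layers. The main model type in Keras is Sequential, a stack of layers applied in order. Here we use it to build the BLSTM (bidirectional LSTM) model.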
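
As a rough illustration of how the three standard-library modules above might fit together, here is a minimal sketch. The option name `--midi-dir`, the default folder `midi_songs`, and the helper names are assumptions made for this example, not taken from the original project.

```python
import argparse
import glob
import os
import pickle


def parse_args(argv=None):
    """Read a hypothetical --midi-dir option from the command line."""
    parser = argparse.ArgumentParser(description="Collect MIDI files for training")
    parser.add_argument("--midi-dir", default="midi_songs",
                        help="folder containing .mid files (assumed name)")
    return parser.parse_args(argv)


def find_midi_files(folder):
    """Return the sorted list of .mid files directly inside `folder`."""
    return sorted(glob.glob(os.path.join(folder, "*.mid")))


def save_notes(notes, path):
    """Serialize the list of note tokens to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(notes, fh)


def load_notes(path):
    """Load a previously pickled list of note tokens."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

A script could call `find_midi_files(parse_args().midi_dir)` to gather the training songs, and `save_notes` / `load_notes` to avoid re-parsing them on every run.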

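1.2 Dataset processing

This project uses MIDI files as its music data because they are easy to parse and learn from: the pitch and duration of every note can be read off directly. Two parameters matter most for the network, the time step and the sequence length. The time step determines when we analyze and produce each note, while the sequence length determines how we learn the patterns in a song. We set a time step of 0.25 seconds and 8 notes per time step. This corresponds to a 4/4 time signature, which for us means 8 different sequences of 4 notes. (In the code below, TIMESTEP is compared against music21 offsets, which are expressed in quarter-note lengths.) By learning these sequences and repeating them, we can generate patterns that sound like real music and build on them.

An important part of music is the dynamic and creative use of variable-length notes and rests. For example, a long sustained note followed by a quiet pause can send a wave of emotion to listeners as they hear the performer pour their heart out. To capture this, we encode long notes, short notes and rests, so that different emotions can be produced throughout the song.

(1) Get all the notes and chords in the training set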

# Imports needed by this excerpt
import pickle
from music21 import chord, converter, instrument, note

# Body of the note-extraction method. self.songs holds the MIDI file paths;
# TIMESTEP is a module-level constant.
notes = []
for file in self.songs:
    print("Parsing %s" % file)
    try:
        midi = converter.parse(file)
    except IndexError as e:
        print(f"Could not parse {file}")
        print(e)
        continue
    notes_to_parse = None
    try:
        # The file has instrument parts: use the first one
        s2 = instrument.partitionByInstrument(midi)
        notes_to_parse = s2.parts[0].recurse()
    except Exception:
        # The file stores its notes in a flat structure
        notes_to_parse = midi.flat.notes
    prev_offset = 0.0
    for element in notes_to_parse:
        if isinstance(element, (note.Note, chord.Chord)):
            duration = element.duration.quarterLength
            if isinstance(element, note.Note):
                name = element.pitch
            else:
                # Encode a chord as its normal-order pitch classes joined by "."
                name = ".".join(str(n) for n in element.normalOrder)
            # Token format: "<pitch or chord>$<duration in quarter lengths>"
            notes.append(f"{name}${duration}")
            # One "NULL" (rest) token per empty time step since the previous element
            rest_notes = int((element.offset - prev_offset) / TIMESTEP - 1)
            for _ in range(0, rest_notes):
                notes.append("NULL")
        prev_offset = element.offset
with open("notes/" + self.model_name, "wb") as filepath:
    pickle.dump(notes, filepath)
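
To make the token format concrete, the following sketch applies the same rule to hand-written events instead of a parsed MIDI file, so it runs without music21. The function name `encode_events` and the sample events are hypothetical, and `TIMESTEP = 0.25` is assumed from the text; the placement of the "NULL" tokens after the note token follows the excerpt above.

```python
TIMESTEP = 0.25  # assumed value, taken from the text


def encode_events(events, timestep=TIMESTEP):
    """Turn (name, offset, quarter_length) tuples into training tokens.

    Each event becomes "name$duration", followed by one "NULL" token per
    empty time step between the previous event and this one.
    """
    tokens = []
    prev_offset = 0.0
    for name, offset, quarter_length in events:
        tokens.append(f"{name}${quarter_length}")
        rest_steps = int((offset - prev_offset) / timestep - 1)
        # range() of a non-positive number is empty, so no rests are added
        tokens.extend("NULL" for _ in range(rest_steps))
        prev_offset = offset
    return tokens


# Two single notes, then a chord that starts three time steps after the second note
events = [("C4", 0.0, 0.5), ("E4", 0.25, 0.25), ("4.7.11", 1.0, 1.0)]
tokens = encode_events(events)
print(tokens)
# ['C4$0.5', 'E4$0.25', '4.7.11$1.0', 'NULL', 'NULL']
```

The gap of 0.75 before the chord spans three time steps, so two of them are recorded as rests.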

1.3 Preparing sequences for the neural network

To train the BLSTM network, the data must first be converted into sequences.

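import numpy
# Keras 2.x helper module; to_categorical is also available as keras.utils.to_categorical
from keras.utils import np_utils

def prepare_sequences(self, notes, n_vocab):
    """Prepare the input and output sequences used by the network."""
    # Collect every distinct token (pitch or chord plus duration, or "NULL")
    pitchnames = sorted(set(item for item in notes))
    # Map each token to an integer; 0 is reserved for the rest token
    note_to_int = dict((note, number + 1) for number, note in enumerate(pitchnames))
    note_to_int["NULL"] = 0
    network_input = []
    network_output = []
    # Slide a window of SEQUENCE_LEN tokens over the data;
    # the token right after each window is the prediction target
    for i in range(0, len(notes) - SEQUENCE_LEN, 1):
        sequence_in = notes[i : i + SEQUENCE_LEN]
        sequence_out = notes[i + SEQUENCE_LEN]
        network_input.append([note_to_int[char] for char in sequence_in])
        network_output.append(note_to_int[sequence_out])
    n_patterns = len(network_input)
    # Reshape to (samples, time steps, features), the layout LSTM layers expect
    network_input = numpy.reshape(network_input, (n_patterns, SEQUENCE_LEN, 1))
    # Normalize the integer ids by the vocabulary size
    network_input = network_input / float(n_vocab)
    print(network_output)  # debug output
    # One-hot encode the targets
    network_output = np_utils.to_categorical(network_output)
    return (network_input, network_output)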
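
The sliding-window idea can be tried on a toy token list with NumPy alone. This is a simplified sketch, not the project's function: the name `make_windows` is hypothetical, ids are numbered contiguously from 0 rather than reserving 0 for "NULL", and one-hot encoding is done with `np.eye` instead of Keras.

```python
import numpy as np

SEQUENCE_LEN = 4  # assumed small value, chosen for illustration only


def make_windows(tokens, sequence_len=SEQUENCE_LEN):
    """Build (input, one-hot target) pairs with a sliding window."""
    vocab = sorted(set(tokens))
    token_to_int = {tok: i for i, tok in enumerate(vocab)}
    ids = [token_to_int[t] for t in tokens]
    inputs, targets = [], []
    for i in range(len(ids) - sequence_len):
        inputs.append(ids[i:i + sequence_len])   # the window
        targets.append(ids[i + sequence_len])    # the token that follows it
    n_vocab = len(vocab)
    # Shape (samples, time steps, 1), scaled by the vocabulary size
    x = np.reshape(inputs, (len(inputs), sequence_len, 1)) / float(n_vocab)
    # One row of the identity matrix per target id gives a one-hot vector
    y = np.eye(n_vocab)[targets]
    return x, y, token_to_int


tokens = ["C4$1.0", "NULL", "E4$0.5", "NULL", "G4$1.0", "C4$1.0"]
x, y, token_to_int = make_windows(tokens)
print(x.shape, y.shape)  # (2, 4, 1) (2, 4)
```

Six tokens with a window of four yield two training samples; each target is the token that immediately follows its window.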

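1.4 Building the network

By modeling the notes both before and after a given position in a song, we can generate melodies that sound close to what a human would write. When listening to music, what came before usually helps the listener predict what comes next. Many times, while listening to a song, I can move along to the beat because I can predict what is coming. This is exactly what happens as a song builds up: it grows more and more intense, the listener tenses in anticipation of the drop, and the moment it finally hits brings relief and excitement. By exploiting this, we can produce rhythms that sound natural and evoke the emotions we have come to expect from modern music.

For the number of units in each BLSTM layer we choose 512. The output activation is softmax and the loss is categorical cross-entropy, a combination that works well for multi-class classification problems such as note prediction. Finally, we choose RMSprop as the optimizer, which Keras recommends for RNNs.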
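
To show what the softmax output and the cross-entropy loss compute for a single prediction, here is a small NumPy sketch. The logits and the three-token vocabulary are made-up numbers for illustration; Keras performs the equivalent computation internally.

```python
import numpy as np


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log-probability assigned to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


logits = np.array([[2.0, 1.0, 0.1]])   # scores for a 3-token vocabulary
probs = softmax(logits)
target = np.array([[1.0, 0.0, 0.0]])   # the true next token is token 0
loss = categorical_crossentropy(target, probs)
print(probs, loss)
```

The loss shrinks toward 0 as the probability assigned to the correct next token approaches 1, which is what training pushes the network toward.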

from keras.callbacks import ModelCheckpoint
from keras.layers import LSTM, Activation, Bidirectional, Dense, Dropout
from keras.models import Sequential

def train(self, network_input, network_output):
    """ train the neural network """
    # Save a checkpoint whenever the training loss improves
    filepath = (
        self.model_name + "-weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
    )
    checkpoint = ModelCheckpoint(
        filepath, monitor="loss", verbose=0, save_best_only=True, mode="min"
    )
    callbacks_list = [checkpoint]
    self.model.fit(
        network_input,
        network_output,
        epochs=self.epochs,
        batch_size=64,
        callbacks=callbacks_list,
    )

def create_network(network_input, n_vocab):
    """ create the structure of the neural network """
    print("Input shape ", network_input.shape)
    print("Output shape ", n_vocab)
    model = Sequential()
    # First bidirectional LSTM returns the full sequence for the next LSTM layer
    model.add(
        Bidirectional(
            LSTM(512, return_sequences=True),
            input_shape=(network_input.shape[1], network_input.shape[2]),
        )
    )
    model.add(Dropout(0.3))
    # Second bidirectional LSTM returns only its final output
    model.add(Bidirectional(LSTM(512)))
    # One output unit per token in the vocabulary, turned into probabilities
    model.add(Dense(n_vocab))
    model.add(Activation("softmax"))
    model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
    return model

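Music Generation

One of the most important parts of composing music is structure. We set the structure as follows: the first verse is generated from a randomly chosen note sequence, and the second verse is then generated from where the first verse left off. In effect, this produces a section of twice the length and splits it in half. The reasoning is that the second verse of a piece should still fit the mood of the first, and using the first verse as a reference achieves that. The chorus and the bridge each start from a fresh random sequence, and the final piece is arranged as verse 1, chorus, verse 2, chorus, bridge, chorus.

(1) Generate notes from the neural network based on a note sequence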

# Excerpt from the generation method: model, network_input, int_to_note,
# n_vocab and SEQUENCE_LEN come from the enclosing scope, and the final
# return statement belongs to that method.
def get_start():
    # pick a random sequence from the input as a starting point for the prediction
    start = numpy.random.randint(0, len(network_input) - 1)
    # pattern must be a plain Python list, because the loops below call .append()
    pattern = network_input[start]
    prediction_output = []
    return pattern, prediction_output
# generate verse 1
verse1_pattern, verse1_prediction_output = get_start()
for note_index in range(4 * SEQUENCE_LEN):
    prediction_input = numpy.reshape(
        verse1_pattern, (1, len(verse1_pattern), 1)
    )
    prediction_input = prediction_input / float(n_vocab)
    prediction = model.predict(prediction_input, verbose=0)
    index = numpy.argmax(prediction)
    print("index", index)
    result = int_to_note[index]
    verse1_prediction_output.append(result)
    verse1_pattern.append(index)
    verse1_pattern = verse1_pattern[1 : len(verse1_pattern)]
# generate verse 2
verse2_pattern = verse1_pattern
verse2_prediction_output = []
for note_index in range(4 * SEQUENCE_LEN):
    prediction_input = numpy.reshape(
        verse2_pattern, (1, len(verse2_pattern), 1)
    )
    prediction_input = prediction_input / float(n_vocab)
    prediction = model.predict(prediction_input, verbose=0)
    index = numpy.argmax(prediction)
    print("index", index)
    result = int_to_note[index]
    verse2_prediction_output.append(result)
    verse2_pattern.append(index)
    verse2_pattern = verse2_pattern[1 : len(verse2_pattern)]
# generate chorus
chorus_pattern, chorus_prediction_output = get_start()
for note_index in range(4 * SEQUENCE_LEN):
    prediction_input = numpy.reshape(
        chorus_pattern, (1, len(chorus_pattern), 1)
    )
    prediction_input = prediction_input / float(n_vocab)
    prediction = model.predict(prediction_input, verbose=0)
    index = numpy.argmax(prediction)
    print("index", index)
    result = int_to_note[index]
    chorus_prediction_output.append(result)
    chorus_pattern.append(index)
    chorus_pattern = chorus_pattern[1 : len(chorus_pattern)]
# generate bridge
bridge_pattern, bridge_prediction_output = get_start()
for note_index in range(4 * SEQUENCE_LEN):
    prediction_input = numpy.reshape(
        bridge_pattern, (1, len(bridge_pattern), 1)
    )
    prediction_input = prediction_input / float(n_vocab)
    prediction = model.predict(prediction_input, verbose=0)
    index = numpy.argmax(prediction)
    print("index", index)
    result = int_to_note[index]
    bridge_prediction_output.append(result)
    bridge_pattern.append(index)
    bridge_pattern = bridge_pattern[1 : len(bridge_pattern)]
return (
    verse1_prediction_output
    + chorus_prediction_output
    + verse2_prediction_output
    + chorus_prediction_output
    + bridge_prediction_output
    + chorus_prediction_output
)
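
The four loops above differ only in their starting pattern, so the same greedy step can be sketched as one helper. This is an illustrative refactoring, not the author's code: `generate_section` and `CyclingModel` are hypothetical names, and the stub model stands in for a trained network so the example runs without Keras.

```python
import numpy as np


def generate_section(model, seed_pattern, int_to_note, n_vocab, n_steps):
    """Greedily generate n_steps tokens; return (tokens, final_pattern)."""
    pattern = list(seed_pattern)
    output = []
    for _ in range(n_steps):
        x = np.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
        prediction = model.predict(x, verbose=0)
        index = int(np.argmax(prediction))   # most probable next token
        output.append(int_to_note[index])
        pattern = pattern[1:] + [index]      # slide the window forward
    return output, pattern


class CyclingModel:
    """Hypothetical stand-in for a trained network, used only for this demo."""

    def __init__(self, n_vocab):
        self.n_vocab = n_vocab
        self.calls = 0

    def predict(self, x, verbose=0):
        # Put all probability on one token and rotate through the vocabulary
        probs = np.zeros((1, self.n_vocab))
        probs[0, self.calls % self.n_vocab] = 1.0
        self.calls += 1
        return probs


int_to_note = {0: "NULL", 1: "C4$1.0", 2: "E4$0.5"}
model = CyclingModel(n_vocab=3)
verse1, last_pattern = generate_section(model, [0, 1, 2, 1], int_to_note, 3, 4)
# Verse 2 continues from the window that verse 1 ended on
verse2, _ = generate_section(model, last_pattern, int_to_note, 3, 4)
print(verse1 + verse2)
```

Passing the final pattern of one section as the seed of the next is what ties verse 2 to verse 1; a chorus or bridge would instead be seeded with a fresh random pattern.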

(2) Convert the prediction output into notes and create a MIDI file from them. Note and chord objects are created from the values the model generated.

import os
from music21 import chord, duration, instrument, note, stream

# Excerpt from the MIDI-writing method; prediction_output is the token list
# produced in step (1), TIMESTEP is the module-level constant
offset = 0
output_notes = []
for pattern in prediction_output:
    # Split "name$duration" tokens; durations may be fractions such as "1/3"
    if "$" in pattern:
        pattern, dur = pattern.split("$")
        if "/" in dur:
            a, b = dur.split("/")
            dur = float(a) / float(b)
        else:
            dur = float(dur)
    # pattern is a chord
    if ("." in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split(".")
        notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.Piano()
            notes.append(new_note)
        new_chord = chord.Chord(notes)
        new_chord.offset = offset
        new_chord.duration = duration.Duration(dur)
        output_notes.append(new_chord)
    # pattern is a rest (compare strings with ==, not "is")
    elif pattern == "NULL":
        offset += TIMESTEP
    # pattern is a note
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.Piano()
        new_note.duration = duration.Duration(dur)
        output_notes.append(new_note)
    # increase the offset on every iteration so that notes do not stack
    offset += TIMESTEP
midi_stream = stream.Stream(output_notes)
output_file = os.path.basename(self.weights) + ".mid"
print("output to " + output_file)
midi_stream.write("midi", fp=output_file)
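
Durations appear as strings like "1/3" because music21 can report a quarter length as a `Fraction`, which formats that way inside an f-string. As an alternative to splitting on "/" by hand, the token could be parsed with the standard `fractions` module. The helper name `split_token` is hypothetical.

```python
from fractions import Fraction


def split_token(token):
    """Split "name$duration" into (name, duration as float).

    Tokens without "$" (for example "NULL") carry no duration,
    so None is returned for them.
    """
    if "$" not in token:
        return token, None
    name, dur = token.split("$", 1)
    # Fraction accepts both "1/3" and decimal strings such as "0.5"
    return name, float(Fraction(dur))


print(split_token("4.7$0.5"))  # ('4.7', 0.5)
print(split_token("NULL"))     # ('NULL', None)
```

`Fraction` handles both notations in one call, which removes the need for the separate "/" branch.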

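Source Code

Download link for the complete code:

https://pan.baidu.com/s/1uPflHi1u6Vl_J_L7Q_JFaA

Extraction code: 8n1p

About the author: Li Qiujian is a CSDN blog expert and the author of CSDN 达人课 courses. He is pursuing a master's degree at China University of Mining and Technology, and his development work includes award-winning entries in TapTap competitions.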

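This article was published by AI100. Reprinting requires the author's consent; please credit the source (爱尖刀) and include a link to this page.

Original link: https://www.ijiandao.com/2b/baijia/407368.html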
