Level generation and style enhancement — deep learning for game development overview∗

Piotr Migdał†
ECC Games SA

Bartłomiej Olechno
ECC Games SA

Błażej Podgórski
ECC Games SA
Kozminski University

Abstract

We present practical approaches to using deep learning to create and enhance level maps and textures for video games – desktop, mobile, and web. We aim to present new possibilities for game developers and level artists.

The task of designing levels and filling them with details is challenging. It is time-consuming and takes effort to make levels rich, complex, and natural-feeling. Fortunately, recent progress in deep learning provides new tools to accompany level designers and visual artists. Moreover, these tools offer a way to generate infinite worlds for game replayability and to adjust educational games to players’ needs.

We present seven approaches to create level maps, each using statistical methods, machine learning, or deep learning. In particular, we include:

• Generative Adversarial Networks for creating new images from existing examples (e.g. ProGAN).

• Super-resolution techniques for upscaling images while preserving crisp detail (e.g. ESRGAN).

• Neural style transfer for changing visual themes.

• Image translation – turning semantic maps into images (e.g. GauGAN).

• Semantic segmentation for turning images into semantic masks (e.g. U-Net).

• Unsupervised semantic segmentation for extracting semantic features (e.g. Tile2Vec).

• Texture synthesis – creating large patterns based on a smaller sample (e.g. InGAN).

∗Submitted to the 52nd International Simulation and Gaming Association (ISAGA) Conference 2021
†[email protected], https://p.migdal.pl

arXiv:2107.07397v1 [cs.CV] 15 Jul 2021

1. Introduction

Advances in deep learning [1] offer new ways to create and modify visual content. They can be used for entertainment, including video game development. This paper presents a survey of various machine learning techniques that game developers can directly use for texture and level creation and improvement. These techniques offer a way to:

• Generate infinite content that closely resembles real data.

• Change or modify the visual style.

• Save time for level designers and artists by extending the board or filling in features.

These techniques differ from a classical approach of procedural content generation [2] using hand-written rules, which is the keystone of some genres (e.g. roguelike games). In contrast, machine learning models learn from the training dataset [3, 4].

While we focus on images, it is worth mentioning progress in text-generation neural networks (e.g. transformers such as GPT-2 [5] and GPT-3 [6]). They are powerful enough not only to supplement entertainment, e.g. by writing poetry [7], but also to be the core of a text adventure game such as AI Dungeon [8, 9]. In this game, all narration is AI-generated, and the player can type any text input (rather than pick from a few options). The GPT-3 network is powerful enough to generate abstract data (such as computer code) and may be able to create spatial maps encoded as text (e.g. as XML or JSON). One of its extensions, DALL·E [10], is a text-to-image model able to generate seemingly arbitrary images from a description, such as “an armchair in the shape of an avocado”.

Recent progress in reinforcement learning has resulted in networks surpassing top human players in Go [11], reaching grandmaster level in the real-time strategy StarCraft II [12], and playing at a decent level in the multiplayer battle arena Dota 2 [13]. These methods learn from interacting with the game environment, learning human-like strategies [14] and exploiting unintended properties of game physics and level maps [15]. Other notable approaches include automatic level evaluation [16] and level generation from gameplay videos [17]. For some networks, level generation is an auxiliary task to improve an AI model’s performance [18]. Level generation can be modified by parameters to adjust game difficulty or address particular needs in educational games [19], offering the potential to increase their effectiveness [20].

We present seven approaches to create level maps, each using statistical methods, machine learning, or deep learning. In particular, we include:

• Generative Adversarial Networks for creating new images from existing examples.

• Super-resolution techniques for upscaling images while preserving crisp detail.

• Neural style transfer for changing visual themes.

• Image translation – turning semantic maps into images.

• Semantic segmentation for turning images into semantic masks.

• Unsupervised semantic segmentation for extracting semantic features.

• Texture synthesis – creating large patterns based on a smaller sample.


In this paper, we put a special emphasis on methods that are readily available for use. The most popular deep learning frameworks, including TensorFlow [21] and PyTorch [22], are open-source. Moreover, the newest models are commonly shared as code on GitHub, or even directly as Google Colaboratory notebooks, allowing users to run them online without any setup. Other models can be accessed via a backend, as a service, or with TensorFlow.js [23] using GPU support in the browser.
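As an illustration of this plug-and-play availability, the minimal sketch below pulls a pretrained network straight from a GitHub repository via PyTorch Hub. The repository tag and model name are just one public example, not a model discussed in this paper.

    import torch

    # PyTorch Hub fetches code and pretrained weights directly from GitHub;
    # here we load a standard ResNet-50 image classifier as an example.
    model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)
    model.eval()

    # Run inference on a dummy RGB image tensor (batch of 1, 224x224 pixels).
    with torch.no_grad():
        logits = model(torch.rand(1, 3, 224, 224))
    print(logits.shape)  # torch.Size([1, 1000]) - one score per ImageNet class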

2. Choice of methods

Numerous methods are (or can be) used in game development. We focus on ones that fulfill the following set of criteria:

• Learn from data rather than use hand-written rules.

• Work on or with images (that is, any 2D spatial data).

• Can be used for generating, enhancing, or transforming levels.

• Can directly fit into the workflow of level designers and artists [24].

• Are state of the art, in the sense that there is no other method that obviously outclasses them.

We restrict ourselves to a set of methods that can be related to each other in a meaningful way, focusing on new statistical methods. While we consult state-of-the-art surveys [25], we are aware that there are no straightforward numeric benchmarks for content generation. What matters is the effect desired by game developers and the subjective experience of the players.

Text generation, procedural or not, is vital for a wide range of adventure or text-based games. Comparing such methods deserves another study, both due to the sheer volume of the topic and the difficulty of relating text to images (it is “apples to oranges”).

We acknowledge that practically any statistical, machine learning, or deep learning method has potential for game development. However, we aim to cover ones whose use is straightforward. Since tools used in commercial games are often proprietary secrets, we do not restrict ourselves only to methods with known applications in video games.

It is important to emphasize that these seven method classes are, in general, non-interchangeable. As depicted in Figure 1, each requires a different input, produces a different output, and requires a different setup (which usually involves a specific training dataset). They solve different problems. Hence, our motivation is not to compare the quality of the results but to provide use-cases where a particular method has an edge. Even though we use the term “image”, we are not restricted to 2D images with 3 color channels. The same methods work for more channels (e.g. RGBA, RGB + heightmap, or feature maps representing other physical properties) and for 3D voxel data.

Figure 1. A conceptual summary of transformations performed by the discussed methods. We show a list of manually created input-output pairs.

3. Description of methods

3.1. Generative Adversarial Networks

Regular Generative Adversarial Networks (GANs) generate new images similar to ones from the training set [26, 27]. The core idea is that there are two networks – one generating images (Generator) and the other distinguishing generated images from real ones (Discriminator) [28]. However, their training is notoriously difficult and requires much effort to produce any satisfactory results. GANs take as input a vector, which might be random, fixed, or generated from text [29, 30, 31].

Historically, networks were restricted to generating small images (e.g. 64×64 or 128×128 pixels) [32, 33, 34]. Only recent progress has allowed generating large-scale (e.g. 1024×1024), detailed images such as human faces with ProGAN [35], as seen in Figure 2. Its training process involves a series of neural network training stages: it starts from high-level features (such as 4×4 images showing an extremely blurred face) and step-by-step learns to include fine detail. This staged training is crucial, as otherwise convolutional neural networks tend to overemphasize local features (visible to humans or not) and neglect high-level features.

Figure 2. A photorealistic face generated from a set of human-interpretable parameters [38] (left), and a similarly created anime face [39, 40] (right).

This class offers potential for one-step creation of levels, from the high-level overview to low-level features. These networks are extremely data- and compute-intensive, though there is an ongoing effort to make them more data-efficient [36]. Moreover, most examples are based on fairly standardized datasets, e.g. faces of actors or anime characters [37] – with consistent style and color. The same approach might not work for more diverse collections of images or photos.
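To make the Generator-Discriminator interplay concrete, below is a minimal, illustrative GAN training step in PyTorch. The tiny fully-connected networks and dimensions are placeholders; real systems such as ProGAN use large convolutional architectures and many additional stabilization tricks.

    import torch
    import torch.nn as nn

    # Toy generator and discriminator over 64-dimensional "images".
    latent_dim, image_dim = 16, 64
    G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                      nn.Linear(128, image_dim), nn.Tanh())
    D = nn.Sequential(nn.Linear(image_dim, 128), nn.LeakyReLU(0.2),
                      nn.Linear(128, 1))

    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(real_batch):
        n = real_batch.size(0)
        ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

        # Discriminator: score real images as 1 and generated images as 0.
        fake = G(torch.randn(n, latent_dim))
        loss_d = bce(D(real_batch), ones) + bce(D(fake.detach()), zeros)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator: try to make the discriminator output 1 on fakes.
        loss_g = bce(D(fake), ones)
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_d.item(), loss_g.item()

    train_step(torch.rand(32, image_dim))  # one step on a dummy batch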

3.2. Super-resolution (image upscaling)

Video games often benefit from high visual detail, unless low resolution is a conscious design choice – as for some retro-styled games. Super-resolution [41, 42] offers a way to improve texture detail by a given factor (typically 2× or 4×) without additional effort. Unlike more traditional approaches (nearest neighbor, linear, bicubic), it preserves crisp detail. It is not possible to flawlessly generate more data; however, it is possible to create plausible results – crisp details resembling authentic, high-resolution images. The method can be applied in two ways: upgrading textures or improving the game resolution in real time.

The super-resolution network ESRGAN [43] became widely used, as it was released with pre-trained weights and easy-to-use code. It can improve older games and save artists’ time on new games, as only a lower resolution would be required. Fans are already doing it – both as a proof of principle (e.g. for a single screenshot, as in Figure 3) and to create playable mods that enhance game graphics. There are communities dedicated to upscaling games [44, 45] whose work has resulted in fan-made textures usable in games such as the 3D shooter Return to Castle Wolfenstein [46]. Compare this with the seemingly manual approach of remastering old games, e.g. Diablo II: Resurrected [47] and StarCraft: Remastered [48]. In addition to generic upscaling networks, there is the PULSE [49] network, which offers upscaling of faces by up to 64×, as in Figure 4.
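Below is a hedged sketch of how such a pretrained 4× upscaler is typically applied to a single texture. The load_pretrained_esrgan helper is hypothetical – it stands in for the checkpoint-loading steps described in the ESRGAN repository – while the surrounding tensor plumbing is standard.

    import torch
    import numpy as np
    from PIL import Image

    # Hypothetical helper standing in for the ESRGAN model-loading code.
    model = load_pretrained_esrgan('RRDB_ESRGAN_x4.pth')
    model.eval()

    # Read a texture and convert it to a (1, 3, H, W) float tensor in [0, 1].
    img = np.asarray(Image.open('texture.png').convert('RGB')) / 255.0
    x = torch.from_numpy(img).float().permute(2, 0, 1).unsqueeze(0)

    with torch.no_grad():
        y = model(x).squeeze(0).clamp(0, 1)  # 4x the height and width

    out = (y.permute(1, 2, 0).numpy() * 255).round().astype(np.uint8)
    Image.fromarray(out).save('texture_x4.png')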

Another key application is improving resolution in real time. Given fixed computing power (usually restricted by the GPU), it improves the trade-off between resolution and frames per second. That is, we can increase the resolution while keeping the same frame rate, or improve the frame rate while keeping the same resolution. Algorithms such as DLSS 2.0 by Nvidia [50, 51] are already used by AAA games such as Cyberpunk 2077 [52].

Figure 3. Upscaling a scene from Heroes of Might and Magic III with ESRGAN (original vs. 4× upscaled).

Figure 4. Depixelizing Doomguy with PULSE [53], reproduced with permission. Note that there are many plausible faces (right) consistent with the original, low-resolution version from Doom (1993) (left).

3.3. Style transfer

Instead of increasing resolution, we may want to change the style of an image – for example, to turn foliage into dunes or daytime textures into nighttime ones. The simplest example of style transfer involves taking two images (one with content and the other with style) and generating a painting-like picture [54] or video [55]. These methods work best for creating dreamy or trippy visual styles but can go well beyond that [56, 57]. Style transfer is already accessible as a plugin for a popular game engine, Unity3D [58]. It can be used for post-processing, as depicted in Figure 5, or as part of an asset generation workflow [59].
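The classic two-image formulation [54] optimizes the generated picture against two objectives: matching the content image’s deep features and matching the style image’s texture statistics, captured by Gram matrices. A minimal sketch of that loss, assuming feature maps extracted from several layers of a pretrained CNN (typically VGG-19); the style weight is a free parameter:

    import torch
    import torch.nn.functional as F

    def gram_matrix(features):
        # features: a (channels, height, width) activation map from one layer.
        c, h, w = features.shape
        f = features.reshape(c, h * w)
        return (f @ f.t()) / (c * h * w)  # channel-to-channel correlations

    def style_transfer_loss(gen_feats, content_feats, style_feats, style_weight=1e4):
        # Content term: match deep-layer activations of the content image.
        content_loss = F.mse_loss(gen_feats[-1], content_feats[-1])
        # Style term: match Gram matrices (texture statistics) at every layer.
        style_loss = sum(F.mse_loss(gram_matrix(g), gram_matrix(s))
                         for g, s in zip(gen_feats, style_feats))
        return content_loss + style_weight * style_loss

In practice, the generated image’s pixels are optimized directly by gradient descent on this loss, with the feature lists recomputed at every step.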

Newer approaches (usually involving GANs) can change crucial features without sacrificing quality [60, 61]. They can alter sensitive and detailed features, such as gradually changing faces (e.g. by age or ethnicity), while maintaining photorealistic quality [62, 63].

These techniques look promising for changing textures or adding a real-time filter. This includes turning rendered graphics into realistic ones resembling real photos, with GTA V as an example [64]. While such transformations improve the realism of some elements (e.g. foliage), they may reduce the saturation, cleanliness, and other desirable features of game graphics.


Figure 5. An original screenshot from Doom (1993) and three styles: The Starry Night by van Gogh, a constellation photo by Pexels, and Angels and Devils by Escher (top for the style, bottom for the modified image). Generated online with DeepArt [65].

Figure 6. Nvidia GauGAN beta [68] for turning paintbrush-like feature drawings into realistic images. Colors corresponding to sea, sky, rock, clouds, and mountains (left) are turned into a photorealistic picture (right).

3.4. Image translation

Image translation is a broad subject covering the use of neural networks to turn one image into another. Technically, a few other methods can be viewed as such: style transfer, super-resolution, or feature extraction. This section focuses on turning feature maps into levels, fixing feature maps, enriching maps with details (e.g. trees), or adding additional modalities (e.g. depth) from existing fields.

Notable examples include pix2pix [66] and CycleGAN [67], translating between various classes of images. GauGAN [68] is a network focused on turning semantic maps into realistic images – see Figure 6. Another similar application is turning 2D graphics into a pseudo-3D perspective – even without any data obtained from a game engine [69]. Moreover, abstract game graphics can be turned into photorealistic landscapes – e.g. for Minecraft [70].
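For paired data, the pix2pix-style objective [66] combines a conditional adversarial term with an L1 reconstruction term. A minimal sketch of the generator’s loss, assuming a discriminator that sees the input semantic map concatenated channel-wise with the image (the toy discriminator below is illustrative):

    import torch
    import torch.nn.functional as F

    def pix2pix_generator_loss(discriminator, input_map, generated, target,
                               l1_weight=100.0):
        # Adversarial term: the discriminator, conditioned on the input map,
        # should classify the generated image as real (label 1).
        pred = discriminator(torch.cat([input_map, generated], dim=1))
        adv = F.binary_cross_entropy_with_logits(pred, torch.ones_like(pred))
        # L1 term keeps the output close to the ground-truth image.
        return adv + l1_weight * F.l1_loss(generated, target)

    # Dummy usage: 3-channel semantic maps and images, toy patch discriminator.
    D = torch.nn.Conv2d(6, 1, kernel_size=4, stride=2, padding=1)
    maps = torch.rand(2, 3, 64, 64)
    gen = torch.rand(2, 3, 64, 64, requires_grad=True)
    real = torch.rand(2, 3, 64, 64)
    loss = pix2pix_generator_loss(D, maps, gen, real)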

3.5. Supervised image segmentation

Turning real photos into game maps usually requires manual work. Semantic segmentation is a class of models extracting semantic features from images [71, 72] – for example, dividing an aerial image into areas with different properties, as depicted in Figure 7. In a race car game, it would mean extracting the locations of roads, gravel, grass, foliage, and water. A semantic map can be used to change the game behavior (e.g. different friction in each zone) or to generate a map with different textures or graphics but with the overall structure taken from a real-world example.

Figure 7. Supervised image segmentation with a custom U-Net: turning satellite images (left) into road maps (right) [73], reproduced with permission.

In supervised semantic segmentation, we need training data in the form of pairs: the input image and its semantic features, at the same resolution, with one or more classes. Network architectures such as U-Net typically need a relatively small training dataset (as they treat every pixel of each image as a data point) and a simple training process.
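A minimal sketch of such a training step is shown below: a toy encoder-decoder produces per-pixel class logits, and every pixel contributes to the cross-entropy loss. A real U-Net [71] adds skip connections between mirrored layers; the shapes and class list here are illustrative.

    import torch
    import torch.nn as nn

    num_classes = 5  # e.g. road, gravel, grass, foliage, water
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                               # downsample
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.Upsample(scale_factor=2, mode='bilinear'),  # upsample back
        nn.Conv2d(32, num_classes, 1),                 # per-pixel class logits
    )

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()  # every pixel is a labeled data point

    # One step on a dummy batch: images (N, 3, H, W) paired with integer
    # masks (N, H, W) holding one class index per pixel.
    images = torch.rand(4, 3, 64, 64)
    masks = torch.randint(0, num_classes, (4, 64, 64))
    loss = loss_fn(model(images), masks)
    optimizer.zero_grad(); loss.backward(); optimizer.step()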

3.6. Unsupervised image segmentation

Manual labeling of all data with semantic maps can be labor-intensive and rigid. For a typical project, one would need to label a considerable fraction of the data, which is infeasible if we want to lower the workload of level designers. Unsupervised feature extraction can be performed in a few different ways. The most common approach is transfer learning [74] – that is, using visual patterns extracted by pre-trained networks [75] such as ResNet-50 [76]. However, in many cases, game graphics differ substantially from real photos in style, scale, and content – see Figure 8. Given that convolutional neural networks are very sensitive to low-level patterns, any digital artistic style may mislead them. Moreover, abstract graphics (e.g. Minesweeper) or graphics at a different scale (e.g. SimCity) may be unsuitable for networks primarily trained on ImageNet [77] – a dataset of photos taken with regular cameras.

In this case, we may need to resort to self-supervised and unsupervised methods trained on our dataset. For that, we may use autoencoders [79], unsupervised clustering [80], or methods extracting content based solely on visual pattern proximity, such as Tile2Vec [81] and Patch2Vec [82].

At ECC Games SA, we use a similar technique [83] for turning drone-taken aerial photos of race tracks into semantic maps that distinguish both robust features (e.g. road vs foliage) and details (the density and angle of tire trails). This technique can work even on a single image, with no pre-training, which offers an extreme data efficiency rare in deep learning. In particular, it means that there is no extra effort spent on image labeling or generating large datasets. We show two diverse examples in Figure 9.


Figure 8. An example showing that ResNet-50 trained on the ImageNet dataset does not work well with Minesweeper, using the online Keras.js demo [78].

Figure 9. One-shot results for a SimCity map and an aerial scan of a racetrack – input images (left) and the three largest Principal Components mapped to RGB channels (right).
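A minimal sketch of the visualization idea behind Figure 9: describe each tile of a single image with a feature vector, project the vectors with PCA, and map the three largest components to RGB. For simplicity, the per-tile features below are raw pixels; the actual pipeline [83] uses learned features.

    import numpy as np
    from sklearn.decomposition import PCA

    def pca_semantic_map(image, tile=8):
        # Cut the image into non-overlapping tiles and flatten each one.
        h, w, c = image.shape
        th, tw = h // tile, w // tile
        tiles = (image[:th * tile, :tw * tile]
                 .reshape(th, tile, tw, tile, c)
                 .transpose(0, 2, 1, 3, 4)
                 .reshape(th * tw, tile * tile * c))
        # Three largest principal components per tile, rescaled to [0, 1].
        comps = PCA(n_components=3).fit_transform(tiles)
        comps -= comps.min(axis=0)
        comps /= comps.max(axis=0) + 1e-8
        return comps.reshape(th, tw, 3)  # pseudo-RGB semantic map

    rgb = pca_semantic_map(np.random.rand(256, 256, 3))
    print(rgb.shape)  # (32, 32, 3)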

3.7. Pattern synthesis

Pattern synthesis is a method used to expand a spatial pattern up to an arbitrary size. Conceptually, it relates to super-resolution, but it makes the image larger instead of going deeper into detail. A straightforward approach involves generating bigger textures without self-repetition to give them a natural feel [84, 85]. This technique can be applied statically or for creating patterns in real time [86]. Another way to go is to work at the level of the general map – i.e. expanding high-level semantic maps from a smaller sample. Besides deep learning, some classical algorithms are widely used, such as WaveFunctionCollapse [87, 88, 89].

Figure 10. Pattern synthesis using InGAN [85] based on a single map from SimCity. The map is treated purely as an image. Splitting it into semantic regions (e.g. tiles related to buildings) would improve the result.

For game genres such as classic platformer games and scrolling shooters, 2D level maps can be considered a sequence of 1D stripes. We can tackle generating 1D sequences with additional, simpler methods, e.g. the classical statistical model of a hidden Markov chain [90] – see the sketch below. Deep learning models for sequence generation, such as the Long Short-Term Memory network (LSTM) [91] and the Gated Recurrent Unit (GRU) [92], offer more flexibility and even better results, especially for longer sequences. The 1D structure of images can also be exploited by transformers generating images strip-by-strip [93]. For example, Super Mario levels have been generated with an LSTM [94].
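A minimal sketch of the Markov-chain idea, assuming a level encoded as text strips: learn strip-to-strip transitions from one example level and sample a new one strip by strip. A full hidden Markov model [90] adds latent states; this illustrative version is a plain first-order chain.

    import random
    from collections import defaultdict

    # An example platformer level as vertical strips ('.' empty, '#' block).
    example = ["....", "..#.", "..#.", "....", ".##.", "....", "..#.", "...."]

    # Count which strip follows which in the example level.
    transitions = defaultdict(list)
    for prev, nxt in zip(example, example[1:]):
        transitions[prev].append(nxt)

    def generate(length, start):
        level = [start]
        for _ in range(length - 1):
            options = transitions.get(level[-1])
            if not options:          # dead end: fall back to any known strip
                options = example
            level.append(random.choice(options))
        return level

    print(generate(12, example[0]))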

4. Conclusion

We presented an array of deep learning methods for the creation and enhancement of levels and textures. They show the potential to improve the game development process, offer new experiences to players, and generate content in real time. As we have seen in the past, technical advancements allow not only improving existing game formulas but also creating new genres [95].

Depending on the problem we tackle or the enhancement we want to make, we need to consider different types of deep learning models:

1. We need to translate the task into an input-output problem. Both inputs and outputs can be photos or other real-world images, in-game graphics, semantic maps, or any other format of spatial encoding of low-level textures and high-level maps.

2. We need to check the classes of models and whether they provide the transformation we need. In this paper, we discussed seven different classes, which provide a good starting point.

3. We need to see if any model within the chosen family provides results of the desired quality. This step may involve a simple plug-and-play setup with an existing trained model, or a much more complicated process of finding a specific model and training it on our data.

Quite a few deep learning techniques are already used in game development – by fan communities, modders, professional game developers, or as tools provided by game engines. While progress in deep learning is fast, even current models offer broad perspectives for supporting game development – with many applications and opportunities yet to be discovered.


The abundance of publications from the last two years (and not-yet-published novel results) shows that this is a fruitful and quickly developing field. The latter compels us to cite GitHub repositories and other sources to reference the newest methods.

Acknowledgments

We would like to thank the ECC Games team, with special thanks to Marcin Prusarczyk and Piotr Wątrucki for the project overview, and Bartłomiej “Valery” Walendziak for sharing his insights on the stages of level development from the artist’s perspective. We are grateful to Olle Bergman, Anna Olchowik, and Jan Marucha for their comments, and to Rafał Jakubanis for his extensive editorial feedback on the draft.

We focus on creating race track maps from aerial scans of real race tracks. This task involves a few levels of complexity – from the high-level general pattern of the map to low-level details and decorations such as tire tracks. This research is supported by the EU R&D grant POIR.01.02.00-00-0052/19 for the project “GearShift – building the engine of the behavior of wheeled motor vehicles and map generation based on artificial intelligence algorithms implemented on the Unreal Engine platform – GAMEINN”.

References

[1] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016, https://www.deeplearningbook.org/.

[2] N. Shaker, J. Togelius, and M. J. Nelson. Procedural Content Generation in Games: A Textbook and an Overview of Current Research. Springer Berlin Heidelberg, New York, NY, 2016, http://pcgbook.com/.

[3] A. Summerville, S. Snodgrass, M. Guzdial, C. Holmgård, A. K. Hoover, A. Isaksen, A. Nealen, and J. Togelius. Procedural Content Generation via Machine Learning (PCGML). IEEE Transactions on Games, 10(3):257–270, Sept. 2018, arXiv:1702.00539, doi:10.1109/TG.2018.2846639.

[4] J. Liu, S. Snodgrass, A. Khalifa, S. Risi, G. N. Yannakakis, and J. Togelius. Deep Learning for Procedural Content Generation. Neural Comput & Applic, 33(1):19–37, Jan. 2021, arXiv:2010.04548, doi:10.1007/s00521-020-05383-8.

[5] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language Models are Unsupervised Multitask Learners. OpenAI, page 24, Feb. 2019, https://openai.com/blog/better-language-models/.

[6] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc., July 2020, arXiv:2005.14165.

[7] G. Branwen. GPT-2 Neural Network Poetry. Mar. 2019, https://www.gwern.net/GPT-2.

[8] R. Raley and M. Hua. Playing With Unicorns: AI Dungeon and Citizen NLP. Digital Humanities Quarterly, 14(4), Dec. 2020, https://escholarship.org/uc/item/83m0m08x.


[9] N. Walton. AI Dungeon, 2019, https://aidungeon.io. accessed: 2021-07-15.

[10] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever. Zero-Shot Text-to-Image Generation. arXiv:2102.12092 [cs], Feb. 2021, arXiv:2102.12092, https://openai.com/blog/dall-e/.

[11] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144, Dec. 2018, doi:10.1126/science.aar6404.

[12] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring, D. Yogatama, D. Wunsch, K. McKinney, O. Smith, T. Schaul, T. Lillicrap, K. Kavukcuoglu, D. Hassabis, C. Apps, and D. Silver. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, Nov. 2019, doi:10.1038/s41586-019-1724-z.

[13] OpenAI, C. Berner, G. Brockman, B. Chan, V. Cheung, P. Dębiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, R. Józefowicz, S. Gray, C. Olsson, J. Pachocki, M. Petrov, H. P. d. O. Pinto, J. Raiman, T. Salimans, J. Schlatter, J. Schneider, S. Sidor, I. Sutskever, J. Tang, F. Wolski, and S. Zhang. Dota 2 with Large Scale Deep Reinforcement Learning. arXiv:1912.06680 [cs, stat], Dec. 2019, arXiv:1912.06680, https://openai.com/projects/five/.

[14] M. Jaderberg, W. M. Czarnecki, I. Dunning, L. Marris, G. Lever, A. G. Castaneda, C. Beattie, N. C. Rabinowitz, A. S. Morcos, A. Ruderman, N. Sonnerat, T. Green, L. Deason, J. Z. Leibo, D. Silver, D. Hassabis, K. Kavukcuoglu, and T. Graepel. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 364(6443):859–865, May 2019, arXiv:1807.01281, doi:10.1126/science.aau6249, https://deepmind.com/blog/article/capture-the-flag-science.

[15] B. Baker, I. Kanitscheider, T. Markov, Y. Wu, G. Powell, B. McGrew, and I. Mordatch. Emergent Tool Use From Multi-Agent Autocurricula. arXiv:1909.07528 [cs, stat], Feb. 2020, arXiv:1909.07528.

[16] O. Davoodi, M. Ashtiani, and M. Rajabi. An Approach for the Evaluation and Correction of Manually Designed Video Game Levels Using Deep Neural Networks. The Computer Journal, (bxaa071), July 2020, doi:10.1093/comjnl/bxaa071.

[17] M. Guzdial and M. Riedl. Toward Game Level Generation from Gameplay Videos. arXiv:1602.07721 [cs], Feb. 2016, arXiv:1602.07721.

[18] S. Risi and J. Togelius. Increasing generality in machine learning through procedural content generation. Nat Mach Intell, 2(8):428–436, Aug. 2020, arXiv:1911.13071, doi:10.1038/s42256-020-0208-z.

[19] M. Wardaszko and B. Podgórski. Mobile Learning Game Effectiveness in Cognitive Learning by Adults. Simul. Gaming, 48(4):435–454, Aug. 2017, doi:10.1177/1046878117704350.

[20] K. Park, B. W. Mott, W. Min, K. E. Boyer, E. N. Wiebe, and J. C. Lester. Generating Educational Game Levels with Multistep Deep Convolutional Generative Adversarial Networks. In 2019 IEEE Conference on Games (CoG), pages 1–8, Aug. 2019, doi:10.1109/CIG.2019.8848085.


[21] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: A System for Large-Scale Machine Learning. pages 265–283, 2016, arXiv:1603.04467.

[22] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Jan. 2019, arXiv:1912.01703.

[23] D. Smilkov, N. Thorat, Y. Assogba, A. Yuan, N. Kreeger, P. Yu, K. Zhang, S. Cai, E. Nielsen, D. Soergel, S. Bileschi, M. Terry, C. Nicholson, S. N. Gupta, S. Sirajuddin, D. Sculley, R. Monga, G. Corrado, F. B. Viegas, and M. Wattenberg. TensorFlow.js: Machine Learning for the Web and Beyond. arXiv:1901.05350 [cs], Feb. 2019, arXiv:1901.05350.

[24] M. Guzdial, N. Liao, J. Chen, S.-Y. Chen, S. Shah, V. Shah, J. Reno, G. Smith, and M. O. Riedl. Friend, Collaborator, Student, Manager: How Design of an AI-Driven Game Level Editor Affects Creators. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pages 1–13. Association for Computing Machinery, New York, NY, USA, May 2019, doi:10.1145/3290605.3300854.

[25] R. Stojnic, R. Taylor, M. Kardas, V. Kerkez, L. Viaud, E. Saravia, and G. Cucurull. Browse the State-of-the-Art in Machine Learning, Papers with Code, https://paperswithcode.com/sota. accessed: 2021-07-15.

[26] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Networks. In NIPS 2014, June 2014, arXiv:1406.2661.

[27] I. Goodfellow. NIPS 2016 Tutorial: Generative Adversarial Networks. arXiv:1701.00160 [cs], Apr. 2017, arXiv:1701.00160.

[28] M. Kahng, N. Thorat, D. H. P. Chau, F. B. Viegas, and M. Wattenberg. GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation. IEEE Transactions on Visualization and Computer Graphics, 25(1):310–320, Jan. 2019, arXiv:1809.01587, doi:10.1109/TVCG.2018.2864500, https://poloclub.github.io/ganlab/.

[29] S. Reed, Z. Akata, S. Mohan, S. Tenka, B. Schiele, and H. Lee. Learning what and where to draw. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pages 217–225, Red Hook, NY, USA, Dec. 2016. Curran Associates Inc., arXiv:1610.02454.

[30] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 5908–5916, Oct. 2017, arXiv:1612.03242, doi:10.1109/ICCV.2017.629.

[31] A. Dash, J. C. B. Gamboa, S. Ahmed, M. Liwicki, and M. Z. Afzal. TAC-GAN – Text Conditioned Auxiliary Classifier Generative Adversarial Network. arXiv:1703.06412 [cs], Mar. 2017, arXiv:1703.06412.

[32] Y. Jin, J. Zhang, M. Li, Y. Tian, H. Zhu, and Z. Fang. Towards the Automatic Anime Characters Creation with Generative Adversarial Networks. arXiv:1708.05509 [cs], Aug. 2017, arXiv:1708.05509.


[33] M. Brundage, S. Avin, J. Clark, H. Toner, P. Eckersley, B. Garfinkel, A. Dafoe, P. Scharre, T. Zeitzoff, B. Filar, H. Anderson, H. Roff, G. C. Allen, J. Steinhardt, C. Flynn, S. Ó hÉigeartaigh, S. Beard, H. Belfield, S. Farquhar, C. Lyle, R. Crootof, O. Evans, M. Page, J. Bryson, R. Yampolskiy, and D. Amodei. The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. Report, Apr. 2018, arXiv:1802.07228, https://www.repository.cam.ac.uk/handle/1810/275332.

[34] A. Brock, J. Donahue, and K. Simonyan. Large Scale GAN Training for High Fidelity Natural Image Synthesis. Sept. 2018, arXiv:1809.11096.

[35] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive Growing of GANs for Improved Quality, Stability, and Variation. Feb. 2018, arXiv:1710.10196.

[36] T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, and T. Aila. Training Generative Adversarial Networks with Limited Data. In Proc. NeurIPS, Oct. 2020, arXiv:2006.06676.

[37] G. Branwen. Making Anime Faces With StyleGAN, Gwern.net, Feb. 2019, https://www.gwern.net/Faces. last modified: 2021-01-30, accessed: 2021-07-16.

[38] S. Guan. SummitKwan/transparent_latent_gan, Sept. 2018, https://github.com/SummitKwan/transparent_latent_gan. last modified: 2018-11-01.

[39] G. Branwen. This Waifu Does Not Exist, Gwern.net, Feb. 2019, https://www.gwern.net/TWDNE. last modified: 2020-01-20, accessed: 2021-07-16.

[40] G. Branwen. This Waifu Does Not Exist v3.5 (TWDNEv3.5) - Gwern, Feb. 2019, https://www.thiswaifudoesnotexist.net. last modified: 2020-09-06, accessed: 2021-07-15.

[41] W. Sun and Z. Chen. Learned Image Downscaling for Upscaling using Content Adaptive Resampler. IEEE Trans. on Image Process., 29:4027–4040, 2020, arXiv:1907.12904, doi:10.1109/TIP.2020.2970248.

[42] S. Anwar, S. Khan, and N. Barnes. A Deep Journey into Super-resolution: A survey. arXiv:1904.07523 [cs], Mar. 2020, arXiv:1904.07523.

[43] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In L. Leal-Taixe and S. Roth, editors, Computer Vision – ECCV 2018 Workshops, Lecture Notes in Computer Science, pages 63–79, Cham, 2019. Springer International Publishing, arXiv:1809.00219, doi:10.1007/978-3-030-11021-5_5.

[44] AI Neural Networks being used to generate HQ textures for older games (You can do it yourself!), ResetEra, Dec. 2018, https://www.resetera.com/threads/ai-neural-networks-being-used-to-generate-hq-textures-for-older-games-you-can-do-it-yourself.88272/.

[45] GameUpscale - Reddit, Dec. 2018, https://www.reddit.com/r/GameUpscale. accessed: 2021-07-15.

[46] C. Hart. Krischan74/RTCWHQ, May 2019, https://github.com/Krischan74/RTCWHQ. last modified: 2019-08-26.

[47] R. McCaffrey. Diablo 2 Resurrected Announced for PC, PS5, Xbox Series X, PS4, Xbox One, and Nintendo Switch - IGN. IGN, Feb. 2021, https://www.ign.com/articles/diablo-2-resurrected-announced-for-pc-ps5-xbox-series-x-ps4-xbox-one-and-nintendo-switch.


[48] L. Hafer. StarCraft Remastered Review. IGN, Aug. 2017, https://www.ign.com/articles/2017/08/19/starcraft-remastered-review.

[49] S. Menon, A. Damian, S. Hu, N. Ravi, and C. Rudin. PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models. pages 2437–2445, 2020, arXiv:2003.03808.

[50] A. Burnes. NVIDIA DLSS 2.0: A Big Leap In AI Rendering, Mar. 2020, https://www.nvidia.com/en-us/geforce/news/nvidia-dlss-2-0-a-big-leap-in-ai-rendering/. accessed: 2021-07-15.

[51] J. Burgess. RTX on—The NVIDIA Turing GPU. IEEE Micro, 40(2):36–44, Mar. 2020, doi:10.1109/MM.2020.2971677.

[52] K. Castle. Here are all the confirmed ray tracing and DLSS games so far. Rock Paper Shotgun, June 2021, https://www.rockpapershotgun.com/confirmed-ray-tracing-and-dlss-games-2021.

[53] bycloud. Depixelizing Doom Guy? Mona Lisa in Real Life? The “Upscaling” AI: PULSE, June 2020, https://www.youtube.com/watch?v=CSoHaO3YqH8. accessed: 2021-07-16.

[54] L. Gatys, A. Ecker, and M. Bethge. A Neural Algorithm of Artistic Style. Journal of Vision, 16(12):326–326, Aug. 2016, arXiv:1508.06576, doi:10.1167/16.12.326.

[55] M. Ruder, A. Dosovitskiy, and T. Brox. Artistic Style Transfer for Videos. In B. Rosenhahn and B. Andres, editors, Pattern Recognition, Lecture Notes in Computer Science, pages 26–36, Cham, 2016. Springer International Publishing, arXiv:1604.08610, doi:10.1007/978-3-319-45886-1_3.

[56] A. Semmo, T. Isenberg, and J. Dollner. Neural Style Transfer: A Paradigm Shift for Image-based Artistic Rendering? pages 5:1–5:13. ACM, July 2017, doi:10.1145/3092919.3092920.

[57] Y. Jing, Y. Yang, Z. Feng, J. Ye, Y. Yu, and M. Song. Neural Style Transfer: A Review. IEEE Transactions on Visualization and Computer Graphics, 26(11):3365–3385, Nov. 2020, arXiv:1705.04058, doi:10.1109/TVCG.2019.2921336.

[58] T. Deliot, T. Guinier, and K. Vanhoey. Real-time style transfer in Unity using deep neural networks, Unity Blog, Nov. 2020, https://blog.unity.com/technology/real-time-style-transfer-in-unity-using-deep-neural-networks. accessed: 2021-07-15.

[59] K. Allen. Neural Net Style Transfer Game Art, Nov. 2016, http://katieamazing.com/blog/2016/12/15/art-generation-style-transfer. accessed: 2021-07-16.

[60] F. Luan, S. Paris, E. Shechtman, and K. Bala. Deep Photo Style Transfer. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6997–7005, July 2017, arXiv:1703.07511, doi:10.1109/CVPR.2017.740.

[61] J. An, H. Xiong, J. Huan, and J. Luo. Ultrafast Photorealistic Style Transfer via Neural Architecture Search. AAAI, 34(07):10443–10450, Apr. 2020, arXiv:1912.02398, doi:10.1609/aaai.v34i07.6614.

[62] T. Karras, S. Laine, and T. Aila. A Style-Based Generator Architecture for Generative Adversarial Networks. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4396–4405, June 2019, arXiv:1812.04948, doi:10.1109/CVPR.2019.00453.


[63] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila. Analyzing and Improving the Image Quality of StyleGAN. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8107–8116, June 2020, arXiv:1912.04958, doi:10.1109/CVPR42600.2020.00813.

[64] S. R. Richter, H. A. AlHaija, and V. Koltun. Enhancing Photorealism Enhancement. arXiv:2105.04619 [cs], May 2021, arXiv:2105.04619, https://intel-isl.github.io/PhotorealismEnhancement/.

[65] M. Bethge, A. Ecker, L. Gatys, Ł. Kidziński, and M. Warchoł. deepart.io - become a digital artist, Oct. 2015, https://deepart.io/. accessed: 2021-07-15.

[66] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-Image Translation with Conditional Adversarial Networks. pages 5967–5976. IEEE Computer Society, July 2017, arXiv:1611.07004, doi:10.1109/CVPR.2017.632, https://phillipi.github.io/pix2pix/.

[67] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2242–2251, Oct. 2017, arXiv:1703.10593, doi:10.1109/ICCV.2017.244, https://junyanz.github.io/CycleGAN/.

[68] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu. Semantic Image Synthesis With Spatially-Adaptive Normalization. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2332–2341, June 2019, arXiv:1903.07291, doi:10.1109/CVPR.2019.00244, https://nvlabs.github.io/SPADE/.

[69] M.-L. Shih, S.-Y. Su, J. Kopf, and J.-B. Huang. 3D Photography Using Context-Aware Layered Depth Inpainting. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8025–8035, June 2020, arXiv:2004.04727, doi:10.1109/CVPR42600.2020.00805, https://shihmengli.github.io/3D-Photo-Inpainting/.

[70] Z. Hao, A. Mallya, S. Belongie, and M.-Y. Liu. GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds. arXiv:2104.07659 [cs], Apr. 2021, arXiv:2104.07659, https://nvlabs.github.io/GANcraft/.

[71] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science, pages 234–241, Cham, 2015. Springer International Publishing, arXiv:1505.04597, doi:10.1007/978-3-319-24574-4_28, https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/.

[72] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, Apr. 2018, arXiv:1606.00915, doi:10.1109/TPAMI.2017.2699184.

[73] A. Nowaczynski. Deep learning for satellite imagery via image segmentation, deepsense.ai, Apr. 2017, https://deepsense.ai/deep-learning-for-satellite-imagery-via-image-segmentation/. accessed: 2021-07-16.


[74] Y. Bengio. Deep learning of representations for unsupervised and transfer learning. In Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning workshop - Volume 27, UTLW’11, pages 17–37, Washington, USA, July 2011. JMLR.org, http://proceedings.mlr.press/v27/bengio12a.html.

[75] C. Olah, A. Mordvintsev, and L. Schubert. Feature Visualization. Distill, 2(11):e7, Nov. 2017, doi:10.23915/distill.00007, https://distill.pub/2017/feature-visualization.

[76] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. pages 770–778. IEEE Computer Society, June 2016, arXiv:1512.03385, doi:10.1109/CVPR.2016.90.

[77] S. Kornblith, J. Shlens, and Q. V. Le. Do Better ImageNet Models Transfer Better? In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2656–2666, June 2019, arXiv:1805.08974, doi:10.1109/CVPR.2019.00277.

[78] L. Chen. Keras.js - Run Keras models in the browser, May 2017, https://transcranial.github.io/keras-js/. accessed: 2021-07-15.

[79] J. Masci, U. Meier, D. Ciresan, and J. Schmidhuber. Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction. In T. Honkela, W. Duch, M. Girolami, and S. Kaski, editors, Artificial Neural Networks and Machine Learning – ICANN 2011, Lecture Notes in Computer Science, pages 52–59, Berlin, Heidelberg, 2011. Springer, doi:10.1007/978-3-642-21735-7_7.

[80] M. Caron, P. Bojanowski, A. Joulin, and M. Douze. Deep Clustering for Unsupervised Learning of Visual Features. In V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, editors, Computer Vision – ECCV 2018, Lecture Notes in Computer Science, pages 139–156, Cham, 2018. Springer International Publishing, arXiv:1807.05520, doi:10.1007/978-3-030-01264-9_9.

[81] N. Jean, S. Wang, A. Samar, G. Azzari, D. Lobell, and S. Ermon. Tile2Vec: Unsupervised Representation Learning for Spatially Distributed Data. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3967–3974, July 2019, arXiv:1805.02855, doi:10.1609/aaai.v33i01.33013967. Number: 01.

[82] O. Fried, S. Avidan, and D. Cohen-Or. Patch2Vec: Globally Consistent Image Patch Representation. Computer Graphics Forum, 36(7):183–194, 2017, doi:10.1111/cgf.13284.

[83] P. Migdał and B. Olechno. Unsupervised segmentation - inspecting examples, interactively, W3C Workshop on Web and Machine Learning, Aug. 2020, https://www.w3.org/2020/Talks/mlws/piotr_migdal/slides.html. accessed: 2021-07-15.

[84] Y. Zhou, Z. Zhu, X. Bai, D. Lischinski, D. Cohen-Or, and H. Huang. Non-stationary texture synthesis by adversarial expansion. ACM Trans. Graph., 37(4):49:1–49:13, July 2018, arXiv:1805.04487, doi:10.1145/3197517.3201285, https://vcc.tech/research/2018/TexSyn.

[85] A. Shocher, S. Bagon, P. Isola, and M. Irani. InGAN: Capturing and Remapping the “DNA” of a Natural Image. In The IEEE International Conference on Computer Vision (ICCV), Apr. 2019, arXiv:1812.00231, http://www.wisdom.weizmann.ac.il/~vision/ingan/.

[86] L. Liang, C. Liu, Y.-Q. Xu, B. Guo, and H.-Y. Shum. Real-time texture synthesis by patch-based sampling. ACM Trans. Graph., 20(3):127–150, July 2001, doi:10.1145/501786.501787.


[87] M. Gumin. mxgmn/WaveFunctionCollapse, Sept. 2016, https://github.com/mxgmn/WaveFunctionCollapse. last modified: 2021-07-07.

[88] H. Kim, S. Lee, H. Lee, T. Hahn, and S. Kang. Automatic Generation of Game Content using a Graph-based Wave Function Collapse Algorithm. In 2019 IEEE Conference on Games (CoG), pages 1–4, Aug. 2019, doi:10.1109/CIG.2019.8848019. ISSN: 2325-4289.

[89] I. Karth and A. M. Smith. WaveFunctionCollapse is constraint solving in the wild. In Proceedings of the 12th International Conference on the Foundations of Digital Games, FDG ’17, pages 1–10, New York, NY, USA, Aug. 2017. Association for Computing Machinery, doi:10.1145/3102071.3110566.

[90] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, Feb. 1989, doi:10.1109/5.18626.

[91] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, Nov. 1997, doi:10.1162/neco.1997.9.8.1735.

[92] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, Oct. 2014. Association for Computational Linguistics, arXiv:1406.1078, doi:10.3115/v1/D14-1179.

[93] M. Chen, A. Radford, R. Child, J. Wu, H. Jun, D. Luan, and I. Sutskever. Generative Pretraining From Pixels. In International Conference on Machine Learning, pages 1691–1703. PMLR, Nov. 2020, http://proceedings.mlr.press/v119/chen20s.html. ISSN: 2640-3498.

[94] A. Summerville and M. Mateas. Super Mario as a String: Platformer Level Generation Via LSTMs. In DiGRA/FDG ’16 - Proceedings of the First International Joint Conference of DiGRA and FDG, volume 13 of 1, Aug. 2016, arXiv:1603.00930.

[95] D. Kushner. Masters of Doom: How Two Guys Created an Empire and Transformed Pop Culture. Random House, May 2003, https://en.wikipedia.org/wiki/Masters_of_Doom.
